single-thread appserver experiment (rt41361)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | Critical | Unassigned |
Bug Description
I'm about to request, via RT, a single-threaded appserver to try to ease the impact of bug 637758. That bug has two problems:
- a slow SQL query
- 8 seconds of time gone AWOL that we can't see a cause for.
There are a few possibilities for the 8 second gap.
- no timeslice allocated to the thread (e.g. due to GIL scheduler starvation, which is easy enough to cause)
- a bug in some C code that doesn't release the GIL and decides to take a while (in another thread)
- we're in swap and a full gc is triggered
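The first possibility can be illustrated with a rough sketch. It is not Launchpad code: `busy` and `max_wakeup_gap` are hypothetical names, and the numbers depend heavily on the interpreter version and the machine, so it only shows the kind of measurement involved. One thread tries to wake every millisecond and records the longest gap it sees while CPU-bound threads compete for the GIL.

```python
import threading
import time


def busy(stop):
    # CPU-bound loop: it only gives up the GIL when the interpreter forces a switch.
    n = 0
    while not stop.is_set():
        n += 1


def max_wakeup_gap(cpu_threads, duration=0.5, interval=0.001):
    """Return the longest gap (in seconds) between wakeups of the calling
    thread, which tries to wake every `interval` seconds while
    `cpu_threads` busy threads are running."""
    stop = threading.Event()
    workers = [threading.Thread(target=busy, args=(stop,))
               for _ in range(cpu_threads)]
    for w in workers:
        w.start()
    worst = 0.0
    last = time.monotonic()
    deadline = last + duration
    while last < deadline:
        time.sleep(interval)          # releases the GIL; must re-acquire it on wakeup
        now = time.monotonic()
        worst = max(worst, now - last)
        last = now
    stop.set()
    for w in workers:
        w.join()
    return worst


if __name__ == "__main__":
    for n in (0, 4):
        gap_ms = max_wakeup_gap(n) * 1000
        print(f"{n} CPU-bound threads: worst wakeup gap {gap_ms:.1f} ms")
```

A gap that grows with the number of busy threads points at contention for the interpreter lock rather than at the measured thread's own work.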
We can fairly easily find out if it's something else by running some 1-thread appserver instances in the lpnet farm. Such instances will get plenty of requests, and if the cause is unrelated to threading, 2 instances should still produce 1-2 OOPS from this API overnight (there are 60 threads in the server farm today, and there were 68 OOPS overnight).
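The arithmetic behind that estimate can be sketched as follows. The sketch assumes OOPS are spread evenly across threads and that the 2 experimental instances are counted among the 60 threads; both are assumptions, not statements from the report. The Poisson step is likewise only an illustrative way to judge how surprising a night with zero OOPS would be.

```python
import math


def expected_oops(total_oops, total_threads, experiment_threads):
    """Expected OOPS landing on the experimental threads, assuming an even spread."""
    return total_oops * experiment_threads / total_threads


def prob_zero(expected):
    """Poisson probability of seeing no OOPS at all when `expected` are anticipated."""
    return math.exp(-expected)


# Figures from the report: 68 OOPS overnight, 60 threads, 2 single-thread instances.
share = expected_oops(68, 60, 2)
print(f"expected OOPS on the 2 instances: {share:.2f}")
print(f"chance of zero OOPS by luck alone: {prob_zero(share):.2f}")
```

Under these assumptions the expectation comes out at a little over 2 per night, so a single quiet night is weak evidence on its own; several quiet nights in a row would be much harder to put down to chance.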
If doing this fixes the issue, then we will know it's nothing else and will have to zero back in on the possible causes above (or decide to run single-threaded / stackless / something else).
summary:
- single-thread appserver experiment
+ single-thread appserver experiment (rt41361)
Changed in launchpad:
importance: High → Critical
As I mentioned to Robert, I'm familiar with production instances of other similar applications going to a single-threaded model for increased performance. That's a simple change (as opposed to, for instance, switching to stackless).