inconsistent performance between staging and prod
Bug #660291 reported by
Robert Collins
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself |
Fix Released
|
High
|
Stuart Bishop |
Bug Description
Staging is meant to be no better than production - that is, something that works on staging should always work on production.
However since the pg8.4 upgrade we've had multiple queries (e.g. the distribution:+ppas count(*) query) that are tolerable (1.2s) on staging and intolerable (15s) on production.
Discussing with Stuart, there may be a tuning parameter difference, or it may be the greater ram available on prod triggering query plans that are actually less efficient.
If we can make prod fast for these queries we should, otherwise we need to get staging back to being a good predictor for 'bad queries' asap so that developers can have improvements validated.
Changed in launchpad-foundations: | |
status: | New → Triaged |
importance: | Undecided → High |
tags: | added: needsrt |
Changed in launchpad: | |
status: | Fix Committed → Fix Released |
To post a comment you must log in.
The following settings are different on staging:
shared_buffers = 2GB cache_size = 25GB statistics_ target = 1000
random_page_cost = 3.0
effective_
default_
On production we have:
shared_buffers = 5GB cache_size = 120GB statistics_ target = 250
random_page_cost = 1.8
effective_
default_
All of these will affect query plans. The first thing that stands out is our statistics target (how many rows we sample to generate statistics) is too low on production. I'm bumping it up to 2500 and will run a full analyze.
If the statistics target doesn't help, lowering our shared_buffers seems the next step. There never seems to be a straight answer on how big this should be, especially when most of the rules of thumb are years out of date. The reports I trust most indicate performance starts degrading at 10GB, but it is rare to find places running with more than 2GB since most sites are running with <32GB RAM.
random_page_cost is the other variable we should try tweaking, maybe before tweaking the shared_buffers as we need to bounce PG to change shared_buffers. random_page_cost is quite low as, at least with PG 8.3, you needed to take into account the probability of data being in RAM and we have lots of that (we probably should have been running with about 1.2 on production with 8.3, but I took the conservative option since it was performing well enough and didn't want to risk experiments). We should confirm this setting is still sane under PG 8.4.