Archive:EntryResource:getBuildSummariesForSourceIds times out

Bug #662523 reported by Fabien Tassin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Robert Collins

Bug Description

1751ED2047 https://lp-oops.canonical.com/oops.py/?oopsid=1751ED2047

SQL time: 12977 ms
Non-sql time: 470 ms
Total time: 13447 ms
Statement Count: 113

Potato programming:
        source_build_statuses = {}
        for source_pub in source_pubs:
            status_summary = source_pub.getStatusSummaryForBuilds()
            source_build_statuses[source_pub.id] = status_summary

The slow query performance certainly isn't helping, but the basic driver is this scaling per requested source build.

Related branches

Revision history for this message
Fabien Tassin (fta) wrote :
summary: - [regression] lp API at least 3 times slower and timeouts all the time
+ Archive:EntryResource:getBuildSummariesForSourceIds times out
tags: added: dba timeout
tags: removed: dba
description: updated
Changed in soyuz:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Yet another PG8.4 regression.

Revision history for this message
Fabien Tassin (fta) wrote :

Moved from EDGE_SERVICE_ROOT to LPNET_SERVICE_ROOT a few days ago (as edge was completely broken) and it solved the issue for a while.
This morning, at ~7:45am (5:45 UTC), it restarted to timeout:

2010/10/26-04:31:08 66 sec
2010/10/26-04:46:01 59 sec
2010/10/26-05:01:32 90 sec
2010/10/26-05:15:48 46 sec
2010/10/26-05:30:46 44 sec
2010/10/26-05:45:42 41 sec
2010/10/26-06:01:29 88 sec
2010/10/26-06:15:57 55 sec
2010/10/26-06:31:14 73 sec
2010/10/26-06:46:29 87 sec
2010/10/26-07:01:58 116 sec
2010/10/26-07:16:57 116 sec
2010/10/26-07:31:15 74 sec
2010/10/26-07:47:52 170 sec <===
2010/10/26-08:02:52 170 sec
2010/10/26-08:18:03 182 sec
2010/10/26-08:32:57 175 sec
2010/10/26-08:47:59 177 sec
2010/10/26-09:03:02 181 sec
2010/10/26-09:17:57 175 sec
2010/10/26-09:32:59 177 sec
2010/10/26-09:47:52 170 sec

i call getBuildSummariesForSourceIds(source_ids = chunk) with chunks of 25 src ids, it worked for month on edge, it's no longer usable because of timeouts. now it's the same on the production server.
I just lower it down to chunks of 15, but that means more requests and longer sessions.

i've already implemented caching to help solve this: http://paste.ubuntu.com/520087/
i'm out of options.

Revision history for this message
Fabien Tassin (fta) wrote :

15 was still too much. 10 seems to work for now

description: updated
description: updated
Changed in soyuz:
assignee: nobody → Robert Collins (lifeless)
status: Triaged → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
Changed in soyuz:
milestone: none → 10.11
tags: added: qa-needstesting
Changed in soyuz:
status: In Progress → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote :

Fabien, is your script small? I'd like to test the changes I've made for this that are now present on qastaging.

Revision history for this message
Robert Collins (lifeless) wrote :

Alternatively, you could point launchpad lib at the qastaging service root (use the full url rather than the symbolic constants) and try yourself. Note that the DB server is smaller there so I expect you'd need to try a few times to get hot-cache.

Revision history for this message
Fabien Tassin (fta) wrote :

it's not particularly small anymore.

see lp:~fta/+junk/ppa-dashboard.trunk

here is how i run it (from within the branch) to create what's visible there: http://people.ubuntu.com/~fta/ppa-dashboard/

=====================================
OUTDIR=$(mktemp -d --tmpdir=.)
BIN=./dashboard.py

OPTS="--dir $OUTDIR --cachedir cache"

D1=$(date +%s)
$BIN $OPTS -r --grouped --output chromium-daily.html chromium-daily/stable,beta chromium-daily/ppa,dev
$BIN $OPTS -r --grouped --output mozillateam-stable.html mozillateam/firefox-stable mozillateam/thunderbird-stable
$BIN $OPTS mozillateam/ffox35
$BIN $OPTS -r ubuntu-mozilla-daily/ppa
$BIN $OPTS -r ubuntu-mozilla-security/ppa
D2=$(date +%s)

DELTA=$(expr $D2 - $D1)
echo $(date +%Y/%m/%d-%H:%M:%S)" $DELTA sec" >> runtime.log
========================================

it should not output anything in the shell. Everything is in $OUTDIR.
the 1st run is slower (as it starts with an empty cache), next runs should be way faster.

Add -v tp $OPTS to see the perfs of individual LP calls.

Compared to the branch, i'm currently running with smaller chunks because of this bug:

=== modified file 'dashboard.py'
--- dashboard.py 2010-10-24 18:35:20 +0000
+++ dashboard.py 2010-10-26 10:06:27 +0000
@@ -348,7 +348,7 @@

     # Split the list of remaining source-ids into chunks of reasonable size to avoid timeouts.
     # ~25 per request should be okay
- sz = 25
+ sz = 10
     chunks = map(lambda i: new_sids[i:i+sz], xrange(0, len(new_sids), sz))
     for chunk in chunks:
       t = self.perf.start("getBuildSummariesForSourceIds(%d sids)" % len(chunk))

Revision history for this message
Robert Collins (lifeless) wrote :

Thanks, I'll see what sort of results I can get from qastaging.

You probably want to drop the import of EDGE_SERVICE_ROOT :)

Revision history for this message
Robert Collins (lifeless) wrote :

Working reliably on qastaging with much higher batch sizes than fta had working on lpnet.

tags: added: qa-ok
removed: qa-needstesting
Changed in soyuz:
status: Fix Committed → Fix Released
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

This caused a regression, bug 672371

Changed in soyuz:
status: Fix Released → Fix Committed
tags: added: bad-commit-11888
removed: qa-ok
tags: added: qa-ok
Revision history for this message
Fabien Tassin (fta) wrote :

was it reverted or something?

got some oops this morning. Like OOPS-1778F618 (sig claims it's r11887)

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 662523] Re: Archive:EntryResource:getBuildSummariesForSourceIds times out

Yeah - this patch caused huge timeouts on +packages pages. We've
sorted it all out so that this patch's benefits will be felt and
+packages will still work in 11926. That should be deployed soon.

Changed in soyuz:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.