attachments: accessing attachment's message is very slow

Bug #424671 reported by Bryce Harrington
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Triaged
High
Unassigned

Bug Description

[Problem]
Performance processing attachments when referencing the message object really slows things down
https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1662A100

[Discussion]
I love launchpadlib and use it a LOT for helping me get my work done on X.org. X.org bugs are really all about file attachments - crash dumps, log files, config files, yada yada. Pretty much every launchpadlib script I write deals with attachments in some way.

However, if you want to get the name of the uploader of the attachment, or when it was uploaded, this causes scripts to run a LOT slower. For interactive scripts this can result in it taking over a minute to load the bug. Even for non-interactive scripts this delay really reduces their usefulness.

I would like to see launchpadlib provide better optimized access to these details.

Attached is a test script I've used to do some performance measurements when different data is requested on a bug's attachments.

\time -f "%E" ./test_attachments.py

Results
======
0. Just titles: 5.18 sec
1. do_a: 12.58
2. do_message: 42.33
3. do_a + do_message: 51.38
4. do_fb: 11.68
5. do_a + do_fb: 17.95
6. do_a + do_fb + do_message: 1:14.90
7. do_content: 11.84
8. do_a + do_content: 24.87
9. do_a + do_fb + do_content: 25.59
10. do_a + do_fb + do_message + do_content: 1:08.76

As you can see, it's faster to get all the files content (including printing contents to the screen!) than to print only the owner/date_created/subject of the file, by like an order of magnitude

Something feels wrong - I would expect the difference between #1 and #3 to be on the order of maybe 4 sec max, not 40 sec. In this example bug, there is only a single message on the bug, compared with like 20 attachments, so you would expect #3 to be faster than #8, not take twice as long.

Fixes we need to CP from devel
========================

revno: 11156 [merge]
committer: Launchpad Patch Queue Manager <email address hidden>
branch nick: launchpad
timestamp: Mon 2010-07-19 18:32:16 +0100
message:
  [r=lifeless][ui=none][bug=606914] Bug.initial_message has been
        changed to no longer materialise every message on a Bug before
        returning the first one.

Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
Gary Poster (gary) wrote :

For now, the goal on this bug is to get an analysis of root cause. We'll regroup after that.

Changed in launchpadlib:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Leonard Richardson (leonardr)
Revision history for this message
Bryce Harrington (bryce) wrote :

Out of curiosity, any progress on this? Anything I could do to help?

Revision history for this message
Leonard Richardson (leonardr) wrote :

The problem is probably in Launchpad itself, so I've moved this bug to launchpad-foundations, where hopefully someone who can investigate it (sinzui?) will see it.

affects: launchpadlib → launchpad-foundations
tags: added: api
Revision history for this message
Martin Pool (mbp) wrote :

jtv just asked me about something like this: we are seeing many oopses caused by attempts to read attachments off big bugs (eg bug 88746).

From a Python shell you can fairly easily reproduce this with

lp.bugs[88746].attachments[0]

which gives something like:

https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1660S153

This oops is pretty pathetic because about 50ms all the actually necessary queries are done, then it spends the rest of its timeout doing completely unnecessary and painful other queries.

I would speculate that something in lazr's toDataStructure is interacting badly with storm.

Revision history for this message
Martin Pool (mbp) wrote :

https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1660E1461 gives a bit more insight into why it's here, that it's doing "if message.bugs.count() == 0" in message_to_canonical_url_data and I think that may be doing something extremely expensive in

        self.path = "comments/%d" % list(bug.messages).index(message)

Revision history for this message
Jonathan Lange (jml) wrote :

You could do something like this:

    def getIndexForMessage(self, message):
        results = store.find(Message, Bug.message = Message.id, Bug.id = self.id).order_by(Message.datecreated)
        return list(results.values(Message.id)).index(Message.id)

(sketch)

Revision history for this message
Bryce Harrington (bryce) wrote :

Yeah, I had hoped to find someone at the epic to help me investigate this bug further. Unfortunately, what I was told is that no one really understands the launchpad librarian. I got deryck to help me study the oops report a bit but we didn't make much headway.

What I've been doing is as I've found bug reports that trigger this bug, to either close the bug report or make my script skip over it in its processing. But it feels like I'm just sweeping the problem under the carpet.

Gary Poster (gary)
tags: added: performance
tags: added: timeout
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

The fix on devel improves things so it sometimes works; leonard's work will make it much better by dropping half the work or so, but we still need to improve the underlying api behaviour.

description: updated
Revision history for this message
Bryce Harrington (bryce) wrote :

In light of bug #618849 being fixed, I re-ran this script for case #6. Originally I'd measured a time of 1:18; the script now completes with a time of 0:49.29.

This is being run on the same hardware, with the same version of Ubuntu as the original test. The bug report under measurement in this test case (bug #259156 ) has received a few comments but no additional attachments. So while the testing conditions aren't exact, they're fairly close. The one problem is that the script is triggering a NoCanonicalUrl assertion in launchpad, so I can't be certain the measurements are good. If they are, this suggests a 35-40% improvement which would be quite notable.

Just to check, I reran all the different cases, and found all cases involving message data show a similar reduction in run time, whereas cases not involving message data did not see such improvements. Again, the script terminates with an assertion when do_message is enabled, so this performance "improvement" could just be measuring an early termination.

0. Just titles: 5.18 sec ... 4.05 sec
1. do_a: 12.58 ... 15.78
2. do_message: 42.33 ... 37.61
3. do_a + do_message: 51.38 ... 43.31
4. do_fb: 11.68 ... 13.69
5. do_a + do_fb: 17.95 ... 30.55
6. do_a + do_fb + do_message: 1:14.90 ... 59.17
7. do_content: 11.84 ... 11.55
8. do_a + do_content: 24.87 ... 25.30
9. do_a + do_fb + do_content: 25.59 ... 26.82
10. do_a + do_fb + do_message + do_content: 1:08.76 ... 58.82

Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 424671] Re: attachments: accessing attachment's message is very slow

We still have to pull all Messages out to calculate the message links;
fixing that would make this faster still.

Revision history for this message
Robert Collins (lifeless) wrote :

Bryce, btw, your bug re-requests the message multiple times - each time you dereference it; you should fix that to avoid misrepresenting the overheads.

Revision history for this message
Robert Collins (lifeless) wrote :

Bryce, could you tighten this up to something like:
'bug/messages/1234' takes 2seconds (or whatever it does).

As it stands, its not really actionable as a timeout; and the script is doing several different things. Using the httplib debug option you should be able to identify the slow urls.

We also need a separate bug for the noncanonicalurl thing.

tags: removed: timeout
Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
Bryce Harrington (bryce) wrote :

[ 0.0000 (0.5742 sec)] title: BootDmesg.txt
[ 0.0030 (0.0029 sec)] ispatch: Unspecified
[ 0.0038 (0.0009 sec)] http_etag: "c0d7800dcaa8bd743c212ff7a591ecd11ad660e7-7f6720d5b8164e946312aa16f4e947dda87162e3"
[ 0.0046 (0.0008 sec)] a.url: https://api.launchpad.net/1.0/bugs/422119/+attachment/691854
[ 0.3257 (0.3211 sec)] subject: Re: [regression] Touchpad vertical scrolling stopped working
[ 0.8545 (0.5288 sec)] owner: Andres Monroy-Hernandez
[ 1.1253 (0.2708 sec)] created: 2009-08-31 17:16:02.513303+00:00
[ 3.0988 (1.9735 sec)] modified: Mon, 31 Aug 2009 17:16:02 GMT
[ 3.0988 (0.0001 sec)] fb.url: https://api.launchpad.net/1.0/bugs/422119/+attachment/691854/data
[ 3.0989 (0.0000 sec)] content-type: text/plain; charset="utf-8"
[ 3.0992 (0.0004 sec)] filename: BootDmesg.txt
[ 3.0993 (0.0001 sec)] isatty: False
[ 3.0993 (0.0000 sec)] len: 54019
[ 3.0993 (0.0000 sec)] mode: r
[ 3.0994 (0.0000 sec)] pos: 0
[ 3.0994 (0.0000 sec)] softspace: 0
[ 3.0994 (0.0000 sec)] content: 54019
[ 3.0994 (0.0000 sec)]

Opening the fb consumes the most time. Accessing the message consumes about a second (1/3rd total time).

Revision history for this message
Robert Collins (lifeless) wrote :

You should be able to get url tracking from launchpad lib by exporting and HTTPLIB2 option - I forge the details. That will tell us the actual slow request, which is what we need.

Changed in launchpad:
assignee: Leonard Richardson (leonardr) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.