Oops report does not always log timed-out query

Bug #310818 reported by Jeroen T. Vermeulen
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Stuart Bishop

Bug Description

Timeouts on database queries produce incomplete oops reports in about two-thirds of the cases I've looked at.

In these cases, the query that actually timed out is not logged. This causes several inconveniences:

 * The oops report seems to count the query's execution time as "non-sql time."

 * Oops summaries sometimes list the last-logged query instead of the failing one as the point of timeout.

 * It's not easy to see how long the timed-out query really took.

Examples: OOPS-1085D2293, OOPS-1085C2639, OOPS-1085D2607, OOPS-1085E2170, OOPS-1085F1669, OOPS-1085A2372. The last query in the log is not the same one in the timeout message, and the last query's start time is nowhere near the timeout limit.

It's not consistent though: in the remaining one-third of cases the query was logged. Could this happen because the timed-out query took even longer than the infrastructure accounts for? Or could the timeout happen completely between the time when the ORM issues the query and the time it's actually logged?

Related branches

Revision history for this message
Diogo Matsubara (matsubara) wrote :

Danilo brought up today OOPS-1174S166. The query shown in the traceback doesn't appear in the SQL statement log.
Initially I thought it could be a problem rendering the SQL statements from the raw oops report, but after checking it, it's indeed not recorded in the raw report.

Revision history for this message
Diogo Matsubara (matsubara) wrote :

After confirming with Stuart and Francis, this is not a dupe of bug 31479

Revision history for this message
Diogo Matsubara (matsubara) wrote :

Would it be possible that the failing query wasn't logged because it was canceled (an instance of QueryCanceledError) somehow? Perhaps canceled due to one of the utilities/pgkill*.py scripts?

Revision history for this message
Stuart Bishop (stub) wrote :

Marking a high priority, as this issue is continually causing OOPS reports that are difficult or impossible to diagnose since we are missing the start and end times of the timed out query.

Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2.8
Stuart Bishop (stub)
Changed in launchpad-foundations:
assignee: nobody → Stuart Bishop (stub)
Stuart Bishop (stub)
Changed in launchpad-foundations:
status: Triaged → In Progress
Revision history for this message
Diogo Matsubara (matsubara) wrote : Bug fixed by a commit

Fixed in devel r9063.

Changed in launchpad-foundations:
status: In Progress → Fix Committed
Stuart Bishop (stub)
Changed in launchpad-foundations:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.