BranchPullerApplication:BranchPuller Timeout updating branch record in puller

Bug #281040 reported by Jonathan Lange
2
Affects Status Importance Assigned to Milestone
Launchpad itself
Invalid
Undecided
Unassigned

Bug Description

The branch puller is sometimes getting this error (from the XML-RPC server):

Traceback (innermost last):
  Module zope.publisher.publish, line 143, in publish
    publication.afterCall(request, object)
  Module canonical.launchpad.webapp.publication, line 363, in afterCall
    txn.commit()
  Module transaction._transaction, line 380, in commit
    self._saveCommitishError() # This raises!
  Module transaction._transaction, line 378, in commit
    self._commitResources()
  Module transaction._transaction, line 431, in _commitResources
    rm.tpc_begin(self)
  Module storm.zope.zstorm, line 268, in tpc_begin
    self._store.flush()
  Module storm.store, line 484, in flush
    self._flush_one(obj_info)
  Module storm.store, line 546, in _flush_one
    self._connection.execute(expr, noresult=True)
  Module storm.databases.postgres, line 257, in execute
    return Connection.execute(self, statement, params, noresult)
  Module storm.database, line 200, in execute
    raw_cursor = self.raw_execute(statement, params)
  Module storm.databases.postgres, line 267, in raw_execute
    return Connection.raw_execute(self, statement, params)
  Module storm.database, line 284, in raw_execute
    statement, params or (), error)
  Module storm.tracer, line 85, in trace
    attr(*args, **kwargs)
  Module canonical.launchpad.webapp.adapter, line 338, in connection_raw_execute_error
    raise TimeoutError(statement, params)
TimeoutError: 'UPDATE Branch SET mirror_failures=%s, mirror_status_message=%s WHERE Branch.id = %s', [<storm.variables.IntVariable object at 0x5e9baa0>, <storm.sqlobject.AutoUnicodeVariable object at 0x5e9bb18>, <storm.variables.IntVariable object at 0x7c72c80>]

It seems odd to time out updating a single row.

See OOPS-1013XMLP7, OOPS-1013XMLP5 and OOPS-1013XMLP6 from the XML-RPC server, and OOPS-1013SMPU14, OOPS-1013SMPU13 and OOPS-1013SMPU12 from the branch puller.

Jonathan Lange (jml)
description: updated
Revision history for this message
Stuart Bishop (stub) wrote :

The OOPS show most of the time is not database queries.

Revision history for this message
Jonathan Lange (jml) wrote :

On IRC later:

<BjornT> stub: it could also be that the update statement is blocking on some block, couldn't it? afaik, there's a bug in the oops reporting, so that the last sql statement isn't included as sql, but as non-sql. considering that the time is pretty much exactly 25 seconds, it looks like it's the statement that is timing out, doesn't it?
<stub> BjornT, jml: I think you are right. The only thing that would block that particular statement (apart from me or a losa messing with the db schema) is another connection has updated that row and not committed.
<stub> Make an educated guess as to what other process could be opening a transaction on Branch that long and stop it doing that?
<stub> We don't have any tools to tell us unfortunately.
<jtv> stub: pg_stat_activity?
<stub> jtv: Doesn't help after the fact
<stub> jml: Also, not all of our branches are mirrors, so it makes sense to break out that metadata into a separate table.
<jml> stub: actually we do mirror all of our branches.
<jml> stub: (but it still makes sense to move that data to a separate table)

Changed in launchpad-bazaar:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Stuart Bishop (stub) wrote :

Incorrect - it just looks that way due to a bug in the OOPS system. It looks like some other process had updated that particular branch and kept the transaction open, causing the appserver to block and eventually timeout.

Revision history for this message
Diogo Matsubara (matsubara) wrote :

Marked launchpad task as invalid as a task has been open in the appropriate project

Changed in launchpad:
status: New → Invalid
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I don't think this has happened for quite a long time now. Did the change to make BranchRevision.branch a deferred constraint fix this one?

visibility: private → public
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Fixing bug 461661 would help with this. It's still happening, although not all that often.

Revision history for this message
Robert Collins (lifeless) wrote :

high as per zero-oops-policy. We probably want a new OOPS too.

Changed in launchpad-code:
importance: Medium → High
summary: - Timeout updating branch record in puller
+ BranchPullerApplication:BranchPuller Timeout updating branch record in
+ puller
tags: added: pg83
tags: removed: oops
Revision history for this message
Stuart Bishop (stub) wrote :

All the OOPSes are over a year old and useless. Bug may no longer exist.

Changed in launchpad-code:
status: Triaged → Fix Released
status: Fix Released → In Progress
status: In Progress → Incomplete
Revision history for this message
Robert Collins (lifeless) wrote :

Very sure that this was the same contention thing we fixed with the xmlrpc reconfiguration.

Changed in launchpad-code:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.