Comment 10 for bug 590708

Revision history for this message
Leonard Richardson (leonardr) wrote :

Here's an overview of what happens:

The tagged interface method is IHasBuildRecords.getBuildRecords(). I presume the method that's timing out is the one in DistroArchSeries, not the method of the same name in Archive.

When a GET request comes in triggering the named operation getBuildRecords, lazr.restful code calls DistroArchSeries.getBuildRecords(). This calls BinaryPackageBuildSet.getBuildsByArchIds, which returns a ResultSet based on a complicated query.

Control then returns to lazr.restful, which puts the ResultSet into a subclass of lazr.batchnavigator BatchNavigator. We then iterate over navigator.batch, which runs the complicated query the first time. And then we look at navigator.listlength, which runs the query again.

= QUESTION =

It sounds like one of these queries would be okay on its own, but both of them together exceed the timeout. Is this true? If someone really wanted the count, and made an HTTP request whose only purpose was to calculate the count, would the count query on its own exceed the timeout?

= SUGGESTED SOLUTION =

We provide the total length of a collection in a batched response as a convenience for the end-user. When the length is expensive to calculate or it's unlikely the end-user will want the length, we can instead provide a _link_ to the length of the collection. That is, instead of the JSON response including a key like this:

  'total_size' : 3000

It can include a key like this:

  'total_size_link' : 'https://edge.launchpad.net/api/1.0/ubuntu/maverick?ws.op=getBuildRecords&ws.show=total_size'

Sending a GET to that link would return a JSON document containing just the size, e.g.

  3000

Right now, getBuildRecords is published with annotations including this one:

    # Really a IBuild see _schema_circular_imports.
    @operation_returns_collection_of(Interface)

To switch from total_size to total_size_link, we would change that annotation:

    # Really a IBuild; see _schema_circular_imports.
    @operation_returns_collection_of(Interface, include_total_size=False)

We would change the WADL so that 'total_size_link' was published as a parameter to this named operation instead of 'total_size'.
We would change lazr.restfulclient so that its __len__ implementation understood what to do when total_size_link was present instead of total_size.

Once everyone is using versions of lazr.restfulclient that understand total_size_link (this will take a long time), we can even experiment with making include_total_size=False the default.

= DOWNSIDES =

This fix breaks backwards compatibility. Old versions of lazr.restfulclient will not be able to obtain the length of a collection with include_total_size=False, because they won't know how to handle total_size_link.

We could maintain a bit of backwards compatibility by exporting this operation for 'beta' with include_total_size=True, un-exporting it for '1.0', and then re-exporting it for '1.0' with include_total_size=True. This would give 'beta' the old behavior and '1.0' the new behavior. (Or, mutatis mutandis to give 'beta' and '1.0' the old behavior and 'devel' the new behavior.)

This is a good solution when we remove total_size to make a named operation more efficient. But here, the old behavior is causing timeouts, so it already doesn't work and we shouldn't try to preserve it at all.

If the COUNT query is inefficient enough to trigger the timeout all by itself, this fix won't really help--there will still be no way to get the length of the collection, . But the timeouts will stop (except for the timeouts caused by anyone trying to get the length of the collection).