iteration in LP API's is O(N^2) due to batching

Bug #626680 reported by Robert Collins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Invalid
High
Unassigned

Bug Description

So, iteration on collections in the Launchpad API's is O(N^2) because they batch.

We should, I think, stop batching. If folk want only a slice, give them a slice, otherwise stream the entire thing down and let the client consume as it goes.

Iteration is O(N^2) because the database has to calculate the first N rows to return N-onwards, and thus the second batch costs the work for the first batch + the work for the first-batch and second-batch, and so-on.

I'm sure this will require work in a bunch of places, so I thought I'd kick the issue of here and let additional tasks etc be raised as needed.

Revision history for this message
Stuart Bishop (stub) wrote : Re: [Bug 626680] [NEW] iteration in LP API's is O(N^2) due to batching

On Mon, Aug 30, 2010 at 3:52 PM, Robert Collins
<email address hidden> wrote:
> Public bug reported:
>
> So, iteration on collections in the Launchpad API's is O(N^2) because
> they batch.
>
> We should, I think, stop batching. If folk want only a slice, give them
> a slice, otherwise stream the entire thing down and let the client
> consume as it goes.
>
> Iteration is O(N^2) because the database has to calculate the first N
> rows to return N-onwards, and thus the second batch costs the work for
> the first batch + the work for the first-batch and second-batch, and so-
> on.
>
> I'm sure this will require work in a bunch of places, so I thought I'd
> kick the issue of here and let additional tasks etc be raised as needed.

Without batching, we would need to limit queries to returning a sane
number of rows. Currently, people can retrieve all Ubuntu bugs. That
would be insane to do in a single request.

The alternative is to make batching saner.

We can get the first batch:

SELECT * FROM Foo ORDER BY id LIMIT 100;

If we remember the key from the last item in the previous batch, we
can get the next batch:

SELECT * FROM Foo WHERE id > 342 ORDER BY id LIMIT 100;

This technique is much faster and preferred. What you lose is the
ability to know how many batches there are, how to jump to an
arbitrary batch, and you must order by a unique column. This is
probably fine for the webservice. Using this technique in the
appserver we lose some batch navigation features.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Robert Collins (lifeless) wrote :

That sounds nice. I'm totally in favour of handling this different in
the webservice: its a different domain.

Computers will do

for bug in ubuntu.bugs:
   ...

Users simply can't ;)

As for whether answering all ubuntu bugs in a single webservice
request makes sense - I think it would, if we removed some
limitations, and partitioned apis out (I want to anyway). However, if
we can batch without the O(N^2) which you suggest a nice strategy for,
one of the strong motivations for not batching in the webservice goes
away.

Revision history for this message
Stuart Bishop (stub) wrote :

On Tue, Aug 31, 2010 at 2:27 PM, Robert Collins
<email address hidden> wrote:

> As for whether answering all ubuntu bugs in a single webservice
> request makes sense - I think it would, if we removed some
> limitations, and partitioned apis out (I want to anyway). However, if

Its 1.6GB of data. Even if we want to pull that much from the database
and serialize it in one transaction, it is unreasonable to expect
clients to be able to materialize it.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Gary Poster (gary)
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Gary Poster (gary) wrote :

I agree that we want to rethink the way we do batching for the API, quite possibly doing away with it entirely. This is something that has already gone on my list for consideration for the "usability" effort after Leonard finishes work on the desktop client stuff.

We should implement it with what Stuart is describing.

Beyond performance, though, my concern with the batching approach in launchpadlib is that it makes Python programmers expect that they are working with a normal sequence, when they are not. This can lead to surprises, and to code that has to handle odd exceptions, like a len not being the same as the number of objects in an iteration, or members appearing twice, or not at all. I think we can and should have a Python API that clearly sets expectations for webservice interactions. Simply disabling collection iteration may be the answer.

I don't think it will be reasonable to allow the entirety of a large collection like bugs to be returned at once.

This is also related to your first comment in bug 251284, Robert.

tags: added: api
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 626680] Re: iteration in LP API's is O(N^2) due to batching

> I think we can and should have a Python API that clearly sets expectations for webservice interactions.

+1000

:)

Revision history for this message
Robert Collins (lifeless) wrote :

Possibly want to dupe this on the (newer) bug that adeuring is working on.

Changed in launchpad:
importance: Medium → High
Revision history for this message
Robert Collins (lifeless) wrote :

(because, the db implementation is being fixed to use row constraints not offsets to paginate)

Revision history for this message
Robert Collins (lifeless) wrote :

I'm going to just close this in favour of aduerings work, we can migrate batches case by case.

Changed in launchpad:
status: Triaged → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.