2a fetch over dumb transport reads one group at a time

Bug #402657 reported by John A Meinel
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Critical
Andrew Bennetts

Bug Description

This is probably the core issue for bug #402114, however I've co-opted that one to be a meta issue about various problems with 2a fetch over dumb transport performance.

The fetch code for 2a is specifically written to read one group at a time, keeping a buffer of recently read groups. The design was that we would have already built up groups that would be a reasonable size for fetching, so we wouldn't need to spend a lot of time sorting things out at future fetch times.

However, because of bugs like: bug #402645 (fetch might fragment) and bug #402652 (we don't opportunistically combine groups) we can easily end up with a case where we have many small groups.
(Also because of how we split up chk streams, which will soon be another bug post.)

We should instead do something logical, like try to fill up our local LRU Cache with each request (50MB uncompressed size), or at least something like make a min request of 64kB, etc.

Note that because each request *also* verifies the pack header, the overhead is even greater than a simple round trip. (though it is 40 bytes, it is also a seek to the beginning of the file, etc.)

Tags: 2a

Related branches

Revision history for this message
Robert Collins (lifeless) wrote :

Even if we are optimally packed, reading each group one at a time will cause a huge amount of latency.

Changed in bzr:
importance: Medium → Critical
milestone: none → 2.0
Andrew Bennetts (spiv)
Changed in bzr:
assignee: nobody → Andrew Bennetts (spiv)
Martin Pool (mbp)
Changed in bzr:
status: Triaged → In Progress
Revision history for this message
Andrew Bennetts (spiv) wrote :

So the basic issue here is that _get_remaining_record_stream does lots of individual _get_block calls.

I'm working on a patch to add some batching, initial indications are looking good but I'll do more thorough testing on Monday. It attempts to batch up the _get_block calls until there's at least 64k of data to request. It certainly cuts down the number tiny HTTP readv requests made when fetching Launchpad.

There's a selftest failure it somehow introduces too, so more work required. But this bug is definitely In Progress.

Revision history for this message
John A Meinel (jameinel) wrote :

If we are 'optimally packed' then each group is between 2-4MB in size. (A single content will expand to 4MB, mixed content grows to 2MB.)

That is quite a bit bigger than Andrew's current 64kB group request size.

I think it is worthwhile to figure out the buffering/pre-read issue. But the reason it wasn't done earlier is that we guessed it wouldn't be as common to have small groups.

Because of stuff like the fragmentation bug (bug #402645) it turns out to be more common. Also because of the 'fetch from a mixed source creates an unoptimal single pack.

I also ran into a case that I'd like to exploit the packing heuristics. Namely keeping the 'current' content slightly looser packed than 'older' content. This was a moderate win for stuff like 'ls -r -1 --recurse', which also effects the time to build a checkout.

When I was testing it, the lack of wide reads caused a lot more round-trips to read the more minimal data, so having a bit of buffering in the middle would have been helpful.

Andrew Bennetts (spiv)
Changed in bzr:
status: In Progress → Fix Committed
Revision history for this message
Andrew Bennetts (spiv) wrote :

The attached branch fixes this by adding some batching, at least for the originally reported case of "bzr branch http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel".

It's still not 100% optimal (e.g. the continual re-read of the pack file format string on every single readv). And the batching is limited by what the requested stream ordering allows. For ordering='unordered', as in the originally reported case, it makes a big difference. For other operations that use other orderings we may still have more work to do. I suggest we file new, specific bug reports for these cases, if we find any.

Vincent Ladeuil (vila)
Changed in bzr:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.