Bug #388269 “many get_parent_map calls during walk to common rev...” : Bugs : Bazaar

Martin Pool (mbp) on 2009-06-17

Changed in bzr:
importance:	Undecided → High
status:	New → Confirmed
status:	Confirmed → New

Revision history for this message

Jelmer Vernooij (jelmer) wrote on 2009-06-19:

#1

bzr-svn also has this problem inside "bzr svn-import" .

Changed in bzr:
status:	New → Triaged

Robert Collins (lifeless) on 2009-06-19

description:	updated
tags:	added: hpss

Martin Pool (mbp) on 2010-02-24

summary:	- getting a branch into an empty shared repository takes a long time to - figure out revisions to send + many get_parent_map calls during walk to common revisions when branching + into empty repository
Changed in bzr:
status:	Triaged → Confirmed
description:	updated
tags:	added: fetch performance

Martin Pool (mbp) on 2011-05-13

tags:

added: affects-linaro

John A Meinel (jameinel) on 2011-07-21

Changed in bzr:
assignee:	nobody → John A Meinel (jameinel)

Revision history for this message

John A Meinel (jameinel) wrote on 2011-07-21:

#2

Download full text (3.8 KiB)

A little bit of background for the gcc-linaro case. In bzr-2.4 we start requesting all the tagged revisions as heads, though this turns out not to help as much as we might like.
The first request is for 2690 tips, of which only 64 are found. (lots of tagged revs that aren't in the history).
We then get enough data back in that one request to step 4 times (64-78 revs). The next request is for 84 tips, which gives us back 16k compressed data, enough to make 32 more steps (at about 80 each time).

Seems pretty good so far.

The number of search tips starts dropping pretty rapidly after that point, though we seem to still do a decent job at filling in extra history. Overall I only see 52 Repository.get_parent_map RPC calls, though while walking the graph we make 71,595 steps. We are walking 10-at-a-time from step 1300, and one-at-a-time from step 6534 on. So there are a *lot* of steps that we do one-by-one on the graph. Which makes some sense, since gcc's history gets pretty linear.

I'm still trying to sort out where the time is spent, given how few RPCs we seem to actually be making. With my extra debugging, it is taking 335s (5.6min) to find all the revisions we want to transmit.

One bit that showed up as a point of concern is the _get_parent_map_rpc code. Which always walks the entire cached dict to formulate the next query. Which looks like it is going to be O(N^2) in nature. Since by the end of all this walking, we'll have something like 100k entries in the cache. However, lsprof shows that *by far* the bulk of the time is in the actual RPC.

I think, though, that the logic about start and stop revisions is causing us harm. Specifically, this is the time spent in each get_parent_map rpc call:
2.3, 0.7, 0.6, 0.8, 1.0, 0.8, 1.0, 1.3, 1.4, 1.8, 1.5, 2.1, 2.0, 3.1, 3.2, 3.7, 2.9, 2.8, 4.2, 4.9, 6.1, 5.0, 5.5, 5.3, 5.5, 5.5, 5.8, 5.9, 6.4, 6.2, 7.0, 6.6, 7.1, 7.0, 7.5, 7.4, 7.7, 8.0, 8.2, 8.2, 8.6, 8.9, 9.1, 8.9, 9.1, 9.4, 9.5, 11.6, 11.1, 10.0, 10.4, 13.8

If you graph that, it follows an almost completely straight line (R^2 = 0.95), with each request taking 0.2s longer than the previous one.

I'll need to profile the server side and see what might be causing this.

I have another .bzr.log where I was logging the call sizes. And I noticed that "key_count" gets really large for the recipe.

The final request looks like this:
327.535 Doing _get_parent_map_rpc, 110260 cache size, 5 start_set, 110256 result_parents, 1 stop_keys, 110255 included_keys, 1 keys
327.535 keys: set(['svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:trunk:1152']) body: 'svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:branches/gcc-4_6-branch:175472 ...\n110260'
327.535 hpss call w/body: 'Repository.get_parent_map', '+branch/gcc-linaro/', 'include-missing:', 'svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:trunk:1152' ('svn-v4:138bc75d-0d04'...)
327.535 378 bytes
337.595 result: ('ok',)
337.595 2526 body bytes read

So that is only 5 start keys, but 110256 keys in the search set. Does that mean we are starting the search over the *entire ancestry* from scratch on each request.... ugh. This does not seem to be the most efficient way to do a get...

A little bit of background for the gcc-linaro case. In bzr-2.4 we start requesting all the tagged revisions as heads, though this turns out not to help as much as we might like.
The first request is for 2690 tips, of which only 64 are found. (lots of tagged revs that aren't in the history).
We then get enough data back in that one request to step 4 times (64-78 revs). The next request is for 84 tips, which gives us back 16k compressed data, enough to make 32 more steps (at about 80 each time).

Seems pretty good so far.

The number of search tips starts dropping pretty rapidly after that point, though we seem to still do a decent job at filling in extra history. Overall I only see 52 Repository.get_parent_map RPC calls, though while walking the graph we make 71,595 steps.  We are walking 10-at-a-time from step 1300, and one-at-a-time from step 6534 on. So there are a *lot* of steps that we do one-by-one on the graph. Which makes some sense, since gcc's history gets pretty linear.

I'm still trying to sort out where the time is spent, given how few RPCs we seem to actually be making.  With my extra debugging, it is taking 335s (5.6min) to find all the revisions we want to transmit.

One bit that showed up as a point of concern is the _get_parent_map_rpc code. Which always walks the entire cached dict to formulate the next query. Which looks like it is going to be O(N^2) in nature. Since by the end of all this walking, we'll have something like 100k entries in the cache. However, lsprof shows that *by far* the bulk of the time is in the actual RPC.

I think, though, that the logic about start and stop revisions is causing us harm. Specifically, this is the time spent in each get_parent_map rpc call:
2.3, 0.7, 0.6, 0.8, 1.0, 0.8, 1.0, 1.3, 1.4, 1.8, 1.5, 2.1, 2.0, 3.1, 3.2, 3.7, 2.9, 2.8, 4.2, 4.9, 6.1, 5.0, 5.5, 5.3, 5.5, 5.5, 5.8, 5.9, 6.4, 6.2, 7.0, 6.6, 7.1, 7.0, 7.5, 7.4, 7.7, 8.0, 8.2, 8.2, 8.6, 8.9, 9.1, 8.9, 9.1, 9.4, 9.5, 11.6,  11.1, 10.0, 10.4, 13.8

If you graph that, it follows an almost completely straight line (R^2 = 0.95), with each request taking 0.2s longer than the previous one.

I'll need to profile the server side and see what might be causing this.

I have another .bzr.log where I was logging the call sizes. And I noticed that "key_count" gets really large for the recipe.

The final request looks like this:
327.535  Doing _get_parent_map_rpc, 110260 cache size, 5 start_set, 110256 result_parents, 1 stop_keys, 110255 included_keys, 1 keys
327.535  keys: set(['svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:trunk:1152']) body: 'svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:branches/gcc-4_6-branch:175472 ...\n110260'
327.535  hpss call w/body: 'Repository.get_parent_map', '+branch/gcc-linaro/', 'include-missing:', 'svn-v4:138bc75d-0d04-0410-961f-82ee72b054a4:trunk:1152' ('svn-v4:138bc75d-0d04'...)
327.535                378 bytes
337.595     result:   ('ok',)
337.595                2526 body bytes read

So that is only 5 start keys, but 110256 keys in the search set. Does that mean we are starting the search over the *entire ancestry* from scratch on each request.... ugh. This does not seem to be the most efficient way to do a get_parent_map call.

I believe the attempt is to succinctly describe the revisions you already know about, and have the remote side give you information about revisions outside of that set.

I believe that if we can avoid the N^2 behavior, we are already good about only sending 52 requests. And for me, many of the early requests complete in less than a second. So that would be ~1min rather than 5.5min for the discovery phase.

We could still try to shortcut the entire discovery phase by noticing the target repository is empty, and I'll consider adding keywords for it. But if we can make this api better, then we can still get good gains when fetching a lot of content into a mostly-but-not-completely empty repository.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-07-21:

#3

So to be clear, I think the slowdown is in

SmartServerRepositoryGetParentMap and the fact that it is doing:
search_result, error = self.recreate_search_from_recipe(
repository, body_lines)

Which is causing us to walk 100,000 revisions to find out what the next 1000 revisions are.

Also note, that by the time we get to even 20k revisions searched, we are pretty much stepping one-by-one. Now, these 100k revs *are* used on the server to make sure that we don't return parents that we already know about. However, I'm pretty sure that while this is probably good when N searched is small, it is pretty terrible when N searched is big.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-07-21:

#4

My current thought is this. We are trying to tell the server about the revisions we've already seen, so it doesn't send us those again.

However, locality says that if I'm currently looking at X, then really only revisions 'near' X are going to be encountered again. This isn't strictly true, because of history shortcutting (daggy fixes, in particular, mean that you'll skip a lot of history in one half of the search, which you'll run into again after a long time in the other half of the search.)
While it isn't strictly true, it could certainly limit the effort tremendously.

Having greatest-distance-from-origin (generation count), would be another way to limit this. Instead of walking all parents equally, you'd favor parents that have higher generation number, presuming that parents with low generation number are skipping all of the interesting revisions. You could do a little bit of both, but you could come up with better batches this way. (So if you had 100 parents to search, and 2 of them are at GDFO 1000, and the other 98 of them are at GDFO 10000, you can skip searching the 2 low ones for now.)

Anyway, I'm going to try something that just generates a fake 'local' search, and see how that performs.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-07-21:

#5

In the associated branch, I use this logic:

If the total walked-revisions exceeds 10,000, start at the current search tips, find their immediate children, and set that as the search graph.

I can easily extend this to something that walks the child graph until we hit something like 1k revs, or something else interesting-but-not-obscene.

In doing so, the time for initial discovery drops (against launchpad) from 5.5min down to 1m15s.

I still need a fair amount of polish and testing on the changes, but I think it shows promise.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-07-21:

#6

I believe the goal of this 'replicate my search' is so that stuff like history shortcuts don't end up asking the server to walk the same history twice. For example:

:
N
|
A
|\
B C
| |
D |
| |
E |
| |
F |
|/
G

if we start walking at G, we'll get to A relatively quickly. When the other side catches up to finding "B", it knows that it doesn't have to keep walking and giving information about A.

However, the *client* is also walking that same information, and will not make requests for A. So the benefit is only in avoiding what would be walked by a single RPC.

Specifically, in the above search, we would benefit telling the server that we've seen A, if the server was at say D, and then it could tell us about B, and then "skip to the end" of the A line.

From what I can tell, this is only about efficiency. If the server sends us back information about A, we'll just either ignore it, or put it into the cache replacing what we already have there.

We save a few bytes-on-the-wire not re-transmitting data we've sent. We lose work on the server because it still has to walk all of those revisions to find out where the client has walked. (And it does the work in request N that it did for all the requests before it.)

The client can't know exactly what the server will walk again, because it hasn't seen B yet. The client does know about N and A.

The client could approximate how many revisions the server will walk. The server-side code is to walk until we have 250,000 bytes of revision-ids to send. At average sizes, that is about 4,000 revisions.

So at best, sending a full search will save the server from sending us 4k revs that we already know about, which happen to be parents of a current search tip, that we already have searched via an older search tip. And also, instead of sending us the 4k we already know about, the server will send us the 4k that we are actually interested in (on the other side of that chain.)

I'm guessing that having us send a search that matches the most recent 1000 or less revisions would give us most of the same benefit, and avoid the O(N^2) behavior.

I'm going to try to think deeply about the involved graphs, because I think we can do something nice here, that will get us 99% of the benefit, without doing a lot of redundant work (by the client or the server).

I believe the goal of this 'replicate my search' is so that stuff like history shortcuts don't end up asking the server to walk the same history twice. For example:

:
 N
 |
 A
 |\
 B C
 | |
 D |
 | |
 E |
 | |
 F |
 |/
 G