Merge lp:~jameinel/bzr/2.4-too-much-walking-388269 into lp:bzr
Status: | Merged |
---|---|
Approved by: | John A Meinel |
Approved revision: | no longer in the source branch. |
Merged at revision: | 6099 |
Proposed branch: | lp:~jameinel/bzr/2.4-too-much-walking-388269 |
Merge into: | lp:bzr |
Diff against target: |
345 lines (+244/-26) 5 files modified
bzrlib/graph.py (+158/-3) bzrlib/remote.py (+12/-20) bzrlib/tests/test_graph.py (+71/-0) bzrlib/tests/test_remote.py (+2/-2) bzrlib/vf_repository.py (+1/-1) |
To merge this branch: | bzr merge lp:~jameinel/bzr/2.4-too-much-walking-388269 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
John A Meinel | Approve | ||
Review via email: mp+71852@code.launchpad.net |
Commit message
Bug #388269, when walking to find revisions to transmit, only send a local part of the search graph, not the whole graph walked for every step.
Description of the change
This was written against bzr-2.4 code, but it involves a change to how it makes queries, so I'm proposing it for trunk first. The nice thing is that it is a client-only change, so it doesn't require updating the server. (And I've been testing it against production launchpad.)
The overall idea is that during '_walk_
A
|
B
|
C
|
D
|\
E F
| |
G |
| |
H |
| |
I |
|/
J
If we start walking at J, we'll get to D relatively quickly via F, however we'll get to D *again* via E-I after a while.
A naive implementation would then ask for the parents of E and A, and we'd get back all of B-D again.
So what we do is send a SearchRecipe description, that is meant to describe all of the revisions we've already seen. (we're walking the server's history from the client, the server can walk it all on its own.)
If our RPCs weren't stateless, the server could have already been holding the search on its side.
However, we try to have stateless RPCs, so we have to send the state back to the server. Which it then faithfully walks to ensure that it knows what keys it doesn't need to send again.
The failure is that when the search gets *big* (like gcc-linaro's 100k revisions), it takes the server a *long* time to walk all of that history to re-build its state. (something like 10s for each get_parent_map call.)
So this patch changes the client. Instead of sending a complete SearchRecipe, it starts at the current tips, walks towards children for DEPTH steps, and then builds a search recipe from those heads.
The idea, is that in the above case, we want to make sure we send a search query that includes D. (So if we are currently at the tips A & E, we want to walk back to at least D/F.)
I played around with a lot of possible settings for DEPTH. From 1, 10, 20, 100, 1000, 10000, and infinite. First, a graph showing why we want this at all:
http://
So without this, it takes ~5 minutes to walk all of gcc-linaro's history, and with some reasonable number it is more like 70s. (almost 5x faster, seen in the blue line)
The other factor to consider, though, is bandwidth cost. When we don't eliminate a sub-graph, the server will re-send data that it has already sent to us. (Seen in the orange line.)
You can see that there is a slight decrease in the data sent except for when sending all data, there is a very large jump.
So some of the time tradeoff is time spent by the server to walk the extra history, vs the extra bandwidth of sending redundant data. You can also see that for gcc-linaro, the time has started growing significantly at DEPTH=1000. But what about other projects.
I used lp:bzr, lp:mysql-server and lp:gcc-linaro as my test cases. Since mysql should be a large but very 'bushy' project, bzr is a more modest sized project, and gcc-linaro is in the "very long history converted from svn/cvs" sort of project.
http://
I forced the axis rather than shrinking everything because of the gcc-linaro blowout.
What is very interesting is that both bzr and mysql are pretty insensitive to value, which is probably why we didn't ever bother fixing this before. My guess is because of the projects behavior to merge across major versions. The tip revision of bzr-2.1 is *way* farther back in ancestry than the tip of bzr-2.4, but you can get their pretty quickly because of merging 2.1=>2.2=>2.3=>2.4. As such, when walking children, it only takes about 5 steps, and you'll have described the full history of everything that changed from bzr-2.4 vs bzr-2.1.
The next check is about how much extra data is being transferred. At DEPTH=1, bzr and mysql transfer 10%-15% extra data, while at DEPTH=10 this drops 1.2%-3%, and at 100 we are at .3%-1.3%. gcc-linaro is a different story, because at all DEPTH from 1-10000, it transfers almost 40% extra data. I'm guessing that there is one query late in the game that covers a very large span. If you normalize against the DEPTH=10,000 case, then it is 1.8%@1 down to 1%@100.
So overall, I'm happy with setting DEPTH=100. We could make it configurable, but I don't think we'd see much benefit of that, and probably negated by a round-trip to read the configuration.
Thanks for the clear explanation and quite cool description of it.
> We could make it configurable, but I don't think we'd see much benefit of that, and probably negated by a round-trip to read the configuration.
Well, we could say it comes from only the global configuration, which would avoid any network roundtrip time and still give the option to experiment with it later.
> +def ignore():
I think that could definitely do with a docstring because the name and the code are not obvious.
Actually it seems to be dead entirely...?
+def _run_search( parent_ map, heads, exclude_keys):
This too could do with a docstring (and it is actually called.)