Comment 6 for bug 388269

John A Meinel (jameinel) wrote :

I believe the goal of this 'replicate my search' is so that stuff like history shortcuts don't end up asking the server to walk the same history twice. For example:

 :
 N
 |
 A
 |\
 B C
 | |
 D |
 | |
 E |
 | |
 F |
 |/
 G

If we start walking at G, we'll get to A relatively quickly (via C). When the other side of the walk catches up and finds "B", it knows that it doesn't have to keep walking past A and re-sending information about it.

However, the *client* is also walking that same information, and will not make requests for A. So the benefit is only in avoiding what would be walked by a single RPC.

Specifically, in the above search, we would benefit from telling the server that we've seen A: if the server were at, say, D, it could tell us about B and then "skip to the end" of the A line.
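The walk described above can be sketched in Python (an illustrative model, not bzr's actual code; the letter names stand in for real revision-ids, and `stop_at` plays the role of "revisions the other side has already seen"):

```python
from collections import deque

# Parent map for the example graph: G merges the F...B line and the C line,
# which both descend from A.
PARENTS = {
    "G": ["F", "C"],
    "F": ["E"],
    "E": ["D"],
    "D": ["B"],
    "C": ["A"],
    "B": ["A"],
    "A": ["N"],
    "N": [],
}

def walk(tips, parents, stop_at=frozenset()):
    """Breadth-first ancestry walk from tips, never descending past stop_at."""
    seen = set()
    pending = deque(tips)
    order = []
    while pending:
        rev = pending.popleft()
        if rev in seen or rev in stop_at:
            continue  # already walked, or known to the other side
        seen.add(rev)
        order.append(rev)
        pending.extend(parents.get(rev, []))
    return order

# Starting at G, the short C side reaches A quickly; when the walk later
# arrives at B, A is already seen and is not walked again.
print(walk(["G"], PARENTS))

# If we could tell the walker we've already seen A, the A line (A, N, ...)
# would be skipped entirely:
print(walk(["G"], PARENTS, stop_at={"A"}))
```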

From what I can tell, this is only about efficiency. If the server sends us back information about A, we'll either ignore it or put it into the cache, replacing what we already have there.

We save a few bytes on the wire by not re-transmitting data that has already been sent. We lose work on the server, because it still has to walk all of those revisions to find out where the client has walked. (And in request N it redoes the work it did for all of the requests before it.)

The client can't know exactly what the server will walk again, because it hasn't seen B yet. The client does know about N and A.

The client could approximate how many revisions the server will walk. The server-side code walks until it has 250,000 bytes of revision-ids to send; at average revision-id sizes (roughly 60 bytes each), that is about 4,000 revisions.
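A rough model of that batching rule (the 250,000-byte budget comes from the description above; the ~62-byte average id size is just the implied ratio, and `take_batch` is a hypothetical name, not bzr's API):

```python
BYTE_BUDGET = 250_000  # bytes of revision-ids per response, per the text above

def take_batch(revision_ids, budget=BYTE_BUDGET):
    """Return the longest prefix of revision_ids that fits in the budget."""
    batch, used = [], 0
    for rev_id in revision_ids:
        size = len(rev_id.encode("utf-8"))
        if used + size > budget:
            break
        batch.append(rev_id)
        used += size
    return batch

# With ~62-byte ids, the budget holds roughly 250_000 // 62 of them.
ids = ["x" * 62 for _ in range(10_000)]
print(len(take_batch(ids)))  # prints 4032
```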

So at best, sending a full search saves the server from sending us 4k revs that we already know about: ancestors of a current search tip that we have already searched via an older search tip. And instead of sending us the 4k we already know about, the server can send us the 4k that we are actually interested in (on the other side of that chain).

I'm guessing that having us send a search that matches the most recent 1,000 or fewer revisions would give us most of the same benefit while avoiding the O(N^2) behavior.
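The proposed mitigation could look something like this (the cap of 1,000 comes from the guess above; the function name and the exact shape of the data are hypothetical, not from the bzr code):

```python
RECENT_LIMIT = 1000  # cap suggested above, instead of replaying the full search

def recent_search(search_order):
    """search_order: revision ids in the order the client discovered them.

    Return only the bounded suffix to send to the server, so that what we
    transmit per request stays O(1) rather than growing with every request
    (which is what makes replaying the full search O(N^2) overall).
    """
    return search_order[-RECENT_LIMIT:]
```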

I'm going to try to think deeply about the graphs involved, because I think we can do something nice here that gets us 99% of the benefit without a lot of redundant work (by either the client or the server).