Comment 25 for bug 405251

Revision history for this message
John A Meinel (jameinel) wrote :

So interestingly both of the "branch in shared repo" times seem to be dominated by the _build_tree times.

And it would seem that the time is spent in the index level.

Specifically, on "bzr-1.17-secondbranch-3m22s" 75% of the time is spent in _btree_serializer_pyx._parse_leaf_lines

It is also being called 95k times.

If you look at the specific difference between this and the 'bzr.1.14" revision, you can see that in 1.14 we are copying the source files directly from the original tree, rather than extracting the texts from the repository.

Arguably this is incorrect behavior, because the source tree may have a different set of filters than the target, so it isn't necessarily valid to just copy the contents across. (This is at least part of what changed in 1.17 from 1.14, I would assume.)

You could do a few things to test:

1) Instead of "bzr init-repo --1.14..." use "bzr init-repo --no-trees --1.14..." and see how that differs. That would give you a timing run *without* building the working tree. (If you don't want to do the copy again, you can also just do touch .bzr/repository/no-working-trees

And then check to see how the "branch inside shared repository" time changes.

2) Remove the workingtree from the source branch, so that bzr-1.14 won't try to copy the source files from there.

That would give us a data point to see if bzr-1.14 was, in fact, faster at extracting the content than 1.17.

Hitting the index 95k times seems like a bit much for the given operations. Though I'll note that I've fixed at least one bug in this area (where when reading fulltexts from the repository, we would actually read the delta content 2 times, rather than just once).
From what I can tell, there are no more than 13k file texts (13,319 calls to get_bytes_as), but 95,392 calls into the index layer to get content out.

I'm also a bit surprised that it would seem that none of those repeated accesses are already in the cache. I might be reading that wrong. As there looks to be 191k calls to BTreeIndex.iter_entries() (as a generator, that means it is yielding 191k records.)

I'll try looking at the "first copy" cases, to see if that sheds any more light on the subject.