bzr log DIR could layer above iter_changes

Bug #503071 reported by John A Meinel
This bug affects 3 people
Affects   Status     Importance   Assigned to   Milestone
Bazaar    Confirmed  Medium       Unassigned

Bug Description

This is a spin-off from bug #374730.

Basically, 'bzr log DIR' currently visits each revision, extracts a 'minimal inventory' containing only DIR and everything beneath it, and then runs 'iter_changes' on that.

However, it might be better to run 'iter_changes(..., DIR)' and then filter its output. The main difference is that the current code is O(subtree), while the proposed code is O(changes). Ideally we would combine the two and get O(changes-in-subtree).
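To make the three cost models concrete, here is a toy sketch. It assumes a hypothetical path-structured, content-addressed tree (Node, F, D, diff, lookup, filtered_changes and subtree_changes are all invented for illustration, not bzrlib APIs). Because every directory carries a digest of its contents, diff() can skip identical subtrees, so diffing from the root and filtering to DIR costs roughly O(changes), while jumping to DIR first and diffing only there costs roughly O(changes-in-subtree). The current O(subtree) behaviour would correspond to walking the extracted subtree without that digest pruning. Note that real 2a inventories are keyed by file-id rather than path, which is exactly the complication discussed in the next comment.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical content-addressed tree node; not a bzrlib class."""
    content: Optional[bytes] = None                            # set for files
    children: Dict[str, "Node"] = field(default_factory=dict)  # used for directories
    digest: str = ""

    def seal(self) -> "Node":
        # Merkle-style digest computed bottom-up, so identical subtrees compare equal.
        h = hashlib.sha1()
        if self.content is not None:
            h.update(b"F" + self.content)
        else:
            h.update(b"D")
            for name in sorted(self.children):
                h.update(name.encode() + b"\0" + self.children[name].seal().digest.encode())
        self.digest = h.hexdigest()
        return self


def F(data: bytes) -> Node:
    return Node(content=data)


def D(**children: Node) -> Node:
    return Node(children=dict(children))


def diff(old: Optional[Node], new: Optional[Node], path: str, out: List[str]) -> None:
    """Append changed file paths, skipping any subtree whose digest is unchanged."""
    if old is not None and new is not None and old.digest == new.digest:
        return  # identical subtree: pruned without looking inside
    if any(n is not None and n.content is not None for n in (old, new)):
        out.append(path)  # a file was added, removed or modified here
    old_kids = old.children if old is not None else {}
    new_kids = new.children if new is not None else {}
    for name in sorted(old_kids.keys() | new_kids.keys()):
        child_path = f"{path}/{name}" if path else name
        diff(old_kids.get(name), new_kids.get(name), child_path, out)


def lookup(root: Node, dir_path: str) -> Optional[Node]:
    """Walk down to dir_path; returns None if it does not exist in this tree."""
    node: Optional[Node] = root
    for part in dir_path.split("/"):
        node = node.children.get(part) if node is not None else None
    return node


def filtered_changes(old: Node, new: Node, dir_path: str) -> List[str]:
    """Proposed approach, roughly O(changes): diff the whole tree, then filter to DIR."""
    out: List[str] = []
    diff(old, new, "", out)
    return [p for p in out if p == dir_path or p.startswith(dir_path + "/")]


def subtree_changes(old: Node, new: Node, dir_path: str) -> List[str]:
    """Combined approach, roughly O(changes-in-subtree): jump to DIR, then prune."""
    out: List[str] = []
    diff(lookup(old, dir_path), lookup(new, dir_path), dir_path, out)
    return out


old = D(src=D(a=F(b"1"), b=F(b"2")), docs=D(x=F(b"x"))).seal()
new = D(src=D(a=F(b"1"), b=F(b"3")), docs=D(x=F(b"y"))).seal()
print(filtered_changes(old, new, "src"))  # ['src/b']
print(subtree_changes(old, new, "src"))   # ['src/b']
```

Both calls give the same answer; the difference is that filtered_changes also descends into docs/ before throwing that result away.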

John A Meinel (jameinel) wrote :

Implementation-wise, log DIR is a bit tricky because changes are stored by file-id. So you need to map from paths => file-ids, and then compute the changes for those file-ids. (Also, the iter_changes APIs don't always know whether they are path-based or file-id-based.)

So you end up needing to do lookups in a couple of different chk maps.
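A minimal sketch of the paths => file-ids step, assuming a hypothetical inventory modelled as a dict of file_id -> (parent_id, name, text_sha). The function names are invented for illustration. It rebuilds paths from parent pointers, collects the file-ids under DIR on both sides (so moves into or out of DIR are caught), and then compares entries by file-id. This naive version is still O(tree) per revision; it only shows why two kinds of lookup are involved.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical inventory: file_id -> (parent_id, name, text_sha); the root's parent is None.
Inventory = Dict[str, Tuple[Optional[str], str, Optional[str]]]


def path_of(inv: Inventory, file_id: str) -> str:
    """Rebuild a path by walking parent pointers up to the root."""
    parts = []
    parent, name, _ = inv[file_id]
    while parent is not None:
        parts.append(name)
        parent, name, _ = inv[parent]
    return "/".join(reversed(parts))


def ids_under(inv: Inventory, dir_path: str) -> Set[str]:
    """paths => file-ids: every file-id whose path is dir_path or below it."""
    result = set()
    for fid in inv:
        path = path_of(inv, fid)
        if path == dir_path or path.startswith(dir_path + "/"):
            result.add(fid)
    return result


def changed_ids(old: Inventory, new: Inventory, dir_path: str) -> Set[str]:
    """File-ids under dir_path (on either side) whose entry differs between the two."""
    candidates = ids_under(old, dir_path) | ids_under(new, dir_path)
    return {fid for fid in candidates if old.get(fid) != new.get(fid)}


old = {"root": (None, "", None), "d1": ("root", "src", None),
       "f1": ("d1", "a.c", "s1"), "f2": ("root", "README", "s2")}
# f1 is edited; f2 is moved from the root into src/.
new = dict(old, f1=("d1", "a.c", "s9"), f2=("d1", "README", "s2"))
print(sorted(changed_ids(old, new, "src")))  # ['f1', 'f2']
```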

The 2a format would allow us to:

1) compute the mapping from paths => file-ids for this revision
2) compute the same mapping in the previous revision, noting that if the chk root id didn't change, the mapping is known to be identical
3) run iter_changes across only those paths/file-ids
4) continue from step 2
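The loop above might look roughly like this. It is a sketch under stated assumptions: Rev, root_key, subtree_view and log_dir are hypothetical, and root_key merely stands in for a chk root key with the property that equal keys imply identical maps. The mapping for each revision is cached by root key, so the "previous revision" computed in one iteration is reused as the "current revision" in the next, and an unchanged root key skips the comparison entirely.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

View = Dict[str, Tuple[str, str]]  # file_id -> (path, text_sha), restricted to DIR


@dataclass
class Rev:
    """Hypothetical revision snapshot; not a bzrlib class."""
    revid: str
    root_key: str                        # stand-in for a chk root key: equal keys => identical maps
    entries: Dict[str, Tuple[str, str]]  # path -> (file_id, text_sha)


def subtree_view(rev: Optional[Rev], dir_path: str, cache: Dict[str, View]) -> View:
    """Steps 1 and 2: map paths => file-ids for DIR, reusing the result for a repeated root key."""
    if rev is None:
        return {}
    if rev.root_key not in cache:
        cache[rev.root_key] = {
            fid: (path, sha)
            for path, (fid, sha) in rev.entries.items()
            if path == dir_path or path.startswith(dir_path + "/")
        }
    return cache[rev.root_key]


def log_dir(history: List[Rev], dir_path: str) -> Iterator[str]:
    """Yield revids whose change touches dir_path; history is ordered newest first."""
    cache: Dict[str, View] = {}
    older_revs: List[Optional[Rev]] = list(history[1:]) + [None]
    for newer, older in zip(history, older_revs):
        if older is not None and newer.root_key == older.root_key:
            continue  # identical maps: nothing changed at all
        # Step 3: compare only the DIR-relevant file-ids (adds, removals, renames, edits).
        if subtree_view(newer, dir_path, cache) != subtree_view(older, dir_path, cache):
            yield newer.revid
        # Step 4: the older view is now cached and is reused on the next iteration.


history = [
    Rev("r3", "k3", {"src/a": ("f1", "s2"), "README": ("f2", "t2")}),
    Rev("r2", "k2", {"src/a": ("f1", "s2"), "README": ("f2", "t1")}),
    Rev("r1", "k1", {"src/a": ("f1", "s1"), "README": ("f2", "t1")}),
]
print(list(log_dir(history, "src")))  # ['r2', 'r1']
```

r3 only touched README, so it is skipped; r1 is reported because it introduced src/a.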

Our current design does suffer a bit from locality issues. A big-enough subdir is likely to have its file-ids spread across all or most of the chk pages, so we end up reading every page for every revision anyway. Also, the deserialization and related code means that we probably extract more than we need to.
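A quick way to see the locality problem is a small simulation. It assumes, as a simplification, that each file-id lands on a page chosen by hashing it (page_for, pages_touched and expected_pages are hypothetical helpers, not the real 2a search-key function). With k uniformly hashed keys over P pages, the expected number of distinct pages touched is P * (1 - (1 - 1/P)^k), which approaches P very quickly as k grows.

```python
import hashlib
from typing import Iterable


def page_for(file_id: str, num_pages: int) -> int:
    # Simplified stand-in for a hash-based search key: the page is chosen by hashing the file-id.
    digest = hashlib.sha1(file_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_pages


def pages_touched(file_ids: Iterable[str], num_pages: int) -> int:
    """Number of distinct pages that must be read to see every given file-id."""
    return len({page_for(fid, num_pages) for fid in file_ids})


def expected_pages(k: int, num_pages: int) -> float:
    # Expected distinct pages hit by k uniformly hashed keys.
    return num_pages * (1 - (1 - 1 / num_pages) ** k)


subdir = [f"file-{i}" for i in range(500)]
print(pages_touched(subdir, 64), round(expected_pages(500, 64), 1))
```

For a 500-file subdir spread over 64 pages, essentially every page gets touched, which matches the concern that we read all the pages for every revision anyway.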

(Ideally, 'iter_changes' could even work down at the bytes level, so that we don't have to extract 50 rows just to determine that they are identical on both sides.)
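A sketch of what working at the bytes level could mean, assuming a hypothetical page format of newline-separated "key\0value" rows (the real 2a serialization differs; split_rows, parse_entry and diff_page are invented names). Whole pages are compared as bytes first, then individual rows, and only rows whose raw bytes differ are passed to the (stand-in) expensive deserializer.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, Optional[Tuple[str, ...]], Optional[Tuple[str, ...]]]


def parse_entry(value: bytes) -> Tuple[str, ...]:
    # Stand-in for the expensive step: turning serialized bytes into an entry object.
    return tuple(value.decode("utf-8").split("\t"))


def split_rows(page: bytes) -> Dict[bytes, bytes]:
    # Cheap: split into key -> raw value bytes without interpreting the value.
    return dict(row.split(b"\0", 1) for row in page.splitlines() if row)


def diff_page(old_page: bytes, new_page: bytes) -> List[Change]:
    """Deserialize only the rows whose raw bytes differ between the two pages."""
    if old_page == new_page:
        return []  # whole page identical: nothing to parse at all
    old_rows, new_rows = split_rows(old_page), split_rows(new_page)
    changes: List[Change] = []
    for key in sorted(old_rows.keys() | new_rows.keys()):
        old_raw, new_raw = old_rows.get(key), new_rows.get(key)
        if old_raw == new_raw:
            continue  # identical bytes: skip deserialization entirely
        changes.append((
            key.decode("utf-8"),
            parse_entry(old_raw) if old_raw is not None else None,
            parse_entry(new_raw) if new_raw is not None else None,
        ))
    return changes


old_page = b"f1\0src/a\ts1\nf2\0README\tt1\n"
new_page = b"f1\0src/a\ts2\nf2\0README\tt1\n"
print(diff_page(old_page, new_page))  # [('f1', ('src/a', 's1'), ('src/a', 's2'))]
```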

Craig Hewetson (craighewetson-deactivatedaccount) wrote :

I'm really keen on this fix :) I've been facing an uphill battle with my colleagues over this feature. Is there any way I can help with this? Maybe not development, but testing, etc.

Matt Doran (matt-doran) wrote :

Yeah, I'd like to see some improvements here too. We commonly need to do this to check the history of a sub-component of a large repository, and it would be nice for this to be as fast as the rest of bzr. :)

Per Johansson (per.j) wrote :

It seems to me that doing a bzr log <file> plus bzr diff -c <rev> <file> for all revisions is quite a bit faster than bzr log -v <file>, even though it produces a superset of the information. Is that what this bug is about?

For example (11551 is the only revision for this file, the one where it was added):

; time sh -c 'bzr log fileinbigrepo && bzr diff -c 11551 fileinbigrepo' > /dev/null
real 0m4.035s
user 0m3.826s
sys 0m0.203s

; time bzr log -v fileinbigrepo > /dev/null
real 0m19.138s
user 0m18.892s
sys 0m0.221s

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy