Comment 1 for bug 503071

John A Meinel (jameinel) wrote :

Implementation-wise, log DIR is a bit tricky because changes are stored by file-id. So you need to map from paths => file-ids, and then compute the changes on those file-ids. (Also, the iter_changes APIs don't always know whether they are path-based or file-id-based.)

So you end up needing to look up in a couple of different chk maps.

The 2a format would allow us to do:

1) Compute the mapping from paths => file-ids for this revision.
2) Compute the mapping in the previous revision, noting that if the chk root id didn't change, the mapping is known to be identical.
3) Run iter_changes across only those paths/file-ids.
4) Continue from step 2 with the next-older revision.
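
The steps above can be sketched roughly as follows. This is a toy model, not bzrlib code: the revision dicts, the 'root_key' field (standing in for the chk root id), and the helper names are all made up for illustration. It also only notices content changes and files entering or leaving the directory, not renames inside it.

```python
def ids_under(path_to_id, directory):
    """Return the file-ids whose path is `directory` or lies below it."""
    prefix = directory.rstrip("/") + "/"
    return {fid for path, fid in path_to_id.items()
            if path == directory or path.startswith(prefix)}


def revisions_touching(history, directory):
    """Yield revision ids (newest first) that changed something under `directory`.

    `history` is a list of hypothetical revision dicts, newest first, with:
      'revid'      - revision identifier
      'root_key'   - stand-in for the chk root id of the path => file-id map
      'path_to_id' - dict of path => file-id
      'texts'      - dict of file-id => content key
    """
    if not history:
        return
    current = history[0]
    # Step 1: path => file-id mapping for the newest revision.
    current_ids = ids_under(current["path_to_id"], directory)
    for previous in history[1:]:
        # Step 2: reuse the mapping when the root key is unchanged.
        if previous["root_key"] == current["root_key"]:
            previous_ids = current_ids
        else:
            previous_ids = ids_under(previous["path_to_id"], directory)
        # Step 3: compare only the file-ids we care about.
        interesting = current_ids | previous_ids
        changed = current_ids != previous_ids or any(
            current["texts"].get(fid) != previous["texts"].get(fid)
            for fid in interesting)
        if changed:
            yield current["revid"]
        # Step 4: move one revision back and repeat.
        current, current_ids = previous, previous_ids
    # The oldest revision introduced whatever it has under the directory.
    if current_ids:
        yield current["revid"]


history = [
    {"revid": "rev3", "root_key": "k2",
     "path_to_id": {"src/a.py": "a-id", "README": "r-id"},
     "texts": {"a-id": "t2", "r-id": "t9"}},
    {"revid": "rev2", "root_key": "k2",
     "path_to_id": {"src/a.py": "a-id", "README": "r-id"},
     "texts": {"a-id": "t1", "r-id": "t9"}},
    {"revid": "rev1", "root_key": "k1",
     "path_to_id": {"src/a.py": "a-id"},
     "texts": {"a-id": "t1"}},
]
print(list(revisions_touching(history, "src")))
```

Here rev2 only adds README, so it is skipped for "src" even though its path map differs from rev1.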

Our current design does suffer a bit from locality issues. A big-enough subdir is likely to have its file-ids spread out across all/most of the chk pages. So we end up reading all the pages for every revision anyway. Also, the deserialization (and related) code means that we probably extract a bit more than we need to.
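
To illustrate the locality point with a toy example: if pages are chosen by hashing the file-id, the files of one subdir land on pages more or less at random. The crc32-mod-N bucketing and the file-id strings below are just assumptions for the demo, not a description of what the format actually does.

```python
import zlib


def page_for(file_id, n_pages):
    """Pick a page for a file-id by hashing it (hypothetical scheme)."""
    return zlib.crc32(file_id.encode("utf-8")) % n_pages


def pages_touched(file_ids, n_pages):
    """Return the set of page numbers holding at least one of `file_ids`."""
    return {page_for(fid, n_pages) for fid in file_ids}


# 200 made-up file-ids, all belonging to the same subdir.
subdir_ids = ["file-%d-20100104-abcdef" % i for i in range(200)]
touched = pages_touched(subdir_ids, 16)
print(len(touched), "of 16 pages hold at least one file-id from the subdir")
```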

(Ideally, 'iter_changes' could even work down at the bytes level, so that we don't have to extract 50 rows to determine that they are all identical between both sides.)
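
A minimal sketch of that idea, assuming a made-up page layout (one "file-id NUL value" row per line) and assuming both sides have the same number of pages in the same order: identical pages are skipped by comparing raw bytes, and only pages that differ get parsed into rows. None of these names come from bzrlib.

```python
def parse_rows(raw):
    """Parse one raw page into a dict of file-id => value (toy format)."""
    rows = {}
    for line in raw.splitlines():
        file_id, _, value = line.partition(b"\x00")
        rows[file_id] = value
    return rows


def changed_file_ids(old_pages, new_pages):
    """Return (changed file-ids, number of page pairs that had to be parsed)."""
    changed = set()
    parsed = 0
    for old_raw, new_raw in zip(old_pages, new_pages):
        if old_raw == new_raw:
            # Identical bytes: every row matches, so skip extraction entirely.
            continue
        parsed += 1
        old_rows = parse_rows(old_raw)
        new_rows = parse_rows(new_raw)
        for fid in old_rows.keys() | new_rows.keys():
            if old_rows.get(fid) != new_rows.get(fid):
                changed.add(fid)
    return changed, parsed


old = [b"a-id\x00t1\nb-id\x00t1", b"c-id\x00t1"]
new = [b"a-id\x00t1\nb-id\x00t2", b"c-id\x00t1"]
print(changed_file_ids(old, new))
```

Only the first page pair is parsed here; the second is dismissed on a plain bytes comparison.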