.cix thrashing causes us to re-download the whole index multiple times
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Bazaar | Fix Released | Critical | John A Meinel | |
Bug Description
This is being split off from bug #402114.
When a .cix index grows beyond ~100k entries (more than 1000 btree pages), we can no longer fit everything into the local cache. Because the access pattern is essentially 'random', this destroys our 'hit rate'.
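The effect can be illustrated with a small simulation. This is only a sketch: it models a generic LRU page cache under uniformly random access, not the actual bzrlib cache, and the function name `lru_hit_rate` as well as the page and cache counts are made up for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_pages, cache_size, num_accesses, seed=0):
    """Return the hit rate of an LRU cache under uniformly random page access.

    Hypothetical model for illustration only; not bzrlib code.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # page number -> True, ordered oldest to newest
    hits = 0
    for _ in range(num_accesses):
        page = rng.randrange(num_pages)
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / num_accesses


# Index fits in the cache: only the first touch of each page misses.
fits = lru_hit_rate(num_pages=100, cache_size=100, num_accesses=10000)
# Index is 10x larger than the cache: most accesses miss.
thrash = lru_hit_rate(num_pages=1000, cache_size=100, num_accesses=10000)
print(f"fits in cache: {fits:.2f}, 10x larger than cache: {thrash:.2f}")
```

Under these assumptions, once the index no longer fits, the hit rate falls to roughly the ratio of cache size to index size, so almost every page read goes back to the source.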
When doing a fetch, we generally have to access the .cix 4-5 times (read root nodes, read nodes underneath them, read root nodes for the tree shape, read nodes underneath them, etc.).
This means that when doing a fresh branch of a large project, we will re-read the entire .cix for the largest pack files up to 5 times. In the case of LP, this is 10MB+ of data that we re-read 5 times. At 50MB of data, that is nearly 50% of the size of an optimized pack file for the entire history (106MB).
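The arithmetic behind those figures, as a quick check. The variable names are illustrative, and the calculation assumes exactly 10MB for the .cix (the report says "10MB+") and the worst case of 5 passes.

```python
cix_size_mb = 10     # largest LP .cix, taken as exactly 10 MB (assumed lower bound)
passes = 5           # worst case of the "4-5 times" figure
pack_size_mb = 106   # optimized pack file for the entire history

reread_mb = cix_size_mb * passes
fraction = reread_mb / pack_size_mb
print(f"{reread_mb} MB re-read, {fraction:.0%} of the pack size")
# prints "50 MB re-read, 47% of the pack size"
```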
The easiest thing to do is to just increase the buffering for .cix indexes.
Related branches
- Vincent Ladeuil: Approve
- Diff: 300 lines, 7 files modified
  - NEWS (+10/-2)
  - bzrlib/btree_index.py (+15/-9)
  - bzrlib/index.py (+2/-2)
  - bzrlib/repofmt/pack_repo.py (+11/-5)
  - bzrlib/tests/per_repository_chk/test_supported.py (+35/-0)
  - bzrlib/tests/test_btree_index.py (+39/-0)
  - bzrlib/tests/test_index.py (+10/-1)
Changed in bzr:
importance: Undecided → High
status: New → Triaged
Changed in bzr:
importance: High → Critical
Changed in bzr:
assignee: nobody → Ian Clatworthy (ian-clatworthy)
Changed in bzr:
assignee: Ian Clatworthy (ian-clatworthy) → John A Meinel (jameinel)
Changed in bzr:
milestone: none → 2.0.1
I'm not sure what to do about this :(.