Merge lp:~jameinel/bzr/2.0.1-faster-log-dir-bug374730 into lp:bzr/2.0
- 2.0.1-faster-log-dir-bug374730
- Merge into 2.0
Status: | Merged | ||||
---|---|---|---|---|---|
Merged at revision: | not available | ||||
Proposed branch: | lp:~jameinel/bzr/2.0.1-faster-log-dir-bug374730 | ||||
Merge into: | lp:bzr/2.0 | ||||
Diff against target: |
843 lines 5 files modified
NEWS (+5/-0) bzrlib/inventory.py (+146/-0) bzrlib/tests/per_inventory/__init__.py (+41/-4) bzrlib/tests/per_inventory/basics.py (+99/-147) bzrlib/tests/test_inv.py (+218/-0) |
||||
To merge this branch: | bzr merge lp:~jameinel/bzr/2.0.1-faster-log-dir-bug374730 | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Robert Collins (community) | Approve | ||
Review via email: mp+12382@code.launchpad.net |
Commit message
Description of the change
John A Meinel (jameinel) wrote : | # |
Robert Collins (lifeless) wrote : | # |
On Thu, 2009-09-24 at 20:45 +0000, John A Meinel wrote:
> John A Meinel has proposed merging
> lp:~jameinel/bzr/2.0.1-faster-log-dir-bug374730 into lp:bzr/2.0.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #374730 log dir is slow in development-
> https:/
>
>
> This is my 'phase 1' for making 'bzr log DIR' faster.
>
> It implements a custom 'CHKInventory.
> in the actual files that are requested, rather than always doing
> 'iter_entries()' and walking the whole tree.
>
> This makes 'bzr log DIR' scale based on the number of entries in DIR.
> In my testing,
>
> 5m30s bzr.dev log tools
> 5m30s newbzr log bzrlib
> 0m36s newbzr log tools
>
> So when the number of files is low, the time drops from 5m30s => 36s,
> which is great.
>
> The next step is probably going to bypass 'filter()' for 'log DIR' and
> solely layer on top of 'iter_changes'. My initial (broken) testing
> showed that it stayed consistently around 36s for both bzrlib and
> tools, which is certainly a lot better.
>
> I'd like to land this in the 2.0.x series, because it seems like a
> simple bug fix.
>
> The size of the delta is a bit large, because it turned out that
> CHKInventory.
> thread about what 'per_inventory' is actually supposed to be testing.)
> I went ahead and made 'per_inventory' actually care about both
> inventory implementations, and moved the tests around accordingly.
So this looks broadly fine; we didn't do much more than tune log for the
O(tree) datastore we had previously.
See my list response for thoughts on the test layout [short story -
test_inv.py is where I would have added tests].
review +1
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-09-17 17:39:34 +0000 |
3 | +++ NEWS 2009-09-25 19:03:36 +0000 |
4 | @@ -8,6 +8,11 @@ |
5 | Bug Fixes |
6 | ********* |
7 | |
8 | +* Improve the time for ``bzr log DIR`` for 2a format repositories. |
9 | + We had been using the same code path as for <2a formats, which required |
10 | + iterating over all objects in all revisions. |
11 | + (John Arbash Meinel, #374730) |
12 | + |
13 | * Make sure that we unlock the tree if we fail to create a TreeTransform |
14 | object when doing a merge, and there is limbo, or pending-deletions |
15 | directory. (Gary van der Merwe, #427773) |
16 | |
17 | === modified file 'bzrlib/inventory.py' |
18 | --- bzrlib/inventory.py 2009-08-30 22:02:45 +0000 |
19 | +++ bzrlib/inventory.py 2009-09-25 19:03:36 +0000 |
20 | @@ -1197,6 +1197,14 @@ |
21 | raise errors.InconsistentDelta("<deleted>", parent_id, |
22 | "The file id was deleted but its children were not deleted.") |
23 | |
24 | + def create_by_apply_delta(self, inventory_delta, new_revision_id, |
25 | + propagate_caches=False): |
26 | + """See CHKInventory.create_by_apply_delta()""" |
27 | + new_inv = self.copy() |
28 | + new_inv.apply_delta(inventory_delta) |
29 | + new_inv.revision_id = new_revision_id |
30 | + return new_inv |
31 | + |
32 | def _set_root(self, ie): |
33 | self.root = ie |
34 | self._byid = {self.root.file_id: self.root} |
35 | @@ -1546,6 +1554,123 @@ |
36 | else: |
37 | raise ValueError("unknown kind %r" % entry.kind) |
38 | |
39 | + def _expand_fileids_to_parents_and_children(self, file_ids): |
40 | + """Give a more wholistic view starting with the given file_ids. |
41 | + |
42 | + For any file_id which maps to a directory, we will include all children |
43 | + of that directory. We will also include all directories which are |
44 | + parents of the given file_ids, but we will not include their children. |
45 | + |
46 | + eg: |
47 | + / # TREE_ROOT |
48 | + foo/ # foo-id |
49 | + baz # baz-id |
50 | + frob/ # frob-id |
51 | + fringle # fringle-id |
52 | + bar/ # bar-id |
53 | + bing # bing-id |
54 | + |
55 | + if given [foo-id] we will include |
56 | + TREE_ROOT as interesting parents |
57 | + and |
58 | + foo-id, baz-id, frob-id, fringle-id |
59 | + As interesting ids. |
60 | + """ |
61 | + interesting = set() |
62 | + # TODO: Pre-pass over the list of fileids to see if anything is already |
63 | + # deserialized in self._fileid_to_entry_cache |
64 | + |
65 | + directories_to_expand = set() |
66 | + children_of_parent_id = {} |
67 | + # It is okay if some of the fileids are missing |
68 | + for entry in self._getitems(file_ids): |
69 | + if entry.kind == 'directory': |
70 | + directories_to_expand.add(entry.file_id) |
71 | + interesting.add(entry.parent_id) |
72 | + children_of_parent_id.setdefault(entry.parent_id, [] |
73 | + ).append(entry.file_id) |
74 | + |
75 | + # Now, interesting has all of the direct parents, but not the |
76 | + # parents of those parents. It also may have some duplicates with |
77 | + # specific_fileids |
78 | + remaining_parents = interesting.difference(file_ids) |
79 | + # When we hit the TREE_ROOT, we'll get an interesting parent of None, |
80 | + # but we don't actually want to recurse into that |
81 | + interesting.add(None) # this will auto-filter it in the loop |
82 | + remaining_parents.discard(None) |
83 | + while remaining_parents: |
84 | + if None in remaining_parents: |
85 | + import pdb; pdb.set_trace() |
86 | + next_parents = set() |
87 | + for entry in self._getitems(remaining_parents): |
88 | + next_parents.add(entry.parent_id) |
89 | + children_of_parent_id.setdefault(entry.parent_id, [] |
90 | + ).append(entry.file_id) |
91 | + # Remove any search tips we've already processed |
92 | + remaining_parents = next_parents.difference(interesting) |
93 | + interesting.update(remaining_parents) |
94 | + # We should probably also .difference(directories_to_expand) |
95 | + interesting.update(file_ids) |
96 | + interesting.discard(None) |
97 | + while directories_to_expand: |
98 | + # Expand directories by looking in the |
99 | + # parent_id_basename_to_file_id map |
100 | + keys = [(f,) for f in directories_to_expand] |
101 | + directories_to_expand = set() |
102 | + items = self.parent_id_basename_to_file_id.iteritems(keys) |
103 | + next_file_ids = set([item[1] for item in items]) |
104 | + next_file_ids = next_file_ids.difference(interesting) |
105 | + interesting.update(next_file_ids) |
106 | + for entry in self._getitems(next_file_ids): |
107 | + if entry.kind == 'directory': |
108 | + directories_to_expand.add(entry.file_id) |
109 | + children_of_parent_id.setdefault(entry.parent_id, [] |
110 | + ).append(entry.file_id) |
111 | + return interesting, children_of_parent_id |
112 | + |
113 | + def filter(self, specific_fileids): |
114 | + """Get an inventory view filtered against a set of file-ids. |
115 | + |
116 | + Children of directories and parents are included. |
117 | + |
118 | + The result may or may not reference the underlying inventory |
119 | + so it should be treated as immutable. |
120 | + """ |
121 | + (interesting, |
122 | + parent_to_children) = self._expand_fileids_to_parents_and_children( |
123 | + specific_fileids) |
124 | + # There is some overlap here, but we assume that all interesting items |
125 | + # are in the _fileid_to_entry_cache because we had to read them to |
126 | + # determine if they were a dir we wanted to recurse, or just a file |
127 | + # This should give us all the entries we'll want to add, so start |
128 | + # adding |
129 | + other = Inventory(self.root_id) |
130 | + other.root.revision = self.root.revision |
131 | + other.revision_id = self.revision_id |
132 | + if not interesting or not parent_to_children: |
133 | + # empty filter, or filtering entrys that don't exist |
134 | + # (if even 1 existed, then we would have populated |
135 | + # parent_to_children with at least the tree root.) |
136 | + return other |
137 | + cache = self._fileid_to_entry_cache |
138 | + try: |
139 | + remaining_children = collections.deque(parent_to_children[self.root_id]) |
140 | + except: |
141 | + import pdb; pdb.set_trace() |
142 | + raise |
143 | + while remaining_children: |
144 | + file_id = remaining_children.popleft() |
145 | + ie = cache[file_id] |
146 | + if ie.kind == 'directory': |
147 | + ie = ie.copy() # We create a copy to depopulate the .children attribute |
148 | + # TODO: depending on the uses of 'other' we should probably alwyas |
149 | + # '.copy()' to prevent someone from mutating other and |
150 | + # invaliding our internal cache |
151 | + other.add(ie) |
152 | + if file_id in parent_to_children: |
153 | + remaining_children.extend(parent_to_children[file_id]) |
154 | + return other |
155 | + |
156 | @staticmethod |
157 | def _bytes_to_utf8name_key(bytes): |
158 | """Get the file_id, revision_id key out of bytes.""" |
159 | @@ -1885,6 +2010,27 @@ |
160 | # really we're passing an inventory, not a tree... |
161 | raise errors.NoSuchId(self, file_id) |
162 | |
163 | + def _getitems(self, file_ids): |
164 | + """Similar to __getitem__, but lets you query for multiple. |
165 | + |
166 | + The returned order is undefined. And currently if an item doesn't |
167 | + exist, it isn't included in the output. |
168 | + """ |
169 | + result = [] |
170 | + remaining = [] |
171 | + for file_id in file_ids: |
172 | + entry = self._fileid_to_entry_cache.get(file_id, None) |
173 | + if entry is None: |
174 | + remaining.append(file_id) |
175 | + else: |
176 | + result.append(entry) |
177 | + file_keys = [(f,) for f in remaining] |
178 | + for file_key, value in self.id_to_entry.iteritems(file_keys): |
179 | + entry = self._bytes_to_entry(value) |
180 | + result.append(entry) |
181 | + self._fileid_to_entry_cache[entry.file_id] = entry |
182 | + return result |
183 | + |
184 | def has_id(self, file_id): |
185 | # Perhaps have an explicit 'contains' method on CHKMap ? |
186 | if self._fileid_to_entry_cache.get(file_id, None) is not None: |
187 | |
188 | === modified file 'bzrlib/tests/per_inventory/__init__.py' |
189 | --- bzrlib/tests/per_inventory/__init__.py 2009-07-10 07:14:02 +0000 |
190 | +++ bzrlib/tests/per_inventory/__init__.py 2009-09-25 19:03:36 +0000 |
191 | @@ -16,15 +16,52 @@ |
192 | |
193 | """Tests for different inventory implementations""" |
194 | |
195 | -from bzrlib.tests import multiply_tests |
196 | +from bzrlib import ( |
197 | + groupcompress, |
198 | + tests, |
199 | + ) |
200 | |
201 | def load_tests(basic_tests, module, loader): |
202 | """Generate suite containing all parameterized tests""" |
203 | modules_to_test = [ |
204 | 'bzrlib.tests.per_inventory.basics', |
205 | ] |
206 | - from bzrlib.inventory import Inventory |
207 | - scenarios = [('Inventory', {'inventory_class':Inventory})] |
208 | + from bzrlib.inventory import Inventory, CHKInventory |
209 | + |
210 | + def inv_to_chk_inv(test, inv): |
211 | + """CHKInventory needs a backing VF, so we create one.""" |
212 | + factory = groupcompress.make_pack_factory(True, True, 1) |
213 | + trans = test.get_transport('chk-inv') |
214 | + trans.ensure_base() |
215 | + vf = factory(trans) |
216 | + # We intentionally use a non-standard maximum_size, so that we are more |
217 | + # likely to trigger splits, and get increased test coverage. |
218 | + chk_inv = CHKInventory.from_inventory(vf, inv, |
219 | + maximum_size=100, |
220 | + search_key_name='hash-255-way') |
221 | + return chk_inv |
222 | + scenarios = [('Inventory', {'_inventory_class': Inventory, |
223 | + '_inv_to_test_inv': lambda test, inv: inv |
224 | + }), |
225 | + ('CHKInventory', {'_inventory_class': CHKInventory, |
226 | + '_inv_to_test_inv': inv_to_chk_inv, |
227 | + })] |
228 | # add the tests for the sub modules |
229 | - return multiply_tests(loader.loadTestsFromModuleNames(modules_to_test), |
230 | + return tests.multiply_tests( |
231 | + loader.loadTestsFromModuleNames(modules_to_test), |
232 | scenarios, basic_tests) |
233 | + |
234 | + |
235 | +class TestCaseWithInventory(tests.TestCaseWithMemoryTransport): |
236 | + |
237 | + _inventory_class = None # set by load_tests |
238 | + _inv_to_test_inv = None # set by load_tests |
239 | + |
240 | + def make_test_inventory(self): |
241 | + """Return an instance of the Inventory class under test.""" |
242 | + return self._inventory_class() |
243 | + |
244 | + def inv_to_test_inv(self, inv): |
245 | + """Convert a regular Inventory object into an inventory under test.""" |
246 | + return self._inv_to_test_inv(self, inv) |
247 | + |
248 | |
249 | === modified file 'bzrlib/tests/per_inventory/basics.py' |
250 | --- bzrlib/tests/per_inventory/basics.py 2009-07-17 00:49:02 +0000 |
251 | +++ bzrlib/tests/per_inventory/basics.py 2009-09-25 19:03:36 +0000 |
252 | @@ -21,6 +21,8 @@ |
253 | |
254 | from bzrlib import ( |
255 | errors, |
256 | + inventory, |
257 | + osutils, |
258 | ) |
259 | |
260 | from bzrlib.inventory import ( |
261 | @@ -28,22 +30,36 @@ |
262 | InventoryEntry, |
263 | InventoryFile, |
264 | InventoryLink, |
265 | - ROOT_ID, |
266 | TreeReference, |
267 | ) |
268 | |
269 | -from bzrlib.tests import ( |
270 | - TestCase, |
271 | - ) |
272 | - |
273 | - |
274 | -class TestInventory(TestCase): |
275 | - |
276 | - def make_inventory(self, root_id): |
277 | - return self.inventory_class(root_id=root_id) |
278 | +from bzrlib.tests.per_inventory import TestCaseWithInventory |
279 | + |
280 | + |
281 | + |
282 | +class TestInventory(TestCaseWithInventory): |
283 | + |
284 | + def make_init_inventory(self): |
285 | + inv = inventory.Inventory('tree-root') |
286 | + inv.revision = 'initial-rev' |
287 | + inv.root.revision = 'initial-rev' |
288 | + return self.inv_to_test_inv(inv) |
289 | + |
290 | + def make_file(self, file_id, name, parent_id, content='content\n', |
291 | + revision='new-test-rev'): |
292 | + ie = InventoryFile(file_id, name, parent_id) |
293 | + ie.text_sha1 = osutils.sha_string(content) |
294 | + ie.text_size = len(content) |
295 | + ie.revision = revision |
296 | + return ie |
297 | + |
298 | + def make_link(self, file_id, name, parent_id, target='link-target\n'): |
299 | + ie = InventoryLink(file_id, name, parent_id) |
300 | + ie.symlink_target = target |
301 | + return ie |
302 | |
303 | def prepare_inv_with_nested_dirs(self): |
304 | - inv = self.make_inventory('tree-root') |
305 | + inv = inventory.Inventory('tree-root') |
306 | for args in [('src', 'directory', 'src-id'), |
307 | ('doc', 'directory', 'doc-id'), |
308 | ('src/hello.c', 'file', 'hello-id'), |
309 | @@ -53,172 +69,98 @@ |
310 | ('src/zz.c', 'file', 'zzc-id'), |
311 | ('src/sub/a', 'file', 'a-id'), |
312 | ('Makefile', 'file', 'makefile-id')]: |
313 | - inv.add_path(*args) |
314 | - return inv |
315 | - |
316 | - |
317 | -class TestInventoryUpdates(TestInventory): |
318 | - |
319 | - def test_creation_from_root_id(self): |
320 | - # iff a root id is passed to the constructor, a root directory is made |
321 | - inv = self.make_inventory(root_id='tree-root') |
322 | - self.assertNotEqual(None, inv.root) |
323 | - self.assertEqual('tree-root', inv.root.file_id) |
324 | - |
325 | - def test_add_path_of_root(self): |
326 | - # if no root id is given at creation time, there is no root directory |
327 | - inv = self.make_inventory(root_id=None) |
328 | - self.assertIs(None, inv.root) |
329 | - # add a root entry by adding its path |
330 | - ie = inv.add_path("", "directory", "my-root") |
331 | - self.assertEqual("my-root", ie.file_id) |
332 | - self.assertIs(ie, inv.root) |
333 | - |
334 | - def test_add_path(self): |
335 | - inv = self.make_inventory(root_id='tree_root') |
336 | - ie = inv.add_path('hello', 'file', 'hello-id') |
337 | - self.assertEqual('hello-id', ie.file_id) |
338 | - self.assertEqual('file', ie.kind) |
339 | - |
340 | - def test_copy(self): |
341 | - """Make sure copy() works and creates a deep copy.""" |
342 | - inv = self.make_inventory(root_id='some-tree-root') |
343 | - ie = inv.add_path('hello', 'file', 'hello-id') |
344 | - inv2 = inv.copy() |
345 | - inv.root.file_id = 'some-new-root' |
346 | - ie.name = 'file2' |
347 | - self.assertEqual('some-tree-root', inv2.root.file_id) |
348 | - self.assertEqual('hello', inv2['hello-id'].name) |
349 | - |
350 | - def test_copy_empty(self): |
351 | - """Make sure an empty inventory can be copied.""" |
352 | - inv = self.make_inventory(root_id=None) |
353 | - inv2 = inv.copy() |
354 | - self.assertIs(None, inv2.root) |
355 | - |
356 | - def test_copy_copies_root_revision(self): |
357 | - """Make sure the revision of the root gets copied.""" |
358 | - inv = self.make_inventory(root_id='someroot') |
359 | - inv.root.revision = 'therev' |
360 | - inv2 = inv.copy() |
361 | - self.assertEquals('someroot', inv2.root.file_id) |
362 | - self.assertEquals('therev', inv2.root.revision) |
363 | - |
364 | - def test_create_tree_reference(self): |
365 | - inv = self.make_inventory('tree-root-123') |
366 | - inv.add(TreeReference('nested-id', 'nested', parent_id='tree-root-123', |
367 | - revision='rev', reference_revision='rev2')) |
368 | - |
369 | - def test_error_encoding(self): |
370 | - inv = self.make_inventory('tree-root') |
371 | - inv.add(InventoryFile('a-id', u'\u1234', 'tree-root')) |
372 | - e = self.assertRaises(errors.InconsistentDelta, inv.add, |
373 | - InventoryFile('b-id', u'\u1234', 'tree-root')) |
374 | - self.assertContainsRe(str(e), r'\\u1234') |
375 | - |
376 | - def test_add_recursive(self): |
377 | - parent = InventoryDirectory('src-id', 'src', 'tree-root') |
378 | - child = InventoryFile('hello-id', 'hello.c', 'src-id') |
379 | - parent.children[child.file_id] = child |
380 | - inv = self.make_inventory('tree-root') |
381 | - inv.add(parent) |
382 | - self.assertEqual('src/hello.c', inv.id2path('hello-id')) |
383 | - |
384 | - |
385 | -class TestInventoryApplyDelta(TestInventory): |
386 | + ie = inv.add_path(*args) |
387 | + if args[1] == 'file': |
388 | + ie.text_sha1 = osutils.sha_string('content\n') |
389 | + ie.text_size = len('content\n') |
390 | + return self.inv_to_test_inv(inv) |
391 | + |
392 | + |
393 | +class TestInventoryCreateByApplyDelta(TestInventory): |
394 | """A subset of the inventory delta application tests. |
395 | |
396 | See test_inv which has comprehensive delta application tests for |
397 | - inventories, dirstate, and repository based inventories, unlike the tests |
398 | - here which only test in-memory implementations that can support a plain |
399 | - 'apply_delta'. |
400 | + inventories, dirstate, and repository based inventories. |
401 | """ |
402 | - |
403 | - def test_apply_delta_add(self): |
404 | - inv = self.make_inventory('tree-root') |
405 | - inv.apply_delta([ |
406 | - (None, "a", "a-id", InventoryFile('a-id', 'a', 'tree-root')), |
407 | - ]) |
408 | - self.assertEqual('a', inv.id2path('a-id')) |
409 | - |
410 | - def test_apply_delta_delete(self): |
411 | - inv = self.make_inventory('tree-root') |
412 | - inv.apply_delta([ |
413 | - (None, "a", "a-id", InventoryFile('a-id', 'a', 'tree-root')), |
414 | - ]) |
415 | - self.assertEqual('a', inv.id2path('a-id')) |
416 | - a_ie = inv['a-id'] |
417 | - inv.apply_delta([("a", None, "a-id", None)]) |
418 | + def test_add(self): |
419 | + inv = self.make_init_inventory() |
420 | + inv = inv.create_by_apply_delta([ |
421 | + (None, "a", "a-id", self.make_file('a-id', 'a', 'tree-root')), |
422 | + ], 'new-test-rev') |
423 | + self.assertEqual('a', inv.id2path('a-id')) |
424 | + |
425 | + def test_delete(self): |
426 | + inv = self.make_init_inventory() |
427 | + inv = inv.create_by_apply_delta([ |
428 | + (None, "a", "a-id", self.make_file('a-id', 'a', 'tree-root')), |
429 | + ], 'new-rev-1') |
430 | + self.assertEqual('a', inv.id2path('a-id')) |
431 | + inv = inv.create_by_apply_delta([ |
432 | + ("a", None, "a-id", None), |
433 | + ], 'new-rev-2') |
434 | self.assertRaises(errors.NoSuchId, inv.id2path, 'a-id') |
435 | |
436 | - def test_apply_delta_rename(self): |
437 | - inv = self.make_inventory('tree-root') |
438 | - inv.apply_delta([ |
439 | - (None, "a", "a-id", InventoryFile('a-id', 'a', 'tree-root')), |
440 | - ]) |
441 | + def test_rename(self): |
442 | + inv = self.make_init_inventory() |
443 | + inv = inv.create_by_apply_delta([ |
444 | + (None, "a", "a-id", self.make_file('a-id', 'a', 'tree-root')), |
445 | + ], 'new-rev-1') |
446 | self.assertEqual('a', inv.id2path('a-id')) |
447 | a_ie = inv['a-id'] |
448 | - b_ie = InventoryFile(a_ie.file_id, "b", a_ie.parent_id) |
449 | - inv.apply_delta([("a", "b", "a-id", b_ie)]) |
450 | + b_ie = self.make_file(a_ie.file_id, "b", a_ie.parent_id) |
451 | + inv = inv.create_by_apply_delta([("a", "b", "a-id", b_ie)], 'new-rev-2') |
452 | self.assertEqual("b", inv.id2path('a-id')) |
453 | |
454 | - def test_apply_delta_illegal(self): |
455 | + def test_illegal(self): |
456 | # A file-id cannot appear in a delta more than once |
457 | - inv = self.make_inventory('tree-root') |
458 | - self.assertRaises(errors.InconsistentDelta, inv.apply_delta, [ |
459 | - ("a", "a", "id-1", InventoryFile('id-1', 'a', 'tree-root')), |
460 | - ("a", "b", "id-1", InventoryFile('id-1', 'b', 'tree-root')), |
461 | - ]) |
462 | + inv = self.make_init_inventory() |
463 | + self.assertRaises(errors.InconsistentDelta, inv.create_by_apply_delta, [ |
464 | + (None, "a", "id-1", self.make_file('id-1', 'a', 'tree-root')), |
465 | + (None, "b", "id-1", self.make_file('id-1', 'b', 'tree-root')), |
466 | + ], 'new-rev-1') |
467 | |
468 | |
469 | class TestInventoryReads(TestInventory): |
470 | |
471 | def test_is_root(self): |
472 | """Ensure our root-checking code is accurate.""" |
473 | - inv = self.make_inventory('TREE_ROOT') |
474 | - self.assertTrue(inv.is_root('TREE_ROOT')) |
475 | + inv = self.make_init_inventory() |
476 | + self.assertTrue(inv.is_root('tree-root')) |
477 | self.assertFalse(inv.is_root('booga')) |
478 | - inv.root.file_id = 'booga' |
479 | + ie = inv['tree-root'].copy() |
480 | + ie.file_id = 'booga' |
481 | + inv = inv.create_by_apply_delta([("", None, "tree-root", None), |
482 | + (None, "", "booga", ie)], 'new-rev-2') |
483 | self.assertFalse(inv.is_root('TREE_ROOT')) |
484 | self.assertTrue(inv.is_root('booga')) |
485 | - # works properly even if no root is set |
486 | - inv.root = None |
487 | - self.assertFalse(inv.is_root('TREE_ROOT')) |
488 | - self.assertFalse(inv.is_root('booga')) |
489 | |
490 | def test_ids(self): |
491 | """Test detection of files within selected directories.""" |
492 | - inv = self.make_inventory(ROOT_ID) |
493 | + inv = inventory.Inventory('TREE_ROOT') |
494 | for args in [('src', 'directory', 'src-id'), |
495 | ('doc', 'directory', 'doc-id'), |
496 | ('src/hello.c', 'file'), |
497 | ('src/bye.c', 'file', 'bye-id'), |
498 | ('Makefile', 'file')]: |
499 | - inv.add_path(*args) |
500 | + ie = inv.add_path(*args) |
501 | + if args[1] == 'file': |
502 | + ie.text_sha1 = osutils.sha_string('content\n') |
503 | + ie.text_size = len('content\n') |
504 | + inv = self.inv_to_test_inv(inv) |
505 | self.assertEqual(inv.path2id('src'), 'src-id') |
506 | self.assertEqual(inv.path2id('src/bye.c'), 'bye-id') |
507 | - self.assert_('src-id' in inv) |
508 | + self.assertTrue('src-id' in inv) |
509 | |
510 | def test_non_directory_children(self): |
511 | """Test path2id when a parent directory has no children""" |
512 | - inv = self.make_inventory('tree_root') |
513 | - inv.add(InventoryFile('file-id','file', |
514 | - parent_id='tree_root')) |
515 | - inv.add(InventoryLink('link-id','link', |
516 | - parent_id='tree_root')) |
517 | + inv = inventory.Inventory('tree-root') |
518 | + inv.add(self.make_file('file-id','file', 'tree-root')) |
519 | + inv.add(self.make_link('link-id','link', 'tree-root')) |
520 | self.assertIs(None, inv.path2id('file/subfile')) |
521 | self.assertIs(None, inv.path2id('link/subfile')) |
522 | |
523 | def test_iter_entries(self): |
524 | - inv = self.make_inventory('tree-root') |
525 | - for args in [('src', 'directory', 'src-id'), |
526 | - ('doc', 'directory', 'doc-id'), |
527 | - ('src/hello.c', 'file', 'hello-id'), |
528 | - ('src/bye.c', 'file', 'bye-id'), |
529 | - ('src/sub', 'directory', 'sub-id'), |
530 | - ('src/sub/a', 'file', 'a-id'), |
531 | - ('Makefile', 'file', 'makefile-id')]: |
532 | - inv.add_path(*args) |
533 | + inv = self.prepare_inv_with_nested_dirs() |
534 | |
535 | # Test all entries |
536 | self.assertEqual([ |
537 | @@ -230,6 +172,8 @@ |
538 | ('src/hello.c', 'hello-id'), |
539 | ('src/sub', 'sub-id'), |
540 | ('src/sub/a', 'a-id'), |
541 | + ('src/zz.c', 'zzc-id'), |
542 | + ('zz', 'zz-id'), |
543 | ], [(path, ie.file_id) for path, ie in inv.iter_entries()]) |
544 | |
545 | # Test a subdirectory |
546 | @@ -238,6 +182,7 @@ |
547 | ('hello.c', 'hello-id'), |
548 | ('sub', 'sub-id'), |
549 | ('sub/a', 'a-id'), |
550 | + ('zz.c', 'zzc-id'), |
551 | ], [(path, ie.file_id) for path, ie in inv.iter_entries( |
552 | from_dir='src-id')]) |
553 | |
554 | @@ -247,6 +192,7 @@ |
555 | ('Makefile', 'makefile-id'), |
556 | ('doc', 'doc-id'), |
557 | ('src', 'src-id'), |
558 | + ('zz', 'zz-id'), |
559 | ], [(path, ie.file_id) for path, ie in inv.iter_entries( |
560 | recursive=False)]) |
561 | |
562 | @@ -255,24 +201,23 @@ |
563 | ('bye.c', 'bye-id'), |
564 | ('hello.c', 'hello-id'), |
565 | ('sub', 'sub-id'), |
566 | + ('zz.c', 'zzc-id'), |
567 | ], [(path, ie.file_id) for path, ie in inv.iter_entries( |
568 | from_dir='src-id', recursive=False)]) |
569 | |
570 | def test_iter_just_entries(self): |
571 | - inv = self.make_inventory('tree-root') |
572 | - for args in [('src', 'directory', 'src-id'), |
573 | - ('doc', 'directory', 'doc-id'), |
574 | - ('src/hello.c', 'file', 'hello-id'), |
575 | - ('src/bye.c', 'file', 'bye-id'), |
576 | - ('Makefile', 'file', 'makefile-id')]: |
577 | - inv.add_path(*args) |
578 | + inv = self.prepare_inv_with_nested_dirs() |
579 | self.assertEqual([ |
580 | + 'a-id', |
581 | 'bye-id', |
582 | 'doc-id', |
583 | 'hello-id', |
584 | 'makefile-id', |
585 | 'src-id', |
586 | + 'sub-id', |
587 | 'tree-root', |
588 | + 'zz-id', |
589 | + 'zzc-id', |
590 | ], sorted([ie.file_id for ie in inv.iter_just_entries()])) |
591 | |
592 | def test_iter_entries_by_dir(self): |
593 | @@ -387,3 +332,10 @@ |
594 | ('src/sub/a', 'a-id'), |
595 | ('src/zz.c', 'zzc-id'), |
596 | ], [(path, ie.file_id) for path, ie in new_inv.iter_entries()]) |
597 | + |
598 | + def test_inv_filter_entry_not_present(self): |
599 | + inv = self.prepare_inv_with_nested_dirs() |
600 | + new_inv = inv.filter(['not-present-id']) |
601 | + self.assertEqual([ |
602 | + ('', 'tree-root'), |
603 | + ], [(path, ie.file_id) for path, ie in new_inv.iter_entries()]) |
604 | |
605 | === modified file 'bzrlib/tests/test_inv.py' |
606 | --- bzrlib/tests/test_inv.py 2009-09-23 20:48:52 +0000 |
607 | +++ bzrlib/tests/test_inv.py 2009-09-25 19:03:36 +0000 |
608 | @@ -259,6 +259,76 @@ |
609 | return repo.get_inventory('result') |
610 | |
611 | |
612 | +class TestInventoryUpdates(TestCase): |
613 | + |
614 | + def test_creation_from_root_id(self): |
615 | + # iff a root id is passed to the constructor, a root directory is made |
616 | + inv = inventory.Inventory(root_id='tree-root') |
617 | + self.assertNotEqual(None, inv.root) |
618 | + self.assertEqual('tree-root', inv.root.file_id) |
619 | + |
620 | + def test_add_path_of_root(self): |
621 | + # if no root id is given at creation time, there is no root directory |
622 | + inv = inventory.Inventory(root_id=None) |
623 | + self.assertIs(None, inv.root) |
624 | + # add a root entry by adding its path |
625 | + ie = inv.add_path("", "directory", "my-root") |
626 | + ie.revision = 'test-rev' |
627 | + self.assertEqual("my-root", ie.file_id) |
628 | + self.assertIs(ie, inv.root) |
629 | + |
630 | + def test_add_path(self): |
631 | + inv = inventory.Inventory(root_id='tree_root') |
632 | + ie = inv.add_path('hello', 'file', 'hello-id') |
633 | + self.assertEqual('hello-id', ie.file_id) |
634 | + self.assertEqual('file', ie.kind) |
635 | + |
636 | + def test_copy(self): |
637 | + """Make sure copy() works and creates a deep copy.""" |
638 | + inv = inventory.Inventory(root_id='some-tree-root') |
639 | + ie = inv.add_path('hello', 'file', 'hello-id') |
640 | + inv2 = inv.copy() |
641 | + inv.root.file_id = 'some-new-root' |
642 | + ie.name = 'file2' |
643 | + self.assertEqual('some-tree-root', inv2.root.file_id) |
644 | + self.assertEqual('hello', inv2['hello-id'].name) |
645 | + |
646 | + def test_copy_empty(self): |
647 | + """Make sure an empty inventory can be copied.""" |
648 | + inv = inventory.Inventory(root_id=None) |
649 | + inv2 = inv.copy() |
650 | + self.assertIs(None, inv2.root) |
651 | + |
652 | + def test_copy_copies_root_revision(self): |
653 | + """Make sure the revision of the root gets copied.""" |
654 | + inv = inventory.Inventory(root_id='someroot') |
655 | + inv.root.revision = 'therev' |
656 | + inv2 = inv.copy() |
657 | + self.assertEquals('someroot', inv2.root.file_id) |
658 | + self.assertEquals('therev', inv2.root.revision) |
659 | + |
660 | + def test_create_tree_reference(self): |
661 | + inv = inventory.Inventory('tree-root-123') |
662 | + inv.add(TreeReference('nested-id', 'nested', parent_id='tree-root-123', |
663 | + revision='rev', reference_revision='rev2')) |
664 | + |
665 | + def test_error_encoding(self): |
666 | + inv = inventory.Inventory('tree-root') |
667 | + inv.add(InventoryFile('a-id', u'\u1234', 'tree-root')) |
668 | + e = self.assertRaises(errors.InconsistentDelta, inv.add, |
669 | + InventoryFile('b-id', u'\u1234', 'tree-root')) |
670 | + self.assertContainsRe(str(e), r'\\u1234') |
671 | + |
672 | + def test_add_recursive(self): |
673 | + parent = InventoryDirectory('src-id', 'src', 'tree-root') |
674 | + child = InventoryFile('hello-id', 'hello.c', 'src-id') |
675 | + parent.children[child.file_id] = child |
676 | + inv = inventory.Inventory('tree-root') |
677 | + inv.add(parent) |
678 | + self.assertEqual('src/hello.c', inv.id2path('hello-id')) |
679 | + |
680 | + |
681 | + |
682 | class TestDeltaApplication(TestCaseWithTransport): |
683 | |
684 | def get_empty_inventory(self, reference_inv=None): |
685 | @@ -505,6 +575,22 @@ |
686 | inv, delta) |
687 | |
688 | |
689 | +class TestInventory(TestCase): |
690 | + |
691 | + def test_is_root(self): |
692 | + """Ensure our root-checking code is accurate.""" |
693 | + inv = inventory.Inventory('TREE_ROOT') |
694 | + self.assertTrue(inv.is_root('TREE_ROOT')) |
695 | + self.assertFalse(inv.is_root('booga')) |
696 | + inv.root.file_id = 'booga' |
697 | + self.assertFalse(inv.is_root('TREE_ROOT')) |
698 | + self.assertTrue(inv.is_root('booga')) |
699 | + # works properly even if no root is set |
700 | + inv.root = None |
701 | + self.assertFalse(inv.is_root('TREE_ROOT')) |
702 | + self.assertFalse(inv.is_root('booga')) |
703 | + |
704 | + |
705 | class TestInventoryEntry(TestCase): |
706 | |
707 | def test_file_kind_character(self): |
708 | @@ -1128,3 +1214,135 @@ |
709 | self.assertIsInstance(ie2.name, unicode) |
710 | self.assertEqual(('tree\xce\xa9name', 'tree-root-id', 'tree-rev-id'), |
711 | inv._bytes_to_utf8name_key(bytes)) |
712 | + |
713 | + |
714 | +class TestCHKInventoryExpand(tests.TestCaseWithMemoryTransport): |
715 | + |
716 | + def get_chk_bytes(self): |
717 | + factory = groupcompress.make_pack_factory(True, True, 1) |
718 | + trans = self.get_transport('') |
719 | + return factory(trans) |
720 | + |
721 | + def make_dir(self, inv, name, parent_id): |
722 | + inv.add(inv.make_entry('directory', name, parent_id, name + '-id')) |
723 | + |
724 | + def make_file(self, inv, name, parent_id, content='content\n'): |
725 | + ie = inv.make_entry('file', name, parent_id, name + '-id') |
726 | + ie.text_sha1 = osutils.sha_string(content) |
727 | + ie.text_size = len(content) |
728 | + inv.add(ie) |
729 | + |
730 | + def make_simple_inventory(self): |
731 | + inv = Inventory('TREE_ROOT') |
732 | + inv.revision_id = "revid" |
733 | + inv.root.revision = "rootrev" |
734 | + # / TREE_ROOT |
735 | + # dir1/ dir1-id |
736 | + # sub-file1 sub-file1-id |
737 | + # sub-file2 sub-file2-id |
738 | + # sub-dir1/ sub-dir1-id |
739 | + # subsub-file1 subsub-file1-id |
740 | + # dir2/ dir2-id |
741 | + # sub2-file1 sub2-file1-id |
742 | + # top top-id |
743 | + self.make_dir(inv, 'dir1', 'TREE_ROOT') |
744 | + self.make_dir(inv, 'dir2', 'TREE_ROOT') |
745 | + self.make_dir(inv, 'sub-dir1', 'dir1-id') |
746 | + self.make_file(inv, 'top', 'TREE_ROOT') |
747 | + self.make_file(inv, 'sub-file1', 'dir1-id') |
748 | + self.make_file(inv, 'sub-file2', 'dir1-id') |
749 | + self.make_file(inv, 'subsub-file1', 'sub-dir1-id') |
750 | + self.make_file(inv, 'sub2-file1', 'dir2-id') |
751 | + chk_bytes = self.get_chk_bytes() |
752 | + # use a small maximum_size to force internal paging structures |
753 | + chk_inv = CHKInventory.from_inventory(chk_bytes, inv, |
754 | + maximum_size=100, |
755 | + search_key_name='hash-255-way') |
756 | + bytes = ''.join(chk_inv.to_lines()) |
757 | + return CHKInventory.deserialise(chk_bytes, bytes, ("revid",)) |
758 | + |
759 | + def assert_Getitems(self, expected_fileids, inv, file_ids): |
760 | + self.assertEqual(sorted(expected_fileids), |
761 | + sorted([ie.file_id for ie in inv._getitems(file_ids)])) |
762 | + |
763 | + def assertExpand(self, all_ids, inv, file_ids): |
764 | + (val_all_ids, |
765 | + val_children) = inv._expand_fileids_to_parents_and_children(file_ids) |
766 | + self.assertEqual(set(all_ids), val_all_ids) |
767 | + entries = inv._getitems(val_all_ids) |
768 | + expected_children = {} |
769 | + for entry in entries: |
770 | + s = expected_children.setdefault(entry.parent_id, []) |
771 | + s.append(entry.file_id) |
772 | + val_children = dict((k, sorted(v)) for k, v |
773 | + in val_children.iteritems()) |
774 | + expected_children = dict((k, sorted(v)) for k, v |
775 | + in expected_children.iteritems()) |
776 | + self.assertEqual(expected_children, val_children) |
777 | + |
778 | + def test_make_simple_inventory(self): |
779 | + inv = self.make_simple_inventory() |
780 | + layout = [] |
781 | + for path, entry in inv.iter_entries_by_dir(): |
782 | + layout.append((path, entry.file_id)) |
783 | + self.assertEqual([ |
784 | + ('', 'TREE_ROOT'), |
785 | + ('dir1', 'dir1-id'), |
786 | + ('dir2', 'dir2-id'), |
787 | + ('top', 'top-id'), |
788 | + ('dir1/sub-dir1', 'sub-dir1-id'), |
789 | + ('dir1/sub-file1', 'sub-file1-id'), |
790 | + ('dir1/sub-file2', 'sub-file2-id'), |
791 | + ('dir1/sub-dir1/subsub-file1', 'subsub-file1-id'), |
792 | + ('dir2/sub2-file1', 'sub2-file1-id'), |
793 | + ], layout) |
794 | + |
795 | + def test__getitems(self): |
796 | + inv = self.make_simple_inventory() |
797 | + # Reading from disk |
798 | + self.assert_Getitems(['dir1-id'], inv, ['dir1-id']) |
799 | + self.assertTrue('dir1-id' in inv._fileid_to_entry_cache) |
800 | + self.assertFalse('sub-file2-id' in inv._fileid_to_entry_cache) |
801 | + # From cache |
802 | + self.assert_Getitems(['dir1-id'], inv, ['dir1-id']) |
803 | + # Mixed |
804 | + self.assert_Getitems(['dir1-id', 'sub-file2-id'], inv, |
805 | + ['dir1-id', 'sub-file2-id']) |
806 | + self.assertTrue('dir1-id' in inv._fileid_to_entry_cache) |
807 | + self.assertTrue('sub-file2-id' in inv._fileid_to_entry_cache) |
808 | + |
809 | + def test_single_file(self): |
810 | + inv = self.make_simple_inventory() |
811 | + self.assertExpand(['TREE_ROOT', 'top-id'], inv, ['top-id']) |
812 | + |
813 | + def test_get_all_parents(self): |
814 | + inv = self.make_simple_inventory() |
815 | + self.assertExpand(['TREE_ROOT', 'dir1-id', 'sub-dir1-id', |
816 | + 'subsub-file1-id', |
817 | + ], inv, ['subsub-file1-id']) |
818 | + |
819 | + def test_get_children(self): |
820 | + inv = self.make_simple_inventory() |
821 | + self.assertExpand(['TREE_ROOT', 'dir1-id', 'sub-dir1-id', |
822 | + 'sub-file1-id', 'sub-file2-id', 'subsub-file1-id', |
823 | + ], inv, ['dir1-id']) |
824 | + |
825 | + def test_from_root(self): |
826 | + inv = self.make_simple_inventory() |
827 | + self.assertExpand(['TREE_ROOT', 'dir1-id', 'dir2-id', 'sub-dir1-id', |
828 | + 'sub-file1-id', 'sub-file2-id', 'sub2-file1-id', |
829 | + 'subsub-file1-id', 'top-id'], inv, ['TREE_ROOT']) |
830 | + |
831 | + def test_top_level_file(self): |
832 | + inv = self.make_simple_inventory() |
833 | + self.assertExpand(['TREE_ROOT', 'top-id'], inv, ['top-id']) |
834 | + |
835 | + def test_subsub_file(self): |
836 | + inv = self.make_simple_inventory() |
837 | + self.assertExpand(['TREE_ROOT', 'dir1-id', 'sub-dir1-id', |
838 | + 'subsub-file1-id'], inv, ['subsub-file1-id']) |
839 | + |
840 | + def test_sub_and_root(self): |
841 | + inv = self.make_simple_inventory() |
842 | + self.assertExpand(['TREE_ROOT', 'dir1-id', 'sub-dir1-id', 'top-id', |
843 | + 'subsub-file1-id'], inv, ['top-id', 'subsub-file1-id']) |
This is my 'phase 1' for making 'bzr log DIR' faster.
It implements a custom 'CHKInventory. filter( )' which only has to deal in the actual files that are requested, rather than always doing 'iter_entries()' and walking the whole tree.
This makes 'bzr log DIR' scale based on the number of entries in DIR. In my testing,
5m30s bzr.dev log tools
5m30s newbzr log bzrlib
0m36s newbzr log tools
So when the number of files is low, the time drops from 5m30s => 36s, which is great.
The next step is probably going to bypass 'filter()' for 'log DIR' and solely layer on top of 'iter_changes'. My initial (broken) testing showed that it stayed consistently around 36s for both bzrlib and tools, which is certainly a lot better.
I'd like to land this in the 2.0.x series, because it seems like a simple bug fix.
The size of the delta is a bit large, because it turned out that CHKInventory. filter( ) wasn't tested at all. (See my mailing list thread about what 'per_inventory' is actually supposed to be testing.) I went ahead and made 'per_inventory' actually care about both inventory implementations, and moved the tests around accordingly.