get_stream_for_missing_keys does not include inventory CHK pages.

Bug #406686 reported by Robert Collins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Critical
Andrew Bennetts

Bug Description

StreamSource.get_stream_for_missing_keys does not include inventory CHK pages. It needs to otherwise when a stacked repo asks for adjacent inventories to permit deltas, the inventories won't be complete enough to permit deltas. Currently we speculatively include this data from the client in get_stream.

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 406686] [NEW] get_stream_for_missing_keys does not include inventory CHK pages.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Public bug reported:
>
> StreamSource.get_stream_for_missing_keys does not include inventory CHK
> pages. It needs to otherwise when a stacked repo asks for adjacent
> inventories to permit deltas, the inventories won't be complete enough
> to permit deltas. Currently we speculatively include this data from the
> client in get_stream.
>
> ** Affects: bzr
> Importance: Undecided
> Status: New
>

So I'll mention that at least in original testing this wasn't needed.

We send:
 1) The minimal chk pages based on a delta comparison
 2) When we need fallback inventories we send *all possible referenced*
pages from the parent inventories.

As an example, say you have a CHK layout of

root1 => [leaf1, leaf2] # rev1
root2 => [leaf1, leaf3] # rev2

Then for the "get_stream()" we would send

root2, leaf3

and then get_stream_for_missing_keys, we would request #rev1-complete
and send

root1 => [leaf1, leaf2]

I wasn't able to find a way that those two combined wouldn't send
everything referenced from root2 (either it changed and we sent it, or
it didn't change and it is contained in the parent's full reference.)

If you can find a way that this isn't true, we can absolutely
re-evaluate it.

And I agree that the generic StreamSource should always be correct.
(Perhaps this is where the issue is?)

Andrew removed the code path to do CHK => CHK fetching because it had a
trivial bug in it, which made it clear it wasn't being called *by the
test suite*.

My current understanding is we want to get inventory delta streaming
working, and then we don't send CHK pages anyway in the generic code
path. So spending too much time here may not be worthwhile.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkpw/2wACgkQJdeBCYSNAANDwgCfWddM6ZKH+FiT1HEIkS1ul1Vm
dLAAoLd6SRGhd8Sg82kWV50iWxKwIQw6
=Gr1u
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-07-30 at 02:03 +0000, John A Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1

> We send:
> 1) The minimal chk pages based on a delta comparison
> 2) When we need fallback inventories we send *all possible referenced*
> pages from the parent inventories.

Where does 2) happen - I couldn't find it. Is it in
self._get_inventory_stream ?

Ah yes, it is. So the problem is that the *generic* code path doesn't do
the right thing here.

We can fix it for now by teaching RemoteStreamSource to make a new
StreamSource and implement get_stream_for_missing_keys itself.

-Rob

Changed in bzr:
importance: Undecided → Critical
milestone: none → 2.0
status: New → Triaged
Revision history for this message
Andrew Bennetts (spiv) wrote :

It turns out I already had a fix for this in my inventory-delta branch.

Changed in bzr:
assignee: nobody → Andrew Bennetts (spiv)
status: Triaged → Fix Committed
Andrew Bennetts (spiv)
Changed in bzr:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.