KnitRepository.insert_data_stream() copies data in improper order

Bug #205156 reported by John A Meinel
4
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Critical
Unassigned

Bug Description

Looking at the implementation of "Repository.insert_data_stream()" it is copying the inventory texts to disk as soon as they come in, rather than waiting for the texts to be written.
This is valid for pack repositories, because we will abort the whole pack if it is incomplete. But for Knit repositories if it is interrupted, then Bazaar believes everything was properly copied (because we used "if the Revision is present then the Inventory is, and if the Inventory is, then all Texts are.) We still assume that with packs, but when it is violated we don't add it to the list of pack-names.

This is critical as it makes ^C during a bzr+ssh:// push of knit repositories corrupt the target repo.

John A Meinel (jameinel)
Changed in bzr:
importance: Undecided → Critical
status: New → Triaged
Revision history for this message
John A Meinel (jameinel) wrote :

Someone should confirm the output stream sequence. My cursory double check seems to say that the data stream sent by the server should be reasonable.

Specifically, get_data_stream seems to be layered on top of Repository.item_keys_introduced_by() whose default implementation seems to stream the inventories after the file texts.

However, we still have some real-world cases where it seems to have streamed in the wrong order.

My quick test of creating a --knit with bzr.dev, and then committing a couple and manually stepping the 'get_data_stream()' iterator also says that we copy file texts first.

So we still need to figure out how ^C during a push could copy the inventories, but not the file texts.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 205156] Re: KnitRepository.insert_data_stream() copies data in improper order

On Sun, 2008-03-23 at 17:32 +0000, John A Meinel wrote:
> Someone should confirm the output stream sequence. My cursory double
> check seems to say that the data stream sent by the server should be
> reasonable.
>
> Specifically, get_data_stream seems to be layered on top of
> Repository.item_keys_introduced_by() whose default implementation seems
> to stream the inventories after the file texts.
>
> However, we still have some real-world cases where it seems to have
> streamed in the wrong order.
>
> My quick test of creating a --knit with bzr.dev, and then committing a
> couple and manually stepping the 'get_data_stream()' iterator also says
> that we copy file texts first.
>
> So we still need to figure out how ^C during a push could copy the
> inventories, but not the file texts.

I suggest for the short term we assert in the smart server that the
output is correctly ordered. However in the long term we want pack to
pack operations to stream by packs not by fileid so this is
inappropriate; instead we need a capability flag from the client to say
that its ok not to be ordered.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
John A Meinel (jameinel) wrote :

What I do see is that we copy the 'revisions' data in non topological order. Which can create ghosts in the revisions.kndx if you ^C in the middle. It still doesn't explain what I saw 'in the wild' with inventories present but not file texts, but I do know there is at least one bug here.

Revision history for this message
John A Meinel (jameinel) wrote :

I'm attaching a script which lets you copy all revisions in one repository over to another. This will fill in ghosts, etc.

Revision history for this message
Robert Collins (lifeless) wrote :

insert_data_stream was deleted, so the bug is irrelevant now

Changed in bzr:
status: Triaged → Won't Fix
Jelmer Vernooij (jelmer)
Changed in bzr:
status: Won't Fix → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.