after conversion chk formats are badly packed

Bug #376748 reported by Robert Collins
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Critical
Unassigned
Bazaar Git Plugin
Fix Released
Undecided
Jelmer Vernooij
Bazaar Subversion Plugin
Fix Released
Undecided
Jelmer Vernooij

Bug Description

When brisbane-core formats are written into by conversion operations, many chk pages can be created in separate (and therefor not-compressed) operations. This can be fixe by doing a partial pack of the new data after the operation completes.

Some samples we have are bzr pulls going up to 3GB (pack takes it to 33MB), pulls from bzr-svn going up to 11GB (in 2 packs, pack takes it down to 11MB) and so on.

As we can be sure of the circumstances this is a case where doing a partial pack and removing the files we wrote in the original transaction rather than keeping them in obsolete_packs would be a good idea.

InterDifferingSerializer avoids this problem, but it isn't fast to use outside of local conversions.

Revision history for this message
John A Meinel (jameinel) wrote :

I just did a conversion of bzr.dev, and after conversion it isn't particularly terrible. I believe the converted size was 65MB, and post-pack was 32MB. So about 2:1...

While that is bad, it isn't 1.7GiB => 150MiB that we saw with a different project. Which leads me to believe that the 1.7GiB conversion wasn't being done with InterDifferingSerializer, but was instead done via the plain fetch...

Just a thought, though.

Revision history for this message
Jelmer Vernooij (jelmer) wrote : Re: [Bug 376748] Re: after conversion chk formats are badly packed

On Fri, May 15, 2009 at 01:57:26PM -0000, John A Meinel wrote:
> I just did a conversion of bzr.dev, and after conversion it isn't
> particularly terrible. I believe the converted size was 65MB, and post-
> pack was 32MB. So about 2:1...

> While that is bad, it isn't 1.7GiB => 150MiB that we saw with a
> different project. Which leads me to believe that the 1.7GiB conversion
> wasn't being done with InterDifferingSerializer, but was instead done
> via the plain fetch...

This particular one was happening when importing from git into Bazaar.
bzr-git fetches happen on a per-revision basis, so multiple versions
of the same file aren't imported in a row (which I would imagine would
be the most optimal ?). The same problem would occur for svn I
imagine.

Cheers,

Jelmer

Revision history for this message
John A Meinel (jameinel) wrote :

well, it depends on how you layer the 'commit_write_groups'. InterDifferingSerializer holds a write group across 100 revs, which causes autopack to occur every 1000 revs. If, instead, you hold the write group across all 50k revs, then yes, things will be bad.

In fact, I think under those circumstances it could expand all texts to fulltexts. It sort of depends what gets put into a single stream. (A group only survives a single 'insert_record_stream', so calling VF.add_lines() repeatedly will give all fulltexts.)

Revision history for this message
Robert Collins (lifeless) wrote :

Bumping to critical, this will cause serious issues with 2.0 if not fixed.

Changed in bzr:
importance: High → Critical
Changed in bzr:
milestone: none → 2.0
description: updated
Revision history for this message
John A Meinel (jameinel) wrote :

So *with* IDS, they are at least reasonably packed after the conversion/fetch/whatever. They are not *optimally* packed, but they are not grotesquely bloated, either.

However, as you are trying to nuke IDS anyway, you are welcome to figure out how to handle the packing problem at the same time.

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-06-11 at 14:27 +0000, John A Meinel wrote:
> So *with* IDS, they are at least reasonably packed after the
> conversion/fetch/whatever. They are not *optimally* packed, but they are
> not grotesquely bloated, either.
>
> However, as you are trying to nuke IDS anyway, you are welcome to figure
> out how to handle the packing problem at the same time.

I think doing a partial pack of the new data is a sane answer:
- in new conversions it will repack everything
- in incremental pulls it will repack only the pulled data

I agree that we don't really have to repack after an IDS conversion,
however I think its nice to give a really good impression:).

-Rob

Changed in bzr:
milestone: 2.0 → 1.17
status: Triaged → Fix Released
Revision history for this message
Robert Collins (lifeless) wrote :

I've added a bzr-svn task, because the fix involves an optional API that bzr-svn needs to start calling. Jelmer - it should be obvious by looking at rev 4470 in bzr.dev.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is fixed in bzr-svn and bzr-git.

  affects bzr-svn
  status fixreleased

  affects bzr-git
  status fixreleased
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iJwEAQECAAYFAkpDXzMACgkQDLQl4QYPZuW9+QP/ZOHi/FcONNm+kMyF7bnqdzp6
ZbowofHxVqe10/8AnYwOCyS35XdBFb7PQoGkSO+XSK3xWw/ROxCp27rHc++xha9/
1Fw2qBRO5wVff2Ay20FU4Ej89I7sOsIKqO6JT6DLCIFR2rvFgyP1ndDgCnFHUy4u
GDoLezfDfYAVvBYmf5w=
=+AUL
-----END PGP SIGNATURE-----

Changed in bzr-svn:
status: New → Fix Released
Revision history for this message
Martin Pool (mbp) wrote :

The current 1.16.1 NEWS shows this as merged there, but is that actually true? I'm checking.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> The current 1.16.1 NEWS shows this as merged there, but is that actually
> true? I'm checking.
>

I see:

bzr cat http://bazaar-vcs.org/bzr/bzr.1.16/bzrlib/repository.py:

...
    def pack(self, hint=None):
...
        :param hint: If not supplied, the whole repository is packed.
            If supplied, the repository may use the hint parameter as a
            hint for the parts of the repository to pack. A hint can be
            obtained from the result of commit_write_group(). Out of
            date hints are simply ignored, because concurrent operations
            can obsolete them rapidly.
        """
...

The hinting was done as part of this work. So I assume Michael Hudson
cherrypicked across the pack after-conversion-fetch code.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpE/KEACgkQJdeBCYSNAAODKgCgqqer5pAo9+ls8/QGGeB0KgRv
ArYAoM872+Si+cORR+4yZtfqGG58gRSd
=gS2A
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

So I just realized.
Robert fixed bug #365615 as a side-effect of working on this bug.
Basically, he just started handling an AbsentContentFactory during autopack because he ran into it while testing the fix for this bug.

So yeah, its in 1.16.1

Changed in bzr:
milestone: 1.17 → 1.16.1
Jelmer Vernooij (jelmer)
Changed in bzr-git:
assignee: nobody → Jelmer Vernooij (jelmer)
Changed in bzr-svn:
assignee: nobody → Jelmer Vernooij (jelmer)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.