log dir is slow in development-rich-root

Bug #374730 reported by Robert Collins
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Medium
John A Meinel
2.0
Fix Released
Medium
John A Meinel

Bug Description

The optimisations for packs pessimise for development-rich-root; more details to come from Ian and John.

Related branches

Revision history for this message
John A Meinel (jameinel) wrote :

Ian and I have a basic outline of how to improve this.

At the moment, 'log DIR' is implemented by iterating over all files in every revision, and checking to see if it was modified.
Instead, we should look at the list of changes (which can now be computed quite quickly), and see if any of those are underneath $DIR.

Changed in bzr:
assignee: nobody → Ian Clatworthy (ian-clatworthy)
importance: Undecided → Medium
status: New → In Progress
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :
Revision history for this message
Darren Worrall (dazworrall) wrote :

What's your progress on this Ian? Any chance your enhancements will be ready for 2.0?

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Re: [Bug 374730] Re: log dir is slow in development-rich-root

Darren Worrall wrote:
> What's your progress on this Ian? Any chance your enhancements will be
> ready for 2.0?
>

It's been on the backburner while I've been working on other things for
2.0: easier migration, better GUI tools and docs. I'll unassign myself
from this bug given I'm not currently working on it.

It won't be (directly) fixed in 2.0 though other performance
improvements might have had a positive side effect. Is the bug impacting
you? If so, perhaps you could collect some profiling data and attach it
to the bug report?

Ian C.

Changed in bzr:
assignee: Ian Clatworthy (ian-clatworthy) → nobody
Revision history for this message
Darren Worrall (dazworrall) wrote :

It is yes, I'm looking to migrate a large svn repository to to bzr, and top of the complaint list is the lack of response of log <dir>

Once 2.0 is released I'll collect some profile data and report back :)

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-09-23 at 09:41 +0000, Darren Worrall wrote:
>
> Once 2.0 is released I'll collect some profile data and report
> back :)

Its finalised in every respect except having pervasive availability of
binaries. So try now :)

-Rob

Revision history for this message
Darren Worrall (dazworrall) wrote :

Ok, I used bzr svn-import --all to branch a repository with 2000+ revisions twice, once using bzr 1.18, and once using bzr 2.0 (both using bzr-svn 1.0.0rc1), into that versions' default repository format, and for me at least, 2.0 is actually slower:

------- bzr-1.18 -------
bzr --version
Bazaar (bzr) 1.18
<snip>

bzr info
Repository tree (format: rich-root-pack)
<snip>

time bzr log MyDir
real 6m0.226s
user 5m57.994s
sys 0m0.288s
---------------------------

------- bzr-2.0 ---------
bzr --version
Bazaar (bzr) 2.0.0
<snip>

bzr info
Repository tree (format: 2a)
<snip>

time bzr log MyDir
real 8m25.398s
user 8m23.995s
sys 0m0.584s
---------------------------

Attaching lsprof output...

Revision history for this message
Darren Worrall (dazworrall) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Darren Worrall wrote:
> Ok, I used bzr svn-import --all to branch a repository with 2000+
> revisions twice, once using bzr 1.18, and once using bzr 2.0 (both using
> bzr-svn 1.0.0rc1), into that versions' default repository format, and
> for me at least, 2.0 is actually slower:
>
> ------- bzr-1.18 -------
> bzr --version
> Bazaar (bzr) 1.18
> <snip>
>
> bzr info
> Repository tree (format: rich-root-pack)
> <snip>
>
> time bzr log MyDir
> real 6m0.226s
> user 5m57.994s
> sys 0m0.288s
> ---------------------------
>
> ------- bzr-2.0 ---------
> bzr --version
> Bazaar (bzr) 2.0.0
> <snip>
>
> bzr info
> Repository tree (format: 2a)
> <snip>
>
> time bzr log MyDir
> real 8m25.398s
> user 8m23.995s
> sys 0m0.584s
> ---------------------------
>
> Attaching lsprof output...
>

so for starters, did they both give the expected output?

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkq6gEIACgkQJdeBCYSNAAOyzgCgyf9d7+I0FdgZMXlPK6pEVUjO
+tkAoL/kv5inOf4Je8ITj1UOJaA53s8H
=pyRX
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

looking at the lsprof of 2.0, it is because it is using the method from 1.18 and applying it directly to the 2.0 data format, which is bogus.

Specifically, the 1.18 method is to read in the entire inventory, then filter out the data which is not in the subdirectory, then compute the remaining delta.

In 2.0 the proper way to do it is to compute the delta, then filter out what is not in the subdirectory.

Reading the whole inventory in 2.0 is slower than in 1.18, but the tradeoff is that we can compute deltas *without* reading the whole inventory, which makes it a lot faster when done properly.

I remember raising the issue in the past, obviously the fixes never got pulled into the codebase.

Thanks for poking us on this bug.

Revision history for this message
Darren Worrall (dazworrall) wrote :

No worries :)

The output is identical, I'll verify against the original svn repository when I get back in the office tomorrow. It certainly looks correct but it needs a script to look at it rather than just my eye :)

If there are patches waiting to be merged I'll happily test them and report back the results.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

John A Meinel wrote:

> In 2.0 the proper way to do it is to compute the delta, then filter out
> what is not in the subdirectory.

Right. I got as far as recording our IRC conversation and experimenting
with some code but I never got as far as generating code worthy of
submitting as a patch.

Ian C.

Revision history for this message
Darren Worrall (dazworrall) wrote :

The output of both bzr log commands contains the same versions as svn log does.

I know it's apples and oranges, but in terms of the competition this is what I'm up against:

time svn log MyDir
real 0m3.024s
user 0m0.012s
sys 0m0.004s

I'll leave these checkouts as is so we can use them as a benchmark in future.

Revision history for this message
Craig Hewetson (craighewetson-deactivatedaccount) wrote :

I did a bzr log dir on two branches. A 0.92 formatted and a 2a formatted branch.
Using Bazaar 2.1dev

The time the log took on 0.92 branch:
12minutes 11 seconds

On the 2a branch:
18minutes 6 seconds

In a windows XP VM.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> John A Meinel wrote:
>
>> In 2.0 the proper way to do it is to compute the delta, then filter out
>> what is not in the subdirectory.
>
> Right. I got as far as recording our IRC conversation and experimenting
> with some code but I never got as far as generating code worthy of
> submitting as a patch.
>
> Ian C.
>

I've got lp:~jameinel/bzr/2.0.1-faster-log-dir-bug374730

In progress.

In the first pass, I'm just trying to make the "CHKInventory.filter()"
have a specific implementation, rather than the generic.

Though in doing so, I noticed that "per_inventory" tests aren't actually
running more than 1 permutation yet, so I'm fixing that up before the
above branch will really be ready for merging.

(Note that the CHKInventory.filter() right now is intentionally broken,
so I'll know when I have the tests running properly.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkq7mQUACgkQJdeBCYSNAAMcSgCgmf4WI4SFXC3wo4KSZPrm+w9j
jVUAn2KBBhW+ZZ1f6iM3BLtRr2Y2uWsq
=R1AE
-----END PGP SIGNATURE-----

John A Meinel (jameinel)
Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
John A Meinel (jameinel)
Changed in bzr:
status: In Progress → Fix Committed
Revision history for this message
John A Meinel (jameinel) wrote :

There is still more that can be done, a 'phase-1' of 'bzr log DIR' has been improved dramatically in my branch.

"time bzr log -n0 --no-aliases tools"

went from
  real 5m16.959s
down to
  real 0m37.888s

This was done with just changing "get_deltas_for_revisions()" to not call filter() but to just always do the iter_changes code. I believe that actually means the resulting log will be wrong, because I'm not applying the specific file_ids anywhere, but it certainly was an interesting result.

I'm reverting that change (for now), and trying a different tack. Namely changing the CHKInventory.filter() operation to find the subset of files that were interesting, and then only extracting those entries from disk and putting them into an Inventory.

Note that doing it this way means it will scale by num_revisions * num_files_in_directory. While the other method should scale by O(changes-to-the-tree).
Ideally we'd combine the two and end up scaling as O(changes-to-those-files). The data on disk is all there to do so, we just don't have the apis built to support it yet.

Revision history for this message
Craig Hewetson (craighewetson-deactivatedaccount) wrote :

I tested the fix and I can confirm that it definitely improved the performance of bzr log <dir>.

I got dozens of revisions instantly and the entire log completed in 1m12s! Compared to the previous 18minutes

I suppose this change will only be in 2.0.1?

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Craig Hewetson wrote:
> I tested the fix and I can confirm that it definitely improved the
> performance of bzr log <dir>.
>
> I got dozens of revisions instantly and the entire log completed in
> 1m12s! Compared to the previous 18minutes

and 15min for <2a format, right?

>
> I suppose this change will only be in 2.0.1?
>

It certainly won't be in 2.0.0 :). I'm hoping the change is sufficiently
small to make it into 2.0.1. It will definitely be in bzr.dev and
2.1.0b1 once I get it reviewed, etc.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkq71ZUACgkQJdeBCYSNAAPAYACeMWurCeHm4VZwXr9HsGkaoXZ8
uyAAnjKlvqj2iA/b8TBFxD4lAhikojOZ
=sj7a
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

So here is a comparison point. For 'log tools' on bzr.dev it was 5m30s. 'log tools' using an updated 'filter()' method was 36s. 'log tools' layered on top of 'iter_changes [slightly incorrectly]' was also 36s.

However 'log bzrlib' with filter() was still 5m31s (the bulk of our files are under bzrlib), versus still 36s when layered incorrectly on top of iter_changes.

So certainly I think iter_changes() is the way to go ultimately, but that will be the next phase. I also want to look at whether I can cache the code that expands "tools" => tools/and/everything/else, because that doesn't change much between revisions.

Revision history for this message
Darren Worrall (dazworrall) wrote :

Certainly an improvement on speed here: real 1m5.191s

You were right about the revisions too. With your patch I'm getting 4x more revisions reported than with stock 2.0 or 1.18 (*almost* all the revisions, 2 short, oddly).

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Darren Worrall wrote:
> Certainly an improvement on speed here: real 1m5.191s
>
> You were right about the revisions too. With your patch I'm getting 4x
> more revisions reported than with stock 2.0 or 1.18 (*almost* all the
> revisions, 2 short, oddly).
>

Yeah, there was a bug in there. You should try the current version,
which has been updated a bit. You want revno 4677.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkq76ucACgkQJdeBCYSNAAOqYgCfdz5DrzWBfw/1z8xxkYkMmL2L
SyEAn3dQaRnP8iaqE9uUcQ/XwJ/W1rzn
=48Xp
-----END PGP SIGNATURE-----

Revision history for this message
Craig Hewetson (craighewetson-deactivatedaccount) wrote :

The latest version has a slight decrease in performance (from 1 minute to 3
minutes)
I suppose its because it is working getting all the revisions this time.

Revision history for this message
Darren Worrall (dazworrall) wrote :

Fwiw it does report the correct revisions now vs 1.18/2.0

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Craig Hewetson wrote:
> The latest version has a slight decrease in performance (from 1 minute to 3
> minutes)
> I suppose its because it is working getting all the revisions this time.
>

So revno 4677 works by pulling out the subset of files underneath 'dir'
for each revision, and then comparing them to see if anything has
changed. This means that 'bzr log BIGDIR' is slower than 'bzr log
SMALLDIR'. (in testing bzrlib vs tools, it was 5m30s vs 36s, which is a
lot.)

I've got another branch going:
  lp:///~jameinel/bzr/2.0.1-faster-get-deltas-bug374730

Which changes things at a different level. So that instead of extracting
the subset of files, it computes the changes for each revision, and then
deltas those.

In the 'ideal' case, you would take the intersection of both, so while
you are culling wrt changes you also cull wrt the subset of files you
care about. I'm not 100% sure of the benefit, just because of the
specifics of the data storage.

At the moment, the new method is around 50s for tools (slower), but
bzrlib is 2m10s (faster).

We're still a ways off of 'ideal', but hopefully I'll be able to poke
around a bit more.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkq8ynwACgkQJdeBCYSNAAN2ZQCeM/jlluj5RBQ0z+UFX5wSF5ga
4RMAoIk844C60bIMa38xbLhbI98abTYU
=dg1w
-----END PGP SIGNATURE-----

Revision history for this message
Darren Worrall (dazworrall) wrote :

Sounds good :)

Given you're still experimenting, should this really be fix committed?

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Darren Worrall wrote:
> Sounds good :)
>
> Given you're still experimenting, should this really be fix committed?
>

Fix 'phase 1' has been submitted for review and approved (I'll probably
send it for merging later today). That doesn't mean you can't do more
work on something.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkq84GoACgkQJdeBCYSNAAMC3gCdEGn+8vsUbp4MrXQJhry+G/EG
iNwAoNkbU1Y28A5zhibXZpNJIiTpHca/
=HnWI
-----END PGP SIGNATURE-----

Revision history for this message
Craig Hewetson (craighewetson-deactivatedaccount) wrote :

John,

Are going to you create another bug for phase 2? So that it can serve as a reminder for you and something we can subscribe to.
And, has this fix been released? If yes, in 2.0.x?

Revision history for this message
John A Meinel (jameinel) wrote :

I went ahead and spun off bug #503071 to detail future work that could be done to speed up 'bzr log DIR'.

This was landed in 2.0.1 and 2.1.0b1, but it isn't really worth re-opening those milestones.

Changed in bzr:
milestone: none → 2.1.0
status: Fix Committed → Fix Released
Revision history for this message
John A Meinel (jameinel) wrote :

Technically 2.0.1, but close enough.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.