Comment 36 for bug 405251

John A Meinel (jameinel) wrote: Re: [Bug 405251] Re: Huge data transfers/bad performance OVERALL

Frits Jalvingh wrote:
> Pull and push performance is still abysmal. I'm currently waiting for a trivial push containing a single small commit (the repo is one commit behind). It has transferred 15MB and is now pausing for no apparent reason; I've already been waiting for 15 minutes. The amount of data changed is very, very small and nowhere near the 15MB transferred.
> Yesterday I attempted a pull of a week's worth of commits; it took 4 hours and transferred some 300MB of data - the evening was a total loss. My colleagues work hard, but *not* that hard 8-/.
>
...

> so essentially it seems to check if data is arriving and it's doing
> nothing at all. I checked the target directory and it is locked by that
> smart server process; and it was locked by my push (I checked that
> beforehand). It seems like a nice deadly embrace: the server is waiting
> for data from the client and vice versa.
>
> I am ready to start crying now.
>

There should be no case where we take out a write lock on the source
repository in order to read data.

I really don't know why the remote server would be trying to take out a
lock on your local machine. I'll try to investigate the callgrind
profile, but I can tell you already that 91% of the time is spent in
"UNTIL_NO_EINTR", which is just waiting for data on a socket.

Subtracting that out, we end up with about 1.8M ticks in
"until_no_eintr", which leaves ~94k ticks actually doing processing.
I *think* a tick is 1ms, so that is 94s, or 1m34s, of actual work.
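
To make that split concrete, here is a back-of-the-envelope conversion
of those numbers (the 1ms-per-tick figure is my guess, as noted above):

    # Rough conversion of the callgrind figures quoted above.
    # ASSUMPTION: one callgrind tick is ~1 millisecond.
    TICK_SECONDS = 0.001

    waiting_ticks = 1800000  # ticks inside until_no_eintr (socket wait)
    working_ticks = 94000    # ticks left over for actual processing

    print("waiting: ~%d minutes" % (waiting_ticks * TICK_SECONDS / 60))
    print("working: ~%d seconds (about 1m34s)" % (working_ticks * TICK_SECONDS))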

Now, *what* is it doing during that time? It is possible that your push
is triggering an autopack on the server. And because of the size of your
repository relative to the memory available on the server (and the
earlier caching concern), the server is basically thrashing while
repacking the repository.

One way around that is to log in to the server and manually issue
"bzr pack". That forces a full repack immediately, and should decrease
the frequency with which you see future autopacks. With 14k revisions,
the next major repack shouldn't occur until after 20k revisions.
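
For the curious, that threshold falls where it does because, as I
remember bzrlib's heuristic, autopack keeps the number of pack files at
or below the sum of the decimal digits of the total revision count, and
the big consolidations happen when that sum collapses. A sketch (my
reconstruction from memory, not the exact bzrlib code):

    # Sketch of bzr's autopack ceiling, as I recall it: the repository
    # is allowed at most sum-of-decimal-digits(total_revisions) packs.
    def max_pack_count(total_revisions):
        return sum(int(d) for d in str(total_revisions))

    # Crossing 19999 -> 20000 revisions drops the ceiling from 37 to 2,
    # which is what forces the next "major" repack after 20k revisions.
    for revs in (9999, 10000, 14000, 19999, 20000):
        print(revs, max_pack_count(revs))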

About the 785k entries I mentioned earlier: that is the number of
unique "content texts" present in your repository.

So say you had 100k files, and each one was modified (and committed) 7
times. That would be 700k entries ("text keys" is our internal term for
them).

You mentioned that this repository:
1) Used to have a lot more than ~10k files.
2) Had multiple independent ancestries pulled together.
3) Has about 10-20k revisions.

I'm not positive about the specifics, but if you have that many text
keys and only 10k revisions, that means you average almost 80 files
changing for every commit, which is generally quite surprising. Almost
universally across projects the average is <10, and quite often it is
down around 5 or so. (This tends to happen because humans don't usually
touch 80 files in *every* commit.)

I could be tremendously wrong about how many revisions you have, but if
the 'head -n5' result you posted is accurate, I'm only seeing 3+6789
revisions, which would imply about 115 changes per commit.
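
Those averages are just the text-key count divided by the revision
count; for reference (treating the "3+6789" figure as ~6,792 revisions):

    # Average files changed per commit = unique text keys / revisions.
    text_keys = 785000

    for revisions in (10000, 3 + 6789):
        print("%5d revisions -> ~%d texts changed per commit"
              % (revisions, text_keys // revisions))
    # 10000 revisions -> ~78; 6792 revisions -> ~115.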

If some of this bloat is due to files that you have since removed from
the tree, you might consider using a tool like "filter-branch" to
generate a new history with those extra files removed from the
ancestry.

John
=:->