Comment 26 for bug 582157

Andrew Bennetts (spiv) wrote: Re: [Bug 582157] Re: pull from launchpad is slow

Martin Pool wrote:
> That's certainly much better. More could be done but perhaps this
> specific bug should be closed, and we can identify some others to do
> next.
>
> A few questions:
>
> * how close is 170kB/s to saturating your local link?

Not very. IIRC, in theory I ought to be able to get something like 10x
that, but when the data is coming from the other side of the world I
never expect to reach it.

> * are there any obviously inefficient hpss calls (there probably are
> too many graph calls, and that would be clearly a different bug)

HPSS calls: 16 (1 vfs)

The sequence is, in summary:

  get(/~launchpad-pqm/launchpad/devel) -> ReadError
  BzrDir.open_2.1(~launchpad-pqm/launchpad/devel/)
  BzrDir.open_branchV3(~launchpad-pqm/launchpad/devel/)
  BzrDir.find_repositoryV3(~launchpad-pqm/launchpad/devel/)
  Branch.get_stacked_on_url(~launchpad-pqm/launchpad/devel/) -> NotStacked
  Branch.last_revision_info(~launchpad-pqm/launchpad/devel/)
  Repository.get_rev_id_for_revno(
    ~launchpad-pqm/launchpad/devel/, 10876,
    (11636, '<email address hidden>'))
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_stream_1.19(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Branch.get_tags_bytes(...)
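
For what it's worth, a quick way to get that kind of summary is to
tally the verbs in a -Dhpss trace. A minimal sketch in Python; the
exact log format is assumed here, so the regex may need adjusting for
what .bzr.log actually emits:

  import re
  import sys
  from collections import Counter

  # Matches smart-protocol verbs such as "Repository.get_parent_map("
  # or "BzrDir.open_branchV3(" in a -Dhpss style trace.
  CALL_RE = re.compile(r'\b([A-Za-z_][\w.]*)\(')

  def tally_calls(lines):
      """Count how many times each HPSS verb appears in the trace."""
      counts = Counter()
      for line in lines:
          match = CALL_RE.search(line)
          if match:
              counts[match.group(1)] += 1
      return counts

  if __name__ == '__main__':
      counts = tally_calls(sys.stdin)
      for verb, n in counts.most_common():
          print('%4d  %s' % (n, verb))
      print('total: %d calls' % sum(counts.values()))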

The two post-get_stream get_parent_map calls look a bit funny to me,
maybe related to a missing keys check?

Considering the number of revisions involved, 7 get_parent_map calls
doesn't seem excessive, though.
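
To illustrate why a handful of round trips is expected: the client
walks the revision graph a frontier at a time, asking the server for
the parents of each batch of revisions it hasn't seen yet, so the
number of get_parent_map calls grows with the graph depth and batch
count rather than the raw revision count. A rough sketch of that shape
(this is not bzrlib's actual searcher; the batch size and the
get_parent_map signature are just illustrative):

  def find_missing_revisions(get_parent_map, remote_tips, have,
                             batch_size=512):
      """Walk the remote graph from its tips, one get_parent_map round
      trip per batch, collecting revisions the local repo lacks.

      get_parent_map(keys) is assumed to return a dict mapping each
      requested revision id to a tuple of its parent ids.
      """
      have = set(have)
      missing = set()
      frontier = set(remote_tips) - have
      round_trips = 0
      while frontier:
          batch = set(list(frontier)[:batch_size])
          frontier -= batch
          parent_map = get_parent_map(batch)  # one round trip
          round_trips += 1
          missing |= batch
          for parents in parent_map.values():
              for parent in parents:
                  if parent not in missing and parent not in have:
                      frontier.add(parent)
      return missing, round_trips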

There's in principle some improvement we could make to the number of
calls involved in opening the bzrdir/branch/repo, for example by
including the get_stacked_on_url and last_revision_info results in the
open_branch response, but there's a bit of friction in the bzrlib APIs
in doing all of that at once. Adding (and using) something like
"open_read_locked_branch(url)" to the API might be a fairly simple way
to reduce that friction?
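
To be concrete about the shape I mean (the name is hypothetical and
this is only the trivial client-side composition; the interesting part
would be teaching the smart client/server to answer it in one or two
round trips instead of the half dozen above):

  from bzrlib.branch import Branch

  def open_read_locked_branch(url):
      """Hypothetical helper: open a branch and read-lock it in one step.

      As written this just composes the existing calls, so it still
      issues the same HPSS requests; the point of having it in the API
      proper would be a single place where open, lock_read,
      get_stacked_on_url and last_revision_info could be combined.
      """
      branch = Branch.open(url)
      branch.lock_read()
      return branch

  # Callers would then do:
  #   branch = open_read_locked_branch(url)
  #   try:
  #       ...  # use the read-locked branch
  #   finally:
  #       branch.unlock()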

> * is 11.5MB a reasonable size for the amount of actual change in that
> repo from r10721 to r10876?

Good question. I'm not sure of a convenient way to measure that
directly, but that fetch did add 1968 new revisions, so 11.5MB works
out to something like <6kB per revision, which sounds pretty good!

Alternatively, comparing 'bzr pack'ed copies of the repo from before
and after the fetch, the size difference of .bzr/repository is 2.7M.
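
For reference, that comparison is just the total size of the files
under .bzr/repository in packed copies taken before and after the
fetch; something like the following sketch, where the paths are only
examples:

  import os

  def tree_size(path):
      """Total size in bytes of all regular files under 'path'."""
      total = 0
      for dirpath, dirnames, filenames in os.walk(path):
          for name in filenames:
              total += os.path.getsize(os.path.join(dirpath, name))
      return total

  before = tree_size('before-pull/.bzr/repository')
  after = tree_size('after-pull/.bzr/repository')
  print('difference: %.1f MB' % ((after - before) / 1e6))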

So, I guess the answer here is: not bad, but about 4x larger than
optimal for our current compression. Of course, achieving optimal
compression is pretty expensive, and we'll inevitably have some
overhead for discovering where we differ from the remote repository.
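
Spelling out the arithmetic behind those two figures:

  fetched_bytes = 11.5e6   # bytes sent over the wire for the fetch
  new_revisions = 1968
  packed_growth = 2.7e6    # growth of a packed .bzr/repository

  print('%.1f kB per revision' % (fetched_bytes / new_revisions / 1000))
  print('%.1fx the fully-packed delta' % (fetched_bytes / packed_growth))

which gives roughly 5.8 kB per revision and the "about 4x" above.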

[...]
> Taking 28.7s before starting to send bulk data is a bit high.

It is, although 10s of that run was spent waiting for the SSH
connection to be established (possibly including DNS lookups as well
as the SSH handshake itself; a second run only took 7.5s to establish
an SSH connection), and it's hard to do much about that.
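
For completeness, a quick way to see how much of that setup time is
DNS plus TCP connect (as opposed to the SSH key exchange and
authentication, which this sketch deliberately stops short of) is to
time the pieces separately; the host name here is just the obvious
example:

  import socket
  import time

  HOST, PORT = 'bazaar.launchpad.net', 22

  start = time.time()
  socket.getaddrinfo(HOST, PORT, 0, socket.SOCK_STREAM)
  dns_done = time.time()

  sock = socket.create_connection((HOST, PORT), timeout=30)
  connected = time.time()

  # The server sends its version banner before any crypto happens.
  banner = sock.recv(256)
  sock.close()

  print('DNS lookup:  %.2fs' % (dns_done - start))
  print('TCP connect: %.2fs' % (connected - dns_done))
  print('SSH banner:  %r' % banner)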