import_package re-downloads files multiple times
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Distributed Development | Fix Released | Medium | John A Meinel |
Bug Description
I'm trying to run 'import_package' locally, and it seems to try to be smart about branch history by using shared repositories, etc.
However, while attempting a local import of "gnome-panel", I see >700MB of data transferred. (This is backed up by running -Dhttp and grepping the Content-Length: sections.)
When I look at the log of files downloaded, I see:
3765.689 fetching https+urllib:
3795.972 fetching https+urllib:
3824.341 fetching https+urllib:
3875.749 fetching https+urllib:
3900.679 fetching https+urllib:
3924.091 fetching https+urllib:
3958.858 fetching https+urllib:
Looking at it, this is because there are potentially many Ubuntu packages based on the same orig.tar.gz. The importer doesn't seem to care that a file with the same name already exists locally; it overwrites it each time.
This is also being run from within a single process, so it would likely be able to say "I just downloaded that, trust that it is accurate".
Now the full url is different, so maybe we can't trust it?
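Within a single run, the "I just downloaded that" idea could be a small in-process cache keyed by the file's basename, so the same orig.tar.gz fetched under different URLs is only downloaded once. A minimal sketch (the function name and structure are hypothetical, not import_package's actual code):

```python
import os
import urllib.request

# Files already fetched during this run, keyed by basename.
_fetched = {}

def fetch_once(url, dest_dir, downloader=urllib.request.urlretrieve):
    """Download url into dest_dir, skipping basenames fetched this run."""
    name = os.path.basename(url)
    dest = os.path.join(dest_dir, name)
    if name in _fetched:
        return _fetched[name]
    downloader(url, dest)
    _fetched[name] = dest
    return dest
```

Note this trusts that the same basename means the same content, which is exactly the assumption questioned below for cross-distribution imports.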
Related branches
- Ubuntu Distributed Development Developers: Pending requested
  Diff: 103 lines (+49/-13), 1 file modified: import_package.py (+49/-13)
Changed in udd:
status: In Progress → Fix Released
On Thu, 18 Feb 2010 23:13:53 -0000, John A Meinel <email address hidden> wrote:
> If you look at it, this is because there are potentially many Ubuntu
> packages based on the same orig.tar.gz. It doesn't seem to care that the
> file exists locally with the same name. (It would overwrite it each
> time.)
Oops.
> This is also being run from within a single process, so it would likely
> be able to say "I just downloaded that, trust that it is accurate".
>
> Now the full url is different, so maybe we can't trust it?
We can as long as we don't trust it to be the same from different
distributions.
A cache based on hashes would be perfectly safe, but a little more work.
Without looking, I don't know whether the cross-distribution requirement
means the hash-based cache would be just as easy to do.
Thanks,
James
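The hash-based cache James suggests could look roughly like this: reuse a local file only if its checksum matches the one the package metadata advertises (e.g. a .dsc file's checksum fields), and re-download otherwise. This is a hypothetical sketch, not udd's eventual fix; the function and parameter names are invented:

```python
import hashlib
import os

def cached_fetch(url, dest, expected_sha256, downloader):
    """Reuse dest if its SHA-256 matches expected_sha256, else (re)download.

    expected_sha256 would come from package metadata; downloader is any
    callable that writes url's contents to dest.
    """
    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    if os.path.exists(dest) and digest(dest) == expected_sha256:
        return dest  # safe to reuse, even across distributions
    downloader(url, dest)
    if digest(dest) != expected_sha256:
        raise ValueError("checksum mismatch for %s" % url)
    return dest
```

Because the cache key is the content hash rather than the filename or URL, it sidesteps the cross-distribution concern: two distributions publishing different tarballs under the same name simply miss the cache.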