Comment 4 for bug 524123

James Westby (james-w) wrote : Re: [Bug 524123] Re: import_package re-downloads files multiple times

On Fri, 19 Feb 2010 02:59:17 -0000, John A Meinel <email address hidden> wrote:
> So if we can trust it by distribution, I could just do:
>
> === modified file 'import_package.py'
> --- import_package.py 2010-02-18 20:26:19 +0000
> +++ import_package.py 2010-02-19 02:54:41 +0000
> @@ -558,7 +558,7 @@
> extract_upstream_branch(update_db, upstream_dir)
> dl_dir = tempfile.mkdtemp()
> try:
> - local_dsc_path = dget(importp.get_url(), temp_dir,
> + local_dsc_path = dget(importp.get_url(), temp_dir + distro,
> possible_transports=possible_transports)
> update_db.import_package(local_dsc_path,
> use_time_from_changelog=True)
>
>
> And then change 'dget' so that if the file already exists in that directory, it skips the download.

That would work fine.
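
For illustration, the "skip the download if the file is already there" change could look roughly like the sketch below. This is not the real dget from import_package.py: fetch_if_missing, the download callable and the ".part" suffix are all made-up names, and it assumes the file name is the last component of the URL path.

```python
import os
import posixpath
from urllib.parse import urlsplit


def fetch_if_missing(url, target_dir, download):
    """Return the local path for url, downloading only if it is absent.

    `download` is a hypothetical callable taking (url, local_path); it
    stands in for whatever transport code dget really uses.
    """
    os.makedirs(target_dir, exist_ok=True)
    filename = posixpath.basename(urlsplit(url).path)
    local_path = os.path.join(target_dir, filename)
    if os.path.exists(local_path):
        # Already fetched by an earlier import into this directory.
        return local_path
    # Download under a temporary name so an interrupted transfer is
    # never mistaken for a complete file on the next run.
    partial_path = local_path + ".part"
    download(url, partial_path)
    os.replace(partial_path, local_path)
    return local_path
```

Note that this trusts the file name alone, which is why the directory has to be split by distribution for it to be safe.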

> Alternatively, we could always download the .dsc file (it's pretty small and likely different each time), and just check the hash of the file on disk versus the requested file. That is likely to be pretty easy, given that the .dsc already includes the hash.
> It means re-reading the file on disk (unless we cache that), but that is better than downloading and *writing* that file again.
>
> It wouldn't require changing anything on disk that way, just reading an
> already present file.

This is the more elegant solution though. It's how things are supposed
to work :-)
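
A minimal sketch of that check, assuming the .dsc carries a Checksums-Sha256 field with one " <sha256> <size> <filename>" line per file, and that the text passed in is the plain field body. The function names here are hypothetical; real code would probably want a proper parser such as python-debian's deb822 module rather than this hand-rolled loop.

```python
import hashlib
import os


def parse_sha256_entries(dsc_text):
    """Map filename -> (sha256 hex digest, size) from Checksums-Sha256."""
    entries = {}
    in_field = False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            if line[:1] not in (" ", "\t"):
                in_field = False  # a new field (or blank line) ends it
                continue
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
    return entries


def file_matches(path, digest, size):
    """True if path exists and has the expected size and SHA-256."""
    if not os.path.isfile(path) or os.path.getsize(path) != size:
        return False  # cheap checks first, no need to read the file
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == digest


def files_to_download(dsc_text, directory):
    """Names listed in the .dsc that are missing or differ on disk."""
    return [
        name
        for name, (digest, size) in parse_sha256_entries(dsc_text).items()
        if not file_matches(os.path.join(directory, name), digest, size)
    ]
```

Only the names returned by files_to_download would then need fetching; everything else is read once and left alone.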

I would be happy to see either, really, to save the headache that you
talk about.

Thanks,

James