Misleading "Content-Encoding: gzip" header on downloads

Bug #173096 reported by Steve McMahon
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Brad Crittenden

Bug Description

When .tar.gz (or .tgz) files are downloaded, they are being sent with "Content-Encoding: gzip" headers that cause all browsers to unpack the file on download. Since the files keep their .tar.gz or .tgz filename extensions, this results in a situation that will confound many users.

Please note that this is not a MIME problem. There are no MIME headers in the server response for the download. Sending a MIME header indicating gzip would not be a problem.

IMHO, this is not the intended use of the "Content-Encoding" header.
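The effect described above can be reproduced offline. The following is a minimal sketch, not taken from any browser or from Launchpad: `make_tar_gz` and `body_as_saved` are hypothetical helpers, and `body_as_saved` only imitates a client that undoes `Content-Encoding` before writing the file to disk.

```python
import gzip
import io
import tarfile
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def make_tar_gz(name: str, data: bytes) -> bytes:
    """Build a small .tar.gz archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def body_as_saved(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Imitate a client that decodes Content-Encoding before saving."""
    if content_encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


archive = make_tar_gz("README.txt", b"hello\n")
saved_with_header = body_as_saved(archive, "gzip")   # what the user ends up with
saved_without_header = body_as_saved(archive, None)  # what the user expected

print(saved_with_header[:2] == GZIP_MAGIC)     # file is named .tar.gz but is a bare tar
print(saved_without_header[:2] == GZIP_MAGIC)  # still a real gzip stream
```

With the header present, the saved bytes are a plain tar archive under a `.tar.gz` name, which is the mismatch the report complains about.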

Tags: lp-bugs
Brad Crittenden (bac)
Changed in malone:
assignee: nobody → bradcrittenden
importance: Undecided → Critical
milestone: none → 1.1.12
status: New → Confirmed
Andrew Bennetts (spiv) wrote :

Our admins tell us that this is being done by the apache instance in front of the launchpadlibrarian.net web server. We want to keep this behaviour when serving build logs; it's extremely useful for Ubuntu developers to be able to view build logs in their web browser with zero effort.

We should be able to disable the mod_mime magic (or whatever) in apache, and explicitly set the encoding, or leave it unset, in the launchpadlibrarian.net web server, as appropriate for the file being served. For files from the download service, we should never set Content-Encoding. For package build logs, we should set it.

James Henstridge (jamesh) wrote :

We've confirmed that there is an "AddEncoding x-gzip gz tgz" directive in the configuration of the apache frontend for the librarian.

As the Ubuntu developers find it useful for the build logs to have "Content-encoding: gzip", removing it outright probably isn't the best idea. The best short term solution is probably to change it to "AddEncoding x-gzip .txt.gz", which should match the build logs but not .tar.gz downloads.

The long term solution is:
 1. store the correct Content-Encoding for library files when creating them
 2. fix up all the stored build logs to say that they are gzipped
 3. fix the librarian to send the correct content encoding
 4. fix Apache to never add a Content-Encoding header
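
Steps 1 to 3 of the list above could look roughly like the sketch below. It is not the librarian's code: `LibraryFile`, `fix_up_build_log` and `response_headers` are hypothetical names, the `buildlog_` prefix is borrowed from the filename pattern quoted later in this thread, and `text/plain` as the build log content type is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LibraryFile:
    """Hypothetical stand-in for a stored library file record."""
    filename: str
    content_type: str
    # Step 1: the encoding is recorded when the file is created.
    content_encoding: Optional[str] = None


def fix_up_build_log(record: LibraryFile) -> LibraryFile:
    """Step 2: mark already-stored build logs as gzip-encoded text."""
    name = record.filename
    if name.startswith("buildlog_") and name.endswith(".txt.gz"):
        record.content_type = "text/plain"  # assumption, see lead-in
        record.content_encoding = "gzip"
    return record


def response_headers(record: LibraryFile) -> Dict[str, str]:
    """Step 3: send exactly what was stored, and nothing else."""
    headers = {"Content-Type": record.content_type}
    if record.content_encoding:
        headers["Content-Encoding"] = record.content_encoding
    return headers


log = fix_up_build_log(
    LibraryFile("buildlog_example.txt.gz", "application/octet-stream"))
tarball = fix_up_build_log(
    LibraryFile("Plone-3.0.3.tar.gz", "application/x-gzip"))
print(response_headers(log))
print(response_headers(tarball))
```

Once the stored value is the only source of the header, step 4 (Apache adding nothing) is what keeps the two layers from disagreeing.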

Brad Crittenden (bac) wrote :

Additionally the content-type stored in the librarian may be incorrect for files linked from the productreleasefiles table. These content-types should be set based upon the file extension.
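
One way to derive a content-type from the file extension is Python's standard `mimetypes` module. This is a sketch only: `download_content_type` is a hypothetical helper, and the choice of `application/x-gzip` for gzip-compressed downloads is an assumption, not something decided in this bug.

```python
import mimetypes
from typing import Optional, Tuple


def guess(filename: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (content type, encoding) as guessed from the extension."""
    return mimetypes.guess_type(filename)


def download_content_type(filename: str) -> str:
    """Pick a type for a file that is served as-is, with no Content-Encoding."""
    content_type, encoding = guess(filename)
    if encoding == "gzip":
        # Assumption: describe the bytes actually sent (a gzip file)
        # rather than the tar archive inside it.
        return "application/x-gzip"
    return content_type or "application/octet-stream"


print(guess("Plone-3.0.3.tar.gz"))
print(download_content_type("Plone-3.0.3.tar.gz"))
```

Note that `mimetypes.guess_type` reports gzip as an encoding, separate from the type. Copying that second value straight into a `Content-Encoding` header for downloads would reproduce the behaviour reported here.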

James Henstridge (jamesh) wrote :

Brad: I don't think that's the issue for Plone files. Going through the apache frontend, I see:

    $ telnet launchpadlibrarian.net 80
    Trying 91.189.90.235...
    Connected to launchpadlibrarian.net.
    Escape character is '^]'.
    HEAD /10596639/Plone-3.0.3.tar.gz HTTP/1.1
    Host: launchpadlibrarian.net
    Accept-Encoding: gzip
    Connection: close

    HTTP/1.1 200 OK
    Date: Wed, 05 Dec 2007 03:45:34 GMT
    Server: TwistedWeb/2.4.0
    Content-length: 12425371
    Last-modified: Tue, 27 Nov 2007 23:17:52 GMT
    Content-type: 1
    Via: 1.1 launchpadlibrarian.net
    Connection: close
    Content-Encoding: gzip

    Connection closed by foreign host.

If I skip the apache front end and talk directly to the librarian, I get the following response (using an identical request):

    HTTP/1.1 200 OK
    Content-length: 12425371
    Server: TwistedWeb/2.4.0
    Last-modified: Tue, 27 Nov 2007 23:17:52 GMT
    Connection: close
    Date: Wed, 05 Dec 2007 03:45:11 GMT
    Content-type: 1

We are sending a bogus content-type (something that should be fixed), but that is not the cause of the Content-Encoding header.
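
The comparison above can be done mechanically. The sketch below parses the two responses exactly as quoted in this comment and lists the headers that appear only when going through the frontend; `parse_headers` and `added_by_frontend` are hypothetical helper names.

```python
from typing import Dict

VIA_APACHE = """\
HTTP/1.1 200 OK
Date: Wed, 05 Dec 2007 03:45:34 GMT
Server: TwistedWeb/2.4.0
Content-length: 12425371
Last-modified: Tue, 27 Nov 2007 23:17:52 GMT
Content-type: 1
Via: 1.1 launchpadlibrarian.net
Connection: close
Content-Encoding: gzip
"""

DIRECT = """\
HTTP/1.1 200 OK
Content-length: 12425371
Server: TwistedWeb/2.4.0
Last-modified: Tue, 27 Nov 2007 23:17:52 GMT
Connection: close
Date: Wed, 05 Dec 2007 03:45:11 GMT
Content-type: 1
"""


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse a raw response head into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        if not line.strip():
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def added_by_frontend(front: Dict[str, str],
                      direct: Dict[str, str]) -> Dict[str, str]:
    """Headers present in the frontend response but absent from the direct one."""
    return {k: v for k, v in front.items() if k not in direct}


print(added_by_frontend(parse_headers(VIA_APACHE), parse_headers(DIRECT)))
# → {'via': '1.1 launchpadlibrarian.net', 'content-encoding': 'gzip'}
```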

Wichert Akkerman (wichert) wrote :

Can that Content-Encoding header please be removed? It is becoming increasingly painful for our users: they are now downloading something that does not work for them since the extension does not match the file contents.

James Troup (elmo) wrote :

Unfortunately, 'AddEncoding x-gzip .txt.gz' doesn't work - I can only
assume apache is doing something like url.split(".")[-1] and matching
on that for the extension argument.

Obviously, we should do the non-immediate fix of having the librarian
send the right Content-Encoding itself, but in the meantime someone else
needs to make the call of whether we break build logs for Ubuntu but
fix tar.gz downloads for everyone.
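
The guess in the comment above can be written out as follows. This models only the guessed behaviour (matching on the last dot-separated piece), not Apache's actual implementation; `last_extension` and `would_match` are hypothetical names, and the build log filename is made up for illustration.

```python
def last_extension(url: str) -> str:
    """The piece after the final dot, as in url.split(".")[-1]."""
    return url.split(".")[-1]


def would_match(url: str, extension_arg: str) -> bool:
    """Compare the last piece of the name against an AddEncoding-style argument."""
    return last_extension(url).lower() == extension_arg.lstrip(".").lower()


# Under this model ".txt.gz" can never match: the last piece is always just "gz".
print(would_match("buildlog_example.txt.gz", ".txt.gz"))  # → False
print(would_match("buildlog_example.txt.gz", "gz"))       # → True
print(would_match("Plone-3.0.3.tar.gz", "gz"))            # → True
```

If matching really works on single dot-separated pieces, no extension argument alone can tell `.txt.gz` apart from `.tar.gz`, which is consistent with what was observed.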

Eleanor Berger (intellectronica) wrote :

I think we should disable the content-encoding. Projects offering downloads to their users are compromised by this, and the behaviour is not what people expect.

Even if we don't implement the db-backed solution straight away, we can still hardcode this in the librarian for .txt.gz files and have a fix out within a short time.
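
A hardcoded rule of that kind might be as small as the sketch below. The function name and the suffix tuple are assumptions for illustration; this is not the change that was actually made to the librarian.

```python
from typing import Optional

# Only names ending in one of these suffixes get a Content-Encoding header.
GZIP_ENCODED_SUFFIXES = (".txt.gz",)


def content_encoding_for(filename: str) -> Optional[str]:
    """Return "gzip" for gzipped text files, None for everything else."""
    if filename.lower().endswith(GZIP_ENCODED_SUFFIXES):
        return "gzip"
    return None


print(content_encoding_for("buildlog_example.txt.gz"))  # → gzip
print(content_encoding_for("Plone-3.0.3.tar.gz"))       # → None
```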

Steve McMahon (stevemcmahon) wrote :

Could you use a different virtual host name for the case where you need the Content-Encoding header?

Brad Crittenden (bac)
Changed in malone:
status: Confirmed → In Progress
James Henstridge (jamesh) wrote :

James: here's an alternative that seems to do the trick for serving static files in my local tests:

    <Files buildlog_*.txt.gz>
      AddEncoding x-gzip gz
    </Files>

This applies the AddEncoding directive only to files matching the given filename pattern. It didn't affect any .tar.gz files in my tests. Could you check to see if it works for responses proxied from the librarian?
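
To sanity-check which names the pattern selects, Python's `fnmatch` can serve as a rough stand-in for the wildcard matching. This is an approximation under the assumption that `*` behaves the same way in both; it is not Apache's matcher, and the build log filename below is invented.

```python
from fnmatch import fnmatchcase

PATTERN = "buildlog_*.txt.gz"  # the pattern from the <Files> section above


def gets_gzip_encoding(filename: str) -> bool:
    """True if the name falls inside the pattern, so AddEncoding would apply."""
    return fnmatchcase(filename, PATTERN)


for name in (
    "buildlog_ubuntu-hardy-i386.example_1.0-1_FULLYBUILT.txt.gz",  # hypothetical
    "Plone-3.0.3.tar.gz",
    "notes.txt.gz",
):
    print(name, gets_gzip_encoding(name))
```

Only names that both start with `buildlog_` and end with `.txt.gz` are selected, so an unrelated `.txt.gz` upload would also be left alone.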

Brad Crittenden (bac) wrote :

Changes to the librarian to set the encoding for ".txt.gz" files have been committed to RF 5309.

We will try the Apache config suggested above by itself and then request a CP for the above branch if it does not work. If the Apache-only solution works, the changes to the librarian should be backed out.

Brad Crittenden (bac) wrote :

The Apache configuration has been modified and the problem is fixed. The librarian changes will remain as there is no need to back them out at this point. A better solution is proposed in bug 174204.

Changed in malone:
status: In Progress → Fix Committed
Steve McMahon (stevemcmahon) wrote :

Confirmed fixed. Thank you very much!

Brad Crittenden (bac)
Changed in malone:
status: Fix Committed → Fix Released