HostedFiles: Need way to get attachment filenames

Bug #281476 reported by Bryce Harrington
Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Triaged  High        Unassigned

Bug Description

bug_attachment objects have a title field, which shows the description (if any) that the user typed in when creating the attachment; however, there doesn't appear to be a way to obtain the original filename.

For example, among the attachments for bug 279026, the first one is Xorg.0.log.old, but it has been given the title "Xorg log (Crash)".

The titles given to attachments tend to vary widely, since users can type in pretty much whatever they want. The original uploaded filename tends to be a bit more reliable since they (usually) don't rename the files before uploading.

(The principal motivation for having this is so I can evaluate New bugs with attachments to see whether the reporter has attached the required Xorg.0.log[.old] file and, if not, ask them to do so. But I'm sure any tool that needs to work with attachments would benefit from getting the original filename.)
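Here is a minimal sketch of that triage check. It assumes the original filenames are already available as strings, which is exactly what this bug asks for. The helper name `has_xorg_log` is hypothetical. The check matches the whole filename, so a free-form title such as "Xorg log (Crash)" would never match.

```python
import re

# Matches exactly "Xorg.0.log" or "Xorg.0.log.old".
XORG_LOG = re.compile(r"Xorg\.0\.log(\.old)?")


def has_xorg_log(filenames):
    """Hypothetical helper: True if any original filename is an Xorg log.

    `filenames` is any iterable of original upload names (not titles).
    """
    return any(XORG_LOG.fullmatch(name) for name in filenames)


print(has_xorg_log(["Xorg.0.log.old", "dmesg.txt"]))  # True
```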

Gavin Panella (allenap) wrote:

Attached is a workaround. The downside (as noted in the code too) is that the whole attachment must be downloaded to discover the filename, so this bug still ought to be addressed.
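The attached workaround is not reproduced here. The sketch below shows the same idea. It assumes launchpadlib's `attachment.data.open()` returns a buffer with a `filename` attribute, using the same calls that appear later in this thread. `attachment_filenames` is a hypothetical helper name. The cost is one full download per attachment.

```python
def attachment_filenames(attachments):
    """Return the original upload filename of each bug attachment.

    Workaround sketch: the filename is only available on the buffer returned
    by HostedFile.open(), and open() fetches the file content, so this costs
    one full download per attachment.
    """
    names = []
    for attachment in attachments:
        fd = attachment.data.open()  # downloads the attachment content
        try:
            names.append(fd.filename)
        finally:
            fd.close()
    return names


# Example usage (needs network access and launchpadlib installed):
#   from launchpadlib.launchpad import Launchpad
#   lp = Launchpad.login_anonymously("attachment-filenames", "production")
#   print(attachment_filenames(lp.bugs[279026].attachments))
```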

Gavin Panella (allenap) wrote :

This is really a bug in Launchpad Bugs, so I'm reassigning it there.

Changed in launchpadlib:
status: New → Confirmed
Bryce Harrington (bryce) wrote :

Thanks, that workaround actually works quite well in practice. I was worried it might be slow to download the files, but it's actually not that bad at all.

It would still be nice to have a formal API call, but I consider this issue solved for my purposes. Thanks!

Leonard Richardson (leonardr) wrote :

When you GET an attachment URL in the web service you're redirected to a URL on the librarian. The last path element of the librarian URL is the file's current filename.

I could create a launchpadlib method HostedFile.stat() which returns the filename and content type. This would involve two HTTP requests: a GET request on the web service to get the filename, and a HEAD request to the librarian to get the content type. We'd avoid a GET to the librarian and wouldn't download the file.
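Here is a rough sketch of the proposed `stat()`. It assumes the librarian URL has already been obtained from the web service's redirect. The helper names are hypothetical and are not part of launchpadlib. The HEAD request is passed in as a parameter so the logic can be exercised offline.

```python
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_librarian_url(url):
    """Return the last path element of a librarian URL (the file's current name)."""
    return unquote(urlsplit(url).path.rsplit("/", 1)[-1])


def head_content_type(url):
    """Send a HEAD request and return the Content-Type header; no body is downloaded."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type")


def hosted_file_stat(librarian_url, head=head_content_type):
    """Hypothetical HostedFile.stat(): return (filename, content_type).

    `librarian_url` is the redirect target from the web service GET, so the
    only request made here is the HEAD to the librarian.
    """
    return filename_from_librarian_url(librarian_url), head(librarian_url)
```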

Is this an interesting idea?

Steve Alexander (stevea) wrote :

Surely the content mime-type of the file is available in the database too?

Gavin Panella (allenap) wrote :

> Surely the content mime-type of the file is available in the
> database too?

It's on the ILibraryFileAlias, which is also where the original
filename is stored, and so is similarly obscured by the
Bytes/HostedFile mechanism. However, unlike the filename, it's not
available on the HostedFileBuffer object returned from
HostedFile.open().

> When you GET an attachment URL in the web service you're redirected
> to a URL on the librarian. The last path element of the librarian
> URL is the file's current filename.
...
> Is this an interesting idea?

It is, but it seems hackish too. It would break the consistency of the
API for clients not using launchpadlib.

Using attachments in launchpadlib is something like:

  >>> attachment = bug.attachments[0]             # bug_attachment entry
  >>> attachment_data = attachment.data           # HostedFile
  >>> attachment_fd = attachment_data.open()      # HostedFileBuffer; fetches the file
  >>> attachment_content = attachment_fd.read()
  >>> attachment_filename = attachment_fd.filename

At first glance it would seem intuitive to me to provide
attachment.filename and attachment.mime_type, but those attributes
could legitimately appear on attachment_data or attachment_fd too,
which is a little confusing.

Now, the attachment_data step above corresponds to ILibraryFileAlias
(though it's not a regular API export of ILibraryFileAlias), so that's
where filename and mime_type should really be exposed.

The expose-attachment-filename-bug-281476 branch (diff:
<https://pastebin.canonical.com/11277/>) exposes the filename at the
attachment step because that is a regular API export and so is easy
to do; it is not necessarily the best place for it.

Leonard Richardson (leonardr) wrote :

We had an idea a while back about serving a JSON representation of a hosted file if you asked for one, and a redirect to the actual file if you didn't ask for any particular representation. Filename and content type were the kind of information that would go into the JSON representation. That would be a more consistent way to publish this data.
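Here is a minimal sketch of that content negotiation. It is written as a plain function rather than against any real Launchpad or lazr.restful handler. The JSON field names and the 303 status are assumptions. The `Accept` check is deliberately naive; a real server would parse media ranges and quality values.

```python
import json


def represent_hosted_file(accept, filename, content_type, librarian_url):
    """Return (status, headers, body) for a GET on a hosted file.

    Hypothetical: JSON metadata when the client asks for application/json,
    otherwise the existing behaviour of redirecting to the librarian.
    """
    if "application/json" in (accept or ""):  # naive Accept check
        body = json.dumps({
            "filename": filename,
            "content_type": content_type,
            "url": librarian_url,
        })
        return 200, {"Content-Type": "application/json"}, body
    return 303, {"Location": librarian_url}, ""
```

The design keeps a single URL per hosted file. Clients that want metadata opt in via `Accept`, and existing clients still get the redirect.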

Changed in malone:
importance: Undecided → Medium
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

> When you GET an attachment URL in the web service you're redirected to a URL on the librarian. The last path element of the librarian URL is the file's current filename.

Could you provide example code for doing this, that produces the librarian URL?

FWIW, while the workaround solved my original need (a non-interactive script for processing bugs), I've found that other tools do need this feature, because the workaround is indeed too slow for bugs with lots of attachments.

Bryce Harrington (bryce)
summary: - Need way to get attachment filenames
+ HostedFiles: Need way to get attachment filenames
tags: added: patch-tracking
Bryce Harrington (bryce) wrote :

Fwiw, getting the filename slows down run time quite a bit:

        # Excerpt from the per-attachment loop (the except clause is not shown);
        # each dbg_run_time() call logs the time elapsed since start_time.
        for a in attachments:
            try:
                dbg_run_time(start_time, "+ a-start")
                a_file_name = a.filename  # the slow step, per the log below
                dbg_run_time(start_time, "+ a-get-filename")
                a_title = a.title
                dbg_run_time(start_time, "+ a-get-title")
                a_owner = a.owner.display_name
                dbg_run_time(start_time, "+ a-get-owner")
                a_age = a.age
                dbg_run_time(start_time, "+ a-get-age")
                a_ispatch = a.is_patch()
                dbg_run_time(start_time, "+ a-get-is-patch")
                a_url = a.url
                dbg_run_time(start_time, "+ a-get-url")

[Wed Mar 09 21:30:02 2011] [error] [client 74.107.147.166] [56.855120] process-attachment
[Wed Mar 09 21:30:06 2011] [error] [client 74.107.147.166] [60.330161] + a-start
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.712297] + a-get-filename
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.712457] + a-get-title
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.713700] + a-get-owner
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.715955] + a-get-age
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.716996] + a-get-is-patch
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.717967] + a-get-url
[Wed Mar 09 21:30:07 2011] [error] [client 74.107.147.166] [61.718162] process-attachment
[Wed Mar 09 21:30:12 2011] [error] [client 74.107.147.166] [66.407895] + a-start
[Wed Mar 09 21:30:14 2011] [error] [client 74.107.147.166] [68.737491] + a-get-filename
[Wed Mar 09 21:30:14 2011] [error] [client 74.107.147.166] [68.737662] + a-get-title
[Wed Mar 09 21:30:14 2011] [error] [client 74.107.147.166] [68.738906] + a-get-owner
[Wed Mar 09 21:30:14 2011] [error] [client 74.107.147.166] [68.739992] + a-get-age
[Wed Mar 09 21:30:14 2011] [error] [client 74.107.147.166] [68.741006] + a-get-is-patch
[Wed Mar 09 21:30:14 2011] [error] [client 74.107.147.166] [68.741972] + a-get-url

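Retrieving the filename alone takes roughly 1.5-2.0 seconds per attachment (1.4 s and 2.3 s in the two iterations logged above), while every other attribute takes a couple of milliseconds at most. Many of the bugs I process have a couple dozen attachments (from apport), so this is a rather big performance hit.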
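The following sketch can be used to reproduce this kind of measurement. `dbg_run_time` is a hypothetical reconstruction that matches the `[elapsed] label` format in the log above; the real helper isn't shown. `time_attributes` (also hypothetical) times each attribute lookup separately, so a lazily downloaded filename stands out.

```python
import sys
import time


def dbg_run_time(start_time, label, stream=None):
    """Hypothetical: write "[seconds since start_time] label", like the log above."""
    stream = stream if stream is not None else sys.stderr
    stream.write("[%.6f] %s\n" % (time.time() - start_time, label))


def time_attributes(obj, names):
    """Return {name: seconds} for each attribute lookup on obj, timed separately."""
    timings = {}
    for name in names:
        start = time.perf_counter()
        getattr(obj, name)  # may trigger a network request for lazy attributes
        timings[name] = time.perf_counter() - start
    return timings
```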

Gavin Panella (allenap)
Changed in launchpad:
importance: Medium → High
Robert Collins (lifeless) wrote :

FWIW I think we should just include the relevant data that we have in the info for the attachment, and *not* ever issue a redirect from the appservers; instead launchpadlib can follow a url link in the attachment. This would be cleaner and more consistent IMNSHO, as well as making it possible to batch deliver info on 20 or 30 attachments at once.
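Here is a sketch of the client side of that proposal, assuming attachment entries in a batched collection carry the metadata directly. The page shape (`entries`, `next_collection_link`) follows lazr.restful collections. The per-entry `filename` and `content_type` fields are hypothetical, since adding them is exactly what is being proposed. `fetch` stands in for whatever performs the (authenticated) GET and JSON decoding. With this approach, one request per page replaces one download per attachment.

```python
def iter_pages(first_url, fetch):
    """Yield decoded collection pages, following next_collection_link.

    `fetch(url)` is a stand-in for an authenticated GET returning parsed JSON.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield page
        url = page.get("next_collection_link")


def attachment_info(pages):
    """Collect (filename, content_type, data_link) tuples from every page.

    The "filename" and "content_type" entry fields are hypothetical: they are
    the fields this proposal would add to the attachment representation.
    """
    return [
        (entry.get("filename"), entry.get("content_type"), entry.get("data_link"))
        for page in pages
        for entry in page.get("entries", [])
    ]
```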
