no way to download mbox of mailing list archives

Bug #588411 reported by Michael Gliwinski
This bug affects 4 people
Affects: Launchpad itself
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Please provide the ability to download Launchpad team mailing list archives in mbox/maildir format (as is available for lists on lists.ubuntu.com via https://lists.ubuntu.com/archives/).

--

As a workaround, if you have a bona fide reason to need the archives, please ask in <https://answers.launchpad.net/launchpad/+addquestion>, explaining who you are and what you need.

Curtis Hovey (sinzui)
affects: launchpad → launchpad-registry
Changed in launchpad-registry:
status: New → Triaged
importance: Undecided → Low
tags: added: ml-archive-sucks
Curtis Hovey (sinzui)
tags: added: feature
Stefano Maffulli (smaffulli) wrote :

Archives are important for community managers like myself to gather statistics about the mailing lists. Tools like mlstats need to download an mbox in order to populate a database with raw data.
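
For illustration only, here is a minimal Python sketch of the kind of statistic that becomes easy once an mbox file is in hand. It is not how mlstats works internally; the function name `count_senders` and the file path are assumptions, and only the standard-library `mailbox` module is used.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def count_senders(mbox_path):
    """Return a Counter mapping sender address to number of messages.

    Hypothetical helper: the name and the choice of statistic are
    illustrative, not taken from mlstats.
    """
    counts = Counter()
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr returns ("", "") when the From header is absent.
            _, address = parseaddr(str(message.get("From", "")))
            counts[address.lower() or "(unknown)"] += 1
    finally:
        box.close()
    return counts
```

Calling `count_senders("team-list.mbox").most_common(10)` would then list the ten most active posters, assuming such a file has been obtained.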

Martin Pool (mbp)
description: updated
summary: - downloading of mailing list archives
+ no way to download mbox of mailing list archives
Curtis Hovey (sinzui) wrote :

This feature is challenging to build because the mailman-mhonarc list server is separate from the launchpad webapp. Mailman polls Lp for work, or asks it for list and membership data, via XMLRPC. Extending the current architecture requires:
* A means to store a request for an archive (usually a db table)
* A new launchpad MailmanXMLRPC method that queries the archive table for work.
* A new launchpad MailmanXMLRPC method to send archives to Lp which
  immediately emails it to the requesting user.
* A new mailman XMLRPCQueueRunner method that calls the Lp work method then
  calls the other method to get the archive if the user is a team member.

This is not a good design since the user's membership and the list can change between the time of the request and when the request is serviced. I do not think any user really wants 100 MB of mbox data.
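
The request/poll/deliver flow listed above, and the timing gap it creates, can be sketched in a few lines of Python. Everything here is a hypothetical in-memory stand-in: the class and method names (`LaunchpadSide`, `get_archive_requests`, `deliver_archive`, `run_once`) are invented for illustration and are not Launchpad or Mailman APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveRequest:
    team: str
    requester: str


@dataclass
class LaunchpadSide:
    """Hypothetical stand-in for the Lp webapp and its request table."""
    members: dict                      # team name -> set of user names
    pending: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def request_archive(self, team, requester):
        # Step 1: store the request (a db table in the real design).
        self.pending.append(ArchiveRequest(team, requester))

    def get_archive_requests(self):
        # Step 2: the work method that Mailman would poll.
        work, self.pending = self.pending, []
        return work

    def deliver_archive(self, request, mbox_bytes):
        # Step 3: receive the archive and mail it on (recorded here only).
        self.sent.append((request.requester, len(mbox_bytes)))


def run_once(lp, read_mbox):
    """Step 4: stand-in for one pass of the queue runner on the list server."""
    for request in lp.get_archive_requests():
        # Membership is checked when the request is serviced, which may be
        # long after it was made; this is the gap described above.
        if request.requester in lp.members.get(request.team, set()):
            lp.deliver_archive(request, read_mbox(request.team))
```

If a user leaves the team between `request_archive` and `run_once`, the request is silently dropped; if the check were done at request time instead, a former member could still receive the archive.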

A better approach would be to get the mbox directly from the list archive server. This could involve hacking the obsolete mhonarc Perl code, but no sane person wants to do that, and no Lp engineer is willing to entertain that.

There is work in progress to finish Mailman 3 and provide a sane archiver <http://db42.wordpress.com/tag/mailman/>. The proposed archiver could be extended to service archive requests.

Robert Collins (lifeless) wrote : Re: [Bug 588411] Re: no way to download mbox of mailing list archives

I rather suspect we need a web-scale archiver: I -do not want- to be
running sqlite or whoosh at the scales of data we're looking at,
particularly with cross-list searches.

As far as this bug is concerned, I would say that yes, the mbox should
be obtained by a backend API call to get the mbox from the archive
server, and that it has nothing in common with the browsable archive &
search aspects.

Stefano Maffulli (smaffulli) wrote :

Can't you make it as simple as running a weekly cron job that splits the mbox archive and gzips it, and then expose just a URL in the Launchpad UI? That would be enough for my use case.
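
A rough sketch of what such a job could do, assuming the list server keeps one large mbox file per list and that splitting by calendar month is acceptable. The function names and the output layout (`YYYY-MM.mbox.gz`) are assumptions made for illustration, not anything Launchpad provides.

```python
import gzip
import mailbox
import os
import shutil
from email.utils import parsedate_to_datetime


def month_key(message):
    """Return "YYYY-MM" from the Date header, or "undated" if unparsable.

    The month is taken in the sender's own UTC offset.
    """
    try:
        sent = parsedate_to_datetime(str(message.get("Date", "")))
        return sent.strftime("%Y-%m")
    except (TypeError, ValueError):
        return "undated"


def split_mbox_by_month(mbox_path, out_dir):
    """Write one gzipped mbox per month into out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    source = mailbox.mbox(mbox_path, create=False)
    slices = {}  # month key -> open mailbox.mbox for that month
    try:
        for message in source:
            key = month_key(message)
            if key not in slices:
                slices[key] = mailbox.mbox(os.path.join(out_dir, key + ".mbox"))
            slices[key].add(message)
    finally:
        source.close()
        for box in slices.values():
            box.close()  # close() also flushes pending writes

    written = []
    for key in sorted(slices):
        plain = os.path.join(out_dir, key + ".mbox")
        with open(plain, "rb") as src, gzip.open(plain + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(plain)  # keep only the compressed slice
        written.append(plain + ".gz")
    return written
```

A weekly cron entry could call `split_mbox_by_month` on each list's mbox and publish the output directory; access control, which the rest of this thread is concerned with, is deliberately left out of the sketch.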

Curtis Hovey (sinzui) wrote :

We will make it simple for team admins to get the mbox or a slice of the mbox. We are solving Lp's mail archive issues with a new project called grackle (https://launchpad.net/grackle). This project will solve many issues that plague pipermail and mhonarc.

Team admins will be able to get the mbox using a URL like
    https://launchpad.net/~switch/+mailinglist-archive/.mbox

Or get a slice
    https://launchpad.net/~switch/+mailinglist-archive/.mbox?date_range=2012-01-01..2012-01-31

You may not need the mbox if your goal is to get statistics. Grackle will let you request just the headers for a date range to gather counts about messages, threads, and authors.
    https://launchpad.net/~switch/+mailinglist-archive?date_range=2012-01-01..2012-01-31&headers=headers_only
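
For anyone scripting against these URLs once they exist, a small Python sketch of how they could be assembled follows. It assumes the URL shapes shown in this comment are what ships; the helper name `archive_url` and its parameters are made up for illustration.

```python
from datetime import date
from urllib.parse import urlencode

BASE = "https://launchpad.net"


def archive_url(team, start=None, end=None, headers_only=False):
    """Build a mailing list archive URL in the shape proposed above.

    Hypothetical helper: the paths and query parameters mirror the
    examples in this comment and may not match the final service.
    """
    path = "/~%s/+mailinglist-archive" % team
    if not headers_only:
        path += "/.mbox"  # the full mbox lives under the archive path

    query = {}
    if start is not None and end is not None:
        query["date_range"] = "%s..%s" % (start.isoformat(), end.isoformat())
    if headers_only:
        query["headers"] = "headers_only"

    url = BASE + path
    if query:
        url += "?" + urlencode(query)
    return url
```

For example, `archive_url("switch", date(2012, 1, 1), date(2012, 1, 31))` produces the slice URL shown above, and passing `headers_only=True` produces the headers-only form.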
