Integration of Mailman and htdig for archiving

Bug #266554 reported by Ppsys
Affects: GNU Mailman
Status: Confirmed
Importance: Low
Assigned to: Barry Warsaw

Bug Description

This patch applies to the Mailman 2.0.6 release with
search enhancement patch 444879 already installed. If
your Defaults.py has ARCHIVE_INDEXING_ENABLE and
ARCHIVE_INDEXING_DISABLE in it then you've got that patch.
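
A quick way to run that check is sketched below. It only looks for the two
names mentioned above; the function names and the example path are
hypothetical, so adjust them to your own installation.

```python
import re

# The two settings that patch 444879 adds to Defaults.py (names from the text above).
MARKERS = ("ARCHIVE_INDEXING_ENABLE", "ARCHIVE_INDEXING_DISABLE")


def has_search_patch(defaults_text):
    """Return True if both marker names appear as whole words in the text."""
    return all(
        re.search(r"\b%s\b" % re.escape(name), defaults_text) is not None
        for name in MARKERS
    )


def check_file(path):
    """Read a Defaults.py file and report whether both markers are present."""
    with open(path) as fh:
        return has_search_patch(fh.read())


# Example (hypothetical path, relative to the unpacked Mailman source tree):
# print(check_file("Mailman/Defaults.py"))
```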

It replaces earlier patches 401670 and 402423 and is
mainly to correct some problems arising from fixes
introduced into Mailman by bug fix releases since the
402423 patch.

This patch integrates htdig with Mailman and provides:

1. per list search facility with a search form on the
list's TOC page.

2. maintenance of privacy of private archives which
requires the user to establish their credentials via
the normal private archive access before any access
via htdig is allowed.

3. a common base URL for both public and private
archive access via htsearch results so that htdig
indices are unaffected by changing an archive from
private to public and vice versa. All access to
archives via htdig is controlled by a new wrapped
cgi-bin script called htdig.py.

4. a new cron activated script and extra crontab entry
which runs htdig regularly to maintain the per list
search indices.

5. automatic creation, deletion and maintenance of
htdig configuration files and such. Beyond installing
htdig and telling Mailman where it is via mm_cfg, you
do not have to do any other setup. Well, not quite: you
do have to set up a single per-installation symlink to
allow htdig to find the automatically generated per
list htdig configuration files.

You probably want to run this patch as follows:

cd <mailman 2.0.6 untarred and unzipped directory>
patch -p1 < <this patch file>

[http://sourceforge.net/tracker/index.php?func=detail&aid=444884&group_id=103&atid=300103]

Revision history for this message
Ppsys (ppsys) wrote :

The htdig-2.0.6-03.patch version of the patch makes some
previously hard-coded things configurable and enhances the
capability to run the htdig searches and indexing on a
different machine to the one delivering Mailman and
Mailman's web UI.

Revision history for this message
Ppsys (ppsys) wrote :

This patch should also apply without problems to MM 2.0.7

Revision history for this message
Ppsys (ppsys) wrote :

This patch should also apply without problems to MM 2.0.8

Revision history for this message
Ppsys (ppsys) wrote :

The htdig-2.0.8-0.1.patch version of the patch resolves a problem that
can arise with htdig indexing if the web_page_url for a list uses
something other than http addressing (some folks want to use https).
While specified as for MM 2.0.8, the revised patch should work OK with
2.0.7, 2.0.6 and probably back as far as 2.0.3. If you do not need
anything other than http addressing in your lists' web_page_urls, it
probably isn't worth the trouble of upgrading to this patch level.
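
To illustrate the kind of problem involved (this is not the patch's actual
code), here is a sketch that takes the scheme from a list's web_page_url
rather than assuming http. The helper name and the example host are
hypothetical; the /mailman/htdig path is the one used for search result
access in this patch.

```python
from urllib.parse import urlsplit


def htdig_base_url(web_page_url, listname):
    """Build a per-list base URL that keeps the scheme of web_page_url.

    Hypothetical helper for illustration only: falls back to http when
    the configured URL carries no scheme.
    """
    parts = urlsplit(web_page_url)
    scheme = parts.scheme or "http"
    return "%s://%s/mailman/htdig/%s/" % (scheme, parts.netloc, listname)


# Example with a made-up host name:
# htdig_base_url("https://lists.example.com/mailman/", "mylist")
```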

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1a3-0.1.patch is a revised version of the patch that is
compatible with the code published in mailman-2.1a3.tgz on sourceforge.

The only known deficiency is that the non-English versions of files
under $build/templates still contain text in English and need
translations I cannot do. Also the necessary pygettext activity and
subsequent translations in files under $build/messages remain to be
done.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1cvs-20011217.patch is a revised version of the
patch that is compatible with the code published in mailman
CVS on sourceforge as of 11:50 GMT 17 Dec 2001

The only known deficiency is that the non-English versions
of files under $build/templates still contain text in
English and need translations I cannot do. Also the
necessary pygettext activity and subsequent translations in
files under $build/messages remain to be done.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1cvs-20020306.patch is a revised version of the patch that is
compatible with the code published in mailman CVS on sourceforge as of
12:30 GMT 6 Mar 2002

The known deficiency is that the non-English versions of files under
$build/templates still contain text in English and need translations I
cannot do. Also the necessary pygettext activity and subsequent
translations in files under $build/messages remain to be done.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.9-0.1.patch is a revised version of the patch
that is compatible with MM 2.0.9

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.10-0.1.patch is a revised version of the patch
that is compatible with MM 2.0.10

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.11-0.1.patch is a revised version of the patch that
is compatible with MM 2.0.11

This version removes an incompatibility with Python 2.2
which caused warning messages to be generated when any
of the cron/nightly_htdig family of scripts were run.

Some guidance on file access permissions for some htdig
database files needed by rundig has been added to the
installation notes.

Revision history for this message
Ppsys (ppsys) wrote : Patch revised for MM 2.0.11 compatibility + minor Python 2.2 related fix


Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.12-0.1.patch is a revised version of the patch that
applies without complaint to MM 2.0.12.

It also adds a facility for adding site-wide htdig configuration
attributes to all list-specific htdig configuration files.

Revision history for this message
Ppsys (ppsys) wrote :

Do not use htdig-2.0.12-0.1.patch; there is an error in it.
Use htdig-2.0.12-0.2.patch instead.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.13-0.1.patch is purely cosmetic, so that the patch
applies without complaint to MM 2.0.13

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b2-0.1.patch is a revised version of the patch
that is compatible with MM 2.1b2

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.13-0.2.patch just adds a GPL notice to the patch

Revision history for this message
Barry Warsaw (barry) wrote :

I've sent Richard some comments off-line about this patch.

Meta comments: the 2.0.x patches can't be officially
supported, but I'm going to create an unofficial patches
page off the wiki for where the 2.0 patches can be migrated.

I think this patch set is too big for MM2.1, but if it's
cleaned up as per my private message, let's re-evaluate it
for MM2.2 (or 3.0).

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b3-0.1.patch is a revised version of the patch that is
compatible with MM 2.1b3

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b3-0.2.patch corrects a dumb syntax error in
htdig-2.1b3-0.1.patch which will typically show up as logged errors
in the operation of the ArchRunner qrunner at line 721 of
HyperArch.py

Revision history for this message
Cmackinlay (cmackinlay) wrote :

Just rebuilt MM as 2.1b3 with htdig.
Upgraded lists which had htdig before work fine
New lists give the obvious error:
  Unable to read word database file
  Did you run htmerge?
Running the cronjob doesn't fix it as it used to; the message is:
  Output from command /usr/bin/python -S /usr/local/mailman/cron/nightly_htdig ..

Traceback (most recent call last):
  File "/usr/local/mailman/cron/nightly_htdig", line 153, in ?
    main()
  File "/usr/local/mailman/cron/nightly_htdig", line 118, in main
    file(rundig_run_file, 'w').close()
NameError: global name 'file' is not defined

The archive/htdig folder only contains the xx.conf file, but no
db.xx files

If I copy in db.xx files from another list then the problem goes
away (except I've now got an invalid set of references!)

Is this my elementary error or is it more sinister?!

Revision history for this message
Cmackinlay (cmackinlay) wrote :

Got a workaround!

The line referred to in the traceback:
 file(rundig_run_file, 'w').close()
is used to create a 'rundig_last_run' file of length 0 bytes.
Creating this manually (or copying it) means the line isn't
called and everything seems to work.

Either file() is not a valid function call or my python is broken -
I'm not literate enough in python to know the answer though!

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b3-0.3.patch replaces the file() function with the
open() function in three cron scripts added by the patch.
Use of the file() function created an unnecessary
dependency on Python 2.2
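
For anyone wanting to see the portable form, a minimal sketch follows. It is
not the code from the cron scripts; the function name is hypothetical and the
marker file name is simply the one mentioned in the comments above. The
file() builtin only exists from Python 2.2 and was removed in Python 3,
whereas open() works everywhere.

```python
import os


def touch_marker(path):
    """Create (or truncate to) a zero-length file using open().

    Returns the resulting file size so callers can confirm it is empty.
    """
    open(path, "w").close()
    return os.path.getsize(path)


# Example (hypothetical location for the marker file):
# touch_marker("/tmp/rundig_last_run")
```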

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b4-0.1.patch is a revised version of the patch that is
compatible with MM 2.1b4

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b5-0.1.patch is a revised version of the patch that is
compatible with MM 2.1b5

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.0.13-0.3.patch corrects a minor typo in text appearing
in the list TOC after the patch is applied.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1b6-0.1.patch is a revised version of the patch that is
compatible with MM 2.1b6

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1-0.1.patch is a revised version of the patch that is
compatible with MM 2.1

Revision history for this message
Ppsys (ppsys) wrote :

mailer-2.0.13-0.4.patch improves the content type and
security handling by htdig.py for MM 2.0.13 version of patch

Revision history for this message
Ppsys (ppsys) wrote :

Uploaded wrong file mailer-2.0.13-0.4.patch on last attempt.

Should have been htdig-2.0.13-0.4.patch which improves the
content type and security handling by htdig.py for MM 2.0.13
version of patch.

Please ignore mailer-2.0.13-0.4.patch file

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1-0.2.patch corrects a bug in htdig.py and deals with
an adverse interaction between htdig.py and a bug in
$prefix/scripts/driver (see #668685 for a patch to fix this).

It also improves the content type and security handling by
htdig.py for MM 2.1 version of patch

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1-0.3.patch corrects yet another bug in htdig.py. Hope
that's all of them!

Stops use of obsolete config variable DEFAULT_HOST in
several files.

Revision history for this message
Ppsys (ppsys) wrote :

It seems it is possible, if this patch is installed, for a list's
htdig conf file and the list specific htdig index db files to be
read directly through the web interface for list archives.

Even if this patch isn't installed it seems a list's pipermail.pck
file can also be read directly through the web interface for list
archives.

This seems to be true for accesses via /pipermail for public
lists and via /mailman/private for private lists.

The problem does not occur for htdig search results
accessed via /mailman/htdig as the htdig.py script is more
protective than private.py

Broadly speaking, the data affected is available to a user in
normal operation, which is why I do not consider the issue to
be a security breach as such.

Adding the following RewriteRules to Apache's httpd.conf
prevents the situation, assuming you have RewriteEngine On:

RewriteRule ^(/pipermail/.*)/(pipermail.pck|htdig/[^/]*)$ $1/index.html [F]

RewriteRule ^(/mailman/private/.*)/(pipermail.pck|htdig/[^/]*)$ $1/index.html [F]

You could, of course, substitute an R flag for the F flag on the
RewriteRules and be more hacker friendly.
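
If you want to sanity check which URL paths those two patterns would catch
before touching httpd.conf, a rough sketch follows. It assumes Python's re
module treats these particular patterns the same way Apache does, and the
list name and file names in the examples are made up.

```python
import re

# The two patterns from the RewriteRules above, copied unchanged.
RULES = [
    re.compile(r"^(/pipermail/.*)/(pipermail.pck|htdig/[^/]*)$"),
    re.compile(r"^(/mailman/private/.*)/(pipermail.pck|htdig/[^/]*)$"),
]


def is_blocked(url_path):
    """Return True if either pattern matches the given URL path."""
    return any(rule.search(url_path) is not None for rule in RULES)


# Example paths (hypothetical list called "mylist"):
for path in (
    "/pipermail/mylist/pipermail.pck",
    "/mailman/private/mylist/htdig/mylist.conf",
    "/pipermail/mylist/2003-January/000001.html",
):
    print(path, is_blocked(path))
```

Ordinary archive pages should come out unmatched; only pipermail.pck and
anything directly under a list's htdig directory should be caught.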

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.1-0.1.patch.gz introduces no functional change but
applies without offset warnings to MM 2.1.1

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.1-0.2.patch.gz closes a security exploit which allows
leakage of information held in htdig's per-list search indexes
to users not authorized to view private list archives.

Read file INSTALL.htdig-mm installed by this patch for details
and instructions for upgrading MM installations using earlier
versions of this patch

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.1-0.3.patch.gz fixes a fault when mmsearch.py is
raising an exception because it has had a problem running
htsearch

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.1-0.4.patch.gz fixes a problem with mmsearch
handling multi-page search results from htsearch.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.2-0.1.patch.gz is a revised version for MM 2.1.2
compatibility.

It also incorporates a previously unpublished change to
overcome a potential problem with htdig excluded urls - see
the INSTALL.htdig-mm file for more information

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.2-0.2.patch.gz corrects error in file uploaded as
htdig-2.1.2-0.1.patch.gz. Sorry for any inconvenience.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.2-0.3.patch.gz adds some minor performance
improvement in template handling in MM 2.1.2

You should also consider applying this bug-fix patch:

[ 730769 ] template access hierarchy is broken

http://sourceforge.net/tracker/index.php?func=detail&aid=730769&group_id=103&atid=100103

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.2-0.3.patch.gz corrects an error in 2 scripts,
mmsearch.py and remote_mmsearch, which caused an
exception if list archives were being accessed via HTTPS and
a search was performed.

Revision history for this message
Ppsys (ppsys) wrote :

last comment should have read:

htdig-2.1.2-0.4.patch.gz corrects an error in 2 scripts,
mmsearch.py and remote_mmsearch, which caused an
exception if list archives were being accessed via HTTPS and
a search was performed.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.3-0.1.patch is a MM 2.1.3 compatible version of
the patch

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.3-0.3.patch.gz adds minor private archive security
improvements to the patch for MM 2.1.3

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.3-0.4.patch provides minor improvements in the
handling of HTTP requests by htdig.py which lead to
the user receiving an authentication challenge.

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.3-0.5.patch modifies htdig.py and private.py; the
security changes introduced by htdig-2.1.3-0.2 patch to
these scripts incorrectly blocked access to the
listname.mbox/listname.mbox file. The 0.5 revision of the
patch corrects this error. This problem and a suggested fix
were pointed out to me in a private email by Stephan Berndts
stb-mm at spline.de

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.4-0.1.patch is a MM 2.1.4 compatible version of
the patch

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.5-0.1.patch.gz is a MM 2.1.5 compatible version of the patch

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.6-0.1.patch.gz is a MM 2.1.6 compatible version of
the patch

Revision history for this message
Ppsys (ppsys) wrote : Patch revised for MM 2.1.7 compatibility


Revision history for this message
Ppsys (ppsys) wrote :

Use htdig-2.1.7-0.1.patch.gz for both MM 2.1.7 and MM 2.1.8

Revision history for this message
Ppsys (ppsys) wrote :

htdig-2.1.9-0.1.patch.gz is a MM 2.1.9 compatible version of the
patch

Revision history for this message
Ppsys (ppsys) wrote :

Originator: YES

File Added: htdig-2.1.10-0.1.patch.gz

Revision history for this message
Jean.c.h (slug71) wrote :

Marked this bug as 'Invalid' due to its age and because nothing further has been added in a long time. New versions have been released since, and some underlying parts of the OS platform itself have changed.

If this bug still affects you then please change the status back to 'Confirmed'.

Changed in mailman:
status: Confirmed → Invalid
Revision history for this message
Mark Sapiro (msapiro) wrote :

It's not a bug, it's a patch, and it's still relevant.

Changed in mailman:
status: Invalid → Confirmed
Revision history for this message
Mark Sapiro (msapiro) wrote :

More recent versions of this patch can be found at <http://www.openinfo.co.uk/mm/index.html> and <http://www.msapiro.net/mm/>.
