mailman corrupts RFC2047-encoded headers

Bug #266375 reported by Dwmw2-users
This bug affects 2 people
Affects: GNU Mailman
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

Given an input like this:

Subject:
=?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,=20fix=20ECC=
20on=20reading=20empty=20flash?=

Mailman appears to emit mail like this:

Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,
        =20fix=20ECC=20on=20reading=20empty=20flash?=

The input was RFC2047-compliant. The output isn't.

[http://sourceforge.net/tracker/index.php?func=detail&aid=1605144&group_id=103&atid=100103]

Saturn-de (saturn-de) wrote:

Originator: NO

http://www.ietf.org/rfc/rfc2047.txt

   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters. If it is
   desirable to encode more text than will fit in an 'encoded-word' of
   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
   be used.
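As a rough illustration of this limit, the sketch below checks a header value for over-long encoded-words. too_long_encoded_words and unfold are hypothetical helpers, not part of Mailman or Python's email package, and the regular expression is a simplification that assumes no whitespace or "?" inside any field. The "old" value is the input from the bug description with its line break removed, on the assumption that a break falling inside "=20" is only display wrapping.

```python
import re

# Simplified encoded-word pattern: "=?" charset "?" encoding "?" encoded-text "?="
# Assumption: no whitespace and no "?" inside any of the three fields.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]+\?=")

# RFC 2047, section 2: an encoded-word may not be more than 75 characters long.
MAX_ENCODED_WORD_LEN = 75


def unfold(value):
    """Remove header folding: a CRLF (or bare LF) followed by whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", value)


def too_long_encoded_words(value):
    """Return (word, length) pairs for encoded-words over the 75-char limit."""
    return [
        (word, len(word))
        for word in ENCODED_WORD.findall(unfold(value))
        if len(word) > MAX_ENCODED_WORD_LEN
    ]


# Input from the bug description: one long encoded-word.
old = ("=?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,"
       "=20fix=20ECC=20on=20reading=20empty=20flash?=")
# Input after the commit script was fixed: two folded encoded-words.
new = ("=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?=\n"
       "        =?UTF-8?Q?_empty_flash?=")

print(too_long_encoded_words(old))  # one word, well over 75 characters
print(too_long_encoded_words(new))  # empty list
```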

Dwmw2-users (dwmw2-users) wrote:

Originator: YES

Hm, good point; thanks. I've fixed the script which generates mail for
each commit to the Linux kernel git tree, and it should no longer generate
encoded-words longer than 75 characters.

I still see this input...

Subject:
=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?=
        =?UTF-8?Q?_empty_flash?=

and this output...

Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,
        _fix_ECC_on_reading?= =?UTF-8?Q?_empty_flash?=

The comma is allowed, and doesn't have to be '=2C', does it? See §4.2 (3)
and §5 (1).
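To make the §4.2 argument concrete, here is a minimal sketch of a "Q" encoder that applies rules (1) to (3) and nothing stricter. q_encode_text and q_encoded_word are hypothetical helpers written for this discussion, not code from Mailman or Python's email package. The sketch assumes the result goes into unstructured text such as Subject:, where §5 (1) applies; an encoded-word inside a phrase (§5 (3)) would need a more restrictive character set, and the caller is assumed to keep each word within 75 characters.

```python
def q_encode_text(text, charset="utf-8"):
    """Q-encode text per RFC 2047 section 4.2, keeping literal every
    character that rule (3) allows to stay literal."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            # Rule (2): SPACE may be represented as "_".
            out.append("_")
        elif 0x21 <= byte <= 0x7E and chr(byte) not in "=?_":
            # Rule (3): printable ASCII other than "=", "?" and "_" may be
            # represented as itself, so ",", "[" and ":" stay literal.
            out.append(chr(byte))
        else:
            # Rule (1): any other octet is "=" followed by two hex digits.
            out.append("=%02X" % byte)
    return "".join(out)


def q_encoded_word(text, charset="utf-8"):
    """Wrap Q-encoded text in an encoded-word (no splitting at 75 chars)."""
    return "=?%s?Q?%s?=" % (charset.upper(), q_encode_text(text, charset))


print(q_encoded_word("[MTD] NAND: CAF\u00c9 NAND driver cleanup, fix ECC on reading"))
```

The encoded-text produced this way matches the first encoded-word of the input quoted above, comma included.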

Saturn-de (saturn-de) wrote:

Originator: NO

Hmm, I don't think "," is allowed unencoded...

   token = 1*<Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
               <"> / "/" / "[" / "]" / "?" / "." / "="

Dwmw2-users (dwmw2-users) wrote:

Originator: YES

That's only for the charset (UTF-8) and the encoding (Q). The comma
appears in the encoded-text, and should be fine, since this is a Subject:
header and hence comes under paragraph (1) of §5.

  encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token ; see section 3

   encoding = token ; see section 4

   token = 1*<Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = 1*<Any printable ASCII character other than "?"
                     or SPACE>
                  ; (but see "Use of encoded-words in message
                  ; headers", section 5)
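The grammar quoted above can be turned into a small checker. This is a hedged sketch: parse_encoded_word, is_token and is_encoded_text are hypothetical names, and the code checks only the section 2 syntax (token for charset and encoding, encoded-text for the payload), not the per-context restrictions of section 5 or the encoding rules of section 4. It shows that a comma is rejected in the charset or encoding but accepted in the encoded-text.

```python
# especials as listed in RFC 2047, section 2
ESPECIALS = set('()<>@,;:"/[]?.=')


def is_token(s):
    """token = 1*<Any CHAR except SPACE, CTLs, and especials>"""
    return len(s) >= 1 and all(
        0x21 <= ord(c) <= 0x7E and c not in ESPECIALS for c in s
    )


def is_encoded_text(s):
    """encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>"""
    return len(s) >= 1 and all(0x21 <= ord(c) <= 0x7E and c != "?" for c in s)


def parse_encoded_word(word):
    """Split encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    into its three fields, or return None if it does not fit the grammar."""
    if len(word) < 4 or not (word.startswith("=?") and word.endswith("?=")):
        return None
    fields = word[2:-2].split("?")
    if len(fields) != 3:
        return None
    charset, encoding, text = fields
    if is_token(charset) and is_token(encoding) and is_encoded_text(text):
        return charset, encoding, text
    return None


# Comma inside encoded-text: accepted.
print(parse_encoded_word(
    "=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?="))
# Comma inside the charset token: rejected (None).
print(parse_encoded_word("=?UTF,8?Q?abc?="))
```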

Saturn-de (saturn-de) wrote:

Originator: NO

Hmm, my Thunderbird encodes "," with =2C

Dwmw2-users (dwmw2-users) wrote:

Originator: YES

Your Thunderbird also refuses to send this:

    To: Some people : ;
    Bcc: <email address hidden>, <email address hidden>

Thunderbird isn't necessarily the best test of what's valid :)

The pertinent question is: why is Mailman munging this _anyway_? Why can't
it just pass the header through as it was originally sent? If I put line
breaks in and lined things up sensibly like a SpamAssassin report does, why
should that be mangled by Mailman?

Tokio Kikuchi (tkikuchi) wrote:

Originator: NO

This comes from the behavior of the Python email module, which tries to
keep each header line within 78 characters. Mailman first parses the
message, does things like adding a subject prefix or a message body
header/footer, and then regenerates an RFC 2822 message. The email module
treats your subject as a two-part structure separated by the comma and
folds it there with CRLF. I am not completely sure, but the current version
of email doesn't seem to distinguish the structured and unstructured
headers defined in sections 2.2.1 and 2.2.2 of RFC 2822. Anyway, it is
safer to keep header lines within 78 characters.

FYI, the email module generates your subject header like this:
Subject:
=?utf-8?q?=5BMTD=5D_NAND=3A_CAF=C3=89_NAND_driver_cleanup=2C_fix?=
 =?utf-8?q?_ECC_on_reading_empty_flash?=
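For comparison, the email package can be asked to encode the same subject itself. This sketch assumes Python 3 and uses email.header.Header, decode_header and make_header; the exact split points and the choice between q and base64 are left to the library and may differ between versions, so the check only verifies that the result decodes back to the original text.

```python
from email.header import Header, decode_header, make_header

SUBJECT = "[MTD] NAND: CAF\u00c9 NAND driver cleanup, fix ECC on reading empty flash"

# Let the email package pick the encoding and fold the value itself;
# header_name reserves room for "Subject: " on the first line.
encoded = Header(SUBJECT, "utf-8", header_name="Subject").encode()
print("Subject: " + encoded)

# Decoding the folded encoded-words should give back the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == SUBJECT)
```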

Chrissamuel (chrissamuel) wrote:

Originator: NO

You can disable header wrapping in the module, according to this page (if
I am looking at the correct Python docs):

http://docs.python.org/lib/module-email.generator.html

It implies that if you pass maxheaderlen=0 to every call of Generator, you
shouldn't get this wrapping behaviour, though I don't know when this
appeared in Python.
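A minimal sketch of that suggestion, assuming Python 3's email package (the page linked above documents the Python 2 module; maxheaderlen exists in both, but older behaviour may differ). It reparses a message and flattens it again with maxheaderlen=0, which the generator treats as "do not wrap". The regenerate helper is hypothetical and only stands in for what a list manager does.

```python
import email
import io
from email.generator import Generator

# A single-line Subject: whose encoded-word fits RFC 2047's 75-character
# limit, but whose header line ("Subject: " + word) is longer than 78.
RAW = (
    "Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?=\n"
    "\n"
    "body\n"
)


def regenerate(raw, maxheaderlen):
    """Parse a message and flatten it again (hypothetical helper)."""
    msg = email.message_from_string(raw)
    out = io.StringIO()
    Generator(out, mangle_from_=False, maxheaderlen=maxheaderlen).flatten(msg)
    return out.getvalue()


# maxheaderlen=0: headers are written back without rewrapping.
print(regenerate(RAW, maxheaderlen=0))
# For comparison, a limit of 78 lets the generator refold the header.
print(regenerate(RAW, maxheaderlen=78))
```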

I believe this may also be the cause of Mailman breaking my PGP/MIME
messages: diffing the saved original against the version that comes back
shows that the only differences are in long MIME headers and in the
reformatted headers of the attached message/rfc822 email.

I am not sure if this is related to 815297, but it sure looks like it.

Caveat: I am not a Python programmer, just a postmaster.

A.M. Kuchling (amk) wrote:

One possible fix is to parse the subject header's content by turning it into a Header object. This seems to work for simply sending a message through, but I don't know if every use of the Subject header in Mailman works correctly if the value is a Header object instead of a string.
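The idea above can be sketched as follows, assuming Python 3's email package: decode the incoming Subject: into a Header object with decode_header and make_header, store that object back on the message, and let the generator re-encode it. This only illustrates the suggested approach; it does not show how Mailman's own code would handle a Header instead of a string, which is exactly the open question.

```python
import email
from email.header import decode_header, make_header

# The input from this report, folded into two encoded-words.
RAW = (
    "Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?=\n"
    "        =?UTF-8?Q?_empty_flash?=\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(RAW)

# Replace the raw string value with a Header object built from its decoded form.
subject = make_header(decode_header(msg["Subject"]), header_name="Subject")
del msg["Subject"]
msg["Subject"] = subject

# The generator now re-encodes the Header object itself, folding at 78.
out = msg.as_string(maxheaderlen=78)
print(out)

# The regenerated subject should still decode to the same text.
reparsed = email.message_from_string(out)
print(str(make_header(decode_header(reparsed["Subject"]))))
```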

It does seem like a bug in the underlying email package, though; I've filed http://bugs.python.org/issue8769 in the Python bug tracker.
