Comment 3 for bug 1202395

Cedders (cedric-gn) wrote:

Hi Mark

Thanks for the reply. By the way, it was you who suggested this approach, and I still think you were right back then!

Firstly, according to http://wiki.python.org/moin/DefaultEncoding, relying on sys.getdefaultencoding() is pretty much deprecated: Python 3.0 fixes the default encoding at UTF-8 and drops sys.setdefaultencoding() altogether (as you say, "Python's default encoding is ascii regardless of locale"). Secondly, I don't think the input to sync_members should be interpreted as a 7-bit message header with possible RFC 2047 encoding. Thirdly, add_members does not have this problem. Fourthly, if you did escape the non-ASCII characters with base64 or quoted-printable at some point, the escaped forms would presumably show up in the command output (and possibly in the web interface).
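To illustrate the fourth point, here is a minimal Python 3 sketch. It assumes the standard-library email.header module stands in for whatever escaping might be applied; the helper names rfc2047_escape and rfc2047_unescape are hypothetical, not Mailman functions. Once a display name becomes an RFC 2047 encoded-word, it is the escaped ASCII form, not the original name, that a script would print.

```python
from email.header import Header, decode_header


def rfc2047_escape(display_name: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: escape a display name as an RFC 2047 encoded-word.

    Header.encode() picks base64 or quoted-printable for the charset, so the
    result is pure ASCII of the form '=?utf-8?q?...?=' or '=?utf-8?b?...?='.
    """
    return Header(display_name, charset).encode()


def rfc2047_unescape(encoded: str) -> str:
    """Hypothetical helper: decode encoded-words back into a Unicode string."""
    parts = []
    for chunk, charset in decode_header(encoded):
        # decode_header() yields bytes (plus a charset) for encoded-words,
        # and returns the original str unchanged when nothing was encoded.
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    escaped = rfc2047_escape("Cédric")
    print(escaped)                    # the escaped form is what output would show
    print(rfc2047_unescape(escaped))  # Cédric
```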

Finally, yes, modifying site.py as you describe does fix both problems (with or without the patch), but are most sysadmins likely to do that in practice? If they don't, should sync_members crash? And what if, for some reason, the system locale changes to, e.g., iso-8859-1? On a site with a UTF-8 encoding, as I understand it, all this functionality does is convert from utf-8 to utf-8. There is a per-list encoding, which might be useful on a non-Unicode system hosting lists in both ISO-8859-5 and ISO-8859-1, but as far as I can see, the list encoding is not taken into account in the command-line scripts.
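Why the per-list encoding matters can be seen with a few bytes. This is a minimal Python 3 sketch; the sample names are made up for illustration, and show() is a hypothetical helper. The same stored bytes read correctly only under the encoding they were written in, and UTF-8 bytes viewed through an ISO-8859-1 locale turn into mojibake.

```python
def show(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: interpret raw bytes as text in the given encoding."""
    return raw.decode(encoding)


# A Cyrillic name stored by a list whose encoding is ISO-8859-5 ...
cyrillic = "Жанна".encode("iso-8859-5")

if __name__ == "__main__":
    # ... reads correctly only when decoded with that same encoding.
    print(show(cyrillic, "iso-8859-5"))
    # Decoded as ISO-8859-1, the same bytes become unrelated Latin-1 characters.
    print(show(cyrillic, "iso-8859-1"))
    # UTF-8 bytes viewed through an ISO-8859-1 locale also turn into mojibake.
    print(show("Cédric".encode("utf-8"), "iso-8859-1"))  # CÃ©dric
```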

I did wonder if assigning
   enc = locale.getdefaultlocale()[1] or locale.getpreferredencoding() or "UTF8"
within the script would help (by outputting in the correct encoding for the console), but it doesn't; as you say, the problem is the implicit decode applied to the output of formataddr and join, which is a byte string rather than a Unicode string. Logically, perhaps it should first be decoded from the input encoding and re-encoded as enc, the expected encoding in the system locale; but that's equivalent to doing nothing.
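Python 3 never mixes bytes and str implicitly, so the following sketch simulates the Python 2 behaviour described above. py2_style_join is a hypothetical helper that decodes byte strings as ASCII, standing in for the implicit decode Python 2 applied when a byte string met a unicode string; transcode shows why decoding and re-encoding with the same encoding is equivalent to doing nothing. console_encoding is an assumed Python 3 analogue of the enc assignment above, using locale.getpreferredencoding(False) instead of locale.getdefaultlocale(), which is deprecated in recent Python 3 releases.

```python
import locale


def console_encoding() -> str:
    """Assumed Python 3 analogue of the 'enc' expression above."""
    # getpreferredencoding(False) reads the current locale without changing it.
    return locale.getpreferredencoding(False) or "UTF-8"


def py2_style_join(parts) -> str:
    """Hypothetical simulation of Python 2: byte strings are implicitly decoded
    as ASCII when joined with Unicode text, so non-ASCII bytes raise."""
    return "".join(p.decode("ascii") if isinstance(p, bytes) else p for p in parts)


def transcode(raw: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Decode from the input encoding, then re-encode for the output."""
    return raw.decode(input_encoding).encode(output_encoding)


if __name__ == "__main__":
    # A formataddr()-style result held as UTF-8 bytes, as it would be in Python 2.
    member = "Cédric <cedric@example.com>".encode("utf-8")
    try:
        py2_style_join(["Adding: ", member])
    except UnicodeDecodeError as exc:
        print("implicit ASCII decode failed:", exc.reason)
    # When input and output encodings match, transcoding changes nothing.
    assert transcode(member, "utf-8", "utf-8") == member
    print("console encoding:", console_encoding())
```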

If the default-encoding approach were implemented in a future Python in a way that doesn't cause this problem (that is, if it were applied beyond concatenation and join), then re-encoding strings from (for example) ISO-8859-5 to give legible output on a UTF-8 console would be the way to go. But it doesn't look to me as if that is the way the wind is blowing.
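A short sketch of that re-encoding, assuming a list stored in ISO-8859-5 and a UTF-8 console; to_console is a hypothetical helper, not part of Mailman. Passing errors="replace" is one possible answer to "should sync_members crash?": characters the console cannot show become '?' instead of raising.

```python
def to_console(raw: bytes, list_encoding: str, console_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode text stored in a list's charset so a
    console using console_encoding displays it legibly."""
    # errors="replace" substitutes '?' for characters the console encoding
    # cannot represent, rather than raising UnicodeEncodeError.
    return raw.decode(list_encoding).encode(console_encoding, errors="replace")


if __name__ == "__main__":
    stored = "Жанна".encode("iso-8859-5")    # one byte per letter
    shown = to_console(stored, "iso-8859-5")  # two UTF-8 bytes per letter
    print(shown.decode("utf-8"))              # Жанна
```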

Hope this makes sense.