Comment 3 for bug 508251

John A Meinel (jameinel) wrote:

For gnome-panel, this fails in "find_extra_authors". That function iterates over the changelog and calls decode('utf-8') on every line while looking for an author, and some lines are not valid UTF-8.

The gnome-panel changelog lists its translators like this:

(Pdb) pp changes
['* New upstream version:',

...
 ' Docs Translators:',
 ' - Maxim Dziumanenko (uk)',
 ' Translators:',
 ' - Vital Khilko (be)',
 " - J\xe9r\xe9my Le Floc'h (br)",
 ' - Pema Geyleg (dz)',
 ' - Ivar Smolin (et)',
 ' - Beno\xeet Dejean (fr)',
...

I'm fairly certain this is iso-8859-1, since '\xe9' => é and '\xee' => î. However, iso-8859-2 and iso-8859-15 decode these bytes to the same characters, so it could be any of the three.
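A quick Python 3 check of the claim above, using the two lines from the pdb output: UTF-8 rejects the bytes, while iso-8859-1, iso-8859-2 and iso-8859-15 all agree on them. try_decode is a hypothetical helper written only for this check.

```python
# Byte strings copied from the pdb output above.
samples = [b" - J\xe9r\xe9my Le Floc'h (br)", b" - Beno\xeet Dejean (fr)"]


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in this encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


for raw in samples:
    # UTF-8 fails: 0xE9 and 0xEE are multi-byte lead bytes, but the next
    # byte ('r' / 't') is not a continuation byte.
    assert try_decode(raw, "utf-8") is None
    decoded = {enc: try_decode(raw, enc)
               for enc in ("iso-8859-1", "iso-8859-2", "iso-8859-15")}
    # All three single-byte encodings map 0xE9 -> 'é' and 0xEE -> 'î'.
    assert len(set(decoded.values())) == 1
    print(decoded["iso-8859-1"])
```

So the bytes alone can't tell us which of the three encodings was intended.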

Anyway, I see three options:

#1) These lines won't match the extra-author pattern anyway, because they aren't of the form [Author Name]. So we could defer decoding until after the regex match has run. The current author regex is:
extra_author_re = re.compile(r"\s*\[([^\]]+)]\s*", re.UNICODE)

which, IIRC, means: optional leading whitespace, '[', one or more characters other than ']' (captured), ']', then optional trailing whitespace.
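A minimal Python 3 sketch of option #1, assuming the changelog lines arrive as raw bytes and the pattern is applied with re.match. Python 3 refuses re.UNICODE on a bytes pattern, so the flag is dropped here. find_authors_in_bytes and the "Jane Doe" line are hypothetical, not the existing find_extra_authors code or real changelog data.

```python
import re

# Same pattern as extra_author_re, but compiled from bytes so it can run on
# undecoded lines (re.UNICODE is not allowed with bytes patterns in Python 3).
extra_author_re_bytes = re.compile(rb"\s*\[([^\]]+)]\s*")


def find_authors_in_bytes(lines):
    """Hypothetical helper: match first, then decode only the captured names."""
    authors = []
    for line in lines:
        m = extra_author_re_bytes.match(line)
        if m:
            # Only "[Author Name]" captures are decoded. This can still raise
            # if the author name itself is not UTF-8.
            authors.append(m.group(1).decode("utf-8").strip())
    return authors


changes = [
    b"  [ Jane Doe ]",                    # hypothetical author line
    b" - J\xe9r\xe9my Le Floc'h (br)",    # no match, so it is never decoded
]
print(find_authors_in_bytes(changes))
```

This avoids the crash for the translator lines, but as noted next, the undecoded bytes are still a problem if they end up elsewhere.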

However, if this data is later carried into the commit log (or anywhere else), it will fail anyway when we try to build a Unicode commit message.

#2) Catch the decode failure and assume the line contains no author.
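A sketch of option #2 in Python 3, assuming the lines arrive as bytes and the existing regex is applied with re.match. find_authors_skipping_bad_lines is a hypothetical name, and the "Jane Doe" input is made-up sample data.

```python
import re

extra_author_re = re.compile(r"\s*\[([^\]]+)]\s*", re.UNICODE)


def find_authors_skipping_bad_lines(raw_lines):
    """Hypothetical helper: decode each line as UTF-8, skipping any that fail."""
    authors = []
    for raw in raw_lines:
        try:
            line = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Option #2: treat an undecodable line as containing no author.
            continue
        m = extra_author_re.match(line)
        if m:
            authors.append(m.group(1).strip())
    return authors


print(find_authors_skipping_bad_lines([b"  [ Jane Doe ]", b" - Beno\xeet Dejean (fr)"]))
```

The trade-off: a genuine "[Author Name]" line that isn't UTF-8 would be silently dropped rather than reported.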

#3) Fall back to iso-8859-1 when UTF-8 decoding fails. That fallback can never fail, since iso-8859-1 maps every byte value to a character.
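A sketch of option #3 in Python 3, assuming each changelog line is available as bytes; decode_changelog_line is a hypothetical helper name. UTF-8 is tried first because invalid UTF-8 is detected reliably, while the iso-8859-1 fallback always "succeeds".

```python
def decode_changelog_line(raw):
    """Hypothetical helper: try UTF-8 first, fall back to iso-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte 0x00-0xFF to a code point, so this
        # never raises (though it may pick the wrong characters if the
        # text was really in some other single-byte encoding).
        return raw.decode("iso-8859-1")


print(decode_changelog_line(b" - Beno\xeet Dejean (fr)"))        # Latin-1 bytes
print(decode_changelog_line(b" - Beno\xc3\xaet Dejean (fr)"))    # UTF-8 bytes
```

Both calls yield the same text. Per the earlier note, iso-8859-2 and iso-8859-15 would give identical results for these particular lines.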