Comment 6 for bug 317779

Martin Pool (mbp) wrote :

You can reproduce this from a Python shell with:

>>> trace.note(u"\u68ee".encode("utf-8"))
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 791, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
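
The failure comes from Python 2's implicit conversion: calling .encode() on a byte string first decodes it with the default ASCII codec, and 0xe6 is the first byte of the UTF-8 encoding of U+68EE. Here is a rough sketch of that step, with the decode written out explicitly so it behaves the same on Python 2 and 3. The helper name implicit_decode_error is made up for illustration and is not part of bzrlib:

```python
def implicit_decode_error(raw_bytes):
    """Return the message of the ASCII decode error for raw_bytes, or None.

    Hypothetical helper: it mimics the decode that Python 2 performs
    implicitly when .encode() is called on a byte string.
    """
    try:
        raw_bytes.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


utf8_bytes = u"\u68ee".encode("utf-8")  # three bytes: e6 a3 ae
print(implicit_decode_error(utf8_bytes))
print(implicit_decode_error(b"hello"))  # pure ASCII decodes cleanly
```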

Just emitting a unicode object does work, and it gets converted according to the error handling of the converter:

>>> trace.note(u"\u68ee")
?

With this patch applied, in a C locale, you cannot log unicode strings, but you can log non-ASCII byte strings, and they get passed through unchanged:

In [10]: trace.note(u"\u68ee hello")
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 791, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

In [11]: trace.note(u"\u68ee hello".encode('utf-8'))
森 hello

I'm not sure that's really an improvement, though. Generally, I think we want the input to the log functions to be unicode strings, and for them to be encoded with errors="replace" on their way to the log file or stderr.
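
Roughly what I have in mind, as a sketch only: the function name write_log_line and the in-memory stream are assumptions for illustration, not the actual trace code. Unicode goes in, and the encoding happens once at the stream boundary with the "replace" error handler:

```python
import io


def write_log_line(stream, message, encoding="ascii"):
    """Write a unicode message to a byte stream.

    Hypothetical helper: characters the target encoding cannot
    represent are replaced with '?' instead of raising an error.
    """
    stream.write(message.encode(encoding, "replace") + b"\n")


buf = io.BytesIO()
write_log_line(buf, u"\u68ee hello")                    # C locale: ASCII only
write_log_line(buf, u"\u68ee hello", encoding="utf-8")  # UTF-8 terminal or log
print(repr(buf.getvalue()))
```

In an ASCII-only locale the character degrades to "?", which matches what trace.note(u"\u68ee") printed above, and with a UTF-8 stream it comes through intact.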

Can you tell me more about the context where you hit this?