Logging with a non-ascii string raises UnicodeError in the standard logging package

Bug #317779 reported by philyoon
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Low
Unassigned

Bug Description

This is due to the fact that logging.StreamHandler.emit method tries to encode the input with the 'encoding' attribute of the stream. I think you should not use StreamWriter as a file-like object, since StreamWriter isn't intended to accept str inputs. The fix for this issue follows:

=== modified file 'bzrlib/trace.py'
--- bzrlib/trace.py 2009-01-08 17:53:46 +0000
+++ bzrlib/trace.py 2009-01-16 07:39:12 +0000
@@ -269,9 +269,7 @@
         r'%Y-%m-%d %H:%M:%S')
     # after hooking output into bzr_log, we also need to attach a stderr
     # handler, writing only at level info and with encoding
- writer_factory = codecs.getwriter(term_encoding)
- encoded_stderr = writer_factory(sys.stderr, errors='replace')
- stderr_handler = logging.StreamHandler(encoded_stderr)
+ stderr_handler = logging.StreamHandler(sys.stderr)
     stderr_handler.setLevel(logging.INFO)
     logging.getLogger('bzr').addHandler(stderr_handler)

Revision history for this message
philyoon (philyoon) wrote :

Whitespaces in the description seem to be merged. Attaching the diff file.

Revision history for this message
methane (songofacandy) wrote :

It may be happen because non-ascii byte string was logged.
All of non-ascii string should be unicode str.

Revision history for this message
Martin Pool (mbp) wrote :

I'm trying to get rid of this use of logging in https://code.edge.launchpad.net/~mbp/bzr/remove-logging so this bug should be taken into account there, and vice versa.

Revision history for this message
John A Meinel (jameinel) wrote :

Martin, do you have an idea of the status of this bug now that some of your logging changes have landed?

Changed in bzr:
importance: Undecided → Low
status: New → Incomplete
Revision history for this message
Martin Pool (mbp) wrote :

I wouldn't assume it's fixed because not all of that work has landed yet. We do still using logging.

Changed in bzr:
status: Incomplete → Confirmed
tags: added: logging trace unicode
Martin Pool (mbp)
Changed in bzr:
assignee: nobody → Martin Pool (mbp)
status: Confirmed → In Progress
Revision history for this message
Martin Pool (mbp) wrote :

You can reproduce this from a Python shell with

>>> trace.note(u"\u68ee".encode("utf-8"))
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 791, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

just emitting a unicode object does work, and it gets converted according to the error handling of the converter:

>>> trace.note(u"\u68ee")
?

With this patch applied, in a C locale, you cannot log unicode strings, but you can log non-ascii byte strings and they get passed through as such:

In [10]: trace.note(u"\u68ee hello")
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 791, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

In [11]: trace.note(u"\u68ee hello".encode('utf-8'))
森 hello

I'm not sure that's really an improvement though. I think generally we want the input to the log functions to be unicode strings, and for them to have errors=replace treatment going to the log file or stderr.

Can you tell me more about the context where you hit this?

Changed in bzr:
assignee: Martin Pool (mbp) → nobody
status: In Progress → Confirmed
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.