[master] UnicodeDecodeError in osutils.walkdirs with non-ascii filenames

Bug #488519 reported by Martin Pool
This bug affects 22 people
Affects | Status | Importance | Assigned to | Milestone
Bazaar | Fix Released | High | Eric Moritz |

Bug Description

Originally reported in https://bugs.edge.launchpad.net/bzr/+bug/56947/comments/4 but probably a different bug.

Traceback http://launchpadlibrarian.net/36077911/bzr-20091125204431-4294.crash

Traceback:
 Traceback (most recent call last):
   File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 842, in exception_to_return_code
     return the_callable(*args, **kwargs)
   File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 1037, in run_bzr
     ret = run(*run_argv)
   File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 654, in run_argv_aliases
     return self.run(**all_cmd_args)
   File "/usr/lib/python2.6/dist-packages/bzrlib/builtins.py", line 1713, in run
     possible_transports=[to_transport])
   File "/usr/lib/python2.6/dist-packages/bzrlib/bzrdir.py", line 532, in create_branch_convenience
     bzrdir.create_workingtree()
   File "/usr/lib/python2.6/dist-packages/bzrlib/bzrdir.py", line 1609, in create_workingtree
     accelerator_tree=accelerator_tree, hardlink=hardlink)
   File "/usr/lib/python2.6/dist-packages/bzrlib/workingtree_4.py", line 1465, in initialize
     delta_from_tree=delta_from_tree)
   File "/usr/lib/python2.6/dist-packages/bzrlib/transform.py", line 2150, in build_tree
     delta_from_tree)
   File "/usr/lib/python2.6/dist-packages/bzrlib/transform.py", line 2166, in _build_tree
     for dir, files in wt.walkdirs():
   File "/usr/lib/python2.6/dist-packages/bzrlib/workingtree.py", line 2432, in walkdirs
     current_disk = disk_iterator.next()
   File "/usr/lib/python2.6/dist-packages/bzrlib/osutils.py", line 1392, in walkdirs
     names = sorted(_listdir(top))
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 5: ordinal not in range(128)

See also bug 244360

Tags: unicode
Martin Pool (mbp) wrote :

https://bugs.edge.launchpad.net/bzr/+bug/371597 has some analysis.

This bug may not seem very severe but across all the dupes it is commonly hit, and it's confusing for users hitting it.

Changed in bzr:
importance: Medium → High
summary: - UnicodeDecodeError in osutils.walkdirs during build_tree
+ [master] UnicodeDecodeError in osutils.walkdirs with non-ascii filenames
Parth Malwankar (parthm) wrote :

Steps to reproduce:

[tmp]% mkdir unierror
[tmp]% cd unierror
[unierror]% touch `printf "\x83"`
[unierror]% bzr init
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0: ordinal not in range(128)

Traceback (most recent call last):
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 911, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 1109, in run_bzr
    ret = run(*run_argv)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 689, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/commands.py", line 704, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/builtins.py", line 1741, in run
    possible_transports=[to_transport])
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/bzrdir.py", line 578, in create_branch_convenience
    bzrdir.create_workingtree()
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/bzrdir.py", line 1736, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/workingtree_4.py", line 1474, in initialize
    delta_from_tree=delta_from_tree)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/transform.py", line 2279, in build_tree
    delta_from_tree)
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/transform.py", line 2295, in _build_tree
    for dir, files in wt.walkdirs():
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/workingtree.py", line 2417, in walkdirs
    current_disk = disk_iterator.next()
  File "/storage/parth/src/bzr.dev/trunk/bzrlib/osutils.py", line 1647, in walkdirs
    names = sorted(_listdir(top))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 0: ordinal not in range(128)

You can report this problem to Bazaar's developers by running
    apport-bug /var/crash/bzr.1000.2010-06-03T04:18.crash
if a bug-reporting window does not automatically appear.
[unierror]%

Martin Pool (mbp) wrote :

<poolie> 1- in most cases, the filename is valid but the filename encoding is not set properly
<poolie> eg on unix filenames are normally going to be utf-8 but in this case we seem to be detecting it as ascii
<poolie> so that could be about better detection or better defaults or perhaps (last resort) having a configuration for it
<poolie> 2- in some cases the user really may have a file that does not fit with the encoding of their tree
<poolie> and then i guess we should...
<poolie> i wonder
<poolie> i think ideally we would just tolerate it if it was an ignored file

<parthm> poolie: i saw another ticket by you saying we need a config option for bazaar.conf. maybe that should be fixed first?
<parthm> poolie: bug #538925
<ubot5> Launchpad bug 538925 in Bazaar "want configuration option for filesystem encoding (affected: 1, heat: 6)" [Medium,Confirmed] https://launchpad.net/bugs/538925
<poolie> i would do #1 then bug 538925 then #2
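
The two cases in the discussion above can be told apart by trying candidate encodings on the raw filename bytes. This is only a minimal sketch, not bzrlib code: try_decode is a hypothetical helper, and the "caf\u00e8" name is made up for illustration (0xe8 is the byte from the original traceback, 0x83 the one from the reproduction steps).

```python
def try_decode(raw_name, encoding):
    """Return raw_name decoded with encoding, or None if it is not valid there.

    Hypothetical helper for illustration only; not part of bzrlib.
    """
    try:
        return raw_name.decode(encoding)
    except UnicodeDecodeError:
        return None


# Case 1: the name is valid UTF-8 and only fails because ASCII was assumed.
utf8_name = u"caf\u00e8".encode("utf-8")
case1_ascii = try_decode(utf8_name, "ascii")  # None: ASCII cannot represent it
case1_utf8 = try_decode(utf8_name, "utf-8")   # decodes back to u"caf\u00e8"

# Case 2: the name does not fit the tree's encoding at all. A lone 0x83
# byte is a UTF-8 continuation byte with no lead byte, so it is invalid.
stray_name = b"\x83"
case2_utf8 = try_decode(stray_name, "utf-8")  # None
```

Case 1 is what better detection or a configuration option would address; case 2 still needs a policy, such as tolerating the file when it is ignored.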

Eric Moritz (ericmoritz) wrote :

The problem is that the top variable is a unicode object, while the values returned by _listdir() are regular strings. When sorted() is run over the values, it tries to compare the strings with unicode objects, which triggers Python's automatic string decoding. That decoding uses the 'ascii' codec by default.

A solution would be to map safe_unicode() over each filename returned by _listdir() and raise a more useful error, something along the lines of "filename 'x' is not encoded as utf8".
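
A rough sketch of that idea, written so it does not depend on bzrlib: decode_names and the BadFilenameEncoding class below are hypothetical stand-ins (not the bzrlib functions), and UTF-8 is assumed as the expected encoding. Each name is decoded explicitly before sorting, so the failure surfaces as a descriptive error instead of an implicit 'ascii' decode inside sorted().

```python
class BadFilenameEncoding(Exception):
    """Local stand-in for illustration; not bzrlib's own error class."""

    def __init__(self, raw_name, encoding):
        self.raw_name = raw_name
        self.encoding = encoding
        Exception.__init__(
            self, "filename %r is not encoded as %s" % (raw_name, encoding))


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte-string names, then sort; names already decoded pass through.

    Raises BadFilenameEncoding for a name that is not valid in `encoding`.
    """
    decoded = []
    for raw in raw_names:
        if isinstance(raw, bytes):
            try:
                raw = raw.decode(encoding)
            except UnicodeDecodeError:
                raise BadFilenameEncoding(raw, encoding)
        decoded.append(raw)
    # Every element is text now, so sorting never mixes the two string types.
    return sorted(decoded)


names = decode_names([b"b.txt", u"a.txt"])

try:
    decode_names([b"\x83"])
    bad_name_error = None
except BadFilenameEncoding as e:
    bad_name_error = e
```

Here names ends up as the sorted list of text names, and bad_name_error holds an error that names the offending file rather than a byte offset.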

Eric Moritz (ericmoritz) wrote :

After looking at this more, it looks like there is already an errors.BadFilenameEncoding error, so the previous patch is redundant. I'm preparing a more concise fix.

Martin Pool (mbp) wrote :
Changed in bzr:
assignee: nobody → Eric Moritz (ericmoritz)
status: Confirmed → In Progress
Martin Pool (mbp)
Changed in bzr:
milestone: none → 2.2.0
Eric Moritz (ericmoritz) wrote :

Sorry, I fubar'd the original merge request; here's the new one:
https://code.edge.launchpad.net/~ericmoritz/bzr/488519-walkdirs-encoding-bug/+merge/27006

John A Meinel (jameinel)
Changed in bzr:
status: In Progress → Fix Released