dirstate updating fails if there are symlinks and non-ascii filenames

Bug #135320 reported by Matthias Müller-Reineke
This bug affects 1 person
Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: John A Meinel

Bug Description

The following occurred with my branches. Many files and directories were added, moved, removed and renamed in the branches (gv-allgemein was removed from this branch but not from the merge source):

bzr merge sftp://vvw@localhost/u/users/vvw/bzr-rep/vws/
+N .bzrignore.OTHER
+N gv-allgemein/
+N gv-allgemein/c-include/
+N gv-allgemein/c-include/sqlnet.log
+N gv-allgemein/c-source/
+N gv-allgemein/c-source/spver_std_ini.c.OTHER
+N gv-allgemein/pc-include/
+N gv-allgemein/pc-include/sp011.c2h.OTHER
+N gv-allgemein/pc-include/sp011.col.OTHER
+N gv-allgemein/pc-include/sp011.d2h.OTHER
+N gv-allgemein/pc-include/sp011.dat.OTHER
...
+N src/batch/pdf/AN-Haus-und-Grund-HVK.pdf
+N src/batch/pdf/AN-Wohn-Mehr-HVK.pdf
+N src/batch/pdf/HGRUNDHAFT_2007.pdf
...
 M src/tech/allgemein/modus.kon
 M src/tech/allgemein/status.kon
Contents conflict in .bzrignore
Conflict adding files to gv-allgemein. Created directory.
Conflict because gv-allgemein is not versioned, but has versioned children. Versioned directory.
Conflict adding files to gv-allgemein/c-include. Created directory.
Conflict because gv-allgemein/c-include is not versioned, but has versioned children. Versioned directory.
Conflict adding files to gv-allgemein/c-source. Created directory.
Conflict because gv-allgemein/c-source is not versioned, but has versioned children. Versioned directory.
Contents conflict in gv-allgemein/c-source/spver_std_ini.c
Conflict adding files to gv-allgemein/pc-include. Created directory.
Conflict because gv-allgemein/pc-include is not versioned, but has versioned children. Versioned directory.
Contents conflict in gv-allgemein/pc-include/sp011.c2h
Contents conflict in gv-allgemein/pc-include/sp011.col
...
Text conflict in src/geschaeft/allgemein/vws_beilage_ms.ext
Text conflict in src/geschaeft/db/brdat_gt.pc
75 conflicts encountered.
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 29: ordinal not in range(128)

Traceback (most recent call last):
  File "/u/users/mueller/lib/python/bzrlib/commands.py", line 817, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/u/users/mueller/lib/python/bzrlib/commands.py", line 779, in run_bzr
    ret = run(*run_argv)
  File "/u/users/mueller/lib/python/bzrlib/commands.py", line 477, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/u/users/mueller/lib/python/bzrlib/builtins.py", line 2770, in run
    cleanup()
  File "/u/users/mueller/lib/python/bzrlib/workingtree_4.py", line 1125, in unlock
    self.flush()
  File "/u/users/mueller/lib/python/bzrlib/workingtree_4.py", line 312, in flush
    self.current_dirstate().save()
  File "/u/users/mueller/lib/python/bzrlib/dirstate.py", line 1639, in save
    self._state_file.writelines(self.get_lines())
  File "/u/users/mueller/lib/python/bzrlib/dirstate.py", line 1228, in get_lines
    return self._get_output_lines(lines)
  File "/u/users/mueller/lib/python/bzrlib/dirstate.py", line 1521, in _get_output_lines
    inventory_text = '\0\n\0'.join(lines)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 29: ordinal not in range(128)

bzr 0.90.0candidate0 on python 2.5.0.final.0 (linux2)
arguments: ['/u/users/mueller/bin/bzr', 'merge', 'sftp://vvw@localhost/u/users/vvw/bzr-rep/vws/']

** please send this report to <email address hidden>
/u/users/mueller/lib/python/bzrlib/lockable_files.py:110: UserWarning: file group LockableFiles(<bzrlib.transport.local.LocalTransport url=file:///u/users/mueller/bzr_diagnose/C%2B%2B/.bzr/checkout/>) was not explicitly unlocked
  warn("file group %r was not explicitly unlocked" % self)
/u/users/mueller/lib/python/bzrlib/lock.py:79: UserWarning: lock on <open file u'/u/users/mueller/bzr_diagnose/C++/.bzr/checkout/dirstate', mode 'rb+' at 0x40642f98> not released
  warn("lock on %r not released" % self.f)

Edmundo (eantoranz) wrote :

I got the same (or similar) problem yesterday when merging changes from my development branch to my stable branch:

$ bzr merge /home/antoranz/bus/eclipse
+N doc/Cities/Bogotá/
+N doc/Cities/Bogotá/TransMilenio A.png
+N doc/Cities/Bogotá/TransMilenio B.png
+N doc/Cities/Bogotá/TransMilenio C.png
+N doc/Cities/Bogotá/TransMilenio D.png
+N doc/Cities/Bogotá/city.xml
 M doc/Cities/Maracaibo/city.xml
 M web/server/includes/city/ProcessedCity.php
 M web/server/js/public_transport.js.php
 M web/server/seeloadedcity.php
All changes applied successfully.
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 802, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 758, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 2879, in run
    cleanup()
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree_4.py", line 1117, in unlock
    self.flush()
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree_4.py", line 296, in flush
    self.current_dirstate().save()
  File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 1974, in save
    self._state_file.writelines(self.get_lines())
  File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 1506, in get_lines
    return self._get_output_lines(lines)
  File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 1842, in _get_output_lines
    inventory_text = '\0\n\0'.join(lines)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

bzr 1.0.0 on python 2.5.1.final.0 (linux2)
arguments: ['/usr/bin/bzr', 'merge', '/home/antoranz/bus/eclipse']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'es_CO.UTF-8'
plugins:
  gtk /usr/lib/python2.5/site-packages/bzrlib/plugins/gtk [0.93.0]
  launchpad /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
  multiparent /usr/lib/python2.5/site-packages/bzrlib/plugins/multiparent.pyc [unknown]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.
/usr/lib/python2.5/site-packages/bzrlib/lockable_files.py:110: UserWarning: file group LockableFiles(<bzrlib.transport.local.LocalTransport url=file:///home/antoranz/bus/stable/.bzr/checkout/>) was not explicitly unlocked
  warn("file group %r was not explicitly unlocked" % self)
/usr/lib/python2.5/site-packages/bzrlib/lock.py:79: UserWarning: lock on <open file u'/home/antoranz/bus/stable/.bzr/checkout/dirstate', mode 'rb+' at 0x86ab578> not released
  warn("lock on %r not released" % self.f)

$ bzr --version
Bazaar (bzr) 1.0.0
  Python interpreter: /usr/bin/python 2.5.1.final.0
  Python standard library: /usr/lib/python2.5
  bzrlib: /usr/l...

Matthias Müller-Reineke (matthias-mueller-reineke) wrote :

I've posted a merge request with a patch which seems to solve the bug on <email address hidden>.
Unfortunately, there has been no reply so far.

The following patch seems to solve the bug.
Unfortunately, I don't understand enough about Bazaar
* to write a test which reproduces the bug
* to write a test of this patch
Dear Edmundo, do you have a clue?

=== modified file 'bzrlib/dirstate.py'
--- bzrlib/dirstate.py 2007-12-19 08:12:34 +0000
+++ bzrlib/dirstate.py 2008-02-11 10:41:12 +0000
@@ -1839,7 +1839,7 @@
         """
         output_lines = [DirState.HEADER_FORMAT_3]
         lines.append('') # a final newline
-        inventory_text = '\0\n\0'.join(lines)
+        inventory_text = '\0\n\0'.join(map(osutils.safe_utf8, lines))
         output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
         # -3, 1 for num parents, 1 for ghosts, 1 for final newline
         num_entries = len(lines)-3

Edmundo (eantoranz) wrote : Re: [Bug 135320] Re: bzr merge - exceptions.UnicodeDecodeError

Man... I'm absolutely clueless... what bothers me the most right now
is that, after trying this and that with Bazaar (including deleting my
development branch without a backup), I lost some of the work I had
done recently :-S. I'll just have to redo that... I promise I won't
be using non-ASCII characters in filenames (or directories... at
least for a while).

Matthias Müller-Reineke (matthias-mueller-reineke) wrote : Re: bzr merge - exceptions.UnicodeDecodeError

This bug is the reason why we are staying on bzr 0.17.

Martin Pool (mbp) wrote : Re: [Bug 135320] bzr merge - exceptions.UnicodeDecodeError

I think the encoding needs to be done earlier in the code that builds
up the dirstate; it's probably incorrect to do it here.

Andrew Bennetts (spiv) wrote : Re: bzr merge - exceptions.UnicodeDecodeError

I can't reproduce this bug locally by merging two simple branches with non-ascii filenames. Can someone provide a set of commands to reproduce?

(Just in case it matters, I have an Ubuntu Hardy laptop, and my $LANG is en_AU.UTF-8, but I see the most recent traceback on this bug is also from a *nix system using UTF-8.)

Edmundo (eantoranz) wrote : Re: [Bug 135320] Re: bzr merge - exceptions.UnicodeDecodeError

This is what I was able to produce in a simple attempt (not exactly what
happened before). See what you can make out of it:

$ bzr version
Bazaar (bzr) 1.3.1rc1
  Python interpreter: /usr/bin/python 2.5.1.final.0
  Python standard library: /usr/lib/python2.5
  bzrlib: /usr/lib/python2.5/site-packages/bzrlib
  Bazaar configuration: /home/antoranz/.bazaar
  Bazaar log file: /home/antoranz/.bzr.log

Copyright 2005, 2006, 2007, 2008 Canonical Ltd.
http://bazaar-vcs.org/

bzr comes with ABSOLUTELY NO WARRANTY. bzr is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.

$ bzr status
$ bzr status -v
$ bzr branch . ../test
Branched 26 revision(s).
$ cd ../test/
$ bzr status
$ bzr mv doc/Cities/Bogota/ doc/Cities/Bogotá
doc/Cities/Bogota => doc/Cities/Bogotá
$ bzr status
renamed:
  doc/Cities/Bogota => doc/Cities/Bogotá
$ bzr commit -m "A"
Committing to: /home/antoranz/bus/test/
renamed doc/Cities/Bogota => doc/Cities/Bogotá
Committed revision 27.
$ bzr status
$ bzr mv doc/Cities/Bogotá/ doc/Cities/Bogota
doc/Cities/Bogotá => doc/Cities/Bogota
$ bzr status
renamed:
  doc/Cities/Bogotá => doc/Cities/Bogota
$ bzr commit -m "B"
Committing to: /home/antoranz/bus/test/
renamed doc/Cities/Bogotá => doc/Cities/Bogota
bzr: ERROR: An inconsistent delta was supplied involving
'doc/Cities/Bogota\xa1/AV Villas.png',
'avvillas.png-20080408194113-wt267gqqk9qfq7jt-1'
reason: working tree does not contain new entry
$ bzr status
working tree is out of date, run 'bzr update'
renamed:
  doc/Cities/Bogotá => doc/Cities/Bogota
$ bzr update
All changes applied successfully.
Updated to revision 28.
$ bzr status
$ ls doc/Cities/
Bogota compress.sh Maracaibo
$ bzr check
checked branch file:///home/antoranz/bus/test/ format Bazaar Branch
Format 6 (bzr 0.15)
checked repository <bzrlib.transport.local.LocalTransport
url=file:///home/antoranz/bus/test/> format
<RepositoryFormatKnitPack1>
   187 revisions
   123 file-ids
   534 unique file texts
  7936 repeated file texts
     0 unreferenced text versions

Matthias Müller-Reineke (matthias-mueller-reineke) wrote : Re: bzr merge - exceptions.UnicodeDecodeError

Dear bzr programmers,

in my branch where the error occurs:

mueller@gvdev(13) > bzr revno
220

mueller@gvdev(13) > find * -type f|wc -l
5454
# '* ' prevents us from counting the content of .bzr

It is from a real system with many files and many revisions. Do I need many files or many revisions to reproduce the error?

Matthias Müller-Reineke (matthias-mueller-reineke) wrote :

(The following is copied from the duplicate of this bug (#189246).)

The "print debugger" showed that in

File "/usr/local/Python-2.5/lib/python2.5/site-packages/bzrlib/dirstate.py",
in _get_output_lines
...
    inventory_text = '\0\n\0'.join(lines)
...

lines contained a mixture of strings and unicode objects. There were strings with UTF-8 encoded file names: I found the UTF-8 encoding of a German umlaut ('ä'), which is part of one of my file names, in a plain string.

This insight made me propose the patch mentioned above.
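
A minimal sketch of the kind of check described above, written for Python 3 rather than the Python 2.5 used in this thread. In Python 2, joining a byte string with a unicode object made the interpreter decode the bytes as ASCII implicitly, which is where the 0xc3 error comes from; Python 3 refuses the mix with a TypeError instead. The helper names `find_text_entries` and `ascii_decode_error` are hypothetical and not part of bzrlib.

```python
def find_text_entries(lines):
    """Return (index, value) pairs for entries that are text, not bytes."""
    return [(index, line) for index, line in enumerate(lines)
            if isinstance(line, str)]


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == '__main__':
    # 'Bogot\u00e1' is "Bogotá"; its UTF-8 form ends in the bytes 0xc3 0xa1.
    utf8_name = 'Bogot\u00e1'.encode('utf-8')
    lines = [b'doc/Cities/' + utf8_name, 'startlinks', b'']
    print(find_text_entries(lines))       # the one entry that is not bytes
    print(ascii_decode_error(utf8_name))  # the same kind of error as in the traceback
```

The first byte of any two-byte UTF-8 sequence for Latin letters such as 'ä', 'á' or 'ñ' is 0xc3, which matches the byte named in the error messages in this report.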

John A Meinel (jameinel) wrote :

Do you have any idea what entries were Unicode?

codeslinger (codeslinger) wrote :

see Bug #187267 for a good and simple repro

I think the focus on merge is a red herring. It happened to me when creating a brand new repository. AFAICT any operation on a file name with an extended char will trigger this.

My repro steps are:

cd /usr/share
bzr init
bzr add

Result similar to above:
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in p

This was on a gentoo system.
Portage 2.1.4.4 (default-linux/x86/2007.0, gcc-4.1.2, glibc-2.5-r4, 2.6.16.29-11774_3 i686)
Linux: 2.6.16.29-11774_3 i686 Intel

Perhaps it is trying to interpret characters as double-byte when in fact it should be treating them as single-byte special characters such as umlauts.

codeslinger (codeslinger) wrote :

People working on this bug should also consider Bug #3918; these bugs have much in common and are probably related.

special characters ought to be allowed in file names, however misguided they may be, they do exist in common user scenarios.

Consider for instance a Japanese freeware icon editor which, although published in English, has a file with Japanese characters in its name. This is on a Windows system, which sees those as single-byte special characters, and it is being accessed on Linux via FUSE. Suddenly you have a mess. It is not okay to arbitrarily disregard characters, however strange they may appear to be. Bottom line is that if the file system allows it to exist, then bzr ought to support/preserve it.

codeslinger (codeslinger) wrote :

a lot of these dups are saying that it's a python bug.

FYI: in the above repro (bzr add) my version of Python is

dev-lang/python: 2.4.4-r5
dev-python/pycrypto: 2.0.1-r6

Well, hopefully there is an easy workaround for Python, because installing updates is a pretty big deal and does not happen without a lot of time, effort, and testing; we need to keep our systems stable.

codeslinger (codeslinger) wrote :

there are only about a zillion dups of this bug, clearly it is a very popular accessory :-)

in Bug #59968

David R Dick wrote:

This root cause of this problem is that we store this data in XML, and
XML1.0 doesn't permit most ASCII control codes.

-----------------

Well, if that's all it is, what's wrong with escaping them?

codeslinger (codeslinger) wrote :

somebody is doing some code changes for this, in Bug #77657

also a correction to the above, it was Aaron who wrote that xml comment in response to David.

John A Meinel (jameinel) wrote :

@codeslinger

there are actually several different bugs, they just happen to provoke a similar error.

@Matthias

I didn't make it clear before, but *all* entries should be UTF-8 encoded. If there is a Unicode object, then there is a bug in the 'lines' being generated. Which is why I asked which ones were Unicode.
I'm trying to reproduce it here, but I have failed so far. I *have* been able to get the "Inconsistent delta" bug to trigger, but that did not give me the UnicodeDecodeError problem that you are talking about.

I would certainly prefer it if we could reproduce this without your 5000 files, I just haven't been able to reproduce it.

John A Meinel (jameinel) wrote :

You might try a patch like this:
=== modified file 'bzrlib/dirstate.py'
--- bzrlib/dirstate.py 2008-03-06 18:21:43 +0000
+++ bzrlib/dirstate.py 2008-04-25 15:13:52 +0000
@@ -1585,6 +1585,11 @@
         lines.append(self._get_ghosts_line(self._ghosts))
         # append the root line which is special cased
         lines.extend(map(self._entry_to_line, self._iter_entries()))
+        for line in lines:
+            if isinstance(line, unicode):
+                import pdb; pdb.set_trace()
+                raise ValueError('nothing in "lines" should be unicode'
+                                 ': %r' % (line,))
         return self._get_output_lines(lines)

     def _get_ghosts_line(self, ghost_ids):

If it encounters a unicode string, it should drop you into the Python debugger. At that point, if you can just give me the output of:

(pdb) pp line

It might be enough for me to track this down.

Matthias Müller-Reineke (matthias-mueller-reineke) wrote :

(Pdb) pp line
u'\x00GV-Sach-Dialoge\x00gvsachdialoge-20080214115823-10hc02e4k4mpk3zu-1\x00a\x00\x000\x00n\x00\x00l\x00startlinks\x000\x00n\x00dierks@gvdev-20080214120028-rd3suxtre8rz96mf\x00a\x00\x000\x00n\x00'
(Pdb)

John A Meinel (jameinel) wrote :

It looks like it is a problem with a symlink. Specifically this line seems to indicate that the file is absent in the current tree, but was a link in the base tree, and was absent in the merged tree.
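
That reading of the line can be checked by splitting it on its NUL separators. The sketch below is Python 3 and assumes a layout inferred from this thread rather than from the dirstate format documentation: three key fields (directory, name, file id) followed by one group of five fields per tree, in the same order as the details tuple (minikind, fingerprint, size, executable, tree_data). The function name `parse_entry_line` is hypothetical.

```python
# The line printed by pdb in the previous comment, split up for readability.
LINE = (
    '\x00GV-Sach-Dialoge\x00gvsachdialoge-20080214115823-10hc02e4k4mpk3zu-1'
    '\x00a\x00\x000\x00n\x00'
    '\x00l\x00startlinks\x000\x00n\x00dierks@gvdev-20080214120028-rd3suxtre8rz96mf'
    '\x00a\x00\x000\x00n\x00'
)


def parse_entry_line(line):
    """Split a NUL-separated entry line into (key, per-tree details).

    Assumes three key fields followed by groups of five detail fields.
    """
    fields = line.split('\x00')
    key = tuple(fields[:3])
    rest = fields[3:]
    trees = [tuple(rest[i:i + 5]) for i in range(0, len(rest), 5)]
    return key, trees


if __name__ == '__main__':
    key, trees = parse_entry_line(LINE)
    print(key)
    for details in trees:
        print(details)
```

Under that assumed layout, the first and third groups have minikind 'a' (absent) and the middle group has minikind 'l' with the fingerprint 'startlinks', which is the symlink target.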

I *don't* know why the symlink is given as unicode rather than utf-8. I suppose if this was being generated from an Inventory Entry...

When adding an entry, we clearly just do 'os.readlink()' which returns an 8-bit string (most likely fs encoded.)

I'm wondering if it has to do with one of the 'update_by_delta' functions.

I would guess the bug is in:

def _inv_entry_to_details(self, inv_entry):
    """Convert an inventory entry (from a revision tree) to state details.

    :param inv_entry: An inventory entry whose sha1 and link targets can be
        relied upon, and which has a revision set.
    :return: A details tuple - the details for a single tree at a path +
        id.
    """
    kind = inv_entry.kind
    minikind = DirState._kind_to_minikind[kind]
    tree_data = inv_entry.revision
    if kind == 'directory':
        fingerprint = ''
        size = 0
        executable = False
    elif kind == 'symlink':
        fingerprint = inv_entry.symlink_target or '' # <---- here
        size = 0
        executable = False
    elif kind == 'file':
        fingerprint = inv_entry.text_sha1 or ''
        size = inv_entry.text_size or 0
        executable = inv_entry.executable
    elif kind == 'tree-reference':
        fingerprint = inv_entry.reference_revision or ''
        size = 0
        executable = False
    else:
        raise Exception("can't pack %s" % inv_entry)
    return (minikind, fingerprint, size, executable, tree_data)

Specifically, inv_entry.symlink_target is probably a unicode string, and we need to be encoding it into something else.
Can you try the attached patch?

The reason we didn't notice is that it only triggers if you

1) Have a symlink and
2) Have non-ascii characters in your tree

Matthias Müller-Reineke (matthias-mueller-reineke) wrote :

Dear John,

your patch (applied to the newest http://bazaar-vcs.org/bzr/bzr.dev/) makes the UnicodeDecodeError disappear.

John A Meinel (jameinel) wrote :

Now the question is what to do if inv_entry.symlink_target is genuinely a Unicode string (with non-ascii characters).
In 'WT4._generate_inventory' we do:
elif kind == 'symlink':
    inv_entry.executable = False
    inv_entry.text_size = None
    inv_entry.symlink_target = utf8_decode(fingerprint)[0]

So I'm guessing we should do something like:

    fingerprint = inv_entry.symlink_target
    if isinstance(fingerprint, unicode):
        fingerprint = fingerprint.encode('UTF-8')
    elif fingerprint is None:
        fingerprint = ''

Come to think of it, I *think* inv_entry.symlink_target should actually be Unicode all the time. (I'm not positive, though, so we might want some asserts to check the behavior.)
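
For readers on Python 3, the idea above might be sketched as a standalone helper. This is only an illustration of the encoding step discussed here, not the fix that was committed to bzr; the function name `symlink_fingerprint` is hypothetical, and treating incoming bytes as already UTF-8 is an assumption.

```python
def symlink_fingerprint(symlink_target):
    """Return a symlink target as UTF-8 bytes, suitable for joining with
    other byte strings.

    None becomes an empty byte string, text is encoded as UTF-8, and bytes
    are passed through unchanged (assumed to be UTF-8 already).
    """
    if symlink_target is None:
        return b''
    if isinstance(symlink_target, str):
        return symlink_target.encode('utf-8')
    return symlink_target


if __name__ == '__main__':
    # Every field is bytes after conversion, so the join cannot mix types.
    fields = [b'l', symlink_fingerprint('startlinks'), b'0', b'n']
    print(b'\x00'.join(fields))
```

Doing the conversion where the details tuple is built, rather than at the final join, follows Martin Pool's earlier remark that the encoding belongs in the code that builds up the dirstate.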

John A Meinel (jameinel)
Changed in bzr:
importance: Undecided → Medium
status: New → Triaged

John A Meinel (jameinel)
Changed in bzr:
status: Triaged → Fix Committed

John A Meinel (jameinel) wrote :

I think this will make it into 1.6, but it might slip to 1.7 depending on how much time people have to do the review.

Changed in bzr:
assignee: nobody → jameinel
milestone: none → 1.6

Edmundo (eantoranz) wrote :

Don't know if this is related to the same kind of problem. I'm not using filenames with UTF-8 characters, so I don't think it's directly related.... but anyway, here it goes. From .bzr.log:

0.123 encoding stdout as sys.stdout encoding 'UTF-8'
0.124 bzr arguments: [u'merge', u'--remember', u'/home/antoranz/eclipse/workspace/interfaces/']
0.124 looking for plugins in /home/antoranz/.bazaar/plugins
0.124 looking for plugins in /usr/lib/python2.5/site-packages/bzrlib/plugins
0.125 Plugin name __init__ already loaded
0.125 Plugin name __init__ already loaded
0.247 opening working tree '/home/antoranz/RSN/interfaces/stable'
0.284 Using fetch logic to copy between KnitPackRepository('file:///home/antoranz/eclipse/workspace/interfaces/.bzr/repository/')(<RepositoryFormatKnitPack1>) and KnitPackRepository('file:///home/antoranz/RSN/interfaces/stable/.bzr/repository/')(<RepositoryFormatKnitPack1>)
[ 1019] 2008-06-19 15:01:29.015 INFO: +N docs/interfaces/contables/edc/20070630COZ900GENEDC101.FT
[ 1019] 2008-06-19 15:01:29.015 INFO: +N docs/interfaces/contables/edc/DF-FINANZAS-VBATCH-CARGUE_EDC.xls
[ 1019] 2008-06-19 15:01:29.016 INFO: +N docs/interfaces/contables/edc/th_edc01.4gl
[ 1019] 2008-06-19 15:01:29.016 INFO: +N docs/interfaces/topsoft/12df/20070316COZ901MDGIPV201.FT
[ 1019] 2008-06-19 15:01:29.016 INFO: +N docs/interfaces/topsoft/12df/th_pv03.4gl
[ 1019] 2008-06-19 15:01:29.016 INFO: +N docs/interfaces/topsoft/15df/Compras.xls
[ 1019] 2008-06-19 15:01:29.017 INFO: +N docs/interfaces/topsoft/16df/Importaciones.xls
[ 1019] 2008-06-19 15:01:29.017 INFO: +N docs/interfaces/topsoft/ts03/Cesion.xls
[ 1019] 2008-06-19 15:01:29.017 INFO: +N interfaces/clases/Interfaz_12DF.php
[ 1019] 2008-06-19 15:01:29.017 INFO: +N interfaces/clases/Interfaz_TS04.php
[ 1019] 2008-06-19 15:01:29.018 INFO: RM* docs/interfaces/contables/13df/13DF-FINANZAS-VENTANA_BATCH-PAGOS_EFECTUADOS ok.xls => docs/interfaces/contables/13df/DF-FINANZAS-VENTANA_BATCH-PAGOS_EFECTUADOS.xls
[ 1019] 2008-06-19 15:01:29.018 INFO: M docs/interfaces/contables/edc/edc.sql
[ 1019] 2008-06-19 15:01:29.018 INFO: RM* docs/interfaces/topsoft/12df/12DF-FINANZAS-VENTANA_BATCH-RAPELES_APORTACIONES ok.xls => docs/interfaces/topsoft/12df/12DF-FINANZAS-VENTANA_BATCH-RAPELES_APORTACIONES.xls
[ 1019] 2008-06-19 15:01:29.019 INFO: M docs/interfaces/topsoft/12df/12df.sql
[ 1019] 2008-06-19 15:01:29.019 INFO: M docs/interfaces/topsoft/15df/15df.sql
[ 1019] 2008-06-19 15:01:29.019 INFO: M docs/interfaces/topsoft/16df/16df.sql
[ 1019] 2008-06-19 15:01:29.019 INFO: M docs/interfaces/topsoft/ts03/ts03.sql
[ 1019] 2008-06-19 15:01:29.020 INFO: M docs/interfaces/topsoft/ts04/carr54.php
[ 1019] 2008-06-19 15:01:29.020 INFO: M docs/interfaces/topsoft/ts04/ts04.sql
[ 1019] 2008-06-19 15:01:29.020 INFO: M docs/status_interfaces.txt
[ 1019] 2008-06-19 15:01:29.021 INFO: M interfaces/Comprobante.php
[ 1019] 2008-06-19 15:01:29.021 INFO: M interfaces/Contabilidad.php

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

0.722 return code 4

This was the output from the merge:
bzr merge --remember /home/antoranz/eclipse/workspace/inter...

Edmundo (eantoranz) wrote :

It's not working very well. I was having problems with the stable branch, so I decided to recreate the stable branch from the development one. I just tried to create a checkout of the development repository and I get the error:
bzr checkout /home/antoranz/eclipse/workspace/interfaces stable/
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 1024, in run
    accelerator_tree, hardlink)
  File "/usr/lib/python2.5/site-packages/bzrlib/branch.py", line 768, in create_checkout
    hardlink=hardlink)
  File "/usr/lib/python2.5/site-packages/bzrlib/bzrdir.py", line 1260, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree_4.py", line 1367, in initialize
    wt.unlock()
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree_4.py", line 1131, in unlock
    self.flush()
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree_4.py", line 291, in flush
    self.current_dirstate().save()
  File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 2066, in save
    self._state_file.writelines(self.get_lines())
  File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 1591, in get_lines
    return self._get_output_lines(lines)
  File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 1928, in _get_output_lines
    inventory_text = '\0\n\0'.join(lines)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

bzr 1.5 on python 2.5.2 (linux2)
arguments: ['/usr/bin/bzr', 'checkout', '/home/antoranz/eclipse/workspace/interfaces', 'stable/']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'es_CO.UTF-8'
plugins:
  gtk /usr/lib/python2.5/site-packages/bzrlib/plugins/gtk [0.93.0]
  launchpad /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

From the log:
0.137 encoding stdout as sys.stdout encoding 'UTF-8'
0.138 bzr arguments: [u'checkout', u'/home/antoranz/eclipse/workspace/interfaces', u'stable/']
0.138 looking for plugins in /home/antoranz/.bazaar/plugins
0.138 looking for plugins in /usr/lib/python2.5/site-packages/bzrlib/plugins
0.138 Plugin name __init__ already loaded
0.138 Plugin name __init__ already loaded
0.160 encoding stdout as sys.stdout encoding 'UTF-8'
0.258 opening working tree '/home/antoranz/eclipse/workspace/interfaces'
0.271 opening working tree '/home/antoranz/eclipse/workspace/interfaces'
0.272 create...

Edmundo (eantoranz) wrote :

I re-rechecked (:-)) the filenames and I found one file that had a "ñ" in it. I "bzr mv"ed it to something without the ñ and now I'm able to checkout from that branch.
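
A small sketch of how such a file could be located without checking names by hand. It is Python 3, the function names `has_non_ascii` and `non_ascii_paths` are hypothetical, and skipping the `.bzr` directory is a choice made here so that bzr's own control files are not reported.

```python
import os


def has_non_ascii(name):
    """Return True if the name contains any character outside ASCII."""
    return any(ord(char) > 127 for char in name)


def non_ascii_paths(root):
    """Yield paths under root whose file or directory name is not pure ASCII."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune bzr's control directory in place so os.walk does not enter it.
        dirnames[:] = [d for d in dirnames if d != '.bzr']
        for name in dirnames + filenames:
            if has_non_ascii(name):
                yield os.path.join(dirpath, name)


if __name__ == '__main__':
    for path in non_ascii_paths('.'):
        print(path)
```

Renaming the reported files, as done above, only sidesteps the crash; it does not address the underlying encoding problem.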

Daniel Clemente (n142857) wrote :

Bug 238365 causes the same failure due to the same conditions (symbolic link to file with Unicode name) but happens in a plugin (fast-import). There's a simple testcase to reproduce the bug.

Edmundo (eantoranz) wrote :

I'm working on a HUGE project (that is a little bit disordered, by the way :-)).

I KNOW that there are filenames that have strange characters.

I just tried bzr status and here's what happens (with bzr 1.6 b2):
$ ~/instaladores/bzr-1.6b2/bzr status
bzr: ERROR: exceptions.UnicodeDecodeError: 'utf8' codec can't decode bytes in position 43-46: invalid data

Traceback (most recent call last):
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/builtins.py", line 178, in run
    show_pending=not no_pending)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/status.py", line 117, in show_tree_status
    want_unversioned=want_unversioned)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/tree.py", line 93, in changes_from
    want_unversioned=want_unversioned,
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/decorators.py", line 127, in read_locked
    return unbound(self, *args, **kwargs)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/tree.py", line 734, in compare
    want_unversioned=want_unversioned)
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/delta.py", line 217, in _compare_trees
    want_unversioned=want_unversioned):
  File "/home/antoranz/instaladores/bzr-1.6b2/bzrlib/workingtree_4.py", line 2450, in iter_changes
    (None, utf8_decode(current_path_info[0])[0]),
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 43-46: invalid data

bzr 1.6b2 on python 2.5.2 (linux2)
arguments: ['/home/antoranz/instaladores/bzr-1.6b2/bzr', 'status']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'es_CO.UTF-8'
plugins:
  gtk /usr/lib/python2.5/site-packages/bzrlib/plugins/gtk [0.93.0]
  launchpad /home/antoranz/instaladores/bzr-1.6b2/bzrlib/plugins/launchpad [unknown]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

The same kind of problem happens with 1.5 (stable):
$ bzr status
bzr: ERROR: exceptions.UnicodeDecodeError: 'utf8' codec can't decode bytes in position 43-46: invalid data

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", ...
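
The 'utf8' codec error in these tracebacks means that some path contains bytes that are not valid UTF-8, for example a name written in another encoding such as Latin-1. A minimal Python 3 sketch for listing such names follows; `is_valid_utf8` and `undecodable_paths` are hypothetical helper names, and the walk works on bytes so that no decoding happens before the check.

```python
import os


def is_valid_utf8(raw_name):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw_name.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def undecodable_paths(root):
    """Yield byte paths under root whose names are not valid UTF-8."""
    # Passing bytes to os.walk makes it yield raw byte names.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


if __name__ == '__main__':
    for raw_path in undecodable_paths('.'):
        print(repr(raw_path))
```

Printing with `repr` shows the offending bytes as escapes, which makes it easier to guess the encoding the name was originally written in.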

Daniel Clemente (n142857) wrote :

As some users noticed, this is still not supported. I filed bug 272444 for that.

Vincent Ladeuil (vila) wrote :

@John:
Should this bug be reopened? Did your fix find its way into 1.6?

John A Meinel (jameinel) wrote :

This fix *is* in 1.6, but that doesn't mean we support symlinks pointing to non-ascii names. (See bug #272444). What works now is that *if* you have a symlink and you have a regular file with non-ascii names it doesn't crash.

So I think *this* bug is closed, and bug #272444 is still open.

Changed in bzr:
status: Fix Committed → Fix Released