Comment 19 for bug 135320

John A Meinel (jameinel) wrote : Re: bzr merge - exceptions.UnicodeDecodeError

It looks like it is a problem with a symlink. Specifically this line seems to indicate that the file is absent in the current tree, but was a link in the base tree, and was absent in the merged tree.

I *don't* know why the symlink is given as unicode rather than utf-8. I suppose if this was being generated from an Inventory Entry...

When adding an entry, we clearly just do 'os.readlink()' which returns an 8-bit string (most likely fs encoded.)
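A quick way to see the "bytes in, bytes out" behaviour is the sketch below. It is not bzr code: the helper name is made up for illustration, it assumes a POSIX filesystem where creating symlinks is allowed, and it is written for Python 3 (where `os.readlink()` returns bytes only when given a bytes path).

```python
import os
import shutil
import tempfile


def read_link_as_bytes(target_bytes):
    """Hypothetical helper: create a symlink to target_bytes and read it back.

    Only bytes paths are used, so no filesystem-encoding guesswork happens.
    """
    tmp_dir = os.fsencode(tempfile.mkdtemp())
    try:
        link_path = os.path.join(tmp_dir, b'link')
        # The target does not need to exist; a dangling symlink is fine here.
        os.symlink(target_bytes, link_path)
        return os.readlink(link_path)
    finally:
        shutil.rmtree(tmp_dir)


raw_target = read_link_as_bytes(b'caf\xc3\xa9')
```

The value that comes back is the raw 8-bit target exactly as stored, with no decoding applied.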

I'm wondering if it has to do with one of the 'update_by_delta' functions.

I would guess the bug is in:

def _inv_entry_to_details(self, inv_entry):
    """Convert an inventory entry (from a revision tree) to state details.

    :param inv_entry: An inventory entry whose sha1 and link targets can be
        relied upon, and which has a revision set.
    :return: A details tuple - the details for a single tree at a path +
        id.
    """
    kind = inv_entry.kind
    minikind = DirState._kind_to_minikind[kind]
    tree_data = inv_entry.revision
    if kind == 'directory':
        fingerprint = ''
        size = 0
        executable = False
    elif kind == 'symlink':
        fingerprint = inv_entry.symlink_target or '' # <---- here
        size = 0
        executable = False
    elif kind == 'file':
        fingerprint = inv_entry.text_sha1 or ''
        size = inv_entry.text_size or 0
        executable = inv_entry.executable
    elif kind == 'tree-reference':
        fingerprint = inv_entry.reference_revision or ''
        size = 0
        executable = False
    else:
        raise Exception("can't pack %s" % inv_entry)
    return (minikind, fingerprint, size, executable, tree_data)

Specifically, inv_entry.symlink_target is probably a unicode string, and we need to be encoding it into something else.
Can you try the attached patch?
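
The attached patch is not reproduced here. As a rough sketch of the idea only, the symlink branch could normalise the target to a UTF-8 byte string before it is stored as the fingerprint. The helper name below is hypothetical, and UTF-8 is an assumption about what the dirstate expects.

```python
def symlink_fingerprint(symlink_target):
    """Hypothetical helper: return the symlink target as UTF-8 bytes.

    Accepts None, an 8-bit string, or a unicode string.
    """
    if symlink_target is None:
        return b''
    if isinstance(symlink_target, bytes):
        # Already an 8-bit string; assume it is correctly encoded.
        return symlink_target
    # Unicode string: encode explicitly rather than relying on an
    # implicit (ASCII) conversion later on.
    return symlink_target.encode('utf-8')
```

With something like this, the `kind == 'symlink'` branch would use `symlink_fingerprint(inv_entry.symlink_target)` in place of `inv_entry.symlink_target or ''`.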

The reason we didn't notice is that it only triggers if you:

1) Have a symlink and
2) Have non-ascii characters in your tree
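
Why non-ascii matters, as a guess at the mechanism: when Python 2 mixes a unicode string with an 8-bit string, it decodes the 8-bit one as ASCII behind the scenes. The sketch below performs that decode explicitly (the function name is illustrative only) to show that pure-ASCII data passes silently while UTF-8 encoded non-ascii data raises UnicodeDecodeError.

```python
def survives_implicit_ascii_decode(raw_bytes):
    """Return True if raw_bytes can be decoded as ASCII.

    This mimics the implicit conversion Python 2 applies when an 8-bit
    string is combined with a unicode string.
    """
    try:
        raw_bytes.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True


ascii_ok = survives_implicit_ascii_decode(b'plain-target')
non_ascii_ok = survives_implicit_ascii_decode(b'caf\xc3\xa9')
```

Here `ascii_ok` is True and `non_ascii_ok` is False, which would explain why trees with only ASCII paths never hit the error.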