Comment 2 for bug 371597

John A Meinel (jameinel) wrote : Re: bzr init --development-rich-root crashed

This is the generic dirstate initialize() code.

DirStateWorkingTree.initialize(revision_id=XXX)

Which then calls down to:

TreeTransform.build_tree(basis_tree, wt, accelerator_tree, ...)

Which then does this:
    existing_files = set()
    for dir, files in wt.walkdirs():
        existing_files.update(f[0] for f in files)

I believe the idea is that if you ran "bzr co" in that directory, it would reconcile files that already exist with the files it is trying to create. It *does* seem silly to do this for 'bzr init', since init shouldn't be trying to create any files.

If it is considered critical, then we could do something in DSRT.initialize() to special case 'revision_id==NULL_REVISION'.
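
A minimal sketch of what such a special case could look like, written as a standalone function rather than real bzrlib code. The function name, the `walkdirs` callable, and the `NULL_REVISION` stand-in value are all assumptions made for illustration; where the check would actually live is still an open question.

```python
# Stand-in for the null revision marker (assumed value, for illustration only).
NULL_REVISION = b"null:"


def existing_files_for_build(walkdirs, revision_id):
    """Collect the names already on disk, unless there is nothing to build.

    `walkdirs` is any callable yielding (dir, files) pairs where each entry
    in `files` is a tuple whose first element is the file name.
    """
    if revision_id == NULL_REVISION:
        # An empty tree creates no files, so there is nothing to conflict
        # with and no need to list (and sort) the directory at all.
        return set()
    existing_files = set()
    for _dir, files in walkdirs():
        existing_files.update(f[0] for f in files)
    return existing_files
```

With this shape, 'bzr init' would never walk the directory, so it would not trip over the undecodable filename. It would only defer the failure, not fix it.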

However, I'll also note that although it was initialize that failed here, 'bzr add' will fail for exactly the same reason later on. Namely, you have a file with a non-ASCII name that doesn't conform to whatever you claim your filesystem encoding is. (My guess is you have a latin-1 filename and a UTF-8 encoding.)
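
One way to check that guess is to look at the raw byte names and see which ones fail to decode under the encoding you think you have. This is a rough sketch, not anything bzr ships; the helper name and the sample filenames are made up.

```python
def undecodable_names(raw_names, encoding):
    """Return the byte-string names that cannot be decoded with `encoding`."""
    bad = []
    for raw in raw_names:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


# The same name, written out under two different encodings (sample data).
latin1_name = u"caf\u00e9.txt".encode("latin-1")
utf8_name = u"caf\u00e9.txt".encode("utf-8")

# Only the latin-1 bytes are invalid when the filesystem claims to be UTF-8.
suspects = undecodable_names([latin1_name, utf8_name, b"plain.txt"], "utf-8")
```

To run it against a real directory, pass in the result of os.listdir() called with a byte-string path, so that the names come back undecoded.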

The error you see is because

1) os.listdir(u'unicode-string') is supposed to return unicode strings, with each filename decoded
2) When it encounters a string in the filesystem that cannot be decoded, it returns a plain byte string
3) When doing 'sorted(list_of_mixed_unicode_and_str)' it auto-upcasts the plain strings to unicode in order to compare them, and that fails, because Python's default encoding for that implicit conversion is ascii, which cannot handle any non-ascii paths.
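
The three steps above can be emulated in a few lines. This is only a sketch: it makes the implicit ascii upcast explicit so that it behaves the same way on a current Python, and the helper names and sample listing are hypothetical.

```python
def implicit_upcast(name, default_encoding="ascii"):
    """Mimic the automatic str -> unicode conversion done during comparison."""
    if isinstance(name, bytes):
        # Raises UnicodeDecodeError for any byte outside the ascii range.
        return name.decode(default_encoding)
    return name


def sort_mixed(names):
    """Sort a list of mixed text and byte names by their upcast form."""
    return sorted(names, key=implicit_upcast)


# What os.listdir(u'...') might hand back: decoded names, plus one raw
# byte string for the entry it could not decode (sample data).
listing = [u"setup.py", u"caf\u00e9".encode("latin-1"), u"README"]

try:
    sort_mixed(listing)
    sort_failed = False
except UnicodeDecodeError:
    sort_failed = True
```

A byte name that happens to be pure ascii upcasts cleanly, which is why the problem only shows up once a non-ascii filename is present.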

Anyway, we've had this bug in 'bzr add' for a long time, such as bug #187267. ATM, we refuse to version files that can't be stored as Unicode strings, and we expect people to have their filesystem encoding set correctly.

Avoiding this bug more completely would require a lot of reworking of internals, and also a policy decision as to how we want to handle these things.