Comment 3 for bug 77657

Revision history for this message
John A Meinel (jameinel) wrote :

Well, the reason we don't support non-Unicode paths is that internally all paths are handled as Unicode.

There is also a specific reason for this: non-ASCII characters are handled very differently across platforms.

Specifically, Windows has an OEM codepage and a Unicode API. With the OEM codepage you might be able to handle a non-ASCII character if it exists in your codepage, though its byte value will be codepage-dependent. The Unicode API lets you create any valid Unicode filename, which means I can create an Arabic filename on a Russian Windows installation.

Further, Mac OS X handles filenames in a very different fashion: it normalizes Unicode names, and does so with a method different from other common normalizations. Specifically, a filename like å.txt will show up as '\xe5.txt' on most systems, but on Mac it is 'a\u030a.txt'.

They are both valid from a Unicode standpoint. The first is the single precomposed character "(a with ring)"; the second is a plain "a" followed by a combining "(ring above)".
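As an illustration (not bzr code, just Python's standard unicodedata module), the two spellings compare unequal until you normalize them to the same form:

```python
import unicodedata

nfc = "\u00e5.txt"   # precomposed: LATIN SMALL LETTER A WITH RING ABOVE
nfd = "a\u030a.txt"  # decomposed: 'a' + COMBINING RING ABOVE (Mac-style)

# They render identically but are different code point sequences:
print(nfc == nfd)                                  # False
# Normalizing both to one form makes them compare equal:
print(unicodedata.normalize("NFC", nfd) == nfc)    # True
print(unicodedata.normalize("NFD", nfc) == nfd)    # True
```

This is exactly why a file committed on a Mac can fail to match the "same" filename recorded on Linux.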

Anyway, this is just to say that inserting an arbitrary character code on one filesystem will usually not be properly represented on another filesystem, especially once you get into codepages and encodings. (You want me to version \xe5, which in latin-1 is å, but in iso-8859-2 it is ĺ, in iso-8859-15 it is å, and in cp1251 (Russian) it is е.)
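You can check that claim yourself; the same byte 0xE5 decodes to a different character under each codepage:

```python
# One raw byte, four different interpretations depending on the codepage:
raw = b"\xe5"
for codec in ("latin-1", "iso8859-2", "iso8859-15", "cp1251"):
    print(codec, "->", raw.decode(codec))
# latin-1     -> å
# iso8859-2   -> ĺ
# iso8859-15  -> å
# cp1251      -> е  (Cyrillic)
```

So a raw byte in a filename is only meaningful relative to some encoding, which the filesystem generally doesn't record.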

It makes far more sense to version Unicode filenames, since they have a *chance* at being portable.