Comment 14 for bug 495880

Ma Hsiao-chun (mahsiaochun) wrote : Re: Extracting a file with german "Umlaut" in the filename doesn't work

Hey guys :)

It is worth knowing that ZIP archives can come with different encodings for file names.

The old standard encoding for ZIP is CP437 [1]. Since CP437 only covers the needs of certain regions of the world, people on Windows began to use whatever local encoding was available. For example, ZIP archives created on the Simplified Chinese version of Windows use CP936 [2].
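To make the mismatch concrete, here is a small Python sketch of what happens when a name stored in CP936 is read by a tool that assumes CP437, and how the original can be recovered if you know (or guess) the real encoding. The file name is a made-up example, and this only illustrates the byte-level effect; it is not what unzip does internally.

```python
# Hypothetical file name, used only for illustration.
name = "中文.txt"

# Bytes as a CP936-based tool would store them in the archive.
raw = name.encode("cp936")

# What a reader that assumes CP437 shows instead (mojibake).
garbled = raw.decode("cp437")

# Undo the wrong decoding, then decode with the encoding actually used.
recovered = garbled.encode("cp437").decode("cp936")

print(repr(garbled))
print(recovered)
```

The round trip works because CP437 assigns a character to every byte value, so no information is lost by the wrong decoding. The hard part in practice is that the archive does not say which local encoding was used.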

In 2007, optional UTF-8 support was added to the ZIP standard [3]. Unfortunately, the unzip pre-installed on Linux/Mac OS X and the built-in ZIP support of MS Windows don't support the new standard well.
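The UTF-8 support is signalled per entry by bit 11 of the general purpose flags. As a rough sketch, Python's standard zipfile module exposes those flags as ZipInfo.flag_bits, so you can check which entries declare UTF-8 names. The helper name_is_utf8 and the sample file names are my own, just for illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def name_is_utf8(info: zipfile.ZipInfo) -> bool:
    """Return True if the entry declares a UTF-8 encoded file name."""
    return bool(info.flag_bits & UTF8_FLAG)


# Build a small archive in memory with one non-ASCII and one ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Übung.txt", b"demo")
    zf.writestr("plain.txt", b"demo")

# Read it back and report the flag for each entry.
with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: name_is_utf8(info) for info in zf.infolist()}

for filename, is_utf8 in flags.items():
    print(filename, is_utf8)
```

Python's zipfile sets the flag only when a name cannot be stored as plain ASCII. For entries without the flag it falls back to CP437 when reading, which is exactly where archives written in a local encoding end up garbled.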

I know some people want unzip to be fixed, but the unzip upstream seems inactive. And unzip is a program that supports so many platforms (including VMS!), so it may be a bit hard to hack on.

I would recommend 7z archives for cross-platform archive exchange, since the format seems to have supported Unicode file names from day one.

1. http://en.wikipedia.org/wiki/Code_page_437
2. http://en.wikipedia.org/wiki/Code_page_936
3. http://www.pkware.com/documents/casestudies/APPNOTE.TXT