Comment 4 for bug 7427

Enrico Zini (enrico) wrote :

(In reply to comment #3)
> No, I'm absolutely certain that I'm in a UTF-8 locale, and I'm really very
> sure that I'm not pretending that ISO-8859-1 strings are UTF-8. Try it for yourself.
> $ cat frederic
> Frédéric
> $ file frederic
> frederic: UTF-8 Unicode text
> $ od -tx1 < frederic
> 0000000 46 72 c3 a9 64 c3 a9 72 69 63 0a
> 0000013
> $ iconv -f UTF-8 -t ASCII < frederic
> Friconv: illegal input sequence at position 2
> $ iconv -c -f UTF-8 -t ASCII < frederic
> Frdric
> C3 A9 is the correct UTF-8 encoding of U+00E9, LATIN SMALL LETTER E WITH ACUTE.
> None of my tests with either recode or iconv have been able to get them to
> transcode this into the closest-possible ASCII representation without just
> leaving out the characters whose codepoints lie outside ASCII. Again, if you
> know how to get them to do this, I'd be interested to hear about it.

Something weird is going on:

$ echo è| recode utf-8..ascii
`e
$ echo é |LANG=C recode utf-8..ascii
recode: Invalid input in step `UTF-8..ANSI_X3.4-1968'
$ echo é |LANG=C iconv -f utf-8 -t ascii
iconv: illegal input sequence at position 0

wtf...

How can some UTF-8 characters be more UTF-8 than others, and only for recode? And
how come iconv has a different idea of Unicode than recode?

The only thing I can suggest for now is:
  reportbug iconv; reportbug recode

I'm sorry I have no other clues at the moment.
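
As a stopgap outside recode and iconv, here is a minimal sketch of one way to get a closest-possible ASCII form, assuming Python is available. It does not explain the behaviour above. `to_ascii` is a hypothetical helper name, and the approach (Unicode NFKD decomposition, then dropping whatever is still outside ASCII) is my assumption about what "closest" should mean here:

```python
import unicodedata


def to_ascii(text):
    """Approximate text in ASCII (hypothetical helper, not part of recode/iconv).

    NFKD splits an accented letter into its base letter plus a combining
    mark; encoding with errors="ignore" then drops the combining mark.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# U+00E9 is the two bytes C3 A9 in UTF-8, matching the od output quoted above.
print("\u00e9".encode("utf-8"))   # b'\xc3\xa9'

print(to_ascii("Fr\u00e9d\u00e9ric"))  # Frederic
```

Characters with no decomposition (ø or ß, for example) are still dropped entirely by this sketch, so it only helps for accented letters.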

Ciao,

Enrico