Entering names with accents causes problems

Bug #7427 reported by Enrico Zini
Affects          Status        Importance  Assigned to   Milestone
shadow (Ubuntu)  Fix Released  Medium      Colin Watson

Bug Description

If I have accents in my real name, I'm happy I can enter them when I'm asked
about my real name. I'm even more happy to find those accents in the proposed
username, and I think "cool! Unix got to accept accents for usernames!", and
hit Enter again. At that point, the installation just asks about my name again.

A good way to generate a good username from the user's name would be to
transcode from the input charset to ASCII (recode can do that; I hope
something similar is available at base-config time), then convert to lowercase
and remove all non-legal characters. Frédéric would become Fre'de'ric and then
frederic. I don't know if some localization libraries offer better transcoding
functions.
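
The transcode, lowercase and strip steps described above can be sketched as follows. This is only an illustration in Python, which is an assumption (the thread itself only discusses recode, iconv and Perl); the function name `to_ascii_username` and the set of legal characters are hypothetical. It uses Unicode decomposition rather than a charset converter, so it only handles letters that decompose into an ASCII base letter plus combining marks.

```python
import re
import unicodedata


def to_ascii_username(name: str) -> str:
    """Derive an ASCII-only username candidate from a name (illustrative only)."""
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base characters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then remove everything outside an ASSUMED legal set.
    return re.sub(r"[^a-z0-9_-]", "", base.lower())


print(to_ascii_username("Frédéric"))  # frederic
```

Letters with no such decomposition (for example đ) are simply removed, so a name written entirely in a non-Latin script can come out empty.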

Bye,

Enrico

Colin Watson (cjwatson) wrote :

I don't seem to be able to persuade recode to do this, even with -f:

  $ echo 'Frédéric' | recode UTF-8..ASCII
  Frrecode: Invalid input in step `UTF-8..ANSI_X3.4-1968'
  $ echo 'Frédéric' | recode -f UTF-8..ASCII
  Fric

The best that's available when base-config is invoked is probably iconv. That
performs slightly less badly, but still not well enough:

  $ echo 'Frédéric' | iconv -f UTF-8 -t ASCII
  Friconv: illegal input sequence at position 2
  $ echo 'Frédéric' | iconv -c -f UTF-8 -t ASCII
  Frdric

The Perl Text::Iconv module's convert() method just returns undef.

If you know how to persuade any of these tools to do the right thing, I'd be
interested to hear it.

Enrico Zini (enrico) wrote :

(In reply to comment #1)
> $ echo 'Frédéric' | iconv -f UTF-8 -t ASCII
> Friconv: illegal input sequence at position 2
> $ echo 'Frédéric' | iconv -c -f UTF-8 -t ASCII
> Frdric
> The Perl Text::Iconv module's convert() method just returns undef.
> If you know how to persuade any of these tools to do the right thing, I'd be
> interested to hear it.

You are feeding latin1 sequences pretending they are UTF-8: use -f latin1 instead.

Ciao,

Enrico

Colin Watson (cjwatson) wrote :

No, I'm absolutely certain that I'm in a UTF-8 locale, and I'm really very sure
that I'm not pretending that ISO-8859-1 strings are UTF-8. Try it for yourself.

  $ cat frederic
  Frédéric
  $ file frederic
  frederic: UTF-8 Unicode text
  $ od -tx1 < frederic
  0000000 46 72 c3 a9 64 c3 a9 72 69 63 0a
  0000013
  $ iconv -f UTF-8 -t ASCII < frederic
  Friconv: illegal input sequence at position 2
  $ iconv -c -f UTF-8 -t ASCII < frederic
  Frdric

C3 A9 is the correct UTF-8 encoding of U+00E9, LATIN SMALL LETTER E WITH ACUTE.
None of my tests with either recode or iconv have been able to get them to
transcode this into the closest-possible ASCII representation without just
leaving out the characters whose codepoints lie outside ASCII. Again, if you
know how to get them to do this, I'd be interested to hear about it.
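
The byte dump above can be cross-checked independently. The sketch below uses Python, which is an assumption (it is not one of the tools tested in this thread), to confirm the UTF-8 encoding and to show that Python's own ASCII codec offers the same two choices: fail, or drop the character. It says nothing about how recode or iconv are implemented.

```python
name = "Frédéric"

# UTF-8 encoding of the name (without the trailing newline that od shows as 0a).
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex(" "))  # 46 72 c3 a9 64 c3 a9 72 69 63

# A strict conversion to ASCII fails at the first non-ASCII character.
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict conversion failed at character index", exc.start)

# Ignoring unconvertible characters silently drops them.
dropped = name.encode("ascii", errors="ignore").decode("ascii")
print(dropped)  # Frdric
```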

Enrico Zini (enrico) wrote :

(In reply to comment #3)
> No, I'm absolutely certain that I'm in a UTF-8 locale, and I'm really very sure
> that I'm not pretending that ISO-8859-1 strings are UTF-8. Try it for yourself.
> $ cat frederic
> Frédéric
> $ file frederic
> frederic: UTF-8 Unicode text
> $ od -tx1 < frederic
> 0000000 46 72 c3 a9 64 c3 a9 72 69 63 0a
> 0000013
> $ iconv -f UTF-8 -t ASCII < frederic
> Friconv: illegal input sequence at position 2
> $ iconv -c -f UTF-8 -t ASCII < frederic
> Frdric
> C3 A9 is the correct UTF-8 encoding of U+00E9, LATIN SMALL LETTER E WITH ACUTE.
> None of my tests with either recode or iconv have been able to get them to
> transcode this into the closest-possible ASCII representation without just
> leaving out the characters whose codepoints lie outside ASCII. Again, if you
> know how to get them to do this, I'd be interested to hear about it.

Something weird is going on:

$ echo è| recode utf-8..ascii
`e
$ echo é |LANG=C recode utf-8..ascii
recode: Invalid input in step `UTF-8..ANSI_X3.4-1968'
$ echo é |LANG=C iconv -f utf-8 -t ascii
iconv: illegal input sequence at position 0

wtf...

How can some UTF-8 characters be more UTF-8 than others, and only for recode? And
how come iconv has a different idea of Unicode than recode?

The only thing I can suggest now is a:
  reportbug iconv; reportbug recode

I'm sorry I have no other clues at the moment.

Ciao,

Enrico

Matt Zimmerman (mdz) wrote :

This bug lies somewhere other than base-config; where should it be forwarded?

Colin Watson (cjwatson) wrote :

*** Bug 12775 has been marked as a duplicate of this bug. ***

Colin Watson (cjwatson) wrote :

We at least seem to get a proper error message nowadays, explaining what
usernames are valid, although it's insufficiently clear that "letters" must be
ASCII.

Colin Watson (cjwatson) wrote :

A thought about how to make this less bad in a simple way: if the attempt to
automatically generate a username produces something that isn't usable, then
simply don't try to automatically generate the username, and leave it blank.
This would mean that at least somebody entering "José" as their first name would
not be presented with the invalid "josé" username. Since ASCII transliteration
might in some cases be very difficult (consider names in non-Latin scripts),
this is probably the best we can do until full Unicode usernames are safe
everywhere.
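
The fallback described above could look roughly like this. It is a sketch in Python, not the actual installer code; `default_username`, `VALID_USERNAME` and the exact validity rule are assumptions made for illustration.

```python
import re

# ASSUMED validity rule, for illustration only: a lowercase ASCII letter
# followed by ASCII letters, digits, underscores or hyphens. The real rule
# is whatever the installer enforces.
VALID_USERNAME = re.compile(r"[a-z][a-z0-9_-]*")


def default_username(real_name: str) -> str:
    """Return a default username, or "" if the obvious candidate is invalid."""
    words = real_name.split()
    if not words:
        return ""
    # The obvious candidate: the first word of the name, lowercased.
    candidate = words[0].lower()
    # Offer it only if it is usable; otherwise leave the field blank.
    return candidate if VALID_USERNAME.fullmatch(candidate) else ""


print(repr(default_username("José")))            # ''
print(repr(default_username("Ante Karamatić")))  # 'ante'
```

Leaving the field blank avoids any attempt at transliteration, so the user is never shown a suggestion that will be rejected on the next screen.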

Colin Watson (cjwatson) wrote :

Created an attachment (id=1509)
never generate invalid default usernames

Colin Watson (cjwatson) wrote :

(In reply to comment #8)
> A thought about how to make this less bad in a simple way: if the attempt to
> automatically generate a username produces something that isn't usable, then
> simply don't try to automatically generate the username, and leave it blank.
> This would mean that at least somebody entering "José" as their first name would
> not be presented with the invalid "josé" username. Since ASCII transliteration
> might in some cases be very difficult (consider names in non-Latin scripts),
> this is probably the best we can do until full Unicode usernames are safe
> everywhere.

Done, mitigating this issue:

shadow (1:4.0.3-30.7ubuntu10) hoary; urgency=low

  * Never generate invalid default usernames (part of Ubuntu #668).

 -- Colin Watson <email address hidden> Fri, 4 Mar 2005 11:09:13 +0000

Matt Zimmerman (mdz) wrote :

Sounds perfectly reasonable to me, and resolves this bug as far as I'm concerned.

Matt Zimmerman (mdz) wrote :

*** Bug 15034 has been marked as a duplicate of this bug. ***

Ricardo Pérez López (ricardo) wrote :

"If I have accents in my real name, I'm happy I can enter them when I'm asked
about my real name."

My real name has accents, too. But when I try to enter it when I'm asked for my
real name, the keyboard hangs and I can't go any further in the installation. Only
pressing the 'Esc' key several times gets me out, by going back to the
installation menu.

PS: All this with a Spanish keyboard, and selecting Spanish, Spain at the
beginning of the installation.

Colin Watson (cjwatson) wrote :

*** Bug 18908 has been marked as a duplicate of this bug. ***

Ante Karamatić (ivoks) wrote :

This isn't a problem caused by the username. If you try to create a user with
the full name "Ante Karamatić", the installation will autogenerate the username
"ante", which is OK. But the installation will still fail, and the user will be
prompted to create a new user, a procedure which he/she has already completed.
The username is OK; the full name is the problem. I didn't do more digging, but
it seems to me that this isn't a shadow problem, since shadow doesn't contain
the full name at all.

Usernames shouldn't contain accents, and that's normal behaviour (you don't want
to mess up your keyboard layout and then not be able to log in at all to fix
the issue), but the full name should be the real full name, with accents.

Ricardo Pérez López (ricardo) wrote :

(In reply to comment #15)
> This isn't a problem caused by the username. If you try to create a user with
> the full name "Ante Karamatić", the installation will autogenerate the username
> "ante", which is OK. But the installation will still fail, and the user will be
> prompted to create a new user, a procedure which he/she has already completed.
> The username is OK; the full name is the problem. I didn't do more digging, but
> it seems to me that this isn't a shadow problem, since shadow doesn't contain
> the full name at all.
>
> Usernames shouldn't contain accents, and that's normal behaviour (you don't want
> to mess up your keyboard layout and then not be able to log in at all to fix
> the issue), but the full name should be the real full name, with accents.

I totally agree.

Benjamín Valero Espinosa (benjavalero) wrote :

My real name is Benjamín Valero Espinosa. What I do is enter my full name
without accents (because otherwise the keyboard hangs) and then, once Ubuntu is
installed, change my name in "User properties", so I can see my right name when
I win in the Gnome games :)

I agree that no accents should be used in usernames.

Benja

Ricardo Pérez López (ricardo) wrote :

(In reply to comment #17)
> My real name is Benjamín Valero Espinosa. What I do is enter my full name
> without accents (because otherwise the keyboard hangs)

That is the problem: the keyboard hangs when you type an accent while writing your
name during installation. I think this should be fixed so that accents can be
entered in the real name or, at least, so that the keyboard doesn't hang.

Ricardo Pérez López (ricardo) wrote :

I've found two new side effects of entering accented vowels during Breezy
installation.

If I type an accented key in the real name during the installation, that
keycode gets into the real name (but the installation can continue). Then,
when the system is installed,

a) if I go to "About me" (under System -> Administration), the real name
field doesn't show the correct name.
b) if I go to "Users and groups" (under System -> Administration, too), the
application crashes.

I must change the real name using vipw (or by directly editing /etc/passwd),
and then the two applications above ("About me" and "Users and groups") work fine.

I think this is a very real bug with growing consequences (two of them shown
here).

Colin Watson (cjwatson) wrote :

*** Bug 20805 has been marked as a duplicate of this bug. ***

Ricardo Pérez López (ricardo) wrote :

More info:

These are the codes that appear in the fullname field instead of the
accented vowels:

 á = <E1> ó = <F3> É = <C9> Ú = <DA>
 é = <E9> ú = <FA> Í = <CD> Ñ = (whitespace)
 í = <ED> Á = <C1> Ó = <D3> € = (whitespace)

The above reads (for example): if the user types á, <E1> appears in the
fullname field (all four characters).
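
The bracketed codes in the table coincide with the ISO-8859-1 (Latin-1) byte values of the typed characters, which are also their Unicode code points. The Python sketch below (an illustration only, not part of the installer) reproduces the table entries for the accented vowels; it is an observation about the values, not a diagnosis of where the installer goes wrong.

```python
import unicodedata

# The accented vowels from the table, normalized so each is a single code point.
typed = unicodedata.normalize("NFC", "áéíóúÁÉÍÓÚ")

# Latin-1 encodes each of these characters as one byte; show it as <XX>.
codes = {ch: f"<{ch.encode('latin-1').hex().upper()}>" for ch in typed}

for ch, code in codes.items():
    print(f"{ch} = {code}")
```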

Moreover, if you type any of the above characters (except Ñ and €), then
the backspace key no longer works (you can't delete the <??> code).

This bug is not only in the fullname field: the problem appears in any
text field in the installer (machine name, fullname, username, IP
address, and so on...)

Enrico Zini (enrico) wrote :

(In reply to comment #22)
> Moreover, if you type any of the above characters (except Ñ and €), then
> the backspace key no longer works (you can't delete the <??> code).

This one might be related to:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=110564

Benjamín Valero Espinosa (benjavalero) wrote :

I have been probing, because I have fish-memory, and I have found that this
problem doesn't exist in Warty! Curious and curious! (As Alice in Wonderland
said, I think, my English is the only thing poorer than my memory :D)

Benja

Ricardo Pérez López (ricardo) wrote :

(In reply to comment #24)
> I have been probing, because I have fish-memory, and I have found that this
> problem doesn't exist in Warty! Curious and curious! (As Alice in Wonderland
> said, I think, my English is the only thing poorer than my memory :D)
>
> Benja

I think this is because Warty's default character encoding is Latin1, not UTF-8.

Ricardo Pérez López (ricardo) wrote :

I think this bug is a duplicate of bug #13938.

Colin Watson (cjwatson) wrote :

(In reply to comment #26)
> I think this bug is a duplicate of bug #13938.

No, #7593 is referring to problems with actual keyboard entry of characters and
the encoding of characters that reach applications. This bug is talking about
problems with the password subsystem, which is at a higher level. I think
they're probably separate.

Ricardo Pérez López (ricardo) wrote :

(In reply to comment #27)
> (In reply to comment #26)
> > I think this bug is a duplicate of bug #13938.
>
> No, #7593 is referring to problems with actual keyboard entry of characters and
> the encoding of characters that reach applications. This bug is talking about
> problems with the password subsystem, which is at a higher level. I think
> they're probably separate.

Mmmm... OK, the symptoms seem to be the same, or almost the same. For example,
the <E9> code appearing on the screen instead of the é character...

But your knowledge exceeds mine :)

Ricardo.

Bo Rosén (bo-rosen) wrote :

(In reply to comment #19)

Just a little extra FYI as I ran into this too.

> a) if I go to "About me" (under System -> Administration), the real name
> field doesn't show the correct name.
> b) if I go to "Users and groups" (under System -> Administration, too), the
> application crashes.

Same here. The About Me name field only says "Full Name".

> I must change the real name using vipw (or by directly editing /etc/passwd),
> and then the two applications above ("About me" and "Users and groups") work
> fine.

When I look at /etc/passwd using less the accented character in my name is an
empty square, but using nano or gedit it displays correctly.

I installed Breezy today.

Ante Karamatić (ivoks) wrote :

At first sight, this problem is solved in Dapper Flight 2: the user is created and the installation continues. But if you run grep [username] /etc/passwd, you will see that the user doesn't have a full name set:

user:x:1000:1000::/home/user:/bin/bash

while it should be

user:x:1000:1000:Šđžž Čćčž:/home/user:/bin/bash

So the bug isn't solved but, IMHO, this behaviour is better.
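
The difference between the two entries is in the fifth colon-separated field of /etc/passwd, the GECOS field, which holds the full name. A short Python sketch (the helper name `gecos_full_name` is hypothetical) that extracts it from the two lines quoted above:

```python
def gecos_full_name(passwd_line: str) -> str:
    """Return the full-name part of the GECOS field of one passwd entry."""
    # Fields: name, password, UID, GID, GECOS, home directory, shell.
    fields = passwd_line.rstrip("\n").split(":")
    gecos = fields[4]
    # GECOS may hold comma-separated subfields; the first is the full name.
    return gecos.split(",")[0]


actual = "user:x:1000:1000::/home/user:/bin/bash"
expected = "user:x:1000:1000:Šđžž Čćčž:/home/user:/bin/bash"

print(repr(gecos_full_name(actual)))    # ''
print(repr(gecos_full_name(expected)))  # 'Šđžž Čćčž'
```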

Lakin Wecker (lakin)
Changed in user-setup:
status: Unconfirmed → Confirmed
Colin Watson (cjwatson) wrote :

This appears to be a chfn bug:

$ sudo chfn -f 'Tést Üser' test
chfn: invalid name: "Tést Üser"

It looks to me as if nobody's taught chfn about UTF-8.

The problem with keyboard entry of UTF-8 characters in the installer is bug #13938.

Benjamín Valero Espinosa (benjavalero) wrote :

Ante, I have just installed Dapper Flight 5 and I still have the same problem, so if it was solved in Flight 2, it has come back!

Benjamín Valero Espinosa (benjavalero) wrote :

It works in Ubuntu Edgy!! Please, close this bug :)

Henrik Nilsen Omma (henrik) wrote :

Confirmed. Works for me as well. Closing bug!

Changed in shadow:
status: Confirmed → Fix Released