grep -w doesn't always grab whole words - broken multibyte support

Bug #274221 reported by Rusty Russell
Affects         Status      Importance   Assigned to   Milestone
grep            Unknown     Unknown
grep (Ubuntu)   Confirmed   Undecided    Unassigned

Bug Description

Binary package hint: grep

1) rusty@vivaldi:~$ lsb_release -rd
Description: Ubuntu 8.04.1
Release: 8.04

2) rusty@vivaldi:~$ apt-cache policy grep
grep:
  Installed: 2.5.3~dfsg-3
  Candidate: 2.5.3~dfsg-3
  Version table:
 *** 2.5.3~dfsg-3 0
        500 http://au.archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status

3) Expected: only whole-word matches are returned. Instead, there were a couple of spurious results in a large grep. An example file (from the Linux kernel) is attached, and as you can see, piping the output through the same grep expression again "fixes" the problem.

4) rusty@vivaldi:~$ grep -w alloca devel/kernel/linux-2.6/kernel/pid.c
 * Generic pidhash and scalable, time-bounded PID allocator
rusty@vivaldi:~$ grep -w alloca devel/kernel/linux-2.6/kernel/pid.c | grep -w alloca
rusty@vivaldi:~$
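
For anyone without the kernel tree at hand, the transcript above can probably be reduced to a single-line file. The following is only a sketch: the temporary file and the variable names are illustrative and not part of the original report, and it assumes the one comment line is enough to trigger the problem on an affected grep.

```shell
#!/bin/sh
# Illustrative reproduction; the temporary file stands in for kernel/pid.c.
tmp=$(mktemp)
printf '%s\n' ' * Generic pidhash and scalable, time-bounded PID allocator' > "$tmp"

# First pass: a correct grep prints nothing here, because "alloca" only
# occurs inside the longer word "allocator".
first_pass=$(grep -w alloca "$tmp")

# Second pass: feed the first result through the same expression again.
second_pass=$(printf '%s\n' "$first_pass" | grep -w alloca)

rm -f "$tmp"

# The report describes a match in the first pass that vanishes in the second.
if [ -n "$first_pass" ] && [ -z "$second_pass" ]; then
    echo "inconsistent: first pass matched, second pass did not"
else
    echo "consistent"
fi
```

An "inconsistent" result corresponds to the behaviour shown in step 4.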

Rusty Russell (rusty-rustcorp) wrote :

Jean-Baptiste Lallement (jibel) wrote :

Thanks for your report. This is reproducible in Intrepid. I suspect that this is one of grep's multibyte bugs.

Does it work correctly with the following command:
LANG=C grep -w alloca kernel/pid.c

Can you also post the output of the locale command?

Thanks in advance.

Changed in grep:
status: New → Incomplete
Rusty Russell (rusty-rustcorp) wrote :

Yes, LANG=C fixes it.

rusty@vivaldi:~/devel/kernel/patches/linux-2.6$ LANG=C grep -w alloca kernel/pid.c
rusty@vivaldi:~/devel/kernel/patches/linux-2.6$ grep -w alloca kernel/pid.c
 * Generic pidhash and scalable, time-bounded PID allocator
rusty@vivaldi:~/devel/kernel/patches/linux-2.6$ locale
LANG=en_AU.UTF-8
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=
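
A note on why LANG=C is enough here: the quoted values in the locale output suggest that only LANG is set explicitly, with LC_ALL empty, so changing LANG changes the character set grep sees. The lookup order is LC_ALL, then LC_CTYPE, then LANG. A rough sketch of that order follows; effective_ctype is a hypothetical helper written for this comment, not an existing tool.

```shell
# Hypothetical helper: print the locale name that decides character handling.
# $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (any of them may be empty).
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"        # LC_ALL overrides everything else
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"        # then the category-specific variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"        # then LANG as the general fallback
    else
        printf '%s\n' POSIX       # typical default when nothing is set
    fi
}

# The situation in this report: only LANG is set, so LANG=C takes effect.
effective_ctype "" "" "C"
# If LC_ALL had been exported, LANG=C would have been ignored.
effective_ctype "en_AU.UTF-8" "" "C"
```

On a system where LC_ALL or LC_CTYPE is exported, LC_ALL=C grep -w alloca kernel/pid.c is the safer form of the same test.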

Jean-Baptiste Lallement (jibel) wrote :

Thanks for confirming.
We'll take a look.

Changed in grep:
status: Incomplete → Confirmed
description: updated
Jean-Baptiste Lallement (jibel) wrote :

This is a strange bug.

Only grep -w alloca pid.c returns a wrong result on this particular string. The results for the words "alloc" and "allocat" are correct.

Do you have other examples, so that we can try to figure out a pattern to reproduce this issue?

Thanks for your help.
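
One way to look for such a pattern might be to try every proper prefix of "allocator" against the same line and list the ones that grep -w reports as whole words. This is a sketch only: spurious_prefixes and its variables are made-up names, and it assumes the single comment line is sufficient to show the problem.

```shell
line=' * Generic pidhash and scalable, time-bounded PID allocator'
word=allocator

# Print every proper prefix of $word that grep -w reports as a whole-word
# match in $line. None of them is a word in that line, so any output is a
# spurious match. $1 is the locale name to run grep under.
spurious_prefixes() {
    len=1
    while [ "$len" -lt "${#word}" ]; do
        prefix=$(printf '%s' "$word" | cut -c "1-$len")
        if printf '%s\n' "$line" | LC_ALL=$1 grep -qw -- "$prefix"; then
            printf '%s\n' "$prefix"
        fi
        len=$((len + 1))
    done
}

# Compare the locale from the report with the C locale. If en_AU.UTF-8 is
# not installed, grep falls back to the C locale and both runs are the same.
spurious_prefixes en_AU.UTF-8
spurious_prefixes C
```

Swapping in other lines and words should be a matter of changing the two variables at the top.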
