grep: -i --color will not color matches when pattern contains uppercase

Bug #9231 reported by Debian Bug Importer

Affects         Status        Importance  Assigned to  Milestone
grep (Debian)   Fix Released  Unknown
grep (Ubuntu)   Invalid       High        Unassigned

Bug Description

Automatically imported from Debian bug report #226397 http://bugs.debian.org/226397

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Tue, 06 Jan 2004 12:31:27 +0100
From: BertJan Bakker <email address hidden>
To: Debian Bug Tracking System <email address hidden>
Subject: grep: -i --color will not color matches when pattern contains uppercase

Package: grep
Version: 2.5.1.ds1-2
Severity: normal

When using the -i option of grep AND any character of the
pattern is uppercase, grep will correctly match
BUT will not color the matches.

To reproduce, compare the output of the following two
commands:
echo 'spam foo SPAM FOO' | grep -i --color spam
echo 'spam foo SPAM FOO' | grep -i --color SPAM
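The symptom is consistent with the coloring code searching a lowercased copy of the line while the pattern keeps its original case (the patch further down in this report lowercases the pattern for that reason). A minimal C sketch of that mechanism, assuming a single-byte locale and using `strstr` as a stand-in for grep's compiled matcher; `lowered_copy` and `count_colored_matches` are hypothetical helpers, not grep functions:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of LINE with every byte
   lowercased (single-byte locale assumed, so the length is unchanged).
   Returns NULL if allocation fails. */
char *lowered_copy(const char *line)
{
    size_t len = strlen(line);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        copy[i] = (char) tolower((unsigned char) line[i]);
    copy[len] = '\0';
    return copy;
}

/* Count the matches a case-sensitive matcher (strstr here) finds when it
   searches the lowercased copy of LINE for PATTERN exactly as given.
   These are the spans that would be colored.  Returns -1 on allocation
   failure. */
int count_colored_matches(const char *line, const char *pattern)
{
    char *copy = lowered_copy(line);
    if (copy == NULL)
        return -1;
    size_t plen = strlen(pattern);
    int count = 0;
    /* Non-overlapping search; an empty pattern is treated as no match. */
    for (const char *p = copy; plen > 0 && (p = strstr(p, pattern)) != NULL; p += plen)
        count++;
    free(copy);
    return count;
}
```

With the line from the report, the pattern "spam" finds 2 spans in the lowercased copy, while "SPAM" finds none, so nothing is colored even though -i still selects the line.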

-- System Information:
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux bertjan 2.6.0 #1 Fri Dec 19 13:58:19 CET 2003 i686
Locale: LANG=C, LC_CTYPE=C

Versions of packages grep depends on:
ii libc6 2.3.2.ds1-10 GNU C Library: Shared libraries an

-- no debconf information

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Tue, 19 Oct 2004 11:30:56 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: critical bugs in multibyte locales(UTF-8, CJK, ..) regexp

severity 249245 grave
severity 274352 grave
severity 226397 grave
severity 276209 grave
merge 249245 226397 238167
severity 277122 grave
severity 276206 grave
thanks

Bug#249245 can be fixed by a patch derived from gawk's dfa.c.
Bug#274352 can be fixed by a one-line patch.
Bug#277122 (in gawk's dfa.c) is the same bug as Bug#274352 (in grep's dfa.c).
Bug#276209 (in grep) and Bug#276206 (in gawk) are the same bug in dfa.c,
concerning case-insensitive matching of character ranges.

All of these bugs break behaviour in multibyte locales (UTF-8, CJK, ...).

Regards,
Fumitoshi UKAI

Revision history for this message
Martin Pitt (pitti) wrote :

I checked this under several locales; it works fine in Warty.

Closing as NOTWARTY.

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 20 Oct 2004 02:03:49 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: Re: critical bugs in multibyte locales(UTF-8, CJK, ..) regexp

severity 226397 normal
thanks

Sorry, that was a wrong merge.

> severity 226397 grave
> merge 249245 226397 238167

Regards,
Fumitoshi UKAI

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Thu, 21 Oct 2004 01:49:19 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: ignore case bug

tags 226397 + patch
tags 238167 + patch
thanks

This bug occurs only in single-byte locales.
The first hunk below, which mirrors the existing special handling of
match_icase in the if (color_option) {} branch, fixes Bug#238167.
The next two hunks, which lowercase the pattern just as the
MB_CUR_MAX != 1 branch already does, are needed to fix both of these bugs.
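The pattern-lowercasing step for single-byte locales can be sketched as follows. `lowercase_in_place` and `count_matches_icase` are hypothetical names, `strstr` again stands in for grep's matcher, and one byte per character is assumed. Calling `tolower` on every byte is equivalent to testing `isupper` first, since `tolower` returns non-uppercase characters unchanged; the cast to `unsigned char` is required because passing a negative `char` (for example a Latin-1 byte of 0x80 or above where `char` is signed) to `tolower` is undefined behaviour.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Lowercase S in place, one byte at a time (single-byte locale assumed). */
void lowercase_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) tolower((unsigned char) *s);
}

/* Hypothetical driver: lowercase both a copy of the line and a copy of
   the pattern, then count non-overlapping matches with strstr (a
   stand-in for grep's matcher).  Returns -1 on allocation failure. */
int count_matches_icase(const char *line, const char *pattern)
{
    char *l = malloc(strlen(line) + 1);
    char *p = malloc(strlen(pattern) + 1);
    int count = -1;
    if (l != NULL && p != NULL) {
        strcpy(l, line);
        strcpy(p, pattern);
        lowercase_in_place(l);
        lowercase_in_place(p);
        size_t plen = strlen(p);
        count = 0;
        for (const char *q = l; plen > 0 && (q = strstr(q, p)) != NULL; q += plen)
            count++;
    }
    free(l);   /* free(NULL) is a no-op */
    free(p);
    return count;
}
```

Once the pattern is folded the same way as the line, "SPAM", "Spam" and "spam" all find both occurrences in "spam foo SPAM FOO".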

Still, this is a tricky workaround. I think grep's regex library should
instead be fixed to work like glibc's regex library, which has the
RE_ICASE syntax bit, so that such a trick is not needed.
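To illustrate the alternative suggested here, where the regex library folds case itself, this sketch uses the POSIX `regcomp`/`regexec` interface with `REG_ICASE` (in glibc, `regcomp` turns `REG_ICASE` into the `RE_ICASE` syntax bit). Matching runs on the original line, so the reported offsets need no translation and no lowercased copy is required. `color_matches` is a hypothetical helper, not grep code; the pattern is a basic regular expression, and the `sgr` string (e.g. "01;31") plays the role of `grep_color`.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Print LINE to OUT (skipped when OUT is NULL) with each match of the
   BRE PATTERN wrapped in SGR color escapes, letting REG_ICASE do the
   case folding.  Returns the number of matches, or -1 if the pattern
   does not compile. */
int color_matches(FILE *out, const char *line, const char *pattern,
                  const char *sgr)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_ICASE) != 0)
        return -1;

    int count = 0;
    int eflags = 0;
    const char *p = line;
    regmatch_t m;
    while (regexec(&re, p, 1, &m, eflags) == 0) {
        if (m.rm_eo == m.rm_so)   /* empty match: stop to avoid looping */
            break;
        if (out != NULL) {
            fwrite(p, 1, (size_t) m.rm_so, out);                /* text before */
            fprintf(out, "\33[%sm", sgr);
            fwrite(p + m.rm_so, 1, (size_t) (m.rm_eo - m.rm_so), out);
            fputs("\33[00m", out);
        }
        count++;
        p += m.rm_eo;
        eflags = REG_NOTBOL;  /* later searches do not start at line start */
    }
    if (out != NULL)
        fputs(p, out);        /* unmatched tail of the line */
    regfree(&re);
    return count;
}
```

For example, `color_matches(stdout, "spam foo SPAM FOO", "SPAM", "01;31")` would highlight both occurrences and return 2.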

--- grep-2.5.1.orig/src/grep.c 2004-10-21 00:57:33.000000000 +0900
+++ grep-2.5.1/src/grep.c 2004-10-21 01:39:24.000000000 +0900
@@ -554,6 +554,38 @@
     {
       size_t match_size;
       size_t match_offset;
+ if(match_icase)
+ {
+ /* Yuck, this is tricky */
+ char *buf = (char*) xmalloc (lim - beg);
+ char *ibeg = buf;
+ char *ilim = ibeg + (lim - beg);
+ int i;
+ for (i = 0; i < lim - beg; i++)
+ ibeg[i] = tolower ((unsigned char) beg[i]);
+ while ((match_offset = (*execute) (ibeg, ilim-ibeg, &match_size, 1))
+ != (size_t) -1)
+ {
+ char const *b = beg + match_offset;
+ if (b == lim)
+ break;
+ if (match_size == 0)
+ break;
+ if (color_option)
+ printf ("\33[%sm", grep_color);
+ fwrite (b, sizeof (char), match_size, stdout);
+ if (color_option)
+ fputs ("\33[00m", stdout);
+ fputs("\n", stdout);
+ beg = b + match_size;
+ ibeg = ibeg + match_offset + match_size;
+ }
+ free (buf);
+ lastout = lim;
+ if (line_buffered)
+ fflush(stdout);
+ return;
+ }
       while ((match_offset = (*execute) (beg, lim - beg, &match_size, 1))
    != (size_t) -1)
         {
@@ -1719,8 +1751,9 @@
   if (!install_matcher (matcher) && !install_matcher ("default"))
     abort ();

+ if (match_icase) {
 #ifdef MBS_SUPPORT
- if (MB_CUR_MAX != 1 && match_icase)
+ if (MB_CUR_MAX != 1)
     {
       wchar_t wc;
       mbstate_t cur_state, prev_state;
@@ -1747,8 +1780,17 @@
      }
    i += mbclen;
  }
- }
+ } else
 #endif /* MBS_SUPPORT */
+ {
+ int i, len = strlen(keys);
+ for (i = 0; i < len; i++) {
+ if (isupper((unsigned char) keys[i]))
+ keys[i] = tolower((unsigned char) keys[i]);
+ }
+ }
+ }
+

   (*compile)(keys, keycc);

Regards,
Fumitoshi UKAI
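The first hunk's technique, searching lowercased text but printing the bytes of the original line, can be sketched in isolation. This assumes a single-byte locale (lowercasing never changes a string's length, so offsets carry over), uses `strstr` in place of grep's `execute()`, and the names `struct span`, `lowered`, `find_spans` and `print_spans` are hypothetical. Like the patch, it prints each match on its own line, as -o does.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct span { size_t off, len; };

/* Return a malloc'd lowercased copy of S, or NULL on allocation failure. */
char *lowered(const char *s)
{
    size_t n = strlen(s);
    char *d = malloc(n + 1);
    if (d != NULL) {
        for (size_t i = 0; i < n; i++)
            d[i] = (char) tolower((unsigned char) s[i]);
        d[n] = '\0';
    }
    return d;
}

/* Search lowercased copies of LINE and PATTERN, but record offsets that
   index the ORIGINAL line.  Stores at most MAX spans and returns how many
   were stored, or -1 on allocation failure. */
int find_spans(const char *line, const char *pattern,
               struct span *spans, int max)
{
    char *l = lowered(line), *p = lowered(pattern);
    int n = -1;
    if (l != NULL && p != NULL) {
        size_t plen = strlen(p);
        const char *q = l;
        n = 0;
        while (n < max && plen > 0 && (q = strstr(q, p)) != NULL) {
            spans[n].off = (size_t) (q - l);
            spans[n].len = plen;
            n++;
            q += plen;
        }
    }
    free(l);
    free(p);
    return n;
}

/* Print each span of LINE, colored, on its own line (grep -o --color style). */
void print_spans(FILE *out, const char *line, const struct span *spans,
                 int n, const char *sgr)
{
    for (int i = 0; i < n; i++) {
        fprintf(out, "\33[%sm", sgr);
        fwrite(line + spans[i].off, 1, spans[i].len, out);
        fputs("\33[00m\n", out);
    }
}
```

For "spam foo SPAM FOO" and the pattern "SPAM", `find_spans` reports offsets 0 and 9, and `print_spans` would then write "spam" and "SPAM" exactly as they appear in the input.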

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Fri, 28 Oct 2005 21:43:23 +1000
From: Aníbal Monsalve Salazar <email address hidden>
To: <email address hidden>
Subject: #226397 is fixed in grep 2.5.1.ds2-2

Version: 2.5.1.ds2-2

Bug #226397 is fixed in grep 2.5.1.ds2-2.

Format: 1.7
Date: Wed, 26 Oct 2005 19:14:35 +1000
Source: grep
Binary: grep
Architecture: source i386 alpha sparc
Version: 2.5.1.ds2-2
Distribution: unstable
Urgency: low
Maintainer: Anibal Monsalve Salazar <email address hidden>
Changed-By: Anibal Monsalve Salazar <email address hidden>
Description:
 grep - GNU grep, egrep and fgrep
Closes: 181378 206470 224993 240239 257900 267718 284676
Changes:
 grep (2.5.1.ds2-2) unstable; urgency=low
 .
   * Patched 64-egf-speedup.patch with patch from Nicolas François
     <email address hidden>. Put 64-egf-speedup.patch,
     65-dfa-optional.patch, 66-match_icase.patch and 67-w.patch back
     in, closes: #181378, #206470, #224993.
   * Fixed "minor documentation syntax error", closes: #240239,
     #257900. Patches by Allard Hoeve <email address hidden> and Derrick
     'dman' Hudson <email address hidden>.
   * Fixed "info page not in main info menu", closes: #284676,
     #267718. Patches by Rui Tiago Cação Matos
     <email address hidden> and Paul Brook <email address hidden>.
Files:
 88b2af4b3578729420158583be03731f 660 utils required grep_2.5.1.ds2-2.dsc
 14e96467e8623210c797ec104ed9e3b2 21354 utils required grep_2.5.1.ds2-2.diff.gz
 e69a3fbbab86633594273203f7f2207e 139112 utils required grep_2.5.1.ds2-2_i386.deb
 76128b684a7deac71454c5f6b5697345 140514 utils required grep_2.5.1.ds2-2_sparc.deb
 01da865bef322c130f6f46abad86d1f9 147868 utils required grep_2.5.1.ds2-2_alpha.deb

Aníbal Monsalve Salazar
--
 .''`. Debian GNU/Linux
: :' : Free Operating System
`. `' http://debian.org/
  `- http://v7w.com/anibal
