linux unix grep centos7 centos5

grep fails to find escaped dash that used to work


I have specific strings that are followed by either a dash '-', a vertical bar '|', or a digit.

8-year-old code has always filtered the data with [-\|0-9]

This grep is now failing:

> cat regex
^abc[\-\|0-9]
> echo abc- | grep -v -f regex
abc-

It works fine when the backslash is removed:

> cat regex
^abc[-\|0-9]
> echo abc- | grep -v -f regex
>

The backslash works fine on the command-line!

> echo abc- | grep -v ^abc[\-\|0-9]
>

Since this works directly on the command-line it would seem to be a change in how "-f" loads the file into the program?

I have validated this behavior on both GNU grep 2.20 (on CentOS 7) and GNU grep 2.5.1 (on CentOS 5).

The obvious solution is to just remove the backslash. Every indication from my searching says the backslash should be allowed.

I would really like to understand why it started failing, yet works fine on the command-line. I no longer have access to older Linux boxes to test on.


Solution

  • If it ever worked, it may have been a bug. In POSIX BRE, a backslash inside a bracket expression is not special:

    <backslash> shall be special except when used in a bracket expression

    Alternatively, since your bracket expression actually means "the range from \ to \ (that is, just a backslash), plus a pipe and the digits", you may have been using a locale that considered - equivalent to \ in collation order. I wasn't able to reproduce this with any locales on my systems.

    "but works fine on the command-line"

    It doesn't. Since you're not quoting the string, the shell removes the backslashes. Here's what grep ends up seeing:

    $ printf '%s\n' ^abc[\-\|0-9]
    ^abc[-|0-9]
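
To see the same behavior without a pattern file, the pattern can be single-quoted so the shell passes the backslashes through untouched. The sketch below is only an illustration: `check` is a hypothetical helper, and `LC_ALL=C` is an assumption made to keep collation order out of the picture.

```shell
# Hypothetical helper: prints "match" or "no match" for input $1 against pattern $2.
# LC_ALL=C is an assumption, used so locale collation cannot affect the range.
check() {
    if printf '%s\n' "$1" | LC_ALL=C grep -q -- "$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Pattern exactly as stored in the file: backslashes reach grep intact.
check 'abc-' '^abc[\-\|0-9]'    # no match: the set holds \, | and digits, but no dash
check 'abc\' '^abc[\-\|0-9]'    # match: the \-\ range covers the backslash itself

# Dash placed first in the bracket expression, where it is taken literally.
check 'abc-' '^abc[-|0-9]'      # match
check 'abc|' '^abc[-|0-9]'      # match
check 'abc7' '^abc[-|0-9]'      # match
```

Putting the dash first (or last) in the bracket expression keeps it literal both in a file read with -f and on the command line, so the pattern no longer depends on how it is quoted.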