grep posix-ere

Extended regular expression grouping


I am trying to understand grouping in extended regular expressions. What is the difference between the following two extended regular expressions?

$ echo "the CPU is" | grep -E '[Tt]he CPU|computer is'
the CPU is
$ echo "the CPU is" | grep -E '[Tt]he (CPU|computer) is'
the CPU is

In my bash shell on Ubuntu, grep highlights "the CPU" in red for the former pattern and "the CPU is" for the latter. How does grouping change the pattern matching in these two cases?


Solution

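  • Because they are not equivalent! Alternation (|) has the lowest precedence in an ERE, so the first pattern means "[Tt]he CPU" or "computer is", while the parentheses in the second pattern limit the alternation to "CPU" or "computer". For example: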

    $ cat ip.txt
    xyz The CPU 123
    get the CPU and book
    the computer is not here
    The CPU is bad
    
    $ grep -E '[Tt]he CPU|computer is' ip.txt
    xyz The CPU 123
    get the CPU and book
    the computer is not here
    The CPU is bad
    
    $ grep -E '[Tt]he (CPU|computer) is' ip.txt
    the computer is not here
    The CPU is bad
    

    Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions.
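
To see exactly which part of the line each pattern matches (the part grep highlights in red), the comparison can be sketched with the -o option, which prints only the matched text. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX); the variable names are only for illustration.

```shell
# Sample line taken from the question
line='the CPU is'

# Ungrouped: | splits the whole pattern into '[Tt]he CPU' OR 'computer is'
ungrouped=$(printf '%s\n' "$line" | grep -oE '[Tt]he CPU|computer is')

# Grouped: | only applies inside the parentheses
grouped=$(printf '%s\n' "$line" | grep -oE '[Tt]he (CPU|computer) is')

# The grouped pattern written out in full, without parentheses
expanded=$(printf '%s\n' "$line" | grep -oE '[Tt]he CPU is|[Tt]he computer is')

printf 'ungrouped: %s\n' "$ungrouped"   # ungrouped: the CPU
printf 'grouped:   %s\n' "$grouped"     # grouped:   the CPU is
printf 'expanded:  %s\n' "$expanded"    # expanded:  the CPU is
```

The grouped and expanded patterns match the same text, which is the a(b|c)d = abd|acd identity above applied to this pattern.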