
How to grep -o nested square brackets and characters


I have the string below stored in a file named aaa:

333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111

The left and right square brackets in the string may not be balanced. I want grep to return the whole run of lowercase letters and square brackets as a single match. I'm using grep -o '[a-z\[\]]*' aaa and expect to get this as a whole:

aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc

But it returns three kinds of matches: a single lowercase letter, a single left square bracket, or a single lowercase letter followed by one or more right square brackets.

So I tried grep -o '[a-z\[]*' aaa. It returns two kinds of matches: runs of lowercase letters mixed with left square brackets, and runs of lowercase letters only. That's closer to what I want, but still not correct.

Is it possible to get the expected result using only grep -o and a bracket expression?


Solution

  • Since you did not tell grep to do otherwise, it is using POSIX Basic Regular Expression (BRE) syntax. Your regex consists of a bracket expression, followed by a lone right bracket, followed by an asterisk:

    grep -o '[a-z\[\]]*'
    

    →

    [a-z\[\]
    ]
    *
    

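    So, your expression tells grep to look for one character that is a lowercase letter, a backslash or a left bracket, followed by zero or more right brackets.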
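One way to see this parse in action is to run the original pattern against the sample string and keep only the matches that contain a right bracket. This is a sketch assuming a POSIX sh and a grep whose -o prints only non-empty matches (GNU grep behaves this way); the file name aaa is taken from the question.

```shell
#!/bin/sh
# Recreate the question's sample file (file name "aaa" taken from the question).
printf '%s\n' '333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111' > aaa

# BRE parse: the bracket expression [a-z\[\] is the set {a-z, "\", "["},
# then comes a literal "]", and the "*" applies only to that "]".
# Each match is therefore one character from the set plus any "]" right after it.
grep -o '[a-z\[\]]*' aaa | grep -F ']'
```

With GNU grep this should print c], z], x] and c]], one per line: the only places where a letter is immediately followed by right brackets.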

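    Backslashes are not special inside a BRE bracket expression. Nor are left brackets, unless they begin a [:class:], [=equiv=] or [.coll.] sequence. As the POSIX regular expression specification states, and @tshiono notes in the comments, to include a right bracket in a bracket expression it must appear first in the list (after a leading ^, if any).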

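    This leads to the slightly odd-looking regex [][a-z] or, equivalently, the even odder-looking []a-z[]. With the trailing *, grep -o '[][a-z]*' aaa returns aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc as a single match.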
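A minimal sketch of the corrected command, again assuming a POSIX sh and recreating the question's file aaa so it can be run on its own. Both spellings of the bracket expression describe the same set, so they should print the same line.

```shell
#!/bin/sh
# Recreate the question's sample file (file name "aaa" taken from the question).
printf '%s\n' '333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111' > aaa

# "]" listed first is a literal member of the set, so [][a-z] = {"]", "[", a-z}.
# The trailing "*" repeats the whole bracket expression.
grep -o '[][a-z]*' aaa    # prints aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc

# Equivalent spelling: "]" first, then the range, then "[" last.
grep -o '[]a-z[]*' aaa    # prints the same line
```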


    Had you used grep's -E option, you would have seen the same result, since "The rules for ERE Bracket Expressions are the same as for Basic Regular Expressions".
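To check that -E changes nothing about how the bracket expression is read, this sketch runs the original pattern under the default BRE syntax and under ERE, then compares the results. The scratch file names bre.out and ere.out are hypothetical. Under -E the corrected bracket expression can also use + instead of *, so only non-empty runs match.

```shell
#!/bin/sh
# Recreate the question's sample file (file name "aaa" taken from the question).
printf '%s\n' '333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111' > aaa

# Same (mis-parsed) pattern under BRE (grep's default) and ERE (-E).
# bre.out and ere.out are hypothetical scratch file names.
grep -o '[a-z\[\]]*' aaa > bre.out
grep -oE '[a-z\[\]]*' aaa > ere.out
if cmp -s bre.out ere.out; then
    echo "BRE and ERE give identical matches"
fi

# The corrected bracket expression also works under -E.
grep -oE '[][a-z]+' aaa    # prints aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc
```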

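    However, if your grep supports -P (Perl-compatible regular expressions), your original regex would give the result you intended, because \[ and \] are treated as escaped literals inside a Perl character class.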
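Not every grep implementation provides -P, so this sketch probes for it before using the original pattern. The probe (echo x | grep -qP x) is just one convenient check, assumed here rather than taken from the answer.

```shell
#!/bin/sh
# Recreate the question's sample file (file name "aaa" taken from the question).
printf '%s\n' '333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111' > aaa

# Probe for -P support; a grep without it exits with an error here.
if echo x | grep -qP x 2>/dev/null; then
    # In a Perl character class, \[ and \] are escaped literals,
    # so the set is a-z, "[" and "]".
    grep -oP '[a-z\[\]]*' aaa    # prints aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc
else
    echo "this grep does not support -P" >&2
fi
```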