
How to extract text from a string using sed?


My example string is as follows:

This is 02G05 a test string 20-Jul-2012

From the above string I want to extract 02G05. For that, I tried the following regex with sed:

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'

But the above command prints nothing. I believe the reason is that sed cannot match anything against the pattern I supplied.

So my question is: what am I doing wrong here, and how do I correct it?

When I try the same string and pattern in Python, I get the expected result:

>>> import re
>>> st = "This is 02G05 a test string 20-Jul-2012"
>>> re.findall(r'\d+G\d+', st)
['02G05']

Solution

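  • \d is Perl/Python syntax and is not part of POSIX regular expressions, so your sed most likely does not support it. Use [0-9] or [[:digit:]] instead. Also, in sed's default basic regular expressions (BREs), + is an ordinary literal character rather than "one or more", so \d+ has to be written as [0-9][0-9]*.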

    To print only the actual match (not the entire matching line), use a substitution:

    sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'

    The parentheses capture the text they match into a back reference. Here, the first (and only) parentheses capture the string we want to keep; we replace the entire line with just the captured string \1 and print the result. The [^0-9] before the group matters: .* is greedy, so without it .* would also swallow the leading 0 and the group would capture only 2G05. (This form therefore assumes the token is not at the very start of the line.) The p flag on the s command prints the line after a successful substitution, and the -n option stops sed from automatically printing every input line.
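To make the fix easy to try, here is a minimal sketch that wraps the BRE command above and an equivalent extended-regex (ERE) variant in two small shell functions. The function names extract_bre and extract_ere are hypothetical, and the ERE version assumes a sed that accepts -E (GNU sed and BSD/macOS sed both do).

```shell
#!/bin/sh
# extract_bre and extract_ere are hypothetical helper names for this sketch.

# POSIX basic regular expression (BRE) version.
# [0-9][0-9]* means "one or more digits", the BRE spelling of \d+.
# The [^0-9] before the group keeps the greedy .* from swallowing
# the leading digits of the token.
extract_bre() {
    sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'
}

# Extended regular expression (ERE) version; assumes a sed that accepts -E.
# In ERE, + and ( ) are special without backslashes.
extract_ere() {
    sed -n -E 's/.*[^0-9]([0-9]+G[0-9]+).*/\1/p'
}

input='This is 02G05 a test string 20-Jul-2012'
printf '%s\n' "$input" | extract_bre   # prints 02G05
printf '%s\n' "$input" | extract_ere   # prints 02G05
```

Because .* is greedy, a line containing several such tokens yields the last one, and a token at the very start of a line is not found, since [^0-9] must consume a character before it.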