regex sed

sed repetition match misbehaving


I am trying to get a file path from the following string:

"# configuration file /etc/nginx/conf.d/default.conf"

by passing it to sed:

sed -n 's,\(# configuration file \)\(\/[a-zA-Z_.]\+\)\+,\1,'

I expect /etc/nginx/conf.d/default.conf to be captured in \1, but surprisingly only the default.conf part is returned. As I understand it, the referenced group is overwritten each time by the next match of /[a-zA-Z_.]\+. Wouldn't it be more logical for each successive match to go to the next reference, so that default.conf is returned in \4?

/[a-zA-Z_.]\+ >>>

\(/etc\)\(/nginx\)\(/conf.d\)\(/default.conf\)
   \1        \2        \3           \4

Solution

  • This might work for you (GNU sed):

    sed -nE 's,(# configuration file )((/[a-zA-Z_.]+)+),\2,p' file
    

    This prints the whole file path: \2 is the outer group, which wraps the entire repetition.

    sed -nE 's,(# configuration file )((/[a-zA-Z_.]+)+),\1,p' file
    

    This prints the beginning of the comment ("# configuration file "), which is held in \1.

    sed -nE 's/(# configuration file )((\/[a-zA-Z_.]+)+)/\3/p' file
    

    This prints only the last component of the file path (/default.conf), because \3 is the repeated inner group.

    N.B. When a capture group is qualified by a repetition operator, i.e. *, ?, + or an interval {...}, it retains only the text matched by its last repetition (see solution 3).
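
To see the difference between the outer and the repeated inner group side by side, here is a minimal sketch that runs the first and third commands on the sample line from the question. The variable names line, full and last are only illustrative, and it assumes a sed that accepts -E (GNU sed, as in the answer).

```shell
# Sample input taken from the question.
line='# configuration file /etc/nginx/conf.d/default.conf'

# \2 is the outer group around the whole repetition, so it holds the full path.
full=$(printf '%s\n' "$line" | sed -nE 's,(# configuration file )((/[a-zA-Z_.]+)+),\2,p')

# \3 is the repeated inner group, so it keeps only its last iteration.
last=$(printf '%s\n' "$line" | sed -nE 's,(# configuration file )((/[a-zA-Z_.]+)+),\3,p')

printf 'full: %s\n' "$full"   # full: /etc/nginx/conf.d/default.conf
printf 'last: %s\n' "$last"   # last: /default.conf
```

Group numbers are assigned by counting opening parentheses in the pattern, not by how many times a group ends up matching, which is why the path components never spill over into \4 and beyond.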