regex bash word-boundary

In a bash script, how can I match a substring only as a whole word in a parent string?


For example parent string:

A='enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0'

And substring:

B='enp1s'

I have tried:

[[ $A == *"$B"* ]] This matches, but of course it ignores the 'whole word' requirement: enp1s matches enp1s0
[[ $A =~ \<"$B"\> ]] This matches in error... enp1s is not a matching whole word
[[ $A =~ .*<"$B"\.* ]] This matches in error same as above

According to Google and its AI comments, the 2nd and 3rd tests are not supposed to match, but they do. Given 'enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0', a whole-word match would be 'enp1s0', not 'enp1s', per my understanding. If so, why do the 2nd and 3rd tests above match?

Environment is Debian 12, fully up to date as of this posting date.


Solution

  • The rules for using a regexp that includes backslashes inside the test construct are arcane, so it's usually clearer and simpler to define the whole regexp in a variable and then match against that, e.g.:

    $ A='enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0'
    

    $ B='enp1s'
    $ re="\\<$B\\>"
    $ [[ $A =~ $re ]] && echo yes
    

    $ B='enp1s0'
    $ re="\\<$B\\>"
    $ [[ $A =~ $re ]] && echo yes
    yes
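
    As a rough sketch, the same regexp-in-a-variable approach can be wrapped in a function. The name has_word is made up for this illustration. Note that \< and \> are not part of POSIX extended regular expressions, so this assumes a regexp library that supports them, such as glibc on Debian:

    ```bash
    #!/usr/bin/env bash
    # has_word is a hypothetical helper, not part of bash.
    # Succeeds if $2 occurs in $1 with a word boundary on each side.
    has_word() {
        local haystack=$1 needle=$2
        # Inside double quotes, \\< becomes \< before the regexp is compiled.
        local re="\\<${needle}\\>"
        [[ $haystack =~ $re ]]
    }

    A='enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0'
    for B in 'enp1s' 'enp1s0'; do
        if has_word "$A" "$B"; then echo "$B: match"; else echo "$B: no match"; fi
    done
    ```

    Leaving $re unquoted on the right-hand side of =~ matters here: quoting it would make bash treat the contents as a literal string rather than a regexp.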
    

    Be aware, though, that those \\< and \\> constructs match at word boundaries, so if B happens to be part of a substring of A that contains non-word-constituent characters, e.g.:

    $ A='enp1s0-3 enp2s0 enp3s0 enp4s0 lo tailscale0'
    

    then the above will produce a false match:

    $ B='enp1s0'
    $ re="\\<$B\\>"
    $ [[ $A =~ $re ]] && echo yes
    yes
    

    and so you need to use something like this instead, which specifically tests for white space (or the start or end of the string) around B:

    $ B='enp1s0'
    $ re="(^|\\s)$B(\\s|$)"
    $ [[ $A =~ $re ]] && echo yes
    

    $ B='enp1s0-3'
    $ re="(^|\\s)$B(\\s|$)"
    $ [[ $A =~ $re ]] && echo yes
    yes
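
    A possible variation on the white space test, sketched here under the assumption that you would rather not depend on \s (a GNU extension): the POSIX character class [[:space:]] is swapped in for it. The function name has_token is hypothetical, and the needle is still interpreted as a regexp:

    ```bash
    #!/usr/bin/env bash
    # has_token is a hypothetical helper, not part of bash.
    # Succeeds if $2 occurs in $1 delimited by white space or the string ends.
    has_token() {
        local haystack=$1 needle=$2
        # [[:space:]] replaces \s from the original; \$ is a literal $ anchor.
        local re="(^|[[:space:]])${needle}([[:space:]]|\$)"
        [[ $haystack =~ $re ]]
    }

    A='enp1s0-3 enp2s0 enp3s0 enp4s0 lo tailscale0'
    for B in 'enp1s0' 'enp1s0-3' 'tailscale0'; do
        if has_token "$A" "$B"; then echo "$B: match"; else echo "$B: no match"; fi
    done
    ```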
    

    Also be aware that the above does regexp matching, so any regexp metacharacters in B, such as . or *, are treated as metacharacters, not as literal characters. If you're trying to match a literal substring that just happens to contain regexp metacharacters, you could end up with false matches, e.g.:

    $ B='en.*0'
    $ re="\\<$B\\>"
    $ [[ $A =~ $re ]] && echo yes
    yes
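
    One way to keep a regexp for the delimiters while still treating B literally is sketched below. It relies on the documented bash behaviour (3.2 and later) that any quoted part of the pattern after =~ is matched as a literal string. The helper name has_literal_token is invented for this example, and [[:space:]] is used for the delimiters as an assumption about what separates the words:

    ```bash
    #!/usr/bin/env bash
    # has_literal_token is a hypothetical helper, not part of bash.
    has_literal_token() {
        local haystack=$1 needle=$2
        # Unquoted parts are regexp syntax; the quoted "$needle" is matched literally.
        [[ $haystack =~ (^|[[:space:]])"$needle"([[:space:]]|$) ]]
    }

    A='enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0'
    for B in 'en.*0' 'enp1s.' 'enp1s0'; do
        if has_literal_token "$A" "$B"; then echo "$B: match"; else echo "$B: no match"; fi
    done
    ```

    This mixes quoted and unquoted pieces directly inside the test construct, which is exactly the kind of rule that is easy to get wrong, so treat it as an option rather than a recommendation.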
    

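    For a literal substring match, assuming the white space in A consists of space characters, use [[ " $A " == *" $B "* ]] instead. For example, with A set back to its original value of 'enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0':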

    Incomplete substring does not match:

    $ B='enp1s'
    $ [[ " $A " == *" $B "* ]] && echo yes
    

    Substring containing the "match any character" globbing metachar does not match:

    $ B='enp1s?'
    $ [[ " $A " == *" $B "* ]] && echo yes
    

    Substring containing the "match any character" regexp metachar does not match:

    $ B='enp1s.'
    $ [[ " $A " == *" $B "* ]] && echo yes
    

    Substring which is a complete "word" matches:

    $ B='enp1s0'
    $ [[ " $A " == *" $B "* ]] && echo yes
    yes
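
    The literal test above could be packaged as a function, as in the sketch below. Both function names are hypothetical. The second function is an alternative, not taken from the answer, that splits the list on spaces and tabs and compares each word exactly, which drops the assumption that the words are separated by spaces only; it assumes the list is a single line:

    ```bash
    #!/usr/bin/env bash
    # Both helpers are hypothetical names used only for illustration.

    # Literal match using the glob test; words must be separated by spaces.
    in_list_glob() {
        local list=$1 word=$2
        [[ " $list " == *" $word "* ]]
    }

    # Alternative: split on runs of spaces or tabs, then compare word by word.
    in_list_loop() {
        local list=$1 word=$2 w
        local -a words
        IFS=$' \t' read -r -a words <<< "$list"   # reads a single line only
        for w in "${words[@]}"; do
            # Quoting "$word" makes == compare literally instead of as a glob.
            [[ $w == "$word" ]] && return 0
        done
        return 1
    }

    A='enp1s0 enp2s0 enp3s0 enp4s0 lo tailscale0'
    for B in 'enp1s' 'enp1s?' 'enp1s.' 'enp1s0'; do
        if in_list_glob "$A" "$B"; then echo "glob $B: match"; else echo "glob $B: no match"; fi
        if in_list_loop "$A" "$B"; then echo "loop $B: match"; else echo "loop $B: no match"; fi
    done
    ```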