I've just noticed on a new installation that Ubuntu does not have gawk installed by default.
As a result, none of my awk expressions containing the word boundary markers "\<" and "\>" work. For example:
$ readlink -e $(which awk)
/usr/bin/mawk
$ echo "word1 Bluetooth word3" | awk '/\<Bluetooth\>/'
$
EDIT0: On another system where gawk is installed, it works:
$ readlink -e $(which awk)
/usr/bin/gawk
$ echo "word1 Bluetooth word3" | awk '/\<Bluetooth\>/'
word1 Bluetooth word3
$
EDIT1: Moreover, mawk shows strange behavior:
$ echo word1 word2 word3 | mawk '/^\w+/{print$1}'
word1
$ echo sebastien1 abc toto | mawk '/^\w+/{print$1}'
$
Here are some of the escape sequences gawk understands:
$ man gawk | grep '\\[yswSW<>].*Matches'
\y Matches the empty string at either the beginning or the
\< Matches the empty string at the beginning of a word.
\> Matches the empty string at the end of a word.
\s Matches any whitespace character.
\S Matches any nonwhitespace character.
\w Matches any word-constituent character (letter, digit, or
\W Matches any character that is not word-constituent.
EDIT2: Ed Morton is right, mawk does not understand \w nor the other escape sequences that gawk understands:
$ man mawk | grep '\\[yswSW<>]'
$
Is there a way of matching words that works for both mawk and gawk?
Depends what you want to do with the match but this might be adequate:
$ echo "word1 Bluetooth word3" | awk '/(^|[^[:alnum:]_])Bluetooth([^[:alnum:]_]|$)/'
word1 Bluetooth word3
There is no common escape sequence that means "word boundary" in all awks or even just in POSIX awks.
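If the word is not fixed in advance, the same bracket-expression pattern can be built from a variable. This is only a minimal sketch: it assumes the word contains no regexp metacharacters or backslashes and that your awk supports POSIX character classes. `match_word` is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper: print input lines that contain "$1" as a whole word.
# Assumption: "$1" holds no regexp metacharacters or backslashes.
match_word() {
    awk -v w="$1" '
        # Build the dynamic regexp once: the word must be preceded by the
        # start of line or a non-word character, and followed by a
        # non-word character or the end of line.
        BEGIN { re = "(^|[^[:alnum:]_])" w "([^[:alnum:]_]|$)" }
        $0 ~ re
    '
}

# Only the first and third lines contain Bluetooth as a whole word.
printf '%s\n' 'word1 Bluetooth word3' 'word1 Bluetooth5 word3' 'Bluetooth' |
match_word Bluetooth
```

Note that the pattern consumes the neighbouring non-word characters, so it is suited to testing whether a line contains the word rather than to extracting the word itself.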
If that's not all you need then edit your question to better explain what you want to do with the matching string and provide sample input/output that demonstrates that usage.
Regarding your edit - mawk isn't showing strange behavior. You're asking it to find a line that starts with 1 or more ws (w is a literal char and \w is also still that same literal char) and print the first field from that line. The first line you test with starts with w, the second one doesn't.
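To see this without relying on how any particular awk treats \w, you can write the literal w yourself. The sketch below assumes, as described above, that /^w+/ is the regexp mawk effectively ran. It behaves the same in any awk:

```shell
# /^w+/ matches lines that start with one or more literal "w" characters.
echo 'word1 word2 word3'   | awk '/^w+/{print $1}'   # prints: word1
echo 'sebastien1 abc toto' | awk '/^w+/{print $1}'   # prints nothing
```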
If you're trying to match word-constituent characters (which is what \w+ would do in gawk) then use [[:alnum:]_]+ in a POSIX awk or [a-zA-Z0-9_]+ in any awk assuming those character ranges are correct for your locale. If you wanted to print the word that matches that regexp then it'd be:
$ echo 'sebastien1 abc toto' |
awk 'match($0,/^[[:alnum:]_]+/){print substr($0,RSTART,RLENGTH)}'
sebastien1
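If you wanted every word on the line rather than just the first, one possible approach is to call match() repeatedly on the remainder of the string. This is a sketch that assumes an awk with POSIX character classes. `print_words` is a hypothetical wrapper name used only for illustration.

```shell
# Hypothetical helper: print each run of word-constituent characters
# from every input line, one per output line.
print_words() {
    awk '{
        s = $0
        # match() sets RSTART and RLENGTH for the leftmost match in s.
        while (match(s, /[[:alnum:]_]+/)) {
            print substr(s, RSTART, RLENGTH)
            # Continue searching after the word just printed.
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

echo 'sebastien1 abc toto' | print_words
# prints:
#   sebastien1
#   abc
#   toto
```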