I have two bash scripts that execute an awk script. It's supposed to filter out blocked users:
testawk.sh:
#!/usr/bin/bash
awk_script_file=$(cat << 'EOF'
$0 ~ "User " user ".* blocked"
{
print
}
EOF
)
# Run awk through bash to get file globbing to work
bash -c "awk -v user='${user}' '${awk_script_file}' ${file}"
testawk2.sh:
#!/usr/bin/bash
awk_script_file=$(cat << 'EOF'
$0 ~ "User " user ".* blocked" {
print
}
EOF
)
# Run awk through bash to get file globbing to work
bash -c "awk -v user='${user}' '${awk_script_file}' ${file}"
You can see that literally the only difference is the placement of the opening curly brace { after the regex match expression.
Now when I run these scripts against test data (user=evil_user;file=data.csv; . testawk.sh) I get different results.
data.csv:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
User evil_user blocked: Limit exceeded
ex ea commodo consequat. Duis aute irure dolor in reprehenderit
User evil_user blocked: Limit exceeded
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
testawk.sh output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
User evil_user blocked: Limit exceeded
User evil_user blocked: Limit exceeded
ex ea commodo consequat. Duis aute irure dolor in reprehenderit
User evil_user blocked: Limit exceeded
User evil_user blocked: Limit exceeded
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
testawk2.sh output:
User evil_user blocked: Limit exceeded
User evil_user blocked: Limit exceeded
And I don't understand why.
Note: The indirection of calling bash within the script is to allow filepath globbing expansion for ${file}.
To answer the title: Yes.
In an awk condition/action pair, the opening brace of the action has to be on the same line as the condition. Awk is not a free-form language; newlines are significant.
So when you do this:
/whatever/
{ something }
It is interpreted as the condition /whatever/ with no explicit action (which therefore triggers the default action of "print the record"), followed by the action block { something } with no explicit condition (which therefore is triggered on every record).
So the program winds up both printing every line that matches /whatever/ and doing the { something } to every single line, whether it matches /whatever/ or not. If part of the { something } is printing out the line, the lines that do match /whatever/ will be printed twice.
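As a minimal sketch of the behavior described above, the following compares the two layouts on a tiny input. The sample text and the variable names (input, split_rule, joined_rule) are made up for illustration and are not taken from the question:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: two lines, only the first matches /keep/.
input=$'keep me\nskip me'

# Brace on its own line: awk parses this as TWO rules.
#   /keep/      -> pattern with the default action (print the record)
#   { print }   -> action with no pattern (runs for every record)
split_rule=$(printf '%s\n' "$input" | awk '/keep/
{ print }')

# Brace on the same line as the pattern: ONE pattern-action rule.
joined_rule=$(printf '%s\n' "$input" | awk '/keep/ { print }')

printf 'split:\n%s\n\njoined:\n%s\n' "$split_rule" "$joined_rule"
```

With this input, the split form prints the matching line twice and the non-matching line once, while the joined form prints only the matching line. That is the same pattern as the testawk.sh and testawk2.sh outputs in the question.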