
awk: does the placement of the curly brace { matter?


I have two bash scripts that each execute an awk script. It's supposed to filter the input down to the lines about blocked users:

testawk.sh:

#!/usr/bin/bash 

awk_script_file=$(cat << 'EOF'

$0 ~ "User " user ".* blocked"
{
    print
}
EOF
)

# Run awk through bash to get file globbing to work
bash -c "awk -v user='${user}' '${awk_script_file}' ${file}"

testawk2.sh:

#!/usr/bin/bash 

awk_script_file=$(cat << 'EOF'

$0 ~ "User " user ".* blocked" {
    print
}
EOF
)

# Run awk through bash to get file globbing to work
bash -c "awk -v user='${user}' '${awk_script_file}' ${file}"

As you can see, the only difference is whether the opening curly brace { sits at the end of the regex match line or on the line after it.

Yet when I run these scripts against test data (user=evil_user;file=data.csv; . testawk.sh), I get different results.

data.csv:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
User evil_user blocked: Limit exceeded
ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
User evil_user blocked: Limit exceeded
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

testawk.sh output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
User evil_user blocked: Limit exceeded
User evil_user blocked: Limit exceeded
ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
User evil_user blocked: Limit exceeded
User evil_user blocked: Limit exceeded
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

testawk2.sh output:

User evil_user blocked: Limit exceeded
User evil_user blocked: Limit exceeded

I don't understand why.

Note: The indirection of calling bash within the script is there to allow pathname (glob) expansion of ${file}.
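As an aside, a minimal sketch of calling awk directly, assuming bash with its default settings (globbing not disabled via `set -f`) and filenames without whitespace: an unquoted `$file` already undergoes pathname expansion, so the extra `bash -c` layer may not be needed, and `$user` no longer has to survive re-parsing by a second shell. The temporary directory and the `a.csv`/`b.csv` files are hypothetical demo data.

```bash
#!/usr/bin/env bash
# Sketch: let the current shell expand the glob in $file, no "bash -c".
# Assumptions: globbing is enabled (no "set -f") and filenames contain no
# whitespace, because an unquoted $file is also subject to word splitting.

user=evil_user

# Hypothetical sample files, created only for this demo.
tmpdir=$(mktemp -d)
printf 'User evil_user blocked: Limit exceeded\nnoise\n' > "$tmpdir/a.csv"
printf 'noise\nUser evil_user blocked: Limit exceeded\n' > "$tmpdir/b.csv"
file="$tmpdir/*.csv"   # the assignment itself does not expand the glob

awk_script='$0 ~ "User " user ".* blocked" { print }'

# $file is deliberately unquoted: its value undergoes pathname expansion,
# so awk receives a.csv and b.csv as two separate file arguments.
result=$(awk -v user="$user" "$awk_script" $file)
printf '%s\n' "$result"

rm -r "$tmpdir"
```

If filenames may contain spaces, expanding the glob into a bash array first (`files=( "$tmpdir"/*.csv )`, then `"${files[@]}"`) is the more robust variant of the same idea.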


Solution

  • To answer the title: Yes.

    In an awk condition/action pair, the action's opening brace has to be on the same line as the condition. Awk is not a free-form language: newlines are significant.

    So when you do this:

    /whatever/
    { something }
    

    It is interpreted as the condition /whatever/ with no explicit action (which therefore triggers the default action, "print the record"), followed by the action block { something } with no explicit condition (which therefore runs on every record).

    So the program winds up both printing every line that matches /whatever/ and doing the { something } to every single line, whether it matches /whatever/ or not. If part of the { something } is printing out the line, the lines that do match /whatever/ will be printed twice.
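The parsing described above can be checked directly. This sketch runs the question's testawk.sh program on a shortened copy of data.csv and compares it with the same program written out explicitly as two rules: the pattern with the default action, plus an unconditional `{ print }`. The variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Shortened copy of the question's data.csv.
data='Lorem ipsum dolor sit amet
User evil_user blocked: Limit exceeded
ex ea commodo consequat.
User evil_user blocked: Limit exceeded
Excepteur sint occaecat.'

# The testawk.sh program: the "{" starts on its own line.
split_prog='$0 ~ "User " user ".* blocked"
{
    print
}'

# The same program written as two explicit rules:
# rule 1 = pattern with the default action, rule 2 = action with no pattern.
two_rules='$0 ~ "User " user ".* blocked" { print }
{ print }'

split_out=$(printf '%s\n' "$data" | awk -v user=evil_user "$split_prog")
two_out=$(printf '%s\n' "$data" | awk -v user=evil_user "$two_rules")

# Every line appears once; matching lines appear twice (once per rule).
printf '%s\n' "$split_out"
if [ "$split_out" = "$two_out" ]; then
    echo "identical: awk read the split program as two separate rules"
fi
```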
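A sketch of two ways to get the intended filtering, using the same shortened sample data: keep the `{` on the condition's line (as testawk2.sh does), or, since the action here is only `print`, drop the action block entirely and rely on awk's default action. Both should print only the two "blocked" lines.

```bash
#!/usr/bin/env bash
# Shortened copy of the question's data.csv.
data='Lorem ipsum dolor sit amet
User evil_user blocked: Limit exceeded
ex ea commodo consequat.
User evil_user blocked: Limit exceeded
Excepteur sint occaecat.'

# Fix 1: the "{" on the same line as the condition, as in testawk2.sh.
same_line='$0 ~ "User " user ".* blocked" {
    print
}'

# Fix 2: the action is only "print", so the pattern alone is enough;
# awk applies its default action (print the record) to matching lines.
pattern_only='$0 ~ "User " user ".* blocked"'

fix1_out=$(printf '%s\n' "$data" | awk -v user=evil_user "$same_line")
fix2_out=$(printf '%s\n' "$data" | awk -v user=evil_user "$pattern_only")

printf '%s\n' "$fix1_out"
```

The pattern-only form sidesteps the brace-placement pitfall altogether, but only works while the action is a plain `print`; any other action still needs its `{` on the condition's line.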