linux shell unix awk posix

Handling arbitrary arguments portably in AWK


I want my shebang-enabled POSIX awk programs to have a more standard interface: not the -v var=val interface, but one that looks like other programs available from the Linux/UNIX command line. The issue I have encountered is that an awk script relays its flags to awk, and awk itself gets first crack at interpreting them. Different implementations of awk also accept different flags. The end result is that it is nigh impossible to build an awk program whose interface just works by parsing ARGC and ARGV[] for flags.

So, I end up encapsulating my awk programs in shell -- which adds to my support and testing burden -- and that shell code looks like the following:

arg_core=""
arg_directory=""
arg_module=""
arg_output=""
arg_regmap=""
arg_regpage=""
arg_help=0
arg_version=0
arg_verbose=0
while getopts c:d:m:o:p:r:hvV o
do
        case "$o" in
        c) arg_core="$OPTARG";;
        d) arg_directory="$OPTARG";;
        m) arg_module="$OPTARG";;
        o) arg_output="$OPTARG";;
        p) arg_regpage="$OPTARG";;
        r) arg_regmap="$OPTARG";;
        h) arg_help=1;;
        v) arg_version=1;;
        V) arg_verbose=1;;
        # getopts consumes "--" itself, so no case is needed for it.
        \?) help >&2
            exit 1;;
        esac
done
shift $((OPTIND - 1))

# Handling help and version (verbose option also displays revision
# history and notes) is more easily done outside the getopts loop.
if [ $arg_version -gt 0 ]
then
        version
        [ $arg_verbose -gt 0 ] && rev_history
fi
if [ $arg_help -gt 0 ]
then
        [ $arg_version -gt 0 ] && echo
        help
fi
if [ "$arg_help" -gt 0 ] || [ "$arg_version" -gt 0 ]
then
        exit 0
fi

awk -v arg_core="$arg_core" \
    -v arg_directory="$arg_directory" \
    -v arg_module="$arg_module" \
    -v arg_output="$arg_output" \
    -v arg_regmap="$arg_regmap" \
    -v arg_regpage="$arg_regpage" \
    -f rffe2tpf.awk -- "$@"

My question: I want to eliminate the shell script encapsulation and do my argument parsing in awk, and I want to do it portably. (Note: I am not asking, "How do I do getopts in awk?" I am asking, "How do I, from the shebang in an awk script, portably stop awk from parsing flags?") Is there a way to trick awk or the shebang into accomplishing this goal?
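
To make the problem concrete, here is a minimal sketch of it. The file name show_args.awk is hypothetical, and mktemp is assumed to be available. It compares what the awk program sees when a flag-like argument is passed with and without an explicit "--". The -v flag is used because every POSIX awk reserves it for variable assignment, so the result should not depend on the implementation.

```shell
#!/bin/sh
# Sketch only: show_args.awk is a hypothetical file name; mktemp is assumed.
tmp=$(mktemp -d) || exit 1
cat > "$tmp/show_args.awk" <<'EOF'
BEGIN { for (i = 1; i < ARGC; i++) print ARGV[i] }
EOF

# Roughly what "#!/usr/bin/awk -f" runs when the user types: script -v x=1 foo
# awk treats -v x=1 as its own variable assignment, so the program never sees it.
plain=$(awk -f "$tmp/show_args.awk" -v x=1 foo)

# With an explicit "--", awk stops option parsing and every argument reaches ARGV.
guarded=$(awk -f "$tmp/show_args.awk" -- -v x=1 foo)

printf 'without --:\n%s\n' "$plain"
printf 'with --:\n%s\n' "$guarded"
rm -rf "$tmp"
```

Without "--" the program prints only foo. With "--" it prints -v, x=1 and foo on separate lines.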


Solution

  • Posting to Stack Overflow made me think more deeply about the problem, and I believe I have something closer to a solution. If anyone knows some "shebang-foo" that solves this problem and that I am unaware of, I will pick that answer above my own.

    Special thanks to BinaryZebra for pointing me to getopt.awk, so I no longer need to think about rolling my own getopt() implementation.

    We may not be able to portably provide the behaviour we want from the shebang, but we are able to significantly limit the amount of code in the shell script to support the behaviour we want. The proposed solution is generic and can be used for all our awk scripts.

    I call the following script bawk:

    #! /usr/bin/env sh
    p=$1
    shift
    awk -f "$p" -- "$@"
    

    If bawk is made executable and placed in a directory on our PATH, it can be used in the awk script's shebang. Here is a test script:

    #! /usr/bin/env bawk
    BEGIN { for (i=1; i<ARGC; i++) print ARGV[i] }
    

    Output:

    $ ./foo.awk -abc -1 -2 -3
    -abc
    -1
    -2
    -3
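
Once the flags arrive in ARGV untouched, the awk program can parse them itself. The following is a minimal sketch of that step, not a full getopt replacement. The file name demo.awk and its -V and -o flags are hypothetical, mktemp is assumed to be available, and bawk is written as a shell function with the same steps as the wrapper so the example is self-contained. Each parsed flag is set to the empty string in ARGV, because awk skips empty ARGV elements when it looks for input files.

```shell
#!/bin/sh
# Sketch only: demo.awk and its -V / -o flags are hypothetical; mktemp is assumed.
tmp=$(mktemp -d) || exit 1

# Same steps as the bawk wrapper, written as a function for this sketch.
bawk() {
    p=$1
    shift
    awk -f "$p" -- "$@"
}

cat > "$tmp/demo.awk" <<'EOF'
BEGIN {
    verbose = 0
    output = ""
    for (i = 1; i < ARGC; i++) {
        arg = ARGV[i]
        if (arg == "--") {              # explicit end of options
            ARGV[i] = ""
            break
        } else if (arg == "-V") {       # boolean flag
            verbose = 1
            ARGV[i] = ""
        } else if (arg == "-o") {       # flag with a separate value
            ARGV[i] = ""
            output = ARGV[++i]
            ARGV[i] = ""
        } else if (arg ~ /^-./) {       # anything else that looks like a flag
            print "unknown option: " arg | "cat 1>&2"
            exit 2
        } else {
            break                       # first operand ends option parsing
        }
    }
    printf "verbose=%d output=%s\n", verbose, output
    # Cleared (empty) ARGV entries are skipped, as awk skips them as input files.
    for (; i < ARGC; i++)
        if (ARGV[i] != "") print "operand: " ARGV[i]
    exit    # demo only: do not go on to read the operands as input files
}
EOF

result=$(bawk "$tmp/demo.awk" -V -o out.txt a b)
printf '%s\n' "$result"
rm -rf "$tmp"
```

This prints verbose=1 output=out.txt, followed by operand: a and operand: b on separate lines. A real program would drop the final exit so that awk goes on to read the remaining operands as input files.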