awk gawk

Using pipe character as a field separator


I'm trying different commands to process a CSV file where the separator is the pipe (|) character.

While those commands work when the comma is the separator, they throw an error when I replace it with the pipe:

awk -F[|] "NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2] [|] $4 [|] $5 }" OFS=[|] file1.csv file2.csv

awk "{print NR "|" $0}" file1.csv

I tried "|", [|], and /|, to no avail.

I'm using Gawk on Windows. What am I missing?


Solution

  • For anyone finding this years later: ALWAYS QUOTE SHELL METACHARACTERS!

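    OP's [|] is fine as far as awk is concerned: it is a regex character class containing only |. (A single-character FS other than a space is taken literally anyway, so a plain | would work too, once it reaches awk.) The problem is the shell, which sees the command line first. An unquoted | is the shell's pipe operator, so -F[|] never reaches awk in one piece. On top of that, [...] is a shell pattern, which in bash at least is only replaced if it matches a file in the current working directory. That is why the comma version appears to work. With a comma, which the shell does not treat as an operator:

    $ cd /tmp
    $ echo -F[,]    # No matching file, pattern passed through
    -F[,]
    $ touch -- '-F,'
    $ echo -F[,]    # Same command, different output
    -F,
    $ echo '-F[,]'  # Good quoting, consistent output
    -F[,]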
    

    So it should be:

    awk '-F[|]'
    # or
    awk -F '[|]'
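
    As a rough end-to-end sketch, here are OP's two commands with single-quoted programs, assuming a POSIX-style shell (bash, or something like Git Bash on Windows). The sample data and the temporary file names are made up for illustration. Note that OFS is a plain string, not a regex, so it is set to | rather than [|].

```shell
# Hypothetical sample data; file names and contents are assumptions for illustration.
dir=$(mktemp -d)
printf '%s\n' '1|k1|alpha' '2|k2|beta' > "$dir/file1.csv"
printf '%s\n' 'x|k1|y|foo|bar' 'x|k3|y|baz|qux' > "$dir/file2.csv"

# Single quotes keep |, [ ], $ and " away from the shell; awk sees them verbatim.
joined=$(awk -F '[|]' -v OFS='|' '
    NR == FNR { a[$2] = $0; next }      # first file: index each line by field 2
    $2 in a   { print a[$2], $4, $5 }   # second file: print matches, joined by OFS
' "$dir/file1.csv" "$dir/file2.csv")

# Prefix each line with its line number and a literal pipe.
numbered=$(awk '{ print NR "|" $0 }' "$dir/file1.csv")

printf '%s\n' "$joined" "$numbered"
rm -r "$dir"
```

    With this sample data, only the k1 line of the second file has a match in the first, so the join prints a single line.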
    

    awk -F "[|]" would also work, but IMO you should only use soft quotes (") when there is something to actually expand, or when the string itself contains hard quotes ('), which cannot be nested inside hard quotes in any way.

    Note that the same pathname expansion happens if pattern characters such as [ ] come from an unquoted variable.
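
    A small sketch of the variable case, assuming bash or another POSIX shell with default globbing options. It uses a comma inside the brackets, and the variable name opt and the temporary directory are just illustrative.

```shell
# Create a scratch directory containing a file that the pattern -F[,] happens to match.
dir=$(mktemp -d)
touch -- "$dir/-F,"

opt='-F[,]'

# Unquoted: the expanded value undergoes pathname expansion and becomes the file name.
unquoted=$(cd "$dir" && printf '%s\n' $opt)

# Quoted: the value is passed through untouched.
quoted=$(cd "$dir" && printf '%s\n' "$opt")

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -r "$dir"
```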

    If text or a variable contains, or may contain, any of [ ] ? *, quote it, or use set -f to turn off pathname expansion (a single, unmatched square bracket is technically OK, I think).
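
    To illustrate the set -f alternative, here is a minimal sketch assuming a POSIX shell. The file names are invented, and set -f is confined to a subshell so it does not leak into the rest of the session.

```shell
# Scratch directory with two files that match *.csv.
dir=$(mktemp -d)
touch "$dir/a.csv" "$dir/b.csv"

pattern='*.csv'

# Default behaviour: the unquoted variable is expanded to the matching file names.
expanded=$(cd "$dir" && echo $pattern)

# With set -f, pathname expansion is off and the pattern stays literal.
literal=$(cd "$dir" && set -f && echo $pattern)

printf 'expanded: %s\nliteral:  %s\n' "$expanded" "$literal"
rm -r "$dir"
```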

    If a variable contains, or may contain, an IFS character (space, tab, or newline by default), quote it, unless you want it to be split. Or set IFS= first (and live with the consequences) if quoting is impossible (e.g. a crazy eval).

    Note: literal text on the command line is always split on whitespace, regardless of IFS.
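
    The splitting rules above can be checked with a tiny helper that reports how many arguments it received. This is only a sketch for a POSIX shell. count_args and the sample file name are hypothetical, and IFS is changed inside subshells only.

```shell
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

name='my file.csv'

unquoted_n=$(count_args $name)           # split on the space in the value
quoted_n=$(count_args "$name")           # quoting prevents the split
noifs_n=$(IFS=; count_args $name)        # empty IFS: no splitting of the expansion
raw_n=$(IFS=; count_args my file.csv)    # literal words are still separate tokens

printf '%s %s %s %s\n' "$unquoted_n" "$quoted_n" "$noifs_n" "$raw_n"
```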