awkprocess-substitution

Making a system call with awk


I have a delimited file with hundreds of thousands of lines that must be migrated to a new system. One of the fields is required in the new system and is required to be unique, but is often missing in the data file. I would like to use awk to insert uuids for this purpose. My normal way of substituting a default value (presuming $2 is missing) is

awk 'BEGIN{FS="\t"}{OFS="\t"}!$2{$2="defaultvalue"}1' myfile 

which works well for text literals, but I'd like to use the same mechanism to use command substitution within awk. If I start with the file:

field1     requiredfield     field3
value1        networkId       value3
value2                        value4

the desired output would be:

field1     requiredfield                        field3
value1        networkId                          value3
value2    6b34b312-8169-4539-b52f-a075542fb063   value4

where I'm using uuidgen to generate the value. I can almost get things to work but not quite

awk 'BEGIN{FS="\t"}{OFS="\t"}!$2{$2=system("uuidgen")}1'

returns a zero along with with the uuidgen output

awk 'BEGIN{FS="\t"}{OFS="\t"}!$2{"uuidgen"|getline d}{$2=d}1'

and

awk -v uuid=$(uuidgen) 'BEGIN{FS="\t"}{OFS="\t"}!$2{$2=uuid}1'

don't have a problem with the zero, but they give the same uuid for all records. What am I missing?


Solution

  • Assuming your input file is like this:

    cat -vte file
    
    field1^Irequiredfield^Ifield3$
    value1^InetworkId^Ivalue3$
    value2^I ^Ivalue4$
    value5^I ^Ivalue6$
    value7^I ^Ivalue8$
    

    You may try this awk that runs given command for each record and used an output from shell if inner command fails:

    cmd="uuidgen"
    
    awk -v cmd="$cmd" -v uuid="$($cmd)" '
    BEGIN {FS=OFS="\t"}
    $2 ~ /^[[:blank:]]*$/ {
       $2 = ((cmd | getline out) > 0 ? out : uuid)
       close(cmd)
    } 1' file | column -t
    
    field1  requiredfield                         field3
    value1  networkId                             value3
    value2  857063FB-673B-487B-8C00-6DF01537DA22  value4
    value5  63333D5E-2156-4855-B8BC-CBB7CEF9E9F4  value6
    value7  8CD16A13-F0F5-42E6-9998-3AA064F9B4FC  value8
    

    column -t has been used for tabular output.