bashawkxargstr

When combining tr with null characters and xargs -0, it doesn't work as expected


So, I have a file that contains ~ delimited fields that I need to convert to sql. The catch here is that there are spaces in this data. Generally, this wouldn't be an issue because I could just use xargs -0 and feed it null delimiters.

When I pair it with tr '~' '\0000' I get weird behavior and I am not sure why. It seems to actually take too many fields when using xargs.

Input:

FA_PRD01_PHX1~EBDT~30-JAN-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~30-JAN-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
FA_PRD01_PHX1~EBDT~23-JAN-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~23-JAN-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
FA_PRD01_PHX1~EBDT~13-MAR-2023~04~JobDefinition://company/apps/ess/custom/shared/Apps_Reports/Certifications and Competencies/ALLCERTRPT~1~1
FA_PRD01_PHX1~EBDT~13-FEB-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~13-FEB-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
FA_PRD01_PHX1~EBDT~13-FEB-2023~05~JobDefinition://company/apps/ess/custom/shared/Apps_Reports/Certifications and Competencies/ALLCERTRPT~1~1
FA_PRD01_PHX1~EBDT~06-FEB-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~06-FEB-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1

command:

cat data.txt | tr '~' '\0000' | 
   xargs -0 -n 7 printf "insert into ESS_SYSTEM_ERROR_METRICS  values('%s','%s',to_date('%s','DD-MM-YYYY'),%s,'%s',%s,%s);\n"

expected:

insert into ESS_SYSTEM_ERROR_METRICS  values('FA_PRD01_PHX1','EBDT',to_date('30-JAN-2023','DD-MM-YYYY'),18,'JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS',1,1);

actual:

insert into ESS_SYSTEM_ERROR_METRICS  values('FA_PRD01_PHX1','EBDT',to_date('30-JAN-2023','DD-MM-YYYY'),18,'JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS',1,1
FA_PRD01_PHX1);
insert into ESS_SYSTEM_ERROR_METRICS  values('EBDT','30-JAN-2023',to_date('08','DD-MM-YYYY'),JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW,'1',1
FA_PRD01_PHX1,EBDT);

My field count with xargs is correct, so I am also wondering if xargs is using something else to delimit.


Solution

  • Because you're replacing the ~ with a \0 character and using the \0 as delimiter for xargs, the \n in the input will be treated as part of the last "field"; you need to translate the newlines in the input too:

    tr '\n~' '\0' < data.txt |
    xargs -0 -n 7 printf "insert into ESS_SYSTEM_ERROR_METRICS  values('%s','%s',to_date('%s','DD-MM-YYYY'),%s,'%s',%s,%s);\n"
    

    advice: When generating code automatically, you better make sure to escape the strings properly (here you print them RAW, that might prove dangerous)


    UPDATE

    By proper escaping I mean generating the SQL in a way that won't allow code-injection; for example:

    awk -F'~' '
        function sql_stringify(str,escape_backslashes) {
            gsub(/\047/,"\047\047",str);
            if (escape_backslashes)
                gsub(/\\/,"\\\\&",str);
            return "\047" str "\047";
        }
        {
            printf( "insert into ESS_SYSTEM_ERROR_METRICS  " );
            printf( "values(%s,%s,to_date(%s,%s),%d,%s,%d,%d);\n", \
                sql_stringify($1), \
                sql_stringify($2), \
                sql_stringify($3), \
                sql_stringify("DD-MM-YYYY"), \
                $4, \
                sql_stringify($5), \
                $6, \
                $7 \
           );
        }
    ' data.txt
    

    note: for MySQL and PostgreSQL you'll need to set the escape_backslashes switch of sql_stringify to true