So, I have a file that contains ~
delimited fields that I need to convert to sql. The catch here is that there are spaces in this data. Generally, this wouldn't be an issue because I could just use xargs -0
and feed it null delimiters.
When I pair it with tr '~' '\0000'
I get weird behavior and I am not sure why. It seems to actually take too many fields when using xargs.
Input:
FA_PRD01_PHX1~EBDT~30-JAN-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~30-JAN-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
FA_PRD01_PHX1~EBDT~23-JAN-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~23-JAN-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
FA_PRD01_PHX1~EBDT~13-MAR-2023~04~JobDefinition://company/apps/ess/custom/shared/Apps_Reports/Certifications and Competencies/ALLCERTRPT~1~1
FA_PRD01_PHX1~EBDT~13-FEB-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~13-FEB-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
FA_PRD01_PHX1~EBDT~13-FEB-2023~05~JobDefinition://company/apps/ess/custom/shared/Apps_Reports/Certifications and Competencies/ALLCERTRPT~1~1
FA_PRD01_PHX1~EBDT~06-FEB-2023~18~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS~1~1
FA_PRD01_PHX1~EBDT~06-FEB-2023~08~JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW~1~1
command:
cat data.txt | tr '~' '\0000' |
xargs -0 -n 7 printf "insert into ESS_SYSTEM_ERROR_METRICS values('%s','%s',to_date('%s','DD-MM-YYYY'),%s,'%s',%s,%s);\n"
expected:
insert into ESS_SYSTEM_ERROR_METRICS values('FA_PRD01_PHX1','EBDT',to_date('30-JAN-2023','DD-MM-YYYY'),18,'JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS',1,1);
actual:
insert into ESS_SYSTEM_ERROR_METRICS values('FA_PRD01_PHX1','EBDT',to_date('30-JAN-2023','DD-MM-YYYY'),18,'JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLS',1,1
FA_PRD01_PHX1);
insert into ESS_SYSTEM_ERROR_METRICS values('EBDT','30-JAN-2023',to_date('08','DD-MM-YYYY'),JobDefinition://company/apps/ess/custom/Shared Folders/Apps_Reports/APPSSALESFORCETIMEDTLSNEW,'1',1
FA_PRD01_PHX1,EBDT);
My field count with xargs is correct, so I am also wondering if xargs is using something else to delimit.
Because you're replacing the ~
with a \0
character and using the \0
as delimiter for xargs
, the \n
in the input will be treated as part of the last "field"; you need to translate the newlines in the input too:
tr '\n~' '\0' < data.txt |
xargs -0 -n 7 printf "insert into ESS_SYSTEM_ERROR_METRICS values('%s','%s',to_date('%s','DD-MM-YYYY'),%s,'%s',%s,%s);\n"
advice: When generating code automatically, you better make sure to escape the strings properly (here you print them RAW, that might prove dangerous)
By proper escaping I mean generating the SQL in a way that won't allow code-injection; for example:
awk -F'~' '
function sql_stringify(str,escape_backslashes) {
gsub(/\047/,"\047\047",str);
if (escape_backslashes)
gsub(/\\/,"\\\\&",str);
return "\047" str "\047";
}
{
printf( "insert into ESS_SYSTEM_ERROR_METRICS " );
printf( "values(%s,%s,to_date(%s,%s),%d,%s,%d,%d);\n", \
sql_stringify($1), \
sql_stringify($2), \
sql_stringify($3), \
sql_stringify("DD-MM-YYYY"), \
$4, \
sql_stringify($5), \
$6, \
$7 \
);
}
' data.txt
note: for MySQL and PostgreSQL you'll need to set the escape_backslashes
switch of sql_stringify
to true