regexwindowsbashsedsql-scripts

It is a way to modify an SQL Insert Script with SED Command in order to insert word DATE before date value?


Modifying SQL Scripts with SED Command

I would like to explain my task. I have been asked to modify many files with one command without alter the data. The core is to properly insert DATE_B values, since it would give us an error trying to insert this into MYTABLE, due to the fact we create DATE_B as DATE datatype.

My Scripts

Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2018-10-30','YYYY-MM-DD'),'2021-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2022-08-11','YYYY-MM-DD'),'2022-08-06');

I have to modify all DATE_B values in order to insert DATE before the value. What I mean is:

Expected Scritps

Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),DATE'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2018-10-30','YYYY-MM-DD'),DATE'2021-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2022-08-11','YYYY-MM-DD'),DATE'2022-08-06');

My approach was to use REGEX in order to identify the single quote, year, dash, month, dash, day, dash, single quote... But if I do this I will substitute also my properly formatted to_timestamp values.

I found this command from GeekForGeeks useful since we catch the first letter and we enclose them with parenthesis.

$ echo "Welcome To The Geek Stuff" | sed 's/\(\b[A-Z]\)/\(\1\)/g'

Trying to approach my goal:

$ echo "Welcome To The Geek Stuff '2010-10-12' '1999-10-10'" | sed -E 's/(\b[A-Z])/DATE\1/g'

Using regex:

$ echo "Welcome To The Geek Stuff '2010-10-12' '1999-10-10'" | sed -E 's/'('\d{4}-)'/DATE\1/g'
$ echo "Welcome To The Geek Stuff" ,'1999-10-10' | sed 's/^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$/'DATE\1'/g'

I also read about h, H subcommand of sed, but it is not clear enough for me to make it works.

I would appreciate if someone could explain how to properly implement sed h subcommand and how to properly use regex with sed in order to identify COMMA, single quote, year, dash, month, dash, day, dash, single quote. Save this and insert word DATE

Thank you beforehand!


Solution

  • Assumptions/Understandings:

    Adding a few scenarios to our input:

    $ cat insert.sql
    Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),'2019-09-09');
    Insert into MYTABLE (DATE_A, DATE_C) values ('2021-09-09',to_timestamp('2018-10-30','YYYY-MM-DD'));
    Insert into MYTABLE (DATE_A, MY_INT, DATE_B) values ('2022-08-11','2344','2022-08-06');
    

    One sed idea:

    sed -E "s/( \(|,)('[0-9]{4}-[0-9]{2}-[0-9]{2}')/\1DATE\2/g" insert.sql
    

    Where:

    This generates:

    Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),DATE'2019-09-09');
    Insert into MYTABLE (DATE_A, DATE_C) values (DATE'2021-09-09',to_timestamp('2018-10-30','YYYY-MM-DD'));
    Insert into MYTABLE (DATE_A, MY_INT, DATE_B) values (DATE'2022-08-11','2344',DATE'2022-08-06');
    

    Once satisfied with the results OP can add the -i flag to have sed overwrite the input file.