awkgrep

grep: extract content between prefix and suffix


I've a file content like this:

Listening for transport dt_socket at address: 8000
------------------------------------------------------------
🔥 ^[[1m HAPI FHIR^[[22m 5.4.0 - Command Line Tool
------------------------------------------------------------
Process ID                      : 21719@psgd
Max configured JVM memory (Xmx) : 3.2GB
Detected Java version           : 11.0.7
------------------------------------------------------------
^[[32m2021-07-01^[[0;39m ^[[1;32m12:27:40.79^[[0;39m ^[[37m[main]^[[0;39m ^[[37mWARN ^[[0;39m ^[[1;34mo.f.c.i.s.c.ClassPathScanner^[[0;39m ^[[1;37mUnable to resolve location classpath:db/migration. Note this warning will become an error in Flyway 7.
^[[0;39m^[[32m2021-07-01^[[0;39m ^[[1;32m12:27:42.641^[[0;39m ^[[37m[main]^[[0;39m ^[[37mWARN ^[[0;39m ^[[1;34mo.f.c.i.s.c.ClassPathScanner^[[0;39m ^[[1;37mUnable to resolve location classpath:db/migration. Note this warning will become an error in Flyway 7.
^[[0;39m^[[32m2021-07-01^[[0;39m ^[[1;32m12:27:44.693^[[0;39m ^[[37m[main]^[[0;39m ^[[37mINFO ^[[0;39m ^[[1;34mc.u.f.j.m.t.InitializeSchemaTask^[[0;39m ^[[1;37m3_3_0.20180115.0: Initializing ORACLE_12C schema for HAPI FHIR
^[[0;39m^[[32m2021-07-01^[[0;39m ^[[1;32m12:27:44.848^[[0;39m ^[[37m[main]^[[0;39m ^[[37mINFO ^[[0;39m ^[[1;34mc.u.f.j.m.t.BaseTask^[[0;39m ^[[1;37m3_3_0.20180115.0: SQL "create sequence SEQ_BLKEXCOL_PID start with 1 increment by  50" returned 0
^[[0;39m^[[32m2021-07-01^[[0;39m ^[[1;32m12:27:44.918^[[0;39m ^[[37m[main]^[[0;39m ^[[37mINFO ^[[0;39m ^[[1;34mc.u.f.j.m.t.BaseTask^[[0;39m ^[[1;37m3_3_0.20180115.0: SQL "
create sequence SEQ_BLKEXCOLFILE_PID start with 1 increment by  50" returned 0
^[[0;39m^[[32m2021-07-01^[[0;39m ^[[1;32m12:27:47.573^[[0;39m ^[[37m[main]^[[0;39m ^[[37mINFO ^[[0;39m ^[[1;34mc.u.f.j.m.t.BaseTask^[[0;39m ^[[1;37m3_3_0.20180115.0: SQL "

    create table HFJ_BINARY_STORAGE_BLOB (
       BLOB_ID varchar2(200 char) not null,
        BLOB_DATA blob not null,
        CONTENT_TYPE varchar2(100 char) not null,
        BLOB_HASH varchar2(128 char),
        PUBLISHED_DATE timestamp not null,
        RESOURCE_ID varchar2(100 char) not null,
        BLOB_SIZE number(10,0),
        primary key (BLOB_ID)
    )" returned 0

I need to extract only content inside between SQL " and " returned 0 trimming all whitespaces.

Any ideas?

I've tried to reduce problem using:

$ echo 'sdf SQL" sdf sdf" returned 0' | grep 's/SQL"\(.*\)" returned 0/\1/' -

But it's getting empty.

My expected output is:

create sequence SEQ_BLKEXCOL_PID start with 1 increment by  50;
create sequence SEQ_BLKEXCOLFILE_PID start with 1 increment by  50;
create table HFJ_BINARY_STORAGE_BLOB (
       BLOB_ID varchar2(200 char) not null,
        BLOB_DATA blob not null,
        CONTENT_TYPE varchar2(100 char) not null,
        BLOB_HASH varchar2(128 char),
        PUBLISHED_DATE timestamp not null,
        RESOURCE_ID varchar2(100 char) not null,
        BLOB_SIZE number(10,0),
        primary key (BLOB_ID)
    );

I've tried to perform:

cat test.log | sed -E 's/.* SQL"(.*)" returned 0/\1/'

It's returning me all file content...

Using awk, it returns empty:

$ awk -v RS='SQL "[[:space:]]+?\n\n+.*returned 0' '                                                                                                  
RT{
 gsub(/^SQL "\n+|\n+$/,"",RT)
 sub(/" returned 0[[:space:]]+?\n*$/,"",RT)
 print RT";"
}
' test.log

Solution

  • This can be done using custom RS in gnu-awk that splits data on SQL "..." text block and then inside action block it extracts text between quotes without leading space.

    awk -v RS=' SQL "[^"]+"' 'RT {
    gsub(/^[^"]*"[[:space:]]*|"[^"]*$/, "", RT); print RT ";"}' file.sql
    
    create sequence SEQ_BLKEXCOL_PID start with 1 increment by  50;
    create sequence SEQ_BLKEXCOLFILE_PID start with 1 increment by  50;
    create table HFJ_BINARY_STORAGE_BLOB (
           BLOB_ID varchar2(200 char) not null,
            BLOB_DATA blob not null,
            CONTENT_TYPE varchar2(100 char) not null,
            BLOB_HASH varchar2(128 char),
            PUBLISHED_DATE timestamp not null,
            RESOURCE_ID varchar2(100 char) not null,
            BLOB_SIZE number(10,0),
            primary key (BLOB_ID)
        );