
awk - partial match of several columns between two files, returning the matching lines of both files on one line, joined with a comma


I have two files with the following contents:

/tmp/mydir-1:

direction=1, code=a b c d, time=xxxx
direction=1, code=f x fdfsdf sdfs, time=xxxx
direction=1, code=a b c f, time=xxxx

And the second file, /tmp/mydir-2:

direction2=2, code2=a b c fsd, time2=xxxx
direction2=2, code2=f x fdfsdf sdfs, time2=xxxx
direction2=2, code2=a b c ff, time2=xxxx

I want to find the lines in the second file whose code2=XXX value matches a code=XXX value in the first file, and print each matching pair of lines from both files concatenated with ", ".

In this example, the only lines that match between the two files are:

direction=1, code=f x fdfsdf sdfs, time=xxxx

and

direction2=2, code2=f x fdfsdf sdfs, time2=xxxx

so the output should be:

direction=1, code=f x fdfsdf sdfs, time=xxxx, direction2=2, code2=f x fdfsdf sdfs, time2=xxxx

I'm new to awk, and I don't yet know how to connect the pieces together.

I know that I can split the code (or code2) column and print the value after the = with:

awk -F ', ' '{split($2, aa, "="); print aa[2]}' /tmp/mydir-1

This prints:

a b c d
f x fdfsdf sdfs
a b c f
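Worth knowing before the next step: `split()` fills the array and also returns the number of pieces it produced. A minimal sketch to see both at once, assuming a POSIX awk and using a temporary file (the hypothetical `$tmp`) in place of /tmp/mydir-1:

```shell
#!/bin/sh
# Hypothetical stand-in for /tmp/mydir-1, written to a temporary file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  'direction=1, code=a b c d, time=xxxx' \
  'direction=1, code=f x fdfsdf sdfs, time=xxxx' \
  'direction=1, code=a b c f, time=xxxx' > "$tmp"

# With -F ', ', $2 is "code=..."; split() stores the pieces in aa
# and returns how many pieces there were (2 for "code=value").
result=$(awk -F ', ' '{ n = split($2, aa, "="); print n ": " aa[2] }' "$tmp")
printf '%s\n' "$result"
```

Each line comes out as `2: <code value>`; that return value of 2 turns out to matter when a `split(...)` call is written outside braces.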

Now I'm trying to split the code column on = and compare the values between the two files, and this is where I go wrong.

When I run this:

awk -F ', ' 'FNR==NR {split($2,aa,"="); a[aa[2]]; next} split($2, aaa, "="); aaa[2] in a' /tmp/mydir-1 /tmp/mydir-2

I get all the lines in mydir-2, duplicated:

direction2=2, code2=a b c f, time2=xxxx
direction2=2, code2=a b c f, time2=xxxx
direction2=2, code2=f x fdfsdf sdfs, time2=xxxx
direction2=2, code2=f x fdfsdf sdfs, time2=xxxx
direction2=2, code2=a b c ff, time2=xxxx

This is where I'm stuck. I'm guessing I'm somehow comparing the second file to itself? I'm not sure how to continue from here.
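A likely explanation, based on awk's pattern-action rules: in the command above, `split($2, aaa, "=")` is not inside braces, so awk treats it as a pattern. It returns 2 for every line of mydir-2, which counts as true, so awk runs the default action and prints the line. Then `aaa[2] in a` is a second pattern that prints the same line again whenever its code was seen in the first file. So matching lines appear twice and non-matching lines once; the second file is not being compared to itself. A minimal sketch of the fix, assuming a POSIX awk and temporary files (the hypothetical `$file1` and `$file2`) holding the sample data:

```shell
#!/bin/sh
# Hypothetical stand-ins for /tmp/mydir-1 and /tmp/mydir-2.
file1=$(mktemp); file2=$(mktemp)
trap 'rm -f "$file1" "$file2"' EXIT
printf '%s\n' \
  'direction=1, code=a b c d, time=xxxx' \
  'direction=1, code=f x fdfsdf sdfs, time=xxxx' \
  'direction=1, code=a b c f, time=xxxx' > "$file1"
printf '%s\n' \
  'direction2=2, code2=a b c fsd, time2=xxxx' \
  'direction2=2, code2=f x fdfsdf sdfs, time2=xxxx' \
  'direction2=2, code2=a b c ff, time2=xxxx' > "$file2"

# The second split is now an action in braces, so it no longer acts as a
# pattern that prints every line; only "aaa[2] in a" decides what is printed.
result=$(awk -F ', ' '
  FNR==NR { split($2, aa, "="); a[aa[2]]; next }
  { split($2, aaa, "=") }
  aaa[2] in a
' "$file1" "$file2")
printf '%s\n' "$result"
```

This removes the duplicates but still prints only the mydir-2 line, not the concatenation of both lines.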

Any help would be greatly appreciated.

Thanks

Update

Thanks, @KamilCuk, for your help.

I changed the variable names and this is the command:

awk -F ', ' 'FNR==NR {split($2,f1split,"="); f1[f1split[2]]; next} {split($2, f2plit, "=");} f2split[2] in f1' /tmp/mydir-1 /tmp/mydir-2

As @KamilCuk suggested, I wrapped the second split (for the second file) in {}, but when I run it, the output is empty.
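A likely cause of the empty output is a typo: the split writes into `f2plit` (missing `s`), while the pattern tests `f2split[2]`. `f2split` is never filled, so `f2split[2]` is always the empty string, which is never a key in `f1`. The sketch below fixes the name and, as an extra step not in the command above, stores the whole first-file line in `f1` so the pair can be printed joined by `, `. It assumes a POSIX awk; `$file1` and `$file2` are hypothetical temporary files holding the sample data:

```shell
#!/bin/sh
# Hypothetical stand-ins for /tmp/mydir-1 and /tmp/mydir-2.
file1=$(mktemp); file2=$(mktemp)
trap 'rm -f "$file1" "$file2"' EXIT
printf '%s\n' \
  'direction=1, code=a b c d, time=xxxx' \
  'direction=1, code=f x fdfsdf sdfs, time=xxxx' \
  'direction=1, code=a b c f, time=xxxx' > "$file1"
printf '%s\n' \
  'direction2=2, code2=a b c fsd, time2=xxxx' \
  'direction2=2, code2=f x fdfsdf sdfs, time2=xxxx' \
  'direction2=2, code2=a b c ff, time2=xxxx' > "$file2"

# f1 maps each code value of the first file to that whole line;
# OFS=", " makes print join the two lines with a comma and a space.
result=$(awk -F ', ' -v OFS=', ' '
  FNR==NR { split($2, f1split, "="); f1[f1split[2]] = $0; next }
  { split($2, f2split, "=") }
  f2split[2] in f1 { print f1[f2split[2]], $0 }
' "$file1" "$file2")
printf '%s\n' "$result"
```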

The variable names and what they stand for:

f1: first file
f2: second file
f1split: split of the first file's 'code' column
f2split: split of the second file's 'code2' column

Did I understand the awk syntax correctly regarding which code applies to the first file and which to the second?

awk 'FNR==NR {<CODE FOR FIRST FILE>} <CODE FOR 2nd FILE>' /tmp/mydir-1 /tmp/mydir-2
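Roughly, yes, provided the first block ends with `next`: `FNR==NR` is true only while NR (total records read) equals FNR (records read in the current file), which happens in the first file, and `next` stops those lines from also reaching the second-file rules. A minimal sketch, assuming a POSIX awk and hypothetical temporary files, that labels which rule handles each line and also shows one known caveat: when the first file is empty, the second file's lines satisfy `FNR==NR` instead.

```shell
#!/bin/sh
# Hypothetical small input files.
file1=$(mktemp); file2=$(mktemp); empty=$(mktemp)
trap 'rm -f "$file1" "$file2" "$empty"' EXIT
printf '%s\n' a b > "$file1"
printf '%s\n' c > "$file2"

prog='
  FNR==NR { print "first:  " $0; next }  # NR equals FNR only in the first file
  { print "second: " $0 }                 # reached only after the first file
'
result=$(awk "$prog" "$file1" "$file2")
# Caveat: with an empty first file, NR==FNR also holds for the second file.
result_empty=$(awk "$prog" "$empty" "$file2")
printf '%s\n' "$result" "$result_empty"
```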

Solution

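    $ cat tst.awk
    # Split fields on "=" or ",": $4 is then the code/code2 value
    BEGIN { FS="[=,]"; OFS=", " }
    # First file: remember each whole line, keyed by its code value
    NR==FNR {
        file1[$4] = $0
        next
    }
    # Second file: if this code2 value was seen in the first file,
    # print that line and the current line joined by OFS
    $4 in file1 {
        print file1[$4], $0
    }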

    $ awk -f tst.awk file1 file2
    direction=1, code=f x fdfsdf sdfs, time=xxxx, direction2=2, code2=f x fdfsdf sdfs, time2=xxxx
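Why `$4`: with `FS="[=,]"` every `=` and every `,` ends a field, so the code value is the fourth field in both files. A small sketch, assuming a POSIX awk, that prints every field of one sample line from each file:

```shell
#!/bin/sh
# Show every field awk produces with FS="[=,]" for one line of each file.
result=$(printf '%s\n' \
  'direction=1, code=f x fdfsdf sdfs, time=xxxx' \
  'direction2=2, code2=f x fdfsdf sdfs, time2=xxxx' |
  awk 'BEGIN { FS = "[=,]" }
       {
         # Print "index=[field]" pairs separated by spaces, one record per line
         for (i = 1; i <= NF; i++)
           printf "%d=[%s]%s", i, $i, (i < NF ? " " : "\n")
       }')
printf '%s\n' "$result"
```

Both lines yield `4=[f x fdfsdf sdfs]`, so the keys match exactly. Because `,` and `=` are both separators, a code value containing either character would shift the fields, and `file1[$4] = $0` keeps only the last first-file line for each code; the solution does not try to handle either case.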