awk

Print lines based on two column values


I have a two-column file like this (the second column is sorted):

m       8569
=       8569
u       8569
j       8569
=       8570
m       8570
j       8570
c       8570
j       8571
j       8572
j       8573
n       8573
=       8573
m       8573
c       8573
u       8574
u       8574

I need to print the lines where the value in col1 is "u", but only when "u" is the only col1 value associated with that col2 value. I should get:

u       8574
u       8574

For example, the "8569" group does not qualify: "u" is associated with "8569", but so are "m", "=" and "j":

m       8569
=       8569
u       8569
j       8569

I also tried adapting a related question (awk group by and print if matches a condition), but I got stuck on getting only the lines whose group contains nothing but the character "u".

Best


Solution

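  • Using GNU awk for arrays of arrays (gawk 4.0 or later): group the lines by col2, then by col1, and print a group only when "u" is its single col1 key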

    gawk '
        { data[$2][$1][NR] = $0 }
        END {
            for (val in data)
                if ("u" in data[val] && length(data[val]) == 1)
                    for (nr in data[val]["u"])
                        print data[val]["u"][nr]
        }
    ' file
    

    outputs

    u   8574
    u   8574
    

    If you only need each matching pair printed once, you can save some memory by not storing the lines:

    gawk '
        { data[$2][$1] = 1 }
        END {
            OFS = "\t"
            for (val in data)
                if ("u" in data[val] && length(data[val]) == 1)
                    print "u", val
        }
    ' file
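
If gawk is not available, the same grouping idea can probably be done in any POSIX awk by reading the file twice. This is a swapped-in two-pass technique, not the gawk method above: the first pass counts the distinct col1 values per col2 value, and the second pass prints the "u" lines whose group has exactly one distinct value. It also keeps the lines in input order. The file name sample.txt, the tab separator, and the array names seen and kinds are assumptions made for this sketch.

```shell
# Hypothetical sample file with a few rows from the question (tab-separated).
printf '%s\t%s\n' m 8569 = 8569 u 8569 j 8569 j 8571 u 8574 u 8574 > sample.txt

# The file is passed twice on purpose: pass 1 counts, pass 2 prints.
result=$(awk '
    # Pass 1 (NR == FNR only while reading the first copy):
    # count each distinct col1 value once per col2 value.
    NR == FNR { if (!seen[$2, $1]++) kinds[$2]++; next }
    # Pass 2: keep "u" lines whose col2 group holds a single distinct col1 value.
    $1 == "u" && kinds[$2] == 1
' sample.txt sample.txt)

printf '%s\n' "$result"
```

On the sample above this prints the two "u 8574" lines and skips "u 8569". Reading the file twice trades some I/O for not keeping every line in memory, which may matter for large inputs.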