I have a two-column file like this (the second column is sorted):
m 8569
= 8569
u 8569
j 8569
= 8570
m 8570
j 8570
c 8570
j 8571
j 8572
j 8573
n 8573
= 8573
m 8573
c 8573
u 8574
u 8574
I need to print the lines whose value in col1 is "u", but only when "u" is the only col1 value associated with that col2 value. I should get:
u 8574
u 8574
For example, the "8569" group does not qualify: "u" is associated with "8569", but so are "m", "=" and "j":
m 8569
= 8569
u 8569
j 8569
I also tried to adapt the approach from this question (awk group by and print if matches a condition), but I got stuck on selecting only the groups in which every col1 value is "u".
Best
Using GNU awk for arrays of arrays:
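gawk '
    # data[col2][col1][line number] = whole line
    { data[$2][$1][NR] = $0 }
    END {
        # for-in order is otherwise unspecified; walk indices in ascending numeric order
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (val in data)
            # keep only the groups whose single col1 value is "u"
            if ("u" in data[val] && length(data[val]) == 1)
                for (nr in data[val]["u"])
                    print data[val]["u"][nr]
    }
' file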
outputs
u 8574
u 8574
But if you only need to print unique instances, we can save some memory by not storing the lines themselves:
gawk '
    # data[col2][col1] = 1 records each distinct pair once
    { data[$2][$1] = 1 }
    END {
        OFS = "\t"    # note: output fields are tab-separated
        for (val in data)
            if ("u" in data[val] && length(data[val]) == 1)
                print "u", val
    }
' file
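If GNU awk is not available, the same filter as the first snippet can probably be written in POSIX awk by reading the file twice. This is only a minimal sketch under that assumption: the shortened sample input, the temporary file, and the names `seen`, `kinds` and `result` are made up for illustration.

```shell
# Hypothetical sample input: a shortened version of the data in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
m 8569
u 8569
j 8571
u 8574
u 8574
EOF

# Pass 1 (NR == FNR): count the distinct col1 values seen for each col2 value.
# Pass 2: print the "u" lines whose col2 value has exactly one distinct col1 value.
result=$(awk '
    NR == FNR { if (!seen[$2, $1]++) kinds[$2]++; next }
    $1 == "u" && kinds[$2] == 1
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the second pass prints lines as it reads them, the output keeps the input order, and only one counter per col2 value is held in memory plus one flag per distinct pair.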