
Finding Set Complement in Unix


Given these two files:

 $ cat A.txt     $ cat B.txt
    3           11
    5           1
    1           12
    2           3
    4           2

I want to find the numbers (lines) that are in A but NOT in B. What's the Unix command for that?

I tried this, but it seems to fail:

comm -3 <(sort -n A.txt) <(sort -n B.txt) | sed 's/\t//g' 

Solution

  • comm -2 -3 <(sort A.txt) <(sort B.txt)
    

    should do what you want, if I understood you correctly.

    Edit: Actually, comm needs both files to be sorted in lexicographical order (the collation order of the current locale), not numerical order, so you don't want -n in your sort command:

    $ cat A.txt
    1
    4
    112
    $ cat B.txt
    1
    112
    # Bad:
    $ comm -2 -3 <(sort -n A.txt) <(sort -n B.txt)
    4
    comm: file 1 is not in sorted order
    112
    # OK:
    $ comm -2 -3 <(sort A.txt) <(sort B.txt)
    4
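
For reference, here is a minimal sketch that applies the command above to the sample files from the question. It recreates them in a scratch directory, so the `tmpdir` and `only_in_a` names and the `*.sorted` files are illustrative, not part of the original answer. It writes the sorted copies to temporary files instead of using `<(...)`, so it should also run in shells without process substitution. Setting `LC_ALL=C` is an added assumption that makes `sort` and `comm` agree on a byte-wise ordering regardless of the user's locale.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 3 5 1 2 4 > "$tmpdir/A.txt"
printf '%s\n' 11 1 12 3 2 > "$tmpdir/B.txt"

# comm expects lexicographically sorted input, so use plain sort (no -n).
# LC_ALL=C (an assumption, see above) pins sort and comm to the same collation.
LC_ALL=C sort "$tmpdir/A.txt" > "$tmpdir/A.sorted"
LC_ALL=C sort "$tmpdir/B.txt" > "$tmpdir/B.sorted"

# -2 drops lines found only in B, -3 drops lines common to both,
# which leaves only the lines found only in A.
only_in_a=$(LC_ALL=C comm -23 "$tmpdir/A.sorted" "$tmpdir/B.sorted")
printf '%s\n' "$only_in_a"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

With the question's data this should print 4 and 5, each on its own line, in sorted order rather than in the order they appear in A.txt.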