bash, find, comm

How to get files in directory A but not B and vice versa using bash comm?


I'm trying to use comm to list the files that are in folder A but not in B, and vice versa:

comm -3 <(find /Users/rob/A -type f -exec basename {} ';' | sort) <(find "/Users/rob/B" -type f -exec basename {} ';' | sort)

I'm using basename {} ';' to exclude the directory path, but this is the output I get:

    IMG_5591.JPG
IMG_5591.jpeg
    IMG_5592.JPG
IMG_5592.jpeg
    IMG_5593.JPG
IMG_5593.jpeg
    IMG_5594.JPG
IMG_5594.jpeg

There's a tab in the name of the first directory, so all entries are considered different. What am I doing wrong?


Solution

  • The leading tabs are not generated by the find|basename code; they are generated by comm.

    comm generates 1 to 3 columns of output depending on its flags; by default, lines in the 2nd column get one leading tab and lines in the 3rd column get two leading tabs.

    In this case the OP's code tells comm to suppress column #3 (-3, the files common to both sources), so comm generates 2 columns of output, with the 2nd column having a leading tab.

    One easy fix:

    comm --output-delimiter="" -3 <(find...|sort...) <(find...|sort...)
    

    If your comm does not support the --output-delimiter flag (it is a GNU extension, not part of POSIX):

    comm -3 <(find...|sort...) <(find...|sort...) | tr -d '\t'
    

    This assumes the file names do not contain embedded tabs; if they might, replace the tr with your preferred way of stripping leading whitespace, e.g.:

    comm -3 <(find...|sort...) <(find...|sort...) | sed 's/^[[:space:]]*//'
    

    Demo ...

    $ cat file1
    a.txt
    b.txt
    
    $ cat file2
    b.txt
    c.txt
    
    $ comm file1 file2
    a.txt
                    b.txt
            c.txt
    
    # 2x tabs (\t) before 'b.txt' (3rd column), 1x tab (\t) before 'c.txt' (2nd column):
    
    $ comm file1 file2 | od -c
    0000000   a   .   t   x   t  \n  \t  \t   b   .   t   x   t  \n  \t   c
    0000020   .   t   x   t  \n
    
    # OP's scenario:
    
    $ comm -3 file1 file2
    a.txt
            c.txt
    
    # 1x tab (\t) before 'c.txt' (2nd column):
    
    $ comm -3 file1 file2 | od -c
    0000000   a   .   t   x   t  \n  \t   c   .   t   x   t  \n
    

    Removing the leading tabs:

    $ comm --output-delimiter="" -3 file1 file2
    a.txt
    c.txt
    
    $ comm -3 file1 file2 | tr -d '\t'
    a.txt
    c.txt
    
    $ comm -3 file1 file2 | sed 's/^[[:space:]]*//'
    a.txt
    c.txt
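
Stripping the tabs also discards the information about which side each name came from. If that matters, one possible alternative is to run comm twice, once per direction, since a suppressed column also drops its leading tab. The following is only a sketch under stated assumptions: the temporary directories and file names are hypothetical stand-ins for the OP's /Users/rob/A and /Users/rob/B, and list_names is a helper name made up for this illustration. It uses temporary files instead of process substitution so that it also runs in a plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical demo directories standing in for the OP's A and B.
work=$(mktemp -d)
mkdir "$work/A" "$work/B"
touch "$work/A/a.txt" "$work/A/b.txt" "$work/B/b.txt" "$work/B/c.txt"

# list_names (illustrative helper): print the sorted base names of all
# regular files under the given directory. LC_ALL=C keeps the sort order
# consistent with what comm expects below.
list_names() {
    find "$1" -type f -exec basename {} ';' | LC_ALL=C sort
}

list_names "$work/A" > "$work/a.list"
list_names "$work/B" > "$work/b.list"

# -23 suppresses columns 2 and 3: only names unique to A remain, no tab.
only_in_a=$(LC_ALL=C comm -23 "$work/a.list" "$work/b.list")
# -13 suppresses columns 1 and 3: only names unique to B remain, no tab.
only_in_b=$(LC_ALL=C comm -13 "$work/a.list" "$work/b.list")

echo "Only in A:"
printf '%s\n' "$only_in_a"
echo "Only in B:"
printf '%s\n' "$only_in_b"

rm -rf "$work"
```

With the sample files above, a.txt should be listed under "Only in A:" and c.txt under "Only in B:". Note that names differing only in case or extension (such as IMG_5591.JPG vs IMG_5591.jpeg in the question) are still reported as different by any of these approaches.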