shell diff utility

diff: how to use the '--ignore-matching-lines' option


I have two files:

$ cat xx
aaa
bbb
ccc
ddd
eee

$ cat zz
aaa
bbb
ccc
#ddd
eee

I want to diff them, while ignoring comments.

I tried several patterns, but none of them works:

diff --ignore-matching-lines='#' -u xx zz
diff --ignore-matching-lines='#.*' -u xx zz
diff --ignore-matching-lines='^#.*' -u xx zz

How can I diff two files while ignoring lines that match a given regex, such as anything starting with #?


Solution

  • That is not how the -I option in diff works; see Giles's comment on Unix.SE and section 1.4 of the manual, "Suppressing Differences Whose Lines All Match a Regular Expression".

    In short, -I suppresses a hunk only if every line inserted or deleted in that hunk matches the RE. In your case, the difference between the two files (called f1 and f2 here, your xx and zz) is

    diff f1 f2
    4c4
    < ddd
    ---
    > #ddd
    

    i.e. a single hunk that changes line 4 and consists of two lines, ddd and #ddd. Your REs #, #.* and ^#.* all match #ddd, but none of them matches ddd, so the hunk as a whole is not ignorable and diff prints it in full, including the line that does match. Quoting the manual,

    for each nonignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones.

    The option would have worked as you expected if f1 did not contain the line ddd at all, i.e.

    f1

    aaa
    bbb
    ccc
    eee
    

    f2

    aaa
    bbb
    ccc
    #ddd
    eee
    

    where doing

    diff f1 f2
    3a4
    > #ddd
    

    produces a single hunk containing only the line #ddd, which can be ignored with a pattern like ^# (ignore any line starting with #). As you can see, this gives the desired result (no output):

    diff -u -I '^#' f1 f2 
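
The two cases can be reproduced side by side. The following is a minimal sketch, assuming a diff that supports -I (GNU diff does) and that mktemp is available; the file name f1_noddd is made up for the illustration.

```shell
#!/bin/sh
# Sketch: -I hides a hunk only when every changed line in it matches the RE.
tmp=$(mktemp -d)

printf '%s\n' aaa bbb ccc ddd eee > "$tmp/f1"
printf '%s\n' aaa bbb ccc '#ddd' eee > "$tmp/f2"
# Hypothetical third file: f1 without the line ddd.
printf '%s\n' aaa bbb ccc eee > "$tmp/f1_noddd"

# Case 1: the hunk replaces "ddd" with "#ddd". "ddd" does not match ^#,
# so the hunk is not ignorable and diff prints it.
case1=$(diff -I '^#' "$tmp/f1" "$tmp/f2")

# Case 2: the hunk only inserts "#ddd", which matches ^#, so it is ignored.
case2=$(diff -I '^#' "$tmp/f1_noddd" "$tmp/f2")

rm -rf "$tmp"

printf 'case 1 output:\n%s\n' "$case1"
printf 'case 2 output:\n%s\n' "$case2"
```

Case 1 still prints the 4c4 hunk, while case 2 prints nothing after its label.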
    

    Since your f1 contains the uncommented line ddd, there is no straightforward RE that matches both the commented and the uncommented line. diff does accept multiple -I flags, as in

    diff -I '^#' -I 'ddd' f1 f2
    

    but that is not a general solution, because you cannot know in advance which uncommented lines you would have to add to the ignore patterns.
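
Listing the uncommented line as an extra pattern is also fragile, because the pattern hides genuine edits to that line. A minimal sketch, using made-up file names a, b and c and again assuming a diff that supports -I:

```shell
#!/bin/sh
# Sketch: an extra -I pattern can hide real changes as a side effect.
tmp=$(mktemp -d)

printf '%s\n' aaa ddd eee > "$tmp/a"
printf '%s\n' aaa '#ddd' eee > "$tmp/b"
printf '%s\n' aaa dddX eee > "$tmp/c"

# "ddd" matches 'ddd' and "#ddd" matches '^#', so the hunk is ignored as intended.
commented=$(diff -I '^#' -I 'ddd' "$tmp/a" "$tmp/b")

# "ddd" -> "dddX" is a real edit, but both lines match 'ddd', so it is hidden too.
edited=$(diff -I '^#' -I 'ddd' "$tmp/a" "$tmp/c")

rm -rf "$tmp"

printf 'commented: [%s]\n' "$commented"
printf 'edited:    [%s]\n' "$edited"
```

Both brackets come out empty, which is the problem: the second comparison silently drops a change that has nothing to do with comments.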

    As a workaround, you can strip the lines starting with # from both files before passing them to diff, i.e.

    diff <(grep -v '^#' f1) <(grep -v '^#' f2)
    4d3
    < ddd
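
If process substitution is not available (it is a feature of bash, ksh and zsh, not of plain sh), the same workaround can be written with temporary files. This is only a sketch: diff_nocomments is a hypothetical helper name, and treating lines with leading whitespace before # as comments is an assumption that goes beyond the '^#' used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of diff): compare two files after removing
# comment lines from both. Returns the exit status of diff.
diff_nocomments() {
    dnc_dir=$(mktemp -d) || return 2
    # Assumption: a comment is a line whose first non-blank character is '#'.
    grep -v '^[[:space:]]*#' "$1" > "$dnc_dir/a"
    grep -v '^[[:space:]]*#' "$2" > "$dnc_dir/b"
    diff "$dnc_dir/a" "$dnc_dir/b"
    dnc_status=$?
    rm -rf "$dnc_dir"
    return "$dnc_status"
}
```

With the files from the question, diff_nocomments xx zz prints the same 4d3 hunk as the one-liner above. Note that the line numbers in the output refer to the filtered copies, not to the original files.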