git automation rhapsody

Automate reverting hunks matching a pattern in Git


I have project files with strings that switch between empty and a list of interface names every couple of saves. Some of these files are tens of thousands of lines long, which creates massive diffs that make it difficult to find actual changes. I want to automatically filter out these changes when staging files.

The strings in question have the form m_str = "";, where the string is either empty or populated with a list of interface names. Occasionally the string is long enough that it gets broken across multiple lines.

I tried creating a patch by running a Python script on the diff, but the patch becomes corrupt if the line numbers change as a result of reverting changes to a multi-line string. I've also tried git diff -G'm_str', which didn't work for me (possibly because my tools are on Windows).

I may be able to write a program that can automatically run git checkout -p on the file and revert any hunks matching a regex, but that seems unnecessarily difficult.

I've also looked into using a smudge filter, but that still leaves me with the issue of reverting only those lines.

Is there a way to checkout hunks of a file via a script, or otherwise ignore changes matching a certain pattern in Git? If it helps, the offending software is IBM Rational Rhapsody.


Solution

  • With Git 2.30 or later, git diff -I<regex> produces a patch that omits every hunk whose added and removed lines all match the regex. If we save that patch, discard the uncommitted changes in the working tree, and then apply the patch with git apply, the net effect is that all hunks matching the regex are reverted while the other changes are kept.

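    # Save the changes, minus hunks whose changed lines all match the regex (example pattern)
    git diff -I"POT-Creation-Date" > /tmp/filtered.diff
    # Discard all unstaged changes in the working tree
    git checkout .
    # Reapply only the changes that were kept in the patch
    git apply /tmp/filtered.diff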
    

    This works for automatically reverting hunks matching a pattern, but only if each changed line can be matched by the pattern on its own, and only when every changed line of a hunk matches the pattern.

    Unfortunately, if the pattern has to match across multiple lines, or if you need to match hunk-wise (ignore any hunk that contains a match) rather than line-wise (ignore hunks whose lines all match the pattern), this does not work. More work is needed in Git or elsewhere to address that case.
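
For the hunk-wise case, one possible workaround is to post-process the output of a plain git diff and drop whole hunks before applying it. The following is a minimal Python sketch, not something Git provides: the names drop_matching_hunks, changed_text, and the sample diff are hypothetical, and it assumes the pattern is tested against the added and removed lines of each hunk joined together, so that a match may span several lines.

```python
import io
import re


def changed_text(hunk):
    """Join the added/removed lines of a hunk, without their +/- markers."""
    return "".join(
        line[1:] for line in hunk[1:] if line.startswith(("+", "-"))
    )


def drop_matching_hunks(diff_text, pattern):
    """Return the diff without hunks whose changed text matches pattern."""
    regex = re.compile(pattern, re.DOTALL)  # DOTALL: "." may cross line breaks
    files = []  # one (header_lines, hunks) pair per "diff --git" section
    for line in io.StringIO(diff_text):  # splits on "\n" only, keeps endings
        if line.startswith("diff --git "):
            files.append(([line], []))
        elif not files:
            continue  # skip anything before the first file header
        elif line.startswith("@@"):
            files[-1][1].append([line])  # start a new hunk
        elif files[-1][1]:
            files[-1][1][-1].append(line)  # body line of the current hunk
        else:
            files[-1][0].append(line)  # "index", "---", "+++" header lines

    out = []
    for header, hunks in files:
        kept = [h for h in hunks if not regex.search(changed_text(h))]
        # Sections that never had hunks (e.g. mode changes) pass through as-is.
        if kept or not hunks:
            out.extend(header)
            for hunk in kept:
                out.extend(hunk)
    return "".join(out)


# Hypothetical diff: one noisy m_str hunk (split across lines) and one real change.
sample = '''diff --git a/f.txt b/f.txt
--- a/f.txt
+++ b/f.txt
@@ -1,3 +1,4 @@
 context
-m_str = "";
+m_str = "IFoo,
+IBar";
 context
@@ -40,2 +41,2 @@
-old value
+new value
 context
'''

filtered = drop_matching_hunks(sample, r'm_str\s*=\s*"[^"]*";')
print(filtered)
```

The filtered text could then take the place of /tmp/filtered.diff in the steps above. The hunk headers are left untouched, so the line numbers of later hunks may be off by a few lines; git apply generally searches nearby for the matching context, but it seems prudent to keep a copy of the full diff before running git checkout . in case the filtered patch does not apply. Note also that a hunk mixing a real change with an m_str change is dropped as a whole.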