git, git-add

How do I git add only lines matching a pattern?


I'm tracking some configuration files with git. I usually stage changes interactively with git add -p, but I'm looking for a way to automatically add all new/modified/deleted lines that match a pattern. Otherwise the interactive splitting and adding is going to take me ages. git add supports pattern matching on filenames, but I can't find anything that matches on content.


Solution

  • I cranked out this experimental and poorly tested program in TXR.

    Sample run. First, where are we at in the repo?

    $ git diff
    diff --git a/lorem.txt b/lorem.txt
    index d5d20a4..58609a7 100644
    --- a/lorem.txt
    +++ b/lorem.txt
    @@ -2,10 +2,14 @@ Lorem ipsum dolor sit amet,
     consectetur adipiscing elit,
     sed do eiusmod tempor
     incididunt ut labore et dolore
    -magna aliqua. Ut enim ad minim
    +minim
    +minim
     veniam, quis nostrud
     exercitation ullamco laboris
    +maxim
    +maxim
     nisi ut aliquip ex ea commodo
    +minim
     consequat.  Duis aute irure
     dolor in reprehenderit in
     voluptate velit esse cillum
    

    And:

    $ git diff --cached  # nothing staged in the index
    

    The goal is to stage just the lines containing a match for min:

    $ txr addmatch.txr min lorem.txt
    patching file .merge_file_BilTfQ
    

    Now what is the state?

    $ git diff
    diff --git a/lorem.txt b/lorem.txt
    index 7e1b4cb..58609a7 100644
    --- a/lorem.txt
    +++ b/lorem.txt
    @@ -6,6 +6,8 @@ minim
     minim
     veniam, quis nostrud
     exercitation ullamco laboris
    +maxim
    +maxim
     nisi ut aliquip ex ea commodo
     minim
     consequat.  Duis aute irure
    

    And:

    $ git diff --cached
    diff --git a/lorem.txt b/lorem.txt
    index d5d20a4..7e1b4cb 100644
    --- a/lorem.txt
    +++ b/lorem.txt
    @@ -2,10 +2,12 @@ Lorem ipsum dolor sit amet,
     consectetur adipiscing elit,
     sed do eiusmod tempor
     incididunt ut labore et dolore
    -magna aliqua. Ut enim ad minim
    +minim
    +minim
     veniam, quis nostrud
     exercitation ullamco laboris
     nisi ut aliquip ex ea commodo
    +minim
     consequat.  Duis aute irure
     dolor in reprehenderit in
     voluptate velit esse cillum
    

    The matching stuff is in the index, and the nonmatching +maxim lines are still unstaged.

    Code in addmatch.txr:

    @(next :args)
    @(assert)
    @pattern
    @file
    @(bind regex @(regex-compile pattern))
    @(next (open-command `git diff @file`))
    diff @diffjunk
    index @indexjunk
    --- a/@file
    +++ b/@file
    @(collect)
    @@@@ -@bfline,@bflen +@afline,@aflen @@@@@(skip)
    @  (bind (nminus nplus) (0 0))
    @  (collect)
    @    (cases)
     @line
    @      (bind zerocol " ")
    @    (or)
    +@line
    @      (bind zerocol "+")
    @      (require (search-regex line regex))
    @      (do (inc nplus))
    @    (or)
    -@line
    @      (bind zerocol "-")
    @      (require (search-regex line regex))
    @      (do (inc nminus))
    @    (or)
    -@line
    @;;    unmatched - line becomes context line
    @      (bind zerocol " ")
    @    (end)
    @  (until)
    @/[^+\- ]/@(skip)
    @  (end)
    @  (set (bfline bflen afline aflen)
            @[mapcar int-str (list bfline bflen afline aflen)])
    @  (set aflen @(+ bflen nplus (- nminus)))
    @(end)
    @(output :into stripped-diff)
    diff @diffjunk
    index @indexjunk
    --- a/@file
    +++ b/@file
    @  (repeat)
    @@@@ -@bfline,@bflen +@afline,@aflen @@@@
    @    (repeat)
    @zerocol@line
    @    (end)
    @  (end)
    @(end)
    @(next (open-command `git checkout-index --temp @file`))
    @tempname@\t@file
    @(try)
    @  (do
         (with-stream (patch-stream (open-command `patch -p1 @tempname` "w"))
           (put-lines stripped-diff patch-stream)))
    @  (next (open-command `git hash-object -w @tempname`))
    @newsha
    @  (do (sh `git update-index --cacheinfo 100644 @newsha @file`))
    @(catch)
    @  (fail)
    @(finally)
    @  (do
         (ignerr [mapdo remove-path #`@tempname @tempname.orig @tempname.rej`]))
    @(end)
    

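    Basically the strategy is:

    • Pattern-match the git diff output and filter each hunk down to the matching + and - lines. Unmatched + lines are dropped, unmatched - lines become context lines, and the "after" line count in the hunk header is recomputed.
    • Write the filtered diff into a variable.
    • Get a pristine copy of the file from the index with git checkout-index --temp, capturing the temporary file name that command prints.
    • Feed the filtered diff to patch -p1, targeting that temporary file, so that it holds only the wanted changes.
    • Turn the patched file into a Git object with git hash-object -w, capturing the hash it prints.
    • Enter that object into the index under the original file name with git update-index --cacheinfo, which stages the change.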

    If this screws up, we can just do git reset to wipe the index, fix our broken scriptology and try again.
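    For readers who don't know TXR, the index-manipulation half of the script can be sketched in Python. This is only a rough equivalent under some assumptions: the helper names git, read_staged and stage_content are made up for illustration, the file is already tracked as a regular file (mode 100644), the commands run from the repository root, and the new content is piped to git hash-object -w --stdin instead of going through a temporary file and patch as the TXR script does.

```python
import subprocess


def git(*args, input_bytes=None, cwd=None):
    """Run a git command and return its stdout as bytes; raises on failure."""
    result = subprocess.run(
        ("git",) + args,
        input=input_bytes,
        cwd=cwd,
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout


def read_staged(path, cwd=None):
    """Return the content of `path` as currently recorded in the index."""
    return git("show", ":" + path, cwd=cwd)


def stage_content(path, content, mode="100644", cwd=None):
    """Store `content` as a blob and point the index entry for `path` at it.

    The working-tree file is not touched; only the index changes.
    """
    sha = git("hash-object", "-w", "--stdin",
              input_bytes=content, cwd=cwd).decode().strip()
    git("update-index", "--cacheinfo", mode, sha, path, cwd=cwd)
    return sha
```

    With these, staging a hand-built version of a file would look like stage_content("lorem.txt", new_bytes), and git reset undoes it just as described above.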

    Just blindly matching through + and - lines has obvious issues. It should work when the pattern matches variable names in config files rather than their values. For example, take this replacement:

    -CONFIG_VAR=foo
    +CONFIG_VAR=bar
    

    Here, if we match on CONFIG_VAR, then both lines are included. If we match on foo, on the right-hand side, we break things: we end up with a patch that just subtracts the CONFIG_VAR=foo line!
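    The keep/drop rule and this failure mode can be reproduced with a small Python sketch of the hunk filter. It is not a translation of the TXR script, just the same rule as I understand it: matching + and - lines are kept, unmatched + lines are dropped, unmatched - lines turn into context, and the "after" length in the header is recomputed. The function name filter_hunk and the sample hunk are made up for illustration.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def filter_hunk(hunk_lines, pattern):
    """Reduce one unified-diff hunk to the +/- lines that match `pattern`.

    hunk_lines[0] is the "@@ -a,b +c,d @@" header, the rest is the hunk body.
    """
    regex = re.compile(pattern)
    m = HUNK_HEADER.match(hunk_lines[0])
    if m is None:
        raise ValueError("not a hunk header: %r" % hunk_lines[0])
    before_start = int(m.group(1))
    before_len = int(m.group(2)) if m.group(2) is not None else 1
    after_start = int(m.group(3))

    body, n_plus, n_minus = [], 0, 0
    for line in hunk_lines[1:]:
        tag, text = line[:1], line[1:]
        if tag == "+":
            if regex.search(text):
                body.append(line)
                n_plus += 1
            # an unmatched addition is simply left out
        elif tag == "-":
            if regex.search(text):
                body.append(line)
                n_minus += 1
            else:
                body.append(" " + text)  # unmatched removal becomes context
        else:
            body.append(line)  # context lines pass through unchanged

    # The "before" side is untouched, so only the "after" length changes.
    after_len = before_len + n_plus - n_minus
    header = "@@ -%d,%d +%d,%d @@" % (
        before_start, before_len, after_start, after_len)
    return [header] + body


hunk = [
    "@@ -1,3 +1,3 @@",
    " # settings",
    "-CONFIG_VAR=foo",
    "+CONFIG_VAR=bar",
    " OTHER=1",
]
print("\n".join(filter_hunk(hunk, "CONFIG_VAR")))  # both changed lines survive
print("\n".join(filter_hunk(hunk, "foo")))         # only the removal survives
```

    The second call yields a hunk headed @@ -1,3 +1,2 @@ that contains -CONFIG_VAR=foo and no replacement line, which is exactly the broken patch described above.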

    Obviously, this could be made clever, taking into account the syntax and semantics of the config file.

    To solve this "for real", I would write a robust config file parser and re-generator (one which preserves comments, whitespace and all). Then parse the new file and the original pristine file into config objects, migrate the matching changes from one object to the other, and generate an updated file to go into the index. No messing around with patches.
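    A toy version of that idea, in Python, might look like the following. It is nowhere near a robust parser: it assumes a simple KEY=VALUE format with unique keys, treats every other line (comments, blanks) as opaque text, matches the pattern against key names only, and places newly added keys with a crude "after the previous known key" heuristic. The names parse and migrate are made up for illustration.

```python
import re

ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def parse(text):
    """Split config text into (key, raw_line) pairs; key is None for non-assignments."""
    entries = []
    for raw in text.splitlines():
        m = ASSIGNMENT.match(raw)
        entries.append((m.group(1) if m else None, raw))
    return entries


def migrate(old_text, new_text, pattern):
    """Return old_text plus only those changes from new_text whose key matches `pattern`."""
    regex = re.compile(pattern)
    old, new = parse(old_text), parse(new_text)
    new_by_key = {k: raw for k, raw in new if k is not None}
    old_keys = {k for k, _ in old if k is not None}

    result = []
    for key, raw in old:
        if key is not None and regex.search(key):
            if key in new_by_key:
                result.append((key, new_by_key[key]))  # take the new line
            # else: a matching key deleted in the new file is dropped
        else:
            result.append((key, raw))  # comments and non-matching keys stay

    # Matching keys present only in the new file are additions; insert each
    # one right after the closest preceding key that the old file also has.
    pos = 0
    for key, raw in new:
        if key is None:
            continue
        if key in old_keys:
            pos = next(i for i, (k, _) in enumerate(result) if k == key) + 1
        elif regex.search(key):
            result.insert(pos, (key, raw))
            pos += 1

    text = "\n".join(raw for _, raw in result)
    return text + "\n" if old_text.endswith("\n") else text


old = "# settings\nCONFIG_VAR=foo\nOTHER=1\n"
new = "# settings\nCONFIG_VAR=bar\nOTHER=2\nCONFIG_NEW=x\n"
print(migrate(old, new, "CONFIG"), end="")
```

    Here the CONFIG_VAR replacement and the new CONFIG_NEW line are carried over while OTHER keeps its old value. Because the pattern is matched against key names, a pattern like foo selects nothing and leaves the file as it was, instead of producing a half-applied replacement.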