Tags: git, git-merge, epub, epub3

Prioritising Local Changes


I'm scraping pages from a website, munging them, then compiling them into an ebook. I'm using Git for both the code and the HTML content.

I have to make manual edits to some pages, and they're often updated upstream. This leaves me with the problem of how to retain my local edits when the site updates.

For example, I download v1 of page A, I delete an invalid "", and commit my changes; later I download v2 of page A, which has new content, but still features "". I want to merge the new content into my copy of page A, but also apply my local changes.

I suspect I'll need to manually resolve conflicts sometimes, but on the whole this should be automatic.

I've experimented with merge strategies, rebasing, and other approaches to no avail. What am I missing?

EDIT:

To help clarify my problem:

git init
wget -O page.html https://example.com/
git add page.html
git commit -a -m "w0"
git checkout -b ebook
sed -i -e 's/http:/https:/' page.html
git commit -a -m "e1"
git checkout master
git merge ebook
wget -O - https://example.com/ | sed -e 's/may/may not/' > page.html
git commit -a -m w1
git checkout ebook
git merge master

At the end, the last local edit is preserved but the first is lost. I know I'm doing something stupid, but...


Solution

  • I would maintain a branch that tracks only the original web pages; let's call it web. Every time you download an update, commit it to the web branch. You then need an ebook branch for your changes, initially created as a branch off the first commit on web. After updating the web branch, merge it into your ebook branch, resolving any conflicts that arise.

    Scenario: Let's assume you started with W0 as the initial state on the web server, then you made local changes in commits E1 and E2. Then the web server was updated to W1, which you merge in to ebook to get E3.

    That would give you a history that looks like this:

    W0 -------- W1    (web branch)
      \           \
       E1 - E2 --- E3   (ebook branch)
    

    When you download the next update to web, W2, you'll get this commit graph, assuming you also had E4 as additional reformatting changes required because of W1:

    W0 -------- W1 -------- W2    (web branch)
      \           \           \
       E1 - E2 --- E3 - E4 --- E5   (ebook branch)
    

    When you merge W2 into E4 to get E5, Git should apply only the changes between W1 and W2 to E4, which should do what you want.

    Note: this process only ever merges from web into ebook, never from ebook into web. Merging from ebook back into web would undo the desired effect, as discussed in the comments below this answer.
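The workflow above can be sketched as a throwaway script. This is only an illustration under stated assumptions: the `download` function is a hypothetical stand-in for the real `wget` step, the page content is made up, and the commit labels (W0, E1, W1, E3, W2, E5) simply mirror the diagrams, skipping the optional local commits E2 and E4.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
git checkout -q -b web

# Hypothetical stand-in for the real download (e.g. wget).
# $1 and $2 vary to simulate upstream revisions of the page.
download() {
    printf '%s\n' \
        '<a href="http://example.com/">link</a>' \
        '<p>one</p>' '<p>two</p>' '<p>three</p>' \
        "<p>You $1 use this.</p>" \
        '<p>four</p>' '<p>five</p>' '<p>six</p>' \
        "<p>Revision $2</p>" > page.html
}

# W0: the pristine upstream page goes on the web branch.
download "may" 0
git add page.html
git commit -q -m "W0"

# E1: a local fix-up, committed on the ebook branch only.
git checkout -q -b ebook
sed 's/http:/https:/' page.html > page.tmp && mv page.tmp page.html
git commit -q -a -m "E1"

# W1: upstream changed; commit the fresh download on web.
git checkout -q web
download "may not" 1
git commit -q -a -m "W1"

# E3: merge web into ebook, never the other way round.
git checkout -q ebook
git merge -q -m "E3: merge W1 into ebook" web

# W2: another upstream update, again committed on web only.
git checkout -q web
download "may not" 2
git commit -q -a -m "W2"

# The merge base is now W1, so only the W1..W2 changes get applied.
git checkout -q ebook
git merge-base ebook web
git merge -q -m "E5: merge W2 into ebook" web

cat page.html
```

After the second merge, page.html on ebook should still contain the local https: fix alongside both upstream changes. If a local edit and an upstream edit touch the same or adjacent lines, Git will stop with a conflict that has to be resolved by hand.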