I have a project with more than 3 years of history in an SVN repository. It was migrated to git, but the person who did the migration just took the latest version and threw away all 3 years of history.
Now the project has the last 3-4 months of history in one repository, and I've imported the other 3 years of svn history into a new git repository.
Is there some way to connect the root commit of the second repository to the last commit of the first one?
It is something like this:
* 2017-04-21 - last commit on master
|
* 2017-03-20 - merge branch Y into master
|\
| * 2017-03-19 - commit on branch Y
| |
* | 2017-03-18 - merge branch X into master
/| * 2017-02-17 - commit on another new branch Y
* |/ 2017-02-16 - commit on branch X
| * 2017-02-15 - commit on master branch
* | 2017-01-14 - commit on new branch X
\|
* 2017-01-13 - first commit on new repository
|
* 2017-01-12 - init new git project with the last version of the code in svn repository
.
.
There is no relationship between the two different repositories yet, this is what I wanna
do. I want to connect the root commit of 2nd repository with the last commit of the first
one.
.
.
* 2017-01-09 - commit
|
* 2017-01-08 - commit
|
* 2017-01-07 - merge
/|
* | 2016-01-06 - 2nd commit the other branch
| * 2016-01-05 - commit on trunk
* | 2016-01-04 - commit on new branch
\|
* 2015-01-03 - first commit
|
* 2015-01-02 - beginning of the project
Update:
I just learned that I need to do a git rebase, but how? Please treat the commit dates as if they were the SHA-1 hashes.
The answer was to use git filter-branch with the --parent-filter option, not git rebase.
Update 2:
I tried the command git filter-branch --parent-filter 'test $GIT_COMMIT = 443aec8880e898710796a1c4fb4decea1ca5ff66 && echo "-p 98e2b95e07b84ad1e40c3231e66840ea910e9d66" || cat' HEAD
and it didn't work:
PS D:\git\rebase-test\rep2cc> git filter-branch --parent-filter 'test $GIT_COMMIT = 443aec8880e898710796a1c4fb4decea1ca5ff66 && echo "-p 98e2b95e07b84ad1e40c3231e66840ea910e9d66" || cat' HEAD
fatal: ambiguous argument '98e2b95e07b84ad1e40c3231e66840ea910e9d66 || cat': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Update 3:
It didn't work in Windows CMD or PowerShell, but it did work in Git Bash on Windows.
First things first: you need a single repo that has all the available history.
Make a clone of the repo with the recent history, and add the repo with the old history as a remote. I recommend making this clone a "mirror" and finishing by replacing your origin repo with it. Alternatively, you can leave --mirror off, in which case you'll finish by pushing (possibly force-pushing, depending on which approach you use) all refs back to origin.
git clone --mirror url/of/current/repo
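cd repo.git   # a mirror clone is bare, so git names the directory repo.git (plain "repo" if you left --mirror off)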
git remote add history url/of/historical/repo
git fetch history
The next thing you need to do is figure out where you'll be splicing the history. The terminology to describe this is a bit fuzzy, I think, but what you want is to find the two commits that correspond to the most recent SVN revision for which both histories have a commit. For example, suppose your SVN repo contained versions 1, 2, 3, and 4. Now you have
Recent-History Repo
C --- D --- E --- F <--(master)
Old-History Repo
A --- B --- C' --- D'
where A represents version 1, B represents version 2, C and C' represent version 3, and D and D' represent version 4. E and F are work created after the original migration. So you want to splice the commits whose parent is D (E in this example) onto D'.
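One way to locate the pair is to compare tree hashes, since two commits that hold the same snapshot share the same tree even when their histories differ. The following is only a sketch under stated assumptions: it builds two throwaway repos in a temp directory (the file name, commit messages, and the `work` directory are made up for illustration), and it assumes the root commit of the recent history is the imported snapshot, which is a simplification of the C/D example above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and scratch directory so the sketch does not touch real repos.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Old-history repo: A --- B --- C' --- D' (one file, one version per commit).
git init -q "$work/old"
for v in 1 2 3 4; do
  echo "version $v" > "$work/old/file.txt"
  git -C "$work/old" add file.txt
  git -C "$work/old" commit -q -m "svn r$v"
done

# Recent-history repo: starts from the version-4 snapshot (D), then adds E.
git init -q "$work/recent"
echo "version 4" > "$work/recent/file.txt"
git -C "$work/recent" add file.txt
git -C "$work/recent" commit -q -m "import from svn"
echo "version 5" > "$work/recent/file.txt"
git -C "$work/recent" commit -q -am "E"

# Combine them: add the old repo as a remote named "history" and fetch it.
git -C "$work/recent" remote add history "$work/old"
git -C "$work/recent" fetch -q history

# D: the root commit of the recent history (assumed to be the import).
d=$(git -C "$work/recent" rev-list --max-parents=0 HEAD)
d_tree=$(git -C "$work/recent" rev-parse "$d^{tree}")

# D': any commit reachable from the history remote whose tree hash equals D's.
d_prime=$(git -C "$work/recent" log --remotes=history --format='%H %T' |
  awk -v t="$d_tree" '$2 == t { print $1 }')

echo "D  = $d"
echo "D' = $d_prime"
```

If the SVN import and the git import differ even slightly (line endings, ignored files, keyword expansion), the trees won't match and you'll have to pick the pair by hand, for example by comparing dates and messages.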
Now, I can think of two approaches, each with pros and cons.
Rewriting The Recent History
IMO the best way, if you can coordinate a cut-over of all developers to a new repo, is to (effectively) rebase the recent history onto the old history. A cut-over here means you arrange a time when everyone agrees that all outstanding work is pushed and they discard their clones; then you do the conversion; then they all re-clone.
If there is really just a single branch, then you can literally use rebase (note that rebase needs a working tree, so it won't run inside a bare mirror clone):
git rebase --onto D' D master
(where D and D' are replaced with the SHA IDs of the commits).
More likely you have some branches and merges in the recent history; in that case a rebase operation will become a problem very quickly. On the other hand, you can take advantage of the fact that D has the same tree as D', so a rebase and a re-parent are more or less equivalent.
So you can use git filter-branch with a --parent-filter to do the rewrite. Based on the examples in the docs at https://git-scm.com/docs/git-filter-branch you would do something like
git filter-branch --parent-filter 'test $GIT_COMMIT = D && echo "-p D'" || cat' HEAD
(where again D and D' are replaced with the SHA IDs of the commits).
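Because the placeholder D' contains a quote character, it can help to see the same command with the two IDs held in shell variables. This is a sketch on throwaway repos, not a recipe to paste into a real project: the variable names `d` and `d_prime`, the file name, and the commit messages are all made up for illustration. It needs a POSIX-style shell such as Git Bash; the quoting does not survive CMD or PowerShell unchanged.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the pause newer git adds before filter-branch runs
work=$(mktemp -d)

# Old history: A --- B --- C' --- D'
git init -q "$work/old"
for v in 1 2 3 4; do
  echo "version $v" > "$work/old/file.txt"
  git -C "$work/old" add file.txt
  git -C "$work/old" commit -q -m "svn r$v"
done

# Recent history: D (import of version 4) --- E
git init -q "$work/recent"
cd "$work/recent"
echo "version 4" > file.txt
git add file.txt
git commit -q -m "D: import from svn"
echo "version 5" > file.txt
git commit -q -am "E"

# Bring the old history into the same repo.
git remote add history "$work/old"
git fetch -q history

d=$(git rev-list --max-parents=0 HEAD)          # D: root of the recent history
d_prime=$(git -C "$work/old" rev-parse HEAD)    # D': tip of the old history

# Double quotes let the shell fill in $d and $d_prime now; \$GIT_COMMIT is
# escaped so that filter-branch expands it once per commit during the rewrite.
git filter-branch --parent-filter \
  "test \$GIT_COMMIT = $d && echo '-p $d_prime' || cat" HEAD

# The log now runs from E back to the first SVN revision.
git log --format='%h %s'
```

Note that with this filter the rewritten D stays in the history as a child of D' with an identical tree, so the log shows one no-change commit at the seam.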
This creates "backup" refs that you'll need to clean up. In the end you'll get
A --- B --- C' --- D' --- E' --- F' <--(master)
It's the fact that F was replaced by F' that creates the need for a hard cut-over (more or less).
Now if you made a mirror clone back at step 1, you can consider wiping the reflog, dropping the remotes, and running gc, and then this is a new ready-to-use origin repo.
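The cleanup could look roughly like the following. This is a sketch on a throwaway repo: the backup ref and the `history` remote are created by hand here as stand-ins for what the rewrite leaves behind, and the `refs/original/refs/heads/master` name is only illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Scratch repo with a single commit.
git init -q "$work/repo"
cd "$work/repo"
echo "content" > file.txt
git add file.txt
git commit -q -m "some commit"

# Stand-ins for the leftovers (both created by hand for this sketch):
git update-ref refs/original/refs/heads/master HEAD   # a backup ref like filter-branch's
git remote add history "$work/nowhere"                # the old-history remote

# 1. Delete the backup refs that filter-branch keeps under refs/original/.
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin

# 2. Drop the remote that was only needed to import the old history.
git remote remove history

# 3. Expire every reflog entry so the pre-rewrite commits become unreachable.
git reflog expire --expire=now --all

# 4. Garbage-collect and prune the now-unreachable objects right away.
git gc -q --prune=now
```

Only do this once you're satisfied with the rewritten history, since these steps throw away the easy ways back to the original commits.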
If you made a regular clone, then you'll need to push -f all the refs to the origin, and this will likely leave behind some clutter on the origin repo.
Using a "replacement commit"
The other option doesn't create a hard cut-over, but it leaves you with small headaches to deal with forever. You can use git replace. In your combined repo
git replace D D'
By default, when generating log output or whatever, if git finds D, it will substitute D' (and its history) in the output.
There are some known glitches. There may be unknown glitches. And by default the "replacement refs" that make this all work aren't shared, so you have to push and fetch them deliberately.
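To illustrate the sharing part: replacement refs live under refs/replace/, so they can be pushed and fetched with explicit refspecs. The sketch below uses throwaway repos in a temp directory; the branch name `recent`, the file name, and the repo paths are assumptions made for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Combined repo. Old history first: A --- D'
git init -q "$work/combined"
cd "$work/combined"
echo "version 1" > file.txt
git add file.txt
git commit -q -m "A"
echo "version 4" > file.txt
git commit -q -am "D'"
d_prime=$(git rev-parse HEAD)

# Recent history on an unrelated root: D (same content as D') --- E
git checkout -q --orphan recent
git commit -q -m "D"
d=$(git rev-parse HEAD)
echo "version 5" > file.txt
git commit -q -am "E"

# Whenever git meets D, show D' (and therefore its ancestors) instead.
git replace "$d" "$d_prime"

# Publish the branch and, explicitly, the replacement refs.
git init -q --bare "$work/origin.git"
git remote add origin "$work/origin.git"
git push -q origin recent 'refs/replace/*'

# A second repo only sees the spliced history after it fetches refs/replace/*.
git init -q "$work/other"
cd "$work/other"
git remote add origin "$work/origin.git"
git fetch -q origin
before=$(git rev-list --count origin/recent)
git fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git rev-list --count origin/recent)

echo "commits visible before fetching replace refs: $before"
echo "commits visible after fetching replace refs:  $after"
```

Every clone that wants the joined history has to run that second fetch (or configure an equivalent fetch refspec); otherwise it only sees the recent commits.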