Is there any way to split a git bundle file? Say, into repo.bundle1 and repo.bundle2, each containing half of the repo. The portable bundle is too large to transfer.
How else could I approach this, assuming the maximum transfer size cannot be changed?
Bundles can be incremental.
They can't have dangling commits, so there is a bit of a game you have to play if you want to incrementally bundle an existing branch.
They have to be applied "in order" so that as you apply a bundle, its root commits' parents are available to latch onto. (There may be a way to get around this with shallow repos, but if you're trying to ultimately reconstruct the entire repo then you won't want to worry about that.)
And of course if any single commit is too large (e.g. due to commit of a very large file) that will be a problem.
Say you have
       x -- x -- x <--(branch1)
      /
A -- B -- C -- D -- E -- F -- G -- H -- I -- J <--(master)
                \       /
                 o -- o <--(branch2)
And say you want to break this into bundles of no more than 3 commits. So let's start at the root. We're going to progressively move the master branch, so let's keep track of its current position.
git checkout master
git tag real_master
Now we look up the SHA ID for C (or find some other name that refers to C, such as in this case master~7) and then
git reset --hard master~7
Note that I'm using hard resets; that's probably not necessary, but I'm making the assumption that you can do this from a repo with a clean work tree, and in that case doing hard resets keeps everything in nice, simple states (as I see it, anyway).
We're ready to create our first bundle
git bundle create 0.bundle master
This bundle includes B, which is the root for branch1, so we can bundle up branch1 now.
git bundle create 1.bundle master..branch1
This is equivalent to
git bundle create 1.bundle ^master branch1
Either way, we're saying to assume that the receiving repo already has the commits reachable from master, so only the x commits will be placed in this bundle.
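If you want to check which commits a range selects before bundling it, git rev-list takes the same revision arguments. Here is a minimal sketch on a throwaway repo; the temp directory, the identity settings, and the commit messages are placeholders I made up, and the graph is a cut-down version of the example (branch1 forks from B, master moves on to C).

```shell
set -e
# Throwaway repo: A -- B -- C on master, x1 -- x2 on branch1 forking from B.
demo=$(mktemp -d)
git init -q "$demo/src"
cd "$demo/src"
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git checkout -q -b branch1
git commit -q --allow-empty -m x1
git commit -q --allow-empty -m x2
git checkout -q master
git commit -q --allow-empty -m C

# Both spellings select the commits reachable from branch1 but not from master.
range_form=$(git rev-list master..branch1)
caret_form=$(git rev-list ^master branch1)
range_count=$(git rev-list --count master..branch1)
echo "commits that would go into the bundle: $range_count"
```

Running the count first is a cheap way to see whether a planned bundle stays under whatever commit budget you picked.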
It might seem like D, E, F is the next logical step; but F depends on the o commits in branch2. So really the next logical thing would be to bundle branch2 along with D. Since we still have master at C we can say
git bundle create 2.bundle master..branch2
Now we need to move master to G so that we can bundle E, F, and G. Make sure we're on master and
git reset --hard real_master~3
git bundle create 3.bundle ^branch2 ^master~3 master
Here I'm noting that both older mainline history and branch2 history are reachable from master (by way of the merge at F), but since they're both already bundled I exclude both of them.
Finally,
git reset --hard real_master
git tag -d real_master
git bundle create 4.bundle master~3..master
In practice you probably would use more than 3 commits per bundle. If you have a side-branch that's too big on its own, you can break it up using the same technique we used to segment master in this example.
Now you can transfer these independently, and fetch (or pull) from them in order to reconstruct the repo on the other end.
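For the receiving side, here is a minimal end-to-end sketch on a throwaway linear repo (A -- B -- C -- D, split into two bundles of two commits). The paths, the identity settings, and the choice of a bare destination repo are my assumptions; I use a bare repo because a non-bare one refuses to fetch into the branch that is currently checked out.

```shell
set -e
demo=$(mktemp -d)

# Sender: a small linear history on master.
git init -q "$demo/src"
cd "$demo/src"
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo"
git config user.email "demo@example.com"
for msg in A B C D; do git commit -q --allow-empty -m "$msg"; done

# Same move/bundle/restore technique as above, two commits per bundle.
git tag real_master
git reset -q --hard master~2
git bundle create "$demo/0.bundle" master
git reset -q --hard real_master
git tag -d real_master >/dev/null
git bundle create "$demo/1.bundle" master~2..master

# Receiver: apply the bundles in order, so each one finds its prerequisites.
git init -q --bare "$demo/dest.git"
cd "$demo/dest.git"
for n in 0 1; do
  git bundle verify "$demo/$n.bundle"     # fails if prerequisite commits are missing
  git fetch -q "$demo/$n.bundle" 'refs/heads/*:refs/heads/*'
done

src_tip=$(git -C "$demo/src" rev-parse master)
dest_tip=$(git rev-parse master)
echo "source tip: $src_tip"
echo "dest tip:   $dest_tip"
```

If you try to apply 1.bundle first, the verify step should complain about missing prerequisite commits, which is the "in order" requirement in action.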
UPDATES
Two additional notes:
First, as compared to ElpieKay's suggestion to use dd and cat, the above approach has pros and cons.
It only relies on git itself (though the utilities needed for the dd/cat approach typically ship with git).
The individual bundle files are each useful by themselves, whereas if you segment the file with dd you have to reconstruct all the parts to be sure you have a usable bundle. This also means you could save the bundles and combine them with additional bundles you create later (as more changes happen); but that would only matter if you need to create another new remote repo from scratch at that point.
Actually just shipping incremental changes back and forth, where both sides already have a common baseline of commits, is the basic use case for bundles. So you might decide to use the dd/cat approach to initially create the remote repo, then use incremental bundles for subsequent sharing of updates.
The biggest advantages of the dd/cat approach are that it's very rote / scriptable (i.e. simple assuming the tools are on hand), whereas you have to think about how to partition up the commits for the above approach; and also that the dd approach can split a single, obnoxiously large commit if it turns out to be one.
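For comparison, here is a rough sketch of what the byte-level dd/cat idea can look like. This is my own illustration, not necessarily the exact commands from the other answer. The file of random bytes stands in for a real bundle, and the 1024 KiB limit and the part file names are made-up placeholders.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Stand-in for a real bundle file: 2,560,000 arbitrary bytes.
head -c 2560000 /dev/urandom > repo.bundle

chunk_kib=1024                            # hypothetical transfer limit, in KiB
size=$(wc -c < repo.bundle)
parts=$(( (size + chunk_kib * 1024 - 1) / (chunk_kib * 1024) ))

# Sender: cut the file into fixed-size parts (the last one may be shorter).
i=0
while [ "$i" -lt "$parts" ]; do
  dd if=repo.bundle of="repo.bundle.part$i" bs=1024 \
     count="$chunk_kib" skip=$(( i * chunk_kib )) 2>/dev/null
  i=$(( i + 1 ))
done

# Receiver: concatenate the parts in numeric order.
: > rejoined.bundle
i=0
while [ "$i" -lt "$parts" ]; do
  cat "repo.bundle.part$i" >> rejoined.bundle
  i=$(( i + 1 ))
done

cmp repo.bundle rejoined.bundle && echo "rejoined file is byte-identical"
```

None of the parts is a valid bundle on its own, which is the trade-off described above: you need every piece, in the right order, before git can read anything.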
Second, I forgot to mention initially that you can list multiple branches to include in one bundle. So, for example, if your threshold were more like 8 commits per bundle, you could
1) Move master to E
2) Bundle master and branch1 as 0.bundle
3) Move master back to J
4) Bundle master excluding master~5 as 1.bundle
and be done.
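Those four steps can be sketched as follows. The script first rebuilds the example graph with empty commits in a throwaway repo. The temp directory, the identity settings, the helper function c, and the commit messages are placeholders of mine, and the two o commits and three x commits are named o1, o2 and x1..x3 only so the messages differ.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/src"
cd "$demo/src"
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo"
git config user.email "demo@example.com"
c() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper: one empty commit

# Rebuild the graph: branch1 forks at B, branch2 forks at D and merges at F.
c A; c B; git branch branch1
c C; c D; git branch branch2
c E
git checkout -q branch2; c o1; c o2
git checkout -q master
git merge -q --no-ff -m F branch2
c G; c H; c I; c J
git checkout -q branch1; c x1; c x2; c x3
git checkout -q master

git tag real_master
git reset -q --hard master~5                         # 1) master -> E
count0=$(git rev-list --count master branch1)        # A..E plus the three x commits
git bundle create "$demo/0.bundle" master branch1    # 2)
git reset -q --hard real_master                      # 3) master -> J
git tag -d real_master >/dev/null
count1=$(git rev-list --count ^master~5 master)      # F..J plus the two o commits
git bundle create "$demo/1.bundle" ^master~5 master  # 4)
echo "0.bundle: $count0 commits, 1.bundle: $count1 commits"
```

Note that 1.bundle as created here carries the o commits but names only master as a ref. If the receiver should also end up with a branch2 ref, adding branch2 to the arguments of that last command is one way to get it.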