Adapting svn:externals usage for move to Mercurial

In a corporate environment we have an svn repository structure which looks like this:

root
  libs
    shared_lib1
    shared_lib2
    private_lib
  public_code
  private_code

where public_code is an external repository which is open source and where people from outside the company have read-write access. shared_lib1 and shared_lib2 are also external repositories, shared with a different group of programmers from another company. I'm the maintainer and can do basically whatever is technically best; the outside users will have to adapt.

I'm now wondering what the best way is to move from this structure to a Mercurial repository.

1) I could closely simulate the old setup using Mercurial subrepositories. OR
2) I could make one big repo for us and three new smaller, separate repositories for the external partners (so basically forking the projects) and exchange changesets between the big one and the separate ones.

With setup 1) in svn, branching is a nightmare because by policy I always have to branch public_code, shared_lib1 and shared_lib2 when I branch root. For this I have to call svn branch four times and modify svn:externals properties by hand three times. Can I easily branch the main repo in Mercurial and automatically get new branches for all sub-repositories?

When I do setup 2), the file system layout will differ between repos. E.g. I will have public_code/Makefile in repo "root" but the file will be just "Makefile" in repo "public_code". Will Mercurial still be able to synchronize changes between the repos? What could the workflow look like?

Solution

With setup 1) in SVN, branching is a nightmare because by policy I always have to branch public_code, shared_lib1 and shared_lib2 when I branch root. For this I have to call svn branch four times and modify svn:externals properties by hand three times. Can I easily branch the main repo in Mercurial and automatically get new branches for all sub-repositories?

No, subrepositories don't work like that. Named branches in the top-level repository will not propagate to subrepositories automatically. If you make a 1.x branch in your code, then it's not clear that shared_lib1 should also have a 1.x branch. In fact, it probably shouldn't branch at the same time the top-level code branches, especially if the library is being used by several different top-level projects.
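
If policy really does require a matching branch everywhere, it has to be scripted by hand. Here is a minimal sketch, assuming the subrepo paths from the question; branch_all is a hypothetical helper name, not a Mercurial command, and it only wraps the ordinary hg branch and hg commit calls:

```shell
# Hypothetical helper (not part of Mercurial): create the same named
# branch in each listed subrepository, then in the top-level repository.
# Run it from the root of the top-level working copy.
branch_all() {
    name=$1
    shift
    for sub in "$@"; do
        # The subshell keeps the cd from leaking into the caller.
        (cd "$sub" && hg branch "$name" && hg commit -m "Start branch $name") || return 1
    done
    # The top-level commit also records the new subrepo changesets in .hgsubstate.
    hg branch "$name" && hg commit -m "Start branch $name"
}

# Example call, left commented out:
# branch_all 1.x public_code libs/shared_lib1 libs/shared_lib2
```

Whether you want this at all is the real question: the script makes it obvious that you are branching four repositories in lockstep.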

When I do setup 2), the file system layout will differ between repos. E.g. I will have public_code/Makefile in repo root but the file will be just Makefile in repo public_code. Will Mercurial still be able to synchronize changes between the repos? What could the workflow look like?

No, you cannot push and pull between the repositories if you create them like that. You can only push/pull between repositories when they originate from the same "mother" repository. Here it sounds like you'll create three unrelated repositories.

In a situation like this, you should evaluate carefully why you have svn:externals in Subversion and how they map to Mercurial subrepositories. Subrepositories are not a 1–1 replacement for svn:externals. You should also look into tool support for subrepos, both in Mercurial itself and in your Mercurial hosting, your continuous build system, etc. I wrote part of the Mercurial subrepo code and as of Mercurial 2.0, there are still some sharp edges here and there.

In a nutshell, what subrepositories give you is a very tight coupling between subsystems. This is normally something to be avoided :-) We try hard to make our software systems loosely coupled since that gives us flexibility.

The main use case for subrepositories is a "build repository" where you track the precise versions of the components that you used in a given build. You cannot ask Mercurial to track the tip of a given branch in a subrepo; it will always track a given changeset in a given repository. This is what makes it possible to re-create a given checkout later: the .hgsubstate file tracks the precise changesets that were checked out in each subrepo.
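
To make the pinning concrete, here is a small sketch of what the two bookkeeping files look like. The URLs and changeset hashes are made-up placeholders, and pinned_rev is a hypothetical helper that just reads the file; in a real repository you write .hgsub yourself and Mercurial writes .hgsubstate on commit:

```shell
# Illustration only: build a fake .hgsub/.hgsubstate pair in a scratch
# directory. The URLs and changeset hashes are made-up placeholders.
demo=$(mktemp -d)

# .hgsub maps a working-copy path to the source of the subrepo.
cat > "$demo/.hgsub" <<'EOF'
libs/shared_lib1 = https://hg.example.com/shared_lib1
libs/shared_lib2 = https://hg.example.com/shared_lib2
EOF

# .hgsubstate holds one "<changeset> <path>" line per subrepo.
cat > "$demo/.hgsubstate" <<'EOF'
0123456789abcdef0123456789abcdef01234567 libs/shared_lib1
89abcdef0123456789abcdef0123456789abcdef libs/shared_lib2
EOF

# Hypothetical helper: print the changeset that a subrepo path is pinned to.
# $1 is the .hgsubstate file, $2 is the subrepo path.
pinned_rev() {
    while read -r rev path; do
        if [ "$path" = "$2" ]; then
            echo "$rev"
        fi
    done < "$1"
}

pinned_rev "$demo/.hgsubstate" libs/shared_lib1
```

Note that there is no branch name anywhere in .hgsubstate, only full changeset hashes. That is why a subrepo cannot "follow" a branch.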

So if your root repository is not used for development, but only for building releases, then subrepositories can in fact work great for you. The workflow will be something like

$ cd root
$ cd libs/shared_lib1
$ hg pull
$ hg update 2.0
$ cd ../..
$ make test && hg commit -m "Updated to shared_lib1 2.0"
$ hg tag 2.3

You then release version 2.3 of your software and Mercurial knows that it depends on version 2.0 of shared_lib1. You'll do this once in a while when the people responsible for the subcomponents tell you that they have a new release ready for you. Your CI server can of course do this nightly to see if the components work together!

Subrepositories work less well if the developers are working in root directly and if they make changes to the subcomponents as part of their work in root. That is an indication of too tight a coupling between the components: if the main code depends on an exact changeset of a subcomponent, well then the subcomponent ought to be directly in the main code. Also, hg commit in the top-level repository will recurse and use the same commit message in the subrepos when ui.commitsubrepos=True. (The default was changed to False in Mercurial 2.0.) This is often not desired, and when it does make sense, well then the subrepo is too tightly coupled and should be a part of the top-level repo.

So, to sum up: use subrepos if root is a "build repository". Otherwise, you should either inline the components in the top-level repository, or you should couple the pieces together more loosely by using something like Maven or similar to manage dependencies. These tools will normally let you say "please get the latest version of root and all its dependencies" and then you can make a formal release when you're happy with the tests. These "snapshot" builds cannot be precisely reproduced, but that's also not needed: only the final releases need the strict and precise dependency tracking.