svn svnadmin svndump svn-administration

Dumping a Subversion repository one revision at a time


We have one giant, sloppy Subversion repository that contains 60+ projects. The trunk, branches, and tags directories are in the root of the repository. Some branches are laid out as branches/project/branchName, others as branches/BranchName/project. There's a lot of cruft.

There are almost 200,000 revisions, 22 GB of data, and 60+ projects.

I want to restructure the repository so that each project has its own repository, and institute a standard branching strategy. Dumping the entire repository takes about 7 to 8 hours, and then filtering out what I want is a very long process, since I have to run svndumpfilter multiple times.

I am thinking of a new strategy. If I look at the revisions involved in a single project, we might be talking about 400 revisions. I know I can run svnadmin dump on a range of revisions. What if I dump out just the revisions of the project I'm interested in? I can run svnadmin dump for each revision. I think this might actually be faster. However, how will this affect the load into a new repository?

Is there a problem with simply dumping only the revisions I want?
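To make the idea concrete, the per-revision approach could be sketched roughly as follows. This is only an illustration in Python, not something I have run against the real repository: the function names are made up, and it assumes the revision list comes from the output of `svn log -q` on one project path. Whether the resulting dumps load cleanly is exactly what I am asking.

```python
import re

# Header line of each entry in `svn log -q` output, e.g.
# "r1500 | alice | 2012-01-01 12:00:00 +0000 (Sun, 01 Jan 2012)"
_LOG_HEADER = re.compile(r"^r(\d+) \|")


def revisions_from_log(log_text):
    """Return the revision numbers found in `svn log -q` output, ascending."""
    revs = set()
    for line in log_text.splitlines():
        match = _LOG_HEADER.match(line)
        if match:
            revs.add(int(match.group(1)))
    return sorted(revs)


def dump_commands(repo_path, revisions):
    """Build one `svnadmin dump` argument list per revision.

    --incremental makes each dump a diff against the previous revision
    instead of a full snapshot of the whole tree at that revision.
    """
    return [
        ["svnadmin", "dump", repo_path, "-r", str(rev), "--incremental"]
        for rev in revisions
    ]

# Hypothetical usage (the paths are placeholders):
#   log_text = output of `svn log -q file:///srv/svn/big/trunk/ProjectX`
#   for cmd in dump_commands("/srv/svn/big", revisions_from_log(log_text)):
#       with open("r%s.dump" % cmd[4], "wb") as out:
#           subprocess.run(cmd, stdout=out, check=True)
```

Since `svn log` only reports the history of the path it is given, the log step would have to be repeated for every trunk, branch, and tag location of the project.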


Solution

  • The first problem that comes to mind is that you wouldn't be able to load your new dumps straight into the new repos, because those dumps would be missing the nodes that create the parent folders (trunk/branches/tags, whatever), and the svnadmin load command will fail with a File not found error. So you will have to create them beforehand, like this: svn mkdir http://server/svn/ProjectX/Trunk -m "Created Trunk"

    On second thought, there could be all sorts of other issues if commits to your project have cross-references. E.g. you dump revisions 1000 to 1500 for /branches/ProjectX/branch, but some node in the dump contains Node-copyfrom-rev: 800 and Node-copyfrom-path: /branches/ProjectY/branch headers, because a developer just wanted some shared file from that project and used the svn copy command. This is where the filtering madness begins. To mitigate that, you might try to process those dumps with the svndumpfilterIN script, which will pull the missing files from the live repo for you with svnlook. But beware that it has its own bugs (see my answer to this question: SVNDumpFilter changing paths before adding them?).

    On third thought, if you want separate repos for each project, you'll probably also want to relocate the dumped projects to the root folder, and that is where things get really messy. For example, almost none of the tools I know of that can relocate paths in dumps, such as Svn-DumpReloc and svndumpsanitizer (not sure about svndumptool with the merge hack), process svn:mergeinfo properties, and that will cause your dump import to fail.

    So, given your constraints, I can't see a solution using partial dumps that wouldn't require some manual tinkering with repos and dump files afterwards.
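To get a feel for how many cross-references a partial dump contains before picking a tool, a small script can list every copy whose source lies outside the project. The following is a minimal sketch in Python, assuming a dump stream in the usual format (header blocks, with Content-length giving the size of any body). The function names and the ProjectX prefix are made up for illustration, and it has not been tried on a repository of your size.

```python
import io


def iter_records(stream):
    """Yield the header dict of each record in a binary dump stream.

    Bodies (properties and file text) are skipped by byte count, so
    file contents that merely look like headers are never parsed.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line == b"\n":
            continue  # blank separator between records
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode("utf-8")] = value.decode("utf-8", "replace")
            line = stream.readline()
        # Skip the body; read() also works on pipes, unlike seek().
        stream.read(int(headers.get("Content-length", 0)))
        yield headers


def foreign_copy_sources(stream, project_prefix):
    """List copies whose source path lies outside project_prefix.

    Returns (revision, node_path, copyfrom_rev, copyfrom_path) tuples.
    """
    prefix = project_prefix.strip("/") + "/"
    current_rev = None
    hits = []
    for headers in iter_records(stream):
        if "Revision-number" in headers:
            current_rev = int(headers["Revision-number"])
        src = headers.get("Node-copyfrom-path")
        if src is None:
            continue
        src = src.lstrip("/")
        # Appending "/" lets a copy of the project root itself count as inside.
        if not (src + "/").startswith(prefix):
            hits.append((current_rev, headers["Node-path"],
                         int(headers["Node-copyfrom-rev"]), src))
    return hits

# Hypothetical usage (the file name is a placeholder):
#   with open("projectx.dump", "rb") as f:
#       for hit in foreign_copy_sources(f, "branches/ProjectX"):
#           print(hit)
```

Every tuple it reports is a node that svnadmin load would not be able to resolve in a fresh repository unless the source is pulled in first. The copy-from revision is included in the output so you can also check it against the revision range you dumped.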