svn svnadmin svndump

svnadmin dump of text only, without binary files


Is it possible to filter the SVN dump generated by svnadmin dump so that it does not include encoded binary data, just the text deltas and data?

I want a dump of an existing large SVN repository, but only of the code. I have no interest in the stored binaries, and they would make the dump file unnecessarily large. How can I generate the dump and exclude binary content?

Already tried, without success:

  1. Processing the svn log diffs is not practical. The repository is large and old, and fetching diffs even for a short time period takes a long time and often gets stuck.
  2. The binary files are scattered all over the tree rather than stored under a single known path, so I cannot use svndumpfilter to exclude them, unless there is some way to use this filter with patterns such as *.jar.

Solution

  • svndumpfilter, which is part of any Subversion installation, supports pattern matching:

    svndumpfilter exclude — Filter out nodes with given prefixes from the dump stream.

    Beginning in Subversion 1.7, svndumpfilter can optionally treat the PATH_PREFIXs not merely as explicit substrings, but as file patterns instead.

    $ svndumpfilter exclude --pattern "*.OLD" < dumpfile > filtered-dumpfile
    Excluding prefix patterns:
       '/*.OLD'
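
Applied to the question, the dump can be piped straight into the filter so that no unfiltered dump file has to be written first. The following is a minimal sketch, not a tested recipe for any particular repository: the function name dump_without_binaries, the repository path, the output file name, and the list of extensions are all assumptions made for illustration. It deliberately omits svnadmin dump's --deltas option, since deltified dumps cannot be processed by svndumpfilter.

```shell
#!/bin/sh
# Sketch: dump a repository and drop binary files by pattern in one pass.
# The function name and all paths/extensions below are illustrative assumptions.

# Usage: dump_without_binaries REPO_PATH OUTPUT_FILE PATTERN...
dump_without_binaries() {
    repo=$1
    out=$2
    shift 2
    # No --deltas here: svndumpfilter cannot operate on deltified dumps.
    # --pattern (Subversion 1.7+) treats the remaining arguments as
    # file patterns instead of literal path prefixes.
    svnadmin dump --quiet "$repo" \
        | svndumpfilter exclude --pattern "$@" > "$out"
}

# Example call (hypothetical repository path and extensions):
# dump_without_binaries /var/svn/myrepo code-only.dump '*.jar' '*.dll' '*.exe'
```

Quoting the patterns keeps the shell from expanding them against the current directory. If an excluded file was ever copied to a path that is kept, svndumpfilter may stop with a copy-source error, so it is worth trying the filter on a small revision range first.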