mercurial cvs rcs

Script to adjust history in an RCS/CVS ,v file


In preparation for a migration to Mercurial, I would like to make some systematic changes to many thousands of ,v files. (I'll be editing copies of the originals, I hasten to add.)

Examples of the sorts of changes I'm after:

  1. For each revision whose message begins with text that indicates a known username (e.g. [Fred Bloggs]): if the username in the message matches the author recorded in the ,v file, delete the redundant username text from the commit message.
  2. If the ,v contains a useful description, append it to the commit message for revision 1.1. (cvs2hg ignores the description, but lots of our CVS files actually came from RCS, where it was easy to put the initial commit message into the description field by mistake.)
  3. For edits made from certain shared user accounts, adjust the author, depending on the contents of the commit message.
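
To make the first kind of change concrete, here is a minimal Python sketch. It assumes the tag is always a bracketed name at the very start of the message, and that a lookup table from the written name to the CVS login is available. The table `KNOWN_USERS`, the login `fbloggs`, and the function name are all hypothetical, made up for illustration.

```python
import re

# Hypothetical mapping from the name written in the message to the CVS author login.
KNOWN_USERS = {"Fred Bloggs": "fbloggs"}

# A leading "[Some Name]" tag, plus any whitespace that follows it.
PREFIX_RE = re.compile(r"^\[([^\]]+)\]\s*")


def strip_redundant_username(message, author, known_users=KNOWN_USERS):
    """Drop a leading "[Name]" tag when Name maps to the revision's author."""
    match = PREFIX_RE.match(message)
    if match and known_users.get(match.group(1)) == author:
        return message[match.end():]
    # Unknown name, or the name belongs to someone else: leave the message alone.
    return message
```

A tag that names someone other than the recorded author is left in place, since in that case the text still carries information (which is also what the third kind of change would rely on).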

Things I've considered:

  1. Running 'cvs log' on each individual ,v file - parsing the output, and using rcs -m to change the history. Problems with this include:
    • there doesn't seem to be a way to pass a text file to rcs -m - so if the revision message contained single and/or double quotes, or spanned multiple lines, it would be quite a challenge to quote it correctly in the script
    • I can't see an rcs or cvs facility to change the author name associated with a revision
    • less importantly, it would be likely to start a huge number of processes - which I think could get slow
  2. Writing Python to parse the ,v file, and adjust the contents. Problems with this include:
    • we have a mixture of line-endings in our ,v files - including some binary files that should have been text, and vice-versa - so great care would be needed to not corrupt the files
    • care would be needed for quoting of the @ character in any commit messages, if it fell on the start of the line in a multi-line comment
    • care would also be needed on revisions where the last line of the committed file was changed, and doesn't have a newline - meaning that the ,v has a @ at the very end of a line, instead of being preceded by \n
  3. Clone the version of cvs2hg that we are using, and try to adjust its code to make the desired edits in-place
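
For the @ quoting concern in the second approach, the rule as I understand it from rcsfile(5) is that a string is delimited by @ and every @ inside it is doubled, wherever it falls on the line. A minimal Python sketch of that rule, working on bytes so that line endings and a missing final newline pass through untouched (the function names are my own, not part of any existing tool):

```python
def rcs_quote(raw: bytes) -> bytes:
    """Wrap raw bytes as an RCS string: double every @ and add @ delimiters."""
    return b"@" + raw.replace(b"@", b"@@") + b"@"


def rcs_unquote(quoted: bytes) -> bytes:
    """Inverse of rcs_quote for one complete, well-formed @-delimited string."""
    if len(quoted) < 2 or not (quoted.startswith(b"@") and quoted.endswith(b"@")):
        raise ValueError("not an @-delimited RCS string")
    return quoted[1:-1].replace(b"@@", b"@")
```

This only covers quoting a string that has already been located. Finding where each string starts and ends inside the ,v file is the harder part, and is not attempted here.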

Are there any other approaches that would be less work, or any existing code that implements this kind of functionality?


Solution

  • Your first approach may be the best one. I know that in Perl, handling quotation marks and multiple lines wouldn't be a problem. For example:

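    my $revision = ...;     # revision whose log message should be replaced
    my $log_message = ...;  # new text; may contain quotes and newlines
    # The list form of system() runs rcs directly, without going through a shell.
    system('rcs', "-m$revision:$log_message", $filename);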
    

    where $log_message can contain arbitrary text. Because the list form of system passes each argument straight to rcs without going through the shell, newlines, quotes and other metacharacters won't be reinterpreted. I'm sure you can do the same thing in Python.
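
A rough Python equivalent of the Perl snippet, as a sketch only: passing the command to subprocess.run as a list (and not setting shell=True) likewise avoids the shell, so the message needs no quoting. The helper names and the example file name are made up for illustration.

```python
import subprocess


def build_rcs_command(revision, log_message, filename):
    """Build the argument list for replacing one revision's log message."""
    # Each list element reaches rcs as a single argument, exactly as written.
    return ["rcs", f"-m{revision}:{log_message}", filename]


def set_log_message(revision, log_message, filename):
    """Run rcs on a ,v file; raises CalledProcessError if rcs exits non-zero."""
    subprocess.run(build_rcs_command(revision, log_message, filename), check=True)
```

For example, set_log_message("1.2", 'It\'s a "quoted"\nmulti-line message', "foo.c,v") would hand the whole message to rcs as part of one argument.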

    (As for your second approach, I wouldn't expect line endings to be a problem. If you have Unix-style \n endings and Windows-style \r\n endings, you can just treat the trailing \r as part of the line, and everything should stay consistent. I'm making some assumptions here about the layout of ,v files.)
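
To illustrate the line-ending point, here is a small Python sketch. It assumes the file is read in binary mode and split only on \n, so a trailing \r simply stays attached to its line and the original bytes come back unchanged when the lines are rejoined. The function names are hypothetical.

```python
def split_lines_keep_cr(data: bytes):
    """Split on LF only; any CR before the LF remains part of the line."""
    return data.split(b"\n")


def join_lines(lines):
    """Rejoin lines produced by split_lines_keep_cr, byte for byte."""
    return b"\n".join(lines)


sample = b"unix line\nwindows line\r\nlast line without newline"
lines = split_lines_keep_cr(sample)
# Nothing was normalised, so the round trip reproduces the input exactly.
assert join_lines(lines) == sample
```

Opening the ,v files in text mode would risk newline translation, which is why the sketch stays with bytes throughout.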