I am converting a very old and huge CVS repository to Git using cvs2git under Cygwin. It works fine, and I have started testing the new repository. I found no major peculiarities, but I wonder how the timestamps of a commit/change set are determined.
So far I have found that the timestamps of certain CVS revisions differ by 1 or 2 hours plus x, where x ranges from a few seconds or minutes (in most cases) up to 15 minutes. Many timestamps differ only by whole hours (x=0).
I guess this has something to do with the "timestamp error correction" that I found listed as a cvs2svn feature (http://www.mcs.anl.gov/~jacob/cvs2svn/features.html). Maybe time zones also play a role.
The results of my tests show that all commits with only one file in the change set differ by whole hours. That supports my "time zone hypothesis". But it also raises the question of how the timestamp of a change set with multiple files is determined.
I went through the code and found (with help from Google) that there is a "COMMIT_THRESHOLD" in the config.py of cvs2svn_lib. I guess it is used to group the file-based CVS commits together into change sets. Although the code looks well written, my limited technical understanding of how CVS, SVN, and Git store revisions makes it hard for me to follow.
Therefore, I would be grateful if someone could answer the following questions:
Kind regards
Edit:
Because someone flagged this question as "too broad", I am afraid I did not make my point clear enough. So here is a concrete (though fictional) example:
cvs2git found 3 file changes for one change set. They were committed on the same day (let's say on 30th February 2016), but at different times:
If it were only file 1, I would expect cvs2git to use 2016-02-30T12:34:56 as the timestamp for the Git commit. But which timestamp is chosen when the commits for all 3 files belong to one change set?
Related to this, when my repository is converted, the times also seem to be shifted by exactly 1 or 2 hours, even when there is only one file in the change set. I guess this is some kind of time zone adjustment. So I would like to know why the "timestamp error correction" changed my timestamps, so that I can decide whether to accept these changes. I did some statistics on the converted Git repository and the commit times look fine to me in principle, but that is not enough for me.
You ask two questions:
How are timestamps generated for commits touching multiple files?
For commits that modify files, cvs2svn/cvs2git takes the newest timestamp from among the file-level commits that comprise the commit. However, if that timestamp is earlier than the timestamp of the previous commit or more than one day after the time of conversion, it instead chooses a timestamp one second after that of the previous commit.
For commits that involve branching or tagging (for which CVS doesn't record timestamps at all), the timestamp is set to be one second after the timestamp of the previous commit.
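The two rules above can be sketched in Python as follows. This is a simplified illustration, not cvs2svn's actual code: the functions `choose_commit_timestamp`, `utc`, and `iso`, their parameters, and the use of plain Unix-epoch seconds are all hypothetical choices made for the example. Because the question's 30th February does not exist, the demo uses 29 February 2016, and the times for files 2 and 3 are invented.

```python
from datetime import datetime, timezone
from typing import Sequence

ONE_DAY = 24 * 60 * 60  # seconds


def choose_commit_timestamp(
    file_timestamps: Sequence[int],
    previous_commit_timestamp: int,
    conversion_time: int,
) -> int:
    """Pick a UTC timestamp (Unix seconds) for one converted commit.

    Hypothetical sketch of the rule described above, not cvs2svn code.
    An empty file_timestamps stands for a branch/tag commit, for which
    CVS records no time at all.
    """
    fallback = previous_commit_timestamp + 1
    if not file_timestamps:
        # Branch or tag creation: one second after the previous commit.
        return fallback
    newest = max(file_timestamps)
    if newest < previous_commit_timestamp or newest > conversion_time + ONE_DAY:
        # Earlier than the previous commit, or more than a day after the
        # conversion: use one second after the previous commit instead.
        return fallback
    return newest


def utc(*fields: int) -> int:
    """Convert (year, month, day, hour, minute, second) in UTC to Unix seconds."""
    return int(datetime(*fields, tzinfo=timezone.utc).timestamp())


def iso(seconds: int) -> str:
    """Format Unix seconds as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Three hypothetical file-level commits grouped into one change set.
files = [utc(2016, 2, 29, 12, 34, 56), utc(2016, 2, 29, 12, 35, 10), utc(2016, 2, 29, 12, 36, 2)]
previous = utc(2016, 2, 29, 9, 0, 0)
now = utc(2024, 1, 1, 0, 0, 0)

print(iso(choose_commit_timestamp(files, previous, now)))  # 2016-02-29T12:36:02+00:00
print(iso(choose_commit_timestamp([], previous, now)))     # 2016-02-29T09:00:01+00:00
```

With these inputs the Git commit gets the newest of the three file times (12:36:02 UTC), not the time of file 1.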
Why are timestamps sometimes off by an integral number of hours?
CVS records timestamps in UTC without recording a timezone, and cvs2svn/cvs2git uses those timestamps as-is without trying to guess a timezone. So the timestamps should be correct, but are expressed in UTC.
`git log` has a `--date` option (for example, `--date=local`) that can be used to ask that dates be displayed in the local timezone.
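To see why the offsets come out as whole hours, here is a small Python sketch that renders a UTC commit time in a local zone. It assumes, purely for illustration, a Central European user whose local time is UTC+1 in winter (CET) and UTC+2 in summer (CEST); fixed offsets are used so the example does not depend on a time zone database, and the helper `show` is hypical to this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: a Central European user, whose local
# time is UTC+1 in winter (CET) and UTC+2 in summer (CEST).
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def show(utc_time: datetime, local_zone: timezone) -> str:
    """Render a UTC commit time the way a user in local_zone would see it."""
    return utc_time.astimezone(local_zone).isoformat()


winter = datetime(2016, 1, 15, 12, 34, 56, tzinfo=timezone.utc)
summer = datetime(2016, 7, 15, 12, 34, 56, tzinfo=timezone.utc)

print(show(winter, CET))   # 2016-01-15T13:34:56+01:00
print(show(summer, CEST))  # 2016-07-15T14:34:56+02:00
```

The same UTC instant reads 1 hour later in winter and 2 hours later in summer, which matches the "exactly 1 or 2 hours" pattern for single-file commits.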
The cvs2svn project file `doc/design-notes.txt` documents the algorithms used by cvs2svn/cvs2git in quite some detail.