A few basic version control questions

... which I didn't feel like splitting into several posts, since they're basic enough that most people here will probably know the answers.

I've been developing for several years now, and I've never had the time to learn about version control. Renaming directories with different version names always seemed enough. Now, I've finally decided to learn it, but some basic terminology and working principles still confuse me.

My projects are relatively small, up to 10 or so files (although the files are relatively big), and are written in a non-OO style. I often take one approach, follow it to some point, decide that it will not do, and then (reusing old code) completely rewrite the whole project with a completely different file organization and internal code organization. Files disappearing and new files appearing between those "versions" is not uncommon.

So here are my "confusions":
1) For example, as described, I've put the first version into VC. Then I delete all the files and rewrite them anew. If I understood correctly, that would be a new "branch", right?
2) If I continue developing that version, would I just keep committing to that "branch"?
3) When VC saves a save-point in a branch, does it save all the files in full, or does it just save the differences between them?
4) Can I easily get all the files (the whole project) from a certain save-point in a branch, or do I have to follow it back through the diffs to the beginning? (I just want to be able to say, "here, this is the save-point - copy all the files you need so it looks like this".)
5) What does it mean, in very simple terms, to "push" and "pull"? I don't understand the difference between "push"/"pull" and "commit".

If it matters, I'm using VS 2008, and am thinking of using git-extensions, for I've heard nice stuff about it. Is it a good combination, or would using SVN (for example VisualSvn or Ankh) be a better option for me, considering the above?

-- with regards, Peter

Solution

Part of your confusion probably comes from the fact that different version control systems (VCSs) use the same terms differently.

I usually think of code in "lines". I start with the original version of a file, and save it to my version control system. Putting it into the VCS is called a "check-in". The version control system tags it with some number, such as revision 1.0. Now I compile my software. It breaks, so I have to edit it. To do that, I "check it out" of the version control system and edit it. Now that it's fixed, I check it back in and the version control system stores it as revision 1.1. My boss wants a new feature, so I check it out, edit it, and check it back in again, and it's stored as revision 1.2.

That's the "main line" or "trunk" of code.

A version control system will let you get any old version of a file by specifying the revision number. Let's say I get a bug report from software based on revision 1.1. I can use "diff" or any comparison tool to compare 1.1 with 1.0 and see what changed. It doesn't matter how the version control system stores it internally (whole files or just the differences); I just ask for it by revision number and I get the whole file.
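To make that concrete, here is a toy sketch in Python of one file's main line. It is not how any real VCS is implemented; the class name `FileHistory`, its methods, and the file contents are all made up for illustration. It stores the whole text at every check-in, which is the simplest possible choice, but the caller would see the same thing if it stored differences instead.

```python
import difflib


class FileHistory:
    """Toy model of one file's main line (hypothetical, for illustration only)."""

    def __init__(self):
        self._revisions = {}   # revision number -> full file text
        self._next_minor = 0

    def check_in(self, text):
        """Store a new revision and return its number: 1.0, 1.1, 1.2, ..."""
        revision = f"1.{self._next_minor}"
        self._revisions[revision] = text
        self._next_minor += 1
        return revision

    def get(self, revision):
        """Return the whole file exactly as it was at that revision."""
        return self._revisions[revision]

    def diff(self, old, new):
        """Compare two revisions line by line, like a 'diff' tool would."""
        return list(difflib.unified_diff(
            self.get(old).splitlines(),
            self.get(new).splitlines(),
            fromfile=old, tofile=new, lineterm=""))


history = FileHistory()
r0 = history.check_in("print('helo')\n")                         # original version
r1 = history.check_in("print('hello')\n")                        # the fix
r2 = history.check_in("print('hello')\nprint('new feature')\n")  # boss's feature

whole_file = history.get("1.1")        # any old revision, complete
changes = history.diff("1.0", "1.1")   # what changed between two revisions
```

The point is that `get` always hands back a complete file, so you never have to replay the differences yourself.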

The next thing to understand is that a group of files makes up your project or solution. When you're going to compile your software to release it to the world, you want to associate a "label" with all of those files so you can treat them all as a group. Most people use a numeric label, such as Windows 3.0, Windows 3.51, etc., but that's just convention. You could label a version "hardy heron" or "gutsy gibbon" if you want.
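A label can be pictured as nothing more than a name attached to one revision of each file. The sketch below assumes a made-up `repository` dictionary, made-up file names and revision numbers, and two hypothetical helper functions; real systems store labels differently, but the idea is the same.

```python
# Hypothetical repository: filename -> {revision number: full file text}.
# File names, revision numbers and contents are placeholders.
repository = {
    "main.c": {
        "1.22": "main, old",
        "1.23": "main, as shipped in version 7",
        "1.40": "main, current",
    },
    "util.c": {
        "1.5": "util, as shipped in version 7",
        "1.9": "util, current",
    },
}

labels = {}  # label name -> {filename: revision number}


def apply_label(name, file_revisions):
    """Record which revision of each file belongs to the labelled release."""
    labels[name] = dict(file_revisions)


def check_out(name):
    """Return every file exactly as it was when the label was applied."""
    return {filename: repository[filename][revision]
            for filename, revision in labels[name].items()}


# Labels are just names; numeric ones are only a convention.
apply_label("version 7", {"main.c": "1.23", "util.c": "1.5"})
apply_label("hardy heron", {"main.c": "1.40", "util.c": "1.9"})

version_7_files = check_out("version 7")
```

Asking for "version 7" gives back the whole group of files in one go, each at the revision it had when the label was applied.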

Now, this is all fine if you're one guy who just keeps updating things as you go along. But let's say you keep working on your software, and release version 7, then 8, then 9, and now you're working on version 10. But today you get a serious bug report on version 7 that you just have to fix. So you go to your VCS and request all the source files with the label "version 7". You get those into a separate folder on your disk, and fix the bug. But when you go to check those files in, you need them to stay part of version 7, separate from the main line, because the main line already contains the features you added in versions 8 and 9. This is when you create a "branch".

An example might be clearer. Let's say you checked out "version 7" of the package, and the file to fix was at revision 1.23. In version 10 (which you're working on in a different folder) you're working with revision 1.40. You don't want the changes for version 7 to go into 1.41, because that would overwrite and destroy all the neat features you added in revisions 1.24 through 1.40. So you create a branch, and check in your changed file as revision 1.23.0.1. You compile it, and now the bug is fixed. And now you have to release it to your customers. When you release, you create a new label. I'd label this something like "version 7.1" so that I could tell the difference between the broken software and the fixed software. And I'd know that it didn't have all the features of versions 8+.

If you plot those software versions on a line, you'd think of a number line going straight from 1 to 10. Where does 7.1 fit on this line? It sticks out the side, like a branch sticks out from the trunk of a tree. That's where the names "branch" and "trunk" come from.
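The fork can be sketched in a few lines of Python as well. This is again a toy model under my own assumptions: `RevisionTree` and its methods are hypothetical, the file contents are placeholders, and the number 1.23.0.1 simply follows the example above (real systems number their branches in different ways). Each revision remembers its parent, which is all it takes for history to stop being a straight line.

```python
class RevisionTree:
    """Toy model: each revision remembers its parent, so history can fork."""

    def __init__(self):
        self._parent = {}    # revision -> parent revision (None for the first)
        self._content = {}   # revision -> full file text

    def check_in(self, revision, text, parent=None):
        """Store a revision together with the revision it was edited from."""
        self._parent[revision] = parent
        self._content[revision] = text
        return revision

    def get(self, revision):
        return self._content[revision]

    def lineage(self, revision):
        """Return the chain of revisions from the first one up to `revision`."""
        chain = []
        while revision is not None:
            chain.append(revision)
            revision = self._parent[revision]
        return chain[::-1]


tree = RevisionTree()

# Trunk: revisions 1.0 through 1.40, each adding one placeholder feature line.
previous = None
text = ""
for minor in range(41):
    text += f"feature {minor}\n"
    previous = tree.check_in(f"1.{minor}", text, parent=previous)

# Branch: fix the version 7 bug starting from revision 1.23, not from 1.40.
fixed = tree.get("1.23") + "bug fix for version 7\n"
tree.check_in("1.23.0.1", fixed, parent="1.23")

branch_history = tree.lineage("1.23.0.1")   # 1.0 ... 1.23, then 1.23.0.1
trunk_history = tree.lineage("1.40")        # 1.0 ... 1.40, untouched by the fix
```

The branch revision has 1.23 as its parent and never sees 1.24 through 1.40, and the trunk never sees the fix unless you deliberately carry it over. That is the "sticking out the side" picture in data form.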