Git versus Mercurial for tracking large file systems

I'm looking for a way to retain, access, and analyze files that come in over time. One part of a solution may be a free, efficient version control system (VCS) like Mercurial or Git.

One way to track file updates is to save multiple copies of each file for a limited time. However, it should be more space-efficient to have a VCS store only the diffs between versions. A VCS would also aid in tracking exactly when a given change was made.
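
To make that concrete, here is a minimal sketch of the workflow I have in mind, assuming the incoming files land in a directory such as /data/incoming (a made-up path) and that Git is driven from Python via its command line:

    #!/usr/bin/env python3
    """Commit the current state of a directory into Git.

    A rough sketch, not a finished tool: the path and commit-message
    format are my own assumptions, and it presumes git is installed
    and user.name/user.email are configured.
    """
    import subprocess
    from datetime import datetime, timezone

    REPO = "/data/incoming"  # made-up directory receiving files over time

    def snapshot(repo=REPO):
        # Initialize on first run; reinitializing an existing repo is harmless.
        subprocess.run(["git", "init", "--quiet"], cwd=repo, check=True)
        # Stage every addition, modification, and deletion.
        subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
        # Commit with a timestamp so "when did this change?" stays answerable.
        stamp = datetime.now(timezone.utc).isoformat()
        subprocess.run(
            ["git", "commit", "--quiet", "-m", "snapshot " + stamp],
            cwd=repo,
            check=False,  # non-zero exit just means nothing changed
        )

    if __name__ == "__main__":
        snapshot()

Run from cron every few minutes, something like that would turn the directory into a self-recording archive: Git stores the deltas, and the log records the when.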

Mercurial came up as one candidate VCS for consideration. I used Mercurial a couple of years ago, and it's a fine VCS, known for being fast (hence the name Mercurial). But over the past year I've been pretty happy using Git.

Git was started by Linus Torvalds to replace BitKeeper in Linux kernel development, but has since been adopted by the Fedora, Perl, Gnome, and Google Android projects. It started out as an obtuse "file database", but now has a nice VCS "porcelain" and a number of good GUIs on Unix and Windows. It also has a built-in "instaweb" command that makes browsing repositories easy for casual users.

I've seen data indicating that Git repositories are about half the size of comparable Mercurial repositories, and that some operations (e.g., diff) can be an order of magnitude faster in Git. A lot of this likely depends on the nature of the repository, the operation, and the degree of tweaking; running tests on actual data sets would be useful.
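
As a starting point, a crude timing harness could look like the sketch below. The repository paths are placeholders, and single wall-clock measurements are noisy, so anything serious should repeat and average:

    #!/usr/bin/env python3
    """Crudely time the same operation in a Git and a Mercurial repo."""
    import subprocess
    import time

    CASES = [
        ("git", ["git", "diff", "HEAD~1", "HEAD"], "/path/to/git-repo"),
        ("hg", ["hg", "diff", "-r", "-2", "-r", "-1"], "/path/to/hg-repo"),
    ]

    for name, cmd, cwd in CASES:
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL, check=True)
        print("%s: %.3f s" % (name, time.perf_counter() - start))

Comparing du -s on the .git and .hg directories of the same imported tree would give the size side of the story.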

There are other things to consider. Git, for example, has "hooks", which are useful for triggering actions like running build scripts. Mercurial has something similar, configured in its hgrc file. What about documentation? I saw Amazon had three Git-specific books, and one for Mercurial (which I've read, and it's quite good).
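
For illustration, a Git hook is just an executable file dropped into .git/hooks/ -- the sketch below is a hypothetical post-commit hook (the log path is made up) that records each commit, though the same slot could just as well launch a build script:

    #!/usr/bin/env python3
    # .git/hooks/post-commit -- must be executable (chmod +x).
    import subprocess

    # Ask Git for the hash and subject line of the commit that just landed.
    info = subprocess.run(
        ["git", "log", "-1", "--format=%H %s"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

    # Append it to a log file; the path here is purely illustrative.
    with open("/var/log/repo-commits.log", "a") as log:
        log.write(info + "\n")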

The key here is using a VCS to monitor changes over time in a large file tree -- not to do version control in the application-development sense. I'll try to post updates on this as things progress.
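
Once snapshots are flowing in, the payoff is being able to ask when and how a given file changed. A hypothetical query helper, again assuming the /data/incoming repository from the sketch above:

    #!/usr/bin/env python3
    """List every snapshot commit that touched one file, with timestamps."""
    import subprocess
    import sys

    def history(path, repo="/data/incoming"):
        # --follow tracks the file across renames; %ci is the commit date.
        out = subprocess.run(
            ["git", "log", "--follow", "--format=%ci %h %s", "--", path],
            cwd=repo, capture_output=True, text=True, check=True,
        )
        return out.stdout

    if __name__ == "__main__":
        print(history(sys.argv[1]))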
