I’m a source-control kind of guy. Anyone who knows me would assume that I’d always insist on a source-control tool of some kind, even for my own “solo” work.
But they’d be wrong – I’ve only just found one I’m happy with, and in the meantime I’ve gone several years without any source-control tool. And frankly, I’ve always been a bit perplexed at how everyone else seems to get along with these tools.
Sure, in the past I’ve worked on teams using PVCS or ClearCase, and before that PANVALET on mainframes (and some other mainframe tool whose name I can’t even remember). I’ve had the odd encounter with CVS, Subversion and Perforce. And when I started setting up my own development environment a few years back, source-control was one of the first things I looked at (together with overall directory structures, backup, and security).
But at that time I wasn’t happy with any of the tools I found. Everyone else seemed to be using CVS, but the more I learnt about it the more of a ridiculous nightmare it seemed. I looked at Subversion and Perforce and a few others, but at the time they all seemed far too awkward, limited and problematic to suit my needs – just far more trouble than they would be worth. The more expensive tools were beyond my budget (and in any case, given past experiences, I kind of expected them to be worse rather than better).
I think at least part of the problem was that these tools tend to address a broad but ill-defined set of loosely-related issues. It’s as if everybody knows what such source-control tools are supposed to do (unfortunately, often based on CVS, which just seems insane), but this isn’t based on any clear definition of exactly what needs such a tool should and shouldn’t be trying to address. Then each specific tool has its own particular flaws in conception, architecture and implementation. Throw non-standard services, storage mechanisms and networking protocols into the mix, and you end up having to deal with a huge pile of complications and restrictions just to get one or two key benefits.
As an aside, the Google “Tech Talk” video Linus Torvalds on git has plenty of scathing comments about these traditional source-control tools and why they aren’t the answer. If you want some more examples of people who aren’t enjoying their source-control tools, there are also some great comments on the “Coding Horror” article Software Branching and Parallel Universes.
In the end, it looked both simpler and safer for me to live without a source-control tool. That’s heresy in civilized software engineering circles, even for a one-man project. But it has worked fine for me up until now.
In the absence of a source-control tool, I’ve maintained separate and complete copies of each version of each project, and done any merging of code between them manually (or at least, using separate tools). This loses out on the locking, merging and history tracking/recreation that a source-control tool could provide, but to date that hasn’t been of any consequence (and can partly be addressed by other means, e.g. short-term history tracking by my IDE, use of “diff” tools against old backups etc). In return I’ve not had to deal with any of the overheads, complexity or risks of any of these tools, nor had to fit the rest of my environment and procedures around them.
Don’t get me wrong: on a larger team, or more complex projects, some kind of source-control tool would normally be absolutely essential, however problematic and burdensome. But I am not a larger team, and so far it hasn’t been worth my while to shoulder such burdens.
Anyway, I revisit this subject every now and then, to see if the tools have reached the point where any are good enough to meet my needs (and so that I have a rough idea of what to do if I suddenly do need a source control tool after all).
And this time around, at last, everything seems to have changed…
This time, the world suddenly seems full of “distributed” (or perhaps more accurately, “decentralized”) source-control tools. Although I initially feared that things had just got a whole lot more complicated, these tools have actually turned out to be exactly what I’ve been looking for all this time.
I’m not going to try and explain distributed source-control tools here, but for some general background, see (for example):
- Kyle Cordes’ talk notes for “A Brief Introduction to Distributed Version Control”.
- Intro to Distributed Version Control (Illustrated) at betterexplained.com.
- The Google Tech Talk video Linus Torvalds on git.
- Understanding Mercurial (basic concepts of Mercurial and distributed source-control in general).
- Distributed Revision Control Systems: Git vs. Mercurial vs. SVN on Russell Beattie’s Weblog.
- DVCS Mini Roundup (summary and comparison of the currently-available tools).
Of the currently-available distributed source-control tools, a quick look round suggested that Mercurial might be best for me, and some brief exploration and experimentation with it completely won me over.
At last, a source-control tool that I’m happy with!
Mercurial gives me precisely the benefits I’m looking for from a source-control tool – in particular, history tracking/recreation and good support for branching and merging. It’s flexible enough to let me add these facilities into my existing development environment and directory structures without otherwise impacting them (even though this isn’t how most teams would normally use it), it doesn’t need any significant administration, and it seems simple and reliable.
In addition, Sun has chosen it for the OpenJDK project (as stated, for example, in Mark Reinhold’s blog), and Mozilla is adopting it too (as described in Version Control System Shootout Redux Redux), so I can feel reasonably confident it’ll be around and supported for a while.
Some of the particular things I like about Mercurial are:
- It all seems simple and reasonably intuitive, and everything “just works”.
- Branching and tagging, and more importantly merging, all look relatively simple, safe, and effective.
- Its overall approach makes it very flexible. I especially like the way the internal Mercurial data is held in a single directory structure in the root of the relevant set of files. This keeps it together with the files themselves, with no separate central repository that everything depends on, whilst also not scattering lots of messy extra directories into the “real” directories. It was easy to see how this could be fitted into my existing directory structures, backup, working practices etc without any significant impact or risk, and without other tools and scripts needing to be aware of it. At the same time I don’t feel it ties me down to any one particular structure, and I can see how it could readily accommodate much larger teams or more complex situations.
- Although this is entirely subjective, it feels rock solid and safe. Retrieving old versions and moving backwards and forwards between versions works quickly and reliably, with no fuss or bother. The documentation’s coverage of its internal architecture and how this has been designed for safety (e.g. writing is “append only” and carried out in an order that ensures “atomic” operation, use of checksums for integrity checks etc) gives me good confidence that corruptions or irretrievable files should be very rare. For extra safety I can still keep my existing directories in place (holding the current “tip” of each version), so that at worst my existing backup regime still covers them even if anything in Mercurial ever gets corrupted.
- The documentation provided by the Distributed revision control with Mercurial open-source book seems excellent. I found it clear and readable enough to act as an introduction, but extensive and detailed enough to work as a reference. I spent a couple of hours reading through the whole thing and felt like this had given me a real understanding of Mercurial and covered everything I might need to know.
- Commits are atomic, and can optionally handle added and deleted files automatically. This means that I can pretty much just carry out the relevant work without regard for Mercurial, then simply commit the whole lot at the end of each task, without having to individually notify Mercurial of each new or deleted file. This removes a lot of the need for integration with IDEs, and a lot of the potential source-control implications of using IDE “refactoring” facilities.
Some of these are intrinsic benefits of distributed source control; some are due to Mercurial being a relatively new solution (and able to build on the best of earlier tools whilst avoiding their mistakes and being free of historical baggage); and some are just down to it being well designed and implemented.
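To make the "commit the whole lot at the end of each task" point from the list above a little more concrete, here is a minimal Python sketch of that step. It assumes the `hg` command is on the PATH and that the given directory is already a Mercurial repository; the function names `build_commit_command` and `commit_task` are invented purely for illustration, and this is just one way it could be scripted rather than how I actually run it.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commit_command(message: str) -> List[str]:
    """Build an `hg commit` command line that also handles added/deleted files.

    Hypothetical helper for illustration only.
    """
    # --addremove asks Mercurial to start tracking new files and to stop
    # tracking files that have disappeared, as part of the same commit.
    return ["hg", "commit", "--addremove", "--message", message]


def commit_task(repo_root: Path, message: str) -> None:
    """Commit everything changed under repo_root as a single changeset.

    Assumes `hg` is installed and repo_root is inside a Mercurial repository.
    """
    subprocess.run(build_commit_command(message), cwd=repo_root, check=True)
```

Calling something like `commit_task(Path("my-project"), "Finish parser rewrite")` at the end of a task would then be the only source-control step needed. Files that have been renamed are a separate matter: with this approach they would simply look like one deleted file and one new file unless the rename is recorded explicitly.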
For anyone coming from other tools, some conversion/migration tools are listed at Mercurial’s Repository Conversion page, but of course I haven’t tried any of these myself.
The only weaknesses I’ve encountered so far are:
- Mercurial deals with individual files, and is therefore completely blind to empty directories. The argument seems to be that empty directories aren’t needed and aren’t significant, but I think this is more an artifact of the implementation than anything one would deliberately specify. I don’t think it’s such a tool’s place to decide that empty directories don’t matter. I have directories that exist just to maintain a consistent layout, or as already-named placeholders in readiness for future files. To work around this I’ve had to find all empty directories and give them each a dummy “placeholder” file.
- Although there’s at least one Eclipse plug-in, at least one NetBeans plug-in, and a TortoiseHg project for an MS-Windows shell extension, these seem to be at a very early stage. I’d expect this situation to improve over time, especially for NetBeans (given Sun’s use of Mercurial for OpenJDK). In the meantime this doesn’t have much impact on my own use of Mercurial, as the command-line commands are simple to use and powerful enough to be practical. During normal day-to-day work, my use of Mercurial has generally been limited to a commit of a complete set of changes when ready, plus explicit “rename”s of files where necessary.
- On MS Windows you need to obtain a suitable diff/merge tool separately, as this isn’t built into the Mercurial distribution (but the documentation points you at several suitable tools, and shows how to integrate them into Mercurial – and anyway, I’d rather have the choice than be saddled with one I don’t like, or have a half-baked solution as part of the source-control tool itself).
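The empty-directory workaround mentioned in the first point above (giving each empty directory a dummy file) is easy to automate. The following is a rough Python sketch of one way to do it, not the exact script I used: the placeholder file name `.placeholder` and the function name `add_placeholders` are assumptions made for the example. It leaves Mercurial's own `.hg` directory alone.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical placeholder file name; any otherwise-unused name would do.
PLACEHOLDER_NAME = ".placeholder"


def add_placeholders(root: Path, placeholder: str = PLACEHOLDER_NAME) -> List[Path]:
    """Create a dummy file in every empty directory under root.

    Returns the list of placeholder files that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Decide emptiness before pruning, so a directory that holds only
        # ".hg" is not mistaken for an empty one.
        is_empty = not dirnames and not filenames
        # Never descend into Mercurial's own metadata directory.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        if is_empty:
            marker = Path(dirpath) / placeholder
            marker.touch()
            created.append(marker)
    return created
```

Running `add_placeholders(Path("my-project"))` once before the first commit, and again whenever new empty directories are introduced, should be enough for Mercurial to preserve the layout. Running it twice is harmless, since a directory that already holds a placeholder is no longer empty.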
I’ve now been using Mercurial for a couple of months. Despite my general dislike of all the source-control tools I’d looked at beforehand, I have been very pleased with Mercurial.
If you’re looking for a new source control tool, or have always disliked tools such as CVS, Subversion and Perforce, I’d certainly recommend Mercurial as worth taking a look at.