Distributed Software Development With Git

Real Men do it Themselves

There is stuff that changes the way you work. Then, there is stuff that changes the way you think.

When Donald E. Knuth wanted to write a series of books about The Art of Computer Programming, he found himself missing a program to convert his words into a beautiful book. To solve that problem, he invented TeX. When there were no nice fonts around, he added METAFONT. In a similar way, when Linus Torvalds found himself lacking a good version control system (VCS) after Bitkeeper decided to close access for OSS developers, he chose the only solution he had: He wrote his own.

And thus, Git was born and a lot of people living in abuse-protected web forums were in deep trouble. Even before them, the critics soared: What, another VCS?

Subversion vs. Git

Especially the people around Subversion were not so pleased and many people wondered why Linus chose to do his own thing instead of building on existing code. One of the reasons is that Subversion can be thought as a very elaborate bug fix for CVS. It didn’t try to reinvent the wheel.

It also inherits some legacy: You have to setup a central server if you want to do distributed development outside of your LAN. Certain operations are slow, like checkout and update. Agreed, they are faster than CVS but try these with Git. And it’s monolithic software unless you’re willing to use your C compiler. There are only very few ways to interact with the repository from a shell script, only a few hooks to do custom stuff (like sending email). If you just wanted to add a small feature, it would mean real programming work instead of whipping together a quick shell script.

I’m by no means a critic of Subversion; I’m using it every day and I’m happy with it. My point is that it’s confining me in a pretty small box, just a little bit larger than CVS and with less problems. That doesn’t make it larger, though. An example.

Oh, the Pain

You have some files which you want to take home to work on. So you copy them on an USB drive, take them home, edit them, bring them back. When you return to work, a co-worker has changed one of the files. He tells you after you copied all the files from your USB drive back onto your work PC (“Who has time to read all those warnings? Yes to All!”)

The next day, you’re smarter and check in the files into Subversion (SVN). There is no need for a central server and when you ignore the warnings from SVN, you can create the repository on a network drive. When the drive fails in an inopportune moment, your repository will be data trash, but there are certain risks one has to take.

You checkout a copy on your USB drive and take that home. Since working on the file from your USB stick is too slow, you copy everything on your home PC and edit it there. When copying the files back on the USB stick, you notice a lot of write-protected files in .svn directories. Oh well, time for “Yes to All!” again.

After returning to work, you synchronize your checkout with the SVN repository. Life is great. Unless you have Linux at home and were not so careful about Carriage Return/Line Feed conversion and you find the copy of your data on the USB drive is now currupt. But who is using Linux anyway?

The real trouble starts when you feel the need to carry the repository with you. Imagine you have a great idea, you have the USB drive with you, but you’re neither at work nor at home. If you have a computer closeby, you could work on the copy on the USB drive but at the cost of either getting out of sync with your home or work copy.

Subversion, like CVS, only supports a single, central repository unless you use tools like SVK. SVK depends on Perl, though, and it adds nice little … err … rather big cryptic code strings to the commit in messages.

Git

Git, on the other hand, has been built on the “greenfield”. Torvalds could add all the features he wanted and avoid all the common mistakes inherited by the CVS legacy. From a 1000 feet, it’s a set of loosely coupled commands which work on an object database which allows to version objects. Git doesn’t care what an object is, it just versions it. This is pretty similar to SVN, maybe except that Git handles large files better. And that Git is faster for most operations.

The main difference between Git and SVN is that Git is decentralized. This means you can create as many repositories as you want and synchronize them. So in the example above, you can have one repository at work, one on your USB drive and one at home. You can work on all three of them independently and then use Git to figure out how to merge everything.

Remember the dreaded branches from CVS? SVN eased the pain considerably but with Git, everything is a branch to begin with.

To become happy with Git, there are two major steps you need to take. First, you must understand that there is no server. Forget about the idea of server. Git allows to synchronize different copies of a couple of files in different places without a server. To do this effectively, Git keeps some information in the .git directory. If you want do to this remotely, you can use Git as a server, too, but that is basically the same thing as using it locally. Except that the name of a different computer is involved.

The second step is that you don’t put more than one project into one Git repository. With CVS, we are used to use modules. With Subversion, you create subtrees with trunk and branch. With Git, you have one repository per project. Setting up a repository is so cheap, it really doesn’t make sense to have more than one project in it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s