“On undoing, fixing, or removing commits in git” is a web page that guides you when Git doesn’t do what you want.
When working with Git, you’ll eventually realize that an important part is the workflow: How many branches do I need? How do I organize them? How does code flow between them?
Vincent Driessen shares his workflow in “A successful Git branching model.”
When using git, you really have to define a workflow that you want to use. In his talk “Real World Git Workflow” (slides on slideshare), Stefan Saasen explains some kinds of workflows and when they are appropriate.
The first step in any workflow design is asking questions. Some examples that will sound familiar:
Your questions will be different but make sure you ask them. There is no perfect workflow but there is your workflow. Omit this step at your own peril.
First, which collaboration model do you use?
“Anarchy” (anyone can push anywhere, slide 17), “Gatekeeper” (one person reviews all changes, often used by OSS projects, slide 18), “Dictator and Lieutenants” (Linux, slide 19) or “Centralised” (slide 20). See also “Git Workflows” on atlassian.com/git.
In the enterprise, the centralized model is often used. This approach makes it most simple to integrate all the tools (CI servers, code quality tools, deployment, …). (slide 23)
The two most common branching models are “continuous delivery” and “product releases.” (slide 25)
From slide 28:
“Significant branches map to a concept in the outside world. It may be a past release, an environment or a role. Those branches are long-running and stable whereas feature branches are short lived and volatile.”
– Stefan Saasen
Slide 27 shows the branches for “continuous delivery”. PR is a “pull request.”
As you can see, development is consolidated in the staging branch and pushed into production (master branch) from there. If you make a hotfix in the master branch, it is cherry-picked back into staging.
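As a rough sketch of that flow (not taken from the slides), the script below plays it through in a throwaway repository. The repository name `shop`, the file names and the commit messages are all made up for illustration; only the branch names `staging` and `master` come from the text above.

```shell
set -eu
# Throwaway sandbox; repository and file names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q shop
cd shop
git symbolic-ref HEAD refs/heads/master   # same branch name on every Git version
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial release"

# staging consolidates development
git checkout -q -b staging
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# a hotfix is made directly on master ...
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Hotfix for production"
hotfix=$(git rev-parse HEAD)

# ... and cherry-picked back into staging
git checkout -q staging
git cherry-pick "$hotfix" > /dev/null

# release: staging is merged into production
git checkout -q master
git merge -q --no-edit staging
```

In a real setup the last step would normally be a pull request rather than a local merge.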
Slide 30 shows how to handle “product releases.” You have a single, central repository with a master branch which consolidates the development (no staging). Each feature and bugfix happens in a short-lived “feature branch.” When a release is made (slide 31), then a new long living release branch is created. Bug fixes still happen in short lived branches.
When the bug is fixed, the fix is first merged into the oldest affected release. From there it is merged into the next release, and so on until you reach the latest release, from which it is then merged back into master (slide 33).
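To make the chain of merges concrete, here is a minimal sketch with two release branches. The branch names `release/1.0`, `release/1.1` and `bugfix/crash` as well as all file names are my own assumptions, not something the slides prescribe.

```shell
set -eu
# Throwaway sandbox; branch and file names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q product
cd product
git symbolic-ref HEAD refs/heads/master
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch release/1.0                 # first release is cut here

echo "one-one" > v11.txt
git add v11.txt
git commit -q -m "Work for 1.1"
git branch release/1.1                 # second release is cut here

echo "next" > next.txt
git add next.txt
git commit -q -m "Work for the next release"

# The bug is fixed in a short-lived branch off the oldest affected release
git checkout -q -b bugfix/crash release/1.0
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix crash"

# 1. merge the fix into the oldest affected release
git checkout -q release/1.0
git merge -q --no-ff --no-edit bugfix/crash
git branch -q -d bugfix/crash

# 2. merge that release into the next one
git checkout -q release/1.1
git merge -q --no-edit release/1.0

# 3. merge the latest release back into master
git checkout -q master
git merge -q --no-edit release/1.1
```

Afterwards every release from 1.0 upwards, and master, contains `fix.txt`, and each older branch is an ancestor of the next newer one.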
Or we use the correct git merge strategy: ours. This creates a new changeset with the merge information without actually merging anything. For git, it will look as if everything has been done and it won’t bother us with merging those changes ever again (slide 39).
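A small sketch of what `git merge -s ours` does. The scenario is an assumption on my part: a release branch sets a version number that must never reach master. The names `release/1.0` and `version.txt` are hypothetical.

```shell
set -eu
# Throwaway sandbox; branch and file names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q product
cd product
git symbolic-ref HEAD refs/heads/master
echo "version=dev" > version.txt
git add version.txt
git commit -q -m "Initial commit"
git branch release/1.0

# master moves on
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature for the next release"

# release-only change that master should never receive
git checkout -q release/1.0
echo "version=1.0" > version.txt
git commit -q -a -m "Set version for the 1.0 release"

# Record the merge on master without taking any of the release's changes
git checkout -q master
git merge -q -s ours --no-edit release/1.0

cat version.txt    # master keeps its own content
```

From now on Git treats `release/1.0` as fully merged into master, so a later ordinary merge only brings over commits made on the release branch after this point.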
While the above sounds reasonable, the question is why? What are the rules and forces which make this better? Stefan introduces “the merge protocol” to answer this. See slide 42 for this.
In a nutshell, you always try to merge more stable branches into less stable ones: Bug fix branch into branches where the bug hasn’t been fixed, yet. Features into branches which don’t have the feature.
That’s why you never merge master back into a release branch: Releases are most stable. You merge them into master. If you have fixes in master that you really need in a release, you cherry-pick them (slide 43).
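Here is a sketch of the cherry-pick case, again with made-up names (`release/1.0`, `feature.txt`, `fix.txt`). I use the `-x` option so the new commit message records where the fix came from; that is my preference, not something the talk asks for.

```shell
set -eu
# Throwaway sandbox; branch and file names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q product
cd product
git symbolic-ref HEAD refs/heads/master
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch release/1.0

# master receives an unfinished feature and, later, a fix
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature for the next release"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix crash"
fix=$(git rev-parse HEAD)

# Only the fix is copied to the release; master itself is never merged
git checkout -q release/1.0
git cherry-pick -x "$fix" > /dev/null
```

The release branch now has the fix but not the feature, which is the whole point of not merging the less stable branch into the more stable one.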
A lot of people understand why code reviews would be a good thing, “but …” Sounds familiar? Then pull requests are for you.
Pull requests are an easy, low-overhead tool to have as much “code review” as you feel comfortable with. You can merge by clicking a button or you can review the changes line-by-line. Your choice.
Most projects will use a single canonical repository but remote forks are useful, too. Imagine you have fixed an important (for you at least) bug in an OSS project. You send them a pull request but it’s rejected! What do you do?
You fork the project. Git allows you to still track the changes made by the original project while isolating your life as much as you want (slide 52).
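A minimal sketch of tracking the original project from a fork, using two local directories in place of real servers. The directory names `upstream` and `fork`, the remote name `upstream` and the file names are assumptions for the demo.

```shell
set -eu
# Throwaway sandbox; local directories stand in for hosted repositories.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The original project
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo "library" > lib.txt
git add lib.txt
git commit -q -m "Upstream: initial commit"

# Your fork keeps a remote that points at the original project
cd "$sandbox"
git clone -q upstream fork
cd fork
git remote rename origin upstream
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix that upstream rejected"

# Meanwhile the original project moves on
cd "$sandbox/upstream"
echo "more" > more.txt
git add more.txt
git commit -q -m "Upstream: new work"

# The fork picks up upstream's changes and keeps its own fix
cd "$sandbox/fork"
git fetch -q upstream
git merge -q --no-edit upstream/master
```

How often you repeat the fetch and merge is up to you, which is what lets you isolate yourself as much as you want.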
A fork is nice if you want to do an innovation spike – code that might never be included in the product. Fork instead of polluting the project history with dead experiments (53).
Some department needs big changes to some component? Fork it until the feature stabilizes. You can still merge them if you want, but you don’t have to (54).
Reduce the noise (55). A fork allows you to rewrite history.
You can use pre and post hooks to make everyone’s life easier. Use local pre-commit or pre-push hooks to make sure some important tests have been run. For example, you could run FindBugs or checkstyle.
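Below is a sketch of a local pre-commit hook. To keep it self-contained, a simple `grep` stands in for a real tool like FindBugs or checkstyle; the rule it enforces (no new `printStackTrace()` calls) and the file name `Handler.java` are only examples I picked.

```shell
set -eu
# Throwaway sandbox; file names and the check itself are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
echo "readme" > README
git add README
git commit -q -m "Initial commit"

# Install the hook: a non-zero exit status aborts the commit
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Stand-in for a real analyser: reject staged lines that add printStackTrace()
if git diff --cached | grep -q '^+.*printStackTrace'; then
    echo "pre-commit: printStackTrace() found, commit rejected" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

# First attempt violates the rule
echo 'catch (Exception ex) { ex.printStackTrace(); }' > Handler.java
git add Handler.java
if git commit -q -m "Sloppy error handling" 2> /dev/null; then
    first=accepted
else
    first=rejected
fi

# Second attempt fixes it
echo 'catch (Exception ex) { throw new RuntimeException(ex.getMessage(), ex); }' > Handler.java
git add Handler.java
if git commit -q -m "Rethrow instead of printing"; then
    second=accepted
else
    second=rejected
fi

echo "first commit: $first, second commit: $second"
```

Keep in mind that hooks in `.git/hooks` are local to each clone and are not copied by `git clone`, so every developer has to install them.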
An interesting post-checkout hook would be to check whether the branch is green (66), i.e. the code builds and all tests pass. Stop wasting time searching for bugs that were already there before you started your work. You can get this gem from bitly.com/green-builds (69).
The explosion of branches can quickly bog down your build server if you don’t come up with a strategy to handle this (71). Usually, it’s enough to build stable and master but developers will love it when they can manually trigger feature branch builds (72).
At the beginning he reminded us how quickly a well designed system goes bad due to hurried changes. We need to be aware of our technical debt and we need to allocate time to spend on reducing it (slides 3-12).
As an example, car batteries are easy to find. They are a replacement part, so designers and engineers make them easy to find. Compare this to the configuration of your project. If you need to change it, how easy is it to find the file that needs to be changed and then the place in the file?
Another important point is skills. In most other professions, you have some mastery of a skill before you use it. You train hundreds of hours before you play your first football game. In software, we show you a computer, we show you the programming language of the year (not necessarily this year’s). There is no time to master the tools you have to use from day one (slides 13-15).
“We are what we repeatedly do; excellence then, is not an act but a habit.” – Aristotle
Or as Wikipedia defines it:
Stop wondering why you always make the same mistakes – they’re habits. Eliminate them ASAP (slide 19):
Bad Habits – Katherine Murdock “The Psychology of Habit”:
- Recognize bad habits and eliminate them ASAP
- The older you get the more difficult it is to remove a bad habit
- Each repetition leaves its mark!
Turning bad habits into good ones – Dr. Michael Roussell, PhD.:
- You can’t erase a habit, you can only overwrite one.
- Insert the new habits into the current habit loops
Configure your IDE properly and remove bad defaults. Replace “ex.printStackTrace();” with “throw new RuntimeException(ex.getMessage(), ex);” (slides 43-45).
One bad habit is empty catch blocks with “can never happen” comments. If you see one during a code review, replace it with “System.exit(-1);”. It can never happen, right? Right? (slides 46-47).
Note: I have created a “ShouldNotHappenException” for this case 🙂
Another one is to make every method in a static helper class public. Maybe some of them can be package private? (slide 48)
Learn about other good habits. Read books like “Effective Java” (Joshua Bloch) and “Clean Code” (Robert C. Martin) (slide 49)
Learn the keyboard shortcuts of your IDE (slide 78)
Remember (slide 79):
“Any jackass can kick down a barn, but it takes a good carpenter to build one.” – Sam Ryburn
Project lombok and lombok-pg – In a nutshell, these hook into the Java compiler and generate additional bytecode when certain annotations are present. Bored with getters, setters, hashCode() and equals() plus a nice toString()? Use @Data (slides 21-28).
Want to use lambda expressions but can’t upgrade to Java 8? Then lambdaj is for you (slides 34-39).
Use Spock to make tests more compact and easier to understand. (slides 50-52)
JUnitParams will help you run tests with different parameters (57-59).
Need to wait for something during a test? Awaitility will help. (60-61)
When mocking isn’t enough and you need to inject code during a test, Byteman is the tool you want to look at (62-63)
Getting bored writing boiler plate code in Java to make a compiler happy? Have a look at Groovy. (64-67)
How about adding dependencies to your scripts? Try Grape. (68-69)
Is your build a mess? Do you feel Maven is too verbose or too limiting? Gradle might be for you. (70-72)
Atlassian has resources on their website if you want to know more about Git and how to implement your own workflow using it.
Key points from the talk:
Git is one of those tools with a thousand uses. Now, it’s 1001. Stefan Wehrmeyer has started to put texts of German laws into Git to make it easier to track changes.
Andrew Niefer blogs about Building Eclipse from Git. Unfortunately, he doesn’t explain how to do that if you’re not a committer (i.e. have a user on eclipse.org).
I’m still hoping that one day, it will be possible for people outside the Eclipse team, to be able to build Eclipse projects.
In his last article, Joel talks about how DVCS confused him and how he solved the problem. One sentence in particular should be noted:
these systems think in terms of changes, not in terms of versions.
PS: I prefer Mercurial to Git for these reasons:
1. I need a working DVCS, not a toolbox to build one. I prefer it when a smart guy has given all the hidden issues some thought, so I don’t have to.
2. There is a simple, working Windows installer.
3. It’s written in Python.
There is stuff that changes the way you work. Then, there is stuff that changes the way you think.
When Donald E. Knuth wanted to write a series of books about The Art of Computer Programming, he found himself missing a program to convert his words into a beautiful book. To solve that problem, he invented TeX. When there were no nice fonts around, he added METAFONT. In a similar way, when Linus Torvalds found himself lacking a good version control system (VCS) after Bitkeeper decided to close access for OSS developers, he chose the only solution he had: He wrote his own.
And thus, Git was born and a lot of people living in abuse-protected web forums were in deep trouble. Even before them, the critics soared: What, another VCS?
Especially the people around Subversion were not so pleased and many people wondered why Linus chose to do his own thing instead of building on existing code. One of the reasons is that Subversion can be thought of as a very elaborate bug fix for CVS. It didn’t try to reinvent the wheel.
It also inherits some legacy: You have to setup a central server if you want to do distributed development outside of your LAN. Certain operations are slow, like checkout and update. Agreed, they are faster than CVS but try these with Git. And it’s monolithic software unless you’re willing to use your C compiler. There are only very few ways to interact with the repository from a shell script, only a few hooks to do custom stuff (like sending email). If you just wanted to add a small feature, it would mean real programming work instead of whipping together a quick shell script.
I’m by no means a critic of Subversion; I’m using it every day and I’m happy with it. My point is that it’s confining me in a pretty small box, just a little bit larger than CVS and with fewer problems. That doesn’t make it larger, though. An example.
You have some files which you want to take home to work on. So you copy them onto a USB drive, take them home, edit them, bring them back. When you return to work, a co-worker has changed one of the files. He tells you after you copied all the files from your USB drive back onto your work PC (“Who has time to read all those warnings? Yes to All!”)
The next day, you’re smarter and check the files into Subversion (SVN). There is no need for a central server and, if you ignore the warnings from SVN, you can create the repository on a network drive. When the drive fails at an inopportune moment, your repository will be data trash, but there are certain risks one has to take.
You checkout a copy on your USB drive and take that home. Since working on the file from your USB stick is too slow, you copy everything on your home PC and edit it there. When copying the files back on the USB stick, you notice a lot of write-protected files in .svn directories. Oh well, time for “Yes to All!” again.
After returning to work, you synchronize your checkout with the SVN repository. Life is great. Unless you have Linux at home and were not so careful about Carriage Return/Line Feed conversion and you find the copy of your data on the USB drive is now corrupt. But who is using Linux anyway?
The real trouble starts when you feel the need to carry the repository with you. Imagine you have a great idea, you have the USB drive with you, but you’re neither at work nor at home. If you have a computer close by, you could work on the copy on the USB drive, but at the cost of getting out of sync with your home or work copy.
Subversion, like CVS, only supports a single, central repository unless you use tools like SVK. SVK depends on Perl, though, and it adds nice little … err … rather big cryptic code strings to the commit messages.
Git, on the other hand, has been built on a “greenfield”. Torvalds could add all the features he wanted and avoid the common mistakes inherited from the CVS legacy. From 1,000 feet, it’s a set of loosely coupled commands which work on an object database that lets you version objects. Git doesn’t care what an object is, it just versions it. This is pretty similar to SVN, maybe except that Git handles large files better. And that Git is faster for most operations.
The main difference between Git and SVN is that Git is decentralized. This means you can create as many repositories as you want and synchronize them. So in the example above, you can have one repository at work, one on your USB drive and one at home. You can work on all three of them independently and then use Git to figure out how to merge everything.
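To show that this really works without a server, here is a rough sketch with three plain directories standing in for the work PC, the USB drive and the home PC. The directory names `office`, `usb` and `home` and the file names are made up; on real machines the paths would be mount points.

```shell
set -eu
# Throwaway sandbox; three directories play the three machines.
base=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Repository at work
git init -q "$base/office"
cd "$base/office"
git symbolic-ref HEAD refs/heads/master
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Start at work"

# Copy it to the USB drive, and from there to the home PC
git clone -q "$base/office" "$base/usb"
git clone -q "$base/usb" "$base/home"

# Independent work at home ...
cd "$base/home"
echo "evening idea" > home.txt
git add home.txt
git commit -q -m "Work done at home"

# ... and at the office
cd "$base/office"
echo "office change" > office.txt
git add office.txt
git commit -q -m "Work done at the office"

# Carry the home changes back: home -> usb (fast-forward) -> office (merge)
cd "$base/usb"
git pull -q --no-rebase --no-edit "$base/home" master
cd "$base/office"
git pull -q --no-rebase --no-edit "$base/usb" master
```

No “Yes to All!” needed: Git merges the office and home changes and would stop with a conflict if both sides had edited the same lines.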
Remember the dreaded branches from CVS? SVN eased the pain considerably but with Git, everything is a branch to begin with.
To become happy with Git, there are two major steps you need to take. First, you must understand that there is no server. Forget about the idea of a server. Git lets you synchronize different copies of a couple of files in different places without a server. To do this effectively, Git keeps some information in the .git directory. If you want to do this remotely, you can use Git as a server, too, but that is basically the same thing as using it locally. Except that the name of a different computer is involved.
The second step is that you don’t put more than one project into one Git repository. With CVS, we are used to using modules. With Subversion, you create subtrees with trunk and branches. With Git, you have one repository per project. Setting up a repository is so cheap, it really doesn’t make sense to have more than one project in it.