Version control - The ultimate guide

Goal of this article

The goal of this article is to examine how to use version control to get the maximum possible benefit out of it.

It doesn’t explain the syntax in detail. It assumes you’re already familiar with basic commands for things such as creating commits, creating and changing branches, pulling and pushing, etc. Rather, this article explains the strategy around using version control, such as how to use branches, how to structure your commits, how to write commit messages, etc.

First steps for beginners

If you’re looking for just the basics of git and version control, there are many resources that are really good for beginners. Here are some resources I recommend looking into:

  • Atlassian tutorials on Git – This is an amazing resource. It’s the resource that I personally used to learn most of what I know about git today. It takes you from complete beginner to intermediate / advanced level with git.
  • Pro Git by Scott Chacon and Ben Straub – Trusted colleagues of mine who are considered git experts recommend this book. I’ve read parts of it, but it’s on my list to read fully. It seems suitable for all levels, from beginner to advanced.

Other than those resources, almost any git tutorial you can find will probably help. Alternatively, many beginner programming courses use git as they work through small projects, so if you find a course like that, it’s a great way to get started.

About version control

Before we jump into how to optimise your branches and commits, let’s examine the importance of version control and the benefits it provides.

How important is good version control?

The basics are essential

Using the basics of version control is essential. This includes things like:

  • Creating commits
  • Creating and changing branches
  • Pulling and pushing from / to remotes

I imagine that almost all programming jobs out there use at least these basic features of version control.

Thankfully, these are things that even complete beginners can learn very quickly.

Intermediate to advanced usage

In my experience, many jobs don’t require more than the basics. I’ve personally worked with many people, even senior developers, who only had very basic knowledge of git. There were no issues with their work, and there didn’t seem to be any issues with them finding jobs either.

However, knowing a few more things can be helpful. I can also imagine that some higher level positions will benefit from more advanced knowledge or even require it. This would include things that you might consider to be intermediate or advanced knowledge of version control. For example:

  • Cleaning up your branch history using interactive rebase.
  • Knowing about branching workflows and strategies and being able to apply them to a project.
  • Knowing about merge strategies, such as fast-forward merges vs normal merges.
  • Being able to revert commits.
  • Knowing miscellaneous techniques such as how to cherry pick commits and such.
  • Etc.

It’s essential that at least some people in your team have that level of knowledge. It’s necessary because a project really should have a branching strategy and a merging strategy in place. Additionally, version control issues, while rare, do appear sometimes, so it’s important to have a few developers who are able to fix them.

However, again based on my own experience, it seems that as long as one or two developers have that level of knowledge in the team, the other developers don’t need to. The knowledgeable developers can set up the infrastructure and version control standards for the team. Afterwards, the other developers only have to use the basics such as creating commits and pushing them, while following the existing standards. If there are ever any issues, the less knowledgeable developers can ask the more knowledgeable developers for help.

Perfect version control

Other than the essentials discussed already, using version control well doesn’t seem to be a high priority matter. This is mostly because version control issues tend to be rare, at least in my experience. Additionally, scanning back through the commit history of a codebase isn’t something that’s frequently done.

For these reasons, structuring commits well and having good commit messages, among other things, are things that many developers neglect.

Therefore, since I always advocate being pragmatic, I don’t believe it’s worthwhile to be extremely diligent with your commit history. I don’t personally spend a long time on every single commit to make it perfect. I don’t believe that the time spent would bring sufficient benefit over the lifetime of the project to make it worthwhile.

However, I do believe that having a pretty good commit history is worthwhile. Maintaining a pretty good standard takes significantly less time than maintaining a perfect standard. Therefore, this is the point at which I believe that the benefits gained over the lifetime of the project justify the cost of time.

Please note that this conclusion is only based on my personal experience. Your experience may be different. In any case, always remember to be pragmatic and do what you believe is best in your situation.

Benefits of version control

We can classify the benefits we get from version control into two categories:

  • Essential benefits
  • "Nice-to-haves"

Essential benefits

The first category includes benefits that are considered absolutely essential in modern software development. It includes things like:

  • The ability to switch between different versions of our code (either different commits or branches), so we can try things out and work on different things at different times.
  • The ability to safely back up our work.
  • The ability for different computers and users to access the codebase.
  • The ability to collaborate with other developers, easily.
  • The ability to resolve code conflicts easily.
  • And much more.

We can obtain all of these benefits with only basic knowledge and basic usage of version control.

Nice-to-haves

The second category includes benefits that many developers would consider "optional" or "nice-to-have". It includes things like:

Help with debugging

If a bug is proving tricky to find, you can greatly reduce the area of code you need to debug by identifying the exact commit where the bug was introduced.

You can do this by:

  • Reading the commit history and seeing if any commits look relevant.
  • Checking out past commits one by one and testing them to see if the bug is there.
  • Using the git bisect command.

If you manage to find the "culprit" commit, you might be able to save a lot of time when debugging.
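As a rough sketch of the git bisect approach, the script below builds a throwaway repository and lets git find the commit that introduced a bug. The file name, the commit messages and the grep-based check are all made up for illustration; in a real project the check would be your own test command.

```shell
# Throwaway repository; the file, the messages and the "bug" are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Five commits. The third one introduces the "bug" (a line saying BUG).
for i in 1 2 3 4 5; do
  if [ "$i" -eq 3 ]; then echo "BUG" >> app.txt; else echo "line $i" >> app.txt; fi
  git add app.txt
  git commit -q -m "Change $i"
done

# Tell git which commit is known bad (HEAD) and which is known good (HEAD~4).
git bisect start HEAD HEAD~4

# Let git test each candidate: exit code 0 means good, 1 means bad.
git bisect run sh -c '! grep -q BUG app.txt'

# refs/bisect/bad now points at the first bad commit.
culprit_subject=$(git log -1 --format=%s refs/bisect/bad)
echo "Culprit: $culprit_subject"

# Return to the branch you were on before bisecting.
git bisect reset
```

Because bisect halves the search range on every step, this scales much better than checking out past commits one by one.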

Better code reviews

Sometimes, it’s useful to do a code review on a single commit. For example, you might have already performed a code review on a branch, but requested some additional changes. In this case, it might be a waste of time to re-review all the changes in the branch. Instead, you can code review just the latest commit.

In other cases, you may prefer to perform a code review one commit at a time altogether.

Easier reverting / resetting of commits

Sometimes, things just go wrong and you need to revert or reset some commits.

If it’s a critical case, such as a bug having been introduced to the master branch and pushed live, you may just use the "shotgun approach" and revert everything since the last sprint, just to be safe.

But in less critical situations, it might be better to revert as little as possible, so you don’t lose any changes which aren’t faulty.
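For the less critical case, here is a minimal sketch of reverting only the faulty commit while keeping the later, working one. The repository, file names and messages are invented for the example.

```shell
# Throwaway repository; file names and messages are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "stable" > a.txt && git add a.txt && git commit -q -m "Add a.txt"
echo "faulty" > b.txt && git add b.txt && git commit -q -m "Add faulty b.txt"
echo "useful" > c.txt && git add c.txt && git commit -q -m "Add c.txt"

# Revert only the faulty commit (HEAD~1). This adds a new commit that
# undoes it; the later commit (c.txt) is left alone.
git revert --no-edit HEAD~1

git log --oneline
```

Note that git revert adds a new commit rather than deleting the old one, so it doesn’t rewrite history.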

Easier to read and understand the commit history of a project

Sometimes it’s useful to examine the commit history of the project.

This can be for various reasons, such as:

  • Wanting a general overview of the project’s history.
  • Looking through a branch to get an overview of how a feature was developed.
  • Looking at specific commits, to understand why code changed the way it did.
  • Looking at the history of particular files, to understand why, when and how those files have changed. Among other things, knowing these reasons may keep you from changing implementations that you shouldn’t change.
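A few standard commands cover most of these cases. The sketch below runs them in a throwaway repository; the files and messages are invented for the example.

```shell
# Throwaway repository; files and messages are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "timeout=10" > config.txt && git add config.txt && git commit -q -m "Add config with 10s timeout"
echo "hello" > readme.txt && git add readme.txt && git commit -q -m "Add readme"
echo "timeout=30" > config.txt && git commit -q -am "Raise timeout to 30s for slow networks"

git log --oneline                 # general overview of the whole history
git log --oneline -- config.txt   # only the commits that touched config.txt
git log -p -1 -- config.txt       # the latest change to that file, with its diff
git blame config.txt              # which commit last changed each line
```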

Branches

A branch is a separate version of your code. You use separate branches during development so you can work on new code without accidentally breaking the main stable branch. A branch is essentially a separate sandbox where you can code anything you want.

Things to consider when working with branches are:

  • Branching strategy and workflow
  • Merging strategy (normal merges vs fast-forward merges)
  • Solving conflicts with merging vs back merging
  • What to name branches
  • And many more things

Branching strategies and workflows

Have a stable main branch

Ideally, you should always have a stable main branch. This branch should be as close to being release-ready as possible.

The primary reason is that it’s relatively easy to maintain an already-stable branch. However, the longer a branch remains unstable, the harder it is to make it stable again, because broken code keeps piling up. As a result, things may break in more complicated ways over time.

Additionally, an unstable branch may block or slow down the development of new features. For example, if a section of the codebase is not working properly, developers may need to fix it before they can complete a new feature. Alternatively, if some tests aren’t passing, developers may be confused or distracted by them when they’re writing new tests. And so on…

Further, Agile advocates frequent, small releases. Keeping the main branch continuously stable and close to release-ready is very helpful in making this possible.

Use feature branches

Since the master branch should always be stable and as close to release-ready as possible, new development should be done on "feature branches".

That’s because, as you develop new features, the code may break. You may write prototype code that’s not fit for production, hardcode values, write failing tests, etc. You don’t want to commit that code into the stable branch until it’s finished and working.

Further, using a feature branch ensures that even if you do accidentally push the code to the remote, it doesn’t affect the stable branch and accidentally get released.

In addition, by using feature branches you get more benefits such as the ability to do pull requests and code reviews. You can also have multiple developers collaborate on a feature. This requires you to push the code to the remote at some point, but you wouldn’t want to push the code to master until the feature is complete and properly tested.

So that’s why you should create branches off master (or whatever you’ve named your main, stable branch), do all of your development there, then merge to master when you’re done.

For more information on this, see feature branch workflow.
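In its simplest form, the workflow described above might look like the sketch below. The branch name and files are made up, and in a real team the merge would usually happen through a pull request rather than locally.

```shell
# Throwaway repository; branch name and files are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt && git add app.txt && git commit -q -m "Initial stable version"

# Branch off master and do all of the work on the feature branch.
git checkout -q -b feat/123-add-greeting
echo "hello" > greeting.txt && git add greeting.txt && git commit -q -m "Add greeting"

# master stays untouched until the feature is finished.
git checkout -q master
ls

# Merge once the feature is complete and tested.
git merge -q --no-edit feat/123-add-greeting
```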

Gitflow

Gitflow is an established branching strategy / workflow. It’s quite strict and has a lot of ceremony regarding branches and releases.

In my opinion, one of its greatest benefits is that it’s very safe. All of the ceremony gives the team plenty of time to ensure that the master branch is as stable as possible before releasing. Other workflows can also provide the same level of safety, but only if you’re able to implement them properly and with that purpose in mind.

A neutral point of Gitflow is that it’s well-defined and not modifiable, or at least, it doesn’t claim to be modifiable. You may consider this an advantage, since it means that you don’t have to worry about designing your own workflow, or you may consider it a disadvantage, since you technically shouldn’t modify it even if it would suit your project better. In either case, perhaps it’s a moot point. It’s your project, which means that you can technically do whatever you want, including modifying your use of Gitflow.

A small disadvantage of Gitflow is its complexity. It does things in a way that’s not necessarily intuitive, particularly with its release and hotfix branches. Each of those branches has particular rules about where it must branch off from and be merged into. For example, release branches branch off develop and are merged into master and also into develop. Hotfix branches branch off master and are merged into master and develop.

In comparison, other workflows like GitLab flow are simpler in this aspect. Everything branches off master and merges into master. Bugfixes should additionally be merged or cherry picked into the appropriate release branches, but that’s to be expected.

Another minor disadvantage is that Gitflow is not as suitable for continuous delivery. You can set it up for continuous delivery, but the additional branches and ceremony tend to make it more difficult to do so than other workflows.

For actual information on how Gitflow works, please see the Gitflow Workflow page by Atlassian or the Gitflow post by Vincent Driessen.

I would recommend using Gitflow when:

  • You want maximum safety regarding releases.
  • You’re not practicing continuous delivery.
  • You have a personal preference for Gitflow.

GitLab flow

GitLab flow is a fairly simple and flexible branching strategy.

Here is how it works:

  • You create feature branches off the master branch and merge them back into the master branch.
  • Options for release branches are flexible:
    • You can release directly from master.
    • You can have a separate branch for "production".
    • You can have multiple branches between master and "production", such as "pre-production", "staging", etc.
    • You can have multiple release branches for different numbered releases.
  • Hotfixes branch off master and are merged into master. You also need to add them to the production branch and / or numbered release branches. You can do this by merging master into them, or merging the hotfix branch, or cherry-picking some of the hotfix commits. The option you choose depends on how much additional commit history you want to merge into the branches. If you only want the hotfix commits, just cherry pick them.

That’s pretty much it.

It’s very flexible, which means that you can make it as safe as you want and / or keep it as simple as you want.

One thing that I also like about it is that it follows an intuitive branch "stability hierarchy". There is a clear order in terms of the stability of branches. Merges / cherry-picks follow that order. For example:

  • feature branch (least stable branch) -> merged into master and optionally tested -> merged into pre-production and optionally tested -> merged into production (most stable branch)
  • hotfix (similar to a feature branch in terms of stability) -> merged into master and tested -> cherry-picked into pre-production or release-candidate branches and tested -> cherry picked into production or numbered release branches.
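The hotfix path can be sketched as follows, assuming a single "production" branch and cherry-picking as the way to bring the fix across. The branch names, files and messages are invented for the example.

```shell
# Throwaway repository; branch names, files and messages are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt && git add app.txt && git commit -q -m "Release 1"
git branch production   # production starts at the released commit

# master moves on with work that is not released yet.
echo "new feature" > feature.txt && git add feature.txt && git commit -q -m "Add unreleased feature"

# The hotfix branches off master and is merged back into master first.
git checkout -q -b hotfix/fix-crash
echo "v1-fixed" > app.txt && git commit -q -am "Fix crash on startup"
git checkout -q master
git merge -q --no-edit hotfix/fix-crash

# production receives only the hotfix commit, not the unreleased feature.
git checkout -q production
git cherry-pick hotfix/fix-crash
```

Merging master into production instead would also bring the unreleased feature along, which is why cherry-picking is the option to choose when you only want the hotfix commits.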

For full information on GitLab flow, see Introduction to GitLab Flow.

I recommend GitLab flow:

  • For projects where you want a simple, fast workflow, potentially with continuous delivery.
  • If you want to design your own workflow to suit your project’s needs.
  • If you have a preference for GitLab flow.

More strategies

There are many more strategies to choose from. It might be worth exploring some of them, as you may like them far better than the ones presented in this article.

Many of them, such as GitLab flow, are flexible, with only a few prescribed concepts such as working off master and using feature branches. As a result, many of them are similar.

Here are some more branching workflows that I’ve come across:

If you’re interested, feel free to do your own search for more. You may find some that are more suitable for you than the ones in this article.

Conclusion

  • Keep a stable branch
  • Use feature branches
  • Consider picking a workflow like Gitflow, GitLab flow, or an alternative, and stick to it.

Merging strategy (fast-forwards vs normal merges)

For an explanation of normal merges vs fast-forward merges, please see the Git Merge tutorial by Atlassian.

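Long story short, a normal merge creates a "merge commit" on the target branch, with a message such as "Merge branch ‘X’". This commit has the tips of both branches as its parents, so the commit history of both branches remains reachable from it.

A fast-forward merge simply moves the target branch forward to the tip of the feature branch, which is only possible when the target branch has no commits that the feature branch lacks. The commit history then looks as though you made each commit directly on the target branch, instead of on a feature branch which you then merged. There won’t be a merge commit.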
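To see the two kinds of merge side by side, here is a small sketch in a throwaway repository. The branch names and files are made up; --no-ff and --ff-only are the standard git merge options for forcing one behaviour or the other.

```shell
# Throwaway repository; branch names and files are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"

# Normal merge: --no-ff forces a merge commit even when a
# fast-forward would have been possible.
git checkout -q -b feature-a
echo "a" > a.txt && git add a.txt && git commit -q -m "Add a"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature-a'" feature-a

# Fast-forward merge: --ff-only moves master forward, and refuses
# to merge at all if a fast-forward is not possible.
git checkout -q -b feature-b
echo "b" > b.txt && git add b.txt && git commit -q -m "Add b"
git checkout -q master
git merge -q --ff-only feature-b

git log --graph --oneline
```

In the resulting graph, feature-a appears as a side branch joined by a merge commit, while the commit from feature-b sits directly on master.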

Each strategy has pros and cons.

| Feature | Normal merges | Fast-forward merges |
| --- | --- | --- |
| Commit history graph | Tend to create very messy commit history graphs. | Create neat, linear commit history graphs. |
| Filtering commit history | Allow you to select whether you want to see only merge commits, only commits which aren’t merge commits, or both. | Can only see normal commits (as only those exist in the commit history). |
| Reverting / resetting | Can revert individual commits or entire branches easily. | Can only revert individual commits, meaning if you want to revert an entire branch, you’ll have to revert every commit from that branch. |
| Rebase vs merge command | The merge command is easier for beginners and more difficult to mess up. | The rebase command can be more difficult for beginners. It’s also easier to mess up due to having to resolve similar conflicts repeatedly, increasing the chance of error. |
| Commit history preservation | Preserve project history perfectly. All of the original commits are available, and merge conflicts can be reproduced and examined. | Don’t preserve the full project history. Changes made with the rebase command are permanently lost. |
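The filtering and reverting rows for normal merges can be tried out with the sketch below, which uses a throwaway repository with made-up names.

```shell
# Throwaway repository; branch names and files are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "one" > one.txt && git add one.txt && git commit -q -m "Add one"
echo "two" > two.txt && git add two.txt && git commit -q -m "Add two"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --oneline --merges         # only merge commits
git log --oneline --no-merges      # only ordinary commits
git log --oneline --first-parent   # master's own line, without branch details

# Undo the entire branch in one step by reverting the merge commit.
# -m 1 treats the first parent (master before the merge) as the mainline.
git revert --no-edit -m 1 HEAD
```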

There is also a third option. You can use normal merges, after rebasing. In other words, rebase to get your branch into a state where a fast-forward merge is possible, but do a normal merge. This creates a very neat, linear history and also includes the merge commit.
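A minimal sketch of this third option, again in a throwaway repository with invented names: rebase first, then merge with --no-ff so that a merge commit is still recorded.

```shell
# Throwaway repository; branch names and files are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature"

# Meanwhile, master moves on.
git checkout -q master
echo "other" > other.txt && git add other.txt && git commit -q -m "Other work on master"

# Rebase the feature branch so that a fast-forward would be possible...
git checkout -q feature
git rebase -q master

# ...but record a merge commit anyway.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --graph --oneline
```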

However, this strategy is more difficult to enforce. Enforcing normal merges or fast-forward merges only needs some basic configuration on most version control providers. Enforcing normal merges while also checking that the branch would be valid for fast-forwarding would probably take more configuration and custom scripts.

Which strategy should you choose?

Both strategies have been used successfully in all sorts of projects.

For private (not open-source) projects, I don’t think the choice matters very much. You can use whichever strategy you personally prefer.

The main consideration for most people is: Do you care about having a neat, linear, commit history? If so, then use the fast-forward merging strategy. Otherwise, go for normal merges for the additional benefits with filtering and reverting commits and branches.

For open-source software, the fast-forward merging strategy seems to be much more common (based on GitHub’s top 20 repositories in terms of stars, as of the time of writing).

This is probably good, as the vast majority of "issues" (similar to "tickets" in task management software) in open-source software seem to result in just a single, small commit. In this case, having a merge commit would add unnecessary noise to the commit history.

Solving conflicts with merging vs back merging (or rebasing)

(This section only applies if you’re not enforcing fast-forward merges.)

Sometimes, a feature branch and the branch you want to merge into have code conflicts between them.

There are a few options for resolving these code conflicts:

  1. You can merge your feature branch into the target branch (and resolve the code conflicts during the merge).
  2. You can "back merge" the target branch into your feature branch (and resolve the code conflicts during the merge). Afterwards, you can merge your feature branch into the target branch.
  3. You can rebase your feature branch onto the target branch (and resolve the code conflicts during the rebase). Afterwards, you can merge the feature branch into the target branch.

Merging directly into the target branch is the most unsafe option. Back merging is very safe. Rebasing is relatively safe and results in the cleanest commit history. (But also bear in mind that it’s dangerous to rewrite the history of remote branches. This is covered in more detail in another section.)

The issue with code conflicts is that it’s possible to resolve them incorrectly. If this happens, in the best case, you’ll break the build. In the worst case, you’ll introduce bugs that may not get caught right away.

For this reason, if there are code conflicts, you don’t want to merge into a branch that’s intended to be more stable than your feature branch, because you might break it.

Therefore, it’s recommended to back merge OR rebase instead. That way, if there are any problems, only your feature branch will break, which is much better than a more stable branch breaking. You can then fix the issues in your feature branch. Then, when you’re happy that everything works well, you can merge your feature branch into the target branch.
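Here is a sketch of the back merge option in a throwaway repository. The file, its contents and the way the conflict is resolved are made up; the point is only that the conflict is resolved on the feature branch, not on master.

```shell
# Throwaway repository; file contents and branch names are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "colour=red" > settings.txt && git add settings.txt && git commit -q -m "Add settings"
git checkout -q -b feature
echo "colour=blue" > settings.txt && git commit -q -am "Use blue"
git checkout -q master
echo "colour=green" > settings.txt && git commit -q -am "Use green"

# Back merge: bring master INTO the feature branch, so that a mistake in
# resolving the conflict breaks the feature branch rather than master.
git checkout -q feature
git merge master || true            # stops with a conflict in settings.txt
echo "colour=blue" > settings.txt   # resolve by hand (here: keep the feature's value)
git add settings.txt
git commit -q -m "Merge branch 'master' into feature"

# ...build and test the feature branch here...

# master can now take the feature branch without any conflicts.
git checkout -q master
git merge -q --no-edit feature
```

The rebase option looks similar, except that git rebase master replaces git merge master and conflicts are resolved commit by commit.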

Other branch tips

Don’t rewrite the commit history of remote branches

Rewriting the history of remote branches is dangerous for many reasons:

  • Force pushing is dangerous.
  • It creates conflicts for everyone else.
  • It would be a nightmare if everyone did it often.

Force pushing is dangerous

Rewriting the commit history of a remote branch requires you to force push.

Force pushing is very dangerous. It completely overwrites the remote branch with whatever you push. If you force push at the wrong time, you may overwrite any new commits your teammates have made which means that you’ll delete their work.

The best way to avoid this is to never force push.

But, if you’ve decided that you will force push, at the very least use the git push --force-with-lease command instead of git push --force. If used correctly, this command will prevent a push if any new commits exist on the remote branch, meaning that you won’t accidentally overwrite any new commits your teammates have made.

However, note that it’s very easy to use this command incorrectly, making it no different from force pushing. For an explanation of how to use it properly, please see --force considered harmful; understanding git’s --force-with-lease.
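The protective case can be reproduced locally with the sketch below. It uses a bare repository in a temporary directory as the "remote" and two clones named alice and bob; all of these names are invented for the example.

```shell
# Throwaway "remote" plus two clones; all names here are made up.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git -c init.defaultBranch=master init -q --bare remote.git
git -c init.defaultBranch=master init -q alice
(cd alice && echo "v1" > app.txt && git add app.txt && git commit -q -m "Add app" \
  && git remote add origin "$work/remote.git" && git push -q -u origin master)
git clone -q remote.git bob

# Bob pushes a new commit that Alice has not fetched.
(cd bob && echo "fix" > fix.txt && git add fix.txt \
  && git commit -q -m "Bob's fix" && git push -q origin master)

# Alice rewrites her last commit and tries to overwrite the remote branch.
cd alice
git commit -q --amend -m "Add app (reworded)"
if git push --force-with-lease origin master; then
  result="overwritten"
else
  result="rejected"   # the remote moved since Alice last fetched, so git refuses
fi
echo "$result"
```

The check compares the remote branch against Alice’s remote-tracking branch (origin/master). If anything had run git fetch in her clone before the push, origin/master would already include Bob’s commit, the check would pass, and his work would be overwritten. That is the main way to use this command incorrectly.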

Overwriting remote history creates conflicts for everyone else

Changing the commit history of a remote branch will create code conflicts for everyone else using that branch. Every other developer will then have to merge or rebase the new branch changes and also resolve the code conflicts.

This is inconvenient and potentially error-prone. Further, if the team uses merges instead of rebases this will create messier commit histories.

In addition, conflicts can also happen if the changed branch is an ancestor of a branch that a developer is using or if the changed branch is the target branch they want to merge into.

If everyone rewrote history, working would be very difficult

Consider if everyone rewrote the history of remote branches as often as they wanted to. You would have to interrupt your work to fix your local branch fairly often. As a result, working could become very difficult.

Conclusion

The best way to avoid all of these problems is to not modify the history of remote branches.

Exceptions

There are times when it’s probably safe to rewrite the history of remote branches.

  • The safest time to rewrite history is probably soon before merging into another branch and deleting the branch. That’s because, after that point, no one will have access to that branch, so it won’t affect anyone negatively. Even so, ideally, this should be a process that the entire team is aware of, so that no one is surprised when it happens.
  • If you’re absolutely certain that no one else has used this branch (or created a new branch off it) and that no one will be negatively affected by rewriting its history, then it’s probably safe to rewrite its history. However, to truly be sure, you may have to confirm with many other developers.

Branch naming

If you’re using normal merges, then the branch name will show up in the commit message of the merge commit, when you merge the branch. Therefore, it’s useful to have a good branch name that describes the work done in the branch.

Anything sufficiently descriptive will do. For example:

"fix-issue-with-service-worker"

My personal preference is a format like this:

"feat/ticketId-title-of-ticket"

In more detail:

  • The "feat" at the start is the type of the story. E.g. feature, hotfix, etc. This is a convention used by Gitflow.
  • The ticket ID or issue number comes next. This is here so that a user reading the "merge commit" message can immediately open the ticket for more information. The only time I wouldn’t include this is if the version control provider automatically links to the ticket / issue anyway.
  • There are many delimiters you can use between the words, such as dash (-) and underscore (_). My personal preference is the dash (-).
  • The rest describes what the branch is working on.
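If you want to check names against a convention like this, a small shell function is enough. The pattern below is only my own assumption about what such a convention might allow (three example types, an alphanumeric ticket ID, lowercase words separated by dashes); adjust it to whatever your team agrees on.

```shell
# Hypothetical convention: <type>/<ticket id>-<dash-separated-title>
# The allowed types and characters are assumptions made for this sketch.
is_valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^(feat|fix|hotfix)/[A-Za-z0-9]+-[a-z0-9]+(-[a-z0-9]+)*$'
}

for name in "feat/1234-add-login-form" "fix-issue-with-service-worker"; do
  if is_valid_branch_name "$name"; then
    echo "$name: follows the convention"
  else
    echo "$name: does not follow the convention"
  fi
done
```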

Use git pull with rebase

Your local branch and the corresponding remote branch can diverge, meaning they can end up with different commit histories.

This can happen for a few reasons, such as:

  • Another developer has added commits to the remote branch.
  • Someone rewrote the history of the remote branch.
  • You rewrote the history of your local branch, so it no longer matches the history of the remote branch.

If your local branch and the remote branch have diverged, when you fetch those changes, you can either merge them into your local branch or rebase your local commits on top of them.

You can merge them with git pull, or:

git fetch
git merge

You can rebase them with git pull --rebase, or:

git fetch
git rebase

You can also make git pull have the default behaviour of rebasing by changing your global git configuration file or by executing the command git config --global pull.rebase true in your terminal.

The difference between merging and rebasing is that, if you merge the changes and the merge isn’t a fast-forward merge, you’ll end up with a merge commit in your local branch and therefore a messier commit history. Further, the merge commit will tie together the history of the remote branch and the history of your local branch. In the case where a developer rewrote the history of the remote branch to clean it up, this somewhat defeats the point, as the original history will be reintroduced when you use git pull.

However, if you rebase the changes with git pull --rebase, you’ll end up with a linear commit history.
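The sketch below reproduces this with a local bare repository standing in for the remote and two clones, alice and bob. All of the names are made up for illustration.

```shell
# Throwaway "remote" plus two clones; all names here are made up.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git -c init.defaultBranch=master init -q --bare remote.git
git -c init.defaultBranch=master init -q alice
(cd alice && echo "v1" > app.txt && git add app.txt && git commit -q -m "Add app" \
  && git remote add origin "$work/remote.git" && git push -q -u origin master)
git clone -q remote.git bob

# Bob pushes a commit, so the remote branch moves ahead.
(cd bob && echo "b" > b.txt && git add b.txt \
  && git commit -q -m "Bob's change" && git push -q origin master)

# Alice commits locally, so her branch and the remote branch have diverged.
cd alice
echo "a" > a.txt && git add a.txt && git commit -q -m "Alice's change"

# Replay Alice's commit on top of Bob's instead of creating a merge commit.
git pull -q --rebase

git log --graph --oneline

# A normal push is enough afterwards; no force push is needed.
git push -q origin master
```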

My personal recommendation is to always rebase these changes, rather than merge them.

The main reason for this is that, after the feature branch is merged into master, a merge commit in the middle of the commits of the feature branch is unlikely to provide any benefit to a future user. Rather, having to navigate through merge commits will probably be an inconvenience. I imagine a future user will be much better served with an easy-to-follow commit history featuring only the unique commits of the feature branch.

Another reason is that it’s "safe" to rebase at this point, as far as the remote branch is concerned. If you use git pull --rebase and then you git push, you won’t overwrite the history of the remote branch.

For the details of how all of this works, please see the git pull tutorial by Atlassian and the git branch rebasing page in Pro Git.

Have short-lived feature branches

It’s important for feature branches to be as short-lived as possible.

One problem with long-lived branches is that the parent and feature branches diverge over time. This means that code conflicts build up over time, making merging more difficult.

Another issue is that bigger features tend to be harder to test. They are also more dangerous, meaning that many things can go wrong with them. When possible, it’s generally safer to develop new features in small increments. You can do this by having small feature branches. Since they’ll get merged into master soon, you can test increments of a feature bit-by-bit, rather than working on a large feature branch over a few months, merging into master and testing it all in one go.

If you’re going to have long-lived feature branches, at the very minimum you should regularly "back merge" master into the feature branch, or rebase the feature branch onto master, so that it doesn’t diverge too much. You should also be testing the feature as you develop it, to minimise the danger of merging a large untested feature into master.

But the best thing to do is to have short-lived feature branches. This is also recommended by Agile, as seen in the Agile 12 principles.

Don’t squash branches into a single commit, unless you have a good reason

This refers to having a branch with multiple commits, and rewriting history so they appear as a single commit, before merging.

Well structured commits provide many benefits. Additionally, good commit messages make it much easier to understand the commit history of the codebase. So don’t "delete" this useful information if you don’t have to.

However, there may be good reasons for squashing branches. For example, if many commits are not well-thought out, or the commit messages are not very helpful, then it may be better to squash those commits into a single commit with a good commit message.

Another reason may be if you are using the fast-forward merge strategy, but want your commits to resemble "merge commits" for easy reverting.

In the end, it’s up to you to decide whether to squash commits or not. Just make sure you understand the pros and cons.
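If you do decide to squash, one way to do it is git merge --squash, sketched below in a throwaway repository with made-up names. Note that this differs slightly from rewriting the branch with an interactive rebase: the feature branch itself is left unchanged, and only the target branch receives the single combined commit.

```shell
# Throwaway repository; branch names, files and messages are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "1" > one.txt && git add one.txt && git commit -q -m "wip"
echo "2" > two.txt && git add two.txt && git commit -q -m "wip 2"
echo "3" > three.txt && git add three.txt && git commit -q -m "fix typo"

# Stage the combined changes of the branch on master without committing...
git checkout -q master
git merge -q --squash feature

# ...then record them as a single commit with a useful message.
git commit -q -m "Add feature X"

git log --oneline
```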


Commit structure

Structuring your commits well helps you gain the maximum benefits from using version control.

As already mentioned, if you only care about the basic benefits, then you don’t have to worry about this too much.

You can get away with whatever kind of commits you want.

In many places I’ve worked in, I’ve often seen:

  • Branches consisting of a single, large commit.
  • Multiple commits, each of which was just the work the developer did the previous day, rather than deliberately structured.
  • Multiple commits in a branch, each with the same commit message.
  • Commits that don’t pass the build. In fact, in many places where I’ve worked, the majority of commits fail the build until the final commit of the branch.

And that’s completely fine. It probably doesn’t negatively influence most projects.

However, if you want to obtain the maximum benefits from version control, it’s worthwhile to have a bit more diligence and structure with your commits.

Guidelines for commit structure

Here are some guidelines for structuring your commits, based on the "nice-to-have" benefits described earlier:

Commits should be small

Small commits help with:

  • Debugging – If you find the culprit commit where a bug originated, it will be easier to debug it if the code changes in that commit are small.
  • Code reviews – All other things being equal, less code is easier to review than more code.
  • Reverting commits – If your commits are large, then you’ll lose a lot of code when you revert them, even code that worked fine and didn’t need to be reverted. If your commits are small, you’ll lose less work when you revert them.

Commits should be stable

Stable commits help with:

  • Debugging
  • Reverting / resetting commits.

Debugging

When debugging and searching for the culprit commit where a bug originated, you check out a different commit each time.

If the commit you check out is stable and passes the build, you can start the server, launch the product and do as much manual testing as you want. On the other hand, if the code doesn’t build, then that’s not possible. Instead, you’ll have to waste time searching for the closest stable commit, or fixing the current commit until you can run the code and test it.

Reverting / resetting commits

Also, when reverting / resetting commits, if every commit passes the build, it means that you can revert back to any commit with minimal risk.
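One rough way to find out whether every commit on a branch is stable is to check out each commit in turn and run your build or tests against it. In the sketch below, the check function is only a stand-in for a real build command, and the repository and messages are made up.

```shell
# Throwaway repository; files and messages are made up.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "ok" > status.txt && git add status.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "step 1" > work.txt && git add work.txt && git commit -q -m "Step 1"
echo "broken" > status.txt && git commit -q -am "Step 2 (breaks the build)"
echo "ok" > status.txt && git commit -q -am "Step 3 (fixes the build)"

# Stand-in for a real build / test command (an assumption for this sketch).
check() { [ "$(cat status.txt)" = "ok" ]; }

# Visit every commit that is on feature but not on master, oldest first.
unstable=""
for commit in $(git rev-list --reverse master..feature); do
  git checkout -q "$commit"
  check || unstable="$unstable $(git log -1 --format=%s)"
done
git checkout -q feature

echo "Unstable commits:$unstable"
```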

Each commit should be a logical unit of change

This helps with:

  • Debugging
  • Code reviewing
  • Examining the commit history of the project
  • Reverting commits

So basically, this helps with all the benefits.

Overall, this guideline just encourages common sense organisation for your commits.

Consider, when examining the commit history of the project, would you rather see a history like this?

  1. "Work on feature X"
  2. "Work on feature X"
  3. "Work on feature X"
  4. "Complete feature X"

Or would you rather see a history like this?

  1. Add markup for feature X.
  2. Add styling for feature X.
  3. Add validation for feature X.
  4. Import and use feature X in app.

In the commits in the first example, every commit probably includes modifications to HTML, CSS, JavaScript and server code. Also, the incomplete feature is probably imported into the app in the first commit, so that the developer can launch the product and test it.

Overall, it’s not possible to get an accurate idea of what happened in each commit.

The commits in the second example are much better. At the very least, you can get a clear idea of what happened in each commit. In these commits, you would only see modifications to the relevant files.

Obviously, this helps when examining the commit history of the project.

(Note: You can still import the incomplete feature into your app for testing purposes. Just don’t commit it in git until the feature is ready.)

You can split your commits any way you like, as long as they’re well structured for a future reader. Here is a more "vertically sliced" example, split by "feature" rather than file type:

  1. Add case X for data validation.
  2. Add case Y for data validation.
  3. Import and use data validation in App.
  4. Show error message to user if they input invalid data.

Every commit should be a logical unit of change. The changes made in a commit should be related and logical. Other, unrelated, changes should be in different commits.

This helps when reverting commits. Each commit is easier to understand. Therefore, it’s easier to understand which commits you should revert. Also, because you’re committing changes in a proper order and including all related changes, it’s easier to keep your commits stable. This means that there is less risk when reverting commits.

It also helps when debugging, because instead of having multiple unrelated code changes in each commit, which may get in the way of your debugging, you’ll have focused and related changes.

Finally, it helps with code reviews, because every commit’s goal and code changes are easily understandable and therefore easier to review.

Commits don’t have to be minuscule (extremely small)

I’m certain that this goes against the traditional advice you’ll hear about commits. However, I personally believe that commits which are too small are unhelpful.

There is a balance to be had here. Most commits benefit from being small and atomic. Further, most developers tend to create commits which are too large, rather than too small.

But, on the other extreme, you can also have commits which are needlessly small.

For example, at one point, I tried creating commits in accordance with a very small TDD loop which went like this:

  1. Create a small failing test for a particular case.
  2. Code the minimum implementation necessary to make the test pass.
  3. Refactor.

For every loop, I created a single commit for the code changes for steps 1 and 2 (I combined these steps because I wanted stable commits). If step 3 was necessary, I created a separate commit for it.

I ended up with quite a lot of commits.

Here is an example:

  1. Handle base case in factorial
  2. Handle case n=1 in factorial
  3. Handle case n=2 in factorial
  4. Combine conditions in factorial (refactor step)
  5. Handle all positive n in factorial
  6. Remove unnecessary check for n=2 (refactor step)
  7. Throw error for case n<0
  8. Import and use factorial function in X

In my opinion, these small, atomic commits are more of a nuisance than a benefit. I personally find them too long to read through.

I would much prefer:

  1. Create function to calculate factorial
  2. Import and use factorial function in X

The main reason is that, since my commits are already small and atomic, I end up with a lot of commits in my projects, usually far more than many other developers I work with. If instead I created commits according to the TDD loop, I might end up with 4x the number of commits. That would be far too many.

However, let’s tackle the pros and cons in terms of what we want from version control:

  • Examining project history – In my opinion, the second example is easier to read. The first example provides more detail, but I personally don’t feel like I want that level of detail in my commit history.
  • Debugging – The smaller commits are probably slightly easier to debug. However, the code in question is very small anyway, so the difference would be minor at best.
  • Reverting / resetting commits – Deciding which commits to revert requires reading the commit history. As mentioned, I believe example 2 is better for this. In terms of code changes that will be lost, more changes will be lost with example 2. However, example 2 is fairly small anyway, so it’s not a significant difference.
  • Code reviewing – Personally, I would not want to code review at the level of granularity of example 1.

But please bear in mind: This is just my personal opinion. It’s fine if you prefer example 1. In either case, just use whichever you believe is best for your codebase.

Be pragmatic

In my opinion, the most important goal is to be pragmatic. This means to maximise the value you bring to your employer, both in the long-term and short-term.

Following that, as already mentioned, I don’t believe that being 100% perfect with version control, particularly your commit structure and commit messages, is worthwhile. I believe it’s better to be pretty good, rather than perfect.

Personally, I would consider myself only around 80% as diligent as I would be if I was trying to do everything perfectly.

For example, sometimes I create a commit where I think "I really should split this into two commits", but I don’t want to make the effort, or I don’t want the additional commit in my project history (too many commits can be a nuisance too). However, if I was really being 100% diligent, I might have split it.

At other times, maybe I realise that a few commits back I committed an unrelated refactor with my code changes. If I think that it’s only minor, I won’t always go back and fix it. Maybe I just don’t feel like the time I would spend on fixing it is worth the benefit gained over the duration of the project.

In your case, remember to be pragmatic. Try both options (100% diligent and pretty good) and review which is best for your situation.

Example commits

Here are some examples of good and bad commits.

Multiple things per commit

If your commit message has the word "and" in it, it suggests that you could split the commit into multiple commits.

For example:

  1. add images to the resources folder and add images to about page
  2. finish HTML and styling for about page
  3. add image lazy loading with JavaScript

It may be better to reorganise the commits like this:

  1. add images to the resources folder
  2. add HTML for about page
  3. add styling for about page
  4. add image lazy loading with JavaScript

Alternatively, if the about page was fairly large, you may instead want to split the commits by section, like so:

  1. add images to the resources folder
  2. add header section to about page
  3. add section 1 to about page
  4. add section 2 to about page
  5. add section 3 to about page
  6. add footer to about page
  7. add image lazy loading with JavaScript

In these commits, commits 1 to 6 would contain both HTML and CSS changes, so that they form a "logical change". To put it differently, a user reading these commit messages in the future would probably expect both HTML and CSS in each commit.
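If you want to apply the "and" heuristic to a whole list of subject lines, here is a minimal Python sketch. The function name suggests_multiple_changes is hypothetical, and the check is deliberately naive: it only looks for "and" as a standalone word.

```python
import re


def suggests_multiple_changes(subject: str) -> bool:
    """Return True if the subject contains "and" as a standalone word."""
    return re.search(r"\band\b", subject, flags=re.IGNORECASE) is not None


subjects = [
    "add images to the resources folder and add images to about page",
    "finish HTML and styling for about page",
    "add image lazy loading with JavaScript",
]

# Keep only the subjects that might be worth splitting.
flagged = [subject for subject in subjects if suggests_multiple_changes(subject)]
```

Treat the result as a hint rather than a rule. A subject such as "Import and use feature X in app" would also be flagged, even though it may well describe a single logical change.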

Configuration and package installations

In general, I prefer to have granular commits for configuration and package installations.

For example, here are some of the initial commits from a personal project I’m working on as of the time of writing:

  1. build(config): Set up initial package.json
  2. ticket-number: build(dependencies): Install webpack
  3. ticket-number: build(dependencies): Install webpack-cli
  4. ticket-number: build(dependencies): Install html-webpack-plugin
  5. ticket-number: feat(src): Add sample files for webpack build
  6. ticket-number: build(config): Add basic webpack config
  7. ticket-number: build(npmScripts): Add build script
  8. ticket-number: build(dependencies): Install @babel/core
  9. ticket-number: build(dependencies): Install babel-loader
  10. ticket-number: build(dependencies): Install @babel/preset-react
  11. ticket-number: build(config): Add basic babel configuration
  12. ticket-number: build(config): Use babel in webpack build

It would probably be acceptable to group multiple related dependencies together, especially if they’re normally installed together. However, I like keeping them separate so I can see them easily in the project’s commit history.

General example

Here are some more commits from a story in one of my personal projects:

  1. ticket-number: build(dependencies): Install rxjs
  2. ticket-number: fix(config): Remove console.log from storybook config
  3. ticket-number: test(testUtils): Add test utility for testing custom rxjs observables. (Note: This is needed for the textProcessor commit further below.)
  4. ticket-number: build(config): Add import alias for testUtils in jest config
  5. ticket-number: feat(controller): Add function createSplitEveryNObservable in textProcessor
  6. ticket-number: feat(model): Add ChunkStateManager class
  7. ticket-number: feat(controller): Add uploadHandler file
  8. ticket-number: feat(view): Use new controller functions for text upload and processing

Here, I had the option of splitting commit 6 into more granular commits. E.g. Instead of 1 commit for the entire ChunkStateManager class and its tests, I could have created a separate commit for every method. If I was being 100% diligent, I would have done that, but at the time I thought that it didn’t matter very much. I also didn’t want to spend the extra time to split the commit.

In terms of debugging, I don’t think that splitting commit 6 further would provide a significant benefit. That’s because a bug would only appear in the program after I import and use that class somewhere, which I do in commit 7. In other words, more atomic commits wouldn’t be a significant help, as they would never be flagged in git bisect.

Code refactoring commits

Should you have separate commits for refactoring code?

It depends.

In general, whenever I’m adding new code, I tend to refactor along the way.

If I believe that the refactor I’m performing is small and related to the current code I’m trying to add, then I don’t create a separate commit.

For example, I may be working in a class to add a new method to it. When I finish, I may notice similar code elsewhere in the same class, so I might proceed to refactor both instances of duplicate code into a new private method. In this case, I wouldn’t create a separate commit for the refactored code.

Since the refactor was small and related, I don’t feel that a separate commit would add much value. Also, I find too many minuscule commits to be a nuisance. Finally, a separate commit wouldn’t help significantly for debugging, since the additional code changes were very small.

On the other hand, if it was a significant refactor, or the duplicate functionality was found in a different class, I would probably create a separate commit. Commits should be small, so in these cases where the code changes are larger, I would split the code changes into two smaller commits.

And of course, if I don’t consider a refactor related at all, then I would definitely have separate commits. A commit should be a logical unit of change, so unrelated code changes should be in separate commits.

But these are just my thoughts on this. Remember that you can do whatever you think is best in your situation, especially if you have well thought-out reasons for doing so.


Commit messages

Commit messages are important to help you understand what happened in a commit.

Ideal goals of commit messages

The ideal goals of commit messages are:

  • When scanning through the subject lines of many commits, you want to quickly understand what happened in each commit.
  • When looking at individual commits in detail, you want to thoroughly understand what happened and why, just from the commit message. All the relevant information should ideally be included, without needing to check external links or tickets for additional information.

Be pragmatic

To really achieve the ideal goals of commit messages, you would probably have to write a lot of text every single time. You would also have to carefully plan each commit and ensure that all the relevant information is included.

In line with being pragmatic, in my actual work, I generally don’t put in that much effort. In fact, the majority of the time, I only write the subject line of the commit message. I usually only write commit message bodies if I feel that the reasons for my changes, or the implementation, may be confusing to other developers reading the commit in the future.

So keep in mind that this section covers how to write ideal commit messages, but you don’t necessarily have to go that far.

Commit messages in a nutshell

Overall, I really like the direction of conventional commits.

I find that following that convention results in really good commit messages that provide a lot of useful information.

So if you want the short version of how to write good commit messages, I recommend reading that page. You can use their recommendations exactly, or adapt them slightly to work for you.

Sidenote: Ticket ID, issues and task managers

The ticket ID refers to a ticket in Jira, or an issue in GitHub, or anything similar to that.

Overall, I believe that having the ticket or issue ID somewhere in the commit message is absolutely essential. A significant amount of information can usually be found in tickets or issues. As a result, it’s usually much easier for someone to find the ticket ID and open the ticket for more information, than to continue reading the commit message.

Also, if you’re currently reading the ticket and you want to examine the relevant commits, you can search for them if they include the ticket ID in the commit messages.

Of course, if you don’t find tickets or issues useful, then you don’t have to include the ticket ID in the commit message.

Commit subject line

The goal of the commit message subject line is to provide a very quick summary of what happened in that commit. Most often, the subject line is read when someone is scanning over multiple commits.

It’s important to list the most important information in the subject line, in a concise way.

Example format

Here is the format that I typically use:

ticket-number: type(optional-scope): Description

For example:

1234567: feat(view): Import and use service worker

Or, without the optional scope:

1234567: feat: Import and use service worker
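To show how the pieces of this format fit together, here is a minimal Python sketch that splits a subject line into its parts. The pattern and the names Subject and parse_subject are my own assumptions for illustration. They are not part of any specification, and your own format may need a different pattern.

```python
import re
from typing import NamedTuple, Optional


class Subject(NamedTuple):
    ticket: str
    commit_type: str
    scope: Optional[str]  # None when the optional scope is left out
    description: str


# ticket-number: type(optional-scope): Description
SUBJECT_PATTERN = re.compile(
    r"^(?P<ticket>[^:\s]+): "
    r"(?P<commit_type>[a-z]+)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r": (?P<description>.+)$"
)


def parse_subject(line: str) -> Optional[Subject]:
    """Return the parts of the subject line, or None if it doesn't match."""
    match = SUBJECT_PATTERN.match(line)
    if match is None:
        return None
    return Subject(**match.groupdict())


with_scope = parse_subject("1234567: feat(view): Import and use service worker")
without_scope = parse_subject("1234567: feat: Import and use service worker")
unstructured = parse_subject("Work on feature X")
```

Here, with_scope.scope is "view", without_scope.scope is None, and unstructured is None because the line doesn't follow the format.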

Overall guidelines

Here are some commonly cited guidelines for the commit message subject line.

Use imperative tense

This means that you should use "add" or "fix" instead of "added" and "fixed".

For example:

"add HTML for about page"

instead of

"added HTML for about page"

There are a few reasons for this:

  • Imperative tense results in shorter commit messages. For example "fix" is shorter than "fixed" or "fixes".
  • The source code of git uses imperative tense. In other words, this could be considered the original convention for modern version control.
  • Git commands like git merge and git revert automatically generate commit messages written in imperative tense. Unless you manually edit those commit messages, consider using imperative tense in general for consistency across all of your commits.

Another reason I’ve heard is that you can think of commits as commands or patches to apply. For example, when rebasing, a commit message in imperative tense describes a command that’s about to happen if you apply the commit, such as "fix X". At this point, it can be argued that this tense makes the most sense, as the commit isn’t applied yet and therefore what it does is not in the past. Other commands act in a similar way.

But realistically, it doesn’t really matter. If you and your team prefer past tense, that’s completely fine. Feel free to use that.

Keep it short

Shorter is better. All things being equal, a concise description is better than a longer description, particularly when you’re scanning over a ton of commit messages.

Also, some version control providers truncate subject lines to 72 characters. Originally this was due to legacy reasons, such as terminals being "smaller" in the past than they are today. Many providers have since changed to show as many characters as your monitor allows, but some still truncate the subject line to 72 characters.

My personal recommendation is to keep the subject line short, when possible. But otherwise, I definitely don’t consider 72 characters to be a hard rule. If you really need more than 72 characters on rare occasions, then use them.

However, note that if all of your subject lines are longer than 72 characters, they will be quite difficult to scan through. So in general, always make a good effort to try and keep your subject lines short.

Optional ticket ID

In every place where I’ve worked so far, we include the ticket ID as the first thing in the subject line.

The main reason for this is to ensure that we remember to include it. If it’s always the first thing we write in commit messages, then it’s difficult to forget it.

However, my personal preference is to include it in the footer of the commit message.

That’s because, if you’re scanning through the commit history of the project, the ticket ID doesn’t tell the reader anything useful about the commit. It’s just metadata. I believe it’s better to include it in the footer and utilise the limited space in the subject line for a useful description instead.

As a final note, consider using a git hook to insert the ticket ID for you, or to validate that it’s there. One way of doing this is to include the ticket ID in your branch name. Then, a script can access the branch name, read the ticket ID and write it in your commit message.
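Here is a minimal Python sketch of the core logic such a script could use. It assumes ticket IDs are purely numeric and that the branch name contains one, as in the hypothetical branch feature/1234567-service-worker. The function names are also hypothetical.

```python
import re
from typing import Optional

# Assumption: ticket IDs are plain numbers, e.g. 1234567.
TICKET_PATTERN = re.compile(r"\d+")


def ticket_from_branch(branch: str) -> Optional[str]:
    """Return the first number found in the branch name, if any."""
    match = TICKET_PATTERN.search(branch)
    return match.group(0) if match else None


def add_ticket(message: str, branch: str) -> str:
    """Prefix the message with the ticket ID, unless it's already there."""
    ticket = ticket_from_branch(branch)
    if ticket is None or message.startswith(f"{ticket}: "):
        return message
    return f"{ticket}: {message}"


branch = "feature/1234567-service-worker"  # hypothetical branch name
message = add_ticket("feat(view): Import and use service worker", branch)
```

In a real prepare-commit-msg hook, git passes the path of the file containing the commit message as the first argument, so the script would read that file, call something like add_ticket and write the result back. The current branch name can be read with git rev-parse --abbrev-ref HEAD.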

Optional emoji

This seems to be a fairly new trend. Some projects have started including emojis in the subject line. The emoji that’s used depends on the type of commit.

For example:

🐛 fix(x): Fix bug X

This could be a pretty good idea. A visual aid like that can instantly show you what type of commit you’re looking at. When looking over the commit history of a project, this could prove very useful.

One small downside is that this won’t be helpful until you learn what particular emojis mean in your codebase. However, for many long-term projects, I imagine that won’t be an issue.

If this seems useful to you, consider giving it a try.

Commit type

This refers to the types described in the conventional commits specification.

The types describe the purpose of the code changes made in that commit. For example:

  • fix – The commit fixed a bug or is part of multiple commits that, when combined, fix a bug.
  • feat – The commit works towards a new feature.
  • feat! – The commit works towards a new feature and also introduces a breaking change.

This information is very useful when scanning over the commit history of a project. Overall, the quicker you can get an idea of what each commit does, the better off you’ll be.

The conventional commits specification requires those types, but you can also use any other types you want. At the time of writing, conventional commits links to the Angular convention for additional commit types which you may want to consider.
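One thing commit types make possible is summarising a history at a glance. Here is a minimal Python sketch that tallies subject lines by type. The pattern is my own assumption: it allows an optional leading "ticket: " prefix and an optional "!" marker, and it can be fooled by unusual subjects, so treat it as an illustration only. The "feat!" subject in the example is made up.

```python
import re
from collections import Counter
from typing import Iterable

# Optional "ticket: " prefix, then type, optional (scope), optional "!", then ": ".
TYPE_PATTERN = re.compile(
    r"^(?:[^:\s]+: )?(?P<type>[a-z]+)(?:\([^()]+\))?(?P<breaking>!)?: "
)


def count_types(subjects: Iterable[str]) -> Counter:
    """Count subject lines per commit type, e.g. "feat", "fix" or "feat!"."""
    counts = Counter()
    for subject in subjects:
        match = TYPE_PATTERN.match(subject)
        if match is None:
            counts["unrecognised"] += 1
            continue
        label = match.group("type") + ("!" if match.group("breaking") else "")
        counts[label] += 1
    return counts


counts = count_types([
    "ticket-number: build(dependencies): Install rxjs",
    "ticket-number: fix(config): Remove console.log from storybook config",
    "ticket-number: feat(model): Add ChunkStateManager class",
    "ticket-number: feat(view): Use new controller functions for text upload and processing",
    "feat!: Drop support for X",  # hypothetical breaking change
    "Work on feature X",
])
```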

Scope

Scope is another useful piece of information to quickly specify where (in which area of the codebase) changes are being made.

You can think of scopes as distinct areas in the codebase or distinct functionality in the codebase where you often make standalone changes.

Both conventional commits and the Angular convention recommend using some high-level nouns, such as:

  • parser
  • lang
  • animations
  • elements
  • language-service
  • Etc.

Unless your project is also a framework like Angular, you’ll probably use different scopes. You’re free to choose whatever you want.

Personally, I haven’t put too much thought into this, yet, for my personal project. At the moment, I just use some scopes that I feel are useful. For example:

  • Config
  • Dependencies (npm packages)
  • View
  • Model
  • Controller
  • Any other "top-level" folders or areas of the project where I frequently make standalone changes.

I imagine that as a project grows, more obvious scopes will start to appear.

Subject line description

Capitalized vs lowercased

The conventional commits specification doesn’t enforce a particular casing for the description. It only states that you should be consistent with your casing.

Therefore, you can start the description with a capital letter or a lowercased letter. It’s all up to your preference.

For example, both of these are valid:

  • build(config): disable rule x for ESLint
  • build(config): Disable rule x for ESLint

My personal preference is to start the description with a capital letter.

I find that the capital letter gives a clearer indication of where the description starts. Overall, this makes it easier for me to read, especially when scanning over multiple commits.

However, you may have the opposite opinion. In either case, you can use whichever you prefer.

No full-stop at the end of the description

Don’t end the subject line with a full-stop.

It’s an unnecessary additional character, which we don’t want because subject lines should be as short as possible.

Also, full stops shouldn’t be used for things like subject lines and headings, except in particular cases. For more information on this, please see Full Stops in Titles, Headings and Captions.
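The length and full-stop guidelines above are easy to check automatically. Here is a minimal Python sketch. The function name lint_subject is hypothetical, and the 72-character limit is treated as a soft limit, in line with the recommendation above.

```python
from typing import List

MAX_SUBJECT_LENGTH = 72  # soft limit, not a hard rule


def lint_subject(subject: str) -> List[str]:
    """Return a list of problems found in the subject line (empty if none)."""
    problems = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject is {len(subject)} characters (soft limit {MAX_SUBJECT_LENGTH})"
        )
    if subject.rstrip().endswith("."):
        problems.append("subject ends with a full stop")
    return problems


clean = lint_subject("build(config): Disable rule x for ESLint")
with_full_stop = lint_subject("add footer to about page.")
```

A check like this could run in a git hook, or just as a reminder before you push.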

Content

The content of the description will always vary, depending on what exactly you did in that commit.

In essence, the description should answer this question: If someone was looking at the commit subject line 6 months from now, and they wanted to understand what you did as quickly as possible, what should the subject line say?

If you can answer that question well, then you’ll probably be able to write a good description. You might also need to re-word it so that it’s sufficiently short.

Please don’t worry if coming up with good commit descriptions seems very difficult to you. It is actually a very difficult thing to do, so it’s not just you. Even people that have had a lot of practice with it struggle. Just try your best. You’ll definitely improve over time.

Commit message body

The purpose of the commit message body is to help a reviewer, examining the commit in the future, understand as much as possible about why the particular changes were made.

The commit message body should include information such as:

  • Why the changes needed to be made.
  • How the changes were made (if it’s not obvious from the commit diff).
  • Why the particular implementation was chosen (if it’s not immediately obvious).
  • Any limitations of the current code.

In addition, consider these points, mentioned in the section "information in commit messages" from the OpenStack Git Commit guidelines:

  • Do not assume the reviewer understands what the original problem was.
  • Do not assume the reviewer has access to external web services/site.
  • Do not assume the code is self-evident/self-documenting.

Be pragmatic

As already mentioned, personally, I rarely write commit message bodies. Most of the time, I believe that the changes I made and the reasons for them are obvious from the ticket. If the developer wants more details, I expect them to open the ticket and find them there. Therefore, I don’t believe that the time required to write comprehensive commit message bodies is worth it.

Overall, I’ll only write a commit message body if I feel that it’s needed. In other words, if I believe that a developer may be confused about my changes, even after reading the ticket, I’ll include a short commit message body.

Even then, I don’t write a lot, because I don’t feel it’s necessary. I believe that a short sentence on why those changes needed to be made, or why that particular implementation was used, is probably sufficient.

Additionally, I provide links for further reading where appropriate, because most developers will have internet access and will find those links useful.

But of course, in your case, you can write commit messages that are as comprehensive as you want. You know what’s best for your own project.

Formatting guidelines for the commit message body

It’s recommended that you separate the commit body and the commit subject with a blank line in-between them. Some reasons for this are:

  • It logically separates the commit subject from the body. Without the blank line, you couldn’t be sure if the second line was intended to be part of the subject or part of the body.
  • It’s easier to read.
  • I’ve heard of many conventions stating that there should be a blank line in-between, but I’ve never heard of the contrary. In other words, you can consider this to be a convention.
  • I’ve heard that some tools rely on having a blank line between the subject line and body, one example being Vim. (Although I’m not sure if this issue still exists.)

Additionally, the commit message body should be wrapped at 72 characters. This means that you should start a new line whenever the next word you would have typed would make the line wider than 72 characters. The reasons for this are:

  • Similar reasons as the commit subject line being limited to 72 characters.
  • It’s easy to do, so you might as well follow an old convention.

(If you feel that this convention is no longer necessary, then please feel free to ignore it. Also, please let me know your reasons for it in the comments, I would like to know.)
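If your editor doesn’t wrap text for you, the wrapping rule can be applied with Python’s standard textwrap module. Here is a minimal sketch. The function name wrap_body and the example body text are hypothetical. Long words such as links are left intact rather than broken, so a line containing one may exceed the width.

```python
import textwrap


def wrap_body(body: str, width: int = 72) -> str:
    """Wrap each paragraph of a commit message body at the given width."""
    paragraphs = body.split("\n\n")
    wrapped = [
        textwrap.fill(
            " ".join(paragraph.split()),  # collapse existing line breaks
            width=width,
            break_long_words=False,  # keep links and long identifiers whole
            break_on_hyphens=False,
        )
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped)


body = (
    "The about page loaded every image up front, which made the first render "
    "slow on mobile connections. Images below the fold are now loaded only "
    "when they are about to scroll into view.\n\n"
    "This only covers images in the about page for now."
)
wrapped_body = wrap_body(body)
```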

Commit message footer

The commit message footer is there for metadata, such as references to ticket IDs, or anything else your organisation requires.

This would be my preferred place to include the ticket ID.

Most conventions recommend having key value pairs, with one pair on each line, such as:

TicketID: 12345
MyKey: MyValue

In terms of other guidelines, similar to the commit body, it’s recommended to separate the footer and the body or subject line with a blank line.
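Because the footer is just key value pairs in the last paragraph, it’s straightforward for scripts to read. Here is a minimal Python sketch. The function name parse_footer and the example message body are hypothetical, and the sketch assumes keys contain no spaces and are separated from values by ": ".

```python
from typing import Dict


def parse_footer(message: str) -> Dict[str, str]:
    """Read "Key: Value" pairs from the last paragraph of a commit message."""
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}  # only a subject line, so there is no footer
    footer = {}
    for line in paragraphs[-1].splitlines():
        key, separator, value = line.partition(": ")
        if not separator or " " in key:
            return {}  # the last paragraph is ordinary body text
        footer[key] = value
    return footer


message = (
    "feat(view): Import and use service worker\n"
    "\n"
    "Cache static assets so the app loads offline.\n"
    "\n"
    "TicketID: 12345\n"
    "MyKey: MyValue\n"
)
footer = parse_footer(message)
```

Git also ships a git interpret-trailers command for working with this kind of metadata, which may be a better fit than a custom script.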

As mentioned earlier, if possible, I recommend using a git hook to insert the metadata for you, or at least validate that it’s there. That way you can’t forget to include it.

For additional information and examples, see "including external references" from the OpenStack Git Commit guidelines.


Tips for good version control while working

To finish off, here are some tips to help you when working with version control.

Commit early and often, clean history and push once

A good workflow when using version control is the following:

  1. Commit early and often.
  2. Clean and rewrite your local branch history as often as you want.
  3. Push to the remote after you’ve finished your work. Then, don’t rewrite your branch history any more (because you shouldn’t rewrite the history of remote branches).

Committing often is very useful. I recommend committing every time you write code that you believe you might want to keep. You can even use temporary commits with messages such as "wip" (work in progress).

That way, if you make more changes that you don’t want to keep, you can hard reset back to an earlier commit. Since you commit often, you’ll only lose the changes you made in the last few minutes. Most likely, those are exclusively the changes that you wanted to throw away.

On the other hand, if you commit infrequently, you can’t just hard reset. If you do that, you may lose a significant amount of work. Instead, you’ll have to manually and carefully fix your code until it works again, which is much more error prone and time-consuming than just resetting back to the working version from 2 minutes ago.

Also, you can clean your branch’s history any time you want. For example, when you complete some code that you’re certain you want to keep and you want to properly structure the commits for it.

At the very least, you should ensure that the commit history is clean before you push.

Some commands you can use to clean the commit history of your branch are:

  • git rebase --interactive, which can rewrite the entire history of your branch. For more information on using this command, see the git rebase tutorial by Atlassian.
  • git commit --amend, which modifies your last commit instead of creating a new one.
  • git commit --fixup hash-number, which creates a new commit marked as a "fixup" for the commit with the hash "hash-number". When you later run git rebase --interactive --autosquash, git squashes the fixup commit into that commit. For more information on this command, please see auto-squashing git commits by thoughtbot.

For more details on commands for rewriting history, please see the rewriting history tutorial by Atlassian.

Configuration for options

With certain commands, you may always use certain options.

For example, you might always use the --rebase option with the git pull command (git pull --rebase).

If that’s the case, you can modify your global git configuration file to always use certain options with certain commands, even if you don’t include them in the command. For example, you can make the command git pull equivalent to git pull --rebase.

To do this, you can modify your global git configuration file directly:

# ~/.gitconfig

[pull]
  rebase = true

Alternatively, you can change the configuration from the terminal, with a command such as git config --global pull.rebase true.

Consider using a different branch for risky rebasing

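Rebasing can be risky. It can create more code conflicts than merging, because your commits are replayed one at a time and each of them may conflict with the target branch. Depending on the kind of code conflicts and their frequency, this can make rebasing fairly error-prone.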

Thankfully, there are ways to reset a rebase. For example, after a rebase, you can view the git reflog (with the git reflog command), find a commit hash from before the rebase started and reset back to that commit. (For more information on resetting rebases see the git rebase tutorial by Atlassian).

However, it’s not always easy. For example, the reflog can get fairly messy after many rebases, making it difficult to look through it. This means that you may have difficulty finding an appropriate commit to reset back to.

As a result, whenever I’m faced with a tricky rebase, I tend to do this instead:

  1. I branch off my current branch with git checkout -b new-temporary-branch-name. This creates a new branch which contains the entire history of my current branch.
  2. Then I do a rebase on the new branch. If the rebase on the new branch goes badly, I can just delete it and my original branch will be unaffected.
  3. If the rebase goes well, I hard reset my original branch a few commits back.
  4. Then I either fast-forward merge the new branch into my original branch, or cherry-pick some commits from the new branch into my original branch.
  5. Then I delete the new branch.

Consider using a new branch for prototype code

If I want to try out some "prototype code" that I may end up not keeping, I sometimes create a new branch for it.

I do this because I don’t want to have "prototype code commits" mixed in with my normal commits. If I did, I might lose track of which commits are the prototype commits. Then, when I go to delete them, I’ll first have to hunt for them in the commit history. Even worse, I might accidentally delete one of my normal commits, thinking it was a prototype commit.

So instead, I create a new branch and work on the prototype code there. If I decide that the code is not worth keeping, I delete the branch. If I decide to keep it, I just fast-forward merge the branch into my original branch.


Final notes

That’s all. I hope this article was useful to you. If you have any feedback, or even counter-arguments, please let me know in the comments.

See you on the next post.