For everything you need to know about version control, check out Version control – Everything you need to know, on Programming Duck.
Knowledge prerequisites: This article assumes that you can already use the basics of git and know the basics of version control. If you don’t, then I recommend that you start at the beginner section of the article Version control – Everything you need to know.
Structuring your commits well helps you gain the maximum benefits of version control.
However, as already mentioned in Version control – How much do you need to know and use?, this probably isn’t necessary. You can get all of the essential benefits with very little diligence in your commits.
In many places I’ve worked, I’ve often seen:
- Branches consisting of a single, large commit.
- Multiple commits, each of which was just the work the developer did that day, rather than deliberately structured.
- Multiple commits in a branch, each with the same commit message.
- Commits that don’t pass the build. In fact, the majority of commits fail the build until the final commit of the branch.
That’s completely fine. It probably doesn’t negatively influence most projects.
However, if you want to obtain the maximum benefits from version control, it’s worthwhile to have a bit more diligence and structure with your commits.
Guidelines for commit structure
Here are some guidelines for structuring your commits, based on the "nice-to-have" benefits described in benefits of version control.
Commits should be small
Small commits help with:
- Debugging – If you find the commit where a bug originated and the code changes in that commit are very small, then you only need to debug a small area of code.
- Code reviews – Everything else being equal, less code is easier to review than more code.
- Reverting commits – If your commits are large, then you’ll lose a lot of code when you revert them, even code that worked fine and didn’t need to be reverted. If your commits are small, then you’ll lose less work when you revert them.
Commits should be stable
Stable commits help with:
- Debugging
- Reverting / resetting commits
Debugging
When debugging and searching for the commit where a bug originated, you check out a different commit each time. If the commit you check out is stable and passes the build, you can start the server, launch the product and do as much manual testing as you want. On the other hand, if the code doesn’t build, then that’s not possible. Instead, you’ll have to waste time searching for the closest stable commit, or fixing the current commit until you can run the code and test it.
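One common way to search for the commit where a bug originated is a binary search over the history; git's `bisect` command automates a search along these lines. Below is a minimal Python sketch of the idea, not git's actual implementation. The commit IDs and the `is_good` check are hypothetical stand-ins for checking out a commit and testing it. Note that the sketch assumes every commit can be built and tested, which is exactly what stable commits give you.

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(
    commits: Sequence[str],
    is_good: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest commit for which is_good() is False.

    Assumes commits are ordered oldest to newest and that once the bug
    appears it stays (every commit after the first bad one is also bad).
    Returns None if every commit is good.
    """
    # Invariant: the first bad commit, if any, is at an index in low..high.
    low, high = 0, len(commits)
    while low < high:
        mid = (low + high) // 2
        if is_good(commits[mid]):
            low = mid + 1  # the bug was introduced after mid
        else:
            high = mid  # mid is bad, so the bug is at mid or earlier
    return commits[low] if low < len(commits) else None


# Hypothetical history: the bug first appears in commit "e5".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7"]
bad = {"e5", "f6", "g7"}
culprit = first_bad_commit(history, lambda commit: commit not in bad)
print(culprit)
```

If some commits in the history could not be built, `is_good` would have no reliable answer for them, and the search would have to skip or repair those commits first.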
Reverting / resetting commits
If every commit passes the build, it means that you can revert to any commit with minimal risk.
Commits should be logical units of change
This helps with:
- Code reviewing
- Examining the commit history of the project
- Reverting commits
This guideline just encourages common sense organisation for your commits.
Consider, when examining the commit history of the project, would you rather see a history like this?
- Work on feature X
- Work on feature X
- Work on feature X
- Complete feature X
Or would you rather see a history like this?
- Add markup for feature X
- Add styling for feature X
- Add validation for feature X
- Import and use feature X in app
In the first example, it’s not possible to get an accurate idea of what happened in each commit.
The commits in the second example are much better. At the very least, you can get a clear idea of what happened in each commit. Obviously, this helps when examining the commit history of the project.
(Note: You can still import the incomplete feature into your app for testing purposes. Just don’t commit that part until the feature is ready.)
Also, you can split your commits in any way you want. Here is a more "vertically sliced" example, split by "feature" rather than file type:
- Add case X for data validation
- Add case Y for data validation
- Change error message for invalid email
Every commit should be a logical unit of change. The changes made in a commit should be related and logical. Other, unrelated, changes should be in different commits.
Logically structured commits also help with reverting. The changes in each commit are more obvious. This means that it’s easier to understand which commits you should revert. Also, because you’re committing changes in a proper order and including all of the related changes, it’s easier to keep your commits stable.
Further, it helps with debugging. Instead of having multiple unrelated code changes in each commit, which you may not expect, you’ll have clear and related code changes.
Finally, it helps with code reviews, because every commit’s goal and code changes are easily understandable and therefore easier to review.
Commits don’t have to be minuscule (extremely small)
This probably goes against the traditional advice you’ll hear about commits. However, I personally believe that commits which are too small are unhelpful.
There is a balance to be had here. Most commits benefit from being small and atomic. Further, most developers tend to create commits which are too large, rather than too small.
But, at the other extreme, you can also have commits which are needlessly small.
For example, at one point I experimented with creating commits in accordance with a very small TDD loop, which went like this:
- Create a small failing test for a particular case.
- Code the minimum implementation necessary to make the test pass.
- Refactor the code, if necessary.
On every loop, I created a commit which included the code changes for steps 1 and 2 (I combined these steps to have stable commits). If step 3 was needed (it’s not always necessary), I created a separate commit for it.
I ended up with a lot of commits.
Here is an example:
- Handle base case in factorial
- Handle case n=1 in factorial
- Handle case n=2 in factorial
- Combine conditions in factorial (refactor step)
- Handle all positive n in factorial
- Remove unnecessary check for n=2 (refactor step)
- Throw error for case n<0
- Import and use factorial function in X
In my opinion, these small, atomic commits are more of a nuisance than a benefit. I personally find them too long to read through.
I would much prefer:
- Add function to calculate factorial
- Import and use factorial function in X
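For a sense of scale, here is a minimal sketch of what the finished function might look like. The original code isn't shown here, so the language (Python), the name `factorial` and the choice of `ValueError` are assumptions made purely for illustration.

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    if n < 0:
        # Corresponds to the "Throw error for case n<0" commit.
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    # For n = 0 and n = 1 the loop body never runs, so the result stays 1.
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial(5))
```

In this sketch, the whole function is only a handful of lines, yet the tight TDD loop spread it across seven commits.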
However, let’s consider the pros and cons in terms of the benefits of version control:
- Examining project history – Personally, I find the second example easier to read. The first example provides more detail, but I don’t feel like that level of detail is helpful.
- Debugging – The smaller commits are probably slightly easier to debug. However, the code in question is very small anyway, so the difference would be minor at best.
- Reverting / resetting commits – Deciding which commits to revert requires reading the commit history. As mentioned, I believe the non-TDD commits are better for this. In terms of code changes that will be lost, more changes will be lost with example 2. However, example 2 is fairly small anyway, so it’s not a significant difference.
- Code reviewing – Personally, I would not want to code review at the level of granularity of the tight TDD commits.
However, please bear in mind that this is just my personal preference and opinion. It’s perfectly acceptable for you to have the opposite opinion. It’s up to you to make your own decision on what you prefer and to use what’s best for your codebase.
The most important thing in software (and in many areas of life) is to be pragmatic. This means to maximise the value in what you do and to not be stuck in theoretical ideals if it’s just not worth it.
Following that, as already mentioned, I don’t believe that being 100% perfect with version control, particularly with your commit structure and commit messages, is worthwhile. I believe it’s better to be pretty good, rather than perfect.
Personally, I would consider myself only around 80% as diligent as I would be if I was trying to do everything perfectly.
For example, sometimes I create a commit where I think "I really should split this into two commits", but I don’t really want to make the effort, or I don’t want the additional commit in the commit history (too many commits can be a nuisance too). However, if I was really being 100% diligent, I might have split it.
At other times, maybe I realise that a few commits back I committed an unrelated refactor with the code changes. If I think that it’s only a minor issue, I won’t always go back and fix it. Maybe I just don’t feel like the time I would spend on fixing it is worth the benefit gained over the duration of the project.
In your case, remember to be pragmatic. Feel free to try both options (100% diligent and pretty good) and review which is best for your situation.
Structuring commits in your own work
While you work, you probably won’t be actively thinking of all the guidelines mentioned so far. Instead, here are some easier things to consider:
- Can you describe all of the code changes with a single, short phrase, without the word "and" included? For example:
  - Change the styling on the about page
  - Change the styling on X section of the about page
  - Add about page
  - Add a check for X error in form validation
  - Add public method X to class Y
- How large are the code changes?
  - Larger code changes may benefit from being split into multiple commits.
- How related are the code changes? Is it obvious or expected that these changes would be in the same commit?
  - If changes are unrelated, or if a user may not expect them to be in the same commit, then perhaps they should be split into different commits.
In the end, you’ll have to make your own judgement. Consider all of these points. As you gain experience, you’ll get better at judging how to structure your commits.
Also, as already mentioned, most people tend to create commits that are very large. To counteract that, consider trying to create commits which feel too small for a while. Most likely, they won’t actually be too small. Also, if you don’t try this to get used to smaller commits, you may never notice if your commits are too large.
Here are some examples of good and bad commits.
Multiple things per commit
If your commit message includes the word "and", it suggests that you could split the commit into multiple commits. For example, consider these commit messages:
- add images to the resources folder and add images to about page
- finish HTML and styling for about page
It may be better to reorganise the commits like this:
- add images to the resources folder
- add HTML for about page
- add styling for about page
Alternatively, if the about page is fairly large, you may instead want to split the commits by section, like so:
- add images to the resources folder
- add header section to about page
- add section 1 to about page
- add section 2 to about page
- add section 3 to about page
- add footer to about page
In these commits, commits 2 to 6 would contain both HTML and CSS changes, so that they form a "logical change". To put it differently, a user reading these commit messages in the future would probably expect both HTML and CSS in each commit.
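The "and" check from this section is simple enough to sketch in code. The following is a rough Python illustration, assuming a hypothetical helper named `suggests_split`; it only looks for "and" as a whole word, and a positive result is a hint to reconsider the commit, not a rule. The last message in the list is made up to show that words which merely contain "and" are not flagged.

```python
import re


def suggests_split(message: str) -> bool:
    """Return True if the commit message contains "and" as a whole word."""
    return re.search(r"\band\b", message, flags=re.IGNORECASE) is not None


messages = [
    "add images to the resources folder and add images to about page",
    "finish HTML and styling for about page",
    "add images to the resources folder",
    "add HTML for about page",
    "add brand colours to about page",  # hypothetical: "brand" is not flagged
]
for message in messages:
    print(suggests_split(message), message)
```

A check like this could run before committing, but the final decision still comes down to judgement about whether the changes form one logical unit.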
Configuration and package installations
In general, I prefer to have granular commits for configuration and package installations. Only one package installed per commit. I prefer this so I can see them easily in the project’s commit history.
For example, here are some of the initial commits from a personal project I’m working on as of the time of writing:
- build(config): Set up initial package.json
- ticket-number: build(dependencies): Install webpack
- ticket-number: build(dependencies): Install webpack-cli
- ticket-number: build(dependencies): Install html-webpack-plugin
- ticket-number: feat: Add sample files for webpack build
- ticket-number: build(config): Add basic webpack config
- ticket-number: build(npmScripts): Add build script
- ticket-number: build(dependencies): Install @babel/core
- ticket-number: build(dependencies): Install babel-loader
- ticket-number: build(dependencies): Install @babel/preset-react
- ticket-number: build(config): Add basic babel configuration
- ticket-number: build(config): Use babel in webpack build
However, it would also be acceptable to group multiple related dependencies together, especially if they’re normally installed together. Use your own judgement and use whichever version you prefer.
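The commit messages above follow a consistent shape: an optional ticket prefix, a type, an optional scope in parentheses, and a subject. As a rough sketch of how that shape could be checked or parsed, here is a small Python example. The names `KNOWN_TYPES`, `CommitMessage` and `parse_commit_message` are hypothetical, and the list of types is an assumption based only on the types that appear in this article's examples.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: only the commit types that appear in this article's examples.
KNOWN_TYPES = ("build", "feat", "fix", "test")

PATTERN = re.compile(
    r"^(?:(?P<ticket>[^:\s]+):\s+)?"  # optional "ticket-number: " prefix
    r"(?P<type>" + "|".join(KNOWN_TYPES) + r")"  # commit type
    r"(?:\((?P<scope>[^)]+)\))?"  # optional "(scope)"
    r":\s+(?P<subject>.+)$"  # ": Subject"
)


@dataclass
class CommitMessage:
    ticket: Optional[str]
    type: str
    scope: Optional[str]
    subject: str


def parse_commit_message(message: str) -> Optional[CommitMessage]:
    """Parse a message, or return None if it does not follow the format."""
    match = PATTERN.match(message.strip())
    if match is None:
        return None
    # Optional groups that did not take part in the match come back as None.
    return CommitMessage(**match.groupdict())


print(parse_commit_message("ticket-number: build(dependencies): Install webpack"))
print(parse_commit_message("Work on feature X"))
```

Having the type and scope as separate fields makes it easy to, for example, list every dependency installation in the history.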
Here are some more commits from a story in one of my personal projects:
- ticket-number: build(dependencies): Install rxjs
- ticket-number: fix(config): Remove console.log from storybook config
- ticket-number: test(testUtils): Add test utility for testing custom rxjs observables (this is needed for the textProcessor commit further below)
- ticket-number: build(config): Add import alias for testUtils in jest config
- ticket-number: feat(controller): Add function createSplitEveryNObservable in textProcessor
- ticket-number: feat(model): Add ChunkStateManager class
- ticket-number: feat(controller): Add uploadHandler file
- ticket-number: feat(view): Use new controller functions for text upload and processing
Here, I had the option of splitting commit 6 into more granular commits. For example, instead of one commit for the entire ChunkStateManager class and its tests, I could have created a separate commit for every method. If I was being 100% diligent, I would have done that, but at the time I thought that it didn’t matter very much. I also didn’t want to spend the extra time to split the commit.
Code refactoring commits
Personally, I don’t always separate commits where I refactor code.
If I believe that the refactor is small and related to the current functionality I’m trying to add, then I don’t create a separate commit.
For example, I may be working in a class to add a new method to it. When I finish, I may notice similar code elsewhere in the same class, so I might proceed to refactor both instances of duplicate code into a new private method. In this case, I wouldn’t create a separate commit for the refactored code.
Since the refactor is small and related, I feel that a separate commit would be more of a nuisance than a benefit. Also, a separate commit wouldn’t significantly help with debugging, since the additional code changes are very small.
On the other hand, if it was a significant refactor, or if the duplicate functionality existed in different classes (and I refactored both of them to eliminate it), I would probably create a separate commit for the refactor. Commits should be small, so these larger changes should be split into separate commits. Also, a refactor on class Y, when I’m supposed to be working on class X, is probably unexpected. Someone reading the commit message "add functionality foo to class X" in the future wouldn’t expect a change in class Y. In other words, it probably wouldn’t be considered an atomic, logical unit of change.
Finally, if I don’t consider the refactor to be related at all, then I would definitely have separate commits. A commit should be a logical unit of change, so unrelated code changes should be in separate commits.
However, these are just my thoughts on this. Remember that you can do whatever you think is best in your situation, especially if you have well thought-out reasons for doing so.
That’s it for this article.
If you have any feedback, or even counter-arguments, please let me know in the comments.
Next, if you want to know more about version control, please see the article Version control – Everything you need to know.