The Git submodule: misunderstood beast or remorseless slavering monster?

By Adam Riddell

As the title suggests, submodules are a somewhat controversial topic amongst git users. Upon suggesting the idea of submodules to a team of developers you can expect to receive a diverse range of responses, from "no no no", to "argh", to "what's a submodule, please?". As I see it, submodules have had a bit of a bad rap. While they occupy a somewhat esoteric space in the git habitat, and there is some overhead involved in really getting to grips with them, submodules can be an incredibly useful tool to have around. At Catalyst IT we use them all the time - and they are an indispensable part of maintaining and supporting the myriad of plugins in use across our client Moodle and Totara sites.

What is a submodule anyway?

Even amongst those well versed in the arcane ways of git, submodules are still a bit mysterious. As is often the case with git, the underlying implementation is beautifully simple, but the human-facing interface is... not. Broadly speaking, a submodule is a git repo inside a git repo ("we heard you like git..."), but it's a little more sophisticated than that.

A submodule can be any git repo that is available to you - it's even possible to use the same repo that you are attempting to add a submodule to (although if you are seriously considering this I would recommend that you stop to re-evaluate the life choices that have led you to this point). When you add a submodule to your project, git will keep track of just two things for the path it lives at: the URL of the submodule's remote repo (stored in a .gitmodules file at the top of your project), and a commit hash from that repo. That sounds pretty simple, right?

So what else happens when you add a submodule to your project? It's really not that different to cloning a git repo in the normal way:

git submodule add git://wacky-submodule.git ./wacky-path

A new directory "wacky-path" will be created and the code for "wacky-submodule" will be checked out inside this directory (by default git will check out the tip of the remote repository's default branch, which is usually master). If you have a look in this new directory, you'll find that everything looks and behaves like a normal git repository. From within this directory you can stage, commit and push changes, pull from the remote repository, and view the git log. As far as git is concerned it *is* a normal repository.
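To make that a bit more concrete, here is a rough sketch of what the parent repo ends up recording after a "git submodule add". It builds two throwaway repos under a temporary directory, so the local paths, file names and commit messages are stand-ins I've made up for illustration. The protocol.file.allow setting is only there because the "remote" in this demo is a local path, which recent versions of git refuse to use for submodules by default.

```shell
#!/bin/sh
# Sketch only: every repo here is a throwaway created under a temp directory.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the remote "wacky-submodule" repository.
git init -q "$work/wacky-submodule"
echo "hello" > "$work/wacky-submodule/plugin.txt"
git -C "$work/wacky-submodule" add plugin.txt
git -C "$work/wacky-submodule" commit -q -m "Initial plugin commit"

# The parent project.
git init -q "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "Start parent project"

# Add the submodule; this stages both .gitmodules and the wacky-path entry.
git -c protocol.file.allow=always submodule add "$work/wacky-submodule" wacky-path
git commit -q -m "Add wacky-submodule as a submodule"

cat .gitmodules                # the submodule's path and URL
git ls-tree HEAD wacky-path    # a mode 160000 "commit" entry: the pinned hash
```

The .gitmodules file holds the path and URL, and the tree entry for wacky-path is nothing more than the commit hash the parent is pinned to.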

From the outside it's a different story. In your parent repo, all git cares about is whether the commit checked out in your submodule directory matches the commit hash that the parent has recorded for it. If that isn't the case, git will tell you that there are changes in your submodule when you do "git status". To change the contents of your submodule, you change the commit hash that it's pointed to, then stage and commit this change in the parent repo and push it up. You can do this either by directly modifying your submodule and committing your changes within it, or by pulling changes from the remote repository of the submodule.
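The second route (pulling changes from the submodule's remote and then re-pinning the parent) might look something like the sketch below. As before, the repos are throwaway ones created in a temporary directory and the names and commit messages are invented for the example.

```shell
#!/bin/sh
# Sketch only: moving a submodule pointer forward in a throwaway parent repo.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream repo with a first version of the plugin.
git init -q "$work/wacky-submodule"
git -C "$work/wacky-submodule" commit -q --allow-empty -m "Plugin v1"

# Parent project pinned to v1.
git init -q "$work/parent"
cd "$work/parent"
git -c protocol.file.allow=always submodule add "$work/wacky-submodule" wacky-path
git commit -q -m "Add submodule at v1"

# Upstream moves on after we pinned v1.
git -C "$work/wacky-submodule" commit -q --allow-empty -m "Plugin v2"

# Step 1: update the checkout inside the submodule directory.
git -C wacky-path pull -q --ff-only

# Step 2: the parent now notices the mismatch with its recorded hash...
git status --short
git diff --submodule=log

# Step 3: ...so stage and commit the new pointer in the parent.
git add wacky-path
git commit -q -m "Bump wacky-path to v2"
```

Going back to an older version works the same way: check out the commit you want inside wacky-path, then stage and commit wacky-path in the parent.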

Why are they great?

So that's a (very) brief introduction to submodules. Why even bother with all of this, though? The first important advantage to using submodules is that submodules allow you to keep the commit history of the component you're including in your project. Assuming the component you're adding is publicly available as a git repository, incorporating this component without the use of submodules presents a problem. You would normally have to throw away all of that potentially useful git history metadata and replace it with a single commit that is (at best) semantically equivalent to "installed the thing from the place". You can of course go and find the external repository and look at the history there, but with submodules all of that version history is available right there in the code base. You can also use a newer (or older) version of the plugin whenever you need to by changing the submodule commit.

At Catalyst we make use of a number of Moodle plugins which are common across more than one site (the Moodle component of our newly available Coursebank service is just one such example). It is exactly this kind of scenario in which submodules make a lot of sense. Rather than keep several redundant copies of the version history for a particular module across our Moodle code bases, we use submodules to centralise this metadata. All that we need to keep in each Moodle code base is the location of the submodule and the particular version that needs to be checked out. We have one authoritative source of version history for that module, and tracking any changes is nice and straightforward. If we want to customise a 3rd-party plugin, we use a local mirror repository and keep our changes in a branch there. We can use the same customised version on multiple Moodle sites, and we don't need to do any complex tracking to ensure that the same version is deployed everywhere. If the submodule commit is the same, we can be sure that exactly the same version of the plugin is deployed.

The dark side of submodules

All this is not to say that submodules are always an ideal solution. Unless you re-use the same sub-component across multiple code bases, the trade-off in complexity may not be worth it. I'll outline some things to bear in mind when using submodules.

Make sure you push

If you maintain your own repositories for use as submodules, it's likely that these repos won't be much use outside of the context of a parent project. Moodle plugins are a very good example of this. In this scenario, the most convenient way to develop with submodules is to work on the submodule directly inside your code base. The thing to watch out for here is that it's possible to make changes within your submodule, stage and commit them and then update the parent repo to point the submodule at this commit without first pushing your changes to the submodule repo. If this happens, the next person to pull your changes to the parent repository is going to receive a rude submodule error, and they are going to blame *you*. The way to avoid this is to make a point of remembering to push any commits inside your submodule before you update the parent repository.
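If remembering isn't your strong suit, git can do some of the remembering for you: "git push" accepts a --recurse-submodules=check option which refuses to push the parent while it points at submodule commits that aren't on the submodule's remote. The sketch below reproduces the mistake and the safety net using throwaway bare repos as the "remotes"; the repo names and the mod/plugin path are made up for the example.

```shell
#!/bin/sh
# Sketch only: catching an unpushed submodule commit before it bites anyone.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare repos standing in for the shared plugin and site remotes.
git init -q "$work/plugin-src"
git -C "$work/plugin-src" commit -q --allow-empty -m "Plugin v1"
git clone -q --bare "$work/plugin-src" "$work/plugin.git"
git init -q --bare "$work/site.git"

# A developer's working copy of the site, with the plugin as a submodule.
git init -q "$work/site"
cd "$work/site"
git remote add origin "$work/site.git"
git -c protocol.file.allow=always submodule add "$work/plugin.git" mod/plugin
git commit -q -m "Add plugin submodule"
git push -q origin HEAD

# Commit inside the submodule, re-pin the parent, and "forget" to push.
git -C mod/plugin commit -q --allow-empty -m "Local plugin fix"
git add mod/plugin
git commit -q -m "Point plugin at local fix"

# With the check enabled, git refuses to push the parent.
if git push --recurse-submodules=check origin HEAD; then
    blocked=no
else
    blocked=yes
fi
echo "parent push blocked: $blocked"

# Push the submodule first, and the parent push then goes through.
git -C mod/plugin push -q origin HEAD
git push -q --recurse-submodules=check origin HEAD
```

Using --recurse-submodules=on-demand instead asks git to push the affected submodules for you, and setting the push.recurseSubmodules config option saves typing the flag every time.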

Be careful when git cloning with submodules

Another potential pitfall - and the one that seems to worry submodule detractors the most - is that a standard git clone won't check out your submodules for you. Git will cheerfully clone a repository and leave any submodule directories desolate and bare. This little quirk becomes especially horrifying when one imagines the possibility of vanishing submodules during a production deployment. The fix to this one is simple - using the --recursive flag when doing a git clone ensures that all submodules will be present and accounted for. Of course, automating your deployments is the way to go here, because a script will never forget "--recursive". Besides, you're not doing your production deployments manually, are you?
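Here is a small sketch of the difference, again using throwaway local repos with invented names. It also shows "git submodule update --init --recursive", which fills in the submodules of a clone that was made without the flag. Current git documentation spells the clone option --recurse-submodules; --recursive is still accepted as the older name.

```shell
#!/bin/sh
# Sketch only: plain clone versus recursive clone of a repo with a submodule.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in plugin repo and a site repo that includes it as a submodule.
git init -q "$work/plugin"
echo "plugin code" > "$work/plugin/plugin.txt"
git -C "$work/plugin" add plugin.txt
git -C "$work/plugin" commit -q -m "Plugin v1"

git init -q "$work/site"
git -C "$work/site" -c protocol.file.allow=always submodule add "$work/plugin" mod/plugin
git -C "$work/site" commit -q -m "Add plugin submodule"

# A plain clone leaves the submodule directory desolate and bare.
git clone -q "$work/site" "$work/plain"
plain_before=$(ls -A "$work/plain/mod/plugin")
echo "plain clone contents: '$plain_before'"

# It can be repaired after the fact...
git -C "$work/plain" -c protocol.file.allow=always submodule update --init --recursive

# ...or avoided entirely by cloning recursively in the first place.
git -c protocol.file.allow=always clone -q --recursive "$work/site" "$work/full"
ls -A "$work/full/mod/plugin"
```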

What can I do to make using them less terrible?

Probably the single most effective way to combat problems like the pitfalls above (or indeed any number of problems, submodule related or not) is to make use of an effective continuous integration/delivery system which will catch little human mistakes like these before they get anywhere near production. At Catalyst, we make use of the Go Continuous Delivery system to do just this - submodule plugins are regularly run through our Moodle test suite, and deployments are packaged and deployed automatically. This ensures that scary scenarios like missing plugins on production sites remain disturbing fictions that merely haunt the dreams of our dedicated sysadmins.
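As an illustration of the kind of check a pipeline could run before packaging, here is a minimal sketch built on "git submodule status". To be clear, this is a hypothetical check of my own devising rather than a description of our actual pipeline: the check_submodules function name and the demo repos are assumptions made for the example. It relies on the status output prefixing a line with "-" for an uninitialised submodule, "+" for one checked out at a different commit to the recorded one, and "U" for one with merge conflicts.

```shell
#!/bin/sh
# Sketch only: a hypothetical pre-deployment check for submodule problems.
set -eu

# Fail if any submodule under repo "$1" is uninitialised (-), checked out at a
# commit other than the recorded one (+), or has merge conflicts (U).
check_submodules() {
    bad=$(git -C "$1" submodule status --recursive | grep '^[-+U]' || true)
    if [ -n "$bad" ]; then
        printf 'Submodule problems in %s:\n%s\n' "$1" "$bad" >&2
        return 1
    fi
}

# Throwaway demo: a site with one submodule, cloned without --recursive.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/plugin"
git -C "$work/plugin" commit -q --allow-empty -m "Plugin v1"
git init -q "$work/site"
git -C "$work/site" -c protocol.file.allow=always submodule add "$work/plugin" mod/plugin
git -C "$work/site" commit -q -m "Add plugin submodule"
git clone -q "$work/site" "$work/deploy"

# The check fails on the bare clone...
if check_submodules "$work/deploy"; then before=ok; else before=failed; fi

# ...and passes once the submodules have been initialised.
git -C "$work/deploy" -c protocol.file.allow=always submodule update --init --recursive
if check_submodules "$work/deploy"; then after=ok; else after=failed; fi

echo "before init: $before, after init: $after"
```

Because the function returns a non-zero status when it finds a problem, a deployment script running under "set -e" would stop at that point rather than ship a site with missing plugins.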

Go forth and submodule!

So we've established that git submodules are a powerful tool with some subtle nuances and some... delightful eccentricities. Despite the controversy that surrounds this often neglected corner of git, the overhead in learning it can be well and truly offset if you need to manage common components across a number of similar code bases. At Catalyst we've found that for the right application - supporting common Moodle plugins deployed across a wide range of sites - submodules are a great fit.

Catalyst IT Australia maintains and supports a diverse array of Moodle and Totara sites. If you'd like to find out more about that, check out:

https://www.catalyst-au.net/products-services-elearning

If you're interested in delving further into the intricacies of git, Catalyst have a number of upcoming git training courses: 

https://www.catalyst-au.net/training-services