On the importance of upstreaming

by Robin Sheat

This is based on a talk I gave at OSDC2014 on the Gold Coast. The conference page is here, and the video is also online.


A few years ago, I ran a Koha workshop at the Malaysian Open Source Conference in Penang. One thing that surprised me was the number of people who were still running Koha versions around 2.2, which was released in 2002. What had happened is that, over the years, they'd tweaked their copy of the Koha code to suit their needs a bit better. This is a perfectly fine thing to do, and is one of the benefits of free software. In this case, however, it caused one real problem: there was no longer an upgrade path to newer versions of the software. Here, I will cover the pros and cons of avoiding a fork and instead putting as many changes as possible back into the upstream project.

What is upstreaming?

Put simply, upstreaming is sending the fixes and improvements that you make to a program back to the original project so that they can be incorporated into it. This really is a process: it's not enough to just throw some patches over the wall and hope they get picked up. You will have to guide them through whatever process the project uses to clear new changes for inclusion in the main repository.

Why should you upstream?

A primary reason to send patches upstream is so that you aren't stuck maintaining a codebase that becomes steadily more custom as the original project moves on. If your changes get into a release a few months (say) after you created them, then you only have to carry them yourself for those few months. Once they are in a release, your responsibility for them ends.

Another argument in favour of upstreaming is that you're helping make the world a slightly better place. By releasing more code that can be freely used, you're giving people something that they may find useful, and saving them from having to implement it themselves. Not just that, but they may take your work and improve upon it. This means that your feature will get better at no cost to you.

Going one step further, if the program is fairly critical to you, it might be worth taking a more active role in its maintenance, and effectively becoming upstream. This might be done by taking on release manager or core developer type tasks. This will allow you to help guide the future direction of the project and ensure that it's not veering from where you (and hopefully everyone else) need it to be.

Finally, a perfectly good, if surprising, reason for upstreaming is simple vanity. It's good to see people using things that you make, and you get your name, and your company's name, associated with the project. Depending on the nature of your business, this could be a marketing opportunity.

Why shouldn't you upstream?

It is a lot of work to upstream, especially if your change is large and complex, or it's going into a big project. You have to match the existing coding guidelines, make sure that you have the proper amount of test coverage, ensure that other use-cases aren't being broken by the change, and so on. However, this must be balanced against having to maintain these changes yourself until the end of time. It might be that a larger cost right now ends up being cheaper in the long run.

Another technical reason, and probably the most valid, is that the changes are specifically designed for the environment at hand, and don't apply to anyone else's system. It's worth looking at these and seeing if they can be generalised at all. That way, people who need to do similar, but different, things will find it easier too. It will sometimes be the case that there is no option but to carry around a local customisation, but it will almost certainly pay to look for ways to avoid this.

Non-technically, some companies argue that there are potential legal liabilities, or that upstreaming may cause trade secrets to be revealed. This is best discussed with a lawyer of course, but it's notable that this hasn't historically been a problem at all. In a similar vein is the idea that upstreaming might sacrifice a competitive advantage. These kinds of claims need to be evaluated objectively, as they may well be the result of misinformation from long ago that has never been re-evaluated.

In conclusion

Keeping in-house forks of projects is an easy thing to do, but it comes with some real traps that may only become apparent once some time has passed. By the time the problems become visible, dealing with them may be a very difficult and expensive proposition. However, if a bit of extra effort is put in at the start, and changes are sent to the upstream project instead of being kept in a fork, many of these problems can be avoided.