Interning at Catalyst: first steps into free and open source software

by Matthew Northcott

Working at Catalyst’s Christchurch office this past summer has been a valuable professional experience, and has exposed me to the world of free and open source software (FOSS). Diving into my first project, GeoNode, I didn’t quite know what to expect. In fact, before my first day at Catalyst, I can’t say I knew how I would begin contributing. Fast-forward three months and I’m delighted to say: it’s very simple! All that’s needed is a GitHub account, a digitally signed agreement and a willingness to help.

Making GeoNode CMS compatible with Python 3

GeoNode is a web-based geospatial content management system written in Python using the Django web framework, with a tidy front-end user interface. My intern project aimed to upgrade GeoNode to make it compatible with Python 3. While this may not sound like much, the task was quite complex, as Python was vastly improved in the eight years between the releases of versions 2 and 3. In particular, seemingly minor changes to the language’s data types and syntax made this upgrade difficult.

Tackling GeoNode’s complexity piece by piece

At first, the scale of GeoNode was daunting. Both the project’s size and its context made it intimidating to me as a newcomer to GIS. However, by breaking the problem down and focusing on each component, I quickly realised there is a determinate, organised and rational structure to the chaos. We devised a plan to update external dependencies, test GeoNode using the new dependencies, and then upgrade GeoNode’s core modules. In many cases, modifications were required for compatibility. A notable example was upgrading the django-autocomplete-light dependency. While I was not involved, I understood the commitment necessary from my colleague, Dana, who took on this responsibility. She first had to learn how the old version functioned and then map this to changes in the new version of the library. The nature of this extension meant that autocomplete itself required its own set of dependencies. Dana spent a lot of time looking into alternative search engines, as they too had been affected by the upgrade. We found this kind of dependency chain was not uncommon during the project.

Upgrading GeoNode’s core modules: the joy of coding

The second stage of the project involved upgrading each of GeoNode’s core modules. Dividing the problem again, each team member worked on one module at a time. This worked well, as GeoNode’s test suite runs per module and in sequence with little to no coupling. During this stage, I found the rhythm of actually writing software. This part is bliss: the knowledge that you are making meaningful changes to a major piece of software that will be used all over the world for a variety of purposes. While the aim was to upgrade compatibility to Python 3, I enjoyed the freedom of being able to make changes that were beneficial to code quality and performance along the way, which I consider a key strength of open source software.

Fixing Python’s inconsistent encoding: from ASCII to UTF-8

On the other hand, updating compatibility came with its complications. A lot of my frustration was due to the change in Python’s default string handling, from ASCII-oriented byte strings in Python 2 to Unicode text (with UTF-8 as the default source encoding) in Python 3. Python 2 did support Unicode strings, but mixed them with byte strings inconsistently and sporadically. Python 3 resolves this by making strings Unicode by default and keeping bytes as a separate type. As a result, much of the project focused on converting the codebase to ensure encoding consistency and compatibility. The majority of issues came under this category.
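To illustrate the kind of fix involved, here is a minimal Python 3 sketch of normalising values at a boundary so the rest of the code only ever handles text. The helper names to_text and to_bytes are hypothetical and are not taken from GeoNode, and the example assumes the incoming bytes are UTF-8.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# A place name with a macron: one character, but two bytes in UTF-8.
raw = "Ōtautahi".encode("utf-8")   # bytes, e.g. as read from a socket or file
text = to_text(raw)                # str, safe to compare, slice and format

print(len(raw), len(text))
```

The two lengths differ because bytes count encoded bytes while str counts characters, which is exactly the distinction Python 2 code often blurred.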

Python 2 to 3: function return types – the devil is in the detail

Other problems were caused by changes to function return types in Python 3, many of which differ significantly from Python 2. Cases where a function now returns an iterator instead of a list, or vice versa, caused some issues. In one instance, the difference led to rapid and unending memory allocation that froze my system during testing. It took some time to find the one line causing the problem, and changing a single word solved it.
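The actual line is not reproduced here, but the following hypothetical sketch shows how this class of bug can arise. In Python 2, filter() returned a list; in Python 3 it returns a lazy iterator. Code that appends to a list while looping over a filter() of that same list was safe in Python 2, but in Python 3 it keeps seeing its own appended items. The function name below is made up for illustration.

```python
def duplicate_even_numbers(numbers):
    """Append a copy of every even number to the end of the list."""
    # list(...) takes a snapshot before the loop starts, which matches
    # the Python 2 behaviour. Without it, filter() would lazily walk
    # `numbers` while the loop appends to it, so the loop would never
    # finish and the list would grow until memory ran out.
    evens = list(filter(lambda n: n % 2 == 0, numbers))
    for n in evens:
        numbers.append(n)
    return numbers


# A second difference: a Python 3 iterator can only be consumed once.
lazy = filter(lambda n: n % 2 == 0, [1, 2, 3, 4])
first_pass = list(lazy)    # [2, 4]
second_pass = list(lazy)   # [] because the iterator is already exhausted

print(duplicate_even_numbers([1, 2, 3, 4]))
```

In a case like this the whole fix is the single call to list(), which is why such bugs are easy to repair once found but hard to spot.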

GitHub: from newbie to old hand

After making code changes, I learned how to create a pull request on GitHub. I understand each major project on GitHub does this a little differently, although with the same outcome. For each of our changes, we created the request, waited for the automated tests to run and, if they succeeded, waited for the repository maintainer to approve and merge the changes. Done. It’s a great feeling when that first pull request is accepted and merged into the master branch for all to see. After that point, it’s onto the next, then the next, and soon there’s a flurry of pull requests to be merged.

Dealing with duelling experts in a global team

GeoNode is an OSGeo-supported open source project, currently with 193 contributors on GitHub. With issues, proposals and pull requests coming from a large number of active contributors, code conflicts are likely, as are differences of opinion about how the project should be managed, what changes should be made, and how to make them. An example of this was a discussion about how to perform the Python 2 to 3 upgrade. A contributor raised concerns with our approach (which was making GeoNode Python 2/3 cross-compatible first), arguing it added unnecessary complexity. Our plan involved adding some redundant code temporarily, which the GIS team agreed was a necessary stepping stone that enabled us to retain compatibility with the existing test suite. It was in our best interests to keep it this way to ensure ongoing testing, which would lead to fewer bugs to fix after the transition. For more minor and less confrontational conflicts, GitHub provided an excellent platform to discuss changes and proposals before any were created or merged. This proved extremely useful for reviewing changes with our main points of contact, the primary maintainers in Europe.
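For readers unfamiliar with the cross-compatible approach mentioned above, this is a generic sketch of the kind of temporary, redundant code it involves. It is not taken from GeoNode: the text_type alias and the layer_host function are hypothetical, and the URL is a placeholder.

```python
import sys

# The same module moved between Python 2 and 3, so try the new
# location first and fall back to the old one.
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2

PY2 = sys.version_info[0] == 2

# One name for "text string" on both versions. The unicode name is
# only evaluated when running under Python 2.
text_type = unicode if PY2 else str  # noqa: F821


def layer_host(url):
    """Return the host part of a URL as text on either Python version."""
    return text_type(urlparse(url).netloc)


print(layer_host("https://example.org/layers/"))
```

Once Python 2 support is dropped, the fallback import and the alias can be deleted, which is what makes this sort of code a stepping stone rather than a permanent cost.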

FOSS makes it easy to get involved

Upgrading GeoNode was a very positive professional experience, especially considering this was the first large-scale open source project I have been involved in. What surprised me is how easy it is to engage with a FOSS project of this kind if you have the appropriate technical knowledge: specifically, Git and a programming language of your choosing. As with any software development process, there were setbacks and problems. I wouldn’t describe any of these as unsolvable, but some certainly required creative approaches. In some cases, they required software tools I had little experience with, such as Docker, which is a technology I will gladly bring into future projects should the need arise.

Learning by doing with new tech and old

Throughout this internship, I have learned a lot about software I had little prior experience with. However, I was surprised to also learn a similar amount about Python simply by writing more code, despite it being a language I consider myself very experienced in. It just shows how application and situation can affect your understanding, and how important these aspects are to consider when learning a skill. Contributing to free, open source software from within a company like Catalyst has been an invaluable experience in this regard, and I’m very thankful for the opportunity that has been my summer internship.

 

Matt (pictured below) works as a Junior DevOps Engineer at Catalyst while finishing his Computer Engineering degree at the University of Canterbury.


Follow Catalyst on Twitter and LinkedIn to stay up to date with all of our opportunities and news.


Catalyst is a New Zealand owned and operated company where openness, long term relationships, community and diversity are essential characteristics of how we do business. Since 1997, Catalyst has been enabling success with expert open source solutions, and clients all over the world trust us with their mission-critical systems.