Open source tile caching - Dana's intern experience

by Dana Lambert

During the 2019 to 2020 summer break, I was fortunate to be offered an internship with Catalyst’s GIS team. At that point, I had just finished my third year as a software engineering student at the University of Canterbury. Coming from more of a front-end background, I had little industry experience in backend development and absolutely no idea about GIS.

Map tiles, GWC and OpenStack Swift

One of my projects was to speed up the caching and serving of map tiles from GeoWebCache (GWC) using an OpenStack Swift (Swift) storage backend. My main problem, to begin with, was that I had no idea what map tiles, GWC or Swift were.

Understanding map tiles is relatively simple. Rather than using a single image to show a map in your browser, a tiled map is made up of smaller square images (usually 256x256 pixels), called map tiles, that are laid out in a grid. This significantly speeds up rendering: tiles can be requested in parallel, they are easier to cache and, most importantly, the whole map canvas doesn’t have to be redrawn when users pan around the map. GWC is a core technology in this process: it caches map tiles to speed up map rendering for end users.
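The grid idea becomes clearer with an example. Each tile is addressed by a zoom level plus a column and a row. The sketch below assumes the widely used XYZ ("slippy map") numbering over Web Mercator, with row 0 at the top. GWC's own gridsets can number rows from the bottom, so treat this as an illustration rather than a description of how GWC addresses tiles. The `TileMath` class and its method names are hypothetical.

```java
/**
 * Sketch: finding the tile (zoom, x, y) that contains a longitude/latitude,
 * using the common XYZ / slippy-map scheme over Web Mercator.
 * Row 0 is the northernmost row. Hypothetical helper, not a GWC class.
 */
final class TileMath {
    static final double MAX_LAT = 85.0511287798; // Web Mercator latitude limit

    private TileMath() {}

    /** Number of tiles along one axis at the given zoom level (2^zoom). */
    static long tilesPerAxis(int zoom) {
        return 1L << zoom;
    }

    /** Column index of the tile containing the given longitude. */
    static long tileX(double lon, int zoom) {
        long n = tilesPerAxis(zoom);
        long x = (long) Math.floor((lon + 180.0) / 360.0 * n);
        return clamp(x, n);
    }

    /** Row index (0 = top) of the tile containing the given latitude. */
    static long tileY(double lat, int zoom) {
        long n = tilesPerAxis(zoom);
        // Web Mercator cannot represent the poles, so clamp the latitude first.
        double clamped = Math.max(-MAX_LAT, Math.min(MAX_LAT, lat));
        double latRad = Math.toRadians(clamped);
        double merc = Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad));
        long y = (long) Math.floor((1.0 - merc / Math.PI) / 2.0 * n);
        return clamp(y, n);
    }

    /** Keeps an index inside [0, n - 1], e.g. for longitude exactly 180. */
    private static long clamp(long v, long n) {
        return Math.max(0, Math.min(n - 1, v));
    }
}
```

At zoom level z the grid is 2^z tiles wide and 2^z tiles tall, so covering the whole world takes 4^z tiles. That rapid growth is one reason caching rendered tiles pays off, compared with drawing them again for every request.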

Swift (an object store) was a little harder to wrap my head around. I had heard the term “object storage” used around Catalyst, but I had no idea why it was used as a backend for GWC. Object storage allows users to store and retrieve data as objects. Each object consists of the data itself, some metadata and a globally unique identifier. The cost of object storage scales proportionally with the amount of data stored, and objects, at least on the Catalyst Cloud, are automatically backed up in multiple cloud regions. Using object storage for our map tile cache means lower costs for our clients, along with greatly improved storage reliability.
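A useful mental model is to treat an object store as a flat map from a unique key to the object's bytes and its metadata. This is a toy model and does not reflect Swift's actual API. `StoredObject` and `InMemoryObjectStore` are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** One stored object: a unique key, the raw bytes, and free-form metadata. */
record StoredObject(String key, byte[] data, Map<String, String> metadata) {
    StoredObject {
        data = data.clone();             // copy the payload on construction
        metadata = Map.copyOf(metadata); // immutable snapshot of the metadata
    }
}

/** Toy, in-memory stand-in for an object store (not Swift's real API). */
final class InMemoryObjectStore {
    private final Map<String, StoredObject> objects = new HashMap<>();

    /** Stores an object; writing to an existing key replaces it. */
    void put(String key, byte[] data, Map<String, String> metadata) {
        objects.put(key, new StoredObject(key, data, metadata));
    }

    Optional<StoredObject> get(String key) {
        return Optional.ofNullable(objects.get(key));
    }

    /** Returns true if an object with this key existed and was removed. */
    boolean delete(String key) {
        return objects.remove(key) != null;
    }

    int size() {
        return objects.size();
    }
}
```

The model has no real directories. A key such as `layer/zoom/x/y.png` is just a string. Swift likewise treats "folders" as a naming convention and uses a delimiter such as `/` in object names to give a pseudo-hierarchy.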

Creating an OpenStack integration

My first technical challenge was that GWC, while supporting some other object storage providers, did not have direct support for Swift object storage. I decided to create a custom GWC module to support Swift-based storage, building on the work of the excellent Apache jclouds® (jclouds) multi-cloud toolkit. jclouds can interact with a wide range of cloud technologies through a simple, unified API. Using jclouds should also make it easier to support other object storage providers, not just Swift, in the future. Additionally, jclouds has excellent documentation, which made it easy to learn and set up a sample project.

However, using jclouds didn’t come without challenges. Most features, such as storing and retrieving map tiles, were relatively straightforward to implement, but others were harder to tackle. I knew that tiles were represented in object storage using a folder-like hierarchy of properties, from layer name to projection to zoom level. However, it was hard to work out how to translate a request to delete all tiles within a geographic area into the correct “folders” and files to delete. In the end, I studied existing GWC storage backends and discovered that GWC provides methods for identifying all tiles within an area using these properties as filters. I could then delete the tiles by simply iterating through them and calling jclouds’ built-in delete method on each one.
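Here is a minimal sketch of delete-by-area. It assumes the geographic area has already been converted into a tile range, meaning a zoom level plus minimum and maximum column and row. That conversion is the job of the GWC methods mentioned above. The key layout is hypothetical and is not GWC's actual layout. A plain `Map` stands in for the object store; with jclouds, each `remove` would instead be a `BlobStore.removeBlob(container, name)` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch: tiles stored under "folder-like" keys, and delete-by-area done by
 * enumerating every tile key in a range and deleting each one.
 * The key layout is HYPOTHETICAL, not the one GWC actually uses.
 */
final class TileKeys {
    private TileKeys() {}

    /** Hypothetical key layout: layer/gridset/zoom/x/y.extension */
    static String key(String layer, String gridSet, int zoom, long x, long y, String ext) {
        return layer + "/" + gridSet + "/" + zoom + "/" + x + "/" + y + "." + ext;
    }

    /** All keys for an inclusive rectangular tile range at one zoom level. */
    static List<String> keysInRange(String layer, String gridSet, int zoom,
                                    long minX, long maxX, long minY, long maxY, String ext) {
        List<String> keys = new ArrayList<>();
        for (long x = minX; x <= maxX; x++) {
            for (long y = minY; y <= maxY; y++) {
                keys.add(key(layer, gridSet, zoom, x, y, ext));
            }
        }
        return keys;
    }

    /** Deletes each key from the store (one call per tile); returns how many existed. */
    static int deleteAll(Map<String, byte[]> store, List<String> keys) {
        int deleted = 0;
        for (String k : keys) {
            if (store.remove(k) != null) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Deleting one key per tile keeps the logic simple, but it costs one request per tile, so clearing a large area can mean many requests.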

Another issue came up when I tested the Swift integration for the first time. Initially, everything seemed to work as expected. However, on closer inspection, I found that GWC returned the correct tiles but didn’t label them as images, so some browsers showed blank maps. I realised that I needed to explicitly set the image type (such as PNG or JPEG) when GWC first uploads tiles to the Swift backend. Fortunately, jclouds made this easy by providing a built-in way of setting those properties. However, I would not have spotted this without manual browser testing on multiple computers.
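The fix comes down to sending a Content-Type with every upload. The sketch below assumes a small, hypothetical lookup from tile format names to MIME types; `TileContentType` is not a GWC or jclouds class. In jclouds, the resulting value would be passed to the blob builder's `contentType(...)` setting.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: choosing the Content-Type stored with a tile so browsers treat it
 * as an image. The format names and this class are illustrative assumptions.
 */
final class TileContentType {
    private static final Map<String, String> MIME_BY_FORMAT = Map.of(
            "png", "image/png",
            "jpeg", "image/jpeg",
            "jpg", "image/jpeg",
            "gif", "image/gif");

    private TileContentType() {}

    /** Maps a format name (case-insensitive) to its MIME type, or fails loudly. */
    static String forFormat(String format) {
        String mime = MIME_BY_FORMAT.get(format.toLowerCase(Locale.ROOT));
        if (mime == null) {
            throw new IllegalArgumentException("Unknown tile format: " + format);
        }
        return mime;
    }

    /** Illustrative upload metadata: the content type plus the tile's byte length. */
    static Map<String, String> uploadHeaders(String format, byte[] tile) {
        return Map.of("Content-Type", forFormat(format),
                      "Content-Length", Integer.toString(tile.length));
    }
}
```

Rejecting unknown formats, rather than falling back to a generic type such as `application/octet-stream`, surfaces the problem at upload time instead of as a blank map in the browser.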

Mocking the clouds

After I was confident that my module was working as it should, I had to add automated tests. Automated testing is a form of scripted quality assurance used to verify software is functioning as expected. The aim is to automate checks which would be repetitive and time-consuming to do manually, saving time in the long run and ensuring future code changes don’t break existing features.

Automated tests can examine many aspects of a codebase, such as how individual methods work (unit testing) or how different parts of the application work together (integration testing). I realised that jclouds was already taking care of all interactions with the cloud object store. So my job was to ensure my module was a) passing the right data to the proper jclouds methods, and b) correctly handling any data received from jclouds. For example, one test case verifies that when my module uploads a new map tile, it explicitly sets the correct image format and image size. To do this, I needed to fake the existence of an actual cloud object store, so that I could control what the jclouds methods returned. In software, this is called mocking, a concept I hadn’t fully understood before my internship.
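Mocking can be demonstrated without any cloud at all. In the sketch below, `TileStore` is a hypothetical narrow interface standing in for the jclouds calls the module makes. `TileUploader` is the code under test and is hard-coded to PNG for brevity. `RecordingTileStore` is a hand-written fake that records each call instead of uploading anything. In practice, a mocking library such as Mockito is commonly used for this job; the hand-rolled version simply keeps the example dependency-free.

```java
import java.util.ArrayList;
import java.util.List;

/** HYPOTHETICAL narrow interface standing in for the jclouds calls the module makes. */
interface TileStore {
    void putTile(String key, byte[] data, String contentType, long contentLength);
}

/** Code under test: uploads a tile with an explicit content type and size. */
final class TileUploader {
    private final TileStore store;

    TileUploader(TileStore store) {
        this.store = store;
    }

    void upload(String key, byte[] png) {
        store.putTile(key, png, "image/png", png.length);
    }
}

/** Hand-rolled mock: records every call instead of talking to a real cloud. */
final class RecordingTileStore implements TileStore {
    record Call(String key, int size, String contentType, long contentLength) {}

    final List<Call> calls = new ArrayList<>();

    @Override
    public void putTile(String key, byte[] data, String contentType, long contentLength) {
        calls.add(new Call(key, data.length, contentType, contentLength));
    }
}
```

The uploader depends only on the interface, so a test can hand it the fake and then check exactly what would have been sent, including the content type and size. No network access or credentials are involved.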

My work with the open source community

My last step was to propose my extension as an “official” addition to GWC. GWC is a mature open source project (more than 10 years old) with around 130 releases and 59 contributors. Generally, code that improves or fixes something gets into a FOSS project only after the idea has been proposed to the community. Once there is some interest, a so-called pull request can be made against the main project. Pull requests are a way of suggesting contributions to projects on GitHub. They may go through multiple reviews and need to pass multiple checks before being accepted.

I first proposed the Swift integration on the GWC development mailing list and was excited to receive a positive response. I have since opened a pull request against the upstream project, which first required all the automated tests to pass. Surprisingly, the automated tests highlighted a few issues that did not show up on my computer, such as excessive debug output causing the tests to terminate early. This issue has since been resolved, and the pull request is now waiting for an approving review.

A fantastic learning experience with a great team

Overall, I’ve thoroughly enjoyed this experience. It has been great to delve into the GWC project and learn about object storage while tackling some interesting problems, and I’ve learnt a lot since starting in November. I have benefited from the range of new technologies and concepts I’ve been exposed to, such as object storage, GWC, and working with Java and Docker, and I believe all of these will help me in future development projects. This experience has also really opened my eyes to some areas of GIS, which I’ve found very interesting, having had no prior knowledge in the area.

This experience also wouldn’t have been the same without the opportunity to work with such an excellent team. I’ve appreciated all the help from the experienced developers on our team in solving problems and understanding concepts. I have also enjoyed being able to collaborate and discuss issues with the other interns, helping each other as we faced problems along the way.

 


Follow Catalyst on Twitter and LinkedIn to stay up to date with all of our opportunities and news.


Catalyst is a New Zealand owned and operated company where openness, long term relationships, community and diversity are essential characteristics of how we do business. Since 1997, Catalyst has been enabling success with expert open source solutions, and clients all over the world trust us with their mission-critical systems.