by Andrew Boag
Catalyst has been working with the great open source Relational Database Management Solutions (RDMS) MySQL and PostgreSQL for well over a decade. We are proud of the experience and capability we have developed in database provisioning, configuration, optimisation and ongoing ownership strategy (meaning full management responsibilities).
Being able to efficiently manage and support a mission-critical application database 24x7 is a core component in Enterprise service delivery.
Since 2012, Catalyst have been working with the Amazon Web Services (AWS) database-as-a-service offering – Relational Database Service (RDS). Initially using the MySQL engine, and more recently moving to the PostgreSQL offering.
Applying RDS in our cloud architecture stacks has allowed us to focus our engineering skills on application improvements and optimisation, without having to be distracted with the 24x7 overhead of managing a highly-available PostgreSQL instance with automated failover.
We recently started taking more advantage of the Point-In-Time-Recovery (PITR) capabilities available for our RDS databases to satisfy our database backup and data retention requirements. In AWS-speak, this functionality is called “Automated Backups” and needs to be enabled on a per-database level – either via the AWS console or using API calls.
As any Oracle DBA will tell you, true PITR is the ability to restore a database to an exact point in time within the defined retention window. An example request from the application team to a DBA might be: “Please restore a copy of database CRM_SYSTEM to its state on Tue, Jan 5th at 03:03am so that we can investigate a data issue.”
So, What's the Difference?
Catalyst has set up PostgreSQL with full PITR support on a range of platforms in traditional bare-metal hosting environments. This can add a complexity overhead to the system management. We have often opted for a simpler model that utilises full-database backup on an automated regular schedule – stored as an SQL archive on an external file system, which is then pushed to tape backup.
Due to the way that full SQL dump's work, we incur higher storage costs holding numerous copies of the full database at particular points in time. And it is not as easy to keep always-up-to-the-minute copies of production data.
However, with the out-of-the-box PITR solution that comes with AWS RDS, we are able to easily fire up a separate copy of the database at any recent point in time. No need to access potentially large SQL archive files that may be located on a remote backup server, find a suitable server to restore them to and wait for the process to finish – for a large database this can be a number of hours.
Concrete Business Improvement Outcomes
In terms of tangible improvements to our operations with this new toolset, we have seen the following:
- Lowering our Recovery Time Objective (RTO): We can recover a running database from a disastrous data loss scenario faster than before.
- A smaller Recovery Point Objective (RPO): This is down to as low as three minutes. A considerable improvement.
- Less BAU Database Admin Work: There is now a lot less pressure on our DBAs and infrastructure team to get involved in day-to-day database provisioning as part of issue diagnostics.
- Simpler Backup Automation Process: Our backup mechanisms are a collection of scripts, build templates and checklists. These are now simpler. This is good.
Cheat Sheet: What to Understand
Some things to bear in mind:
- You have to enable “Automated Backups” for your RDS instance to use PITR. This may be already be enabled by default, but check.
- The retention window (how long you are able to reach backwards) is defined in the “Backup Retention Period” field in the instance settings. Maximum is 35 days, default is 7 days at time of writing.
- AWS generates a daily RDS snapshot that is required for PITR. You'll see these in the RDS Snapshots list. If you delete these, you'll likely break PITR capability.
- While Automated Backups is great, it should not be your ONLY database backup and data retention mechanism.
- Get comfortable with the tool. Do a test restore into a new RDS with the team to make everyone confident.
Get in touch if you would like to discuss any of these ideas further. Enjoy!