The BackupKey Saga: Samba triage and development on a deadline

By Andrew Bartlett

In Nov 2014 Microsoft released MS14-066 / KB2992611, fixing an issue some suggested was 'Microsoft's Heartbleed', in reference to the OpenSSL issues also found last year. Not long after this, Samba AD DC users began having issues creating new mail profiles.

Indeed, as the patch fed into production at many organisations, it was found to cause interoperability issues with many existing systems, so it was not a big surprise to see it hit Samba as well. Catalyst's Samba Team mobilised to investigate and resolve the issue for the customer.

By Jan 2015 the issue had been raised by a customer, and our engineers Andrew Bartlett and Garming Sam handled the development and testing of a fix.

This incident highlighted very well the capabilities our Samba developers have at Catalyst, ranging from the triage and diagnosis of one-off issues to the development, testing and upstream submission of complex new features. Additionally, the very nature of a security update is that it MUST be applied: it cannot be safely avoided, and so simply skipping KB2992611 was not an option.

Their approach was in four strands:

The first approach was to engage Microsoft, to ask them what changed. The Samba Team has access to a great team of interoperability engineers at Microsoft, which we pay for as part of a significant anti-trust judgment against Microsoft in Europe.

Second was to build a series of six virtual machines running Windows 8.1, per the customer's reproduction of the issue. Two were installed with no updates, two with the Nov 2014 roll-up release, and two with all the current patches. Snapshots were then taken at the stage where the machines were prepared to join Samba (we used Git master, as we would develop against that) and Windows 2012R2 (as a reference).

These snapshots were vital, because our Samba engineers wanted to trust nothing about the state the machines would preserve, and needed to be sure that when they 'fixed' the issue in Samba, that they could then revert the fix, re-test and again reproduce the issue.

They also wanted to be certain to isolate the changes in behaviour introduced with KB2992611, so had the unpatched machines available to double-check the before-and-after behaviour. These same machines were used to create traces using a variety of Microsoft-supplied tools, to provide data for before-and-after analysis by the Microsoft protocol interoperability team.
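The six-machine layout described above can be written down as a small test matrix. This is only a sketch of one reading of that setup: it assumes each patch level was paired once with Samba and once with Windows 2012R2, and the names `PATCH_LEVELS`, `DOMAIN_TARGETS` and `build_matrix` are hypothetical labels chosen for illustration.

```python
from itertools import product

# Hypothetical labels for the three client patch levels described in the text.
PATCH_LEVELS = ["no-updates", "nov-2014-rollup", "fully-patched"]

# Hypothetical labels for the two domains a client snapshot is ready to join.
DOMAIN_TARGETS = ["samba-git-master", "windows-2012r2"]


def build_matrix():
    """Return one entry per (patch level, domain target) combination.

    Assumption: each patch level is tested against both domain targets,
    which gives 3 x 2 = 6 Windows 8.1 virtual machines.
    """
    return [
        {
            "name": f"win81-{patch}-{target}",
            "patch_level": patch,
            "joins": target,
        }
        for patch, target in product(PATCH_LEVELS, DOMAIN_TARGETS)
    ]


matrix = build_matrix()
for vm in matrix:
    print(vm["name"])
```

Writing the matrix out this way makes it easy to see that every change in behaviour can be compared along one axis at a time: the same patch level against a different server, or the same server against a different patch level.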

Moving on to areas of possible development, it was Garming who had the earliest hunch as to where the problems might lie, on the basis that KB2992611 fixed issues in an X.509 handling library. Earlier in the previous year, he had looked at some patches written by Arvid Requate of Univention to improve our BackupKey handling, a protocol that also uses X.509 and cryptography.

BackupKey is essentially a remote encryption/decryption service, designed to encrypt safely a final backup of the user's saved internet and e-mail passwords on the Active Directory DC, in case the user forgets their own password, which otherwise protects these secret values.

The network traces we took soon provided evidence, and showed the Windows clients contacting the ProtectedStorage DCE/RPC pipe. The team proceeded to 'printf debugging', adding additional debug statements and increasing the prominence of those already in place. It became clear that while Samba had implemented one of the two choices for this protocol, known as ClientWrap, the client was attempting to use both that and the unimplemented ServerWrap protocol, contrary to the MS-BKRP documentation.

At this point our engineering team decided to divide their efforts. Andrew Bartlett proceeded to implement the new protocol per the MS-BKRP specification, and Garming focused on refining the patches from Univention, on the basis that the client may be using ServerWrap because it found the ClientWrap implementation unsatisfactory.

On the ServerWrap side, the protocol was fleshed out in the server, and new tests were written ensuring that we would match the behaviour of the Microsoft Windows 2012R2 server exactly.

On the ClientWrap side, the patches by Univention were refined, working to ensure that a 2048-bit certificate would always be generated, something that Univention had noticed becoming a requirement in June 2014.

As development progressed, the tests soon began to pass against both Samba and Windows, when our team hit upon an interesting thought: could our tests be passing perfectly, and yet still be wrong? If a remote encryption service implemented the wrong algorithm, but gave the correct answers (say by implementing a Caesar cipher, or DES, rather than RC4), could the tests written so far detect that? Would Samba still be interoperable if a mixed Samba/Windows domain was in use, with servers running each of the two systems? How could this be proved?

It was at that point that Andrew hit upon an ingenious idea: the secrets used to encrypt and decrypt the passwords are themselves stored in Active Directory. That implies that to Administrators, they are available over RPC. A new test was written that asked the server for the secret key, then asked for a value to be encrypted. Then, the server-side encryption/decryption code was re-implemented in the client.

Disaster was indeed near, but was averted: the tests failed, the implementation was indeed incorrect! They had nearly developed a 'Samba-only' solution.
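The difference between the two kinds of test can be illustrated with a toy model. The sketch below is not the BackupKey algorithm and does not use any Samba API: `ToyServer`, `reference_wrap`, `buggy_wrap` and the HMAC-based keystream are all hypothetical stand-ins, chosen only to show that a round-trip test passes for any self-consistent server, while a client-side re-implementation using the server's own secret exposes a wrong algorithm.

```python
import hashlib
import hmac


def keystream(secret: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce + counter (illustrative only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(secret, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def reference_wrap(secret: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in for the 'correct' algorithm: XOR with the toy keystream."""
    ks = keystream(secret, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def buggy_wrap(secret: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in for a wrong algorithm: XOR with the raw secret, repeated."""
    return bytes(b ^ secret[i % len(secret)] for i, b in enumerate(data))


class ToyServer:
    """A remote encrypt/decrypt service holding a secret (hypothetical)."""

    def __init__(self, secret: bytes, wrap):
        self._secret = secret
        self._wrap = wrap

    def encrypt(self, nonce: bytes, data: bytes) -> bytes:
        return self._wrap(self._secret, nonce, data)

    def decrypt(self, nonce: bytes, blob: bytes) -> bytes:
        # Both toy algorithms are XOR-based, so wrapping twice unwraps.
        return self._wrap(self._secret, nonce, blob)

    def read_secret_as_admin(self) -> bytes:
        # Models an administrator fetching the stored secret.
        return self._secret


MESSAGE = b"saved mail password"
NONCE = b"\x00" * 8


def round_trip_ok(server: ToyServer) -> bool:
    """Ask the server to encrypt, then ask the same server to decrypt."""
    return server.decrypt(NONCE, server.encrypt(NONCE, MESSAGE)) == MESSAGE


def independent_check_ok(server: ToyServer) -> bool:
    """Fetch the secret and decrypt on the client with the reference code."""
    blob = server.encrypt(NONCE, MESSAGE)
    secret = server.read_secret_as_admin()
    return reference_wrap(secret, NONCE, blob) == MESSAGE


good = ToyServer(b"k" * 32, reference_wrap)
bad = ToyServer(b"k" * 32, buggy_wrap)

print("round trip:", round_trip_ok(good), round_trip_ok(bad))
print("independent:", independent_check_ok(good), independent_check_ok(bad))
```

In this toy model both servers pass the round-trip test, but only the server using the reference algorithm passes the independent check. The same reasoning applies to a mixed domain: a value wrapped by one implementation must be unwrappable by the other, which a round-trip test against a single server cannot show.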

Our team looked over the documentation, and looked over it again - but frustratingly the code matched the specification exactly! How could they even start to guess what missing, 'secret' value was incorrect?

The Samba Team had been here before, with crypto puzzles having a long history and even a mention in the EU Courts, as part of the argument for why such documentation was critical.

Thankfully the solution didn't take long to find: a paragraph in the documentation called for a key to be truncated, oddly ignoring most of the value. When the truncation was removed, the tests all started to pass, and patches were soon provided to the customer and posted for inclusion in upstream Samba 4.2.

At the conclusion, our team evaluated the approaches taken, to understand what the key was to solving this for the client. They found that all four approaches were critical in resolving the issue:

The Microsoft engineers came back to our team, and acknowledged that the ServerWrap protocol had indeed gone from optional to mandatory, and that this had also broken some Windows-based networks. They also wondered just how our developers had managed to spot the wrong key length!
The tracing, development and testing would not have been possible without the VM snapshots.
The ClientWrap protocol did need the fixes originally suggested by Univention.
The ServerWrap protocol implementation (written from scratch) was also strictly required.

It was an intense three weeks for our Samba development team, but it showed the great capability we have to take issues from triage all the way to fully-tested and upstream-ready fixes, even when they involve implementing new protocols.