End-user Client Certificates with ACME

December 6, 2022

DISCLAIMER: I am currently working for Google. This post is published in my personal capacity, without using any knowledge I may have obtained from my employment with them. All the information provided here is coming from purely personal time and effort and does not represent the opinions or practices of my employer.


In my previous blog post I described how I built my own ACME CA to issue workload certificates for the various services I am running. The only thing left out was one final use case: end-user client certificates. This post helps complete the “vision” of the “mTLS Universe”. After reading this out loud, I can see how it may sound like a nightmare for some… :)

The problem

All my pods and servers can obtain TLS certificates from my ACME CA without problems, using e.g. TLS-ALPN-01 or KUBERNETES-01, which is a new challenge I created for my use case. However, when my laptop wants to connect to these services, it needs a TLS certificate as well.

It doesn’t have a static IP address, or a dynamic DNS record pointed to it, nor does it have port 80 and port 443 always available on the Internet. It therefore also doesn’t have a hostname pointed to it 24/7 that can be used for DNS-01. And my shiny new ACME challenge is useless, as this is a physical machine, and not a Kubernetes pod.

I need a way to prove to my ACME CA that it is indeed this laptop that requests a certificate, and it really deserves one for the hostname I want.

A new challenge

As this is a new challenge that cannot be addressed by existing Challenges, let’s create a new Challenge! Any good one needs a way to create proofs that the server can trust, that cannot be faked, and that are tied to the particular entity that requests the certificate. This seems like… attestation.

My colleague Brandon is actually standardizing one such challenge as we speak, which would probably be enough, and possibly better. Chronologically, though, I designed and implemented my challenge almost two years ago, when his effort did not exist yet. This new challenge is actually older than KUBERNETES-01! Perhaps in the future I can migrate to his. I really like the use of WebAuthn, which I didn’t think of. You can reuse so many existing things.

Oh well… Here we go.

I initially looked at Touch ID, which was available on my MacBook Pro, but such a proprietary solution would limit me to macOS only, on supported hardware, and the API for Touch ID & Face ID (especially from Go) was not great and very limiting. I had already rejected the idea for those reasons, and on top of that I didn’t manage to create a proof of concept secure enough that I was happy with it.

The next thing available to me (on all my computers) was a YubiKey. I use them for so many things, and I have been finding creative applications for them since around 2016. They work on all of my devices, regardless of operating system, and they are portable. That last property may not always be a good thing, but I’m happy with the option.

And this is how the YUBIKEY-01 and YUBIKEY-02 challenges were born. Again, much like KUBERNETES-01, I don’t think they are generally applicable, and I also don’t plan on standardizing any of them. The device attestation one is already very good. There may be trademark issues too, not sure :P

In these challenges the ACME CA provides the client with a 64-byte value. The first 16 bytes are a magic string that both the server and the client have to verify; it’s primarily there to avoid cross-protocol attacks. You don’t want to sign e.g. an SSH or god forbid a PGP message when trying to solve a challenge from a malicious server. The remaining 48 bytes are local, organically sourced, randomness.
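As an illustration only (not the actual implementation), the layout described above could be sketched in Go as follows. The magic value and the function names are hypothetical; the real magic string is not given here.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
)

// challengeMagic is a hypothetical 16-byte prefix used for domain separation.
// The real magic string is not given in the text above.
var challengeMagic = []byte("ACME-YUBIKEY-01\x00")

// challengeLen is the total size: 16 bytes of magic plus 48 random bytes.
const challengeLen = 64

// newChallenge is the server side: the magic string followed by fresh randomness.
func newChallenge() ([]byte, error) {
	token := make([]byte, challengeLen)
	n := copy(token, challengeMagic)
	if _, err := rand.Read(token[n:]); err != nil {
		return nil, err
	}
	return token, nil
}

// checkChallenge is the client side: refuse to sign anything that is not
// exactly 64 bytes long or that does not start with the magic string.
func checkChallenge(token []byte) error {
	if len(token) != challengeLen {
		return errors.New("challenge must be exactly 64 bytes")
	}
	if !bytes.HasPrefix(token, challengeMagic) {
		return errors.New("challenge does not start with the magic string")
	}
	return nil
}
```

The point of running checkChallenge on the client is that the hardware key only ever signs values that are recognizably ACME challenges, whatever the server sends.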

For YUBIKEY-02, this is enough. If the specific YubiKey public key is present in the CA’s database, and the signature is of course valid, the CA automatically maps the key to a hostname server-side, and issues a certificate with the public key of the CSR. You can move this YubiKey anywhere, and assuming the computer has an appropriate ACME client, you can get a certificate.

Note that you need to sign the challenge with your private key, stored inside the hardware, but you can give the ACME server any public key (in the CSR) for use in the certificate. This is the main difference with YUBIKEY-01, where you have to use the YubiKey key as the certificate key. For YUBIKEY-01, my current implementation ignores the key in the CSR and uses the key from its database that maps keys to hostnames.

The reason I have two challenges is that some software could not use the YubiKey as a private key, and therefore I had to get a key on disk as well. However, the two challenges issue certificates for two different hostnames, which are treated differently in ACLs. All software I write can use YUBIKEY-01 certificates.
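A rough sketch of that exchange, under several assumptions: the key is ECDSA, the challenge is hashed with SHA-256 before signing, and the database is a plain map from a key fingerprint to a hostname. All names are hypothetical, and newSoftwareSigner is only a stand-in for the hardware key so the example runs without a YubiKey.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
	"errors"
)

// keyID returns a hex SHA-256 fingerprint of the DER-encoded public key.
func keyID(pub crypto.PublicKey) (string, error) {
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:]), nil
}

// newSoftwareSigner stands in for the YubiKey; a real client would return a
// crypto.Signer backed by the hardware slot instead.
func newSoftwareSigner() (crypto.Signer, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	return key, nil
}

// signChallenge is the client side: hash the challenge and sign the digest.
func signChallenge(signer crypto.Signer, challenge []byte) ([]byte, error) {
	digest := sha256.Sum256(challenge)
	return signer.Sign(rand.Reader, digest[:], crypto.SHA256)
}

// hostnameFor is the server side: the key must be known, the signature must
// verify, and only then is the key mapped to its hostname.
func hostnameFor(db map[string]string, pub crypto.PublicKey, challenge, sig []byte) (string, error) {
	id, err := keyID(pub)
	if err != nil {
		return "", err
	}
	host, ok := db[id]
	if !ok {
		return "", errors.New("public key is not in the database")
	}
	ecPub, ok := pub.(*ecdsa.PublicKey)
	if !ok {
		return "", errors.New("this sketch only handles ECDSA keys")
	}
	digest := sha256.Sum256(challenge)
	if !ecdsa.VerifyASN1(ecPub, digest[:], sig) {
		return "", errors.New("invalid signature over the challenge")
	}
	return host, nil
}
```

Note that the hostname comes from the server-side database, never from the client, which is what lets the same YubiKey be moved between machines.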
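For context on why Go software can use a key that never leaves the hardware: crypto/tls only needs the private key to implement crypto.Signer. The sketch below shows that general pattern, not the code actually used here. The helper names are hypothetical, and an in-memory ECDSA key with a self-signed certificate stands in for the YubiKey and the CA.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"math/big"
	"time"
)

// clientCertificate pairs an issued certificate with a signer whose private
// key may live in hardware. It refuses a certificate issued for another key.
func clientCertificate(certDER []byte, signer crypto.Signer) (tls.Certificate, error) {
	leaf, err := x509.ParseCertificate(certDER)
	if err != nil {
		return tls.Certificate{}, err
	}
	pub, ok := leaf.PublicKey.(interface{ Equal(crypto.PublicKey) bool })
	if !ok || !pub.Equal(signer.Public()) {
		return tls.Certificate{}, errors.New("certificate does not match the signer")
	}
	return tls.Certificate{
		Certificate: [][]byte{certDER},
		PrivateKey:  signer, // only Sign is ever called on it
		Leaf:        leaf,
	}, nil
}

// newSoftwareSigner stands in for a hardware-backed crypto.Signer.
func newSoftwareSigner() (crypto.Signer, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	return key, nil
}

// demoCertificate self-signs a short-lived client certificate so the example
// runs without a CA; in practice the certificate comes from the ACME CA.
func demoCertificate(signer crypto.Signer, host string) ([]byte, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(16 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	return x509.CreateCertificate(rand.Reader, tmpl, tmpl, signer.Public(), signer)
}
```

The returned value can be placed in the Certificates field of a tls.Config; during the handshake, the TLS stack calls Sign on the signer, which is where a PIN prompt or touch would happen with a hardware key.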

There is a reason why this challenge family is called YUBIKEY and is specific to it: it leverages a YubiKey feature called PIV Attestation. For a public key to be accepted, you have to submit the Yubico Attestation Certificate along with the CSR, before you get the certificate. This is an X.509 certificate that is only available for keys generated on the hardware, and not for imported ones. This guarantees that there aren’t any copies of the key, and that it isn’t just a PEM file on disk. Assuming the firmware is correct of course. This is documented here. In theory, it should be impossible for an attacker to falsify this certificate.
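To make the verification concrete, here is a hedged sketch. It assumes the chain described in Yubico’s PIV attestation documentation: the attestation certificate is signed by a per-device key, whose own certificate is signed by a Yubico root. The function names are hypothetical, the signatures are checked directly rather than through full path validation, and validity dates are deliberately not checked. testCert and newKey exist only to build a throwaway chain for experimentation.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"math/big"
	"time"
)

// signedBy checks that child carries a valid signature made by parent's key.
func signedBy(child, parent *x509.Certificate) error {
	return parent.CheckSignature(child.SignatureAlgorithm, child.RawTBSCertificate, child.Signature)
}

// verifyAttestation checks root -> device -> attestation, and that the
// attested key is the same key the client wants a certificate for.
func verifyAttestation(root, device, attestation *x509.Certificate, requested crypto.PublicKey) error {
	if err := signedBy(device, root); err != nil {
		return errors.New("device certificate is not signed by the trusted root")
	}
	if err := signedBy(attestation, device); err != nil {
		return errors.New("attestation is not signed by the device certificate")
	}
	attested, ok := attestation.PublicKey.(interface{ Equal(crypto.PublicKey) bool })
	if !ok || !attested.Equal(requested) {
		return errors.New("attested key differs from the requested key")
	}
	return nil
}

// newKey generates a throwaway P-256 key for the demo chain.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// testCert issues a certificate for key, signed by parentKey; with a nil
// parent it is self-signed. It only exists to exercise verifyAttestation.
func testCert(cn string, key *ecdsa.PrivateKey, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, key.Public(), parentKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}
```

In a real deployment, root would be the pinned Yubico root certificate, and device and attestation would come from the client together with the CSR.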

From that, I extract and use the following:

Slot Number

Each key is stored in a “slot” and there’s a limited number of them. For example, a key can be stored in slot 0x9a. I have recorded in my user database which is the correct slot for each key. If a key is suddenly found in the wrong slot, I don’t feel confident trusting it anymore. It may be retired, but I’d rather handle retirements gracefully.

Firmware Version

For commercial YubiKeys, the firmware cannot be upgraded or downgraded. You get what was there when the key was manufactured, and even if Yubico added features, you can’t get them, despite also owning a “YubiKey 5 NFC”. I expect the firmware to always be the same, and any change here would be an obvious red flag.

Serial Number

Much like the Firmware Version, the Serial Number is also something that will not change over the lifetime of the key. It has to remain the same. If I receive an attestation certificate with the wrong serial number, I will reject issuance.

PIN & Touch Policy

These two tell you whether this particular slot requires the person to enter a PIN, and when, as well as whether they have to touch the physical key for it to sign any data.

I refuse to accept keys that don’t require a PIN (as anyone could possibly use them if found), and I require the touch policy to be “always”, which means that you need to physically touch it once for every signature it performs.
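For readers who want to extract these fields themselves, here is a sketch of the parsing step. The OIDs, the byte layouts, and the “YubiKey PIV Attestation” subject prefix are believed to match Yubico’s PIV attestation documentation, but should be checked against it. The type and function names are hypothetical. In real use, the inputs would be cert.Subject.CommonName and cert.Extensions of the parsed attestation certificate.

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"errors"
	"fmt"
	"strings"
)

// Extension OIDs under Yubico's arc (assumed from Yubico's documentation).
var (
	oidFirmware = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 41482, 3, 3} // 3 bytes: major, minor, patch
	oidSerial   = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 41482, 3, 7} // DER INTEGER
	oidPolicy   = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 41482, 3, 8} // 2 bytes: PIN policy, touch policy
)

// attestation holds the fields extracted from an attestation certificate.
type attestation struct {
	Slot        string
	Firmware    string
	Serial      int64
	PINPolicy   byte
	TouchPolicy byte
}

// parseAttestation reads the slot from the subject common name and the
// remaining fields from the certificate extensions.
func parseAttestation(commonName string, exts []pkix.Extension) (attestation, error) {
	var a attestation
	const prefix = "YubiKey PIV Attestation "
	if !strings.HasPrefix(commonName, prefix) {
		return a, errors.New("subject does not look like a PIV attestation")
	}
	a.Slot = strings.TrimPrefix(commonName, prefix)
	seen := 0
	for _, e := range exts {
		switch {
		case e.Id.Equal(oidFirmware):
			if len(e.Value) != 3 {
				return a, errors.New("firmware version must be 3 bytes")
			}
			a.Firmware = fmt.Sprintf("%d.%d.%d", e.Value[0], e.Value[1], e.Value[2])
			seen++
		case e.Id.Equal(oidSerial):
			rest, err := asn1.Unmarshal(e.Value, &a.Serial)
			if err != nil || len(rest) != 0 {
				return a, errors.New("serial number is not a single DER INTEGER")
			}
			seen++
		case e.Id.Equal(oidPolicy):
			if len(e.Value) != 2 {
				return a, errors.New("policy extension must be 2 bytes")
			}
			a.PINPolicy, a.TouchPolicy = e.Value[0], e.Value[1]
			seen++
		}
	}
	if seen != 3 {
		return a, errors.New("attestation is missing a required extension")
	}
	return a, nil
}

// exampleExtensions builds made-up extension values for experimentation:
// firmware 5.4.3, the given serial, PIN policy 0x03 and touch policy 0x02.
func exampleExtensions(serial int64) ([]pkix.Extension, error) {
	serialDER, err := asn1.Marshal(serial)
	if err != nil {
		return nil, err
	}
	return []pkix.Extension{
		{Id: oidFirmware, Value: []byte{5, 4, 3}},
		{Id: oidSerial, Value: serialDER},
		{Id: oidPolicy, Value: []byte{0x03, 0x02}},
	}, nil
}
```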

Rationale

I had the following rationale when implementing this: I need to somehow know this is a real YubiKey-backed key, so I have to get the certificate. If I get it, I might as well check everything that I got.

The Slot Number, Firmware Version, and Serial Number are things that are very easy to verify (compare bytes), and they are impossible to change according to Yubico (which I already trust). Why not check them? Let’s assume there’s a weakness in the ECDSA private key generation code, and one can retrieve the private key from the public key trivially. An attacker would need to know all three of these values on top of the private key. If they’re a powerful adversary, they may be able to get them, but so be it. I can’t do much about it, and in that case the damage is far larger anyway.

The PIN policy is an important one. If the key does not require a PIN, anyone who finds it on the street can plug it in and start signing data. I wouldn’t want such a key. Anything else is probably okay, although for this type of thing I’d recommend accepting only “Always”.

With Touch policy, it is similar. Imagine if I had a “Once per session” PIN policy combined with a “No touch required” or “Cache for 15 seconds” touch policy. Someone with access to my laptop could easily send commands to the YubiKey and have it sign things arbitrarily. Now there are some mechanisms to protect against this, such as the fact that YubiKeys support only one session (that’s a feature!), but it’s better to require an explicit touch.
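The checks discussed in this section boil down to a handful of comparisons. A minimal sketch, with hypothetical type names and with policy byte values assumed from Yubico’s PIV attestation documentation:

```go
package main

import "fmt"

// Policy bytes (assumed from Yubico's PIV attestation documentation).
const (
	pinNever  byte = 0x01
	pinOnce   byte = 0x02
	pinAlways byte = 0x03

	touchNever  byte = 0x01
	touchAlways byte = 0x02
	touchCached byte = 0x03
)

// keyRecord is what the CA's database stores for an enrolled key.
type keyRecord struct {
	Slot     string
	Firmware string
	Serial   int64
}

// attestedKey is what was extracted from the attestation certificate.
type attestedKey struct {
	Slot        string
	Firmware    string
	Serial      int64
	PINPolicy   byte
	TouchPolicy byte
}

// checkKey rejects any mismatch with the database record, any key usable
// without a PIN, and any touch policy other than "always".
func checkKey(want keyRecord, got attestedKey) error {
	switch {
	case got.Slot != want.Slot:
		return fmt.Errorf("key is in slot %s, expected %s", got.Slot, want.Slot)
	case got.Firmware != want.Firmware:
		return fmt.Errorf("firmware is %s, expected %s", got.Firmware, want.Firmware)
	case got.Serial != want.Serial:
		return fmt.Errorf("serial is %d, expected %d", got.Serial, want.Serial)
	case got.PINPolicy != pinOnce && got.PINPolicy != pinAlways:
		return fmt.Errorf("PIN policy 0x%02x does not require a PIN", got.PINPolicy)
	case got.TouchPolicy != touchAlways:
		return fmt.Errorf("touch policy 0x%02x is not always", got.TouchPolicy)
	}
	return nil
}
```

Unknown policy bytes are rejected as well, since only the explicitly listed values pass.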

How far can we push this?

With the challenge above I can issue TLS Client Certificates for all my mTLS needs. But there’s another type of connection I am still making, and it’s not TLS. And that’s SSH!

It’s not commonly known, but SSH also supports certificates! They’re not X.509 certificates like TLS, and they have different properties, tradeoffs, etc. but they are still certificates that a CA can issue.

Instead of adding a list of public keys to authorized_keys on a server, you add a list of CAs. And then, assuming the implementation supports this, it allows you to log in with any key that can also present a valid certificate from one of these CAs.

OpenSSH has supported this for a long time now, and Go supports it in its golang.org/x/crypto/ssh package. I like the idea of that: I am not using a 100-year self-signed TLS cert for mTLS, so why use an eternal “root” for SSH?

However, ACME is designed for X.509 certificates. But is it? I mean sure, it clearly states that in the definitions of the terms, and in so many other places, but there are a few places that just say “certificates”. Therefore, I’d argue that what I did is fully RFC8555 compliant! :P

Using YUBIKEY-01, I added support for SSH certificates to my ACME CA. It doesn’t look great, and I can’t say I’m proud of my code, but it works. If I did it again, I’d probably just make a 2-endpoint gRPC service that would not use any hacks, and it would be faster, more efficient, and better, but that’s an ACME post, not a $CustomProtocol post!

Obviously, this doesn’t work with any known ACME client, nor autocert, so I may do just that at some point. I just wanted to avoid duplicating the logic of the CA (logs, ACLs, etc.) to another service.

Deployment

To deploy this, I first tested what supports SSH CAs and what doesn’t. For the vast majority of my things that did, I replaced my SSH keys in authorized_keys with my SSH CAs. I generated some normal backup SSH keys offline, and stored them offline, so that I can use them to regain access to any system. I set up alerts to fire if they are used to log in to a system, as this would be abnormal without me knowing.

Now, as my SSH CAs live in HSMs, and all my end-user SSH keys also live in HSMs, 100% of my SSHing is done over hardware-backed keys. I can no longer administer my routers with a key that could be stolen by any app that I run. As anything you run on your computer can read ~/.ssh (it runs as you, and you have read access), I feel much safer this way.

I just have to make sure that my SSH CAs don’t all stop working at the same time, as I need at least one of them to SSH to my systems without having to retrieve the offline SSH keys.

Workflow

When I need to access something, I run the CLI tool dlogin, which is responsible for obtaining one or more TLS Client certificates, and one SSH certificate. I can specify what I want in a flag, and only request a subset of that.

All certificates, TLS & SSH, are valid only for 16 hours. Then they expire, and become useless. That means I need to run that tool once every 16 hours. On a walk with a friend, while I was describing this system, he suggested 8 hours, so I may lower their lifetime. The process takes seconds, so it’s not a problem. dlogin can optionally take a validity period, and request certs that are valid for less than 16 hours (e.g. 30 minutes to do something quickly).
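One way a CA could enforce such a cap is sketched below. The names are hypothetical, and capping an over-long request (rather than rejecting it) and treating zero as “use the default” are assumptions, not a description of the actual behaviour.

```go
package main

import (
	"errors"
	"time"
)

// maxValidity is the 16-hour lifetime mentioned above.
const maxValidity = 16 * time.Hour

// clampValidity turns a requested lifetime into the one that is issued:
// zero means the default, and anything longer than the maximum is capped.
func clampValidity(requested time.Duration) (time.Duration, error) {
	if requested < 0 {
		return 0, errors.New("validity period must not be negative")
	}
	if requested == 0 || requested > maxValidity {
		return maxValidity, nil
	}
	return requested, nil
}
```

The issued certificate would then get a NotAfter of the issuance time plus the returned duration.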

My end goal is to use a laptop that cannot access my infrastructure most of the time, and only gets access on demand, when needed, after strong authentication. So far, I have no complaints, but maybe I have more tolerance because it’s self-inflicted pain ;)

I am still looking for a good way to manage TLS certificates on browsers, and my exploration of the mkcert code wasn’t really helpful. The macOS security command-line tool doesn’t seem to support YubiKeys :( However, adding a TLS client keypair in the macOS keychain will automatically make it work on Safari and Chrome.

Ideally I’d like to load the mTLS certificate and then be able to use it with the YubiKey private key (not a smartcard), and just touch it every time I need a signature during a TLS handshake. This would allow me to expose my services only to browsers that have a client certificate, instead of exposing them to the entire Internet and relying only on the login page.

If you know how this can be done, I’d be very happy to hear from you!