DISCLAIMER: I am currently working for Google. This post is published in my personal capacity, without using any knowledge I may have obtained from my employment with them. All the information provided here is coming from purely personal time and effort and does not represent the opinions or practices of my employer.
In my previous blog post I described how I built my own ACME CA to issue workload certificates for the various services I am running. The only thing left out was one final use case: end-user client certificates. This post helps complete the “vision” of the “mTLS Universe”. After reading this out loud, I can see how it may sound like a nightmare for some… :)
All my pods and servers can obtain TLS certificates from my ACME CA without problems, using e.g. TLS-ALPN-01 or KUBERNETES-01, which is a new challenge I created for my use case. However, when my laptop wants to connect to these services, it needs a TLS certificate as well.
It doesn’t have a static IP address, or a dynamic DNS record pointed to it, nor does it have port 80 and port 443 always available on the Internet. It therefore also doesn’t have a hostname pointed to it 24/7 that can be used for DNS-01. And my shiny new ACME challenge is useless, as this is a physical machine, and not a Kubernetes pod.
I need a way to prove to my ACME CA that it is indeed this laptop that requests a certificate, and it really deserves one for the hostname I want.
As this is a new problem that cannot be addressed by existing challenges, let’s create a new challenge! Any good one needs a way to create proofs that the server can trust, that cannot be faked, and that are tied to the particular entity that requests the certificate. This seems like… attestation.
My colleague Brandon is actually standardizing one such challenge as we speak, which would probably be enough, and possibly better, but chronologically I designed and implemented this challenge almost two years ago, when this effort did not exist. This new challenge is actually older than KUBERNETES-01! Perhaps in the future I can migrate to his. I really like the use of WebAuthn, which I didn’t think of. You can reuse so many existing things.
Oh well… Here we go.
I initially looked at Touch ID, which was available on my MacBook Pro, but such a proprietary solution would limit me to macOS only, on supported hardware, and the (especially Go) API for Touch ID & Face ID was not really great and very limiting. I didn’t manage to create a secure enough proof of concept that I was happy with, despite already rejecting the idea.
The next thing available to me (on all my computers) was a YubiKey. I use them for so many things, and have been finding creative applications for them since around 2016. They work on all of my devices, regardless of operating system, and they are portable. That last thing may not be good, but I’m happy with the option.
And this is how the YUBIKEY-01 and YUBIKEY-02 challenges were born. Again, much like KUBERNETES-01, I don’t think they are generally applicable, and I also don’t plan on standardizing any of them. The device attestation one is already very good. There may be trademark issues too, not sure :P
In these challenges the ACME CA provides the client with a 64-byte value. The first 16 bytes are a magic string that both the server and the client have to verify; it is there primarily to prevent cross-protocol attacks. You don’t want to sign e.g. an SSH or, god forbid, a PGP message when trying to solve a challenge from a malicious server. The remaining 48 bytes are local, organically sourced, randomness.
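As a rough sketch of that layout (not the actual implementation), the server side could build the 64-byte value and the client could refuse to sign anything that does not start with the expected prefix. The magic string `YUBIKEY-ACME-v1\x00` and the function names are made up for illustration; the post does not say what the real magic value is.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
)

const (
	challengeLen = 64 // total size of the challenge value
	magicLen     = 16 // size of the fixed prefix
)

// challengeMagic is a HYPOTHETICAL 16-byte prefix used for domain separation.
var challengeMagic = []byte("YUBIKEY-ACME-v1\x00")

// newChallenge returns magic || 48 random bytes (server side).
func newChallenge() ([]byte, error) {
	c := make([]byte, challengeLen)
	copy(c, challengeMagic)
	if _, err := rand.Read(c[magicLen:]); err != nil {
		return nil, err
	}
	return c, nil
}

// checkChallenge is what a client (and the server, on verification) would run
// before touching the key: wrong size or wrong prefix means "do not sign".
func checkChallenge(c []byte) error {
	if len(c) != challengeLen {
		return errors.New("challenge must be exactly 64 bytes")
	}
	if !bytes.Equal(c[:magicLen], challengeMagic) {
		return errors.New("unexpected magic prefix, refusing to sign")
	}
	return nil
}
```

The point of the check is that a malicious server cannot hand the client an arbitrary blob (say, something that parses as an SSH or PGP payload) and get a signature over it.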
For YUBIKEY-02, this is enough. If the specific YubiKey public key is present in the CA’s database, and the signature is of course valid, the CA automatically maps the key to a hostname server-side, and issues a certificate with the public key of the CSR. You can move this YubiKey anywhere, and assuming the computer has an appropriate ACME client, you can get a certificate.
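The server-side check can be sketched roughly as follows, assuming ECDSA P-256 keys, a SHA-256 digest of the challenge, and a database modeled as a plain map from a key fingerprint to a hostname. All of these are assumptions for illustration, as are the names `keyID`, `hostnameFor`, and `softwareSign`; the latter is only a software stand-in for the YubiKey so the sketch can be exercised without hardware.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
	"errors"
)

// keyID fingerprints a public key: hex(SHA-256(DER SubjectPublicKeyInfo)).
func keyID(pub *ecdsa.PublicKey) (string, error) {
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:]), nil
}

// hostnameFor returns the hostname mapped to pub, but only if pub is enrolled
// in db and sig is a valid signature over the challenge.
func hostnameFor(db map[string]string, pub *ecdsa.PublicKey, challenge, sig []byte) (string, error) {
	id, err := keyID(pub)
	if err != nil {
		return "", err
	}
	host, ok := db[id]
	if !ok {
		return "", errors.New("public key is not enrolled")
	}
	digest := sha256.Sum256(challenge)
	if !ecdsa.VerifyASN1(pub, digest[:], sig) {
		return "", errors.New("invalid challenge signature")
	}
	return host, nil
}

// softwareSign stands in for the YubiKey: it creates a throwaway key and
// signs the challenge with it. A real client would sign on the hardware.
func softwareSign(challenge []byte) (*ecdsa.PublicKey, []byte, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	digest := sha256.Sum256(challenge)
	sig, err := ecdsa.SignASN1(rand.Reader, priv, digest[:])
	if err != nil {
		return nil, nil, err
	}
	return &priv.PublicKey, sig, nil
}
```

Note that the hostname comes from the server-side mapping, never from the client's request, so a valid signature can only ever yield the hostname that key was enrolled for.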
Note that you need to sign the challenge with your private key, stored inside the hardware, but the CSR you give the ACME server can carry any public key for use in the certificate. This is the main difference from YUBIKEY-01, where you have to use the YubiKey key as the certificate’s key. For YUBIKEY-01, my current implementation ignores the CSR and hardcodes the key from its database that maps keys to hostnames.
The reason I have two challenges is that some software could not use the YubiKey as a private key, and therefore I had to get a key on disk as well. However, I issue the two kinds of certificates under different hostnames, which are treated differently in ACLs. All software I write can use YUBIKEY-01 certificates.
There is a reason why this challenge family is called YUBIKEY and is specific to that hardware: it leverages a YubiKey feature called PIV Attestation. In order for a public key to be valid, when submitting the CSR, before you get the certificate, you have to submit the Yubico Attestation Certificate. This is an X.509 certificate that is only available for keys generated on the hardware, and not for imported ones. This guarantees that there are no copies of the key, and that it is not just a PEM file on disk. Assuming the firmware is correct of course. This is documented here. It would be theoretically impossible for an attacker to falsify this certificate.
From that, I extract and use the following:
Each key is stored in a “slot” and there’s a limited number of them. For example, a key can be stored in slot 0x9a. I have recorded in my user database which is the correct slot for each key. If a key is suddenly found in the wrong slot, I don’t feel confident trusting it anymore. It may be retired, but I’d rather handle retirements gracefully.
For commercial YubiKeys, the firmware cannot be upgraded or downgraded. You get what was there when the key was manufactured, and even if Yubico added features, you can’t get them, despite also owning a “YubiKey 5 NFC”. I expect the firmware to always be the same, and any change here would be an obvious red flag.
Much like the Firmware Version, the Serial Number is also something that will not change over the lifetime of the key. It has to remain the same. If I receive an attestation certificate with the wrong serial number, I will reject issuance.
The PIN policy and the touch policy tell you whether this particular slot requires the person to enter a PIN, and when, as well as whether they have to touch the physical key for it to sign any data.
I refuse to accept keys that don’t require a PIN (as anyone could possibly use them if found), and I require the touch policy to be “always”, which means that you need to physically touch it once for every signature it performs.
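The checks on these attested fields boil down to a handful of comparisons. Here is a minimal sketch, assuming the fields have already been extracted from a verified attestation certificate. The struct layout and names are hypothetical, and the numeric policy values (PIN: 1 never, 2 once, 3 always; touch: 1 never, 2 always, 3 cached) follow my reading of Yubico's PIV attestation documentation, so double-check them before relying on them.

```go
package main

import "fmt"

// Policy byte values as described in Yubico's PIV attestation docs
// (ASSUMPTION: verify against the current documentation).
const (
	pinNever  byte = 0x01
	pinOnce   byte = 0x02
	pinAlways byte = 0x03

	touchNever  byte = 0x01
	touchAlways byte = 0x02
	touchCached byte = 0x03
)

// attestation holds the values pulled out of an attestation certificate.
type attestation struct {
	Slot        byte    // e.g. 0x9a
	Firmware    [3]byte // major, minor, patch
	Serial      uint32
	PINPolicy   byte
	TouchPolicy byte
}

// enrolledKey is what the (hypothetical) user database remembers per key.
type enrolledKey struct {
	Slot     byte
	Firmware [3]byte
	Serial   uint32
}

// checkAttestation rejects any mismatch with the enrolled values, any key
// usable without a PIN, and any touch policy other than "always".
func checkAttestation(want enrolledKey, got attestation) error {
	if got.Slot != want.Slot {
		return fmt.Errorf("key is in slot %#x, expected %#x", got.Slot, want.Slot)
	}
	if got.Firmware != want.Firmware {
		return fmt.Errorf("firmware %v does not match enrolled %v", got.Firmware, want.Firmware)
	}
	if got.Serial != want.Serial {
		return fmt.Errorf("serial %d does not match enrolled %d", got.Serial, want.Serial)
	}
	if got.PINPolicy != pinOnce && got.PINPolicy != pinAlways {
		return fmt.Errorf("PIN policy %#x does not require a PIN", got.PINPolicy)
	}
	if got.TouchPolicy != touchAlways {
		return fmt.Errorf("touch policy %#x is not \"always\"", got.TouchPolicy)
	}
	return nil
}
```

Everything here is a byte or integer comparison, which is why checking all of the fields costs almost nothing once the certificate has been parsed.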
I had the following rationale when implementing this: I need to somehow know this is a real YubiKey-backed key, so I have to get the certificate. If I get it, I might as well check everything that I got.
The Slot Number, Firmware Version, and Serial Number are things that are very easy to verify (compare bytes), and they are impossible to change according to Yubico (which I already trust). Why not check them? Let’s assume there’s a weakness in the ECDSA private key generation code, and one can retrieve the private key from the public key trivially. An attacker would need to know all three of these values on top of the private key. If they’re a powerful adversary, they may be able to get them, but so be it. I can’t do much about it, and in that case the damage is far larger anyway.
The PIN policy is an important one. If the key does not require a PIN, anyone who finds it on the street can plug it in and start signing data. I wouldn’t want such a key. Anything else is probably okay, although I’d recommend for this type of thing to use an “Always” value as the only acceptable one.
With Touch policy, it is similar. Imagine if I had a “Once per session” PIN policy and a “No touch required” touch policy, or a “Cache for 15 seconds”. Someone with access to my laptop could easily send commands to the YubiKey and have it sign things arbitrarily. Now there are some mechanisms to protect against this, such as the fact that YubiKeys support only one session (that’s a feature!), but it’s better to require an explicit touch.
With the challenge above I can issue TLS Client Certificates for all my mTLS needs. But there’s another type of connection I am still making, and it’s not TLS. And that’s SSH!
It’s not commonly known, but SSH also supports certificates! They’re not X.509 certificates like TLS, and they have different properties, tradeoffs, etc. but they are still certificates that a CA can issue.
Instead of adding a list of public keys to authorized_keys on a server, you add a list of CAs. And then, assuming the implementation supports this, it allows you to log in with any key that can also present a valid certificate from one of these CAs.
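In OpenSSH, a CA is marked in authorized_keys by prefixing its public key with the `cert-authority` option. A tiny sketch of rendering such a file from a list of CA public keys follows; the function name and the key material used in the example are placeholders, not real keys.

```go
package main

import "strings"

// caLines renders authorized_keys content that trusts the given CA public
// keys (each in the usual "type base64 comment" form) instead of listing
// individual user keys.
func caLines(caPublicKeys []string) string {
	var b strings.Builder
	for _, k := range caPublicKeys {
		b.WriteString("cert-authority ")
		b.WriteString(strings.TrimSpace(k))
		b.WriteByte('\n')
	}
	return b.String()
}
```

For instance, calling `caLines` with the placeholder entry `ssh-ed25519 AAAAexample ca-one` yields the line `cert-authority ssh-ed25519 AAAAexample ca-one`.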
It has been part of OpenSSH for a long time now, and Go supports it in its golang.org/x/crypto/ssh package. I like the idea of that: I am not using a 100-year self-signed TLS cert for mTLS, so why use an eternal “root” for SSH?
However, ACME is designed for X.509 certificates. But is it? I mean sure, it clearly states that in the definitions of the terms, and so many other places, but there are a few that use just “certificates”. Therefore, I’d argue that what I did is fully RFC 8555 compliant! :P
Using YUBIKEY-01, I added support for SSH certificates to my ACME CA. It doesn’t look great, and I can’t say I’m proud of my code, but it works. If I did it again, I’d probably just make a 2-endpoint gRPC service that would not use any hacks, and it would be faster, more efficient, and better, but that’s an ACME post, not a $CustomProtocol post!
Obviously, this doesn’t work with any known ACME client, nor with autocert, so I may do just that at some point. I just wanted to avoid duplicating the logic of the CA (logs, ACLs, etc.) to another service.
To deploy this, I first tested what supports SSH CAs and what doesn’t. For the vast majority of my things that did, I replaced my SSH keys in authorized_keys with my SSH CAs. I generated some normal backup SSH keys offline, and stored them offline, so that I can use them to regain access to any system. I set up alerts to fire if they are used to log in to a system, as this would be abnormal without me knowing.
Now, as my SSH CAs live in HSMs, and all my end-user SSH keys also live in HSMs, 100% of my SSHing is done over hardware-backed keys. I can no longer administer my routers with a key that could be stolen by any app that I run. As anything you run on your computer can read ~/.ssh (it runs as you, and you have read access), I feel much safer.
I just have to make sure that not all SSH CAs will stop working at the same time, as I need at least one to SSH to them without having to retrieve the offline SSH keys.
When I need to access something, I run the CLI tool dlogin, which is responsible for obtaining one or more TLS client certificates, and one SSH certificate. I can specify what I want in a flag, and only request a subset of that.
All certificates, TLS & SSH, are valid only for 16 hours. Then they expire, and become useless. That means I need to run that tool once every 16 hours. On a walk with a friend, while describing this system, he suggested 8 hours, so I might as well lower their lifetime. The process takes seconds, so it’s not a problem. dlogin can optionally take a validity period, and request certs that are valid for less than 16 hours (e.g. 30 minutes to do something quickly).
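The lifetime rule can be sketched as a small helper: no explicit request means the 16-hour default, shorter requests are honored, and anything longer is refused. Rejecting over-long requests (rather than silently clamping them) is my assumption, and `effectiveValidity` is a hypothetical name, not dlogin's actual code.

```go
package main

import (
	"errors"
	"time"
)

// maxValidity is the longest lifetime a certificate may have.
const maxValidity = 16 * time.Hour

// effectiveValidity resolves the lifetime to issue for a requested duration.
// Zero means "not specified" and yields the maximum.
func effectiveValidity(requested time.Duration) (time.Duration, error) {
	switch {
	case requested < 0:
		return 0, errors.New("validity must not be negative")
	case requested == 0:
		return maxValidity, nil
	case requested > maxValidity:
		return 0, errors.New("validity exceeds the 16-hour maximum")
	default:
		return requested, nil
	}
}
```

Lowering the lifetime to 8 hours, as suggested above, would then be a one-line change to `maxValidity`.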
My end goal is to use a laptop that cannot access my infrastructure most of the time, and only gets access on demand, when needed, after strong authentication. So far, I have no complaints, but maybe I have more tolerance because it’s self-inflicted pain ;)
I am still looking for a good way to manage TLS certificates in browsers, and my exploration of the mkcert code wasn’t really helpful. security (the macOS keychain CLI) doesn’t seem to support YubiKeys :(
However, adding a TLS client keypair in the macOS keychain will automatically make it work on Safari and Chrome.
Ideally I’d like to load the mTLS certificate and then be able to use it with the YubiKey private key (not a smartcard), and just touch it every time I need a signature during a TLS handshake. This would allow me to only expose my services to browsers that have a client certificate, and not to the entire Internet, and only rely on the login page.
If you know how this can be done, I’d be very happy to hear from you!