Our nightly regression suite tests our ACME client. One test creates and then revokes a certificate using LE Staging. That test ran fine for months, but it started failing on 15 October. Neither the tests nor the client have changed in months. The revocation failure reproduces in LE Production as well.
The specific error returned by LE Staging is:
{'type': 'urn:ietf:params:acme:error:malformed', 'detail': 'JWS verification error', 'status': 400}
Our ACME client uses the account key to sign the revocation request.
Has anything changed wrt certificate revocation in the LE Staging environment since 15 October?
This issue doesn't reproduce with letsencrypt/pebble:latest.
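For anyone wanting to assert on this in a regression test, here is a minimal sketch of pulling the short error name out of a problem document like the one above. The helper name `acme_error_kind` is hypothetical and not part of any ACME library; the dict is the exact response quoted in this post.

```python
# Hypothetical helper for tests; not taken from any ACME client library.
ACME_ERROR_PREFIX = "urn:ietf:params:acme:error:"


def acme_error_kind(problem: dict) -> str:
    """Return the short ACME error name, or the full type if it is not an ACME URN."""
    error_type = problem.get("type", "")
    if error_type.startswith(ACME_ERROR_PREFIX):
        return error_type[len(ACME_ERROR_PREFIX):]
    return error_type


# The response quoted above, as returned by LE Staging.
problem = {
    "type": "urn:ietf:params:acme:error:malformed",
    "detail": "JWS verification error",
    "status": 400,
}

kind = acme_error_kind(problem)
print(kind, "-", problem["detail"])
```

Matching on the short name rather than on the detail string should be a little more robust, since the detail text is free-form.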
The reason is set to 1 (=keyCompromise), which means that the certificate can only be revoked if the request is signed with the certificate private key (not the account key).
So you can either set the reason to 0 (then the certificate can be revoked using the account key) or otherwise make sure you are using the certificate private key to sign the request.
Hint: When using the account key (with reason=1) to revoke a certificate, you should get the following error: "Revocation with reason keyCompromise is only supported by signing with the certificate private key" (urn:ietf:params:acme:error:unauthorized)
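A rough sketch of how a client could encode that rule, assuming the behavior is exactly as described in this post (it is not taken from Boulder's source). The function names are made up for illustration; the payload fields `certificate` and `reason` are the ones RFC 8555 defines for revokeCert.

```python
import base64

UNSPECIFIED = 0      # RFC 5280 CRLReason: unspecified
KEY_COMPROMISE = 1   # RFC 5280 CRLReason: keyCompromise


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def revocation_payload(cert_der: bytes, reason: int = UNSPECIFIED) -> dict:
    """Build the revokeCert payload from a DER-encoded certificate."""
    return {"certificate": b64url(cert_der), "reason": reason}


def signing_key_for(reason: int) -> str:
    """Pick the key to sign with, assuming keyCompromise needs the certificate key."""
    return "certificate" if reason == KEY_COMPROMISE else "account"


# Placeholder bytes stand in for a real DER certificate.
payload = revocation_payload(b"\x00\x01", KEY_COMPROMISE)
print(payload, "-> sign with", signing_key_for(payload["reason"]), "key")
```

If you only have the account key available, dropping the reason to 0 is the simpler of the two options.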
Really? I can't find anything in an RFC that says that (though it's entirely likely that I've missed it), and it surprises me. I would think that if I have a secure backup of my account key, and one of my servers with a certificate key gets stolen, or compromised in a way that also deleted the copy under my control, I could use my account key to claim that the certificate key was compromised. Why wouldn't the CA want to believe such a claim, when it would believe it for other revocation reasons?
I can't find anything in the RFC either, and it also surprises me because the RFC (on the contrary) says:
Revocation requests are different from other ACME requests in that
they can be signed with either an account key pair or the key pair in
the certificate.
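To illustrate the two signing modes the RFC allows: in RFC 8555 the JWS protected header carries "kid" (the account URL) when signing with the account key, and "jwk" (the certificate's public key) when signing with the certificate key, and the two are mutually exclusive. A minimal sketch follows; the function name, the default "RS256", and the example URLs are placeholders of mine, and the actual signing step is left out.

```python
from typing import Optional


def protected_header(nonce: str, url: str, alg: str = "RS256",
                     account_url: Optional[str] = None,
                     cert_jwk: Optional[dict] = None) -> dict:
    """Build a JWS protected header for revokeCert (hypothetical helper)."""
    # Exactly one of account_url / cert_jwk must be given.
    if (account_url is None) == (cert_jwk is None):
        raise ValueError("pass exactly one of account_url or cert_jwk")
    header = {"alg": alg, "nonce": nonce, "url": url}
    if account_url is not None:
        header["kid"] = account_url   # signed with the account key
    else:
        header["jwk"] = cert_jwk      # signed with the certificate key
    return header


# Placeholder values, for illustration only.
revoke_url = "https://example.com/acme/revoke-cert"
by_account = protected_header("nonce-1", revoke_url,
                              account_url="https://example.com/acme/acct/1")
by_cert = protected_header("nonce-2", revoke_url,
                           cert_jwk={"kty": "RSA", "n": "...", "e": "AQAB"})
print(sorted(by_account), sorted(by_cert))
```

A client that sends "kid" but signs with the certificate key (or the other way round) would fail signature verification, so it's worth checking which header your client emits.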
It is strange that this was changed without explanation in the pull request and without an API announcement. I can't find any spec for it either. I could speculate that Let's Encrypt wants to discourage people from using the key compromise reason, since they have to store a list of compromised keys forever, in order to fulfill their obligation to block new certificate requests using previously compromised keys. But that is purely speculation, and might be unrelated.
Subscribers can revoke certificates belonging to their accounts via the ACME API if they can sign the revocation request with the associated account private key. No other information is required in such cases.
Now, that's not saying anything about revocation reason, but it seems really weird to me that a CA would accept a revocation request, but not for the reason of a compromised key (which is the main reason one would want to be sure to revoke).
Also could you advise wrt using Pebble to test functionality - it was my impression that Pebble can force encounters with use cases that may never happen with Boulder. In this case Pebble allowed the revocation while Boulder failed it. Thanks!!!
If anyone does figure out the reason, IMHO this behavior should be added to either the "Divergences" or "Implementation Details" docs for Boulder. (I'm not sure yet which one it belongs to)
We are working on the fixes required to make Pebble and Boulder align. When we have more information, we will be sure to share in this thread! Hang tight.
@jple Just poking you again on this. It seems like a pretty significant change to revocation requirements, and not having any real update or explanation for a month is rather disconcerting. Thanks!
At minimum, a statement from LE on whether breaking existing ACME clients was accidental, or whether LE's philosophy just doesn't involve providing much visibility before pushing functionality like this to the cloud, would be helpful. LE did a great job providing visibility wrt the ACME v1.0 API sunset, so I'm guessing the former. I realize there are lots of moving parts here....
I agree with my fellow Community members above: this could (should?) have been communicated in some way.
I'm also interested in the reasons WHY this change was necessary. I can't tell from the pull requests on GitHub. Maybe @jple could enlighten us about that too.
Thanks for your patience with this. We don't have an update yet but hope to by the end of next week! I promise we are working on it - and will post as soon as we are able.