Account public key discrepancy


#1

Hi,

Has anyone ever seen public keys for an account vary between the client and the LE server, yet most requests against the account still work (such as retrieving account info from /new-acct/ and changing the key)?

I’m wondering if there is a difference in the algorithm the library I’m using and the boulder implementation that sometimes results in some account public keys having a different result but still being somewhat valid (or is this normal behaviour).

I’ll admit to being a bit rusty at Elliptic curve crypto (and I’m far from being a crypto expert, generally). Happy to be schooled.

For instance, on staging, account id 7392041 locally had the public key:

 "key": {
    "kty": "EC",
    "crv": "P-256",
    "x": "nPTIABcDASY6FNGSNfHCB51tY7qChtgzeVazOtLrwQ",
    "y": "vEEs4V0egJkNyM2Q4pp001zu14VcpQ0_Ei8xOOPxKZs"
  }

and a query to /new-acct/ (which works fine) says the public key is:

 "key": {
    "kty": "EC",
    "crv": "P-256",
    "x": "AJz0yAAXAwEmOhTRkjXxwgedbWO6gobYM3lWszrS68E",
    "y": "vEEs4V0egJkNyM2Q4pp001zu14VcpQ0_Ei8xOOPxKZs"
  }

Requests generally work, except when validating challenge response (DNS or HTTP), which fail because the answer is not as expected, presumably skewed by the difference in key.

The interesting thing is that calling /key-change with new key works fine, and thereafter the new key works for all operations. Clearly the most likely problem is a bug on my side. I’m using the Certes library : https://github.com/fszlin/certes/blob/master/src/Certes/Crypto/AsymmetricCipherKey.cs


#2

Hi @webprofusion

checked my own version (key of my stage-account, RSA 4096):

Local:

xPKBq7VJeCWDcIQ0b8EGrbHuuRxQ2ZI3PncDBE1HbZb0qsI9gReJLPzVbfZqW/tf8GmM0UKR8lFglVMSBWbN6g90ZZza2ZqSxLjSkBVDvvechdeExDiZQ3FZTcHfgo+oNW+c4BGOpKtMtu1A+is4cpXzLyEm/YwrBayD2rXOSmn1yv5BG/OSK/dsN77tvZSX4apBwvua5DfigTLhq7t8t2L4NLKlHu11PdoQFZr2C8c+flEJqhY6xjEMpk/osCm0G9DJtJB+PiiOCMlNqVKYY3+OF5T4XTDK5jMwf32oYfrDZ8s40Phnvj+6KS+xuu694H74l7aKT5wqjpKlTag8ZFieue6zKtpePNBy3+zlFybx9UxE4GIsNaZlYrrxq6jinTnmaZl3Kt1yNz72Toql9q4ptef/FlkrfUxOkt5l4NhvP78SHDXHPH6r5aCaGVE847zLquwZGiJ51Nw1tCpLCgaVorugpa3lj/jI1V8xZKgtA9OktDpoAlQ3BZ2hebmlM1AmSQfjshhIFvbrMTdYtUz9+EkxGVGWalQLe4obePCJxfb9Tq1FRb3N+u4gkvpfAeoiRECnzOzZ6kVcsM+3/BNoO8iTj87g7XKOxXnKA4Mh3pF3xa/pNuyw0veawfrQsT+AC8W/XRG9zB1DBIuVdVbg83NdlNq1Q/RZytpxhCk=

Result of /new-account (v2) with {“onlyReturnExisting”:true} as payload:

xPKBq7VJeCWDcIQ0b8EGrbHuuRxQ2ZI3PncDBE1HbZb0qsI9gReJLPzVbfZqW_tf8GmM0UKR8lFglVMSBWbN6g90ZZza2ZqSxLjSkBVDvvechdeExDiZQ3FZTcHfgo-oNW-c4BGOpKtMtu1A-is4cpXzLyEm_YwrBayD2rXOSmn1yv5BG_OSK_dsN77tvZSX4apBwvua5DfigTLhq7t8t2L4NLKlHu11PdoQFZr2C8c-flEJqhY6xjEMpk_osCm0G9DJtJB-PiiOCMlNqVKYY3-OF5T4XTDK5jMwf32oYfrDZ8s40Phnvj-6KS-xuu694H74l7aKT5wqjpKlTag8ZFieue6zKtpePNBy3-zlFybx9UxE4GIsNaZlYrrxq6jinTnmaZl3Kt1yNz72Toql9q4ptef_FlkrfUxOkt5l4NhvP78SHDXHPH6r5aCaGVE847zLquwZGiJ51Nw1tCpLCgaVorugpa3lj_jI1V8xZKgtA9OktDpoAlQ3BZ2hebmlM1AmSQfjshhIFvbrMTdYtUz9-EkxGVGWalQLe4obePCJxfb9Tq1FRb3N-u4gkvpfAeoiRECnzOzZ6kVcsM-3_BNoO8iTj87g7XKOxXnKA4Mh3pF3xa_pNuyw0veawfrQsT-AC8W_XRG9zB1DBIuVdVbg83NdlNq1Q_RZytpxhCk

It’ ~ the same, the result of Letsencrypt uses the url-safe version: + -> -, / -> _, = at the end removed.

So I see no difference.


#3

It doesn’t really seem possible, on the face of it.

Are you able to show the raw request and response payloads for newAccount{onlyReturnExisting=true}? Neither reveals your private key nor can they be replayed so it is safe.


#4

Let me see if I captured anything useful, unfortunately the key change fixed the problematic key on the account, so now I can’t reproduce it.

I agree it’s very odd. At first I suspected I was using a stale key because the method I was using to save the settings perhaps hadn’t fully flushed to disk by the time the next part of the process loads the key and creates an Acme account - but that doesn’t explain why account operation would work.

It could just be an encoding/rounding difference that actually equates to the same result when fitted to the EC curve? I’m making stuff up now.


#5

Wow. I think you’re right, but I’m not 100% sure.

Quickly decoding both sets of encoded x,y coordinates seems to decode to the same coordinates: https://play.golang.org/p/rY-NripNaOq . I don’t know if I’ve made a mistake there.

I’m not too certain about the integer-to-octet-string encoding itself works either (its documented by another document linked from https://tools.ietf.org/html/rfc7518#section-6.2.1.2).


#6

OK, I think I know what’s wrong with your code: the encoding algorithm is wrong, which as you already figured out, causes your key-authz calculation to be wrong.

From the RFC:

The length of this octet string MUST
be the full size of a coordinate for the curve specified in the “crv”
parameter. For example, if the value of “crv” is “P-521”, the octet
string must be 66 octets long.

P-256 is 256-bits, so the string must be 32 octets long.

nPTIABcDASY6FNGSNfHCB51tY7qChtgzeVazOtLrwQ is 31 octets in length.

If you decode that, then pad it to 32 bytes, its encoding ends up as AJz0yAAXAwEmOhTRkjXxwgedbWO6gobYM3lWszrS68E, which is what Boulder sees.

So your encoding is not valid, even though decoders seem to tolerate it and decode it to the same coordinate.

Which also explains why Boulder behaves weird. I wonder if Boulder should be more intolerant of this case? cc @jsha


#7

Awesome, thanks for taking the time to look at this, it’s very much appreciated!

I’ll check out where the issue is on my side, most likely in Certes, under the covers it’s using the bouncycastle crypto functions but some parts are custom.


#8

Thanks again for your help @_az PR to certes now pending: https://github.com/fszlin/certes/pull/174


#9

Cool.

I was just pondering about how a byte actually got chopped off to begin with and whether ACME clients might be affected, since a number of them don’t seem to worry about the padding.


#10

It’s a good question, here’s an example private key that then causes the issue to manifest:

-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIDWajU0PyhYKeulfy/luNtkAve7DkwQ01bXJ97zbxB66oAoGCCqGSM49
AwEHoUQDQgAEAJz0yAAXAwEmOhTRkjXxwgedbWO6gobYM3lWszrS68G8QSzhXR6A
mQ3IzZDimnTTXO7XhVylDT8SLzE44/Epmw==
-----END EC PRIVATE KEY-----

The X Coord gets converted (using Bouncy Castle) to a Big Integer, then an unsigned byte array. In this case the byte array ends up 31 bytes, which could be an artifact of the conversion to big integer (as that has a variable storage size). It may be unique to the C#/BouncyCastle variant of EC Coords.


#11

Also worth noting that it only affects keys randomly (probably based on the size of the X number), so many users where fine but some would report that the auth challenges were failing despite presenting what appeared to be the correct answer.


#12

To further document my learning: Go also decodes that as a big int with a byte-length of 31.

However, in 60 seconds spent generating 2,888,254 P-256 keys, I only produced ones with ones with a 32-byte length X coordinate.

So perhaps generating one with 31-bytes is either vanishingly rare or invalid? :S

Edit: I was looking at the wrong thing. It generated 32/31 fairly evenly, as well as others:

32:1425640 31:1419220 30:5562 29:19

So the padding is very important (and it is a real issue for all ACME clients, then).


#13

Thanks, yes it seems to be pretty common, which makes it even more great that the solution has been found. Maybe it affects acme v2 more? I dunno. Haven’t really seen it before our v4.x release and I’d assumed there were other problems with account key mgmt but as it turns out they were probably red herrings.

I hadn’t realised that it could affect 50% of users though. I wonder how many failed auth reports are actually faulty key encodings? I’m assuming certbot does the right thing, therefore it would only be other clients that are affected (and they’re probably in the minority compared to certbot).

This is actually a pretty cool thing to raise awareness of.


#14

Certbot doesn’t support EC for account keys at all, so it has no chance to screw JWK up there.

I think Posh-ACME might have the same mistake? @rmbolger.

Maybe acme.sh too, based on the way it parses the output of openssl ec.


#15

It looks like this might be a BouncyCastle specific bug or perhaps just an implementation detail it expects the caller to handle? I’m pretty sure Posh-ACME is safe for the majority of users because most of the key related code doesn’t use BouncyCastle. It uses .NET native crypto libraries which seem (so far) to handle the padding automatically. I also then store account keys as JWK rather than PEM because I didn’t want to need to convert back from PEM using BouncyCastle.

I do have a function that is used to convert from the BC key objects to .NET key objects. But it’s currently only used by the Google Cloud DNS plugin to parse the Google account’s auth key which is PEM formatted. I tried using that against the “bad key” posted earlier and I do get an exception. BC reads the PEM just fine, but translating the key parameters from the BC object to the .NET object results in an exception that basically says the X/Y parameters are not valid.

Am I correct that the “bad key” is not actually bad? It’s just that BouncyCastle’s interpretation of it doesn’t include the necessary padding? So in my function, I could add the padding and avoid the .NET error without breaking the key?

I’ll also try running a similar key creation test to @_az’s and see whether .NET does indeed handle the padding automatically or not.


#16

So I ran a .NET test with 1,000,000 P-256 key creations and all of them came out with 32 byte X lengths. So I think it’s safe if you’re doing everything natively.

I started to run the same test against BouncyCastle and in the process I think I found a better way to fix the bug than the PR you submitted, @webprofusion. In short, BouncyCastle will handle the padding automatically but you have to get the parameter byte array slightly differently.

The certes code and my own conversion function is using the following to get the X/Y byte arrays which doesn’t add the padding:

ecKey.Q.AffineXCoord.ToBigInteger().ToByteArrayUnsigned()

Instead, we should use this:

ecKey.Q.AffineXCoord.GetEncoded()

which internally to BouncyCastle then does this in order to handle the padding automatically:

return BigIntegers.AsUnsignedByteArray((FieldSize + 7) / 8, ToBigInteger());

This definitely fixes my own function that converts BC keys to .NET keys.


#17

Neat! I think the original Certes code was inspired by other examples that tend to use .ToByteArrayUnsigned

And regarding the original ‘bad’ pem, as far as I’m aware it’s still a good key, but perhaps there’s a related encoding bug - even so, decoding should be independent of that.


#18

This is a great thread. Thanks for the analysis and the discovery, @_az! I’ll try to reproduce tomorrow or Tuesday, but your conclusion makes sense. Most likely we’ll want to submit an upstream patch in go-jose.


#19

I’ve submitted a PR: https://github.com/square/go-jose/pull/210


#20

So the latest version of Certify The Web (v4.0.12) has the key fix (not yet merged upstream to the main certes library), seems to be going mostly ok.

Interestingly I still have one user that cannot successfully perform http validation due the key authz being computed incorrectly for the challenge response. I’m now wondering if bouncy castle has subtly different behaviour on different versions of windows/.net/os patch levels which is a bit crazy but I’m grasping at straws. For a future release I’ll probably look to use more of the standard .net crypto functions instead of relying on BC.

Has anyone ever seen LE respond similarly to the following even if the challenge response is actually correct? (I saw something elsewhere about IPv6 but I don’t see how that would apply):
Validation of the required challenges did not complete successfully. The key authorization file from the server did not match this challenge [Br5NVeKTTEgLaPoMCOXuAl-ivmlGBgPxbSvmeO0SKXw.4hOot_D7SDRumLJLG3sciREeT6z7Um0M-l-KsB73Q_E] != [Br5NVeKTTEgLaPoMCOXuAl-ivmlGBgPxbSvmeO0SKXw.AbVkbPD9c9j2MNbLLQVVhqNJlUdoGnJYME7Sv4A_PXc]