Does resubmitting the same CSR affect rate limits?

Does anyone know if resubmitting the same CSR increases the 'pending validations' rate limit counter?

I'm trying to do multiple DNS-01 validations in parallel via a job queue. After receiving the challenge TXT value from LE and pushing the new TXT record to the DNS server, a worker throws the job back on the queue so that other workers keep checking propagation progress. Once the ACME TXT record is visible from multiple DNS servers, a worker starts up the ACME client again to tell LE to validate the DNS entry.
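
Roughly, the propagation check each worker runs looks like the following sketch (simplified; the resolver IPs are just placeholder public resolvers):

// Simplified sketch of the propagation-check job.
const { Resolver } = require('dns').promises;

const RESOLVERS = ['8.8.8.8', '1.1.1.1', '9.9.9.9']; // placeholder public resolvers

async function isPropagated(recordName, expectedValue) {
  const results = await Promise.all(RESOLVERS.map(async (ip) => {
    const resolver = new Resolver();
    resolver.setServers([ip]);
    try {
      // resolveTxt() returns an array of records, each split into character-string chunks
      const records = await resolver.resolveTxt(recordName);
      return records.some((chunks) => chunks.join('') === expectedValue);
    } catch (err) {
      return false; // NXDOMAIN / timeout counts as "not propagated yet"
    }
  }));
  return results.every(Boolean); // only proceed once every resolver sees the value
}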

Unfortunately, the ACME JS client goes through the entire challenge process with no way to jump straight to the validation step. As a result the client (1) resubmits the same CSR, (2) LE responds with the same challenge token (thankfully!), (3) the client independently double-checks the DNS entry and passes, (4) the client instructs LE to validate the challenge; then (5) the certs are issued.

So as a result, the JS client submits the CSR to LE at least twice (maybe more if the client's internal DNS check fails, resulting in the job going back into the queue).

I obviously don't want to test this against LE production, and I can't find any way to list the pending validations counter to see whether it goes up multiple times for a single CSR.

Thanks in advance to anyone that can help. :money_with_wings: :+1: :robot: :microscope:

Cheers,
Bryan

3 Likes

Hi @bryanvaz

that sounds buggy.

Why? Submitting the CSR is the last active client step; after that comes the result with the certificate url to download.

See

In short:

  • Create a new order (result: some tokens)
  • Client creates files / dns entries
  • Client says: "Please check that"
  • LE checks
  • Client checks whether the challenges are valid
  • All valid -> order is ready
  • Client calls finalize and uploads the CSR
  • Result: the certificate download url
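
A rough sketch of that sequence in Node, assuming the high-level API of the acme-client library (method names taken from its docs; check your version before reusing anything):

// Illustrative only: assumes acme-client's high-level API.
const acme = require('acme-client');

async function issue(domains, accountKey) {
  const client = new acme.Client({
    directoryUrl: acme.directory.letsencrypt.staging, // use staging while testing
    accountKey,
  });
  await client.createAccount({ termsOfServiceAgreed: true });

  // 1. Create a new order (result: authorizations with challenge tokens)
  const order = await client.createOrder({
    identifiers: domains.map((d) => ({ type: 'dns', value: d })),
  });

  // 2./3. Create the dns entries, then tell LE "please check that"
  for (const authz of await client.getAuthorizations(order)) {
    const challenge = authz.challenges.find((c) => c.type === 'dns-01');
    const keyAuthz = await client.getChallengeKeyAuthorization(challenge);
    // ...publish the TXT record derived from keyAuthz here...
    await client.completeChallenge(challenge);   // "please check that"
    await client.waitForValidStatus(challenge);  // poll until valid
  }

  // Order is ready: call finalize and upload the CSR (once)
  const [certKey, csr] = await acme.crypto.createCsr({
    commonName: domains[0],
    altNames: domains,
  });
  await client.finalizeOrder(order, csr);

  // Result: the certificate download
  return client.getCertificate(order);
}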

It sounds like your client starts a new order. And there is a max-orders limit.

So

  • you shouldn't start different orders
  • you shouldn't upload the same CSR more than once

Sounds like your general code is a little bit wrong.

5 Likes

Welcome to the Let's Encrypt Community, Bryan :slightly_smiling_face:

As Juergen has already outlined the majority, I won't repeat it, but I can say that when I was designing a website client, the process needed to be fairly linear, which initially resulted in duplicating many steps whenever there was a "missing TXT record" due to issues like slow propagation or a manual copy-and-paste error. Similar situations occur with gethttpsforfree.com. Granted, I was typically using the staging server when testing implementation changes (as anyone should be), but I never hit a limit on new orders because, no, it's technically not a new order. This is evident because the order URL coming back from the Let's Encrypt server is the same. The same logic applies to authorizations. If an authorization fails, though (and the order along with it), it is possible (and usually happens) that the non-failed authorizations are "recycled" into the new order. Keep in mind that a failed authorization is no longer pending.

I did have users submit the CSR to my client up front so that I could read the common name (CN) and subject alternative names (SANs) from the CSR rather than require the user to supply that information (and potentially get it wrong). However, as Juergen already mentioned, the CSR is not actually submitted to the CA until finalization (as part of the finalization payload). If a failure occurs after that point, which is possible but very rare, there is absolutely no harm in resubmitting the same CSR; in fact, the client absolutely should do so. How much of the process gets repeated depends on how flexibly the client handles errors and fulfills the remaining steps of the ACME process. For example, an expired nonce certainly requires not only acquiring a new nonce but also repeating the failed step. Fortunately, the ACME process itself is very robust and forgiving.
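
For Node-based clients, a sketch of that same idea using acme-client's CSR helper (assuming its readCsrDomains function behaves as documented):

// Sketch: derive ACME identifiers from a user-supplied CSR (PEM).
const acme = require('acme-client');

function csrToIdentifiers(csrPem) {
  const { commonName, altNames } = acme.crypto.readCsrDomains(csrPem);
  return [...new Set([commonName, ...altNames].filter(Boolean))]
    .map((value) => ({ type: 'dns', value }));
}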

Be careful not to hit other rate limits though, especially the ones that guard against bombarding endpoints with requests (posting too many times too quickly).

The “new-reg”, “new-authz”, and “new-cert” endpoints on the v1 API and the “new-nonce”, “new-account”, “new-order”, and “revoke-cert” endpoints on the v2 API have an Overall Requests limit of 20 per second. The “/directory” endpoint and the “/acme” directory & subdirectories have an Overall Requests limit of 40 requests per second.

2 Likes

Thanks for the additional detail @JuergenAuer and @griffin.

So then the client is definitely just running through the "Creating a new order" step every time it starts up and not re-uploading the CSR (the client source code is so condensed and lacking in comments that I can only infer).

@griffin, I'm with you on assuming that because repeating the POST https://acme-staging-v02.api.letsencrypt.org/acme/new-order with the same kid, payload, and signature results in the exact same response, including token and challenge url, it isn't counted as a new order. I'm just wondering if anyone knows definitively whether or not it counts against a limit beyond the general API limits.

It sounds like at some point I'm going to have to write a new JS client from scratch that allows manual control over the process, but that's a pipe dream for another day. :sleeping: :cloud: :cloud: :unicorn:

Unfortunately, ACME.js is the only JS client I've found (it's the library that underpins Greenlock), and it looks like it was designed to run as a background service from start to finish, so it doesn't let you break up the ACME process to start and stop whenever you want, wherever you want.

On the plus side, using a job system allows me to easily defer and retry failed requests, so it'll only be a real problem if I'm creating new certs 24/7 (which is probably a good problem to have.)

Thanks again @griffin & @JuergenAuer!
Cheers,
Bryan

4 Likes

Yep, that's the problem, and there is a rate limit (300 new orders per account per 3 hours).

If possible, check the source code to find the step after creating the validation files (or before calling the challenge url to say "please check that"), then insert the additional checks you want.

2 Likes

@_az

I'm fairly certain that you know the answer to this offhand. Would you enlighten us, please?

1 Like

A successful post to newOrder will contribute towards the New Orders rate limit if a new order is created. If there is an existing pending or ready order for the exact same domains, then you will get back the existing order, and it will not contribute to the New Orders rate limit.

Whether or not a new order contributes towards Pending Authorizations depends on whether the new order resulted in new authorizations being created.

With Let's Encrypt, if you created an order and then immediately created another 50 orders for the same domain, you would get back the same order; Pending Authorizations and New Orders would only have gone up by +1 each.

However, this reuse behavior is not reliable. An ACME CA can do whatever it wants and you shouldn't assume either way.

I think it's best to avoid leaving resources open. If you're going to create an order, then make sure you either respond to or deactivate each of the authzs. That way, you give yourself the best chance of not hitting rate limits. This becomes particularly important once you're working with hundreds of domains.
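
For example, a cleanup pass like this sketch, assuming acme-client's deactivateAuthorization method (with raw ACME it's a POST of {"status":"deactivated"} to the authorization url, per RFC 8555 section 7.5.2):

// Sketch: release pending authorizations you aren't going to answer.
async function releasePendingAuthz(client, order) {
  for (const authz of await client.getAuthorizations(order)) {
    if (authz.status === 'pending') {
      await client.deactivateAuthorization(authz);
    }
  }
}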

The CSR itself makes no difference for the above, because it is not part of the newOrder operation.

The only thing that distinguishes orders from each other is the list of domains (and the ACME account).

Yeah, I tried to use the library Greenlock uses before, and it's definitely oriented towards a certain autocert use case and not at all suitable for arbitrary ACME workflows. The good news is that you can basically implement a full ACME client library in ~200ish LoC plus an HTTP client and a crypto library.
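
The only non-obvious piece is the signed POST (JWS). A sketch of that helper using the jose package (assumed API; nonce handling, retries, and the jwk variant used for newAccount are omitted):

// Sketch: JWS-signed ACME POST. Assumes Node 18+ for global fetch.
const jose = require('jose');

async function signedPost(url, payload, accountKey, kid, nonce) {
  const body = payload === '' ? '' : JSON.stringify(payload); // '' means POST-as-GET
  const jws = await new jose.FlattenedSign(new TextEncoder().encode(body))
    .setProtectedHeader({ alg: 'ES256', kid, nonce, url })
    .sign(accountKey);

  return fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/jose+json' },
    body: JSON.stringify(jws),
  });
}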

3 Likes

@_az

What if you are resubmitting the same domain names for a pending (non-failed) order? I've gotten back the same order URL in this case (for instance when duplicating the order submission as the result of an expired nonce).

With certain workflows (like manual intervention/interactive), this might be difficult to enforce without some type of "reminder/callback task". I've been meaning to look into how to deactivate authorizations as a standard practice for my upcoming client transformation.

2 Likes

Nice catch. I had completely forgotten about order reuse, and it completely changes the answer for the New Order limit.

An existing order can be returned, in which case the first order is +1/+1 and the second order is +0/+0.

I will revise the post.

2 Likes

I used ACME.js for my hacked-together AWS Lambda Node.js renewal process, though since I wanted it to do the whole order/challenge/validation in one go for my purposes (automating a DNS-01 challenge), it worked really well. But the client list also includes the acme-client library, which looks to be a bit more low-level. I haven't personally done anything with it, but maybe it will help you, or there might even be other libraries out there with some more digging.

1 Like

My Website Client Workflow

Upon any failure, start over.

Submission

  1. CSR POSTed by user
  2. Extract CN and SANs from CSR and save CSR as finalize payload
  3. GET directory url
  4. GET/HEAD newNonce url
  5. POST newOrder url with a JSON payload whose identifiers list has one object (type=dns, value=SAN) per name, then record the order url, authorization urls, and finalize url
  6. POST-as-GET each authorization url (empty payload), then map each to a dns identifier, dns-01 challenge url, and dns-01 challenge value

TXT record(s) manually created now...

Verification

  1. POST each dns-01 challenge url with an empty JSON object ({}) as payload
  2. Poll each authorization url with POST-as-GET (empty payload, one-second delay between polls) for a transition out of pending status, then fail if ten poll attempts have been made or the status is not valid
  3. POST finalize url with the CSR as payload
  4. Poll the order url with POST-as-GET (empty payload, one-second delay between polls) for a transition out of the pending, processing, and ready statuses, then fail if ten poll attempts have been made or the status is not valid
  5. POST-as-GET the certificate url (empty payload) to receive fullchain.pem
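
A sketch of the polling logic in verification steps 2 and 4 (postAsGet here is a hypothetical helper that performs the signed empty-payload POST):

// Sketch: poll a url until it leaves the "in progress" statuses, ten attempts,
// one second apart; any final status other than valid counts as a failure.
async function pollUntilSettled(url, postAsGet) {
  for (let attempt = 0; attempt < 10; attempt++) {
    const body = await (await postAsGet(url)).json();
    if (!['pending', 'processing', 'ready'].includes(body.status)) {
      if (body.status !== 'valid') throw new Error(`${url} ended in status ${body.status}`);
      return body;
    }
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  throw new Error(`Gave up polling ${url} after ten attempts`);
}
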
1 Like

Hey @_az, got another funny quirky edge case question along the same lines (@griffin, you may have seen this as well):
Does LE's implementation of the DNS-01 ACME validation allow for other values in the TXT entry?
(It seems like it doesn't, but the RFC 8555 (Pg.66) spec says it can)

This happens when requesting a wildcard and a non-wildcard name for a single domain on the same cert, for example 'domain.com' and '*.domain.com'. Since the ACME spec treats the two names as separate identifiers, the order results in two authz, each with a different token, and thus each requiring a different TXT value; however, both live under the same challenge name, in this example `_acme-challenge.domain.com`.

However, when polling the authz url to verify the DNS entries after both TXT values have globally propagated, LE responds with an urn:ietf:params:acme:error:unauthorized error because LE concatenates all the TXT values into one long string before testing.

For example, when ordering a cert for ['link.keyblade.dev', '*.link.keyblade.dev'], the order responds with the following authz challenges for DNS records:

  • link.keyblade.dev:
    • Record Name: _acme-challenge.link.keyblade.dev
    • Value: ZxI0sxuvqB0DRIWx0w98FlzoYHHTkDU3jBb-DoFrhMY
  • *.link.keyblade.dev:
    • Record Name: _acme-challenge.link.keyblade.dev
    • Value: B1Y0HXEols5d_iBfyAdLi7asCUuaTiNEJNWUEVMAvs0

So the following TXT record is created:

$ dig txt _acme-challenge.link.keyblade.dev

; <<>> DiG 9.10.6 <<>> txt _acme-challenge.link.keyblade.dev
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13234
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_acme-challenge.link.keyblade.dev. IN	TXT

;; ANSWER SECTION:
_acme-challenge.link.keyblade.dev. 300 IN TXT	"B1Y0HXEols5d_iBfyAdLi7asCUuaTiNEJNWUEVMAvs0" "ZxI0sxuvqB0DRIWx0w98FlzoYHHTkDU3jBb-DoFrhMY"

The usual POST to the dns-01 challenge url doesn't hit any issues.

However, the POST-as-GET poll of the authorization url for the transition out of pending status returns the following response body:
authz response for link.keyblade.dev:

{
  "type": "dns-01",
  "status": "invalid",
  "error": {
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "Incorrect TXT record \"B1Y0HXEols5d_iBfyAdLi7asCUuaTiNEJNWUEVMAvs0ZxI0sxuvqB0DRIWx0w98FlzoYHHTkDU3jBb-DoFrhMY\" found at _acme-challenge.link.keyblade.dev",
    "status": 403
  },
  "url": "https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/149800101/gpxyFw",
  "token": "xdc9s4mf3tzJZbR3Wh32UkwmwXjmEfvjRpWWryYf6Dk"
}

authz response for *.link.keyblade.dev:

{
  "type": "dns-01",
  "status": "invalid",
  "error": {
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "Incorrect TXT record \"B1Y0HXEols5d_iBfyAdLi7asCUuaTiNEJNWUEVMAvs0ZxI0sxuvqB0DRIWx0w98FlzoYHHTkDU3jBb-DoFrhMY\" found at _acme-challenge.link.keyblade.dev",
    "status": 403
  },
  "url": "https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/149800100/TPMgjw",
  "token": "1QlolD3xqEshRNoLt26BZ1z7yjBZ4wsGcju_NhoQM3w"
}

It looks like Boulder is concatenating all the TXT values into a single contiguous string before testing the record with a straight equality check; however, I don't know where the relevant lines in Boulder are to validate this assumption.

Excerpt from RFC 8555 pg.66:

To validate a DNS challenge, the server performs the following steps:

  1. Compute the SHA-256 digest [FIPS180-4] of the stored key authorization
  2. Query for TXT records for the validation domain name
  3. Verify that the contents of one of the TXT records match the digest value
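
For reference, the client-side counterpart of step 1 is just a digest (sketch with Node's built-in crypto; the token and thumbprint values are placeholders, and the 'base64url' digest encoding assumes a reasonably recent Node version):

// Sketch: compute the expected dns-01 TXT value from the key authorization.
const crypto = require('crypto');

const token = '<token from the dns-01 challenge>';                    // placeholder
const accountKeyThumbprint = '<base64url thumbprint of account JWK>'; // placeholder
const keyAuthorization = `${token}.${accountKeyThumbprint}`;
const txtValue = crypto.createHash('sha256').update(keyAuthorization).digest('base64url');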

Just wondering if this behaviour is intentional or a bug.

Also, cheers @petercooperjr. I switched to acme-client since ACME.js didn't support the multi-value TXT record case above, and to try to get more granular control over the process. On the plus side it generally works, except that the acme-client library doesn't support ES256 account keys. Not a major showstopper, but it just adds to the "fun".

Cheers,
Bryan

1 Like

You need to create multiple TXT RRs rather than packing multiple values into a single TXT RR.

So it should be:

_acme-challenge.link.keyblade.dev. 300 IN TXT	"B1Y0HXEols5d_iBfyAdLi7asCUuaTiNEJNWUEVMAvs0"
_acme-challenge.link.keyblade.dev. 300 IN TXT	"ZxI0sxuvqB0DRIWx0w98FlzoYHHTkDU3jBb-DoFrhMY"

As long as the CA can find the correct value in any one of the RRs, it will work.

2 Likes

Awesome! Thanks @_az, will try RR entries and see if that works.

(Also that makes total sense... :weary: )

Bryan

2 Likes

If Route53 allows it, I recommend setting a TTL of 0 or 1 on those RRs.

Otherwise you can run into some annoying resolver caching on the Let's Encrypt side, depending on order of operations.

For example, it can fail if the order is: create record 1, respond to challenge, create record 2, respond to challenge (all within 60 seconds).

A more robust order to avoid it is: create record 1, create record 2, then respond to challenges.
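
If you're on Route53, publishing both values in one change looks roughly like this sketch (AWS SDK v3; the hosted zone id and record name are placeholders). Each entry in ResourceRecords becomes its own TXT RR:

// Sketch: upsert both dns-01 values as separate TXT RRs in a single RRset.
const { Route53Client, ChangeResourceRecordSetsCommand } = require('@aws-sdk/client-route-53');

async function publishChallengeTxt(values) {
  const route53 = new Route53Client({});
  await route53.send(new ChangeResourceRecordSetsCommand({
    HostedZoneId: 'HOSTED_ZONE_ID', // placeholder
    ChangeBatch: {
      Changes: [{
        Action: 'UPSERT',
        ResourceRecordSet: {
          Name: '_acme-challenge.link.keyblade.dev',
          Type: 'TXT',
          TTL: 1,
          ResourceRecords: values.map((v) => ({ Value: `"${v}"` })), // TXT values must be quoted
        },
      }],
    },
  }));
}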

1 Like

I think this is just a case of string literal concatenation, which has been common since the dark ages.

"a" "b" "c" => "abc"

@_az, good to know about the TTL. I was just worried about a < 1s TTL, in case Let's Encrypt uses DNS resolvers that dislike low TTLs for multi-perspective validation.

Also, I'm totally with you on parallelizing the record creation so that you only have to wait for the records to propagate once.

@griffin that's horrifying. I love how the Wikipedia entry calls it a "feature". It's giving me horrible PHP flashbacks to all the "features" in that language.

2 Likes

I haven't looked carefully at the involved specs, but I don't think TXT records were designed to be arrays.

1 Like

Perhaps inserting a carriage return or linefeed (in between) - LOL

1 Like

I think it will still be treated as one string. Haven't reviewed the specs though. Technically all data is just one stream with interpreted demarcations.

1 Like