Our DNS challenge logic, though, failed because the authz was valid at that point, and so there was no DNS challenge in the authz object.
I can (and will) update our DNS challenge logic. There’s definitely a race condition built into our workflow; it just seems a bit funny that it would happen for two domains at the same time.
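A minimal sketch of what that update might look like, assuming the authz has already been fetched and parsed into a dict shaped like the ACME authorization object (`status` plus a `challenges` list). The function name `pick_dns_challenge` is hypothetical, not something from our actual client:

```python
from typing import Optional


def pick_dns_challenge(authz: dict) -> Optional[dict]:
    """Return the dns-01 challenge to answer, or None if the authz is already valid.

    `authz` is assumed to be a parsed ACME authorization object.
    """
    status = authz.get("status")
    if status == "valid":
        # Another validation finished in the meantime; nothing left to prove.
        return None
    if status != "pending":
        # invalid / expired / deactivated / revoked: this authz cannot be reused.
        raise ValueError(f"authz is {status!r}; a new order is needed")
    for challenge in authz.get("challenges", []):
        if challenge.get("type") == "dns-01":
            return challenge
    raise LookupError("pending authz offers no dns-01 challenge")
```

The point is only that a valid authz gets treated as success before looking for a DNS challenge, instead of as a failure because the challenge is missing.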
Are you guys able to look at logs and see what may have happened here? Were some authz polls returning pending when they should have returned valid?
I’m wondering if those initial order’s authzs were ever valid, and somehow only the new order’s authzs reflected a successful challenge.
I think it’s 300 authzs per week, not pending challenges.
We haven’t been hitting the rate limit you reference, which I suspect is because the authz doesn’t stay pending for a week; it just takes longer than we’re willing to wait for it to resolve. Our logic doubtless has room for improvement, but thus far it’s served us and our customers well.
Rate Limits - Let's Encrypt
You can have a maximum of 300 Pending Authorizations on your account. Hitting this rate limit is rare, and happens most often when developing ACME clients. It usually means that your client is creating authorizations and not fulfilling them. Please utilize our staging environment if you’re developing an ACME client.
You will, if you abandon challenges without waiting for them.
(I am not sure if the challenge expires in a week or sooner/later, though… you should check it until it either fails or succeeds. Up to a day later should be fine.)
Our client has been deployed widely enough, and for long enough, that if there were an issue with the pending-authzs rate limit we’d almost certainly know.
But for the sake of argument, is there a way to check an account’s current “progress toward rate limit”?
I think the one-week interval is how long a valid authz lasts.
If you have a large number of pending authorization objects and are getting a rate limiting error, you can trigger a validation attempt for those authorization objects by submitting a JWS-signed POST to one of its challenges, as described in the ACME spec. The pending authorization objects are represented by URLs of the form https://acme-v02.api.letsencrypt.org/acme/authz/XYZ, and should show up in your client logs. Note that it doesn’t matter whether validation succeeds or fails. Either will take the authorization out of ‘pending’ state. If you do not have logs containing the relevant authorization URLs, you need to wait for the rate limit to expire. As described above, there is a sliding window, so this may take less than a week depending on your pattern of issuance.
Note that having a large number of pending authorizations is generally the result of a buggy client. If you’re hitting this rate limit frequently you should double-check your client code.
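For anyone who wants to script the cleanup described in that quote, here is a rough sketch. It assumes your client already has some way to send a JWS-signed POST and get parsed JSON back; that is passed in as `signed_post`, which is a hypothetical callable and not part of any particular library. Per the ACME spec, a `None` payload here stands for a POST-as-GET and `{}` is the body that tells the server to attempt the challenge.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical signature: signed_post(url, payload) -> parsed JSON response.
SignedPost = Callable[[str, Optional[dict]], dict]


def flush_pending_authzs(authz_urls: Iterable[str], signed_post: SignedPost) -> List[str]:
    """Trigger one challenge on every authz that is still pending.

    Returns the challenge URLs that were POSTed to. It does not matter
    whether validation then succeeds or fails; either outcome takes the
    authz out of the pending state.
    """
    triggered = []
    for url in authz_urls:
        authz = signed_post(url, None)  # POST-as-GET to read the authz
        if authz.get("status") != "pending":
            continue  # already valid, invalid, expired, etc.
        challenges = authz.get("challenges", [])
        if not challenges:
            continue
        challenge_url = challenges[0]["url"]
        signed_post(challenge_url, {})  # empty JSON object = "please validate"
        triggered.append(challenge_url)
    return triggered
```

The authz URLs would come from your client logs, as the quoted text says.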
We’d hit it, I suspect, since we have one ACME account per server.
We used to hit the rate limit, years back when this was all fairly new, because we did an internal preflight check before polling LE, which meant some authzs were never polled. Since we fixed that, though, there haven’t been problems.
The challenge status stays pending on its own, unless the ACME server decides to time it out after a very long time. It is the ACME client's responsibility to fire (POST to) the challenge URL to initiate the transition out of this state. As far as I know, the challenge status changes to processing immediately once its URL is fired.
That’s probably it, and why we’re not hitting the rate limit. We poll for a while, then give up and switch to DNS; that first poll probably moves the authz to processing.
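Roughly, the poll-then-give-up step could be sketched like this. The name `poll_until_settled`, the 2-second interval, and the injectable `sleep`/`clock` parameters are assumptions for illustration; only the 30-second default reflects our current timeout:

```python
import time
from typing import Callable


def poll_until_settled(
    fetch_status: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the status is valid or invalid, or until the timeout passes.

    Returns the last status seen, so the caller can tell a real result
    ("valid" / "invalid") from a timeout (still "pending" or "processing").
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("valid", "invalid"):
            return status
        if clock() >= deadline:
            return status  # timed out; caller decides whether to fall back
        sleep(interval)
```

If this returns something other than valid or invalid, the fallback to DNS should re-fetch the authz first, since it may have become valid in between.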
30 seconds had been enough. Maybe with LE’s switch to repeat authzs there’s a stronger case for extending that timeout?