Occasional No Such Authorization

We occasionally see a "No such authorization" error in our system. We use the authorization URLs returned by the new-order API, and we call the load-authorization API immediately after the new-order call, so I believe we shouldn't encounter this error. Certificate issuance usually works, so I'm wondering whether this is an issue on the Let's Encrypt side.

My domain is: fl-lbadv-jan19.testing.prod.toratest.com

I ran this command:
We have custom code using the ACME v2 API.
We call the new-order API and then, immediately afterwards, load each authorization URL that it returns.

It produced this output:

We occasionally see "HTTP error: 404 Not Found\n(problem (type "urn:ietf:params:acme:error:malformed") (instance "") (id ) (title ""): (detail "No such authorization"))".

If we retry the same step, the new certificate is issued successfully.

Hello @natiueno, welcome to the Let's Encrypt community. :slightly_smiling_face:

I see a large number of certificates issued since Jan. 14, 2023 here: https://tools.letsdebug.net/cert-search?m=domain&q=fl-lbadv-jan19.testing.prod.toratest.com&d=168
Please use the Staging Environment for testing.
It seems that testing.prod.toratest.com gets a new subdomain daily, and the name testing.prod.toratest.com itself suggests that it is for testing. You may also be hitting the Rate Limits.

Thank you for your quick follow-up.
Got it, I'll use the Let's Encrypt staging endpoint for our staging environment. We also test our production system, though, and it needs to use the Let's Encrypt production endpoint on our new system's release date.
We also see the same error for our customers' domains (sorry, I can't share those domains here without their permission).

Usually we get a rate limit error when we hit a rate limit, so our system holds further API calls until the specified time and then retries. This time it fails, because our code treats 404 as a non-temporary error and does not retry.
We can change our system to retry on 404, but if this is really a rate limit problem, we would prefer to see a rate limit error.
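
To make the distinction concrete, here is a minimal sketch of how a client might classify these two failures. It assumes the server reports rate limiting with HTTP 429 or the RFC 8555 error type `urn:ietf:params:acme:error:rateLimited`; the function name and the `retry_404` switch are hypothetical and not taken from our real code.

```python
import json

# Error type defined by RFC 8555 for rate limiting.
RATE_LIMITED = "urn:ietf:params:acme:error:rateLimited"


def classify_acme_error(status, body, retry_404=False):
    """Return 'wait', 'retry', or 'fatal' for a failed ACME response.

    status    -- HTTP status code of the response
    body      -- response body, expected to be a problem document (JSON text)
    retry_404 -- hypothetical switch: treat 404 as temporary instead of fatal
    """
    try:
        problem = json.loads(body)
    except (TypeError, ValueError):
        problem = {}
    if not isinstance(problem, dict):
        problem = {}

    # Assumption: rate limiting shows up as HTTP 429 and/or the rateLimited type.
    if status == 429 or problem.get("type") == RATE_LIMITED:
        return "wait"   # hold further calls until the limit has passed
    if status == 404 and retry_404:
        return "retry"  # treat "No such authorization" as temporary
    return "fatal"      # current behaviour: give up without retrying
```

With `retry_404` left at `False` this mirrors what our client does today: the 404 "No such authorization" response is classified as fatal, while a rate limit response leads to a wait.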

I think 404 is 404.
But perhaps it is being used for such failures...

Is there a reason why the HTTP challenge requests are being redirected to HTTPS?

Thank you for your follow up.

Could you elaborate? We only use Let's Encrypt's HTTPS endpoint, and we use DNS-01 for validation. Also, I believe the authorization URLs in the new-order response use HTTPS.

Sorry, I missed that point.

Possibly related:
fl-lbadv-jan19.testing.prod.toratest.com | DNSViz

I haven't examined your case, but as general advice: Let's Encrypt production is geographically distributed to two redundant locations, each containing multiple database replicas. There is a known problem with this topology: it is possible to create a resource (like an account, authorization, or order), and then a subsequent read operation ends up hitting a read-replica which is not caught up. We do not allow our replication lag to exceed 1 second, but that does leave a window of opportunity for incorrect 404s for just-created resources.

If you're writing your own client, I'd recommend adding retries to most API calls. While this should be rare, and we are working on making it rarer or not present, it is probably worth handling. Ensuring your HTTP client is reusing connections will also help stay in the same datacenter.
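
As a rough illustration of the retry suggestion, here is a minimal sketch, not a description of any particular client. The `fetch` callable, the `NotFound` exception, and the attempt count and delays are all hypothetical; the idea is simply to retry a read of a just-created resource a few times with a growing delay before treating the 404 as final.

```python
import time


class NotFound(Exception):
    """Raised by the (hypothetical) fetch callable when the server answers 404."""


def fetch_with_retry(fetch, url, attempts=4, first_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on NotFound with exponential backoff.

    attempts    -- total number of tries, including the first one
    first_delay -- seconds to wait before the second try; doubles each time
    sleep       -- injectable so the function can be tested without waiting
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except NotFound:
            if attempt == attempts:
                raise          # still missing after all tries: report the 404
            sleep(delay)       # give a lagging replica time to catch up
            delay *= 2
```

If `fetch` is built on a single shared HTTP session object (for example one `requests.Session` used for every call), connections are pooled and reused, which lines up with the connection-reuse advice as well.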

Note that our staging environment is much less distributed, much smaller, and generally less likely to exhibit these behaviors.

Thanks. I should have noted this at first.

Thank you so much for the clarification. We will update our client to retry on 404.

But the 404 malformed error message doesn't really invite retrying, methinks?

Isn't it possible to reply with the order only once the replica has confirmed the existence of the authz? If that's usually very fast, waiting for it shouldn't really have much impact, right?

Beyond that, I suggest logging all API calls and errors. With our custom client, we decided to log API calls to throttle on our end (if we know something will be rate-limited, just delay until it won't be!), and also log errors in a manner that allowed us to quickly replicate and test issues.

We typically sleep for at least 1 second on all authorizations to handle DNS-01 updates, as not applying that to HTTP-01 was a chunk of extra work. Until @mcpherrinm's post above, I had a ticket to remove that as a bug, but now I'm leaving it in as a feature to get around ISRG's replication.
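
For anyone who wants to try the "throttle on our end" idea, here is a minimal sketch rather than our actual code. The class name and the limits used with it are made up for illustration and are not Let's Encrypt's real limits; it keeps a log of recent call times and delays the next call whenever the window is already full.

```python
import time
from collections import deque


class CallThrottle:
    """Allow at most max_calls API calls per period (seconds), sleeping if needed."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.calls = deque()    # timestamps of calls inside the current window

    def wait_for_slot(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: wait until the oldest recorded call expires.
            self.sleep(self.period - (now - self.calls[0]))
```

A client would call `throttle.wait_for_slot()` right before each API request. The fixed pause before loading authorizations is separate and can be as simple as a `time.sleep(1)` after creating the order.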

Thank you for your suggestion. We will add a wait before loading authorizations too, and log all API calls.

If you're interested in following along, here's the bug documenting this issue:

This is definitely a bug and shouldn't happen! And we'd like to fix it. But as a practical matter, as a Let's Encrypt client, it's possible to work around.
