Occasional No Such Authorization

natiueno · January 20, 2023, 5:21pm

We are occasionally seeing a "no such authorization" error in our system. We use the authorization URLs returned by the new-order call, and we call the load-authorization API right after the new-order API, so I believe we shouldn't encounter this issue. Certificate issuance usually works, so I'm wondering whether this is an issue on the Let's Encrypt side.

My domain is: fl-lbadv-jan19.testing.prod.toratest.com

I ran this command:
We have custom code using the ACME v2 API.
We called the new-order API and then, immediately afterwards, loaded the authorization for each authorization URL it returned.

It produced this output:

We occasionally see "HTTP error: 404 Not Found\n(problem (type "urn:ietf:params:acme:error:malformed") (instance "") (id ) (title ""): (detail "No such authorization"))","

If we retry the same step, we can successfully issue new certificates.

Bruce5051 · January 20, 2023, 5:32pm

Hello @natiueno, welcome to the Let's Encrypt community.

I see here https://tools.letsdebug.net/cert-search?m=domain&q=fl-lbadv-jan19.testing.prod.toratest.com&d=168 a large number of certificates issued since Jan. 14, 2023.
Please use the Staging Environment for testing.
It seems the domain testing.prod.toratest.com gets a new subdomain daily. Also, the domain name testing.prod.toratest.com suggests that it is for testing. You may also be hitting the Rate Limits.

natiueno · January 20, 2023, 5:40pm

Thank you for your quick follow-up.
Got it, I'll use the Let's Encrypt staging endpoint for our staging environment. We also test our production system, and we need to use the Let's Encrypt production endpoint there on the release date of our new system.
We also see the same error for our customers' domains (sorry, I can't share those here without their permission).

Usually we get a rate limit error when we hit a limit, so our system waits until the specified time and then retries. This time it fails, because our code treats 404 as a non-temporary error and won't retry.
We can change our system to retry on 404, but if this is a rate limit issue we would like to see a rate limit error.

rg305 · January 20, 2023, 5:52pm

I think 404 is 404.
But perhaps it is being used for such failures...

Is there a reason why the HTTP challenge requests are being redirected to HTTPS?

natiueno · January 20, 2023, 5:55pm

Thank you for your follow-up.

Could you elaborate? We only use Let's Encrypt's HTTPS endpoint, and we use DNS-01 for validation. Also, I believe the authorization URLs in the new-order response use https.

rg305 · January 20, 2023, 6:25pm

Sorry, I missed that point.

rg305 · January 20, 2023, 6:33pm

Possibly related:
fl-lbadv-jan19.testing.prod.toratest.com | DNSViz

mcpherrinm · January 20, 2023, 6:39pm

I haven't examined your case, but as general advice: Let's Encrypt production is geographically distributed to two redundant locations, each containing multiple database replicas. There is a known problem with this topology: it is possible to create a resource (like an account, authorization, or order), and then have a subsequent read operation hit a read-replica which is not caught up. We do not allow our replication lag to exceed 1 second, but that does leave a window of opportunity for incorrect 404s for just-created resources.

If you're writing your own client, I'd recommend adding retries to most API calls. While this should be rare, and we are working on making it rarer or not present, it is probably worth handling. Ensuring your HTTP client is reusing connections will also help stay in the same datacenter.

Note that our staging environment is much less distributed, much smaller, and generally less likely to exhibit these behaviors.
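
For anyone writing their own client, the retry advice above could look roughly like the sketch below. It is only an illustration: `fetch_fresh_resource` and its parameters are made-up names, not part of any ACME library, and `fetch` stands in for whatever function performs your signed request and returns the HTTP status plus the parsed JSON body. It retries only on 404, with a backoff that starts at about the replication window mentioned above.

```python
import time
from typing import Callable, Tuple


def fetch_fresh_resource(
    fetch: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call fetch() until it stops returning 404, backing off between tries.

    Hypothetical helper. Use it only for resources the client created
    moments ago; a 404 on an old URL is most likely genuine.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 404:
            # Success or a different error: let the caller decide what to do.
            return status, body
        if attempt < max_attempts:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise LookupError(
        f"still 404 after {max_attempts} attempts: {body.get('detail')}"
    )
```

If `fetch` sends its requests over one shared connection or session (for example a single `requests.Session`), the connection reuse mentioned above comes along for free.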

natiueno · January 20, 2023, 6:41pm

Thanks. I should have noted this at first.

natiueno · January 20, 2023, 6:41pm

Thank you so much for the clarification. We will update our client to retry on 404.

Osiris · January 20, 2023, 6:47pm

But the 404 malformed error message doesn't really invite retrying, methinks?

Isn't it possible to reply with the order only once the replica has confirmed the existence of the authz? If it's usually very fast, waiting for it shouldn't really impact much, right?

jvanasco · January 20, 2023, 6:50pm

Beyond that, I suggest logging all API calls and errors. With our custom client, we decided to log API calls to throttle on our end (if we know something will be rate-limited, just delay until it won't be!), and also log errors in a manner that allowed us to quickly replicate and test issues.

We typically sleep for at least 1 second on all authorizations to handle DNS-01 updates, as not applying that to HTTP-01 was a chunk of extra work. Until @mcpherrinm's post above, I had a ticket to remove that as a bug, but now I'm leaving it in as a feature to get around ISRG's replication.
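
A rough sketch of the "log it and wait at least a second" idea, in case it helps. This is not our actual client code; `CreationDelay` and the logger name are invented for the example. It records when a URL was handed back by the server and sleeps only for whatever is left of the minimum age before the first read.

```python
import logging
import time
from typing import Callable, Dict

log = logging.getLogger("acme_client")  # hypothetical logger name


class CreationDelay:
    """Remember when a URL was created and wait before reading it."""

    def __init__(
        self,
        min_age: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_age = min_age
        self._clock = clock
        self._sleep = sleep
        self._created: Dict[str, float] = {}

    def note_created(self, url: str) -> None:
        """Record (and log) that the server just returned this URL."""
        self._created[url] = self._clock()
        log.info("created %s", url)

    def wait_until_readable(self, url: str) -> float:
        """Sleep until url is at least min_age old; return seconds slept."""
        created = self._created.get(url)
        if created is None:
            return 0.0  # unknown URL: nothing to wait for
        remaining = self.min_age - (self._clock() - created)
        if remaining <= 0:
            return 0.0
        log.info("waiting %.2fs before reading %s", remaining, url)
        self._sleep(remaining)
        return remaining
```

Call `note_created()` for each authorization URL in the new-order response, then `wait_until_readable()` right before loading it.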

natiueno · January 20, 2023, 6:55pm

Thank you for your suggestion. We will add a wait before loading authorizations too, and log all API calls.

mcpherrinm · January 20, 2023, 9:02pm

If you're interested in following along, here's the bug documenting this issue:

github.com/letsencrypt/boulder

Reduce inaccurate 404s by adding structure to ACME object URLs

opened 07:26PM - 09 Dec 22 UTC

jsha

Now that we're doing more with sending traffic to replicas, we have a problem where sometimes a user creates an object (account, order, authzs), and then immediately fetches that object. If they hit a replica for the fetch, and the replica is lagged, they might get a 404. At a minimum we'd like to turn these 404s into something more informative that indicates a retry might succeed (like a 5xx series error). We might also like to route such prospective 404s to the primary DB. But we don't want to send _all_ 404s to the primary DB because that would be too much junk traffic. We can start incorporating creation timestamp into the URLs of objects we create. This will allow us at request time to determine that an object was recently created, and query the primary DB if the query to the replica returns no answer. To ensure that the timestamp is meaningful and not random junk, we can add an HMAC (ht @mcpherrinm). Adding structure to ids will also be useful as we start to horizontally shard. It will be useful to know which shard an object was created in, or which datacenter.
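
To make the timestamp-plus-HMAC idea concrete, here is a minimal sketch. It is purely illustrative and is not Boulder's actual ID format: the `created.random.tag` layout, the function names, the 5-second window, and the truncated tag length are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def make_id(key: bytes, now: Optional[float] = None) -> str:
    """Build an id of the form '<created>.<random>.<tag>' (assumed layout)."""
    created = int(time.time() if now is None else now)
    payload = f"{created}.{secrets.token_hex(8)}"
    # Truncated tag keeps the id short; a real design would pick its own length.
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{payload}.{tag}"


def created_recently(
    key: bytes,
    object_id: str,
    window: float = 5.0,
    now: Optional[float] = None,
) -> bool:
    """True if the id carries a valid tag and a timestamp within the window."""
    parts = object_id.split(".")
    if len(parts) != 3:
        return False
    payload = f"{parts[0]}.{parts[1]}"
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(expected.encode(), parts[2].encode()):
        return False  # timestamp was not issued by us, treat it as junk
    current = time.time() if now is None else now
    return 0 <= current - int(parts[0]) <= window
```

A server could call `created_recently()` after a replica miss and fall back to the primary DB only when it returns true, so that random junk URLs still get a plain 404 from the replica.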

mcpherrinm · January 20, 2023, 9:30pm

This is definitely a bug and shouldn't happen! And we'd like to fix it. But as a practical matter, as a Let's Encrypt client, it's possible to work around.

system · February 19, 2023, 9:31pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
