After renewing roughly 10,000 domains, I get "Problem getting authorization" or "Failed to get registration by key"


#1

My domains are:
We host about 1 million domain names, so daily renewals add up to quite a lot.
Now I’m running into an issue where I sometimes get 500 errors after generating/renewing roughly 10k domains.

The latest domains I got this for are:

I ran this command:
We use resty-auto-ssl with dehydrated, so in the case of passshoot.com:
/usr/local/bin/resty-auto-ssl/dehydrated --cron --accept-terms --no-lock --domain passshoot.com --challenge http-01 --config /efs/resty-auto-ssl/letsencrypt/config --hook /var/undeveloped/letsencrypt_hooks

It produced this output:
{
"type": "urn:acme:error:serverInternal",
"detail": "Problem getting authorization",
"status": 500
}
or
{
"type": "urn:acme:error:serverInternal",
"detail": "Failed to get registration by key",
"status": 500
}

Mostly I get “Problem getting authorization”; rerunning the same command does generate the certificate, though.

My web server is (include version):
nginx version: openresty/1.13.6.2

The operating system my web server runs on is (include version):
ubuntu 14.04

My hosting provider, if applicable, is:
I can login to a root shell on my machine (yes or no, or I don’t know):
yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel):
no


#2

https://letsencrypt.status.io/ says everything is fine, so I think a 500 error needs @lestaff attention.


#3

@roy_undeveloped Do you still sometimes get 500 errors?

Ping @cpu: {"type": "urn:acme:error:serverInternal", "detail": "Problem getting authorization", "status": 500}


#4

@tdelmas Thanks for the tag. I’ll take a look.

@roy_undeveloped I haven’t looked deeply at the logs yet but I can suggest a few things right off the bat:

  1. You must be using an out-of-date version of Dehydrated: it isn’t sending a distinct user-agent, just “curl/7.35.0”. Can you please update?
  2. You don’t have a contact address associated with your ACME account. That’s really helpful and strongly recommended for someone issuing at the volume you are.
  3. You’re failing an awful lot of HTTP-01 challenges because of an NXDOMAIN response when looking up an address for the domain name. That’s definitely something your software should pre-check before using up rate limits on POSTing challenges that are bound to fail. I see 104,696 challenges that failed with this error in the past 7 days alone.
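To make point 3 concrete, here is a minimal Python sketch of a pre-check that drops names which don’t resolve before any challenge is requested. It is not part of Dehydrated or resty-auto-ssl; `resolvable_domains` and `default_resolver` are hypothetical names, and it assumes the local system resolver (via `socket.getaddrinfo`) is a reasonable proxy for what the CA will see, which is not guaranteed.

```python
import socket
from typing import Callable, Iterable, List

# Hypothetical resolver signature: takes a hostname and returns a (possibly
# empty) list of addresses, raising socket.gaierror when the lookup fails.
Resolver = Callable[[str], list]


def default_resolver(name: str) -> list:
    """Resolve via the system resolver; raises socket.gaierror on NXDOMAIN etc."""
    return socket.getaddrinfo(name, 80, proto=socket.IPPROTO_TCP)


def resolvable_domains(domains: Iterable[str],
                       resolve: Resolver = default_resolver) -> List[str]:
    """Return only the domains that resolve to at least one address."""
    ok = []
    for name in domains:
        try:
            if resolve(name):
                ok.append(name)
        except socket.gaierror:
            # Lookup failed (e.g. NXDOMAIN): skip the domain instead of
            # POSTing an HTTP-01 challenge that is certain to fail.
            continue
    return ok
```

Something like `to_renew = resolvable_domains(candidates)` would then feed only resolvable names to the client. A name that passes this check can still fail validation, so this only filters out the guaranteed failures.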

I suspect that the rapid accumulation of pending and failed authorizations on your ACME account is tickling a performance problem in Boulder. We’ll investigate this further. There aren’t any widespread issues at the moment.


#5

@roy_undeveloped More feedback: the HTTP-01 challenges you’re failing with NXDOMAIN errors are also being retried in extremely quick succession. For example, for one domain, kompispriser.se, you:

  1. Created an authz at 08/11/2018 16:26:20.924
  2. POSTed the HTTP-01 challenge, and failed it because of an NXDOMAIN error at 08/11/2018 16:26:21.919
  3. Created another authz for the same name at 08/11/2018 16:26:26.474, ~4s after failing the previous.
  4. POSTed the HTTP-01 challenge, and failed it because of the same NXDOMAIN error at 08/11/2018 16:26:27.472
  5. Created another authz for the same name at 08/11/2018 16:26:30.558, ~3s after failing the previous.
  6. POSTed the HTTP-01 challenge, and failed it because of the same NXDOMAIN error at 08/11/2018 16:26:31.523
  7. Created another authz for the same name at 08/11/2018 16:26:34.761, ~4s after failing the previous.
  8. POSTed the HTTP-01 challenge, and failed it because of the same NXDOMAIN error at 08/11/2018 16:26:35.740.
  9. That’s it! It seems like the process gave up and we’ve seen no further attempts to issue for this domain.

If you imagine the same thing happening for many of your roughly 10,000 domains, I can start to see why this would strain our systems and cause occasional 500 errors (about 93 in the past 7 days).

I would strongly encourage you to look into how you can pre-check your own domains so that the HTTP-01 challenge is more likely to succeed before you POST it. Please also consider exponential back-off between failed challenge attempts. Very few error conditions are resolved within 20s of the first failure.
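A minimal Python sketch of the exponential back-off suggested above. `attempt` is a hypothetical callable standing in for whatever requests and validates one authorization; the 30s base, doubling factor, 1-hour cap and retry count are illustrative assumptions, chosen only so that even the first retry falls well outside the ~20s window mentioned above.

```python
import random
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 30.0, factor: float = 2.0,
                   cap: float = 3600.0, retries: int = 5) -> Iterator[float]:
    """Yield the wait (seconds) before each retry: base, base*factor, ... capped at cap."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def retry_with_backoff(attempt: Callable[[], bool], base: float = 30.0,
                       factor: float = 2.0, cap: float = 3600.0, retries: int = 5,
                       sleep: Callable[[float], None] = time.sleep,
                       jitter: bool = True) -> bool:
    """Call attempt() until it returns True; wait with exponential back-off
    between failures and give up after `retries` retries."""
    if attempt():
        return True
    for delay in backoff_delays(base, factor, cap, retries):
        if jitter:
            # Randomize within [delay/2, delay] so many domains don't retry in lockstep.
            delay = random.uniform(delay / 2, delay)
        sleep(delay)
        if attempt():
            return True
    return False
```

With the defaults and jitter disabled, the waits are 30, 60, 120, 240 and 480s before the domain is given up on, rather than three new authorizations within about 15s.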


#6

@cpu - Thank you very much for the quick replies and getting to the root of the issues:
I have a couple of follow-up questions:

  • You said we don’t have a contact address associated with our ACME account.
    Can I just update that without causing trouble for the renewal of our current certificates? (We currently have about 1.2 million certificates.)
  • You mentioned checking beforehand whether a domain resolves properly in DNS.
    I already do this for every domain before trying to renew it, using 1.1.1.1. Do you perhaps use a different DNS server that I could also use for pre-checking?

By the looks of it, it was a worker from lua-resty-auto-ssl that kept trying to renew an old certificate.
We already have a separate worker in place because the lua-resty-auto-ssl one wasn’t keeping up with the number of domains we have.
I’ve now stopped the lua-resty-auto-ssl worker completely, so the repeated brute-force retries won’t happen again.

Regards,
Roy


#7

There’s support in the ACME protocol for updating your registration with new contact information. It looks like Dehydrated supports this with the --account command:

 --account                        Update account contact information

Updating the contact information shouldn’t affect renewals unless there is a bug with the client.

We run our own recursive resolver with a low max TTL (the configuration is comparable to this). Running your own Unbound instance with this configuration is a good way to ensure that the required records resolve and don’t run afoul of DNSSEC or other quirks. Short of that, I’d recommend that you directly check that all of the authoritative nameservers for the domain return the correct address and, if the domains are brand-new registrations, that the required TLD records are present.
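The authoritative-nameserver check can be sketched like this in Python. The DNS lookup itself is left as a hypothetical `query(ns, name)` callable (it could be built on a DNS library such as dnspython) so the comparison logic stays library-agnostic; how you discover the domain’s NS set, and treating an empty set as a missing delegation, are assumptions about how you would wire it up.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical: query(ns, name) returns the set of addresses that nameserver
# `ns` returns for `name` (an empty set for NXDOMAIN / no records).
Query = Callable[[str, str], Set[str]]


def check_authoritative(name: str, nameservers: Iterable[str],
                        expected: Set[str], query: Query) -> Dict[str, Set[str]]:
    """Return {nameserver: answer} for every authoritative server whose answer
    differs from the expected address set. An empty dict means all agree."""
    servers = list(nameservers)
    if not servers:
        # No NS records found at all: likely a missing delegation at the TLD.
        raise ValueError(f"no authoritative nameservers found for {name}")
    mismatches = {}
    for ns in servers:
        answer = query(ns, name)
        if answer != expected:
            mismatches[ns] = answer
    return mismatches
```

A domain would only be queued for renewal when this returns an empty dict, so a single stale or misconfigured nameserver is caught before a challenge is POSTed.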

Great, thank you!


#8

Again, thank you for the quick replies.

  • I also found the --account option and have already updated our email address.
  • I will definitely look into Unbound and the configuration you gave.