Half a dozen errors tonight saying that the server was busy

I'm pretty sure the problem has nothing to do with Synology.

I'm having the same problem tonight, from a macOS machine. I've gotten at least half a dozen "server busy" errors in a row so far. Each run validates anywhere from three to a dozen domains before giving up. I finally got it to work on the ninth attempt, and nearly all of the failures were server busy errors. (I also have some weird DNS flakiness with one domain that seems to be DNSSEC-related.)

I have my system set up to retry once a day, and that's not enough anymore. It had been trying unsuccessfully, and I was within a week of not having a valid cert. Things have never been nearly this bad in the past. This looks to me like a real capacity problem that needs to be addressed ASAP.

Hi @dgatwood,

Do you know at what time it's trying? Is it at the top of the hour, or midnight in your local time zone or UTC, by any chance?

Let's Encrypt has been having some capacity problems specifically at midnight (UTC and perhaps one or two other time zones) because so many clients request renewals precisely then. There are some other recent threads about this. The trouble is that the request volume at those times is dozens or maybe hundreds of times the request volumes at other times. It's hard to address that by adding capacity when the volume is so spiky and when it could potentially be made less spiky as software developers follow Let's Encrypt's requests to add slightly more randomness into the timing of renewal requests.
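To illustrate the "slightly more randomness" idea, here is a minimal sketch of how a wrapper script might pick a renewal time that avoids the top of the hour. The function names and the one-hour default are assumptions made for illustration; they are not taken from any particular ACME client.

```python
import random


def jittered_delay_seconds(max_jitter_seconds=3600, rng=random):
    """Return a random delay in [0, max_jitter_seconds] to sleep before renewing."""
    return rng.uniform(0, max_jitter_seconds)


def pick_daily_slot(rng=random):
    """Pick a random (hour, minute, second) for a daily job, never minute 0."""
    hour = rng.randrange(24)
    minute = rng.randrange(1, 60)  # skip minute 0, i.e. the top of the hour
    second = rng.randrange(60)
    return hour, minute, second


if __name__ == "__main__":
    print("sleep this long before renewing:", jittered_delay_seconds())
    print("or schedule the daily job at:", pick_daily_slot())
```

Either approach works: sleep for a random delay at the start of each scheduled run, or choose a random slot once at install time and keep it.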

If you're getting a server busy error at a time that's not the top of an hour, it would be great to see some more log information about that because it could reflect a different, previously unknown issue.

Edit: It sounds like you were trying, at least in part, by manually running a Let's Encrypt client? Did you have the bad luck to do so right at the top of an hour?


Would you please start a new thread? This thread looks very much like a problem in the Synology client. Your problem may have a similar symptom but could have a different cause.


I've split these posts into a new thread.

Notably, the Synology thread this was split from was from before we set the global rate limit that I suspect @dgatwood is running into.

There is more information in the API announcement message here: New "Service Busy" responses beginning during high load


Honestly, I have no idea when it was trying. My guess would be exactly 24 or 25 hours after the last time it ran, depending on random luck. :smiley:

I have a weird server setup, so I wrapped the client with a couple of hundred lines of custom shell scripts to make it all work correctly. It's held together by duct tape.... :joy:

Those outer scripts run about once an hour, check a file to see if the last run was more than a day ago, and run the acme tool if it was. I stopped checking the exit status of the tool, because I kept running into problems where, if something was wrong, I'd use up my quota of retries and have to wait a day after I got a "warning, your zone is about to expire" message before I could diagnose it. :frowning:
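For readers following along, the "check a file, run if more than a day old" logic described above could look roughly like this. It is only a sketch: the actual wrapper is a set of shell scripts that are not shown in this thread, so the function names and the stamp-file approach are assumptions.

```python
import os
import time

DAY_SECONDS = 24 * 60 * 60


def is_due(stamp_path, now=None, interval=DAY_SECONDS):
    """True if the stamp file is missing or was last touched more than `interval` seconds ago."""
    now = time.time() if now is None else now
    try:
        last_run = os.path.getmtime(stamp_path)
    except OSError:
        return True  # no stamp file yet, so the job has never run
    return now - last_run > interval


def mark_run(stamp_path):
    """Create the stamp file if needed and set its modification time to now."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)
```

An hourly job would call `is_due(...)`, run the ACME tool if it returns true, and then call `mark_run(...)`. Note that a once-an-hour trigger tends to fire at the same minute every hour, which matters if that minute happens to be the top of the hour.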

The problem I was encountering running it by hand was from about 8:30 to 8:45 Pacific time.


Had you started a new thread you would have been shown the form questions below. We'll need more info if you want us to look at it further. Thanks

A domain name that is failing will be most helpful.

================================

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. https://crt.sh/?q=example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is:

I ran this command:

It produced this output:

My web server is (include version):

The operating system my web server runs on is (include version):

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don't know):

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

The version of my ACME client is (e.g. output of certbot --version if you're using Certbot):


www.mklinux.org was the most common failing domain out of the set.
certbot 0.10.1
Domains: darwin-development.org git.gatwood.net infiniteloopfilms.com infiniteloopfilms.net mklinux.org shellscriptgames.com siliconvalleyrecords.com svrecords.com techmagazine.org thirty-six.net ucscwindensemble.org www.darwin-development.org www.infiniteloopfilms.com www.infiniteloopfilms.net www.mklinux.org www.shellscriptgames.com www.siliconvalleyrecords.com www.songcue.com www.svrecords.com www.techmagazine.org www.thirty-six.net www.ucscwindensemble.org www.xyztelevision.com xyztelevision.com homeserver.gatwood.net songcue.com gatwood.net www.gatwood.net

Most common failure was with www.mklinux.org, but not always.

Command was
python /etc/letsencrypt/acme-tiny/acme_tiny.py --account-key /etc/letsencrypt/account_keys/account.key --csr "$OUTPUTDIR"/temp.csr --acme-dir "/etc/letsencrypt/acme-challenge/" > "$OUTPUTDIR"/temp.crt

The output was an error that the service was busy, try again later.

Hmm. It's interesting that one also has the most DNS errors at dnsviz.net. Not sure what to make of that. Just hmmm.

Do you have the exact error message? It helps ensure we look at the right component.

Are you retrying immediately after a failed attempt? The usual limit on failures is 5 per hour per account and hostname, so that would mean at most a one-hour wait. If your client constantly makes failed requests, you could get blocked by LE. See the Rate Limits documentation.
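As a rough illustration of retrying without hammering the server, here is a sketch of a wrapper that spaces out attempts with exponential backoff and keeps the last result so the exact error text can be logged. The attempt count and delays are assumptions chosen to stay under the 5-failures-per-hour figure mentioned above; they are not values recommended by Let's Encrypt.

```python
import subprocess
import time


def run_with_backoff(cmd, max_attempts=4, base_delay=300,
                     sleep=time.sleep, run=subprocess.run):
    """Run cmd until it exits 0 or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts. Returns the result of the last attempt, so the caller can
    log result.stderr when the final attempt also failed.
    """
    result = None
    for attempt in range(max_attempts):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return result
```

With the defaults, four attempts are spread over 35 minutes (waits of 5, 10, and 20 minutes), and the captured `stderr` of a failed run gives the exact message to paste into a thread like this one.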

Note that I see you got a cert about 24 hours ago, so an LE block is not active.


This is incredibly old and not recommended for usage.

This is a very stripped down client that lacks proper error handling and cleanup functions.

Your cert has 27 domain names on it. If you hit a failure during renewal, your client will leave "pending authorizations", which are rate-limited for a week, for all the other domains. A proper client will disable the pending authorizations as part of a cleanup phase. It is very common for multi-domain certificate renewals to spiral out of control, and wedge accounts with rate limits due to this.

You should consider switching clients. You should also consider dropping down to one certificate per registered domain (e.g. bare + www).
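To make the "one certificate per registered domain" suggestion concrete, here is a sketch that splits a flat list of names into per-domain groups. It assumes the registered domain is simply the last two labels, which holds for the .com/.net/.org names in this thread but not for suffixes such as .co.uk; a real tool would consult the Public Suffix List instead.

```python
from collections import defaultdict


def group_by_registered_domain(names):
    """Group hostnames by their last two labels (a simplifying assumption)."""
    groups = defaultdict(list)
    for name in names:
        labels = name.lower().rstrip(".").split(".")
        registered = ".".join(labels[-2:])
        groups[registered].append(name)
    return {domain: sorted(hosts) for domain, hosts in groups.items()}


if __name__ == "__main__":
    sample = ["mklinux.org", "www.mklinux.org",
              "gatwood.net", "www.gatwood.net", "git.gatwood.net"]
    for domain, hosts in group_by_registered_domain(sample).items():
        print(domain, "->", " ".join(hosts))
```

Each group would then get its own CSR and certificate, so a validation failure on one name only affects that small certificate rather than the whole set.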

