What’s creating Pending Authorizations?

Hello,

We’re a hosting provider who has been provisioning Let’s Encrypt certificates for use by our customers for about 2 years now. Everything has been working fine with provisions and renewals until earlier this month, when we started getting Too Many Pending Authorizations rate limiting errors.

We are running certbot from another (proprietary) script to handle renewals and cert provisioning, using certonly issuance, webroot validation, and managing web server configuration with our own tool set. We’re using a single Let’s Encrypt account across all of our (1500+) servers.

I’ve had success using LE_FIND_PENDING_AUTHZ.py to eliminate our pending authorizations. However, for some reason they are coming back quite quickly, and we’d like to track down the root of the problem. Canceling pending authorizations is troublesome due to the high number of authorizations we create in the 7 day window, so we’d like to figure out how to prevent this from happening in the first place.

Now that we’re having this problem, using a single account for all of our hosting servers feels like it may not have been the best choice. However, at least it makes it easier for us to clear out the authorizations centrally rather than having to manage it across many different accounts.

Regarding “too many pending authorizations”: I’ve read “That Should Never Happen” as long as we use certbot and don’t issue certs with a high number of SAN domains (we only issue at most one SAN for www along with the main domain name); but it is happening.

My initial suspicion was: if our script that calls certbot times out and kills off certbot in the middle of issuance, could this leave pending authorizations open? What about when certbot is processing renewals? Can authorizations be created by failed cert provisioning or failed cert renewal attempts when certbot completes without interruption?

We’ve drastically increased our timeouts and added instrumentation to help us determine if timeouts are the source of problems in the future, but I don’t have logs on whether this happened in the past.
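For readers who want to instrument a wrapper in the same way, here is a minimal sketch of running a child process with a timeout and logging whether it was killed. It is not our proprietary script: the function name run_with_timeout, the RunResult type, and the logger name are hypothetical, and only standard-library calls are used.

```python
import logging
import subprocess
import time
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("cert_wrapper")  # hypothetical logger name


@dataclass
class RunResult:
    returncode: Optional[int]  # None when the child was killed on timeout
    timed_out: bool
    elapsed: float


def run_with_timeout(cmd, timeout_s):
    """Run cmd, kill it after timeout_s seconds, and log which case occurred."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        elapsed = time.monotonic() - start
        # subprocess.run has already killed the child at this point. If the
        # child was certbot in the middle of an order, the authorizations it
        # requested may be left pending.
        log.error("timed out after %.1fs, child killed: %s", elapsed, cmd)
        return RunResult(None, True, elapsed)
    elapsed = time.monotonic() - start
    log.info("exit code %d after %.1fs: %s", proc.returncode, elapsed, cmd)
    return RunResult(proc.returncode, False, elapsed)
```

Logging the timed-out case separately from a non-zero exit code makes it possible to count, after the fact, how many runs were killed part way through.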

If there is any other way that using certbot might create pending authorizations, please tell me. If there is any way to get a list of pending authorizations via ACME, that would be much more efficient than sending requests for thousands of authorizations just to find the 300 that are still pending.

Thanks!
Alan Ferrency

My domain is: (Various, we are a hosting provider)

I ran this command:

certbot certonly --webroot --webroot-path ... --account ... -d example.com -d www.example.com -n
certbot renew --account ... --post-hook ...

It produced this output:

2019-05-28 01:10:06,201:ERROR:certbot.log:There were too many requests of a given type :: Error creating new order :: too many currently pending authorizations: see https://letsencrypt.org/docs/rate-limits/

My web server is (include version): Not relevant because we use certonly? Apache 2.4 and NGINX

The operating system my web server runs on is (include version): Ubuntu, FreeBSD; various versions

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): 0.34.2

Hi @ferrency

If you want to create a new certificate, several steps are required:

  • The client creates a new order
  • The server (Let’s Encrypt) creates a list of authorizations (one per domain name) and a list of challenges (per authorization - http-01, dns-01 etc)
  • The client selects one challenge per authorization and creates the http-01 validating file (the dns TXT entry etc.)
  • If this is done, the client must send a “Hey, I am ready” command to the challenge url
  • Then the server starts to check that specific challenge (sample: http-01 validation) and ignores the other challenges of that authorization
  • Every check has a result - challenge is invalid or valid
  • If one challenge is invalid, the order is invalid
  • If all challenges are valid, the order is ready
  • Then the client can create a CSR and upload it; the server creates the certificate and sends the download url.

Result: If you start new orders but your client doesn’t send the “Hey, I am ready” command, the challenge (and the order) stays pending.
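The steps above can be sketched as a small status model. This is a simplified illustration, not an ACME client: the class and function names are hypothetical, no network requests are made, and the status strings only mirror the ones used in this explanation.

```python
from dataclasses import dataclass, field


@dataclass
class Authorization:
    domain: str
    status: str = "pending"  # stays "pending" until the client responds

    def respond(self, check_passes: bool) -> None:
        """Client says "Hey, I am ready"; the server checks and records a result."""
        self.status = "valid" if check_passes else "invalid"


@dataclass
class Order:
    authorizations: list = field(default_factory=list)

    @property
    def status(self) -> str:
        states = {a.status for a in self.authorizations}
        if "invalid" in states:
            return "invalid"  # one failed challenge fails the whole order
        if states == {"valid"}:
            return "ready"  # the client may now upload a CSR
        return "pending"


def new_order(domains):
    """The server creates one authorization per domain name."""
    return Order([Authorization(d) for d in domains])


def pending_count(orders):
    """Count the authorizations that were requested but never answered."""
    return sum(a.status == "pending" for o in orders for a in o.authorizations)
```

A client that calls new_order and then stops before calling respond leaves every authorization of that order counted by pending_count, which is the situation the rate limit guards against.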

So normally that should never happen. But if your client creates a new order and then times out, you are left with a pending order.

It’s not relevant if it is the first certificate or if it is a renew.

Perhaps Certbot isn’t the right client.

I use my own client (not published). If there were a timeout, the client would find the older order and continue with the next step.

Thanks.

It does sound like killing the certbot process when it’s part way through could cause this problem. The limited logs I’ve collected so far suggest it’s unlikely we timed out 300 times since the last time I cleared out pending authorizations.

I will go try to track down a few authorizations that were left pending, to see where they came from. In the meantime, if anyone knows of any other specific situations with certbot that are known to leave pending authorizations, that could be helpful.

Thanks,
Alan

An update:

I tracked one pending auth back to its initial log file, and found that in this case the renewal conf file was missing its [[webroot_map]] for the specific domains, even though it had a webroot = setting configured. It seems like certbot may not be retracting its pending auths in this case of an erroneous configuration.

(Part of me thinks this is due to a change in certbot at some point, because I have never modified the way we ask to provision certs and yet the problem only started recently.)

That's possible. Certbot tries to create the validation file, that doesn't work, Certbot stops -> the order is pending.

But that's the problem. Certbot may be the wrong client to use with such a configuration.

Yeah, I’ve found additional error cases that are resulting in pending auths: a missing directory or other inability to create the webroot authorization file.

Do you have any recommendations for an alternative to certbot?

I have a plan to find and fix our current problematic configurations, but more flexibility in avoiding these problem cases in the first place seems like a better long-term solution.

Thanks.

No. I've created my own (raw .NET) client without using a library.

But there are a lot of libraries (in different languages) you can use. A lot of the work is local: saving the order information, parsing the order url, creating the challenge file and the CSR.

So the interaction with the ACME-server is only a small part of the job.

With your own client you have a lot of different options.

Clients: https://letsencrypt.org/docs/client-options/

Many pending authorizations is basically a sign of a bug in the client (or buggy behavior due to wack configuration). It’s not a common issue with Certbot these days. You can usually identify why it’s happening by posting full logs from across a few days of scheduled runs.

I do not think that the webroot_map theory is likely. Certbot would just fall back to the default webroot path, causing the authorization to fail upon response (and therefore not be pending).

That's what I thought as well, and it matches what I thought used to happen with missing [[webroot_map]] sections.

However, my logs seem to disagree. I found these logs related to a domain name that also showed up with a Pending Authorization cleared by LE_FIND_PENDING_AUTHZ.py:

2019-05-28 06:40:08,941:INFO:certbot.auth_handler:http-01 challenge for <customerdomain>.com
2019-05-28 06:40:08,943:DEBUG:certbot.error_handler:Encountered exception:
Traceback (most recent call last):
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 69, in handle_authorizations
    resps = self.auth.perform(achalls)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/plugins/webroot.py", line 80, in perform
    self._set_webroots(achalls)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/plugins/webroot.py", line 98, in _set_webroots
    known_webroots)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/plugins/webroot.py", line 119, in _prompt_for_webroot
    webroot = self._prompt_for_new_webroot(domain, True)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/plugins/webroot.py", line 143, in _prompt_for_new_webroot
    force_interactive=True)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/display/ops.py", line 368, in validated_directory
    validator, *args, **kwargs)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/display/ops.py", line 325, in _get_validated
    code, raw = method(message, default=default, **kwargs)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/display/util.py", line 575, in directory_select
    return self.input(message, default, cli_flag)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/display/util.py", line 520, in input
    self._interaction_fail(message, cli_flag)
  File "<snip>/certbot/local/lib/python2.7/site-packages/certbot/display/util.py", line 466, in _interaction_fail
    raise errors.MissingCommandlineFlag(msg)
MissingCommandlineFlag: Missing command line flag or config entry for this setting:
Input the webroot for <customerdomain>.com:

2019-05-28 06:40:08,944:DEBUG:certbot.error_handler:Calling registered functions
2019-05-28 06:40:08,944:INFO:certbot.auth_handler:Cleaning up challenges
2019-05-28 06:40:08,944:DEBUG:certbot.plugins.webroot:All challenges cleaned up
2019-05-28 06:40:08,945:WARNING:certbot.renewal:Attempting to renew cert (<customerdomain>.com) from /etc/letsencrypt/renewal/<customerdomain>.com.conf produced an unexpected error: Missing command line flag or config entry for this setting:
Input the webroot for <customerdomain>.com:. Skipping.

Updating the webroot_map to match the existing webroot configuration item allowed the cert to be renewed correctly.
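A check along these lines could flag such files before a renewal is attempted. This is a rough sketch under assumptions: it presumes the renewal file has an authenticator = webroot line and a [[webroot_map]] subsection with one "domain = path" line per name, the function name check_renewal_conf is hypothetical, and it uses a simple line-based scan rather than Certbot's own parser.

```python
import os


def check_renewal_conf(text, dir_exists=os.path.isdir):
    """Return a list of problems found in the text of one renewal conf file."""
    uses_webroot = False
    in_map = False
    webroot_map = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # Track whether we are inside the (assumed) [[webroot_map]] subsection.
            in_map = line == "[[webroot_map]]"
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if in_map:
            webroot_map[key] = value
        elif key == "authenticator" and value == "webroot":
            uses_webroot = True

    problems = []
    if not uses_webroot:
        return problems
    if not webroot_map:
        problems.append("webroot authenticator but no [[webroot_map]] entries")
    for domain, path in webroot_map.items():
        if not dir_exists(path):
            problems.append(f"webroot for {domain} does not exist: {path}")
    return problems
```

The sketch cannot tell whether every name on the certificate has a map entry, because the domain list is not read here; it only catches an empty map and mapped directories that are missing.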

If this was a recent change in certbot behavior it would explain why I hadn't seen this problem until recently.

Yup, after playing with Certbot for a while, I completely agree with you.

I was misled by some code in the webroot plugin that made me believe that Certbot would use the last value of webroot_path for domains not matched in the webroot_map.

However, it doesn’t work that way. I’m not sure if that is dead code (hasn’t been touched since 2016) or meant for another purpose, but the fallback condition never evaluates if a domain is missing from the map.

@schoen do you know if Certbot meant to fall back in this case, or is the current behavior correct?

I think the current behavior is what was intended by the Certbot developers (since the map was seen as a complete alternative to the path), but still might not be the most preferable behavior. @bmw, what do you think?

I believe the issue you’re seeing with the webroot values in the renewal configuration file is https://github.com/certbot/certbot/issues/7048, which will be fixed in our release next week. It’s a bug that was introduced in 0.31.0 and we’ll get it fixed, but the issue is only caused when reissuing certificates for domains that still have valid authorizations.

Thanks for the update.

I see Using the webroot path %s logs, but only for initial cert provisioning, not during renewals.

I would have guessed that I had some configurations without [[webroot_map]] sections that used to renew correctly, but at this point I can’t find any evidence of a problem that is inconsistent with issue #7048.

If there’s any unresolved issue with certbot here, it is:

Expected behavior: When Certbot fails to renew a certificate because there is no webroot_map setting or because the webroot_map directory does not exist, all pending domain auths should be cleared.

Actual behavior: After one of these failure modes, requested auths are left pending, which can lead to rate limiting issues in sufficiently large deployments.

I’m happy to know how to fix the rate limiting issue. If I have time, I’ll dig into the certbot code to see if there’s anything obvious.

Certbot’s actual behavior is it will never deactivate authorizations.

We could maybe change this and have Certbot deactivate authorizations if it encounters an error between obtaining the authorizations from the CA and sending the challenge responses, but that adds a fair bit of complexity client side, may defeat the benefits of Automatic recycling of pending authorizations server side, and to be honest, is something the core Certbot team wouldn’t have time to implement for quite a while.

Regardless of that, to try and give you some info to help you work around the problem in the meantime, I’ll summarize and add to some of the above info and say:

  1. The webroot issue only occurs if you’re issuing a certificate for a domain that still has a valid authorization. Let’s Encrypt’s authorizations have a 30 day lifetime while its certificates are valid for 90 days, so to hit this webroot issue, you’re issuing another certificate for a domain well before its expiration. This could be due to setting --force-renewal or --renew-by-default on the command line or in the global configuration file, or to having multiple certificate lineages (separate directories in /etc/letsencrypt/live) containing the same domain. Unsetting a command line flag is simple enough, but if it’s the latter problem and you don’t need certificates like that, I recommend running a command like certbot certificates to look for duplicates and then running certbot delete --cert-name <cert name>, where <cert name> is given to you by the certbot certificates command.
  2. Certbot will only leak pending authorizations if it crashes during the window I described above. It crashing at this time could be due to the webroot issue or your scripts killing Certbot. If I were you, once you have the current pending authorization problem solved (either through your own scripts or waiting a week), I’d run certbot renew ... once and look through the log for messages like Attempting to renew cert (example.com) from /etc/letsencrypt/renewal/example.com.conf produced an unexpected error and fix up the problems that occurred.
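To help with the duplicate-lineage check in item 1, the output of certbot certificates can be scanned for domains that appear under more than one certificate name. The following is a sketch under the assumption that the output contains "Certificate Name:" and "Domains:" lines; the function name duplicate_domains is hypothetical, and the output would need to be captured by your own tooling.

```python
from collections import defaultdict


def duplicate_domains(certificates_output):
    """Map each domain to the lineages containing it, keeping only duplicates."""
    lineages = defaultdict(list)
    current = None
    for raw in certificates_output.splitlines():
        line = raw.strip()
        if line.startswith("Certificate Name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Domains:") and current is not None:
            # Assumed layout: domains are separated by whitespace on one line.
            for domain in line.split(":", 1)[1].split():
                lineages[domain].append(current)
    return {d: names for d, names in lineages.items() if len(names) > 1}
```

Any certificate name reported here is a candidate for certbot delete --cert-name, provided that lineage is no longer needed.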

I hope that helps!

Thanks. If reducing leakage of pending auths in certbot isn’t feasible, I have enough of a solution to consider this issue resolved.

Searching for Attempting to renew cert (example.com) from /etc/letsencrypt/renewal/example.com.conf produced an unexpected error and fixing the problems is exactly how I addressed the core issue. Simply issuing a challenge to each pending auth wasn’t a long term solution because our broken pending renewals regenerated all 300 pending auths within a 24 hour period (we request renewals once daily staggered across the server population).
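For anyone else doing the same search across many servers, here is a minimal sketch that pulls these failures out of log text and groups them by reason. The regular expression is modelled on the log lines quoted earlier in this thread; the function names are hypothetical, and the message wording is assumed to match the Certbot version in use here.

```python
import re
from collections import Counter

# Pattern modelled on the WARNING line quoted earlier in this thread.
FAILURE = re.compile(
    r"Attempting to renew cert \((?P<cert>[^)]+)\) from (?P<conf>\S+) "
    r"produced an unexpected error: (?P<reason>.*)"
)


def renewal_failures(log_text):
    """Return (cert name, renewal conf path, first line of the error) tuples."""
    return [
        (m["cert"], m["conf"], m["reason"].strip())
        for m in FAILURE.finditer(log_text)
    ]


def count_reasons(log_text):
    """Count how often each error reason occurs, most useful across many logs."""
    return Counter(reason for _, _, reason in renewal_failures(log_text))
```

Only the first line of a multi-line error message is captured, which is enough to separate the missing webroot case from other failures.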

I am not sure our missing [[webroot_map]] settings are due to the bug you referred to: the renewal conf files were dated as of the most recent successful renewal/issuance, not the most recent failed renewal. We aren’t forcing renewals, although in some cases when a domain name is moved between servers the losing server may end up with a lame /etc/letsencrypt/live file.

In any case, the solution is clear: fix broken configurations and missing webroot directories.

Thanks for your help!

That's the behavior I'd expect. It's kind of a tricky issue that is caused by some implementation details described a bit in the issue and the PR that fixed the problem, but the issue can occur when a certificate is issued using the webroot plugin for domains that already have valid authorizations using a Certbot subcommand other than renew (or using the renew subcommand with webroot settings provided on the command line). That creates the renewal configuration file with missing values, and you'll get the failure and leaked authorizations when certbot renew later attempts to renew the certificate.

Certbot 0.35.0 will fix the problem of dropping information and will be able to automatically fix some cases where information has already been dropped from prior runs, but not all of them. If I were you, I'd upgrade to 0.35.0 as soon as you can when it's out next Wednesday and keep an eye on certbot renew failures/logs.

Another thing you may be able to do to find more of these problems now is to run certbot renew --dry-run, which will attempt to renew all certificates using Let's Encrypt's staging server without writing any changes to /etc/letsencrypt. However, depending on the size of your configuration, this could take quite a while.

Thanks. I’ve created an internal issue to better parse the reason renewals fail, but most likely increased vigilance regarding renewal failure e-mails will avoid almost all of our problems in the future.
