In serious trouble: "too many currently pending authorizations"

Right now we have dozens of customers who have been waiting nearly a week for SSL certs, but more importantly, we’re 14 days away from production certs expiring for thousands of existing customers.

We occasionally run into this message with our old ACME v1 certbot client:

There were too many requests of a given type :: Error creating new authz :: too many currently pending authorizations: see https://letsencrypt.org/docs/rate-limits/

It can usually be solved with the clear-authz script. However, our logs seem to have rolled over and lost the messages that clear-authz needs in order to work. We therefore cannot clear these ourselves and must wait out the full 7-day duration.

However, it’s now been 8 days.

We’ve been getting this “too many currently pending authorizations” error steadily since Dec 31st, without any apparent break. Today is Jan 8th. That is 8 full days without a break from this rate limit. By the time we noticed it and tried to run our clear-authz script, it was already Jan 6th, and the logs clear-authz needs had rolled away.

You will undoubtedly criticize our use of an old ACME v1 client, which is fair criticism. We have finally gotten internal approval to seriously enhance or completely replace it with a modern client as a Q1 goal for 2020. Right now, however, we’re in serious trouble and just need to get past this rate limit.

Note: we are hitting this rate-limit ONLY in your staging environment. Our system uses your staging environment for our production solution (yes yes, another thing we’ll have to change very soon). We’re currently scrambling to get it to use production only.

If it’s urgent, can you just rotate to a new ACME account?

Thank you, yes we tried that. The CLI account creation tool informed us that Acme V1 accounts are not allowed to be created anymore.

Oh yeah - I forgot about that. Oof, that’s a tough one. :frowning:

My primary befuddlement right now is that we’ve been locked out due to rate-limit, non-stop, for (apparently) 8 days, which should simply never happen. I’m wondering if something has changed in the logic calculating that rate-limit in the staging server recently.

... however, our logs seemed to have rolled away any of the relevant log messages necessary for clear-authz to function successfully

I'm no longer certain that clear-authz is failing due to logs being rolled away. We've just had a partially successful run that produced logs I would expect clear-authz to pick up on, and it does not.

Has anything changed in the last year or more that would cause clear-authz to stop working or stop parsing logs correctly?

It's possible. There was a change to authz URLs some time ago which could have broken the regex the tool uses. I don't have an active ACME v1 account so I couldn't tell you whether that's the case.

You might have to update this line to say /authz-v3/ rather than /authz/.
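To illustrate the kind of change involved, here is a minimal sketch of a pattern that accepts both path styles. It is not the actual code from clear-authz: the `AUTHZ_URL` pattern, the `find_authz_urls` helper, the host name, and the sample log lines are all hypothetical.

```python
import re

# Hypothetical pattern: accepts the older /acme/authz/<id> path as well as
# the newer /acme/authz-v3/<id> path. Host and path layout are assumptions.
AUTHZ_URL = re.compile(r"https://[\w.-]+/acme/authz(?:-v3)?/[\w-]+")

def find_authz_urls(log_text):
    """Return the unique authz URLs found in log_text, in first-seen order."""
    seen = {}
    for url in AUTHZ_URL.findall(log_text):
        seen.setdefault(url, None)
    return list(seen)

# Made-up log excerpt, for illustration only.
sample_log = (
    "POST https://acme-staging.example/acme/new-authz\n"
    "Location: https://acme-staging.example/acme/authz/abc123\n"
    "Location: https://acme-staging.example/acme/authz-v3/456789\n"
    "Location: https://acme-staging.example/acme/authz-v3/456789\n"
)
print(find_authz_urls(sample_log))
```

A pattern that only knows about `/authz/` would silently skip the `authz-v3` lines, which would look exactly like "no relevant log messages found".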

Thank you, that got us to a point where it's successfully parsing our logs and finding authz. However, we're unsure if the rest of the steps in this tool are working.

Our output right now is simply:

Checking 199 authzs to see if they are pending ...

And then clear-authz completes. I expected more output, so I'm not confident about how many of those authz are pending, or whether the remaining steps ran at all.

If any were pending, it would report it.

So all 199 are probably valid, invalid or expired.

You can also confirm by just visiting those URLs in a browser.
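If opening 199 URLs by hand is impractical, the same check can be scripted. The sketch below assumes each authz URL returns a JSON object with a `status` field, as ACME authorization objects do. The `count_statuses` helper and the injected `fetch` function are hypothetical, and the canned responses stand in for real HTTP GETs so the example runs offline.

```python
import json
from collections import Counter

def count_statuses(authz_urls, fetch):
    """Tally the "status" field of each authz object.

    fetch is a caller-supplied function (hypothetical here) that takes a URL
    and returns the response body as a JSON string.
    """
    counts = Counter()
    for url in authz_urls:
        body = json.loads(fetch(url))
        counts[body.get("status", "unknown")] += 1
    return counts

# Canned responses standing in for real HTTP requests.
fake_responses = {
    "https://example.test/authz/1": '{"status": "valid"}',
    "https://example.test/authz/2": '{"status": "pending"}',
    "https://example.test/authz/3": '{"status": "invalid"}',
}
counts = count_statuses(fake_responses, fake_responses.__getitem__)
print(counts["pending"], "pending of", sum(counts.values()))
```

In real use, `fetch` would be replaced with whatever HTTP client you already have; a zero count for `pending` would match the tool's silence.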

There have been no changes to how the pending authorization rate limit is calculated in staging or production.

Sorry, but it sounds like there isn't anything that can be done here. You'll need to transition to ACME v2 or address the authz leak and wait 7 days for your existing pending authorizations to expire. Good luck.

You’ll need to ... wait 7 days for your pending authorizations to expire

That was our plan a week ago. I made this forum post on the 8th day, which is why I was so concerned. From our end it appeared very likely that your staging environment wasn't honoring the 7-day expiry, though I'm open to another explanation involving a fault in our system. We grepped all of our logs from the past several days to see whether the rate limit opened up and then closed again due to new failures and new pending authz, and never saw that happen. However, we did lose the first couple of days' worth of logs, so it may have happened in that window.

Anyhow, last night we lucked out and found a very old ACME v1 account sitting dormant on an unused server, and switched to that in production. We've now created certs for all our new customers, but haven't yet tested renewal. We've got 13 days to get that sorted out and will start testing today.

The most likely explanation is that your system continued to leak pending authorizations throughout the rate limit period.

I retried staging again today and it is still blocked with the “too many currently pending” message.

Today is January 9th and this started on Dec 31st. We did manually parse all of our logs all the way back through January 4th or 3rd (can’t recall and logs rolled away from us since then) and confirmed that no more pending authz were “leaked” during that time frame.

The most likely explanation seems to be one of these:

  • Our quota opened up and we leaked again between Jan 1st and Jan 3rd, but we can’t confirm that from the logs.
  • Our manual log parsing technique (used while clear-authz was broken) was flawed.

If the unlikely happens and we’re still rate limited in a few days, I’ll swing back in. Until then, we’re hitting production and potentially using up that quota, which is scary for us, like riding on a reserve parachute :stuck_out_tongue:

For posterity:

By January 15th, our staging account had become available again. The only reasonable explanation is that during the 2 or 3 days for which we had no logs, we leaked further authz and restarted the 7-day wait (for a total of ~15 days of rate-limit lockout).
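For anyone wanting to sanity-check that reasoning, here is a small sketch of the arithmetic. It assumes each pending authz expires 7 days after creation and that the account is blocked while the pending count is at or above the limit. The `unblock_time` helper, the tiny limit of 3, and the dates are hypothetical, chosen only to show how a second batch of leaks restarts the wait.

```python
from datetime import datetime, timedelta

def unblock_time(created, limit, lifetime=timedelta(days=7)):
    """Earliest time the pending count drops below limit, assuming no new leaks.

    created: creation times of authz that are still pending.
    Returns None if the account is already under the limit.
    """
    if len(created) < limit:
        return None
    expiries = sorted(t + lifetime for t in created)
    # After k expiries, len(created) - k authz remain. The count is below
    # the limit once k = len(created) - limit + 1, i.e. at index k - 1.
    return expiries[len(created) - limit]

# Hypothetical first batch: the limit is hit on Dec 31st.
first = [datetime(2019, 12, 31)] * 3
print(unblock_time(first, limit=3))   # 2020-01-07 00:00:00

# Hypothetical second batch leaked right after the quota reopened.
second = [datetime(2020, 1, 7), datetime(2020, 1, 7), datetime(2020, 1, 8)]
print(unblock_time(second, limit=3))  # 2020-01-14 00:00:00
```

Two back-to-back 7-day waits like this add up to roughly two weeks of lockout, which is consistent with what we observed.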

We were able to survive by running on the production account only, and by running the clear-authz script after EVERY run rather than only when the rate limit is observed.

We’re beginning our effort to move to ACME v2 this month.
