My company is issuing several certs per day under a rate limit agreement, but recently we’ve been hitting the “too many pending authorizations” limit. We’ve gone through all of the authz URLs from our logs and issued a “deactivate” call for each, but the rate limit is still being hit.
Is there a way to get a list of all pending authorizations yet? If not, can we get them cleared somehow?
Does deactivating a pending authorization work? To be honest, I haven't tried that one yet. The recommended way is to trigger a validation attempt so that the authorization goes to a success or fail state - see the last section of the rate limiting policy:
If you have a large number of pending authorization objects and are getting a rate limiting error, you can trigger a validation attempt for those authorization objects by submitting a JWS-signed POST to one of its challenges, as described in the ACME spec. The pending authorization objects are represented by URLs of the form https://acme-v01.api.letsencrypt.org/acme/authz/XYZ, and should show up in your client logs. Note that it doesn’t matter whether validation succeeds or fails. Either will take the authorization out of ‘pending’ state. If you do not have logs containing the relevant authorization URLs, you need to wait for the rate limit to expire. As described above, there is a sliding window, so this may take less than a week depending on your pattern of issuance.
I'm not aware of a way to get all pending authz in case you don't have all of them in your logs. In that case, waiting for their expiration or using a new account are probably the only options.
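Since the policy text says the pending authorization URLs should show up in client logs, a first step is collecting them. Below is a minimal sketch of that, not any client's actual tooling. The function name `extract_authz_urls` is made up for illustration, and the character set allowed in the trailing token is an assumption; only the URL prefix comes from the policy text quoted above.

```python
import re

# URL prefix as quoted in the rate limiting policy above.
# The allowed token characters (letters, digits, "_" and "-") are an assumption.
AUTHZ_URL = re.compile(
    r"https://acme-v01\.api\.letsencrypt\.org/acme/authz/[A-Za-z0-9_-]+"
)


def extract_authz_urls(log_lines):
    """Return the unique authz URLs found in log_lines, in first-seen order."""
    seen = {}
    for line in log_lines:
        for url in AUTHZ_URL.findall(line):
            # dict keys keep insertion order, which also removes duplicates
            seen.setdefault(url, None)
    return list(seen)
```

Challenge URLs (`/acme/challenge/...`) are deliberately not matched, so the result contains only authorization objects.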
Does deactivating a pending authorization work? To be honest, I haven't tried that one yet.
@pfg Looking now I think we have a bug in Boulder that includes the deactivated pending authorizations in the count used to enforce the pending authorizations limit. I'll open a ticket to see about getting that fixed.
I think in the meantime @jipperinbham will have to take the strategy you mentioned and force these pending authzs into a non-pending state with a challenge update. Alternatively, the pending authzs will expire if you wait (presently this happens 7 days after creation).
@cpu I’ve tried both methods and neither has resulted in clearing the pending authorizations. Is there really no way to get a list of all pending authorizations for an account?
As for the larger problem, do you have any client logs from Lego from deactivating your pending authorizations?
It looks to me that your registration ID is currently sending ~200-500 new-authz requests per minute!!! Is it possible that you successfully deactivated your existing pending authorizations but are creating so many new ones you immediately hit the rate limit again?
Can you investigate why your client is repeatedly posting to new-authz?
We have about 200 jobs backed up due to the rate limit, so each of them is constantly retrying to obtain a new-authz. Based on the logs, we never get a good URL back; every attempt returns the rate limit error.
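One way to keep a backlog like this from turning into hundreds of new-authz requests per minute is to space the retries out. The following is only a sketch of capped exponential backoff; the function name and the default delays are assumptions chosen for illustration, not values recommended by Let's Encrypt or used by Lego.

```python
def backoff_delays(base_seconds=60, cap_seconds=3600, attempts=8):
    """Yield the wait before each retry: doubling from base_seconds, capped at cap_seconds."""
    delay = base_seconds
    for _ in range(attempts):
        yield min(delay, cap_seconds)
        delay *= 2


# Hypothetical usage: sleep for each delay between new-authz attempts
# instead of retrying immediately after a rate limit error.
# for wait in backoff_delays():
#     time.sleep(wait)
```

With the defaults, a job that keeps failing waits progressively longer and settles at one attempt per hour rather than retrying in a tight loop.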
When you say they are all in one of those three states, how many are in each?
Are you able to provide full logs via a non-public channel? Can you provide more information about your lego invocations (e.g. the command line arguments)?
Are you using the CLI at all or is your system integrating with Lego as a library?
You’re saying that all of the pending authz’s you tried to update are in the three mentioned states.
For state 1, deactivated, we know that deactivated pending authorizations still count against the rate limit (we will resolve this issue within the next few releases). Having you deactivate these authorizations was likely bad advice (we weren’t aware of the bug from #2309 at the time), since you can’t use the strategy @pfg mentioned to trigger a validation attempt once the authz is deactivated.
For state 2, you can ignore these. The pending authorizations have expired and aren’t counting against the limit.
For state 3, you can ignore these. The pending authorizations were updated to invalid by the failed challenge update strategy.
It definitely seems as though your systems are “leaking” pending authorizations by creating them and not updating or finalizing them. My best theory so far is that your system hit the limit from this leak, you deactivated some number of the pending authorizations, that didn’t help with the rate limit (due to #2309), and it further left them unable to be updated into state #3.
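To make a leak like this visible on the client side, it can help to record every authorization the system creates and tick it off once it leaves the pending state. The class below is a hypothetical in-memory sketch of that bookkeeping; `AuthzLedger` and its method names are invented for illustration and are not part of Lego or Boulder.

```python
class AuthzLedger:
    """Tracks authz URLs that were created but not yet moved out of 'pending'."""

    def __init__(self):
        self._open = set()

    def created(self, url):
        # Call when new-authz returns an authorization URL.
        self._open.add(url)

    def finalized(self, url):
        # Call once the authz is valid or invalid (a challenge was attempted).
        self._open.discard(url)

    def leaked(self):
        # Whatever is left still needs a validation attempt.
        return sorted(self._open)
```

Anything returned by `leaked()` at the end of a job is a candidate for the challenge update strategy described earlier in the thread.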
re: When you say they are all in one of those three states, how many are in each?
I’m not entirely sure of the exact numbers but the large majority are in a deactivated state.
re: Are you able to provide full logs via a non-public channel? Can you provide more information about your lego invocations (e.g. the command line arguments)?
I can definitely provide every authz URL from 10/30 - 11/2.
re: Are you using the CLI at all or is your system integrating with Lego as a library?
it’s all within our integrated system
OK, it definitely seems like the deactivated authorizations are what is killing us, and deactivating is what we were told to do. I’m not entirely following whether I can clear those out on my own.
I can very easily remove the code that currently makes a POST call to deactivate the authorizations once they complete (regardless of success/failure), and it would be really nice if we could get the rate limit cleaned up so that our service isn’t impacted any longer.
I’d recommend rolling over to a new ACME account/registration. This specific rate limit is per account, so you can easily get around it that way. Having multiple accounts is perfectly fine (the rate limit for creating new accounts is fairly high).
@pfg we have a rate limit exception on our domains and I couldn’t remember if that was tied to the account or the domain. I vaguely remember being told to use a single account so it’s not so difficult to manage.
Sorry, missed that bit. I think Boulder supports exceptions both per domain (independent of the account used) and per account. I suspect the former would be used when it’s a small set of specific domains for which you issue certificates, but it’s probably best to wait for the Boulder team to comment in this case.
@pfg it appears as though we over the pending authz rate limit and I was able to trigger a failure to see how our cleanup code handles it. I want to confirm this is the proper behavior we need to implement to ensure we don’t get rate limited like this again in the future.
To clear it from pending, we issue a validate to the DNS01 challenge URL, knowing it will fail:
2016/11/08 10:47:26 [DEBUG][REDACTED.composedb.com] initiating validation for dns-01 on https://acme-v01.api.letsencrypt.org/acme/challenge/REDACTED/CHALLENGE_ID
2016/11/08 10:47:26 [INFO][REDACTED.composedb.com] sending validate for https://acme-v01.api.letsencrypt.org/acme/challenge/REDACTED/CHALLENGE_ID
2016/11/08 10:47:28 [ERROR] validate error acme: Error 400 - urn:acme:error:connection - DNS problem: NXDOMAIN looking up TXT for _acme-challenge.REDACTED.composedb.com
2016/11/08 10:47:28 [DEBUG] successful validation of https://acme-v01.api.letsencrypt.org/acme/authz/REDACTED
If I then make a GET call to the authz URL, I get the following:
2016/11/08 10:48:01 [DEBUG][1991322541.composedb.com] https://acme-v01.api.letsencrypt.org/acme/authz/ZpwLF2B0lYBB9C3t9WQsPWoKFZL2XsC1sCJRQLiZyuc not in pending status, invalid
With the authorization in the invalid state, we shouldn’t be impacted by the rate limit, correct?
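The check described here can be expressed as a small function over the JSON body returned by a GET on the authz URL. This is a sketch under assumptions: it relies on the `status` and `expires` fields of the ACME authorization object, treats a missing `status` as pending, and assumes `expires` is a UTC timestamp of the form `2016-11-15T10:47:26Z`. The name `still_pending` is made up for illustration.

```python
import json
from datetime import datetime, timezone


def still_pending(authz_body, now=None):
    """Return True if the authz JSON is in 'pending' status and has not expired."""
    authz = json.loads(authz_body)
    # Assumption: an absent status field means the authz is pending.
    if authz.get("status", "pending") != "pending":
        return False
    expires = authz.get("expires")
    if expires is None:
        return True
    now = now or datetime.now(timezone.utc)
    # Assumption: the timestamp is UTC; only the seconds-precision prefix is parsed.
    expiry = datetime.strptime(expires[:19], "%Y-%m-%dT%H:%M:%S")
    return expiry.replace(tzinfo=timezone.utc) > now
```

This reflects the intended rule only. As noted earlier in the thread, deactivated authorizations were still being counted server-side at the time because of #2309, so a `False` result for a deactivated authz did not yet mean it had stopped counting.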