Possible new feature: paused ACME accounts

I ran some numbers to estimate the size of impact. So far just looking at validation logs. From 03-21 to 03-28, we had:

  • 216M validation attempts
  • 179M validation failures (83%)
  • 97M of those failures came from accounts that had 0 validation successes over the course of that week. Removing those failures would bring the failure rate down to 38% (measured against the original 216M attempts).
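
As a quick check of the arithmetic in the list above, here is a minimal Python sketch. It uses only the rounded figures quoted there (in millions), so the results are approximate; the variable names are just for illustration.

```python
# Rounded headline figures from the list above, in millions.
attempts = 216
failures = 179
failures_from_total_failure_accounts = 97

# Overall failure rate for the week.
overall_rate = failures / attempts

# Failures left after dropping those from accounts with 0 successes.
remaining_failures = failures - failures_from_total_failure_accounts

# Measured against the original attempt count (this gives the 38% figure).
rate_vs_original = remaining_failures / attempts

# Measured against the attempts that would remain: every attempt from a
# zero-success account is itself a failure, so those attempts vanish too.
remaining_attempts = attempts - failures_from_total_failure_accounts
rate_vs_remaining = remaining_failures / remaining_attempts

print(f"{overall_rate:.0%} {rate_vs_original:.0%} {rate_vs_remaining:.0%}")
# prints: 83% 38% 69%
```

The two denominators answer different questions: the first shows how much of the current load would still be failures, the second shows the failure rate of the traffic that would be left.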

Here are some numbers bucketed by how many validation attempts a given account had during the week. A "total failure" is an account that had 0 validation successes; these are the candidates for pausing (if they also had no issuances for X days). The "errors from total failures" column is the sum of the error counts from those accounts.

bucket       accounts    validation attempts  errors       errors from total failures  error rate
1            786,802     786,802              185,146      185,146                     0.23531
2-5          1,129,112   3,126,791            636,033      473,726                     0.20341
6-25         547,993     6,865,347            4,637,927    3,622,346                   0.67556
26-625       521,949     74,910,336           69,896,477   49,091,267                  0.93307
626-3125     45,053      55,769,849           52,059,549   25,932,861                  0.93347
3126-15625   6,053       35,594,192           32,814,899   13,122,730                  0.92192
15625+       696         39,353,654           19,274,601   4,582,055                   0.48978
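
For anyone who wants to reproduce this kind of breakdown, the bucketing could be sketched roughly as follows. This is not the query that produced the table: the per-account `(attempts, successes)` input, the `bucket_for` and `summarize` helpers, and the sample data are all hypothetical, and it assumes every attempt is either a success or an error. Only the bucket boundaries come from the table.

```python
from collections import defaultdict

# Upper bounds of each bucket, taken from the table; the last bucket is open-ended.
BUCKET_BOUNDS = [1, 5, 25, 625, 3125, 15625]
BUCKET_LABELS = ["1", "2-5", "6-25", "26-625", "626-3125", "3126-15625", "15625+"]


def bucket_for(attempts):
    """Return the label of the bucket an account's attempt count falls into."""
    for bound, label in zip(BUCKET_BOUNDS, BUCKET_LABELS):
        if attempts <= bound:
            return label
    return BUCKET_LABELS[-1]


def summarize(accounts):
    """Aggregate per-account (attempts, successes) pairs into per-bucket rows."""
    rows = defaultdict(lambda: {"accounts": 0, "attempts": 0, "errors": 0,
                                "errors_from_total_failures": 0})
    for attempts, successes in accounts:
        row = rows[bucket_for(attempts)]
        errors = attempts - successes  # assumption: every non-success is an error
        row["accounts"] += 1
        row["attempts"] += attempts
        row["errors"] += errors
        if successes == 0:  # a "total failure" account
            row["errors_from_total_failures"] += errors
    for row in rows.values():
        row["error_rate"] = row["errors"] / row["attempts"]
    return dict(rows)


# Hypothetical sample data: (validation attempts, validation successes) per account.
sample = [(1, 1), (1, 0), (4, 3), (30, 0), (700, 10)]
for label, row in summarize(sample).items():
    print(label, row)
```

Only accounts with at least one attempt appear in validation logs, so the division by `attempts` in the last step is safe under that assumption.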

Interesting that the error rate starts out fairly low (20-24%) in the buckets with few attempts. In the buckets from 26 to 15,625 attempts it climbs above 90%, before dropping back to 49% in the largest bucket. Presumably the high rates are a matter of clients that retry faster than they should.

Since these are actual validation attempts, they don't include requests that were stopped by the rate limits. So, for instance, a client that was retrying failed validations as fast as the rate limits allow (5 failed validations / hostname / hour) would have 840 attempts for a single hostname over the week (5 × 24 × 7).
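
The 840 figure follows directly from the rate limit quoted above. A small sketch of the arithmetic, assuming the window is counted as 7 full days (the constant names are just for illustration):

```python
# Rate limit quoted above: 5 failed validations per hostname per hour.
FAILED_VALIDATIONS_PER_HOUR = 5
HOURS_PER_DAY = 24
DAYS_IN_WINDOW = 7  # assumption: the week is counted as 7 full days

max_attempts_per_hostname = (
    FAILED_VALIDATIONS_PER_HOUR * HOURS_PER_DAY * DAYS_IN_WINDOW
)
print(max_attempts_per_hostname)  # prints: 840
```

By the table's boundaries, an account retrying a single hostname at that ceiling would land in the 626-3125 bucket.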
