Series of 500s with "Error creating new cert"

As a service provider, we are getting a series of 500s with “Error creating new cert” on attempted provisioning.
Started around 2017-07-28 12:33 Pacific Time, likely going from


As this is an internal error, we don’t get any useful details from the server, although they seem to be logged on your side.

Could you check the logs for domains:

I suspect this could be related to Boulder Update to +d2af4a0.


We have found occurrences from well before Boulder Update to +d2af4a0 so please disregard this.

I suspect this could just be a combination of various transient issues that surface in a similar way over time.

I suspect you saw these errors transiently because components were being restarted during the Boulder upgrade maintenance window, not because of the content of the update itself.

Hope that helps,

The last burst of issues we saw was around 2017-07-31 22:12 Pacific Time, which doesn’t correspond to any prod rollout (to the best of my knowledge). That is also the time window I sampled the domains from.

Adding some details to mbwalas’s report, as the problem is ongoing.
107/3087 (~3.5%) of the new-cert requests we made over the last 8 days for subdomains of failed with “500 urn:acme:error:serverInternal: Error creating new cert”. By contrast, only 0.05% of our new-cert requests for all other domains failed with this error. The problem has been ongoing for about a month; the errors appear regularly, with peaks of ~5 errors in a row almost every day. We haven’t observed any other time pattern.
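The rates above are plain ratios of failed to attempted new-cert requests. A minimal sketch of computing them per domain group, assuming outcomes are logged as hypothetical `(domain, error_type)` pairs with `None` for a successful request; `example.com` is only a stand-in for the actual parent domain, which is not named here.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

SERVER_INTERNAL = "urn:acme:error:serverInternal"

def failure_rates(
    outcomes: Iterable[Tuple[str, Optional[str]]],
    group_of: Callable[[str], str],
) -> Dict[str, float]:
    """Return {group: fraction of requests that failed with serverInternal}.

    Each outcome is a hypothetical (domain, error_type) pair; error_type is
    None when the new-cert request succeeded.
    """
    totals: Counter = Counter()
    failures: Counter = Counter()
    for domain, error_type in outcomes:
        group = group_of(domain)
        totals[group] += 1
        if error_type == SERVER_INTERNAL:
            failures[group] += 1
    return {g: failures[g] / totals[g] for g in totals}

def example_group(domain: str) -> str:
    # Hypothetical grouping: subdomains of example.com vs. everything else.
    return "example.com" if domain.endswith(".example.com") else "other"
```

With 107 failures out of 3087 requests in one group, `round(rate * 100, 1)` gives 3.5, which is where the ~3.5% figure comes from.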

Thanks for adding more detail.

I’ll raise this internally for more digging.

We too are seeing the error: “500 urn:acme:error:serverInternal: Error creating new cert”.

We are reliably producing this error (100% of attempts fail for a specific list of domains) every 5 minutes (our back-off re-try period).
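A client retrying on a fixed period like this can at least separate this specific error from others before retrying. A minimal sketch follows, assuming the error body is an ACME problem document (JSON with `type` and `detail` fields); `request_cert` is a hypothetical callable returning `(status, body)`, and the capped exponential back-off starting at 5 minutes is an illustrative choice, not a documented requirement.

```python
import json
import time
from typing import Callable, Optional, Tuple

SERVER_INTERNAL = "urn:acme:error:serverInternal"

def parse_problem(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (type, detail) from an ACME problem document, or (None, None)."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return None, None
    if not isinstance(doc, dict):
        return None, None
    return doc.get("type"), doc.get("detail")

def backoff_delay(attempt: int, base: float = 300.0, cap: float = 3600.0) -> float:
    """Capped exponential back-off: 300 s, 600 s, 1200 s, ... up to `cap`."""
    return min(base * (2 ** attempt), cap)

def issue_with_retry(
    request_cert: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call the hypothetical `request_cert` at most `max_attempts` times,
    retrying only on 500 serverInternal responses. Returns the last result."""
    status, body = request_cert()
    for attempt in range(max_attempts - 1):
        problem_type, _detail = parse_problem(body)
        if not (status == 500 and problem_type == SERVER_INTERNAL):
            break  # success, or an error that retrying will not fix
        sleep(backoff_delay(attempt))
        status, body = request_cert()
    return status, body
```

Capping the number of attempts matters when a domain fails 100% of the time: retries alone will not fix it, and the failing domain list is the useful thing to report.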

What information can I provide to help debug?

If you could provide the list of domains that reliably fail, that would be helpful. We’ve dug into the problem a bit and are pretty sure it’s related to a slow database query in our rate limiting code, some of which changed recently. But we haven’t nailed down exactly why the query is slow. The list of domains would help.

@jsha Thank you for your response. Here’s the list. I notice it contains a lot of TLDs; I’m not sure whether LE’s database design is affected by that.

Please let me know if I can assist in any other way. I’m happy to look at Boulder logs/source or whatever else. Thanks!


We’re happy to let you know that we haven’t observed this issue since Aug 10, 10:18 PDT. This coincides with last week’s planned Boulder push, so I guess the fix must have gone in with the new release.

Do you have any more context on what could have caused these problems? I couldn’t find any obvious fix in the changelog.

Hi @stanwise,

I’m happy to hear the problem hasn’t resurfaced for you.

This was related to a new approach to calculating an existing rate limit that was introduced in master with 71f8ae. We were able to cross-reference the information you provided with the time this feature was enabled in production, and identified that it interacted poorly with certain issuance patterns.

Since, per our usual practice, this code was gated behind a feature flag, we disabled the flag with a configuration change, which is why you aren’t able to see a fix in the changelog. As you observed, this was done on Aug 10th 🙂 See this API announcement post for more.
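A feature flag lets new code ship in a release while staying switchable by configuration alone, so turning it off produces no code change to list in a changelog. A minimal sketch of the pattern, assuming a hypothetical JSON config with a `features` map; the flag name `NewRateLimitCount` and both counting functions are illustrative placeholders, not Boulder’s actual code (Boulder itself is written in Go).

```python
import json
from typing import Callable, Dict, List

def load_features(config_text: str) -> Dict[str, bool]:
    """Read a hypothetical {"features": {"Name": true/false}} config block."""
    return json.loads(config_text).get("features", {})

def count_recent_certs_old(names: List[str]) -> int:
    # Placeholder for the existing rate-limit calculation.
    return len(names)

def count_recent_certs_new(names: List[str]) -> int:
    # Placeholder for the new calculation shipped behind the flag.
    return len(set(names))

def choose_counter(features: Dict[str, bool]) -> Callable[[List[str]], int]:
    """Pick the code path at runtime; a missing flag defaults to off."""
    if features.get("NewRateLimitCount", False):
        return count_recent_certs_new
    return count_recent_certs_old
```

Flipping `NewRateLimitCount` to `false` in the config restores the old path on the next config load without a new release.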

At this point I believe we intend to abandon the approach in master and revisit it in the future with a more performant solution involving a database migration, once we have the resources available on both the dev and ops side.

Hope that helps clarify!


2 posts were split to a new topic: Consistent 500’s for new-cert (failing CAA for one domain)
