Certbot and multiple/fail-over ACME servers

Good day,

I have a fun setup where we are hitting some of the rate limits for Buypass and Let's Encrypt. We are not big enough to request a rate limit increase (this is still just a PoC), but some spurious peaks make us hit the limits. The solution so far has been to switch the failing certificates/domains to the other CA... until that one fails again.

So my request is for Certbot to support multiple ACME servers: both at creation and at renewal, it would first try the preferred ACME server, and when that fails try the next one, and then the next, before erroring out.
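Until something like that exists in Certbot, the requested behaviour can be roughly approximated with a wrapper script. The following is only a sketch: `issue_with_fallback` and `run_certbot` are hypothetical names, the two directory URLs should be checked against each CA's own documentation, and it assumes the authenticator and account options are already set in Certbot's configuration. How Certbot treats an existing certificate lineage when the server changes is not covered here.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Assumed directory URLs, in order of preference. Verify them with each CA.
ACME_SERVERS = [
    "https://acme-v02.api.letsencrypt.org/directory",
    "https://api.buypass.com/acme/directory",
]


def run_certbot(server: str, domains: Sequence[str]) -> int:
    """Run `certbot certonly` against one ACME server and return its exit code.

    Assumes authenticator/account options come from Certbot's own config.
    """
    cmd = ["certbot", "certonly", "--non-interactive", "--server", server]
    for domain in domains:
        cmd += ["-d", domain]
    return subprocess.run(cmd).returncode


def issue_with_fallback(
    domains: Sequence[str],
    servers: Sequence[str] = ACME_SERVERS,
    runner: Callable[[str, Sequence[str]], int] = run_certbot,
) -> Optional[str]:
    """Try each server in order; return the first that succeeds, else None."""
    for server in servers:
        if runner(server, domains) == 0:
            return server
    return None
```

The `runner` parameter is there so the fallback order can be exercised without actually calling Certbot or touching any CA.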

2 Likes

You should file this as a feature request against Certbot itself:

I've had issues with "cascades" creating ratelimit issues myself. It is infuriating. I ended up writing my own client that could queue these for processing and throttle ACME communication based on usage.
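To give an idea of what queueing and throttling can look like, here is a minimal sliding-window sketch. It is not the code from the client mentioned above; `SlidingWindowThrottle` and `drain` are hypothetical names, and the limit and window values are whatever you choose to stay under for a given CA.

```python
from collections import deque
from typing import Deque, List


class SlidingWindowThrottle:
    """Allow at most `max_orders` orders within any `window_seconds` span."""

    def __init__(self, max_orders: int, window_seconds: float) -> None:
        self.max_orders = max_orders
        self.window_seconds = window_seconds
        self._sent: Deque[float] = deque()  # timestamps of recent orders

    def try_acquire(self, now: float) -> bool:
        # Forget orders that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_orders:
            self._sent.append(now)
            return True
        return False


def drain(queue: Deque[str], throttle: SlidingWindowThrottle, now: float) -> List[str]:
    """Pop and return the queued domains that may be ordered at time `now`."""
    ready: List[str] = []
    while queue and throttle.try_acquire(now):
        ready.append(queue.popleft())
    return ready
```

Anything left in the queue simply waits for a later call to `drain`, which spreads a burst of new hosts over time instead of sending it all at once.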

3 Likes

Interesting idea.

I wonder if you could swing this by having two alternating cron jobs with different --server flags.

If there was a certificate pending renewal and one CA was unwilling to renew a certificate, you'd get an attempt on the other one.
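A rough sketch of the alternating idea, written as one daily job that picks the server by date instead of two separate crontab entries. The names `server_for` and `renew_command` are made up for illustration, the directory URLs are assumptions to verify with each CA, and whether `certbot renew` honours `--server` for a certificate that was originally issued by a different CA is something to test before relying on it.

```python
import datetime
from typing import List, Sequence

# Assumed directory URLs. Verify them with each CA.
SERVERS = [
    "https://acme-v02.api.letsencrypt.org/directory",
    "https://api.buypass.com/acme/directory",
]


def server_for(day: datetime.date, servers: Sequence[str] = SERVERS) -> str:
    """Rotate through the servers, one per calendar day."""
    return servers[day.toordinal() % len(servers)]


def renew_command(day: datetime.date, servers: Sequence[str] = SERVERS) -> List[str]:
    """Build the renewal command line for the given day (does not run it)."""
    return ["certbot", "renew", "--server", server_for(day, servers)]


if __name__ == "__main__":
    print(" ".join(renew_command(datetime.date.today())))
```

With two servers, a certificate that one CA refuses to renew today gets an attempt at the other CA tomorrow.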

3 Likes

You might also be able to use a post hook to mark a rate-limited server, and then crontab a script that checks/cleans up the semaphores and invokes certbot against the currently valid server.
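A minimal sketch of the semaphore part, assuming one marker file per rate-limited server in a state directory. All names here (`marker_path`, `mark_rate_limited`, `pick_server`) are hypothetical, and detecting the rate limit inside the hook is not shown.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional, Sequence


def marker_path(state_dir: Path, server: str) -> Path:
    """Marker file name derived from a hash of the server URL."""
    digest = hashlib.sha256(server.encode("utf-8")).hexdigest()[:16]
    return state_dir / f"{digest}.ratelimited"


def mark_rate_limited(state_dir: Path, server: str, now: Optional[float] = None) -> None:
    """Called from the hook: record when the server was seen rate limited."""
    state_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.time() if now is None else now
    marker_path(state_dir, server).write_text(str(stamp))


def pick_server(
    state_dir: Path,
    servers: Sequence[str],
    cooldown_seconds: float,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the first server without a fresh marker, removing stale markers."""
    current = time.time() if now is None else now
    for server in servers:
        marker = marker_path(state_dir, server)
        if marker.exists():
            if current - float(marker.read_text()) < cooldown_seconds:
                continue  # still cooling down, try the next server
            marker.unlink()  # marker expired, clean it up
        return server
    return None
```

The cron script would call `pick_server` and pass the result to certbot's `--server` option, skipping the run entirely when it returns `None`.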

2 Likes

Just wanting to note that this is actually one of the features of Caddy: Automatic HTTPS — Caddy Documentation

(I understand that you would like to see this as a feature in more ACME clients, like certbot, and I personally fully support that idea, as long as it's configurable)

4 Likes

Is this for a bespoke hosting platform? Sounds like you must have a lot of domains/hosts happening and I wondered what the scenario was.

I'm also planning to add this to Certify The Web (well, I say that, it's been planned for years and it's not been added yet due to lack of demand). Certify is a commercial certificate management server product, and I'm always interested in what problems people are trying to solve with the tools they have available to see if we can make it easier. It supports multiple CAs (you can choose per certificate) but it doesn't yet fall back to a backup CA.

It's such a simple idea though, and I agree it would be a great feature for most ACME clients. As already mentioned, Caddy is the only thing I can think of that does that.

2 Likes

OK, the documentation pointed me here though ;(

1 Like

:wink: I know
but my need/case is for things that are NOT behind Caddy, while Caddy is also issuing certificates in that same domain, so Caddy "succeeds", but then the certbot "specials" fail ;(

It's IoT related and, as I mentioned, in the "PoC phase", but we are getting similar (or nearly similar) limits for another client's dev environments (which need to load the certs into firewalls for IPS and/or internal/local IP environments).

Are you hitting rate limits due to duplicates or too many subdomains under one domain? The Let's Encrypt limit of 50 certificates per registered domain per week is indeed pretty low if you are trying to get a cert for every device and have hundreds of devices.

2 Likes

More details:
Too many certificates THAT week. Caddy typically doesn't do multiple SANs; it issues single-domain certs using TLS-ALPN-01 validation, and that typically breaks things for a device that (for various reasons) needs 4 or 5 domains, and I just added 20 devices like that. (Okay, the Caddy case was "fixed" when they introduced multiple ACME servers.) But now the devices that need internal certificates too (read <…> Chrome forcing HTTPS only) can't get those, as Caddy sucked the limits dry and I need to try another ACME server.

During this PoC, things averaged out over time as renewals got staggered/randomized; the problem is the initial loading. We will revisit/request/buy a special cert as and when we go to production, but for the PoC it would be great to have this fallback, so that when the main provider isn't available we can get a certificate from the 2nd/3rd provider and keep the certificates valid.

My other case is similar: a new set of dev/test environments where the client's stuff has multiple subdomains. Spinning up the 5-6 new domains with bunches of certificates on the same domain quickly sucks that dry too. Those certificates are the Caddy ones (read: single (sub)domain certificates, no SANs), which quickly hit or near the limits, and then the internal ones are stuck.

I see, does the device have different domains in order to map to different services on the same port? If services are on different ports they could all use the same cert (for different things).

You could have a subdomain per device so that you can have *.device-01.domain.com instead of svc1-device-01.domain.com, svc2-device-01.domain.com, svc3-device-01.domain.com. Wildcards require DNS validation, which is (potentially) trickier but not impossible as long as you control the domain (or subdomain) DNS.
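To make the difference concrete, here is a small sketch of the two naming schemes and how many single-name certificates each one needs. The helper names are made up for illustration, and the counts only apply if every service name would otherwise get its own certificate.

```python
from typing import List, Sequence


def per_service_names(device: str, services: Sequence[str], base: str) -> List[str]:
    """One hostname per service, e.g. svc1-device-01.domain.com."""
    return [f"{service}-{device}.{base}" for service in services]


def wildcard_name(device: str, base: str) -> str:
    """One wildcard per device, e.g. *.device-01.domain.com."""
    return f"*.{device}.{base}"


def certificates_needed(devices: int, services_per_device: int, wildcard: bool) -> int:
    """Certificates to issue if each name (or each wildcard) is its own cert."""
    return devices if wildcard else devices * services_per_device


if __name__ == "__main__":
    print(per_service_names("device-01", ["svc1", "svc2", "svc3"], "domain.com"))
    print(wildcard_name("device-01", "domain.com"))
    # 20 devices with 5 names each: 100 single-name certs versus 20 wildcards.
    print(certificates_needed(20, 5, wildcard=False))
    print(certificates_needed(20, 5, wildcard=True))
```

With the wildcard scheme the services would live at names like svc1.device-01.domain.com, so the hostnames themselves change, not just the certificates.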

2 Likes

The multiplexing I go through is beyond this discussion. Suffice to say: I need each distinct service on its own http/https:// without port numbers and as the "root" /, as anything else confuses the JavaScript and remote embedded webservers ;(

Wildcards weren't feasible given Caddy. Even if I apply a wildcard to the device's internal stuff, Caddy has already sucked things dry, as it doesn't do wildcards. We also already need to do some fun at the ingress point, which wasn't doable with wildcards because we had to specify the various endpoints/reverse forwarding per URL/hostname.

1 Like

So my client is probably not right for you, but it may help:

The client is part of a webserver-based Certificate Manager, backed by PostgreSQL or SQLite. You can script the entire ACME order process with API requests, or query for existing certs that already suit your needs. It ships with an OpenResty library to do dynamic cert loading (and autocert) via a four-tiered failover cache (nginx worker, nginx main, cluster Redis cache, actual Python server). It's designed for handling situations that involve infinitely scalable domains or servers, and has lots of logging/debugging for troubleshooting.

The automated renewals aren't working correctly right now, because the design changed a lot during the ACME v2 migration, and we had been doing all of that work within the API client.

3 Likes

It did? Which documentation? Perhaps we should update it.

2 Likes

That's how I read:
https://certbot.eff.org/docs/resources.html

That's quite a generic bunch of links. It just says "Community" with a link to this forum. I might be mistaken, but I don't see a "Please open a thread in the Community for feature requests" there, although I agree that this Community section itself does say it is the right place for certbot feature requests. I just didn't see that stated in the link you've provided.

2 Likes