Traefik/Let's Encrypt: too many orders recently

My Traefik/Let's Encrypt setup runs on a bunch of Raspberry Pis (a Docker Swarm cluster) and worked fine for quite some time. Without any changes on my side, Traefik started returning its self-signed default certificate. In the Traefik log I see "too many orders recently" errors - please see below.

I am a bit puzzled because I use a specific version of Traefik (not the latest), so it can't be caused by a Traefik update.

I have checked my acme.json - it holds certificates for my 11 domains/subdomains. I guess that number is low enough not to hit the rate limit.
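For anyone who wants to repeat that check, here is a minimal sketch that lists the domains stored in acme.json. It assumes the Traefik v2 layout, where each top-level key is a resolver name holding a "Certificates" list whose entries carry a "domain" object with "main" and optional "sans". That layout is an assumption on my part, and `list_domains` is a hypothetical helper name, so compare it against your own file first.

```python
import json


def list_domains(acme_data):
    """Return (resolver, main domain, SANs) for every stored certificate.

    Assumes the Traefik v2 acme.json layout:
    {resolver: {"Certificates": [{"domain": {"main": ..., "sans": [...]}}]}}
    """
    rows = []
    for resolver, content in acme_data.items():
        # "Certificates" may be missing or null when nothing was issued yet.
        for cert in (content or {}).get("Certificates") or []:
            domain = cert.get("domain", {})
            rows.append(
                (resolver, domain.get("main"), tuple(domain.get("sans") or []))
            )
    return rows


# Usage (the path is hypothetical):
# with open("acme.json") as f:
#     for row in list_domains(json.load(f)):
#         print(row)
```

If the same main domain shows up more than once, or under more than one resolver, that is worth a closer look.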

Any ideas what it could be and how to fix it? Thanks a lot!

management_traefik.1.a8j7sm99e1hn@rpi4    | time="2022-01-16T14:42:33Z" level=error msg="Unable to obtain ACME certificate for domains \"foo.bar.com\": unable to generate a certificate for the domains [foo.bar.com]: acme: error: 429 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rateLimited :: Error creating new order :: too many new orders recently: see https://letsencrypt.org/docs/rate-limits/" providerName=certresolver.acme routerName=r-fooapps@docker rule="Host(`foo.bar.com`) && (Path(`/`, `/favicon.ico`) || PathPrefix(`/static`, `/foo`, `/bar`, `/about`, `/contact`, `/privacy`, `/disclaimer`, `/termsandconditions`))"

The rate limit is quite specific:

You can create a maximum of 300 New Orders per account per 3 hours.

So it seems something is generating 300+ new orders per 3 hours, which is A LOT. And that for at most 11 websites? Sounds like a buggy ACME client to me.
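To get a feel for how close a client comes to that limit, a rough sketch like the one below counts the largest number of log events in any 3-hour window. It assumes each attempt leaves a line carrying a `time="..."` stamp in the format seen in the log above; `parse_times` and `max_in_window` are hypothetical helper names, and which lines count as an "order attempt" is up to you.

```python
import re
from datetime import datetime, timedelta

# Matches the time="2022-01-16T14:42:33Z" field of a Traefik log line.
TIME_RE = re.compile(r'time="([^"]+)"')


def parse_times(lines):
    """Extract the time="..." stamp from every line that has one, sorted."""
    times = []
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ"))
    return sorted(times)


def max_in_window(times, window=timedelta(hours=3)):
    """Largest number of events inside any window of the given length.

    `times` must be sorted ascending; two events exactly `window` apart
    are treated as falling in different windows.
    """
    best = 0
    start = 0
    for end, current in enumerate(times):
        # Drop events from the front until the window fits again.
        while current - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best
```

Feed it only the lines that correspond to certificate requests. A result anywhere near 300 would point at a retry loop rather than normal renewals.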


@Osiris Thanks for your comment - I use Traefik. I think it fails somewhere and then retries until the retry count expires - would that explain too many orders? It is still only 11 domains/subdomains.


Could be. But I'm not familiar with traefik. And 300 new orders per 3 hours is quite a lot, so your traefik would be rampaging through the retries. That's very unwanted behaviour. Heck, if an ACME client really does stupid things, Let's Encrypt could ban the entire IP address.

Best you check out what traefik is doing.


As @Osiris said, "300 new orders per 3 hours is quite a lot, so your traefik would be rampaging through the retries." You should look deeper through your logs and see what errors pop up BEFORE that error. It might be easiest to process the log files and omit all the lines with that exact rate-limit message. That should give you a clue about what is going wrong.
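A minimal sketch of that filtering step, assuming the log is plain text with one entry per line and that error entries contain `level=error` as in the excerpt above. `other_errors` is a hypothetical helper name, and the file name in the usage comment is made up.

```python
RATE_LIMIT_MARKER = "too many new orders recently"


def other_errors(lines, marker=RATE_LIMIT_MARKER):
    """Yield error-level lines that do not contain the rate-limit message."""
    for line in lines:
        if "level=error" in line and marker not in line:
            yield line.rstrip("\n")


# Usage (the file name is hypothetical):
# with open("traefik.log") as f:
#     for line in other_errors(f):
#         print(line)
```

Whatever is left after filtering is usually the error that triggered the retries in the first place.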


@jvanasco Thanks for your comment

I probably had inconsistencies in my Traefik configuration which made it behave erratically towards LE. I think I fixed that - no more "too many orders" errors (at least on LE staging). Now I am struggling with the errors below - could it be that my domain is banned or something on LE?

acme: error presenting token: timeout

time="2022-01-18T21:11:17Z" level=error msg="Error renewing certificate from LE: {xxxxx.com []}, error: one or more domains had a problem:\n[xxxxx.com] [xxxxx.com] acme: error presenting token: timeout 2022-01-18 21:11:17.153691811 +0000 UTC m=+8.153257192\n" providerName=certresolver.acme

acme: error: 400 :: urn:ietf:params:acme:error:tls :: remote error: tls: internal error

time="2022-01-18T21:21:55Z" level=error msg="Error renewing certificate from LE: {xxxxx.com []}, error: one or more domains had a problem:\n[xxxxx.com] acme: error: 400 :: urn:ietf:params:acme:error:tls :: remote error: tls: internal error\n" providerName=certresolver.acme

I found a solution that worked. My setup runs in a Docker Swarm cluster, and I recreated from scratch the node where Traefik ran: I updated/upgraded the OS and completely purged and reinstalled Docker. I think something was wrong with it. I saw somewhere that people described a similar issue, and their conclusion was that Docker's iptables rules were messed up. I am not sure it was the same in my case, but I saw the errors below in my logs, which might have had to do with Docker's iptables rules. Everything works after the node re-creation - I hope this info will help other people too.

time="2022-01-20T08:05:41Z" level=error msg="close tcp [::]:443: use of closed network connection" entryPointName=ep-https
time="2022-01-20T08:05:41Z" level=error msg="close tcp [::]:8080: use of closed network connection" entryPointName=traefik
time="2022-01-20T08:05:41Z" level=error msg="close tcp [::]:80: use of closed network connection" entryPointName=ep-http

Glad you fixed it. The errors did look like routing concerns.

Are you sharing certs across these devices or persisting them in the cloud? There aren't many details of your network setup/design above, but a common anti-pattern among people with automated systems is to have multiple nodes requesting duplicate certificates - which always evolves into a problem.

If you have any anti-patterns in use (you may not, no idea), the people here can help you keep them from evolving into your next problem before the technical debt becomes insurmountable.

