Thanks for your help.
I’ll decrease the batch size to 50. Not many customers actually have that many certs, so the overall number of batches shouldn’t grow too much.
We use https://hitch-tls.org to terminate TLS. They claim it can handle 500,000 certs, but we could never verify that number in our tests. It still performed better than any other software. Reloads with new certs are relatively graceful, but they still take a while, use quite a bit of memory, and we always see small hiccups with client connections while they run.
We validate all domains hourly (until they validate successfully, then daily) and only batch them once validation has succeeded. Maybe I can add a forced extra check just before issuing the cert.
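A rough sketch of what that forced check could look like, in Python. Everything here is an assumption for illustration: the names `resolves`, `batches` and `prepare_batches` are hypothetical, and the DNS lookup is only a cheap stand-in for whatever the real validation does.

```python
import socket
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 50  # the reduced batch size mentioned above


def resolves(domain: str) -> bool:
    """Stand-in check (assumption): does the name still resolve at all?"""
    try:
        socket.getaddrinfo(domain, 443)
        return True
    except socket.gaierror:
        return False


def batches(domains: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` domains."""
    batch: List[str] = []
    for domain in domains:
        batch.append(domain)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield batch


def prepare_batches(
    validated: Iterable[str],
    check: Callable[[str], bool] = resolves,
    size: int = BATCH_SIZE,
) -> List[List[str]]:
    """Re-check already validated domains right before issuing, then batch the survivors."""
    still_valid = [d for d in validated if check(d)]
    return list(batches(still_valid, size))
```

The check is passed in as a function so the DNS stand-in can be swapped for the real validation, and so a domain that went bad since the last hourly or daily run is dropped before it can fail a whole batch.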
If the smaller batch size doesn’t help, I will try the rate limit adjustment request.