@jcjones @mcpherrinm It looks like notifying @lestaff doesn't work anymore, so I'm bugging you directly.
We've had several threads now where it looks like there was some sort of regression in Unbound (either in the updated version or in its configuration). The affected use case is multiple (20+) domains using DNS-01, where the challenge record for every one of those domains is CNAME'd to one single record that is populated with the TXT entries for all of them.
Not a particularly common configuration, no, but it is described as a standard way to use acme.sh's alias mode for a multiple-SAN certificate, so it might be something that others are trying too. (And I think it should be working.)
People have theorized that it's related to the new default Unbound setting for max-udp-size, though I don't fully understand myself how that'd be related when it should be switching to TCP well before that point.
That is promising. But, what is the maximum number of records the new config works for?
We've been seeing reports that around 70-80 records used to work, but that people now have to trim down to around 20 to get it to work. Would 100 TXT records work, to match the 100-name limit in SANs?
That should be the target goal.
That said, I'm not sure 100 FQDNs corresponds to any exact size.
To that end...
The only thing I can be certain of is the limit of the size for an FQDN: 255 octets.
So... How big would that packet need to be?
[carrying 100 names each 255 octets long]
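A back-of-the-envelope estimate may help here. This is only a sketch based on the DNS wire format (RFC 1035) and on the DNS-01 TXT value being a 43-character base64url SHA-256 digest. It assumes all TXT records share one owner name (the CNAME target), and it ignores the EDNS OPT record, any CNAME record in the answer, and the authority/additional sections. The 1232-byte threshold is an illustrative UDP limit, not a confirmed Let's Encrypt setting.

```python
# Rough DNS response size estimate for many TXT records under one owner name.
# Field sizes come from the DNS wire format (RFC 1035). The scenario values
# (255-octet name, 1232-byte UDP threshold) are illustrative assumptions.

HEADER = 12          # fixed DNS header
QUESTION_FIXED = 4   # QTYPE + QCLASS
RR_FIXED = 10        # TYPE + CLASS + TTL + RDLENGTH
POINTER = 2          # compressed name pointer back to the question name
TXT_VALUE_LEN = 43   # base64url SHA-256 digest used by DNS-01


def estimate_response_size(num_records, name_wire_len, compressed=True):
    """Return the approximate size in bytes of a TXT response."""
    question = name_wire_len + QUESTION_FIXED
    owner = POINTER if compressed else name_wire_len
    rdata = 1 + TXT_VALUE_LEN  # one length octet plus the string itself
    return HEADER + question + num_records * (owner + RR_FIXED + rdata)


def max_records_that_fit(limit, name_wire_len):
    """Largest record count whose compressed response fits in `limit` bytes."""
    per_record = POINTER + RR_FIXED + 1 + TXT_VALUE_LEN
    available = limit - HEADER - name_wire_len - QUESTION_FIXED
    return max(available // per_record, 0)


if __name__ == "__main__":
    print(estimate_response_size(100, 255))                    # compressed
    print(estimate_response_size(100, 255, compressed=False))  # uncompressed
    print(max_records_that_fit(1232, 255))
```

Under these assumptions, 100 records come to 5871 bytes with name compression (31171 without), and about 17 records fit in 1232 bytes. Because every record shares the same owner name, the count of TXT records matters far more than the FQDN length.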
I'm not sure of all the technical constraints. I was hoping to get a more formal definition of what should work, so as to better advise people. If the limit is based partly on the length of the FQDN along with the TXT validation data, that's fine.
In the docs, I only saw a limited comment about deleting old TXT records so as not to get "too large".
It came up in one of the threads caught by this recent change. It would just be nice to have a more detailed description of what "too large" means.
Maybe we could make the ACME authentication process "work smarter".
Like: Have it request authentications one name at a time and, as each name passes, it continues until all names have passed OR it reaches a limit of 100 and the server refuses to process any more requests [for that one cert].
Isn't that already what it's doing (the ACME server validator)? The problem is that each name resolves separately to the same set of many TXT records because they're all CNAME'd to the same target. So this practice is doubly bad: the validator not only has to resolve huge responses, it has to do it over and over again for each name in the cert.
Not if each TXT record was added, verified, and immediately removed [before moving to next TXT record].
They could all be from the same target - just one at a time.
Ah, by "it" you meant the client. I thought you were talking about the server. Yes, the client could serially validate each name rather than create all the records and then ask the server to verify them all at once. Some clients even support doing this explicitly (because certain lame DNS providers only allow for a single value to exist at a time).
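The serial approach described above could look roughly like this. It is a minimal sketch, not how any particular ACME client does it: `InMemoryDns`, `make_token`, and `validate` are hypothetical stand-ins for a real DNS provider API and a real ACME challenge request.

```python
# Sketch of serial DNS-01 validation: only one TXT value exists at a time.
# InMemoryDns and the callables passed in are hypothetical, not a real API.


class InMemoryDns:
    """Toy DNS provider holding TXT values for a single shared record."""

    def __init__(self):
        self.values = []
        self.peak = 0  # largest number of TXT values present at once

    def add_txt(self, value):
        self.values.append(value)
        self.peak = max(self.peak, len(self.values))

    def remove_txt(self, value):
        self.values.remove(value)


def validate_serially(names, dns, make_token, validate):
    """Add, validate, and remove one TXT value per name, in order."""
    passed = []
    for name in names:
        token = make_token(name)
        dns.add_txt(token)
        try:
            if not validate(name, token):
                raise RuntimeError(f"validation failed for {name}")
            passed.append(name)
        finally:
            dns.remove_txt(token)  # clean up even when validation fails
    return passed


if __name__ == "__main__":
    dns = InMemoryDns()
    names = [f"host{i}.example.com" for i in range(5)]
    ok = validate_serially(
        names,
        dns,
        make_token=lambda n: f"token-for-{n}",
        validate=lambda n, t: t in dns.values,  # pretend the server saw it
    )
    print(ok == names, dns.peak, dns.values)
```

The shared record never holds more than one TXT value, so the response the validator has to fetch stays small no matter how many names are on the certificate. The trade-off is that each name waits for its own DNS propagation and validation round trip.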
Hello everyone.
We are also affected by this issue. It started with the upgrade of Unbound. The max-udp-size setting seems to be the problem. I have 67 FQDNs in a single certificate, and more than half of them are wildcards, all pointing to the same zone/record for verification. I have a blog post about how we are handling it; if anyone is curious, feel free to contact me.
@petercooperjr When Let's Encrypt attempts to verify, it also uses DNS, and hence UDP. Does the maximum UDP size apply here? I believe it does, in Unbound.
We encounter this issue when the FQDN count exceeds 10-12; I haven't tested it with a higher number.
Our (redacted) script to issue certificates:
Our temporary workaround involves running the script 7 times. We issue 10-12 domains with each command, and on the 7th command we include all domains; since every name has already been validated by then, verification is skipped. As a result, we obtain a certificate with 67 FQDNs included.
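For anyone wanting to script the same workaround, the run plan can be sketched like this. It only computes which names go into which run; `plan_runs` is a hypothetical helper, and the actual issuance command is left out because the original script is redacted.

```python
# Sketch of the batching workaround: validate small batches first, then
# request one certificate with every name. plan_runs is a hypothetical helper.


def plan_runs(names, batch_size):
    """Return per-run name lists: small batches, then one run with all names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [
        names[i:i + batch_size] for i in range(0, len(names), batch_size)
    ]
    return batches + [list(names)]


if __name__ == "__main__":
    names = [f"site{i}.example.com" for i in range(67)]
    runs = plan_runs(names, 12)
    for number, run in enumerate(runs, start=1):
        print(f"run {number}: {len(run)} names")
```

With 67 names and a batch size of 12, this yields six batch runs plus a final run containing all 67 names, matching the 7 runs mentioned above. The final run only avoids re-validation if the earlier authorizations are still valid and the client reuses them.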
It would be great to have this issue resolved by reverting the max UDP size to its previous value.
It would also be great if the ACME client [or the ACME spec itself] would cover such unexpected problems: throttle down the requests [as you did], loop until all names have been verified, and then request a single cert with all the names on it.
I guess that's work for [much] later...
Right now, finding the max UDP size seems to be the biggest hammer in the room.
In theory yes. I don't know the Unbound implementation, but it's not happening on this occasion. That might also be an option (enable/disable) with Unbound.
Yeah, I wonder if somehow there's a UDP packet too big, and Unbound sees it as a "packet too big" instead of retrying with TCP like it should?
Honestly, I'm curious whether it'd end up being less net traffic for Let's Encrypt if they just configured Unbound to always and only use TCP, because it wouldn't need to try UDP first and then switch to TCP, even counting the TCP handshake overhead for the cases where responses are small enough to be handled by UDP. (Though there are probably broken DNS servers out there with only UDP support which are managing to validate for now…)
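To make the expected behavior concrete, here is a small sketch of UDP-first resolution with fallback on the TC (truncated) bit, next to a TCP-only mode. It is not Unbound's implementation. `send_udp` and `send_tcp` are hypothetical transport callables supplied by the caller, and only the position of the TC bit comes from the DNS header layout in RFC 1035.

```python
import struct

# Sketch of "try UDP, retry over TCP if truncated" versus "TCP only".
# send_udp / send_tcp are hypothetical callables: bytes in, bytes out.

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags field


def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message has the TC bit set."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)


def resolve(query, send_udp, send_tcp, tcp_only=False):
    """Return (response, transports_used) for one query."""
    if tcp_only:
        return send_tcp(query), ["tcp"]
    response = send_udp(query)
    if is_truncated(response):
        # The UDP answer did not fit, so repeat the query over TCP.
        return send_tcp(query), ["udp", "tcp"]
    return response, ["udp"]


if __name__ == "__main__":
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    truncated = struct.pack("!HHHHHH", 1, 0x8200, 1, 0, 0, 0)
    complete = struct.pack("!HHHHHH", 1, 0x8000, 1, 1, 0, 0)
    _, used = resolve(b"q", lambda q: truncated, lambda q: complete)
    print(used)
```

A large response costs two round trips in the fallback path and one in TCP-only mode, while a small response costs one UDP exchange in the fallback path versus a full TCP handshake in TCP-only mode. Which is cheaper overall depends on the mix of response sizes.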