DNS-01 challenge fails since Unbound 1.18: TXT records can be fetched using Unbound 1.16 but not 1.18 or 1.19

@jcjones @mcpherrinm It looks like notifying @lestaff doesn't work anymore, so I'm bugging you directly. :slight_smile:

We've had several threads now where it looks like there was some sort of regression in Unbound (either in the updated version or in its configuration) for the use case of many (20+) domains using DNS-01, where the challenge record for all of those domains is CNAME'd to a single record that is populated with the TXT entries for all of them.

Not a particularly common configuration, no, but it is described as a standard way for acme.sh's alias mode for a multiple-SAN certificate, so it might be something that others are trying too. (And I think it should be working.)

People have theorized that it's related to the new default Unbound setting for max-udp-size, though I don't fully understand how that would be related, since it should be switching to TCP well before that point.
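The fallback that should kick in here is standard DNS behavior: if an answer does not fit in the UDP payload size allowed for that query, the responder sends a truncated reply with the TC bit set, and the requester repeats the query over TCP. The Python sketch below is a toy model of that intended decision, not Unbound's code; 1232 bytes is used only as an example limit, not as the value Let's Encrypt actually configured.

```python
from dataclasses import dataclass
from typing import Optional

# Without EDNS(0), a DNS message over UDP is limited to 512 bytes.
CLASSIC_UDP_LIMIT = 512


@dataclass
class Exchange:
    transport: str         # transport that finally carried the full answer: "udp" or "tcp"
    truncated_first: bool  # True if a UDP reply with the TC bit set came first


def udp_limit(advertised_edns_size: Optional[int], responder_max_udp_size: int) -> int:
    """Largest UDP reply the responder will send for this query (toy model)."""
    if advertised_edns_size is None:
        return CLASSIC_UDP_LIMIT
    # The reply must fit what the requester advertised and the responder's own cap,
    # and is never below 512 (RFC 6891 treats smaller EDNS sizes as 512).
    return max(CLASSIC_UDP_LIMIT, min(advertised_edns_size, responder_max_udp_size))


def exchange(answer_size: int, advertised_edns_size: Optional[int],
             responder_max_udp_size: int) -> Exchange:
    """Answer over UDP if it fits; otherwise send TC=1 and have the requester retry over TCP."""
    if answer_size <= udp_limit(advertised_edns_size, responder_max_udp_size):
        return Exchange(transport="udp", truncated_first=False)
    return Exchange(transport="tcp", truncated_first=True)


print(exchange(900, 1232, 1232))   # Exchange(transport='udp', truncated_first=False)
print(exchange(1300, 1232, 1232))  # Exchange(transport='tcp', truncated_first=True)
```

In this model a large answer always ends up on TCP; whatever stops that retry in practice is outside what it describes.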

I did not see a switch to TCP in the firewall logs. I also expected that to happen, but it seems it did not.

Thanks for doing so!

As mentioned above, we don't use the default edns-buffer-size because of IP fragmentation attacks against DNS, so I'm hesitant to simply change it back right away. I/we need to go back through that paper and make sure we know what we're doing.

However, we need to get an unboundtest.com log into the Unbound issue tracker for them to take a look at this, pointing particularly at post #12 by JonhBonJob in this thread, I think. @JonhBonJob, since you have the most context, can you open that bug report with Unbound?

Meanwhile, we're now discussing this internally, too, and we'll be happy to weigh in on the Unbound issue.

I've updated Staging with the increased max-udp-size, and I've tested that the new configuration returns the TXT records correctly:

Query results for TXT _acme-challenge.abhtest.cloudengine.mercedes-benz.com

Response:
;; opcode: QUERY, status: NOERROR, id: 236
;; flags: qr rd ra; QUERY: 1, ANSWER: 21, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 512

;; QUESTION SECTION:
;_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	IN	 TXT

;; ANSWER SECTION:
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"PeTIYmnDU2cyq-l_VljNIYO7tRjdWI5yzpexoDP3Z1U"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"Q1QrrSEEInPrag2g7y4_EH"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"Q1QrrSEEInPrag2g7y4_EH-GTcUmL8XlcWv6SqDdCsE"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"QSadJSxPUioThP2XHNH1aXvJKEjyPbkttdINObZZGfA"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"YTJxX-cdB5bXJQ2oR03rhN1Au1BZFZS955DrnhDbOBI"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"ex76rM--NrtcwTlx1rqpxtsk_0fv4oSEVcfxjiqm8VQ"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"jakxcJHi_sAFnE64fjyVh1fhPk3SLOLfIrNssr5YX6Q"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"qAacYHuljj2mkA82MkYEUbACRVcWkLYNkUU8lwrLHAY"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"rbV3iZeujOvzfsl7Vpj9vM0L0CMoPaPLzXHb0yM3DB4"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"yfu8J7zX1TqTtBau9Mdm7aBui3Sba8BlG5XYCjMOWkw"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"3O6-HP0qVo03wno7w7dLPuSCDfZKBXJM1nNbQYY-Y1Q"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"6dWY3tFbyPWIgkL46ok1TE63UFqGnfQzaRdd7a1YPUI"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"A4EGy0uFH_79sdkVkMJhT9_U4Ltkf0-6Uoqup-SLI70"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"AYuYR7_0pTjkDaHa5-vjhsMDWDvGp7ZgJP2HqzWzXSA"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"C6CtkcfTd21q5m2FB-QH0vbArg_g0QNQaeIUHIc91vI"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"CbwkUnUZz6-plPNxhpkCSXbCZDt87gMf-JED4pIUv0E"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"DeJjJ9cImQwPPFUMsI39UBFVtj2Fsn6L9uGx4m8qV5A"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"EHQYOqAeQgjOEpYljeyOvHTTKFc2XvLRxz3L3t0GpJg"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"Mo4Jj06GlYbnTpSoGrW9hfUxQmwaACajaAjVQNoDBq0"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"NxTMWNG8vB3_sqBFl-hYLAqNfgYF-CG3EsOXYtNYiTY"
_acme-challenge.abhtest.cloudengine.mercedes-benz.com.	60	IN	TXT	"PbJXwoJr8Xmneky7VBgOf-WfVaQFuu50AaJB-R-u4PY"

;; AUTHORITY SECTION:
cloudengine.mercedes-benz.com.	60	IN	NS	ns-1814.awsdns-34.co.uk.
cloudengine.mercedes-benz.com.	60	IN	NS	ns-603.awsdns-11.net.
cloudengine.mercedes-benz.com.	60	IN	NS	ns-63.awsdns-07.com.
cloudengine.mercedes-benz.com.	60	IN	NS	ns-1193.awsdns-21.org.

@JonhBonJob, are you able to try using our Staging Environment to evaluate if this works for you?

(Note: I don't have a timeline for deploying this to production yet.)

That is promising. But what is the maximum number of records the new config works for?

We've been seeing reports of setups where around 70-80 records used to work but had to be trimmed down to around 20 to get validation working. Would 100 TXT records work, to match the 100-name limit for SANs?

That should be the target goal.
That said, I'm not sure "100 FQDNs" translates to an exact length.
To that end...
The only thing I can be certain of is the maximum size of an FQDN: 255 octets.
So... how big would that packet need to be?
[carrying 100 names each 255 octets long]
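A rough answer, assuming standard DNS wire format with name compression: each ACME TXT value is the 43-character unpadded base64url encoding of a SHA-256 digest (RFC 8555, section 8.4), and once the owner name has appeared in the question, each answer record can refer back to it with a 2-byte pointer. So the owner name's length is paid roughly once, while every extra TXT record costs about 56 bytes. The Python sketch below estimates the size on that basis; it ignores the CNAME record and the authority/additional sections, so real responses (like the one above, which also carries four NS records) will be somewhat larger.

```python
# Fixed sizes from the DNS wire format (RFC 1035) and EDNS(0) (RFC 6891).
HEADER = 12                 # ID, flags, and the four section counts
QTYPE_QCLASS = 4            # 2 bytes each, after the question name
RR_FIXED = 2 + 2 + 4 + 2    # TYPE, CLASS, TTL, RDLENGTH
NAME_POINTER = 2            # compressed owner name pointing back at the question
OPT_RR = 11                 # empty OPT pseudo-record: root name + 10 fixed bytes
ACME_TXT_LEN = 43           # unpadded base64url of a 32-byte SHA-256 digest


def wire_name_length(fqdn: str) -> int:
    """Encoded length of a domain name: one length byte per label, plus the root byte."""
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def estimate_txt_response(qname: str, n_records: int,
                          txt_len: int = ACME_TXT_LEN, edns: bool = True) -> int:
    """Approximate bytes for an answer of n single-string TXT records owned by qname."""
    question = wire_name_length(qname) + QTYPE_QCLASS
    # Each TXT record: compressed owner + fixed fields + RDATA (1 length byte + text).
    per_record = NAME_POINTER + RR_FIXED + 1 + txt_len
    return HEADER + question + n_records * per_record + (OPT_RR if edns else 0)


name = "_acme-challenge.abhtest.cloudengine.mercedes-benz.com"
for n in (20, 21, 100):
    print(n, estimate_txt_response(name, n))  # 1202, 1258, 5682 bytes

# A name that is exactly 255 octets on the wire (labels of 63, 63, 63 and 61).
longest = ".".join(["a" * 63] * 3 + ["b" * 61])
print(estimate_txt_response(longest, 100))   # 5882 bytes
```

Under these assumptions, 100 records need roughly 5.7 KB whatever the FQDN length, which is far beyond a ~1.2 KB UDP limit but well within the 65535-byte limit of a DNS message over TCP.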

I'm not sure of all the technical constraints. I was hoping to get a more formal definition of what should work, so as to better advise people. If the limit is based partly on the length of the FQDN along with the TXT validation data, that's fine.

In the docs, I only saw a brief comment about deleting old TXT records so the response doesn't get "too large".

It came up in one of the threads affected by this recent change. It would just be nice to have a more detailed description of what counts as "too large" :slight_smile:

Maybe we could make the ACME authentication process "work smarter".
For example: have it request validation one name at a time and, as each name passes, continue until all names have passed OR it reaches the limit of 100 and the server refuses to process any more requests [for that one cert].

OR chunk them in groups of no more than 10.

[more cereal - less parallel(universes)]

That would be nice.

It would be easier just to advise people not to CNAME lots of different names to the same _acme-challenge DNS name :slight_smile: ("lots" TBD)

Isn't that already what it's doing (the ACME server's validator)? The problem is that each name resolves separately to the same set of many TXT records, because they're all CNAME'd to the same target. So this practice is doubly bad: the validator not only has to resolve huge responses, it has to do so over and over again, once for each name in the cert.
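To put rough numbers on "doubly bad": if N names are all CNAME'd to one alias holding all N TXT values, each of the N lookups returns all N values, so the validator processes N × N TXT records instead of N. The Python sketch below models this with plain dicts standing in for DNS; the domain names and key authorizations are hypothetical, and 56 bytes per TXT record is an approximation.

```python
import base64
import hashlib

# Approximate wire size of one TXT answer record: 2-byte compressed owner name,
# 10 bytes of TYPE/CLASS/TTL/RDLENGTH, 1 length byte, and a 43-character value.
BYTES_PER_TXT_RR = 56


def acme_txt_value(key_authorization: str) -> str:
    """DNS-01 TXT value: unpadded base64url of SHA-256(key authorization)."""
    digest = hashlib.sha256(key_authorization.encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def build_alias_zone(names, alias):
    """Hypothetical zone: every _acme-challenge name is CNAME'd to one shared alias."""
    cnames = {f"_acme-challenge.{name}": alias for name in names}
    # Placeholder key authorizations; real ones are "<token>.<account key thumbprint>".
    txt = {alias: [acme_txt_value(f"token-{name}.thumbprint") for name in names]}
    return cnames, txt


def lookup_txt(qname, cnames, txt):
    """Follow at most one CNAME, then return the TXT values at the final name."""
    return txt.get(cnames.get(qname, qname), [])


names = [f"site{i}.example.com" for i in range(20)]
cnames, txt = build_alias_zone(names, "_acme-challenge.validator.example.net")

# The validator checks each name separately, and every lookup returns all 20 values.
records_seen = sum(len(lookup_txt(qname, cnames, txt)) for qname in cnames)
print(records_seen)                     # 400
print(records_seen * BYTES_PER_TXT_RR)  # 22400
```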

Not if each TXT record was added, verified, and immediately removed [before moving on to the next TXT record].
They could all be from the same target - just one at a time.

Ah, by "it" you meant the client. I thought you were talking about the server. Yes, the client could serially validate each name rather than create all the records and then ask the server to verify them all at once. Some clients even support doing this explicitly (because certain lame DNS providers only allow for a single value to exist at a time).
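The serial approach can be sketched as a loop that publishes one TXT value, has the CA validate it, and removes it before moving on, so the shared alias record never holds more than one value. This is a hypothetical Python sketch: `add_txt`, `validate`, and `remove_txt` are placeholder callbacks standing in for a DNS provider API and the ACME challenge flow, not functions from acme.sh or any other client.

```python
from typing import Callable, Dict, Iterable


def validate_serially(
    names: Iterable[str],
    add_txt: Callable[[str], None],     # hypothetical: publish the TXT value for a name
    validate: Callable[[str], bool],    # hypothetical: answer the challenge, wait for the result
    remove_txt: Callable[[str], None],  # hypothetical: delete the TXT value again
) -> Dict[str, bool]:
    """Validate one name at a time so the shared record holds at most one value."""
    results = {}
    for name in names:
        add_txt(name)
        try:
            results[name] = validate(name)
        finally:
            # Clean up even if validation raised, so values never accumulate.
            remove_txt(name)
    return results


# In-memory stand-in for the shared alias record, tracking its peak size.
published = set()
peak = 0


def add(name):
    global peak
    published.add(name)
    peak = max(peak, len(published))


results = validate_serially(
    [f"site{i}.example.com" for i in range(67)],
    add_txt=add,
    validate=lambda name: True,  # stand-in: pretend every validation succeeds
    remove_txt=published.discard,
)
print(peak)                   # 1
print(all(results.values()))  # True
```

The trade-off is latency: each name waits for its own DNS propagation and validation round, which is why clients usually batch instead.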

Hello everyone.
We are also affected by this issue. It started with the Unbound upgrade, and max-udp-size seems to be the problem. I have 67 FQDNs in a single certificate, more than half of them wildcards, all pointing to the same zone/record for validation. I have a blog post about how we are handling it; anyone curious, feel free to contact me.

@petercooperjr When Let's Encrypt attempts to validate, it also uses DNS, and therefore UDP. Is the maximum UDP size applied there too? I believe yes, by Unbound.

We encounter this issue when the FQDN count exceeds 10-12; I haven't tested it with a higher number.
Our (redacted) script to issue certificates:

acme.sh --issue -d your_domain.com --challenge-alias redacted-domain-validator.com --dns dns_cf -d "*.your_domain.com" -d "*.sub.your_domain.com" -d "*.domain-to-issue-ssl.com" -d "*.domain-to-issue-ssl2.com" -d "*.my-special-website.com" -d "*.another-domain.com" -d "*.us-east1.domain-to-issue-ssl.com" -d "*.us-east2.domain-to-issue-ssl.com" -d "*.eu-stage1.domain-to-issue-ssl.com" -d "*.eu-dev1.domain-to-issue-ssl.com" -d "*.prod.domain-to-issue-ssl.com" ....more-fqdns..... --server letsencrypt --preferred-chain "ISRG Root X1" --keylength 2048

Our temporary workaround involves running the script 7 times. Each of the first 6 commands covers 10-12 domains, and the 7th command includes all domains, so no new validations are needed because the authorizations from the earlier runs are reused. As a result, we obtain a certificate with all 67 FQDNs included.
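That batching workaround can be scripted. The Python sketch below only builds the acme.sh command lines (it does not run them): one command per batch of domains, then a final command with every domain. It reuses only the options from the command above, in the same order; the domain names are hypothetical placeholders, and the batch size of 12 is an assumption chosen so that 67 names give the 7 runs described. It has not been tested against acme.sh's own handling of repeated `--issue` runs.

```python
import shlex

# Options copied from the command above (the alias domain there is already redacted).
ALIAS_OPTS = ["--challenge-alias", "redacted-domain-validator.com", "--dns", "dns_cf"]
TAIL_OPTS = ["--server", "letsencrypt", "--preferred-chain", "ISRG Root X1",
             "--keylength", "2048"]


def make_command(domains):
    """Build one shell-quoted acme.sh --issue command, in the option order used above."""
    first, *rest = domains
    args = ["acme.sh", "--issue", "-d", first] + ALIAS_OPTS
    for domain in rest:
        args += ["-d", domain]
    return shlex.join(args + TAIL_OPTS)


def batched_commands(domains, batch_size=12):
    """One command per batch of at most batch_size domains, then one with all domains."""
    batches = [domains[i:i + batch_size] for i in range(0, len(domains), batch_size)]
    return [make_command(batch) for batch in batches] + [make_command(domains)]


# Hypothetical domain list standing in for the 67 real names.
domains = [f"*.site{i}.example.com" for i in range(67)]
commands = batched_commands(domains)
print(len(commands))  # 7: six batches (5 x 12 + 1 x 7) plus the final combined run
for command in commands:
    print(command)
```

`shlex.join` quotes the wildcard names, so the shell does not try to glob-expand `*.site0.example.com`.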
It would be great to have this issue resolved by reverting the max UDP size to its previous value.

It would also be great if the ACME client [or the ACME spec itself] handled such unexpected problems by throttling the requests [as you did], looping until all names have been verified, and then requesting a single cert with all the names on it.

I guess that's work for [much] later...
Right now, finding the max UDP size seems to be the biggest hammer in the room.

DNS should fall back to TCP when dealing with responses too large for UDP. Not sure why this isn't happening here?

In theory, yes. I don't know the Unbound implementation, but it's not happening in this case. Perhaps that's also a configurable option (enable/disable) in Unbound.

Yeah, I wonder if somehow a UDP packet is too big, and Unbound treats it as "packet too big" instead of retrying with TCP like it should?

Honestly, I'm curious whether, if Let's Encrypt configured Unbound to always and only use TCP, it would end up being less net traffic for them, because it wouldn't need to try UDP first and then switch to TCP, even counting the extra TCP handshake for responses small enough to fit in UDP. (Though there are probably broken DNS servers out there with only UDP support which are managing to validate for now…)
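One back-of-the-envelope way to frame that question, using round trips as a rough proxy and several simplifying assumptions: a UDP exchange costs one round trip, a fresh TCP connection plus query costs two, and connection reuse, TCP Fast Open, teardown, and packet loss are ignored. If a fraction p of answers are truncated, "UDP first, TCP on truncation" averages 1 + 2p round trips versus a flat 2 for TCP-only, so under this model TCP-only only wins when more than half of the responses are truncated. The Python sketch just evaluates those two formulas; it says nothing about byte counts or server load.

```python
def expected_round_trips(truncated_fraction: float, udp_first: bool = True) -> float:
    """Average round trips per query in the toy model described above."""
    if not 0.0 <= truncated_fraction <= 1.0:
        raise ValueError("truncated_fraction must be between 0 and 1")
    if udp_first:
        # Every query costs one UDP round trip; truncated ones add a TCP
        # handshake plus the repeated query (two more round trips).
        return 1 + 2 * truncated_fraction
    # TCP only: handshake plus query for every lookup.
    return 2.0


for p in (0.01, 0.5, 0.9):
    print(p, expected_round_trips(p), expected_round_trips(p, udp_first=False))
```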

Production is now running with the increased max-udp-size.
