CAA exception notifications don't mention the failing domain

I have 15+ domains using letsencrypt certs. I just got a “Let’s Encrypt CAA Exceptions” email telling me that I have a certificate for a domain that’s failing CAA checks.

A naive check of all my domains shows that they’re all responding with happy NOERRORs to CAA requests, so nothing should be failing. (I’m guessing a DNSSEC issue is the underlying problem, but shrug). I’ve checked a couple against the staging server with no problems.

Without knowing which domain is failing a CAA check, though, it’s much harder to diagnose. That’d be a great thing to add (and if it could include which nameserver it queried and what response it got that’d be even better).

How much did you test? People have had DNS servers where "example.com." works but "ExAmPlE.CoM." has an invalid DNSSEC signature, or a DNS provider that seems to deploy problematic DDoS scrubbing equipment on a regional basis. The issue could be subtle. :sweat:

This doesn't help. Sorry. :sweat:

Edit: I hit submit before I finished my thought.

You can run your domains and hostnames through https://unboundtest.com/, which runs a resolver with a similar configuration to the Let's Encrypt validator.

If you're using Certbot, you can use "certbot renew --dry-run" to test renewing all of your certificates against the staging environment.

It would definitely be simpler if the information was in the email. :sweat:

Heya,

Sorry for the lack of detail in the notification email, that was an oversight on our part. As @mnordhoff says, the best way to check is through https://unboundtest.com/. There’s some additional information at https://letsencrypt.org/docs/caa/. If you’re still not able to find the failing domain with that tool, let me know and I’ll see what else we can do to help debug.

Yup, got all that. Thanks!

I’ve no doubt it’s a DNSSEC issue for one of the domains. I’m going to check that on all my servers to find which one is failing (no big deal, “DNS is a core competency” here), and I’m pretty sure that’ll fix it.

This is just a feature request for the notification mails, to make it simpler for everyone else.

Yep, we’re already talking about how to improve our processes to make sure we always include affected domain names in emails we send. Thanks for the feature request!

We manage thousands of domains with LE, and the emails do not include the domain name, which makes the failure impossible to trace. Is it possible to get the emails resent with the domains?

I have to second the request for an improved email to be resent: we also manage several hundred domains, and we cannot reasonably check the CAA of each one manually.

I’ve just been sent 4 emails from Let’s Encrypt with the subject “Let’s Encrypt CAA Exceptions”, but they are all identical. There is nothing in them to say which domain is causing a problem with CAA failures.

We host hundreds of websites, and are therefore at the mercy of LOTS of NS that may have an issue. Is there any reason why these emails would not contain the URL in question, or is there a way to find out? Short of writing an application to query every NS for the URLs we host, this seems to be unhelpful.

@jsha Thanks for the update on your process surrounding affected domain names.

Is there any way you can resend those emails with the updated process?

The service I’m administering currently has over 1300 valid certificates for a large variety of domains. Putting them all into https://unboundtest.com/ seems hard unless they have an API.

Running a bunch of dig queries is probably doable if that will achieve the same. Is there a dig query that would check a domain in the way that unboundtest does?

dig @8.8.8.8 example.com type257

If that gives a servfail, it’s a domain with a problem, likely with dnssec or caa support. If it gives a noerror, it probably isn’t.

Replace 8.8.8.8 with any recursive resolver that checks dnssec, and “type257” with “caa” if you have a dig new enough to know about CAA.

This is a very simple test and there are corner cases (inconsistent behaviour across auth servers, for instance) that’ll break it, but it worked well enough for me.
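
For anyone with too many domains to run dig by hand, here is a rough Python sketch of the same test using only the standard library. It sends a type 257 (CAA) query to a recursive resolver and reports every domain that does not come back NOERROR. The function names (build_caa_query, rcode_of, check_domain, find_failing) are made up for this example, 8.8.8.8 is just the resolver from the dig line above, and it has the same corner cases as the dig version (it also uses plain UDP with no retries, so treat a TIMEOUT as "check again" rather than as a failure).

```python
import random
import socket
import struct

CAA_TYPE = 257  # the RR type number behind dig's "type257" / "caa"
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_caa_query(name, query_id=0):
    """Build a raw DNS query packet asking for the CAA records of `name`."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other sections.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", CAA_TYPE, 1)  # QTYPE=CAA, QCLASS=IN


def rcode_of(response):
    """Return the response code name from the header of a raw DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, "RCODE%d" % code)


def check_domain(name, resolver="8.8.8.8", timeout=5.0):
    """Send one CAA query over UDP to `resolver`; return the rcode name or "TIMEOUT"."""
    query = build_caa_query(name, random.getrandbits(16))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver, 53))
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
    return rcode_of(response)


def find_failing(domains, lookup=check_domain):
    """Return (domain, result) pairs for every domain that did not get NOERROR."""
    results = ((domain, lookup(domain)) for domain in domains)
    return [(domain, result) for domain, result in results if result != "NOERROR"]
```

Calling find_failing(["example.com", "example.org"]) returns a list of (domain, result) pairs worth a closer look. Note that it flags anything other than NOERROR, so a mistyped name that returns NXDOMAIN will show up too.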

The big thing that misses is that Let’s Encrypt (and correspondingly https://unboundtest.com/) make more extensive use of 0x20 (case) randomization than Google Public DNS.

Google should usually be case-preserving, though, so you can simulate randomization by typing “eXample.cOm” or whatever.

There are sure to be corner cases, as you said, but any resolver with capitalization and DNSSEC validation will probably catch most issues.
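
If you'd rather not mix the capitals by hand, a tiny helper can do it. This is only a sketch of the idea, not how Unbound implements 0x20; randomize_case is a made-up name, and it only helps if the resolver you query passes your capitalization through to the authoritative servers.

```python
import random


def randomize_case(name, rng=random):
    """Return `name` with each letter randomly upper- or lower-cased (0x20 style)."""
    # Digits, dots and hyphens are unaffected by upper()/lower().
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower()
        for ch in name
    )
```

Feed the result to dig (or any other lookup) a few times per domain, since a single random capitalization might happen to be one that works.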

You can download the https://unboundtest.com/ configuration and run Unbound yourself for the closest match.

Hi @voutasaurus,

I’ve updated CAA SERVFAIL Changes (comment) with a list of the affected domains. Can you cross-reference with your own list of domains?

Thanks,
Jacob

It looks like there are only two domains I have to worry about and they’re both Namebright. I hope you manage to get in touch with Namebright. I passed a message on to Namebright myself and they said that they were likely going to move to a NotImp response “soon”.

What is Let’s Encrypt’s policy on DNS providers responding with NotImp to CAA requests?

NOTIMP is Namebright's current problematic behavior.

Do you mean NOERROR?

I was getting SERVFAIL from them. [edit: I’m also currently getting SERVFAIL from them]

If NOTIMP is not acceptable to Let's Encrypt then that might be a problem given what Namebright support told me they were going to do. I.e. they're moving from SERVFAIL to NOTIMP and then eventually once they fully support CAA it will move to NOERROR.

This is what they told me:

Our DNS servers can't even parse CAA record requests because the spec is so new, along with probably 95% of the public DNS servers on the internet.

Hmmmm.

Namebright's DNS servers do an assortment of invalid and incorrect things. Sometimes that includes returning NOTIMP.

For example: http://dnsviz.net/d/sixclothing.com/WYn9Vg/dnssec/

There are two different matters: What Namebright's authoritative nameservers return, and what your (or Let's Encrypt's) recursive nameserver returns.

I don't think Namebright ever returns SERVFAIL.

When a recursive nameserver encounters a variety of error conditions (invalid DNSSEC, authoritative nameserver is down, NOTIMP, REFUSED, &c), the recursive nameserver will return SERVFAIL to the client.

When the Let's Encrypt validation server, or your local resolver, returns SERVFAIL to you, it's usually not because that's literally what the authoritative server said.
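
As a toy illustration of that distinction (heavily simplified; a real resolver retries, tries other servers, and so on before giving up), the mapping looks roughly like this. The function name and the outcome labels "TIMEOUT" and "BOGUS_DNSSEC" are invented for the example.

```python
# Upstream outcomes that a recursive resolver reports to its client as SERVFAIL.
# The labels are illustrative, not taken from any resolver's source.
UPSTREAM_FAILURES = {"NOTIMP", "REFUSED", "TIMEOUT", "BOGUS_DNSSEC"}


def client_visible_rcode(authoritative_outcome):
    """Map what happened upstream to the rcode the client of a recursive resolver sees."""
    if authoritative_outcome in UPSTREAM_FAILURES:
        return "SERVFAIL"
    # Ordinary answers (NOERROR, NXDOMAIN) are passed through as they are.
    return authoritative_outcome
```

So a NOTIMP from the authoritative server and a SERVFAIL at your end are the same event seen from two places.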

Unpersuasive. There's nothing, as far as I know, difficult about "parsing" an unrecognized record type. It's just a number. A reasonable nameserver will handle it properly, usually with a valid "no such record" response.

On top of that, there is a widely supported standard for making nameservers not only interpret but serve responses to record types that were not explicitly supported and may not even have existed when the server was written! The SSLMate CAA generator demonstrates how to use it, for example.
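
For the curious, the standard in question is RFC 3597's generic record syntax, and producing it for a CAA record takes a few lines. This is a sketch, assuming the usual CAA wire layout (one flags byte, one tag-length byte, the tag, then the value); caa_generic_rdata is a made-up name.

```python
def caa_generic_rdata(flags, tag, value):
    """Encode a CAA record in RFC 3597 generic syntax: TYPE257 \\# <length> <hex>."""
    # CAA RDATA: flags byte, tag length byte, tag, then the value with no length prefix.
    rdata = bytes([flags, len(tag)]) + tag.encode("ascii") + value.encode("ascii")
    return "TYPE257 \\# %d %s" % (len(rdata), rdata.hex().upper())
```

For example, caa_generic_rdata(0, "issue", "letsencrypt.org") gives TYPE257 \# 22 000569737375656C657473656E63727970742E6F7267, which goes in a zone file after the owner name and class, the same place "CAA 0 issue ..." would go on a server that knows the type.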

Also, the "new" spec was published in 2013. January 2013. The first draft was published in 2010, though it was quite different at that time, and the numbers were probably assigned in 2012-2013.

I won't bet my hat on this, but you could quite possibly deploy a BIND beta from literally 15 years ago -- a decade before the CAA RFC was published -- and use CAA via RFC 3597. (I honestly have no idea when BIND first implemented it, but 15 years ago is possible.)

Also also, that doesn't explain Namebright's numerous other protocol violations.

Also also also, "probably 95% of the public DNS servers on the internet" sounds... well.

Edit: I haven't tried to compile it, but BIND 9.2.1 (April 2002) includes RFC 3597's predecessor draft in its documentation, and the changelog seems to suggest that draft -00 was implemented in 9.1.0b1 (December 2000 probably). If anyone's feeling adventurous...

Every time I’ve checked so far, the Namebright nameservers return NOTIMP, which recursive nameservers (e.g. Google’s DNS server - 8.8.8.8) return as SERVFAIL.

This will indeed be a problem, and isn’t even a change from what they’re doing right now.
