"SERVFAIL looking up CAA" but the usual culprits don't seem to apply

Thank you for your help! :pray:

In addition to the standard info below:

I've checked for DNSSEC (not enabled on this domain) and for a CAA error (I can't reproduce the SERVFAIL using dig). I don't see any errors when I pass this domain to the unboundtest DNS checker.

I'm attempting these renewals for several expired subdomains. Occasionally one goes through. That sounds like an intermittent DNS problem; however, I can't reproduce any DNS errors at all with dig. I appreciate that there is probably some kind of DNS failure here, but I need to be able to reproduce it in order to give the client an idea of what they need to fix.

I use a similar setup for many other domains without problems.

The domain appears to be hosted with Amazon Route 53 (I don't run their DNS).


Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

The affected domain is:

humancare.preprod.fcb.io

I ran this command:

certbot -n renew --cert-name humancare.preprod.fcb.io

It produced this output:

Certbot failed to authenticate some domains (authenticator: webroot). The Certificate Authority reported these problems:
  Domain: humancare.preprod.fcb.io
  Type:   dns
  Detail: DNS problem: SERVFAIL looking up CAA for humancare.preprod.fcb.io - the domain's nameservers may be malfunctioning

My web server is (include version):

nginx 1.18.0 (Ubuntu)

The operating system my web server runs on is (include version):

Ubuntu 22.04

My hosting provider, if applicable, is:

AWS

I can login to a root shell on my machine (yes or no, or I don't know):

Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):

I'm running it via the container:

podman run -it --rm certbot/certbot --version
certbot 2.8.0

(I've now hit the failed renewals limit for the hour for the affected hosts so any further testing on my part will have to wait for that to time out.)

Something seems wrong with the DNS delegation, and some servers are returning "REFUSED" (presumably because they don't think they're the authoritative server?).

https://dnsviz.net/d/humancare.preprod.fcb.io/dnssec/

Main things are:

  • io to fcb.io: The following NS name(s) were found in the authoritative NS RRset, but not in the delegation NS RRset (i.e., in the io zone): ns-1190.awsdns-20.org, ns-562.awsdns-06.net, ns-1820.awsdns-35.co.uk, ns-507.awsdns-63.com
  • io to fcb.io: The following NS name(s) were found in the delegation NS RRset (i.e., in the io zone), but not in the authoritative NS RRset: ns-330.awsdns-41.com, ns-729.awsdns-27.net, ns-1463.awsdns-54.org, ns-1594.awsdns-07.co.uk
  • fcb.io/DNSKEY: The response had an invalid RCODE (REFUSED).
  • humancare.preprod.fcb.io/A: A query for humancare.preprod.fcb.io results in a NOERROR response, while a query for its ancestor, preprod.fcb.io, returns a name error (NXDOMAIN), which indicates that subdomains of preprod.fcb.io, including humancare.preprod.fcb.io, don't exist.

And if you specifically ask for CAA in DNSViz's advanced options:

  • humancare.preprod.fcb.io/CAA: The response had an invalid RCODE (REFUSED).
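
To see which servers answer REFUSED, it can help to ask each nameserver directly instead of going through a resolver. Here is a rough sketch, assuming dig is installed. The function names rcode_from_dig and check_caa_per_ns are made up for illustration, and the 3-second timeout is an arbitrary choice:

```shell
# Pull the RCODE (the "status:" field) out of dig's header line on stdin.
rcode_from_dig() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Ask every nameserver given after the domain for the CAA record,
# without recursion, and print "<nameserver><TAB><RCODE>" for each.
check_caa_per_ns() {
  caa_domain=$1
  shift
  for caa_ns in "$@"; do
    caa_rcode=$(dig +norecurse +time=3 +tries=1 CAA "$caa_domain" "@$caa_ns" 2>/dev/null | rcode_from_dig)
    # An empty result means dig printed no header (timeout, unreachable server, ...).
    printf '%s\t%s\n' "$caa_ns" "${caa_rcode:-NO_RESPONSE}"
  done
}
```

You'd call it as check_caa_per_ns humancare.preprod.fcb.io followed by the nameserver names from both lists above. Running it a few times may help show whether the failure follows one particular set of servers.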

You should use the staging environment for testing until you've got everything figured out.
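
With certbot, a dry run goes against the staging environment, so it shouldn't eat into the production failed-validation limit. A small sketch based on the command from the first post; renew_dry_run is just an illustrative wrapper name:

```shell
# Test-renew a single certificate against staging; nothing is saved to disk.
renew_dry_run() {
  certbot -n renew --cert-name "$1" --dry-run
}
```

For example: renew_dry_run humancare.preprod.fcb.io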

Thanks! Do you think this is mainly about the delegation issue? I'm wondering if this is a more general Route 53 problem (tough for my client to fix that) or something they may have specifically done wrong and could fix?

I'd try correcting the delegation issue (making sure that the NS records returned from the .io servers match the ones returned from the fcb.io ones) first, before digging further, yeah.
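
That comparison can be scripted once you have the two NS lists (pasted from DNSViz or from dig output). This is a minimal sketch, not a polished tool; normalise_ns and ns_only_in_first are hypothetical helper names:

```shell
# Read whitespace-separated names on stdin; emit one per line,
# lowercased, without a trailing dot, sorted and de-duplicated.
normalise_ns() {
  tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sed -e 's/\.$//' -e '/^$/d' | sort -u
}

# Print every name from the first list that is missing from the second.
ns_only_in_first() {
  nsd_first=$(printf '%s\n' "$1" | normalise_ns)
  nsd_second=$(printf '%s\n' "$2" | normalise_ns)
  for nsd_name in $nsd_first; do
    printf '%s\n' "$nsd_second" | grep -qxF -- "$nsd_name" || printf '%s\n' "$nsd_name"
  done
}
```

Run it once with the delegation list first and once with the authoritative list first. When the delegation is consistent, both runs print nothing. To pull the delegation side yourself, a non-recursive NS query for fcb.io sent to one of the io nameservers should show it in the authority section of the response.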

I don't understand why unboundtest is working, or honestly why you'd get as far as CAA checking rather than failing with the attempt to validate, though. I'm just repeating what DNSViz is saying.

Also, you might want to find out whether they're using any of the fancier Route 53 features (geolocation routing, latency routing, or the like). If users in different places get different results, troubleshooting becomes more challenging.

Thank you! I appreciate the advice.
