During secondary validation: DNS problem: SERVFAIL looking up A for pb.ev4.org - the domain's nameservers may be malfunctioning

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. https://crt.sh/?q=example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is:
pb.ev4.org

I ran this command:

certbot certonly -d pb.ev4.org --standalone

It produced this output:
Failed authorization procedure. pb.ev4.org (http-01): urn:ietf:params:acme:error:dns :: During secondary validation: DNS problem: SERVFAIL looking up A for pb.ev4.org - the domain's nameservers may be malfunctioning

My web server is (include version):
N/A

The operating system my web server runs on is (include version):
Debian 10

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don't know):
yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot 0.31.0

There is intentionally no A record; this is an IPv6-only host, as legacy IPv4 is a costly extra which I have no need for.
Generating a certificate with the test server works successfully with the following command:

certbot certonly -d pb.ev4.org --test-cert --standalone

Your web server refuses HTTP connections. It is strange that the validation worked even with the test ACME server.

I am using the --standalone option of certbot, which should cause certbot to create an HTTP listener for the duration of validation and then terminate it afterwards. That's why it works with the test server, why it should work with the live server, and why it won't work with your test: certbot isn't running while you're testing.
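Conceptually, a standalone HTTP-01 responder is just a short-lived web server that answers `GET /.well-known/acme-challenge/<token>` with the key authorization (`token + "." + account-key thumbprint`, per RFC 8555 §8.3). A minimal sketch, not certbot's actual code: the token and thumbprint below are hypothetical placeholders (certbot obtains and serves the real ones itself), and the server binds to `::` because the host in this thread is IPv6-only.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"

# Hypothetical token -> key authorization map (placeholder values only).
KEY_AUTHZ = {"example-token": "example-token.example-thumbprint"}


def challenge_response(path, key_authz):
    """Return (status, body) for an HTTP-01 request path."""
    if path.startswith(CHALLENGE_PREFIX):
        token = path[len(CHALLENGE_PREFIX):]
        if token in key_authz:
            return 200, key_authz[token].encode("ascii")
    return 404, b""


class ChallengeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = challenge_response(self.path, KEY_AUTHZ)
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class IPv6HTTPServer(HTTPServer):
    # Listen on an IPv6 socket; the host has no IPv4 address at all.
    address_family = socket.AF_INET6


if __name__ == "__main__":
    # HTTP-01 always connects to port 80, which normally requires root.
    with IPv6HTTPServer(("::", 80), ChallengeHandler) as server:
        server.serve_forever()
```

This is also why an external check only succeeds while the listener is up: outside that window nothing is bound to port 80.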

The problem (at this point) is entirely DNS related.

There is an AAAA record and a CAA record for the hostname in question, but no A record.
The AAAA and CAA records resolve just fine via public DNS servers.
Does it explicitly need an A record to exist, despite the fact that this server has nothing for it to point to? If I point it somewhere invalid, would that not break other things?

No.

Probably not as far as LE is concerned.
When IPv6 is present, LE will prefer IPv6 and will NOT fall back to IPv4.
So it will basically ignore the IPv4 address completely when an IPv6 address is seen.

Then why would it be failing on the nonexistence of an A record when a valid and working AAAA record exists?

It is NOT failing because there is no A record.

It is failing because DNS returned SERVFAIL during secondary validation (no records were returned).

This is most likely due to some geolocation blocking on your DNS servers.

There's no geoblocking of any kind on any of the DNS servers, and the servers themselves are spread out (Asia and Europe).
The servers will not return any records if you query them for type=A, because they don't have any such records. Queries for type=CAA seem to go through fine too.

I completely understand why it should be working.
What we are missing is the reason(s) why it is not.
We know that LE passed the primary validation.
We know it failed during secondary validation.
We know that it failed due to a DNS SERVFAIL reason (during the secondary validation stage).
We know CAA does not appear to be part of the problem.
We know your five authoritative servers are geo dispersed.
We now know that they are not employing any geolocation blocking.
...
hmm...
...
That leaves strange DNS behavior: replies over UDP that exceed 512 bytes but are not properly switched to TCP, and are instead truncated so that only part of the answer is returned.
I do NOT know that this is the case here, but it does seem to point in that general direction.
Since all five nameservers are under the same TLD (".net"), that could be improved upon.
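For reference on the UDP/TCP point: a UDP reply that does not fit sets the TC (truncated) bit in the header, and a well-behaved resolver then retries the question over TCP, where every message carries a two-byte length prefix (RFC 1035 §4.2.2, RFC 7766). With EDNS0 the UDP limit can be larger than 512 bytes, so the TC bit, not the raw size, is what signals truncation. A minimal sketch of just those two pieces; the function names are illustrative, not from any particular library.

```python
import struct

TC_BIT = 0x0200  # bit 9 of the 16-bit flags word in the DNS header


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set in a DNS message header."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length for DNS over TCP."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes from a connected stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def read_tcp_message(sock) -> bytes:
    """Read one length-prefixed DNS message from a connected TCP socket."""
    (length,) = struct.unpack("!H", _recv_exact(sock, 2))
    return _recv_exact(sock, length)
```

A server that truncates without setting TC, or a path that blocks TCP port 53, would break exactly this retry.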

So where do we go from here...
I would start by simulating a secondary-validation DNS request for that FQDN to see exactly what your DNS servers return.
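One way to approximate that by hand is to send the same question directly to each authoritative nameserver, with recursion turned off, and compare the replies. The sketch below builds a minimal RFC 1035 query with only the standard library; `query_udp` and whatever nameserver address you pass to it are assumptions for illustration, and this does not reproduce Let's Encrypt's own resolvers. `dig @<nameserver> pb.ev4.org A +norecurse` gives the same information interactively.

```python
import random
import socket
import struct

# Query type codes: A (RFC 1035), AAAA (RFC 3596), CAA (RFC 8659).
QTYPES = {"A": 1, "AAAA": 28, "CAA": 257}


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("invalid label: %r" % label)
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype, query_id=None):
    """Build a one-question query with RD=0, as sent to an authoritative server."""
    if query_id is None:
        query_id = random.getrandbits(16)
    # ID, flags (0: standard query, no recursion desired), QDCOUNT=1, AN=NS=AR=0
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPES[qtype], 1)  # class IN
    return header + question


def query_udp(server_ip, name, qtype, timeout=3.0):
    """Send one UDP query to server_ip port 53 and return the raw reply bytes."""
    query = build_query(name, qtype)
    family = socket.AF_INET6 if ":" in server_ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # raises socket.timeout if nothing comes back
        sock.sendto(query, (server_ip, 53))
        while True:
            data, _ = sock.recvfrom(65535)
            if data[:2] == query[:2]:  # ignore replies carrying a different ID
                return data
```

Running `query_udp(...)` for A, AAAA and CAA against each of the five nameservers, over both IPv4 and IPv6 where available, would show whether only some of them misbehave.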

Hi @bert64

your configuration is buggy, but I don't understand it.

See the result - https://check-your-website.server-daten.de/?q=pb.ev4.org

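Host | Type | IP address | is auth. | ∑ Queries | ∑ Timeout
pb.ev4.org | A | | yes | 1 | 0
pb.ev4.org | AAAA | 2001:bd0:100::dead (Braintree/England/United Kingdom (GB) - Nitrex) | yes | |
www.pb.ev4.org | | Name Error | yes | 1 | 0
*.ev4.org | A | 127.0.0.1 | yes | |
*.ev4.org | AAAA | | yes | |
*.ev4.org | CNAME | | yes | |
*.pb.ev4.org | A | Name Error | yes | |
*.pb.ev4.org | AAAA | Name Error | yes | |
*.pb.ev4.org | CNAME | Name Error | yes | |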

*.ev4.org has an A record, but pb.ev4.org has an empty A record. Is there an empty record? A-records with empty values may be a problem.

And normally, if the wildcard *.ev4.org has an A-record, *.pb.ev4.org should have the same. But that may be a bug (in my tool).
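For comparison, the wildcard rules in RFC 4592 fit in a few lines. A wildcard is only consulted when the queried name does not exist, and only the `*` label directly under the closest existing ancestor (the "closest encloser") can answer. A simplified toy model, not a resolver: the zone holds only the records visible in the table above, names are treated case-sensitively, and `lookup` is a hypothetical helper.

```python
def node_exists(zone, name):
    """A name exists if it owns records or has descendants (empty non-terminal)."""
    return any(owner == name or owner.endswith("." + name) for owner in zone)


def lookup(zone, qname, qtype):
    """Return (rcode, records) for qname/qtype using RFC 4592 wildcard matching."""
    if node_exists(zone, qname):
        # The name exists: return its records of that type, or NODATA (empty list).
        return "NOERROR", zone.get(qname, {}).get(qtype, [])
    labels = qname.split(".")
    for i in range(1, len(labels)):
        encloser = ".".join(labels[i:])
        if node_exists(zone, encloser):
            # Only "*.<closest encloser>" may synthesize an answer.
            wildcard = "*." + encloser
            if wildcard in zone:
                return "NOERROR", zone[wildcard].get(qtype, [])
            return "NXDOMAIN", []
    return "NXDOMAIN", []


# Only the records shown in the check-your-website table.
ZONE = {
    "pb.ev4.org": {"AAAA": ["2001:bd0:100::dead"]},
    "*.ev4.org": {"A": ["127.0.0.1"]},
}
```

Under this model `*.ev4.org` answers for a name like `foo.ev4.org`, but not for `pb.ev4.org` itself (it exists, so an A query gets an empty NOERROR answer) or for names below it such as `www.pb.ev4.org` (the closest encloser is `pb.ev4.org`, which has no `*` child).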

All of that doesn't explain the difference between the first and the second validation.

Welcome to the Let's Encrypt Community, Bert :slightly_smiling_face:

There have been a number of secondary validation failures of late, but I was under the impression from the Let's Encrypt staff that this issue was mostly resolved. How long has this been happening for you? The known issue relates to requests made around 00:00 UTC overloading the secondary validation servers.

@lestaff

Another 2vf case.

Let's Encrypt production can at least occasionally resolve the domain without failure: https://acme-v02.api.letsencrypt.org/acme/authz-v3/7844331326

That suggests that the problem only affects a subset of your nameservers, or that it is a networking issue.

This is somewhat of a shot in the dark, but I would try removing the CNAME on ns3.ev6.net and just using A/AAAA records so that it matches the glue from the gTLD nameservers. Although this is probably a valid thing to do, Unbound might be tripping up on it.

The domain pb.ev4.org exists, since there is an AAAA record for it. So for a type A query for that name, even though there is no such record, the answer cannot be NXDOMAIN. The only appropriate answer is one whose answer section is empty (NOERROR with zero answers, often called NODATA). That is a normal situation.
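The difference between NODATA, NXDOMAIN and SERVFAIL is visible in the response header alone: the RCODE (low four bits of the flags word) plus the answer count. A minimal sketch, assuming you already have the raw response bytes from some DNS client; `classify_response` is a hypothetical helper, and it only counts answers, so it does not distinguish, say, a CNAME-only answer.

```python
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def classify_response(message: bytes) -> str:
    """Classify a DNS response using only its 12-byte header."""
    if len(message) < 12:
        raise ValueError("response shorter than a DNS header")
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    rcode = flags & 0x000F
    if rcode == 0:
        # The name exists; an empty answer section is NODATA, which is what
        # an A query for an IPv6-only name should receive.
        return "NODATA" if ancount == 0 else "ANSWER"
    return RCODES.get(rcode, "RCODE %d" % rcode)
```

An authoritative server answering the A query for pb.ev4.org correctly should produce "NODATA"; the error in this thread corresponds to "SERVFAIL" coming back from the validating resolver instead.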

Interesting, I forgot there was a wildcard record. I have now removed it, and will wait for DNS caches to expire before trying again.