It produced this output: DNS problem: query timed out looking up A for flowm.daemon.contact; DNS problem: query timed out looking up AAAA for flowm.daemon.contact
My web server is (include version): n/a (we don't get that far)
The operating system my web server runs on is (include version): n/a
My hosting provider, if applicable, is: n/a
The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 1.22.0
I don't currently see what is wrong with the DNS. I can see DNS queries coming in and getting promptly answered.
I tried to test DNS with online tools but got the error "invalid domain name" (whatever that might mean). But Verisign and IPVOID say the DNSSEC is fine, all green.
Not sure if these errors are also the issue here, but they could be. What I did notice is how SLOW the analysis was! Painstakingly slow! I can't reproduce that slowness using dig +trace, but it makes me wonder if there are some nameservers in the chain which aren't very timely with their answers. Note that this doesn't have to be your DNS server; it could also be one of the .contact TLD nameservers. I just don't know.
I see. They say the error is that there is no answer over UDP. But I cannot find a place from where I could reproduce this.
From any place I try, I get the answers over UDP just fine. (Obviously, when I ask for DNSKEY or other cipher stuff, I only get a Truncated message over UDP, because it doesn't fit into a single packet. AFAIK that is the correct behaviour; the client should then implicitly switch to TCP.)
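For anyone who wants to check the truncation behaviour by hand, here is a minimal sketch of the UDP-then-TCP fallback described above, using only the Python standard library. The helper names (`build_query`, `is_truncated`, `query`) are made up for this post, and the query carries no EDNS option, so it is only an approximation of what a real resolver such as Unbound sends.

```python
import socket
import struct

TC_FLAG = 0x0200  # "TrunCated" bit in the DNS header flags word
QTYPES = {"A": 1, "AAAA": 28, "DNSKEY": 48, "CAA": 257}


def build_query(name, qtype, query_id=0x1234):
    """Build a minimal DNS query packet (no EDNS, recursion not requested)."""
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # class IN


def is_truncated(response):
    """Return True if the TC bit is set in the response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed early")
        data += chunk
    return data


def query(server, name, qtype, timeout=5.0):
    """Ask over UDP first; repeat the same question over TCP if TC is set."""
    packet = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(packet, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not is_truncated(response):
        return "UDP", response
    # DNS over TCP prefixes each message with a two-byte length field.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(packet)) + packet)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return "TCP", recv_exact(tcp, length)
```

Calling something like `query("pole.daemon.contact", "daemon.contact", "DNSKEY")` returns which transport finally delivered the answer. A `socket.timeout` on the UDP step would correspond to the "no answer over UDP" complaint from the web tool.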
You can find out: just do a geolocation on the nameservers.
The Warnings are interesting, because I didn't know that. I deliberately chose the most extensive algorithm and the most challenging location, because I want to see where and why it fails if it fails. And I want to see that now, and not in two years when another DNSSEC KSK gets added to the cycle and then things suddenly fail. Or such.
(You know the Internet was originally designed for rugged military use, not for superfast-cloudflare-streaming-video-crap. The standards still hold to that, but if services don't comply with them any longer, then there is a problem.)
One cannot rely on these webtools. Maybe these guys have not updated their software since the vanity-TLDs came into play; I don't know.
Yes, and there shouldn't be any. Did you find one? Then that is a mistake of mine.
The question would be: does Let's Encrypt work with a pure CNAME record? (I think it should, because lots of webserver-virtual-hostname identities are just aliases to a single real machine name. And "flowm" should work in exactly that fashion.)
Unboundtest is developed by one of the developers of Let's Encrypt, who is part of the team developing the software used by Let's Encrypt (Boulder). It's supposed to mirror the behaviour of the Unbound instances used by Boulder, although it's always possible that slight differences appear.
See the Unboundtest link above; it clearly shows an A RR as well as a CNAME RR.
Boulder does follow CNAME RRs, yes. Shouldn't be an issue.
Yeah, but that's the one that works and recognizes the domain, right?
And that link doesn't look like the usual web services, anyway. Looks rather like my system logs - I like it.
Ah, didn't know that. So we might have a chance to see the error failing the verification in these logs also, right?
Sorry, I don't find it. Not in the zonefile, and not in that link:
;; ANSWER SECTION:
flowm.daemon.contact. 0 IN CNAME flag.daemon.contact.
flag.daemon.contact. 0 IN A 188.8.131.52
"flowm" points to "flag", that's how it is intended.
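To make the alias behaviour concrete, here is a toy sketch of how a resolver chases a CNAME for a given record type. The `ZONE` table only holds the two records from the ANSWER SECTION above, and `resolve` is an illustrative helper, not how Unbound or Boulder actually implement it.

```python
# Toy zone data taken from the ANSWER SECTION above; no AAAA record exists.
ZONE = {
    ("flowm.daemon.contact.", "CNAME"): "flag.daemon.contact.",
    ("flag.daemon.contact.", "A"): "188.8.131.52",
}


def resolve(name, qtype, zone=ZONE, max_hops=8):
    """Follow CNAMEs until a record of qtype is found or the chain ends.

    Returns (chain, answer). answer is None when the last name in the
    chain has no record of the requested type.
    """
    name = name.lower()  # DNS names compare case-insensitively
    chain = [name]
    for _ in range(max_hops):
        if (name, qtype) in zone:
            return chain, zone[(name, qtype)]
        target = zone.get((name, "CNAME"))
        if target is None:
            return chain, None
        name = target.lower()
        chain.append(name)
    raise RuntimeError("CNAME chain too long")


print(resolve("flowm.daemon.contact.", "A"))
print(resolve("flowm.daemon.contact.", "AAAA"))
```

The A lookup ends at the address of "flag", while the AAAA lookup follows the same alias and then comes back empty. The sketch does not distinguish an empty answer from a nonexistent name, which a real server does.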
I found a bunch of other possible issues.
I get a mass of AAAA requests - but IPv6 is not yet implemented on these public nameservers. OTOH, certbot runs in an infrastructure that does already use IPv6 and will likely connect outwards per IPv6.
Then the query for flowm.daemon.contact. IN AAAA is answered with a CNAME. Does it then correctly unravel to the A record from there?
I get a couple of queries for CAA records. I don't have these. Should I?
One of the nameserver machines is sometimes not receiving data from here. When I try to reach this community from there, I get filtered replies. There are no errors, only the content is removed from the webpages, just like with Russian webpages.
Strangely this here is the only webpage that showed this problem, others do work normally.
Oh sorry, this is indeed misleading, yes, indeed.
It just happened to grow that way - the pole and the flag are real nodes, and flowm is the name of a software.
It does only partially look like that.
It definitely honors the TrunCated flag and retries with TCP, so this is probably not the issue. But only a few origins do actually ask for an A record (while all do ask for AAAA):
184.108.40.206 | pole | TCP | NOERROR | cd | FLOwM.daEMon.CoNTacT. IN A
220.127.116.11 | wand | TCP | NOERROR | cd | FLAG.DaeMoN.cOnTAct. IN A
18.104.22.168 -> did not ask for it
22.214.171.124 -> did not ask for it
126.96.36.199 -> did not ask for it
188.8.131.52 -> did not ask for it
184.108.40.206 | pole | TCP | NOERROR | cd | FlAG.DAEmOn.cOntacT. IN A
220.127.116.11 | wand | TCP | NOERROR | cd | flOWM.dAemon.coNTact. IN A
18.104.22.168 -> did not ask for it
22.214.171.124 | pole | TCP | NOERROR | cd | flAg.dAemON.ContACt. IN A
126.96.36.199 | wand | TCP | NOERROR | cd | Flowm.DaEMOn.CONtacT. IN A
It doesn't actually look like a "timeout" to me. In the log we can see for all queries at first a request appearing via UDP, getting TC reply, and then a request via TCP. For those servers not asking for an A record, we see nothing at all there. If there were indeed timeouts, they would appear somewhere in the process.
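If the log gets longer, a small script can do the counting. This is a rough sketch that assumes the pipe-separated layout seen in the excerpt above (source address, nameserver, transport, rcode, flags, question); the real log format may have more columns. Names are lowercased before counting, because the queries arrive in mixed case.

```python
from collections import Counter

# Three lines copied from the excerpt above, used as sample input.
SAMPLE_LOG = """\
184.108.40.206 | pole | TCP | NOERROR | cd | FLOwM.daEMon.CoNTacT. IN A
220.127.116.11 | wand | TCP | NOERROR | cd | FLAG.DaeMoN.cOnTAct. IN A
18.104.22.168 -> did not ask for it
"""


def parse_line(line):
    """Return (source, server, transport, name, qtype), or None for other lines."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        return None
    source, server, transport, _rcode, _flags, question = fields
    parts = question.split()
    if len(parts) != 3:
        return None
    name, _cls, qtype = parts
    return source, server, transport, name.lower(), qtype


def tally(log_text):
    """Count queries per (name, qtype, transport)."""
    counts = Counter()
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed:
            _, _, transport, name, qtype = parsed
            counts[(name, qtype, transport)] += 1
    return counts


for key, count in sorted(tally(SAMPLE_LOG).items()):
    print(key, count)
```

Adding the source address to the key would show directly which origins never sent an A query.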
But these IP addresses did ask for an AAAA RR? Weird. Personally, I don't know what's going on, to be honest. Maybe someone else does. If no one does, we might ask the LE staff for help; maybe they know something we don't.
Trying dig -t AAAA flowm.daemon.contact@pole.daemon.contact I got 1 SERVFAIL then just NXDOMAIN, so I'm not sure if I'm trying the wrong nameserver, but an intermittent SERVFAIL sounds like a server needs to be restarted.
dig -t NS flowm.daemon.contact returns a CNAME, and I was expecting a nameserver or two.
Well, requesting flowm.daemon.contact\@pole.daemon.contact. IN AAAA probably results in an NXDOMAIN from a lot of servers too. It probably needs a space before the @?
When I "hammer" the servers for a little bit, most of the time I'm getting a response immediately, but often it's also a little bit slow in the order of multiple seconds (about 5). Not sure if that's long enough to cause a timeout though.
Okay. I do now see the logs from yesterday, they look the same.
I will now start a test suite. I would like to see what happens when I switch off the second nameserver. I would also like to see the reaction without DNSSEC, but then, my whole intranet is attached below this domain and will fall apart when I delete the RRSIG records, and it will become an elaborate training in disaster recovery...
Hi. This looks like a syntax issue. (dig doesn't care about these and just produces the literal answer. That's the philosophy of the ISC people.)
Try this one: dig -t NS @pole.daemon.contact flowm.daemon.contact
This is AFAIK as it should be. As @Osiris mentioned further up, the CNAME should be the only record for the name. In this case, it returns the CNAME, and it returns the SOA (to be used for further queries).
In the staging environment the validation takes 11 seconds and returns successful.
In the production environment the validation does always take 31 seconds and reports DNS problem: query timed out ...
I tried switching off one of the nameservers. (Things are supposed to work nevertheless, nameservers are redundant only for failsafety, since they can fail.) No matter which one I switch off, the result is always the same: DNS problem: query timed out ...
I switched off DNSSEC entirely. That is, I quickly swapped in the unsigned raw zonefile, restarted the nameservers, ran the certbot renew, swapped the zonefiles back and restarted the nameservers.
Now the answers were much smaller, and queries did not need to repeat per TCP - OTOH this would look just like a MitM attack, and might/should fail for a couple of reasons. Anyway, the result was just the same: DNS problem: query timed out ...
Finally, I added the AAAA record for the name. This doesn't help either, the error is still DNS problem: query timed out looking up A for flowm.daemon.contact; DNS problem: query timed out looking up AAAA for flowm.daemon.contact
I changed the CNAME into an A record. Now the error message changes: DNS problem: query timed out looking up CAA for flowm.daemon.contact
So I added these CAA records, one for the delegation point and one for flowm.daemon.contact. Then I got this error: DNS problem: query timed out looking up A for flowm.daemon.contact; no valid AAAA records found for flowm.daemon.contact
This is okay, I had removed the AAAA record again, because it points to nowhere.
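On the CAA lookups: as far as I understand RFC 8659, the CA starts at the requested name and climbs towards the TLD until it finds a CAA record set, so one record set at the zone apex already covers the names below it. Having no CAA records at all is allowed, but the lookup itself has to get an answer. A minimal sketch of that climb follows; the function names and the example record are hypothetical, not the records actually added to the zone.

```python
def caa_lookup_names(domain):
    """List the names checked for CAA, from the full name up to the TLD."""
    labels = domain.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def relevant_caa(domain, caa_records):
    """Return (name, records) for the closest name that has CAA records.

    Returns None when no name in the chain has any, which means issuance
    is not restricted by CAA.
    """
    for name in caa_lookup_names(domain):
        if caa_records.get(name):
            return name, caa_records[name]
    return None


# Hypothetical example data: a single CAA record set at the zone apex.
records = {"daemon.contact.": ['0 issue "letsencrypt.org"']}

print(caa_lookup_names("flowm.daemon.contact"))
print(relevant_caa("flowm.daemon.contact", records))
```

This ignores CNAME handling during the CAA lookup, so it only illustrates the order in which the parent names are consulted.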
So I added the AAAA record back into the zonefile. And then, finally, I got this message:
This is now alright: it tries to connect to an IPv6 address, and it is indeed the address I had edited into the zonefile! This address is not wired, not routed, not connected, not enabled, no nothing yet. So it cannot work, and the timeout appears to be the correct diagnosis.
So this is still no success, but it is now an expected and understandable issue.