Is Let's Encrypt DNS not liking my domain name server?

mpl · October 18, 2016, 1:50pm

Hi,

I’m using the autocert (https://godoc.org/golang.org/x/crypto/acme/autocert) client in an app I’m writing.

When I run the app and request a cert for my dev domain (granivo.re), I get one without any problem (through the TLS-SNI challenge by default, I think).

However, when trying to get a cert for the production domain (camlistore.net, more precisely for the host wip.camlistore.net), I get this error:

“acme: identifier authorization failed”.

Now, by adding some prints here and there in the client, it seems to me that the client is properly sending the reply stating it is ready to accept the TLS-SNI-01 challenge, but it looks like the server then never tries to connect to actually verify the challenge.

And since I have this problem for one domain and not the other, I’m suspecting the DNS for Let’s Encrypt has some trouble resolving the problematic domain, and therefore the VA never connects where we expect it to.

Could someone with access to the server-side help me confirm that hypothesis please, so I can try and figure out what I need to fix on the domain (I have access to the authoritative DNS for camlistore.net) that Let’s Encrypt does not like?

Any kind of logs related to these attempts would help too.

Are there any docs on the production setup of the DNS for Let’s Encrypt? This way, I could run boulder myself in a similar fashion and try and figure out what’s going on.

The client is running on either Ubuntu 14 or CoreOS (on Google Compute Engine).

Thanks.

serverco · October 18, 2016, 2:02pm

I don’t have access to the server side - but I can confirm you have DNS errors - see http://dnsviz.net/d/camlistore.net/dnssec/

mpl · October 18, 2016, 2:05pm

Ah, as it seemed to resolve fine from different hosts with client tools like dig, I didn’t know what else to look for. I’ll look into that, thanks!

cpu · October 18, 2016, 2:21pm

I can confirm that our Unbound instance wasn't able to resolve wip.camlistore.net to an IP address.

There aren't any docs on the production DNS setup, as far as I'm aware. Perhaps you could open an issue on the Boulder repo describing what sort of documentation you would like to see? Boulder outsources the heavy lifting of DNS resolution to Unbound, and we don't presently include Unbound in the local development environment, so reproducing this will be a little tricky.

mpl · October 18, 2016, 2:31pm

Excellent, thanks.
Do you have any more information as to why, please? I mean, the name does resolve as far as client tools such as dig are concerned, and DNSViz (http://dnsviz.net/d/camlistore.net/dnssec/) shows that the A and AAAA records are OK (even though it shows warnings), which I thought would be enough. I'll work on the warnings, but it would be great if I knew exactly what Unbound does not like.

Will do (regarding the issue on the Boulder repo), thanks.

TCM · October 18, 2016, 5:28pm

Wait a minute. You aren’t working on your own DNS implementation, are you? Or who else thought it would be a good idea to not respond to NS and SOA queries? WTF? I’ve never seen a messed-up DNS zone like that. Also, is there only a single name server listed for that domain?

http://dnsviz.net/d/camlistore.net/dnssec/ tells you exactly what is wrong.

mnordhoff · October 18, 2016, 5:36pm

It seems to set the status code of empty responses to NOTIMP, e.g. for dig camlistore.net CAA or dig camlistore.net MX. Or NS or SOA, which absolutely should exist, as you said.

I’m not surprised some resolvers are unhappy.

mpl · October 19, 2016, 3:43pm

Hey,

We’ve fixed all errors/warnings that were reported at http://dnsviz.net/d/camlistore.net/dnssec/ , and yet it seems your Unbound still can’t resolve, can it? Is there any way you could tell me what kind of error it reports please?
Maybe the changes haven’t propagated yet though…

thanks.

TCM · October 19, 2016, 5:37pm

The records resolve OK for me. May be a TTL issue.

Or maybe it’s because you’re listing ::ffff:104.154.231.160 in public DNS space. If you don’t have an IPv6 address, don’t set an AAAA record. You still have only one name server for the domain.
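The AAAA point above can be checked mechanically. The following is a minimal sketch, not part of the original exchange: `validAAAA` is a hypothetical helper name, and `2001:db8::1` is a documentation-range placeholder. It relies on the standard `net/netip` package to flag IPv4-mapped addresses such as ::ffff:104.154.231.160.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validAAAA (hypothetical helper) reports whether s looks usable as the
// address of a public AAAA record: it must parse as an IPv6 address and
// must not be an IPv4-mapped address such as ::ffff:104.154.231.160.
func validAAAA(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	// Is6 is also true for IPv4-mapped addresses, so exclude them explicitly.
	return addr.Is6() && !addr.Is4In6()
}

func main() {
	for _, s := range []string{
		"::ffff:104.154.231.160", // the mapped address mentioned above
		"2001:db8::1",            // placeholder from the documentation range
		"104.154.231.160",        // plain IPv4, belongs in an A record
	} {
		fmt.Printf("%-24s usable as AAAA: %v\n", s, validAAAA(s))
	}
}
```

A check like this only catches the mapped-address mistake; it says nothing about whether the host is actually reachable over IPv6.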

What message do you get from LE now?

mpl · October 19, 2016, 6:33pm

Doh, indeed. Thanks, should be fixed now.

Same as before:

acme: identifier authorization failed

TCM · October 19, 2016, 6:38pm

$ dig @104.154.231.160 _acme-challenge.camlistore.net txt
[...]
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 50647
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; WARNING: EDNS query returned status FORMERR - retry with '+noedns'

;; QUESTION SECTION:
;_acme-challenge.camlistore.net.        IN      TXT

What server software exactly are you using? Why is it behaving so badly? It almost seems as if you’re trying to be too clever about using it, restricting its answers in ways you don’t understand? Or could it be you’re rolling your own entirely?

Edit: If I retry with +noedns, I still don’t get any proper answer.

mpl · October 19, 2016, 8:49pm

Weird. It looks OK from here.

TCM · October 19, 2016, 9:01pm

Where is the answer? Just because it's "NOERROR" doesn't mean it's correct.

mpl · October 19, 2016, 9:53pm

Why should there be an answer to that query?

In case you’re mentioning that query because of the dns-01 challenge, I should reiterate that the client (autocert) is using the tls-sni-01 challenge in my case.

pfg · October 19, 2016, 10:29pm

Let’s Encrypt’s unbound instance uses randomized mixed-case DNS queries, which adds some spoofing resistance. Your DNS doesn’t seem to reply to these requests:

dig WiP.CaMlIsToRe.NeT A

; <<>> DiG 9.8.3-P1 <<>> WiP.CaMlIsToRe.NeT A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49802
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;WiP.CaMlIsToRe.NeT.		IN	A

;; Query time: 325 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Oct 20 00:28:58 2016
;; MSG SIZE  rcvd: 36

mpl · October 19, 2016, 10:54pm

Thanks, done.

This is a not so random string to satisfy the 20 characters forum rule.

mpl · October 19, 2016, 11:01pm

Ah, and now I’m finally seeing the server trying to establish the TLS-SNI connection, so it must have been able to resolve the name now. Great!

mpl · October 20, 2016, 2:22pm

Aaand I’ve just been able to get a cert against staging. FYI, I just had one last issue: our DNS wasn’t replying to CAA queries from Let’s Encrypt, which it now does (well, not with the proper response yet, will do in a bit).

The issue is fixed as far as I am concerned. Thanks to all.

TCM · October 20, 2016, 2:56pm

An answer with NOERROR and no records is valid if there is such a record but of a different type. If there are no records at all, which

$ dig _acme-challenge.camlistore.net any

seems to confirm, the proper answer is NXDOMAIN.
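
The rule described above (NOERROR with an empty answer when the name exists but has no record of the requested type, NXDOMAIN when the name does not exist at all) can be sketched as follows. This is a simplified illustration, not the code of any real server: the `zone` type and `answer` method are hypothetical, 192.0.2.1 is a placeholder, and the sketch deliberately ignores wildcards, empty non-terminals, and the SOA record that a real negative response carries in its authority section. Note that it never returns NOTIMP for a missing record.

```go
package main

import (
	"fmt"
	"strings"
)

// Response codes as numbered in RFC 1035.
const (
	rcodeNoError  = 0
	rcodeNXDomain = 3
)

// zone is a hypothetical store: lower-case name -> record type -> rdata.
type zone map[string]map[string][]string

// answer picks the response code for a query against the zone.
func (z zone) answer(qname, qtype string) (rcode int, records []string) {
	types, exists := z[strings.ToLower(qname)]
	if !exists {
		// No records of any type at this name: the name does not exist.
		return rcodeNXDomain, nil
	}
	// The name exists. If it has no record of this type, the result is
	// NOERROR with an empty answer section (often called NODATA).
	return rcodeNoError, types[qtype]
}

func main() {
	z := zone{"camlistore.net.": {"A": {"192.0.2.1"}}} // placeholder data

	for _, q := range [][2]string{
		{"camlistore.net.", "A"},
		{"camlistore.net.", "MX"},
		{"_acme-challenge.camlistore.net.", "TXT"},
	} {
		rcode, recs := z.answer(q[0], q[1])
		fmt.Printf("%s %s -> rcode=%d records=%v\n", q[0], q[1], rcode, recs)
	}
}
```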

Seriously, what are you doing to that DNS server to completely mess up best practices? Is there even a proper DNS server or is it homebrew code? You really shouldn't touch DNS, that much is obvious.

system · November 19, 2016, 3:03pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
