Weird DNS response resolving acme-v02.api via 1.1.1.1

My daily cron job fails to reach the API endpoint every few days with the following error (client software: dehydrated):

Problem connecting to server (get for https://acme-v02.api.letsencrypt.org/directory; curl returned with 6)

According to curl man page:

CURLE_COULDNT_RESOLVE_HOST (6)
Couldn't resolve host. The given remote host was not resolved.

So this is a DNS issue. I can reproduce it on multiple machines in different geographic locations, with DNSSEC off and with DoT both on and off. I suspect it is caused by some strange behavior of Cloudflare's public DNS.

When I resolve acme-v02.api.letsencrypt.org via Cloudflare's public DNS, the answer section may contain multiple entries with TTL=0 (depending on their load balancing and cache status).

I understand that a recursive DNS resolver may return stale (TTL=0) records when it doesn't have fresh data. But both letsencrypt.org and pacloudflare.com are hosted by Cloudflare, and the ratio of TTL=0 responses is far higher than I expected.
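
One way to put a number on that ratio is to capture several dig outputs and count how many contain a zero-TTL answer record. The following is a minimal Python sketch, not a tool used in this thread: parse_answer_ttls and zero_ttl_ratio are hypothetical helper names, and the parsing assumes the default dig output layout shown in the examples in this post.

```python
from __future__ import annotations


def parse_answer_ttls(dig_output: str) -> list[tuple[str, int, str]]:
    """Return (name, ttl, type) for each record in the ANSWER SECTION of dig output."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(";; ANSWER SECTION"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or another ";" header line ends the answer section.
            if not stripped or stripped.startswith(";"):
                in_answer = False
                continue
            fields = stripped.split()
            # Assumed record layout: name ttl class type rdata...
            if len(fields) >= 5 and fields[1].isdigit():
                records.append((fields[0], int(fields[1]), fields[3]))
    return records


def zero_ttl_ratio(dig_outputs: list[str]) -> float:
    """Fraction of responses that contain at least one TTL=0 answer record."""
    if not dig_outputs:
        return 0.0
    zero = sum(
        1
        for out in dig_outputs
        if any(ttl == 0 for _, ttl, _ in parse_answer_ttls(out))
    )
    return zero / len(dig_outputs)
```

Feeding it the saved output of repeated dig runs against 1.1.1.1 and against another resolver would let the two ratios be compared directly.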

Example 1. Cloudflare DNS answers the prod.api.letsencrypt.org and Cloudflare Spectrum records with TTL=0

% dig acme-v02.api.letsencrypt.org @1.1.1.1

; <<>> DiG 9.16.15-Ubuntu <<>> acme-v02.api.letsencrypt.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15136
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;acme-v02.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
acme-v02.api.letsencrypt.org. 6835 IN	CNAME	prod.api.letsencrypt.org.
prod.api.letsencrypt.org. 0	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 0 IN	A 172.65.32.248

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Jan 11 17:11:58 CST 2022
;; MSG SIZE  rcvd: 155

Example 2. Cloudflare DNS returns sane results

% dig acme-v02.api.letsencrypt.org @1.1.1.1

; <<>> DiG 9.16.1-Ubuntu <<>> acme-v02.api.letsencrypt.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19987
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;acme-v02.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
acme-v02.api.letsencrypt.org. 7051 IN	CNAME	prod.api.letsencrypt.org.
prod.api.letsencrypt.org. 151	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 151 IN A 172.65.32.248

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Jan 11 17:54:13 CST 2022
;; MSG SIZE  rcvd: 155

I didn't observe TTL=0 answers from Google Public DNS & Quad9 DNS.

I tried to replicate acme-v02.api & prod.api records on my own domain hosted by Cloudflare, and I didn't observe TTL=0 answers.

What is the content of the /etc/resolv.conf file?

What is the content of the /etc/resolv.conf file?

/etc/resolv.conf is symlinked to /run/systemd/resolve/stub-resolv.conf with the following content (comments stripped):

nameserver 127.0.0.53
options edns0 trust-ad
search .

DoT sticks with a single TCP connection, so for a fairly long time the queries are not load balanced across multiple backend servers.

I have run into servers that always return the TTL=0 response, which is obviously weird.

Then you should add 1.1.1.1 to your list of nameservers (or replace the current entry with it).

What program is providing the name service? The
ss -nap
command may show which process opened a socket on 127.0.0.53.

Likely:
systemd-resolve

I'm reporting a likely server-side issue. I'm not looking for help with my client configuration.

A DNS reply with a TTL of 0 is a valid one. If that is the cause of the original problem, for which I haven't seen proof, only a probable correlation, then it is definitely an issue on the client side.

A TTL of 0 means: use it, but do not cache it. The client must use the answer once. If the same DNS data is needed a second time, the name resolution process is supposed to initiate a new DNS recursion.
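
The "use it but do not cache" rule can be illustrated with a small client-side cache. This is only a sketch in Python under stated assumptions: TtlCache is a hypothetical class, the lookup callable standing in for the upstream resolver is assumed to return an (address, ttl_seconds) pair, and none of this describes how systemd-resolved or curl is actually implemented.

```python
import time


class TtlCache:
    """Toy stub-resolver cache that never stores answers with a TTL of 0."""

    def __init__(self, lookup, clock=time.monotonic):
        # lookup: hypothetical upstream query, name -> (address, ttl_seconds)
        self._lookup = lookup
        self._clock = clock
        self._entries = {}  # name -> (address, expires_at)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            # Cached answer is still within its TTL.
            return entry[0]
        address, ttl = self._lookup(name)
        if ttl > 0:
            self._entries[name] = (address, now + ttl)
        else:
            # TTL 0: hand the answer to the caller once and keep nothing.
            self._entries.pop(name, None)
        return address
```

With a TTL of 0, every call to resolve() goes back upstream; the answer itself is still returned and usable each time.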

You probably should be.
Your nameserver 127.0.0.53 isn't working as expected.

Cloudflare public DNS keeps giving me TTL=0 responses. That's not the expected behavior for a CNAME record whose authoritative server reports TTL=300.

I'm asking the Let's Encrypt team to look into this issue and raise a ticket with Cloudflare. To me, the symptom looks like an internal configuration issue on Cloudflare's side.

% dig prod.api.letsencrypt.org @1.1.1.1

; <<>> DiG 9.16.1-Ubuntu <<>> prod.api.letsencrypt.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49566
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;prod.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
prod.api.letsencrypt.org. 0	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 0 IN	A 172.65.32.248

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Jan 13 12:35:05 CST 2022
;; MSG SIZE  rcvd: 132

% dig prod.api.letsencrypt.org @1.1.1.1

; <<>> DiG 9.16.1-Ubuntu <<>> prod.api.letsencrypt.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18150
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;prod.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
prod.api.letsencrypt.org. 0	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 0 IN	A 172.65.32.248

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Jan 13 12:35:29 CST 2022
;; MSG SIZE  rcvd: 132

% dig prod.api.letsencrypt.org @1.1.1.1

; <<>> DiG 9.16.1-Ubuntu <<>> prod.api.letsencrypt.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6629
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;prod.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
prod.api.letsencrypt.org. 0	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 0 IN	A 172.65.32.248

;; Query time: 4 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Jan 13 12:35:33 CST 2022
;; MSG SIZE  rcvd: 132

% dig prod.api.letsencrypt.org @1.1.1.1

; <<>> DiG 9.16.1-Ubuntu <<>> prod.api.letsencrypt.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39394
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;prod.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
prod.api.letsencrypt.org. 0	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 0 IN	A 172.65.32.248

;; Query time: 8 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Jan 13 12:35:37 CST 2022
;; MSG SIZE  rcvd: 132

% dig prod.api.letsencrypt.org @owen.ns.cloudflare.com

; <<>> DiG 9.16.1-Ubuntu <<>> prod.api.letsencrypt.org @owen.ns.cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51148
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;prod.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
prod.api.letsencrypt.org. 300	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.

;; Query time: 4 msec
;; SERVER: 172.64.33.219#53(172.64.33.219)
;; WHEN: Thu Jan 13 12:39:04 CST 2022
;; MSG SIZE  rcvd: 116

When I repeatedly ask @1.1.1.1 for the same thing, I also get quite a few TTL=0 responses back, so I can reproduce what you're seeing. In fact, I also see these TTL=0 responses for other domain names (e.g. youtube.com), especially for those with a short authoritative TTL.

This is however not causing any issues for me - all libraries can resolve just fine using 1.1.1.1, even when the response was a TTL 0 one.

Why not? A recursive resolver can (and should) return the TTL based on its own cache, which may be <= upstream TTL. I don't see the issue here.
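
The arithmetic behind a cache-derived TTL can be sketched as follows. This is an illustration in Python, not a description of Cloudflare's resolver: remaining_ttl is a hypothetical helper, and it assumes the resolver simply subtracts the time a record has spent in its cache from the authoritative TTL.

```python
def remaining_ttl(original_ttl: int, cached_at: float, now: float) -> int:
    """TTL a caching resolver would report for a record it stored at cached_at."""
    elapsed = int(now - cached_at)
    # Never report a negative TTL; an expired entry bottoms out at 0.
    return max(0, original_ttl - elapsed)
```

Under that assumption, a record with an authoritative TTL of 300 that has been cached for 149 seconds is returned with TTL 151, and one that has been cached for 300 seconds or more is returned with TTL 0.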
