DNS-01 client: “During secondary validation: DNS problem: SERVFAIL looking up TXT” with delegated _acme-challenge NS

I’m building a DNS-01 validation client in Python so I can request wildcard certificates, but I keep getting this error from the Let’s Encrypt staging endpoint for any domain I try:

During secondary validation: DNS problem: SERVFAIL looking up TXT for _acme-challenge.dabigg.com - the domain's nameservers may be malfunctioning

The flow I’m implementing is:

  1. An NS record is created in the Azure DNS zone for _acme-challenge.<domain>.com, pointing to my resolver, which listens on UDP/TCP 53.
  2. Submit a new order (for a wildcard).
  3. On successful response, store the order/challenge data in a local database.
  4. My own DNS resolver answers the TXT record under _acme-challenge.<domain> from that database.
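
For reference, the value served in step 4 is fixed by the protocol: RFC 8555 defines the DNS-01 TXT content as the unpadded base64url encoding of the SHA-256 digest of the key authorization (the token, a dot, and the account key's JWK thumbprint). A minimal Python sketch of that computation follows. The function names and the thumbprint value are hypothetical and are not taken from the client described here.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without trailing '=' padding, as ACME expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the TXT content for a dns-01 challenge.

    key authorization = token + "." + JWK thumbprint of the account key
    TXT value         = base64url(SHA-256(key authorization))
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


# The token is the one from the challenge object in this post; the thumbprint
# is a made-up placeholder, so the result will not match the real record.
example = dns01_txt_value(
    "v5Zgs2UMkoHDrArUdMFZan9_QE2_8g_lFGNd2kJfVhQ",
    "HYPOTHETICAL-ACCOUNT-THUMBPRINT",
)
print(example)
```

A SHA-256 digest is 32 bytes, so the resulting value is always 43 characters long, which is a quick sanity check on whatever the database ends up serving.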

Here’s an example challenge response from the ACME server:

{
  "identifier": {
    "type": "dns",
    "value": "dabigg.com"
  },
  "status": "invalid",
  "expires": "2025-12-03T02:11:03Z",
  "challenges": [
    {
      "type": "dns-01",
      "url": "https://acme-staging-v02.api.letsencrypt.org/acme/chall/245712243/20417969703/dUxBhA",
      "status": "invalid",
      "validated": "2025-11-26T02:11:04Z",
      "error": {
        "type": "urn:ietf:params:acme:error:dns",
        "detail": "During secondary validation: DNS problem: SERVFAIL looking up TXT for _acme-challenge.dabigg.com - the domain's nameservers may be malfunctioning",
        "status": 400
      },
      "token": "v5Zgs2UMkoHDrArUdMFZan9_QE2_8g_lFGNd2kJfVhQ",
      "validationRecord": [
        {
          "hostname": "dabigg.com",
          "addressUsed": ""
        }
      ]
    }
  ],
  "wildcard": true
}

The part that’s confusing me is that from my side (and several public resolvers), the TXT record looks fine.

The zone dabigg.com is hosted on Azure DNS, but I’ve delegated _acme-challenge.dabigg.com to my own authoritative resolver:

dabigg.com.        172800  IN  NS  ns1-35.azure-dns.com.
dabigg.com.        172800  IN  NS  ns2-35.azure-dns.net.
dabigg.com.        172800  IN  NS  ns3-35.azure-dns.org.
dabigg.com.        172800  IN  NS  ns4-35.azure-dns.info.

_acme-challenge.dabigg.com. 300 IN NS jh1.pgh.resolution.certaas.web-infra.io.

My resolver at jh1.pgh.resolution.certaas.web-infra.io serves the TXT record based on what’s in the local database.

Here are some tests I ran after placing the TXT:

dig TXT _acme-challenge.dabigg.com @8.8.8.8
dig TXT _acme-challenge.dabigg.com @1.1.1.1
dig TXT _acme-challenge.dabigg.com @9.9.9.9
dig TXT _acme-challenge.dabigg.com +trace

Output (sanitized slightly):

; <<>> DiG 9.10.6 <<>> TXT _acme-challenge.dabigg.com @8.8.8.8
;; ->>HEADER<<- opcode: QUERY, status: NOERROR
...
;; ANSWER SECTION:
_acme-challenge.dabigg.com. 60 IN TXT "328JJjXtZcnkYRqkscfchEPglNy46I7yOh63q6Kbcbo"
...

; <<>> DiG 9.10.6 <<>> TXT _acme-challenge.dabigg.com @1.1.1.1
;; ->>HEADER<<- opcode: QUERY, status: NOERROR
...
;; ANSWER SECTION:
_acme-challenge.dabigg.com. 60 IN TXT "328JJjXtZcnkYRqkscfchEPglNy46I7yOh63q6Kbcbo"
...

; <<>> DiG 9.10.6 <<>> TXT _acme-challenge.dabigg.com @9.9.9.9
;; ->>HEADER<<- opcode: QUERY, status: NOERROR
...
;; ANSWER SECTION:
_acme-challenge.dabigg.com. 60 IN TXT "328JJjXtZcnkYRqkscfchEPglNy46I7yOh63q6Kbcbo"
...

; <<>> DiG 9.10.6 <<>> TXT _acme-challenge.dabigg.com +trace
...
dabigg.com.        172800  IN  NS  ns1-35.azure-dns.com.
dabigg.com.        172800  IN  NS  ns2-35.azure-dns.net.
dabigg.com.        172800  IN  NS  ns3-35.azure-dns.org.
dabigg.com.        172800  IN  NS  ns4-35.azure-dns.info.
...
_acme-challenge.dabigg.com. 300 IN NS jh1.pgh.resolution.certaas.web-infra.io.
;; Received 108 bytes from ns4-35.azure-dns.info in 14 ms

_acme-challenge.dabigg.com. 60 IN TXT "328JJjXtZcnkYRqkscfchEPglNy46I7yOh63q6Kbcbo"
;; Received 100 bytes from jh1.pgh.resolution.certaas.web-infra.io in 20 ms

From these tests:

  • _acme-challenge.dabigg.com exists and returns the expected TXT.
  • Public resolvers (Google, Cloudflare, Quad9) all see the record with NOERROR.
  • The delegation from Azure DNS to jh1.pgh.resolution.certaas.web-infra.io appears to be working.

Despite that, Let’s Encrypt’s secondary validation still reports SERVFAIL looking up TXT.

Final footnote: I am not using IPv6 anywhere at this point, which I do not imagine is causing the issue. A few months ago, I remember this system working properly, but it seems to have broken again.


Questions

  1. Is there anything about this kind of per-label delegation (_acme-challenge.<domain> → my own NS) that tends to cause issues with Let’s Encrypt’s secondary resolvers?

  2. Are there known requirements or quirks around:

    • single-NS setups (I only have jh1 listed for _acme-challenge),
    • EDNS behavior, or
    • TCP/UDP accessibility
      that could lead to SERVFAIL on LE’s side even if 8.8.8.8 / 1.1.1.1 / 9.9.9.9 succeed?
  3. Is there any additional debugging I can do from my side (e.g., specific dig options or tools) that would more closely mimic what Let’s Encrypt’s resolvers are doing during secondary validation?

Thanks!

The secondary validation centers perform the same TXT query as the primary center. Seeing "secondary" in the message means the primary succeeded but one or more of the secondary centers failed.

A key difference is their location. The secondary centers are spread around the world, and DNS servers are normally reachable world-wide, which is hard to verify from your own location. One global testing tool does show failing connectivity from a number of locations. I don't use this tool often so I can't vouch for its reliability, but it could explain the error. See: DNS Propagation Checker - Global DNS Testing Tool

Is it possible you have firewalled your DNS servers to certain geographies?

Just noting that Let's Encrypt queries your authoritative servers directly. The https://unboundtest.com site uses a similar technique but cannot mimic world-wide access nor every element of LE's configuration. Your TXT record works fine in unboundtest, for example.

I see that DNSViz reports some problems although I don't know why the Primary center would not have been affected. Note unboundtest wasn't affected by this error either so perhaps this is not important. Still, something to keep in mind in case other causes are not found.

_acme-challenge.dabigg.com/TXT: The server responded with no OPT record, rather than with RCODE FORMERR. See RFC 6891, Sec. 7. (71.182.144.186, UDP_-_EDNS0_4096_D_KN)
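
One way to look at that DNSViz finding from your own side is to send the authoritative server a query that carries an EDNS OPT record and check whether the reply carries one too, over both UDP and TCP. Below is a rough sketch using only the Python standard library. The helper names are made up for illustration, the server IP in the commented usage is the one quoted in the DNSViz message, and this does not replicate every aspect of Let's Encrypt's resolver configuration.

```python
import socket
import struct

TYPE_TXT = 16
TYPE_OPT = 41


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in (part for part in name.split(".") if part):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = TYPE_TXT, query_id: int = 0x1234,
                udp_payload: int = 4096, dnssec_ok: bool = True) -> bytes:
    """Build a non-recursive query with one OPT record in the additional section."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    opt_ttl = 0x8000 if dnssec_ok else 0  # the DO bit lives in the OPT "TTL" field
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_payload, opt_ttl, 0)
    return header + question + opt


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:  # compression pointer occupies two bytes
            return offset + 2
        if length == 0:
            return offset + 1
        offset += 1 + length


def rcode(msg: bytes) -> int:
    """Low four bits of the flags word (0 = NOERROR, 1 = FORMERR, 2 = SERVFAIL)."""
    return struct.unpack("!H", msg[2:4])[0] & 0x000F


def has_opt_record(msg: bytes) -> bool:
    """Walk all resource records and report whether any has TYPE 41 (OPT)."""
    _id, _flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    offset = 12
    for _ in range(qd):
        offset = skip_name(msg, offset) + 4  # QTYPE + QCLASS
    for _ in range(an + ns + ar):
        offset = skip_name(msg, offset)
        rtype, _cls, _ttl, rdlen = struct.unpack("!HHIH", msg[offset:offset + 10])
        if rtype == TYPE_OPT:
            return True
        offset += 10 + rdlen
    return False


def query_udp(server_ip: str, query: bytes, timeout: float = 5.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server_ip, 53))
        data, _addr = sock.recvfrom(4096)
        return data


def query_tcp(server_ip: str, query: bytes, timeout: float = 5.0) -> bytes:
    """DNS over TCP prefixes each message with a two-byte length."""
    with socket.create_connection((server_ip, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(query)) + query)
        stream = sock.makefile("rb")
        (length,) = struct.unpack("!H", stream.read(2))
        return stream.read(length)


# Example (performs network I/O, so it is left commented out):
# query = build_query("_acme-challenge.dabigg.com")
# for transport in (query_udp, query_tcp):
#     reply = transport("71.182.144.186", query)
#     print(transport.__name__, "rcode:", rcode(reply), "OPT:", has_opt_record(reply))
```

A NOERROR reply without an OPT record would match what DNSViz flagged. Running the same check from hosts in other regions is the part this sketch cannot do for you.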

I will look into why no OPT is being sent for that part. Interestingly, it seemed to have worked this morning.
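
For context on the OPT part: RFC 6891 expects a responder that supports EDNS to include an OPT record in its reply whenever the query carried one, and a responder without EDNS support to answer FORMERR. The following is only a sketch of the first case in Python, not the code of the server discussed here. The function names, the 1232-byte advertised payload size, and the assumption that the query holds exactly one question followed directly by its OPT record are all illustrative choices.

```python
import struct

TYPE_TXT = 16
TYPE_OPT = 41
CLASS_IN = 1


def question_end(msg: bytes) -> int:
    """Offset just past the first question (QNAMEs in queries are uncompressed)."""
    offset = 12
    while msg[offset] != 0:
        offset += 1 + msg[offset]
    return offset + 1 + 4  # zero byte, then QTYPE and QCLASS


def query_opt_ttl(query: bytes):
    """Return the OPT 'TTL' field of the query, or None if it has no OPT.

    Assumes the OPT record, if present, directly follows the question.
    """
    arcount = struct.unpack("!H", query[10:12])[0]
    end = question_end(query)
    marker = b"\x00" + struct.pack("!H", TYPE_OPT)  # root name + TYPE 41
    if arcount == 0 or query[end:end + 3] != marker:
        return None
    return struct.unpack("!I", query[end + 5:end + 9])[0]


def build_txt_response(query: bytes, txt_value: str, ttl: int = 60,
                       udp_payload: int = 1232) -> bytes:
    """Build an authoritative NOERROR reply with one TXT answer.

    An OPT record is appended only when the query had one, and the DO bit
    is copied from the query. EDNS version checks are omitted in this sketch.
    """
    end = question_end(query)
    opt_ttl = query_opt_ttl(query)
    qflags = struct.unpack("!H", query[2:4])[0]
    flags = 0x8400 | (qflags & 0x0100)  # QR=1, AA=1, RD copied from the query
    arcount = 0 if opt_ttl is None else 1
    header = query[:2] + struct.pack("!HHHHH", flags, 1, 1, 0, arcount)

    text = txt_value.encode("ascii")
    rdata = bytes([len(text)]) + text  # one character-string, at most 255 bytes
    answer = (b"\xc0\x0c"  # pointer back to the name in the question
              + struct.pack("!HHIH", TYPE_TXT, CLASS_IN, ttl, len(rdata)) + rdata)

    opt = b""
    if opt_ttl is not None:
        do_bit = opt_ttl & 0x8000
        opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_payload, do_bit, 0)
    return header + query[12:end] + answer + opt
```

The design point is simply that the OPT decision is driven by the incoming query rather than hard-coded either way.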

I did check my firewall logs, and I do not see any blocked traffic to port 53 for either TCP or UDP anywhere so I am uncertain as to why this would have broken [I really hate magic].

Either way, I at least know one thing I need to fix, and will investigate further.

Thanks for your help!

To make this general purpose, set up your auth zone as a subdomain like auth.yourdomain.com, then CNAME your _acme-challenge record to that, e.g. _acme-challenge.www CNAME _acme-challenge.www.auth.yourdomain.com. Unfortunately the per-challenge CNAME is a requirement when delegating to another zone (happy to be wrong!)

As far as I can see, if you just delegate _acme-challenge with an NS record you will only be able to answer the top-level _acme-challenge response for yourdomain.com and *.yourdomain.com, but not _acme-challenge.www.yourdomain.com (as a specific example).
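
To make the naming concrete, here is a small Python sketch of how the challenge record name and the matching CNAME target could be derived for each identifier. The function names are hypothetical, and auth.yourdomain.com is just the example auth zone used above.

```python
def challenge_name(identifier: str) -> str:
    """Name the CA looks up for a dns-01 challenge.

    A wildcard identifier is validated at the same name as its base domain.
    """
    host = identifier.rstrip(".").lower()
    if host.startswith("*."):
        host = host[2:]
    return f"_acme-challenge.{host}"


def cname_target(identifier: str, parent_zone: str, auth_zone: str) -> str:
    """Map a challenge name in parent_zone onto the same labels in auth_zone."""
    name = challenge_name(identifier)
    parent = parent_zone.rstrip(".").lower()
    if not name.endswith("." + parent):
        raise ValueError(f"{identifier} is not inside {parent_zone}")
    relative = name[: -(len(parent) + 1)]  # e.g. "_acme-challenge.www"
    return f"{relative}.{auth_zone.rstrip('.').lower()}"


for ident in ("yourdomain.com", "*.yourdomain.com", "www.yourdomain.com"):
    print(
        challenge_name(ident),
        "CNAME",
        cname_target(ident, "yourdomain.com", "auth.yourdomain.com"),
    )
```

The apex and the wildcard both map to _acme-challenge.yourdomain.com, which is why a single NS delegation of that label covers those two identifiers and nothing else.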

For the SERVFAIL stuff, use https://unboundtest.com/ as @MikeMcQ suggests, because if that resolves then Let's Encrypt domain validation will probably work as well.

Note that for most people it's easier to run your own acme-dns instance. See: GitHub - joohoi/acme-dns: Limited DNS server with RESTful HTTP API to handle ACME DNS challenges easily and securely.

An alternative to trying to delegate challenge responses to another zone is to delegate challenge response writes to your actual zone to another writer, e.g. provide an API that only updates/removes the _acme-challenge TXT records and use that in your clients instead of distributing DNS credentials. This is also a feature we have in Certify Management Hub (called Managed Challenges) and can be used via an API or via the Managed ACME feature (which uses order delegation via an ACME server implementation, and can be used from any ACME client).