DNS problem: SERVFAIL looking up CAA for tk

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. https://crt.sh/?q=example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My goal is to use a certificate with an Elliptic Curve (EC) key, issued by Let's Encrypt.
I generated the key with OpenSSL using “ecparam”, and I am now trying to get a certificate for it signed by Let's Encrypt.
Please point me in the right direction if there is a better way to do it.

My domain is: koshaparekh.tk

I ran this command: certbot certonly --dry-run --dns-cloudflare --dns-cloudflare-credentials ./cloudflare.ini --domain "koshaparekh.tk" --domain "*.koshaparekh.tk" --csr csr.pem

It produced this output:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-cloudflare, Installer None
Enter email address (used for urgent renewal and security notices) (Enter ‘c’ to
cancel): kosha.parekh@gmail.com
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org


Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must
agree in order to register with the ACME server at
https://acme-staging-v02.api.letsencrypt.org/directory


(A)gree/(C)ancel: A
Performing the following challenges:
dns-01 challenge for koshaparekh.tk
dns-01 challenge for koshaparekh.tk
Unsafe permissions on credentials configuration file: ./cloudflare.ini
Starting new HTTPS connection (1): api.cloudflare.com
Starting new HTTPS connection (1): api.cloudflare.com
Waiting 10 seconds for DNS changes to propagate
Waiting for verification…
Challenge failed for domain koshaparekh.tk
dns-01 challenge for koshaparekh.tk
Cleaning up challenges
Starting new HTTPS connection (1): api.cloudflare.com
Starting new HTTPS connection (1): api.cloudflare.com
Some challenges have failed.

IMPORTANT NOTES:

  • The following errors were reported by the server:

    Domain: koshaparekh.tk
    Type: dns
    Detail: DNS problem: SERVFAIL looking up CAA for tk

  • Your account credentials have been saved in your Certbot
    configuration directory at /etc/letsencrypt. You should make a
    secure backup of this folder now. This configuration directory will
    also contain certificates and private keys obtained by Certbot so
    making regular backups of this folder is ideal.

My web server is (include version): I am using httpbin

The operating system my web server runs on is (include version): RHEL 8

My hosting provider, if applicable, is: I got a free domain from Freenom and hosted it on Cloudflare

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel):

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): certbot 0.36.0

Well, the problem is with the .tk TLD – or with Let’s Encrypt’s resolvers.

Does it work if you try again?

If something about the tk lookup is busted, you might also be able to work around this by creating your own CAA record in Cloudflare:

koshaparekh.tk. 360 IN CAA 0 issue "letsencrypt.org"
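To make that suggestion concrete: the record uses the CAA presentation format from RFC 8659 (flags, tag, value). Below is a minimal Python sketch of how a CA might check such a record set against its own identifier. It is a simplification and not Let's Encrypt's actual code: it only looks at the issue and issuewild tags, drops any parameters after ";", and ignores the critical flag. parse_caa and authorizes are hypothetical names.

```python
from __future__ import annotations


def parse_caa(rdata: str) -> tuple[int, str, str]:
    """Split presentation-format CAA data such as '0 issue "letsencrypt.org"'."""
    flags, tag, value = rdata.split(maxsplit=2)
    return int(flags), tag.lower(), value.strip('"')


def authorizes(rrset: list[str], ca_domain: str, wildcard: bool = False) -> bool:
    """Decide whether a non-empty CAA record set lets ca_domain issue.

    Simplified sketch: only issue/issuewild are considered, parameters
    after ';' are dropped, and the critical flag is ignored.
    """
    issue, issuewild = [], []
    for rdata in rrset:
        _flags, tag, value = parse_caa(rdata)
        issuer = value.split(";", 1)[0].strip()  # '";"' becomes '' (nobody)
        if tag == "issue":
            issue.append(issuer)
        elif tag == "issuewild":
            issuewild.append(issuer)
    # For wildcard names, issuewild (if present) takes precedence over issue;
    # for non-wildcard names, issuewild is ignored.
    relevant = issuewild if (wildcard and issuewild) else issue
    # No applicable property at all (e.g. only iodef): no restriction.
    if not relevant:
        return True
    return ca_domain in relevant
```

With only an issue property present, the same record also covers the *.koshaparekh.tk name from the original command, since issue applies to wildcard names when no issuewild property exists.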

I can resolve it, but .tk is behaving oddly.

At least some of their nodes are returning responses with the TC (truncated) flag set, forcing my resolver to retry over TCP, which is absurd when the real response is only 91 bytes.

And I think sometimes they’re dropping UDP queries.

Or just closing TCP connections.

Might be some sort of rate limiting? Run amok?
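For anyone unfamiliar with "returning TC": the TC flag in the DNS header says the answer didn't fit in the UDP reply and the resolver should retry the same query over TCP. Here's a minimal stdlib Python sketch that builds a CAA query and reads the TC flag and RCODE out of a reply header. It doesn't send anything over the network, and the function names are made up for illustration.

```python
import struct

CAA_TYPE = 257      # RR type code for CAA
CLASS_IN = 1
FLAG_RD = 0x0100    # recursion desired
FLAG_TC = 0x0200    # truncated: the reply didn't fit, retry over TCP
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name: str, qtype: int = CAA_TYPE, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: 12-byte header plus a single question."""
    header = struct.pack("!HHHHHH", query_id, FLAG_RD, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)


def inspect_reply(reply: bytes) -> tuple:
    """Return (truncated, rcode name) from a reply's header flags."""
    _query_id, flags = struct.unpack("!HH", reply[:4])
    rcode = flags & 0x000F
    return bool(flags & FLAG_TC), RCODE_NAMES.get(rcode, str(rcode))


def tcp_frame(message: bytes) -> bytes:
    """Over TCP, each DNS message is prefixed with its 2-byte length."""
    return struct.pack("!H", len(message)) + message
```

A resolver that sees truncated=True resends the same query, wrapped with tcp_frame, over a TCP connection to port 53. Plain DNS over UDP allows replies up to 512 bytes even without EDNS, so a 91-byte answer should never need that extra round trip.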

Hi, I tried it again and the dry run was successful:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-cloudflare, Installer None
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org
Performing the following challenges:
dns-01 challenge for koshaparekh.tk
Unsafe permissions on credentials configuration file: ./cloudflare.ini
Starting new HTTPS connection (1): api.cloudflare.com
Waiting 10 seconds for DNS changes to propagate
Waiting for verification…
Cleaning up challenges
Starting new HTTPS connection (1): api.cloudflare.com

IMPORTANT NOTES:

  • The dry run was successful.

I am trying to run the actual command now!

Thank you

Thank you for your help, I would guess it was rate limiting or something.

After re-running the --dry-run and then the actual command, I got the signed certificate. Its signature algorithm is SHA-256 based, but the public key uses the EC algorithm, so that should work.

Again, thank you for your help.

Hi, thank you for the suggestion. I was able to get it working by re-running the command, as mnordhoff suggested.

I think .tk might be acting up again. I was doing some client testing on the staging server with my poshacme.tk test domain earlier today, and it was working until around my lunchtime (19:00 UTC), then started generating errors for the rest of the day.

Most of the errors I was getting were “Remote PerformValidation RPC failed”. But I also had a few “SERVFAIL looking up CAA for tk” errors as well. Adding an explicit CAA record to my zone seems to have made the SERVFAIL errors go away. But I’m still getting the RPC errors. The zone is currently hosted on Linode.

I just registered another .tk that I’m gonna throw on Cloudflare to see if the RPC errors are Linode specific.

So the new dvolve.tk domain I set up and am hosting on Cloudflare is also throwing the RPC errors on validation. Occasionally, it also throws Incorrect TXT record errors with values that have never existed in the zone, which is really weird. The Linode-hosted one, poshacme.tk, is pretty consistent with the RPC errors now.

Meanwhile my other non-.tk zones on Linode and Cloudflare are validating just fine. @_az do you have any idea what the deal might be here? Here are some links to the challenges that most recently failed.

https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9134493/M2S4DA
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9134491/QlugCQ
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9134492/odX7vg
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9134490/ntRCxQ

The TXT value in that last one I swear has never existed in the zone which is messing with my head. Could weird issues with the .tk root servers cause issues like this even if cached NS queries for these zones correctly return the Linode and Cloudflare nameservers?

https://letsdebug.net/poshacme.tk/63174
https://letsdebug.net/dvolve.tk/63175

The ca3- record is automagically created by Cloudflare. It's a validation record for DigiCert for the domain's Cloudflare Universal SSL certificate. It exists in the DNS -- you can see it with dig -- but it's hidden from Cloudflare's dashboard.

Edit:

It exists at dvolve.tk. You can see it with dig dvolve.tk txt.

I don't know how it was found at _acme-challenge.dvolve.tk. Did you have a CNAME pointing * to @ or something?

Ooh, nice. Didn’t know Cloudflare did that. But no, other than the TXT records my client has been creating, this zone has only ever had a single A record in it for the root domain pointing to an IP. No CNAMEs. No CAA record either yet.

That’s weird.

The _acme-challenge.dvolve.tk record in the error message and the dvolve.tk record that exists now have the same value.

Maybe Cloudflare and DigiCert are experimenting with ACME?

I tried disabling Universal SSL on the zone. My own dig queries still show no TXT record at _acme-challenge.dvolve.tk, though. Perhaps Cloudflare wasn’t the best choice of alternative DNS host for this .tk test.

I'm not sure whether it's been "fixed", but in the past, Boulder has produced RPC errors because of what I understand to be a "race" between the DNS resolver timeout and the RPC timeout between the VA (validation authority) and, e.g., the RA (registration authority).

In other words, it's a wrongly-reported "DNS query timed out", and the root cause is that Unbound can't talk to the nameservers properly (which might also be why letsdebug.net (libunbound) takes like 20+ minutes to test your poshacme.tk domain).
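A toy model of that race, assuming (hypothetically) that the caller's RPC deadline is shorter than the resolver's own timeout. This is a Python asyncio illustration, not Boulder's code (Boulder is written in Go), and all timeout values are arbitrary. Whichever deadline fires first decides which error the user sees.

```python
import asyncio


async def resolve(resolver_timeout: float, server_delay: float) -> str:
    """Stand-in for a recursive lookup against a slow nameserver."""
    try:
        # The "nameserver" only answers after server_delay seconds.
        await asyncio.wait_for(asyncio.sleep(server_delay), timeout=resolver_timeout)
        return "TXT record found"
    except asyncio.TimeoutError:
        return "DNS problem: query timed out"


async def perform_validation_rpc(rpc_timeout: float, resolver_timeout: float,
                                 server_delay: float) -> str:
    """The caller only waits rpc_timeout seconds for the validation result."""
    try:
        return await asyncio.wait_for(
            resolve(resolver_timeout, server_delay), timeout=rpc_timeout
        )
    except asyncio.TimeoutError:
        # The outer deadline cancelled the lookup before it could report.
        return "Remote PerformValidation RPC failed"


if __name__ == "__main__":
    # RPC deadline shorter than resolver timeout: the DNS problem is masked.
    print(asyncio.run(perform_validation_rpc(0.05, 0.2, 1.0)))
    # Resolver timeout shorter: the same slow server is reported correctly.
    print(asyncio.run(perform_validation_rpc(0.5, 0.05, 1.0)))
```

In both runs the underlying problem is the same unresponsive nameserver; only the ordering of the two deadlines changes the message.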

I’m gonna move dvolve.tk over to DigitalOcean for further tests. I’ll report back with more results.

Ok. dvolve.tk has been moved to DigitalOcean without an explicit CAA record yet. poshacme.tk is still on Linode with a valid CAA record.

First test resulted in the following failed challenges. The Linode-hosted domain is still giving RPC errors. The DigitalOcean-hosted domain is throwing SERVFAIL looking up CAA for tk, which definitely indicates something is wrong with the .tk root servers, right? Because DO gave a proper NXDOMAIN, and the validation server then checked a level up at .tk, which gave SERVFAIL?
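That reasoning matches how CAA lookups climb the name tree under RFC 8659: the CA starts at the requested name (minus any leading "*.") and moves up one label at a time toward, but not including, the root, stopping at the first non-empty CAA record set. Let's Encrypt treats a failed lookup anywhere along the way as an error, which is the SERVFAIL seen in this thread. Here's a minimal Python sketch with a fake, dictionary-backed resolver standing in for real DNS; DNSError, relevant_caa_rrset and make_fake_lookup are hypothetical names, and the error text just mirrors the message reported above.

```python
from __future__ import annotations


class DNSError(Exception):
    """Raised by the lookup function for SERVFAIL, timeouts, etc."""


def relevant_caa_rrset(fqdn: str, lookup) -> tuple[str | None, list[str]]:
    """Climb from fqdn toward (not including) the root and return the first
    name with a non-empty CAA RRset, or (None, []) if none exists.

    Any DNSError raised by lookup propagates: the check can't be completed.
    """
    if fqdn.startswith("*."):
        fqdn = fqdn[2:]  # wildcard names are checked at the base name
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        name = ".".join(labels[i:])  # e.g. "dvolve.tk", then "tk"
        records = lookup(name)
        if records:
            return name, records
    return None, []


def make_fake_lookup(zone: dict):
    """Hypothetical resolver: zone maps names to CAA rdata lists or 'SERVFAIL'."""
    def lookup(name: str) -> list[str]:
        answer = zone.get(name, [])
        if answer == "SERVFAIL":
            raise DNSError(f"SERVFAIL looking up CAA for {name}")
        return answer
    return lookup
```

It also shows why adding an explicit CAA record at the domain made the SERVFAIL errors go away: the climb stops at dvolve.tk and never queries tk.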

https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9153076/rqEY3A
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9153074/_8J03A
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9153075/5RN7DA
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9153073/wCtIug

After adding an explicit CAA record to dvolve.tk and running another test, all validations now fail with the RPC error.

https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9201808/gKyaCQ
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9201806/C8BpFw
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9201807/NGSTiQ
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9201805/gsWH8Q

What if you try from a fresh staging account?

I ask because I can do it from mine and it gives NXDOMAIN (for DNS-01). So it may be that the way the authz (with the root tk CAA lookup) is failing is causing the Boulder database objects related to the account to get into a bad state.

Interesting. With a new account all validations in both domains now give a new error.

DNS problem: query timed out looking up TXT for _acme-challenge.<domain>.tk

https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9205775/DdJcBQ
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9205773/FLER_A
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9205774/nCFQyA
https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9205772/2kR_dA

Weird. No matter how many times I do it, I can’t reproduce the timeout.

$ sudo certbot-auto certonly -a manual --preferred-challenges dns \
-d "dvolve.tk" --dry-run -n \
--manual-auth-hook "/bin/true" --manual-cleanup-hook "/bin/true" \
--manual-public-ip-logging-ok
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator manual, Installer None
Obtaining a new certificate
Performing the following challenges:
dns-01 challenge for dvolve.tk
Running manual-auth-hook command: /bin/true
Waiting for verification...
Challenge failed for domain dvolve.tk
dns-01 challenge for dvolve.tk
Cleaning up challenges
Running manual-cleanup-hook command: /bin/true
Some challenges have failed.

IMPORTANT NOTES:
- The following errors were reported by the server:

  Domain: dvolve.tk
  Type:   dns
  Detail: DNS problem: NXDOMAIN looking up TXT for
  _acme-challenge.dvolve.tk

Are you doing these one by one, or firing them all at once in an integration test or something?

It’s 1 order with 4 names in it: poshacme.tk, *.poshacme.tk, dvolve.tk, and *.dvolve.tk. I wonder if the timeouts only happen when the record actually exists. Let me create some fake records that won’t get deleted automatically by my client when it’s cleaning up after the failure.

Ok, these exist now if you want to try from your side again.

>dig txt _acme-challenge.dvolve.tk @ns1.digitalocean.com +short
"fake-key-auth"

>dig txt _acme-challenge.poshacme.tk @ns1.linode.com +short
"fake-key-auth"
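A side note on those placeholders: a real DNS-01 value is never a literal like "fake-key-auth". Per RFC 8555 (section 8.4), it is the unpadded base64url encoding of the SHA-256 digest of the key authorization, which is the challenge token, a ".", and the base64url JWK thumbprint (RFC 7638) of the ACME account key. Because a wildcard name and its base name are both validated at the same _acme-challenge.<domain> name, an order like this one (poshacme.tk, *.poshacme.tk, dvolve.tk, *.dvolve.tk) needs two TXT values at each _acme-challenge name at the same time. A minimal Python sketch of the derivation, assuming an EC account key; the token and key coordinates below are placeholders, not real values.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """base64url without '=' padding, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ec_jwk_thumbprint(crv: str, x: str, y: str) -> str:
    """RFC 7638 thumbprint of an EC public JWK: SHA-256 over the required
    members (crv, kty, x, y) in lexicographic order with no whitespace."""
    canonical = json.dumps({"crv": crv, "kty": "EC", "x": x, "y": y},
                           sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("ascii")).digest())


def dns01_txt_value(token: str, thumbprint: str) -> str:
    """RFC 8555 section 8.4: base64url(SHA-256(token + "." + thumbprint))."""
    key_authorization = f"{token}.{thumbprint}"
    return b64url(hashlib.sha256(key_authorization.encode("ascii")).digest())


if __name__ == "__main__":
    # Placeholders only: the real token comes from the ACME server and the
    # real x/y coordinates from your account's public key.
    thumb = ec_jwk_thumbprint("P-256", "PLACEHOLDER_X", "PLACEHOLDER_Y")
    print(dns01_txt_value("PLACEHOLDER_TOKEN", thumb))
```

The result is always 43 characters, so a record holding anything else (like the fake values above) can only ever produce an Incorrect TXT record or similar error, never a successful validation.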