Certbot hanging on 'dns-01 challenge' while using the rfc2136 plugin

Hello, I'm attempting a dry run with certbot to test the rfc2136 plugin, but it keeps hanging during the dns-01 challenge. I have let it run for 10+ minutes without any luck. I have been able to get nsupdate to update the record properly, so the configuration on that side is working just fine. One thing I've noticed is that when running netstat, I never see a connection to the DNS server I'm trying to publish the records to.

Another thing is that once it gets to the challenges, there's no more logging, so I'm not sure what else to check to see what it's doing while it waits.

Here’s the command I’ve used:

certbot certonly --dns-rfc2136 --dns-rfc2136-credentials /root/ncocc-dns.ini --dns-rfc2136-propagation-seconds 120 -d '*.shelbyk12.org' -d shelbyk12.org --debug-challenges --dry-run

I have also tried with -vvv, but nothing more shows up at the point where it hangs.

I see this and then nothing:

2019-07-19 12:14:06,910:INFO:certbot.auth_handler:Performing the following challenges:
2019-07-19 12:14:06,910:INFO:certbot.auth_handler:dns-01 challenge for shelbyk12.org
2019-07-19 12:14:06,911:INFO:certbot.auth_handler:dns-01 challenge for shelbyk12.org

I am on Ubuntu 18.04, certbot 0.23.0-1, and dns-rfc2136 0.23.0-1
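For reference, the file passed to --dns-rfc2136-credentials is a small INI file. The sketch below sanity-checks it before running Certbot. It assumes the key names from the plugin's documentation (dns_rfc2136_server, dns_rfc2136_port, dns_rfc2136_name, dns_rfc2136_secret, dns_rfc2136_algorithm). The function name and the use of configparser are illustrative; this is not how the plugin itself parses the file.

```python
import configparser
import ipaddress
from typing import List

# Keys documented for the certbot-dns-rfc2136 credentials file. server, name
# and secret must be present; port and algorithm fall back to defaults.
REQUIRED_KEYS = ("dns_rfc2136_server", "dns_rfc2136_name", "dns_rfc2136_secret")


def check_rfc2136_credentials(text: str) -> List[str]:
    """Return a list of problems found in the body of a credentials INI file."""
    # interpolation=None so a '%' in a secret is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    # The real file has no [section] header, so add a dummy one for configparser.
    parser.read_string("[credentials]\n" + text)
    creds = parser["credentials"]

    problems = [f"missing {key}" for key in REQUIRED_KEYS if not creds.get(key)]

    server = creds.get("dns_rfc2136_server")
    if server:
        try:
            # The plugin documentation asks for an IP address, not a hostname.
            ipaddress.ip_address(server)
        except ValueError:
            problems.append(f"dns_rfc2136_server is not an IP address: {server}")
    return problems
```

Run it on the contents of /root/ncocc-dns.ini. An empty list means the required keys are present and the server is given as an address.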

Hi @Naticus

I don't know how that plugin works.

But your name servers are bad:

X Fatal error: Nameserver doesn't support TCP connection: ns1.ncocc-k12.org: Timeout
X Fatal error: Nameserver doesn't support TCP connection: ns1.ncocc-k12.org / 208.108.112.11: Timeout
X Fatal error: Nameserver doesn't support TCP connection: ns2.ncocc-k12.org / 208.108.116.11: Timeout

Perhaps your name server doesn't really support that API.

It's running BIND 9, and I've been working back and forth with our ISP to make sure the RFC 2136 setup is functioning properly. As I said, I can now push a change to it with nsupdate without issues, though we had to move the zone file to /var/lib/bind because AppArmor doesn't allow BIND to update it in /etc/bind. That part is working. What are you doing to test, so I can replicate the problem you're seeing?

That's the result of https://check-your-website.server-daten.de/?q=shelbyk12.org, my own tool, which I created because of the questions in this forum.

Oh gotcha, thanks for that info, and wow, that tool provides a lot of data. The part that confuses me is that internally I can make a TCP connection from my requesting server to the DNS server (it's possible there are restrictions on TCP port 53 from the outside), but only my server should be making the update anyway. I can telnet to the address on port 53, and telnet uses only TCP, so that should work, right? Or does Let's Encrypt need TCP access to the server directly?

Try running Certbot under strace? It will produce a huge amount of output, but it should help show what Certbot is doing while it's seemingly frozen.

From an external perspective, it looks like two of them return "connection refused" and one of them times out.

Authoritative nameservers should absolutely support TCP.

As currently implemented, Let's Encrypt will use TCP depending on the size of the responses.
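The usual mechanism behind "TCP depending on the size of the responses" is the TC (truncated) bit. A server that can't fit an answer into a UDP datagram sets TC, and the client retries the same query over TCP. Over TCP, each message is prefixed with a 2-byte length (RFC 1035, section 4.2.2). The sketch below shows both pieces. It illustrates standard DNS behaviour and is not Let's Encrypt's actual code.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(response: bytes) -> bool:
    """True if a UDP DNS response has the TC bit set, meaning retry over TCP."""
    # Header layout: ID (2 bytes), flags (2 bytes), then four 16-bit counts.
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP requires."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message
```

A name server that answers over UDP but drops TCP breaks this fallback for any response large enough to be truncated.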

Still, regardless of whether or not Let's Encrypt is able to validate your domain, Certbot is running into some kind of trouble before it gets to that.

Are you sure TCP is working internally? It's possible for a firewall configuration to return "connection refused", but it's unusual.

@JuergenAuer I'm currently working with my ISP to correct this. I'm hoping this is the one requirement we didn't know about; I'll let you know if it takes care of it!

@mnordhoff If the TCP access to 53 isn’t the only issue we’re having, I’ll see about giving strace a go, thanks for the suggestion.

Update: See post #10 first before reading into this too much.

@JuergenAuer Well, TCP connections are now allowed, but unfortunately that hasn't resolved our problem.

Here's what we're currently seeing, including the Python tracebacks (after several minutes of hanging at 'dns-01 challenge for shelbyk12.org'):

# certbot certonly --dns-rfc2136 --dns-rfc2136-credentials /root/ncocc-dns.ini -d shelbyk12.org --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-rfc2136, Installer None
Obtaining a new certificate
Performing the following challenges:
dns-01 challenge for shelbyk12.org
^CCleaning up challenges
^CExiting abnormally:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/certbot/auth_handler.py", line 69, in handle_authorizations
    resps = self.auth.perform(achalls)
  File "/usr/local/lib/python3.6/dist-packages/certbot/plugins/dns_common.py", line 58, in perform
    self._perform(domain, validation_domain_name, validation)
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 76, in _perform
    self._get_rfc2136_client().add_txt_record(validation_name, validation, self.ttl)
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 112, in add_txt_record
    domain = self._find_domain(record_name)
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 186, in _find_domain
    if self._query_soa(guess):
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 209, in _query_soa
    response = dns.query.udp(request, self.server, port=self.port)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 325, in udp
    q.keyring, q.mac, ignore_trailing)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 258, in receive_udp
    _wait_for_readable(sock, expiration)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 156, in _wait_for_readable
    _wait_for(s, True, False, True, expiration)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 131, in _wait_for
    if not _polling_backend(fd, readable, writable, error, timeout):
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 92, in _poll_for
    event_list = pollable.poll()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/certbot", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/certbot/main.py", line 1381, in main
    return config.func(config, plugins)
  File "/usr/local/lib/python3.6/dist-packages/certbot/main.py", line 1264, in certonly
    lineage = _get_and_save_cert(le_client, config, domains, certname, lineage)
  File "/usr/local/lib/python3.6/dist-packages/certbot/main.py", line 120, in _get_and_save_cert
    lineage = le_client.obtain_and_enroll_certificate(domains, certname)
  File "/usr/local/lib/python3.6/dist-packages/certbot/client.py", line 406, in obtain_and_enroll_certificate
    cert, chain, key, _ = self.obtain_certificate(domains)
  File "/usr/local/lib/python3.6/dist-packages/certbot/client.py", line 349, in obtain_certificate
    orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
  File "/usr/local/lib/python3.6/dist-packages/certbot/client.py", line 385, in _get_order_and_authorizations
    authzr = self.auth_handler.handle_authorizations(orderr, best_effort)
  File "/usr/local/lib/python3.6/dist-packages/certbot/auth_handler.py", line 98, in handle_authorizations
    return authzrs_validated
  File "/usr/local/lib/python3.6/dist-packages/certbot/error_handler.py", line 105, in __exit__
    self._call_registered()
  File "/usr/local/lib/python3.6/dist-packages/certbot/error_handler.py", line 124, in _call_registered
    self.funcs[-1]()
  File "/usr/local/lib/python3.6/dist-packages/certbot/auth_handler.py", line 220, in _cleanup_challenges
    self.auth.cleanup(achalls)
  File "/usr/local/lib/python3.6/dist-packages/certbot/plugins/dns_common.py", line 77, in cleanup
    self._cleanup(domain, validation_domain_name, validation)
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 79, in _cleanup
    self._get_rfc2136_client().del_txt_record(validation_name, validation)
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 147, in del_txt_record
    domain = self._find_domain(record_name)
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 186, in _find_domain
    if self._query_soa(guess):
  File "/usr/local/lib/python3.6/dist-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 209, in _query_soa
    response = dns.query.udp(request, self.server, port=self.port)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 325, in udp
    q.keyring, q.mac, ignore_trailing)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 258, in receive_udp
    _wait_for_readable(sock, expiration)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 156, in _wait_for_readable
    _wait_for(s, True, False, True, expiration)
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 131, in _wait_for
    if not _polling_backend(fd, readable, writable, error, timeout):
  File "/usr/local/lib/python3.6/dist-packages/dns/query.py", line 92, in _poll_for
    event_list = pollable.poll()
KeyboardInterrupt
Please see the logfiles in /var/log/letsencrypt for more details.
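Both tracebacks end in the same place. _query_soa calls dns.query.udp(request, self.server, port=self.port) without a timeout argument. As a result, dnspython ends up in pollable.poll() with no timeout and waits forever for a UDP reply. If UDP port 53 is filtered, that reply never arrives. dnspython's dns.query.udp does accept a timeout parameter. The stdlib sketch below shows the same idea: a UDP query that fails fast instead of hanging. The function name is illustrative.

```python
import socket


def udp_query(wire: bytes, server: str, port: int = 53, timeout: float = 5.0) -> bytes:
    """Send one DNS message over UDP and return the raw reply.

    Raises socket.timeout if nothing comes back within `timeout` seconds,
    instead of blocking forever like a query sent with no timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(wire, (server, port))
        reply, _addr = sock.recvfrom(65535)
        return reply
```

With a call like udp_query(query_bytes, "208.108.112.10", timeout=5), a filtered UDP port produces an exception after five seconds rather than a silent hang.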

Update: See post #10 first before reading into this too much.

@mnordhoff So I tried an strace (using -v -f -tt -e network) and I’m seeing this:

# strace -v -f -e network -tt certbot certonly --dns-rfc2136 --dns-rfc2136-credentials /root/ncocc-dns.ini -d shelbyk12.org --dry-run
strace: Process 2633 attached
[pid  2633] 16:51:55.145429 +++ exited with 0 +++
16:51:55.145557 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2633, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
16:51:55.897605 socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
16:51:55.897987 bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
strace: Process 2637 attached
[pid  2637] 16:51:56.440976 +++ exited with 0 +++
16:51:56.441013 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2637, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 2638 attached
strace: Process 2639 attached
[pid  2639] 16:51:56.498316 +++ exited with 0 +++
[pid  2638] 16:51:56.498659 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2639, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid  2638] 16:51:56.499131 +++ exited with 0 +++
16:51:56.499160 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2638, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-rfc2136, Installer None
16:51:56.843157 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 7
16:51:56.843227 connect(7, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
16:51:56.843320 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 7
16:51:56.843365 connect(7, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
16:51:56.844892 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 7
16:51:56.844951 connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16) = 0
16:51:56.845076 sendto(7, "T\253\1\0\0\1\0\0\0\0\0\1\20acme-staging-v02\3ap"..., 65, MSG_NOSIGNAL, NULL, 0) = 65
16:51:56.867548 recvfrom(7, "T\253\201\200\0\1\0\3\0\0\0\1\20acme-staging-v02\3ap"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [28->16]) = 166
16:51:56.867772 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
16:51:56.867885 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:51:56.868015 connect(7, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("104.106.20.183")}, 16) = -1 EINPROGRESS (Operation now in progress)
16:51:56.889094 getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
Obtaining a new certificate
Performing the following challenges:
dns-01 challenge for shelbyk12.org
16:51:57.307628 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 8
16:51:57.308051 sendto(8, "\4\264\0\0\0\1\0\0\0\0\0\0\17_acme-challenge\tshe"..., 47, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.108.112.10")}, 16) = 47 

I'm not very good at reading strace, as I've only used it a couple of times, but it looks like it's making a connection to 208.108.112.10 (the DNS server I'm authorized to publish changes to, which I can also successfully push to with nsupdate) on port 53 (which does appear to be working with nsupdate), and then it hangs there indefinitely.
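One detail in that last pair of calls: socket(AF_INET, SOCK_DGRAM, ...) creates a UDP socket, so the query to 208.108.112.10 goes out over UDP, not TCP. The 47-byte payload is easy to reconstruct. The sketch below rebuilds it, assuming from the traceback's _query_soa frame that it is an SOA query for _acme-challenge.shelbyk12.org with no flags set. The strace output only shows the first few bytes, and build_query is an illustrative helper, not part of Certbot.

```python
import struct

QTYPE_SOA = 6
QCLASS_IN = 1


def build_query(name: str, qtype: int, msg_id: int, flags: int = 0) -> bytes:
    """Build a minimal DNS query: a 12-byte header plus one question."""
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded  # each label is length-prefixed
    qname += b"\x00"  # zero-length root label ends the name
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


# "\4\264" in the strace output is the message ID 0x04B4.
packet = build_query("_acme-challenge.shelbyk12.org", QTYPE_SOA, msg_id=0x04B4)
print(len(packet))  # 47, matching the sendto() return value
```

The \17 and \t bytes visible in the strace are the label lengths 15 ("_acme-challenge") and 9 ("shelbyk12").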

I'm leaning towards using a manual auth script that runs nsupdate, since that works, though that really isn't what I was hoping to do.
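If it came to that, Certbot's manual plugin can call such a script through --manual-auth-hook. Certbot passes the domain and validation token in the CERTBOT_DOMAIN and CERTBOT_VALIDATION environment variables. The sketch below is a minimal hook that assumes the zone apex is the certificate domain. The TSIG key path is hypothetical, and the server address is the one from the strace above.

```python
#!/usr/bin/env python3
import os
import subprocess

NSUPDATE_SERVER = "208.108.112.10"  # the DNS server from this thread
TSIG_KEY_FILE = "/etc/bind/example-tsig.key"  # hypothetical path; use your own key
TTL = 60


def nsupdate_script(domain: str, validation: str, server: str, ttl: int) -> str:
    """Build nsupdate input that adds the dns-01 TXT record for `domain`."""
    # Assumption: the zone apex is the certificate domain itself.
    return (
        f"server {server}\n"
        f"zone {domain}\n"
        f'update add _acme-challenge.{domain}. {ttl} TXT "{validation}"\n'
        "send\n"
    )


def main() -> None:
    script = nsupdate_script(os.environ["CERTBOT_DOMAIN"],
                             os.environ["CERTBOT_VALIDATION"],
                             NSUPDATE_SERVER, TTL)
    # -k names the TSIG key file; -v makes nsupdate use TCP.
    subprocess.run(["nsupdate", "-k", TSIG_KEY_FILE, "-v"],
                   input=script.encode(), check=True)


# Only act when Certbot has set the hook environment variables.
if __name__ == "__main__" and "CERTBOT_DOMAIN" in os.environ:
    main()
```

It would be wired up with something like certbot certonly --manual --preferred-challenges dns --manual-auth-hook /path/to/hook.py -d shelbyk12.org --dry-run. A matching --manual-cleanup-hook script would delete the record afterwards.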

Hah, I think I know what happened, and I feel quite silly for not realizing what I had been doing all this time. I was following a guide that suggested using nsupdate to verify everything was functioning, but what I didn't realize is that nsupdate's "-v" is NOT the verbosity flag (I completely glossed over this and kept using it like a big dummy); it tells nsupdate to use TCP. The server I connect to allows TCP but not UDP, which is why nsupdate -v worked while Certbot's UDP query hung. I'm going to work with my ISP tomorrow to get both UDP and TCP allowed for my purposes.

Wow! That's surprising, and I can imagine it would trip up lots of folks.

Thanks for reporting back with your progress figuring out the problem!

dig, from the same project, names its TCP option +tcp or +vc, short for "virtual circuit" (they're two equivalent names for the same option). I bet nsupdate uses the same terminology for -v.
