DNS-01 challenge - something has changed


My domain is: bigbay4bestbuys.com

I ran this command and it produced this output:

[root@main ~]# certbot certonly --dry-run --dns-rfc2136 --dns-rfc2136-credentials /vpath to/.credentials.ini -d “*.xxxxxxxxxxxx.com” -d xxxxxxxxxxxx.com
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-rfc2136, Installer None
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org
Cert is due for renewal, auto-renewing…
Renewing an existing certificate
Performing the following challenges:
dns-01 challenge for xxxxxxxxxxxx.com
dns-01 challenge for xxxxxxxxxxxx.com
Cleaning up challenges
Encountered exception during recovery:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/certbot/error_handler.py", line 124, in _call_registered
    self.funcs[-1]()
  File "/usr/lib/python2.7/site-packages/certbot/auth_handler.py", line 220, in _cleanup_challenges
    self.auth.cleanup(achalls)
  File "/usr/lib/python2.7/site-packages/certbot/plugins/dns_common.py", line 77, in cleanup
    self._cleanup(domain, validation_domain_name, validation)
  File "/usr/lib/python2.7/site-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 79, in _cleanup
    self._get_rfc2136_client().del_txt_record(validation_name, validation)
  File "/usr/lib/python2.7/site-packages/certbot_dns_rfc2136/dns_rfc2136.py", line 163, in del_txt_record
    .format(e))
PluginError: Encountered error deleting TXT record: The peer didn't know the key we used
Encountered error adding TXT record: The peer didn't know the key we used

same output for certbot renew

My web server is (include version): Apache/2.4.41 (codeit) OpenSSL/1.1.1d mod_fcgid/2.3.9 PHP/7.3.12 mod_perl/2.0.11 Perl/v5.16.3

The operating system my web server runs on is (include version): [root@main ~]# cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)

My hosting provider, if applicable, is: vpsdime

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): no - but I have webmin installed

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): [root@main ~]# certbot --version
certbot 0.39.0

This was working, but I had to restart named in the middle while waiting for propagation.

I am using a CNAME setup with a separate _acme-challenge side zone, so the challenge does not mess up the main domain's zone file.

I am running my own DNS with BIND, forwarding to Google's public servers.

This is in the letsencrypt log:

2019-11-30 15:19:49,307:DEBUG:certbot_dns_rfc2136.dns_rfc2136:No authoritative SOA record found for _acme-challenge.bigbay4bestbuys.com
2019-11-30 15:19:49,309:DEBUG:certbot_dns_rfc2136.dns_rfc2136:Received authoritative SOA response for bigbay4bestbuys.com
2019-11-30 15:19:49,311:DEBUG:certbot.error_handler:Encountered exception:

The zone file for the ACME challenge is in the var/named/dynamic directory, so that BIND's CentOS setup treats it as a dynamic zone, and Certbot is able to write mkey and mkey.jnl files there (as of today).

But rndc sync -clean no longer seems to delete the .jnl files, and stopping and starting named simply resets their timestamps.

So what has changed?


I would test that key via some other method.
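One quick offline check, before involving the DNS server at all, is to confirm that the secret in the credentials file is clean base64 with no stray whitespace. The sketch below is a minimal example under assumptions: it reads the `dns_rfc2136_secret` key described in the plugin documentation, and the helper names `read_credentials` and `check_secret` are made up for illustration. It only inspects the text; it cannot tell whether named actually knows the key.

```python
import base64
import binascii


def read_credentials(text):
    """Parse simple 'key = value' lines from a credentials file into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, because base64 padding also uses '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_secret(secret):
    """Return a list of problems found in a TSIG secret string (empty if none)."""
    problems = []
    if any(ch.isspace() for ch in secret):
        problems.append("secret contains whitespace")
    try:
        base64.b64decode(secret.encode("ascii"), validate=True)
    except (binascii.Error, UnicodeEncodeError, ValueError):
        problems.append("secret is not valid base64")
    return problems


# Hypothetical file contents; the secret has a space in the middle on purpose.
SAMPLE = """
dns_rfc2136_server = 192.0.2.1
dns_rfc2136_name = keyname.
dns_rfc2136_secret = c2VjcmV0 a2V5MTIzNA==
dns_rfc2136_algorithm = HMAC-SHA512
"""

if __name__ == "__main__":
    creds = read_credentials(SAMPLE)
    for problem in check_secret(creds["dns_rfc2136_secret"]):
        print(problem)
```

If this reports nothing, the next step would still be a real signed update against the server, since a well-formed secret can be the wrong secret.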

I generated new keys. They all had a magical space in them, from nowhere (see the first answer at https://stackoverflow.com/questions/8745057/nsupdate-getting-badkey-error), and I followed the guide at https://certbot-dns-rfc2136.readthedocs.io/en/latest/. I think the problem is somewhere in BIND 9.11, in how it changed its handling of view directives. Now I get:

2019-12-01 09:36:57,472:INFO:certbot.auth_handler:dns-01 challenge for xxxxxxxxxxxxxxxx.com
2019-12-01 09:36:57,473:INFO:certbot.auth_handler:dns-01 challenge for xxxxxxxxxxxxxxxx.com
2019-12-01 09:36:57,490:DEBUG:certbot_dns_rfc2136.dns_rfc2136:No authoritative SOA record found for _acme-challenge.xxxxxxxxxxxxxxxx.com
2019-12-01 09:36:57,496:DEBUG:certbot_dns_rfc2136.dns_rfc2136:No authoritative SOA record found for xxxxxxxxxxxxxxxx.com
2019-12-01 09:36:57,499:DEBUG:certbot_dns_rfc2136.dns_rfc2136:No authoritative SOA record found for com
2019-12-01 09:36:57,500:DEBUG:certbot.error_handler:Encountered exception:

I do not understand why it tries the third name, "com". And then:

PluginError: Unable to determine base domain for _acme-challenge.xxxxxxxxxxxxxxxx.com using names: ['_acme-challenge.xxxxxxxxxxxxxxxx.com', 'xxxxxxxxxxxxxxxx.com', 'com'].
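The "com" entry in that list looks like expected behaviour rather than a bug. As far as I understand the plugin, it strips one leading label at a time from the record name and asks for an authoritative SOA at each step, so the last candidate is always the bare TLD. The sketch below imitates that search under stated assumptions: `base_domain_guesses`, `find_base_domain` and the `has_authoritative_soa` callback are hypothetical names for illustration, not Certbot's API, and the real plugin performs a DNS query where this sketch calls the callback.

```python
def base_domain_guesses(name):
    """List candidate zone names, from the full name down to the TLD."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def find_base_domain(name, has_authoritative_soa):
    """Return the first candidate for which the callback reports an SOA.

    has_authoritative_soa is a stand-in for the real SOA query.
    """
    guesses = base_domain_guesses(name)
    for guess in guesses:
        if has_authoritative_soa(guess):
            return guess
    raise LookupError(
        "Unable to determine base domain for {0} using names: {1}.".format(name, guesses)
    )


if __name__ == "__main__":
    # Pretend the server is authoritative only for the parent zone.
    print(find_base_domain("_acme-challenge.example.com", lambda n: n == "example.com"))
```

Read this way, the error means that none of the three names returned an authoritative SOA from the configured server, which points at what the server answers (for example, which view the query lands in) rather than at the name list itself.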

I also found this problem in the thread "Problems with certbot dns challenge - Unable to determine base domain", but even though schoen (Certbot engineer / EFF) said in Mar '18, "If you tell me the domain name, I can try to investigate", no solution was posted.

If I update named.conf and remove the domain from the internal view, the error stays the same but no SOA records are found at all. When I put it back, it finds the main domain only.

nsupdate works far enough to verify the key, but then fails with a TSIG error.

According to Sydney some months back, the plugin does not rely on nsupdate.

I have a CNAME setup like this:

_acme-challenge.xxxxxxxxx.com. 14400 IN CNAME xxxxxxxxx.com.

and then a zone file in var/named/dynamic (BIND 9.11) like this:

$ORIGIN .
$TTL 86400 ; 1 day
_acme-challenge.xxxxxxxxxx.com IN SOA ns1.yyyyyyyyy.com. no-reply.main.yyyyyyyyyy.com. (
                2016122352 ; serial
                3600       ; refresh (1 hour)
                7200       ; retry (2 hours)
                2419200    ; expire (4 weeks)
                86400      ; minimum (1 day)
                )
            NS  ns1.yyyyyyyyy.com.
            NS  ns2.yyyyyyyyy.com.
$TTL 14400 ; 4 hours
            A   xxx.xxx.xxx.xxx.
            A   yyy.yyy.yyy.yyy.
$ORIGIN _acme-challenge.xxxxxxxxxx.com.
localhost   A   127.0.0.1

Certbot put the two $ORIGIN statements in there.

Is the CNAME setup wrong? It worked for a year.


The open and close quotes look a bit problematic.
Try manually replacing those on the command line.
[instead of just copy/paste/run]

Did you try with sudo?
Check the file permission.
Manually delete the files and recreate them and then recheck the permissions.

that was done correctly - it was working for a year

…and then something changed/broke…

There is something buggy about this rfc2136 authenticator - one time through it gets no SOA response - the next time through, without any changes, it gets an SOA response. Also, it no longer respects the propagation delay option. The Python API must be the issue.

2019-12-01 16:53:26,166:INFO:certbot.auth_handler:dns-01 challenge for xxxxxxxxxxxxxxxx.com
2019-12-01 16:53:26,180:DEBUG:certbot_dns_rfc2136.dns_rfc2136:Received authoritative SOA response for _acme-challenge.xxxxxxxxxxxxxxxx.com
2019-12-01 16:53:26,186:DEBUG:certbot.error_handler:Encountered exception:

No changes, and the second time through I get this:

2019-12-01 17:09:59,220:DEBUG:certbot_dns_rfc2136.dns_rfc2136:No authoritative SOA record found for _acme-challenge.xxxxxxxxxxxxxxxx.com
2019-12-01 17:09:59,222:DEBUG:certbot_dns_rfc2136.dns_rfc2136:Received authoritative SOA response for xxxxxxxxxxxxxxxx.com
2019-12-01 17:09:59,224:DEBUG:certbot.error_handler:Encountered exception:
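To compare runs like the two above without reading the log by eye, the SOA lookup lines can be pulled out and listed per name. This is a minimal sketch that assumes the debug line format shown in these excerpts; the function name `soa_results` is made up for illustration, and the log path is whatever Certbot reports at startup.

```python
import re

# Matches the two SOA debug messages seen in the log excerpts in this thread.
SOA_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+:DEBUG:"
    r"certbot_dns_rfc2136\.dns_rfc2136:"
    r"(?P<result>Received authoritative SOA response|No authoritative SOA record found)"
    r" for (?P<name>\S+)$"
)


def soa_results(log_text):
    """Return (timestamp, name, found) tuples for every SOA lookup in the log."""
    results = []
    for line in log_text.splitlines():
        match = SOA_LINE.match(line.strip())
        if match:
            found = match.group("result").startswith("Received")
            results.append((match.group("time"), match.group("name"), found))
    return results


if __name__ == "__main__":
    with open("/var/log/letsencrypt/letsencrypt.log") as handle:
        for when, name, found in soa_results(handle.read()):
            print(when, name, "SOA found" if found else "no SOA")
```

Lining these up against the times named was restarted or reconfigured may show whether the flip between "found" and "not found" follows a server-side change.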

Two things changed. The rfc2136 authenticator changed and it no longer pauses for the propagation delay switch, so you cannot stop it to flush the TXT record into the zone file. And BIND went from 9.9, which CentOS 7 had for a long time, to BIND 9.11.4-P2-RedHat-9.11.4-9.P2.el7.

In 9.10 and up, the handling of views changed, and now an internal and an external view cannot both use the same dynamic zone file.

I do not think that matters as much as the fact that the Python DNS API seems unable to reliably get the SOA records when a CNAME setup is used, as shown above.

I just renewed manually and the CNAME setup is correct: I put the manual TXT entries in the acme-challenge zone file and it worked.

There are no errors on the keys, but the zone is not being updated via nsupdate, even though nsupdate runs without error.

I had to enter the TXT entries manually and then restart named (BIND) to get them to show up in dig, and then enter the next one.

The --dns-rfc2136-propagation-seconds switch is not working

I have 90 days to do dry runs to get this working again.

I had written a script to restart the DNS (named) while the propagation delay was clocking off 120 seconds - and it now does not work because the delay does not happen
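One possible reading of the missing delay, judging from the traceback and the timestamps (the exception is logged a few milliseconds after the SOA lookup), is that the run fails while adding the TXT record, before any wait would start. The sketch below illustrates that ordering under a stated assumption: that the authenticator adds every record first and only then sleeps for the propagation time. The names `perform_challenges` and `add_txt_record` and the parameters are hypothetical, not Certbot's API.

```python
import time


def perform_challenges(records, add_txt_record, propagation_seconds, sleep=time.sleep):
    """Add every TXT record, then wait once for propagation.

    If add_txt_record raises (for example on a TSIG key error), the
    exception propagates and the sleep below is never reached.
    """
    for name, value in records:
        add_txt_record(name, value)
    sleep(propagation_seconds)
    return len(records)


if __name__ == "__main__":
    def failing_add(name, value):
        raise RuntimeError("The peer didn't know the key we used")

    waits = []
    try:
        perform_challenges(
            [("_acme-challenge.example.com", "token")],
            failing_add,
            120,
            sleep=waits.append,  # record the wait instead of sleeping
        )
    except RuntimeError as error:
        print("failed before waiting:", error, "| waits recorded:", waits)
```

If that reading is right, the delay switch itself is fine and would kick in again once the dynamic update succeeds.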

It is nothing I did - I had not touched the setup at all


That may be true…
But it seems to me that the fix is something you will have to do.
I trust DNS to work - it does exactly what you tell it.
If it is NOT doing what you expect then check what it is being told to do.
Looks like things aren’t propagating properly.
Or maybe one DNS server does but the other doesn’t - you need to check them each (individually).

The Google servers are forwarders only - they are doing what they are supposed to. This all worked fine - and would still work if the propagation delay paused for the 120 seconds as promised.

I cannot fix the certbot dns rfc2136 authenticator to work like it once did.

I can check that the dynamic update in BIND works, using nsupdate, but I cannot help the authenticator find the SOA records reliably when everything else, like dig, sees them. It may be a timing issue of going too fast. I will look closely at the TTL values, but as I said, this all worked fine before.

The main thing that needs to be fixed or checked is the Python DNS API that the rfc2136 authenticator uses to access the zones reliably - or it should at least show programmatically why it no longer can, as it worked for a year.

It is nothing I did, nor something I have much control over. The rfc2136 authenticator is buggy and cannot reliably find perfectly fine SOA records like it once did.

It is nothing I did


SOLUTION

The rfc2136 authenticator may not work in a multi-view BIND setup: since BIND 9.10, views are handled differently, and the internal and external views can no longer both use the same zone file in the dynamic directory.

The solution was found at https://serverfault.com/questions/764619/bind-9-10-in-view-directive-doesnt-work

The solution looks like this in named.conf:

view "internal" {
    zone example.com {
        type slave;
        file "slaves/example.db";
        masters { 192.168.1.1; };
    };
};

view "external" {
    zone example.com {
        in-view "internal";
    };
};
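To find other zones that still have the same problem, a rough scan of named.conf can list zone files that are referenced from more than one view. This is a line-oriented sketch, not a real parser for BIND's configuration syntax: it assumes each `view` statement starts its own line and each `file "..."` statement sits on a single line, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

VIEW_RE = re.compile(r'^\s*view\s+"?([^"\s{]+)"?')
FILE_RE = re.compile(r'\bfile\s+"([^"]+)"')


def zone_files_shared_between_views(conf_text):
    """Map each file path used in more than one view to the sorted view names."""
    views_by_file = defaultdict(set)
    current_view = None
    for raw_line in conf_text.splitlines():
        # Drop trailing // and # comments (block comments are not handled).
        line = raw_line.split("//")[0].split("#")[0]
        view_match = VIEW_RE.match(line)
        if view_match:
            current_view = view_match.group(1)
        file_match = FILE_RE.search(line)
        if file_match and current_view is not None:
            views_by_file[file_match.group(1)].add(current_view)
    return {
        path: sorted(views)
        for path, views in views_by_file.items()
        if len(views) > 1
    }


if __name__ == "__main__":
    with open("/etc/named.conf") as handle:
        for path, views in zone_files_shared_between_views(handle.read()).items():
            print(path, "is used by views:", ", ".join(views))
```

Any file it reports is a candidate for the in-view treatment shown above, with the zone defined once and referenced from the other view.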
