Bootstrapping a self-hosted acme-dns with split-DNS. Possible?

I need to get the acme-dns server running locally, on a server that is already running an instance of my split-DNS (so port 53 is not available).

Outside public DNS for mydomain.tld:

acmedns     IN    NS   usedname.mydomain.tld.
usedname    IN    A    100.11.12.13
linuxserver IN    A    100.11.12.14

Inside private DNS for mydomain.tld:

linuxserver IN    A    192.168.10.10

acme-dns is running as a container via docker compose, with this:

    ports:
      - "943:443"
      - "953:53"
      - "953:53/udp"
      - "980:80"

It starts fine:

acmedns-1  | time="2024-06-08T12:46:06Z" level=info msg="Using config file" file=/etc/acme-dns/config.cfg
acmedns-1  | time="2024-06-08T12:46:06Z" level=info msg="Connected to database"
acmedns-1  | time="2024-06-08T12:46:06Z" level=debug msg="Adding new record to domain" domain=vanroodewierda.rna.nl. recordtype=A
acmedns-1  | time="2024-06-08T12:46:06Z" level=debug msg="Adding new record to domain" domain=acmedns.rna.nl. recordtype=NS
acmedns-1  | time="2024-06-08T12:46:06Z" level=debug msg="Adding new record to domain" domain=acmedns.rna.nl. recordtype=SOA
acmedns-1  | time="2024-06-08T12:46:06Z" level=info msg="Listening HTTPS" domain=acmedns.rna.nl host="0.0.0.0:443"
acmedns-1  | time="2024-06-08T12:46:06Z" level=info msg="Listening DNS" addr="0.0.0.0:53" proto=udp
acmedns-1  | time="2024-06-08T12:46:06Z" level=info msg="Listening DNS" addr="0.0.0.0:53" proto=tcp

The NAT rule is:

100.11.12.13:53 -> 192.168.10.10:953

From the outside, my acme-dns is reachable:

nc -v -z -u usedname.mydomain.tld 53
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to k.l.m.n:53.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.01 seconds.

From the inside as well:

nc -v -z -u linuxserver.mydomain.tld 953
Connection to linuxserver port 953 [udp/*] succeeded!

Now, when I try to register from the inside, I get:

root@linuxserver:/srv/docker/nameserver# curl -X POST https://linuxserver.mydomain.tld:943/register
curl: (35) error:0A000438:SSL routines::tlsv1 alert internal error

and the log says:

acmedns-1  | time="2024-06-08T12:47:43Z" level=info msg="http: TLS handshake error from 192.168.10.10:53822: no certificate available for 'linuxserver.mydomain.tld'"

My config is a mess, of course, because I don't understand all of this very well.

# domain name to serve the requests off of
domain = "acmedns.mydomain.tld"
# zone name server
nsname = "usedname.mydomain.tld"
# admin email address, where @ is substituted with .
nsadmin = "hostmaster.mydomain.tld"
# predefined records served in addition to the TXT
records = [
    # domain pointing to the public IP of your acme-dns server 
    "usedname.rna.nl. A 100.11.12.13",
    # specify that auth.example.org will resolve any *.auth.example.org records
    "acmedns.mydomain.tld. NS usedname.mydomain.tld.",
]

Is there a way to get the acme-dns running self-hosted in this situation?

10.11.12.13 is not a public IP address? Are you also running your CA software locally?

Also, the hostnames are visible in the acme-dns log. Assuming those are correct, I can't see any IP address behind acmedns.example.com (where example.com is the actual domain taken from the log, of course).

The addresses were fake. Sorry, I should not have used 10.x.x.x. I'll change them, and then reply, as I have worked around something.

I worked around it by copying a valid cert over from another machine and setting

# possible values: "letsencrypt", "letsencryptstaging", "cert", "none"
tls = "cert"
# only used if tls = "cert"
tls_cert_privkey = "/etc/letsencrypt/live/mydomain.tld/privkey.pem"
tls_cert_fullchain = "/etc/letsencrypt/live/mydomain.tld/fullchain.pem"

That is of course not a real solution.

I can now successfully use the API, but not entirely:

% curl -X POST https://linuxserver.mydomain.tld:943/update -H "X-Api-User: <snip>" -H "X-Api-Key: <snip>" --data '{"subdomain": "<snip>", "txt": "___validation_token_recieved_from_the_ca___"}'| python3 -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   161  100    54  100   107    688   1364 --:--:-- --:--:-- --:--:--  2064
{
    "txt": "___validation_token_recieved_from_the_ca___"
}

That is a call from the inside. Port 443 is not reachable from the outside (no NAT rule for it). The log says:

acmedns-1  | time="2024-06-08T16:06:57Z" level=info msg="  Actual request no headers added: missing origin"
acmedns-1  | time="2024-06-08T16:06:57Z" level=debug msg="TXT updated" subdomain=<snip> txt=___validation_token_recieved_from_the_ca___
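For readers who want to script the same update call, here is a minimal sketch in Python that mirrors the curl command above. It only builds the request and does not send it. The helper name `build_update_request` is made up for this illustration; the `/update` path, the two `X-Api-*` headers and the JSON fields are the ones used in the curl call.

```python
import json
import urllib.request


def build_update_request(api_base, api_user, api_key, subdomain, txt):
    """Build (but do not send) a POST /update request for an acme-dns API.

    Hypothetical helper: it mirrors the curl call shown in the thread.
    """
    body = json.dumps({"subdomain": subdomain, "txt": txt}).encode("utf-8")
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/update",
        data=body,
        method="POST",
        headers={"X-Api-User": api_user, "X-Api-Key": api_key},
    )


if __name__ == "__main__":
    req = build_update_request(
        "https://linuxserver.mydomain.tld:943",
        "<snip>",
        "<snip>",
        "<snip>",
        "___validation_token_recieved_from_the_ca___",
    )
    # Sending is left out on purpose; urllib.request.urlopen(req) would do it.
    print(req.get_method(), req.full_url)
```

Because the name in the URL has to match the certificate served by the API, a client using this would hit the same TLS errors described above when the name and the cert disagree.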

But when I try to read this from the outside:

$ dig _acme-challenge.acmedns.mydomain.tld txt

; <<>> DiG 9.11.36-RedHat-9.11.36-14.el8_10 <<>> _acme-challenge.acmedns.mydomain.tld txt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42965
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 327e502452512e255ca948b766647f7248f5c0f8d3a99f53 (good)
;; QUESTION SECTION:
;_acme-challenge.acmedns.mydomain.tld.	IN	TXT

;; ANSWER SECTION:
_acme-challenge.acmedns.mydomain.tld.	1 IN	TXT	""

;; AUTHORITY SECTION:
acmedns.mydomain.tld.		207	IN	NS	usedname.mydomain.tld.

;; Query time: 37 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Jun 08 17:57:38 CEST 2024
;; MSG SIZE  rcvd: 129

So, almost there?

Dunno, I have a hard time following with all the redaction taking place.

Sorry, but I had to fix it so that it was coherent.

Never mind, the error was mine (no surprise here). I had CNAME'd _acme-challenge.acmedns.mydomain.tld to <acme-dns-subdomain>.acmedns.mydomain.tld, but I should of course have CNAME'd _acme-challenge.mydomain.tld, because that is where the LE CA will look... :grimacing: Now I just need to see if I can get it working in full.

So the bootstrap would probably have worked too. I'll probably check later.
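The CNAME mix-up described above comes down to which name the CA queries for a DNS-01 challenge: `_acme-challenge.` followed by the name being validated, with any leading `*.` removed for wildcards. A small sketch, with hypothetical helper names and a made-up acme-dns subdomain in the usage note:

```python
def challenge_record_name(cert_domain):
    """Return the name a CA queries for a DNS-01 challenge on cert_domain."""
    name = cert_domain.rstrip(".")
    if name.startswith("*."):
        # A wildcard is validated at the base name.
        name = name[2:]
    return "_acme-challenge." + name + "."


def cname_line(cert_domain, acme_dns_fulldomain):
    """Return the CNAME record that delegates the challenge to acme-dns."""
    owner = challenge_record_name(cert_domain)
    target = acme_dns_fulldomain.rstrip(".") + "."
    return f"{owner} IN CNAME {target}"
```

With these, `cname_line("*.mydomain.tld", "abc123.acmedns.mydomain.tld")` gives `_acme-challenge.mydomain.tld. IN CNAME abc123.acmedns.mydomain.tld.`, where `abc123` stands in for the subdomain returned by /register. The record belongs in the public zone of mydomain.tld, not under acmedns.mydomain.tld.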

The thing with acme-dns (and DNS challenge validation in general) is that your internal DNS is irrelevant for the purposes of ACME domain validation with a public CA, so all that matters is what the public can see.

This is not entirely true. For instance, my machines (including my router) can all access the acme-dns API for registration or TXT updates on the local net. And as this is HTTPS, the client needs a real name (not an IP address), or it may refuse to connect because the cert on the HTTPS interface of acme-dns does not match.

I have three machines that need to use this cert update. One is a router (OPNsense, which has an acme.sh plugin). Here I run into trouble with OPNsense's acme.sh-based ACME plugin; see "ACME fails to work with my acme-dns on a curl certificate issue, but why?" (curl returns error 60, a cert error). You can see there that I seem to have a working acme-dns server (registrations, updates, requests all work).

Then there is a Linux server where (almost) everything runs in docker containers. Here, I run into not quite understanding the client side. I used to run certbot with a GoDaddy plugin on both, and that worked. An acme-dns-based setup I haven't gotten to work yet, in part because it is hard to find extensive documentation for people who are new to acme-dns and such.

The third one is a macOS system. I haven't even started to look at that one.

Got it working on the Linux/Docker system. The acme-dns is running there, and there is now a working certbot+acme-dns plugin (a bit old and probably no longer maintained, as the email address of the maintainers, DT network/security people, bounces, but it works), and I've been able to create my cert. Might write it all up later.

The plugin (pip) is certbot-dns-acmedns on PyPI; its instructions are for certbot 1, so they do not work as written.

Having done this before, my 2¢:

This is relatively easy to do.

I think the difficulty you encountered is that you were trying to set things up in an awkward fashion because you didn't understand the requirements and did not have the correct routing in place, which added to the confusion: NAT was redirecting only :53 DNS traffic, which is only needed by the public internet, while the private LAN needs :443 forwarded to handle the API requests. You also decided to implement your own naming system for domains instead of using the recommended acme-dns style (the same domain name for the NS and A records, which minimizes confusion and makes troubleshooting easy).

The easy way to handle this would have been to ignore the NAT/Private routing for acme-dns, and just rely on the public IP address for that. That server needs to be available for public DNS lookups, so there is no reason to bother with NAT and private addresses.

In that situation, the acme-dns instance would pull its own certificate based on the public DNS setting and its configuration, and local servers can use the public DNS API endpoint. If you wanted to, you could map that in internal DNS to save a few milliseconds, but it's totally unnecessary. It looks like you were just trying to force a lot of traffic to be local, when the services already need to be public.

The easiest way to handle this is to operate the acme-dns server on the public internet with a public domain. If you want to lock things down afterwards, simple options include:

1- Use an internal system to enable/disable firewall rules that expose that server/ports as needed.
2- Lock down the firewall for HTTPS to local traffic, then have NAT remap port 443.

I think you just overcomplicated something relatively simple.

NAT-reflection with LAN NAT rules is simpler than split-DNS, I agree. But only using a public DNS is not an option for me, above all: I want to be able to run my internal systems/logic when the internet is down (which you do not have if you fully depend on an outside DNS). So, split-DNS stays.

By the way, it was really quite hard to find clear and, above all, complete working instructions for my three different types of systems that use ACME:

  • ACME (acme.sh) plugin in OPNsense (still doesn't work even though my acme-dns works fine; I currently suspect OPNsense has an outdated intermediate cert and the acme.sh curl call to /update fails)
  • Linux/Docker, where you for instance run into the fact that instructions for adding a plugin to certbot do not work for the official certbot container, because that container doesn't use pip (solved by creating my own pip-based certbot container); this I now have working
  • macOS/MacPorts (probably the most straightforward certbot+plugin implementation; still to do, no problems expected now that acme-dns works)

None of that (nor routing) was my original question, though. My question was about the fact that using the 443 interface to store the nonce in a TXT record during validation, without a valid cert to start with, breaks the whole system. To be honest, I still do not understand that: my workaround was copying a valid cert over, but how do you start if you do not have one?

My additional problem was that I had made (see above) a mistake (lack of understanding, indeed) in my DNS, by having for instance _acme-challenge.acmedns.mydomain.tld instead of _acme-challenge.mydomain.tld.

In the end, I added a private IP reference in my public DNS so that my router (which goes outside for resolving) could resolve the DNS name of acme-dns to an internal address (it needs a name for the cert to work on the acme-dns HTTPS API). That is ugly, and it could indeed be done with NAT instead, I guess. I'm just old :grinning: and my heritage is a time when routers did not have such sophisticated NAT capabilities (or implementations like Cisco's small business router) and split-DNS was your only option. This is actually a good thing to think about, thanks. I might run into name clashes, though.

If the internet is down, your systems can't communicate with the ACME Server, and the ACME Server can't reach the acme-dns server for authentication.

acme-dns will procure the certificate for you.

The problem is that you are insisting on making systems that are designed for, and required to have, public internet connectivity work without public internet connectivity.

Yes, clearly nothing will work that depends on internet services if the internet is down, including validating the cert. I don't care about that.

But for the things that are self-hosted and do not need internet services (like accessing my existing mailboxes), I still need a DNS if I want local access to cert-based services. At such a time I don't care that my cert cannot be renewed; it simply has to work internally. Without a DNS, I cannot use FQDNs on the inside, and without an FQDN the cert doesn't work, as it covers the FQDN. Some clients demand a valid cert and simply will not let you override.

You are misunderstanding my point - your complications were because you were trying to force the required public components into your private LAN. There is no requirement or recommendation for that.

The acme-dns server needs to be on the public internet, and should be handled with a public dns mapping and request.

If you want to run other local services on the same machine, you can either use different hostnames (which I would strongly recommend) or handle them with a split horizon dns for that hostname afterward.

The complexity you encountered is from trying to operate the acme-dns server as a private LAN system and specifying your own certificate. While this is supported, it is explicitly not recommended in the acme-dns documentation and comes with warnings over the complexity and extra steps you must take:

The RESTful acme-dns API can be exposed over HTTPS in two ways:

  1. Using tls = "letsencrypt" and letting acme-dns issue its own certificate automatically with Let's Encrypt.
  2. Using tls = "cert" and providing your own HTTPS certificate chain and private key with tls_cert_fullchain and tls_cert_privkey.

Where possible the first option is recommended. This is the easiest and safest way to have acme-dns expose its API over HTTPS.

Warning: If you choose to use tls = "cert" you must take care that the certificate does not expire! If it does and the ACME client you use to issue the certificate depends on the ACME DNS API to update TXT records you will be stuck in a position where the API certificate has expired but it can't be renewed because the ACME client will refuse to connect to the ACME DNS API it needs to use for the renewal.
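To make the expiry risk in that warning concrete, here is a rough sketch of how a monitoring script could compute the days left on a certificate from its notAfter string (the format Python's `ssl` module reports in `getpeercert()`). The function names and the 14-day threshold are assumptions chosen for illustration, not anything acme-dns prescribes.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time.

    not_after uses the format of ssl.SSLSocket.getpeercert()["notAfter"],
    for example "Jan 11 00:00:00 2024 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def needs_attention(not_after, warn_days=14, now=None):
    """True when fewer than warn_days remain (hypothetical threshold)."""
    return days_until_expiry(not_after, now) < warn_days
```

With tls = "cert", a check like this would have to run somewhere that does not itself depend on the acme-dns API certificate being valid.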

The easiest way for you to have done this would have been:

  • set up acme-dns at acme-dns.example.com with a public IP address
  • let acme-dns obtain and manage its own certificate
  • if other services need to run on that same machine, just use another FQDN for those services.

In that situation you don't need to split-horizon the acme-dns FQDN, because if the public internet were down the Certbot renewals would fail attempting an Order Create, and acme-dns would never be communicated with.

We clearly at least partly misunderstand one another as I think you also misunderstand me. I would like to understand you / clear the misunderstanding.

That is, I think "your complications were because you were trying to force the required public components into your private LAN" isn't true, unless acme-dns operation requires exposing the API over HTTPS on the public internet. That isn't the case, right? So what I was trying to do from the start is standard: service on the inside, acme-dns DNS exposed to the outside, API only exposed to the clients (inside). Right? I wasn't forcing anything out of the ordinary here, I think. But you think so.

I tend to minimise my attack surface, and opening up an HTTPS port on the outside is simply something that doesn't automatically tick that box. As soon as that is open, the attack attempts will start. But I guess that is not what you are saying.

I think what went wrong might have been something else altogether. One client (the ACME plugin on OPNsense, which I initially tested with) still doesn't work. I suspect OPNsense simply doesn't have the correct intermediate cert. And initially (see above) I made an error in my public DNS, which probably prevented "tls = letsencrypt" from working. It is hard to reconstruct what really happened. The OPNsense ACME plugin issue really screwed things up.

I might simply turn 'cert' back to 'letsencrypt' and see if that works now.

Besides, my question about bootstrapping was really dumb, because I overlooked the option 'none', which enables me to bootstrap with HTTP and then switch to the delivered cert. (It is a wildcard cert; maybe I should have mentioned more prominently that I am issuing a wildcard, which works as easily for the acme-dns server as it does for my mail server and so will normally not expire.)
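A sketch of that bootstrap as a config edit, under stated assumptions: start with tls = "none", obtain the first cert over the plain-HTTP API, then flip the setting to "cert". The helper `set_tls_mode` is hypothetical and only rewrites the single `tls = "..."` line of the config text; the allowed values are the ones listed in the config comment quoted earlier in the thread.

```python
import re

# Values listed in the config comment quoted earlier in the thread.
TLS_MODES = ("letsencrypt", "letsencryptstaging", "cert", "none")


def set_tls_mode(config_text, mode):
    """Return config_text with its `tls = "..."` line set to mode."""
    if mode not in TLS_MODES:
        raise ValueError(f"unknown tls mode: {mode!r}")
    # Matches `tls = "..."` but not tls_cert_privkey or tls_cert_fullchain.
    new_text, count = re.subn(
        r'(?m)^([ \t]*tls[ \t]*=[ \t]*)"[^"]*"',
        lambda m: f'{m.group(1)}"{mode}"',
        config_text,
    )
    if count != 1:
        raise ValueError(f"expected exactly one tls line, found {count}")
    return new_text
```

After the first wildcard cert has been issued, something like `set_tls_mode(text, "cert")` plus a restart of the container would complete the switch; how the cert files get mounted into the container is left out here.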
