Local DNS Server interfering with issuing certificates

So I'm setting up a new homelab, and for days I kept running into the same issue, unaware it could be caused by my fairly new home network. I have a pfSense box as my router; it runs its own DNS server and has pfBlockerNG enabled. Everything sits in different subnets, and my homelab stuff sits in its very own subnet.

When I set up pfSense, I had a lot of issues with Google Homes and other devices intermittently going down: "Hmm... something went wrong... try again in a few seconds." It turns out Google Home and a host of other home devices forcefully use their own DNS servers, so I set up rules to redirect DNS traffic meant for outside servers to my pfSense DNS server ONLY, and to block port 853 traffic.

My domain is: couchspot.com

The first thing I tried was turning off pfBlockerNG, which didn't work. When it finally worked, the changes were: disabling the rules that redirect DNS to the pfSense router from inside the subnet, disabling the port 853 block, and disabling the other rule for redirecting outbound traffic. Then I disabled the DNS server for the homelab subnet and set the Cloudflare DNS server on the DHCP server instead.

Only once I went that far did my certificates get issued! But here's the thing: I lost my DNS server and I don't like that. One of the things I planned to do with that DNS server is put in my own DNS entries pointing to the local IPs, to prevent traffic from unnecessarily leaving the network each time. And actually, wouldn't that interfere with cert-manager renewing certificates? How can I get this all to play nicely again?


Welcome to the Let's Encrypt Community, Tyler :slightly_smiling_face:

I'm not the most experienced in your type of setup, but since you seem to want to have control over your own DNS server, you could delegate the DNS to your DNS server then just use automated dns-01 challenges rather than http-01 challenges. That way your local IP addresses won't be an issue.


I saw that you mentioned Cloudflare. Please make sure you fully understand how TLS/SSL works with Cloudflare.

That's actually what I'm doing. The problem seems to be that either my local DNS server is interfering, or the rules that keep DNS traffic from going anywhere other than my local DNS server are. When I turned everything off and set the DNS server to 1.1.1.1, certificates were being issued. But I don't want my DNS set up that way, I want to use my own local DNS server.


If you're sitting behind Cloudflare, you probably want to be using Cloudflare Origin CA certificates rather than Let's Encrypt certificates. They last much longer and are far easier to issue and manage.

As for the DNS interference, I'm not immediately sure, but others around here may know immediately.

@_az

You around?

You would need to run your own internal DNS server.
[There you can add all the entries you like and then forward all other requests to global root DNS servers.]
Then set the local DNS IP in DHCP for all your local DHCP clients to use.


I already have the local DNS server running on pfsense, the problem is issuing certs doesn't work with it enabled. I don't know if it's the speed at which changes are propagating, but the two don't play well together.


I don't follow...
How are you issuing cert?
How is the pfSense DNS involved in those issuances?


I use cert-manager in a Kubernetes cluster. My issuer uses a dns01 challenge. It is able to successfully add the TXT records, I see them, but it times out verifying.

This system cannot access any other DNS server besides my pfsense DNS server, there are firewall rules blocking 53 and 853 and redirecting to my pfsense DNS server.

After disabling those firewall rules and setting the system to 1.1.1.1 DNS server, certs were issued. I would like to NOT have this setup, I would like to reverse these changes and get the dns01 challenge to work again with my own DNS server.


Maybe I need a picture... But I still don't follow.

You control the local ACME client.
You control the local DNS server.
You control the local firewall.
You control a global DNS zone in CloudFlare.

How are you not able to get this to do what you want?


Well I guess I thought I would get a more insightful answer to know where to start fixing something. I don't even know for sure how the dns01 challenge works, for all I know, the challenge might need to use certain DNS servers and my firewall rules are blocking access, or perhaps it's just my DNS server is slow to update and that was something I more or less set and forgot about.


Then maybe that is the first question you should be asking...
How does the DNS-01 challenge work?
Your ACME client connects to the ACME server and negotiates a token to be validated via DNS.
That DNS validation will come from multiple (unknown) IPs on the Internet.
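
To make the DNS part concrete, here is a minimal Python sketch of the value that ends up in the TXT record, following the key-authorization scheme from RFC 8555. The function names and the sample token and thumbprint are made up for illustration; a real client such as cert-manager does all of this internally.

```python
import base64
import hashlib


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the TXT record value for a DNS-01 challenge (RFC 8555, section 8.4)."""
    # The key authorization is the challenge token joined to the
    # account key thumbprint with a single dot.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # The record holds the unpadded base64url encoding of the SHA-256 digest.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def challenge_record_name(domain: str) -> str:
    """Return the DNS name where the TXT record must be published."""
    # A wildcard name is validated at the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


# Hypothetical values, for illustration only.
example_value = dns01_txt_value("sample-token", "sample-thumbprint")
example_name = challenge_record_name("*.example.com")
```

For both example.com and *.example.com the record lives at _acme-challenge.example.com, which is the name that has to be resolvable when validation happens.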

If I understand your problem correctly (which I doubt):
When your firewall is set to block inbound port 53, then those DNS validation requests will fail to reach your local DNS server (no cert).
[not even sure why you keep bringing up port 853 - but that is not important for this exercise]
When you use the CloudFlare DNS to host your validation, the tests work.
If you are blocking outbound DNS, then you can't expect the local client to reach CloudFlare DNS to add the required validation token.
But again, I have no picture and I have no idea why you are blocking DNS (and port 853) in either direction. The Internet is built on DNS, without it... well you really don't have much left.


I can concur: I also have a hard time following what is what. I don't have a full grasp of the local environment and network, I don't know how @sionicion is trying to get a certificate issued (besides apparently using the dns-01 challenge), I don't know what the actual cert-manager setup is, and I don't know which exact error messages are presented. There are way too many unknowns here.

And I would like to recommend the following page to @sionicion:

Especially "How it works" and the "Challenge Types" pages.

Also, I'm pretty sure Let's Encrypt doesn't use DNS-over-TLS, so that's not really relevant here I think.


I doubt anyone (without a crystal ball) will be able to give you any meaningful insight on how to correct a problem they don't fully understand :frowning:


Pfsense as router -> Netgear Switch -> ...vlan10 (cshost0, cshost1)

vlan10 - 192.168.10.0/24 subnet (aka Homelab subnet)

Pfsense rules

  1. NAT Port Forward

Interface: Homelab
Protocol TCP/UDP
Source address: *
Source ports: *
Destination address: ! HOMELAB address
Destination ports: 53
NAT IP: 127.0.0.1
NAT ports: 53

Interface: Homelab
Protocol TCP/UDP
Source address: *
Source ports: *
Destination address: ! HOMELAB address
Destination ports: 53
NAT IP: 127.0.0.1
NAT ports: 53

Interface: WAN
Protocol TCP
Source address: *
Source ports: *
Destination address: WAN address
Destination ports: 80 (HTTP)
NAT IP: 192.168.10.40 (traefik external endpoint)
NAT ports: 80 (HTTP)

Interface: WAN
Protocol TCP
Source address: *
Source ports: *
Destination address: WAN address
Destination ports: 443 (HTTPS)
NAT IP: 192.168.10.40 (traefik external endpoint)
NAT ports: 443 (HTTPS)

  2. Firewall Rules

Interface: Homelab
Action: Reject
Protocol: IPv4 TCP/UDP
Source: *
Port: *
Destination: *
Port 853 (DNS over TLS)
Gateway: *

Interface: Homelab
Action: Pass
Protocol: IPv4 TCP/UDP
Source: *
Port: *
Destination: 127.0.0.1
Port: 53 (DNS)
Gateway: *
This is associated with the NAT rule above.
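
To make the effect of the DNS redirect concrete, here is a small Python model of the port 53 NAT rule above. It is only an illustration of the matching logic as I listed it, not how pfSense evaluates rules internally. The function name is made up, and 192.168.10.1 as the HOMELAB address is an assumption taken from the DNS server address the cluster nodes use.

```python
from ipaddress import ip_address

# Assumption: the pfSense interface address on the Homelab subnet.
HOMELAB_ADDRESS = ip_address("192.168.10.1")
REDIRECT_TARGET = ("127.0.0.1", 53)


def effective_destination(dst_ip: str, dst_port: int) -> tuple:
    """Return where a TCP/UDP packet arriving on the Homelab interface ends up."""
    # The rule matches port 53 traffic whose destination is NOT the
    # HOMELAB address and rewrites it to the resolver on the firewall.
    if dst_port == 53 and ip_address(dst_ip) != HOMELAB_ADDRESS:
        return REDIRECT_TARGET
    return (dst_ip, dst_port)


# A query aimed at an outside DNS server is answered by pfSense instead.
outside_dns = effective_destination("1.1.1.1", 53)
# A query aimed at pfSense itself is left alone.
local_dns = effective_destination("192.168.10.1", 53)
# HTTPS traffic does not match this rule at all.
https_call = effective_destination("1.1.1.1", 443)
```

So under this rule, anything on the homelab subnet that thinks it is asking an outside server on port 53 is really getting its answer from the pfSense resolver.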

  3. DNS Resolver

Enable: True
Listen Port: 53
Network Interfaces: All
Outgoing Network Interfaces: All

  4. pfBlockerNG

Enable: True
DNSBL: Enabled

Kubernetes cluster

cshost0 - master node (also a worker) - 192.168.10.2
cshost1 - worker node - 192.168.10.3
DNS server used by both - 192.168.10.1
MetalLB - bare metal load-balancer

csweb namespace:

traefik deployment:
images: traefik:latest, thomseddon/traefik-forward-auth:2
traefik service:
external endpoints - 192.168.10.40:80, 192.168.10.40:443

cert-manager namespace:

cert-manager deployed from jetstack/cert-manager helm chart

# issuer.yaml
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: myemail
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
    - dns01:
        cloudflare:
          email: myemail
          apiTokenSecretRef:
            name: cloudflare-api-token-secret
            key: api-token

# mycertificate.yaml
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-certificate
  namespace: namespacethatneedsit
spec:
  commonName: '*.example.com'
  secretName: example-certificate
  dnsNames:
    - example.com
    - '*.example.com'
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer

When applying mycertificate.yaml, this is what shows when describing the certificate in the Kubernetes cluster.

status:
  conditions:
    - lastTransitionTime: '2020-12-05T13:45:10Z'
      message: Issuing certificate as Secret does not exist
      reason: DoesNotExist
      status: 'False'
      type: Ready
    - lastTransitionTime: '2020-12-05T13:45:10Z'
      message: Issuing certificate as Secret does not exist
      reason: DoesNotExist
      status: 'True'
      type: Issuing

At the same time, I checked CloudFlare, and the TXT record _acme-challenge was successfully created with a TTL of 2 minutes.

Running this command "host -t txt _acme-challenge.example.com" outputs:

Host _acme-challenge.example.com not found: 3(NXDOMAIN)

But if I create another TXT record just like it, I can immediately resolve it on any system regardless of DNS server.

I'm rediagnosing this as I write this up, but what's interesting to me is if I make another TXT record in the CloudFlare portal, it does show up instantly using the above command. So cert-manager is able to create the TXT record, but neither cert-manager nor I can resolve the record? But by turning off my firewall rules above and setting my DNS servers to 1.1.1.1, this TXT record can be validated by cert-manager and certs are issued.

This just gets weirder and weirder. Has all this information helped?


Ok so it does appear to come down to the fact that my DNS server isn't propagating the TXT record, except when it's manually created in CloudFlare's DNS portal. I had said I couldn't resolve it myself but I forgot the system I was on was using my DNS server.

So I guess I have to figure out why when cert-manager sets the TXT record, my DNS server can't see it, or isn't propagating the record.
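
For anyone wanting to repeat the comparison without guessing which resolver a machine is using, here is a minimal Python sketch (standard library only) that asks one specific resolver for a TXT record and reads back just the response code. The function names are my own, and it skips retries, TCP fallback, reply ID checks and answer parsing.

```python
import secrets
import socket
import struct


def build_txt_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet asking for the TXT records of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 16 = TXT, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 16, 1)


def response_code(packet: bytes) -> int:
    """Return the RCODE of a DNS response (0 = NOERROR, 3 = NXDOMAIN)."""
    if len(packet) < 12:
        raise ValueError("DNS response shorter than the 12-byte header")
    flags = struct.unpack("!H", packet[2:4])[0]
    return flags & 0x000F


def ask(resolver_ip: str, name: str, timeout: float = 3.0) -> int:
    """Send one UDP TXT query to `resolver_ip` and return the response code."""
    query = build_txt_query(name, secrets.randbelow(65536))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (resolver_ip, 53))
        reply, _ = sock.recvfrom(4096)
    return response_code(reply)
```

Calling ask("192.168.10.1", "_acme-challenge.example.com") and then the same with "1.1.1.1" returns 3 for NXDOMAIN and 0 (NOERROR) when the name exists. Keep in mind that with the port 53 redirect active, both queries from the homelab subnet would be answered by pfSense, so the comparison only means something from a machine outside that subnet or with the redirect disabled.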


Hi All @sionicion @rg305 @Osiris @griffin!
I have several clients using pfSense in similar configurations with a couple exceptions.
I personally appreciate most of the features offered by this Netgate product and have invested MONTHS of time learning how to manage it properly. Much of what I believe to be the problem here is technically out of scope for this forum, however I'll put in my 2 cents worth starting here:

  1. Local DNS with pfSense 2.4
  2. Let's Encrypt on pfSense
  3. pfBlockerNG on pfSense

By default the firewall BLOCKS EVERYTHING, so pfBlockerNG is not necessarily your friend here and may actually interfere with Let's Encrypt servers that happen to be on an IP it considers suspicious.

@sionicion may be an expert (I don't know), but I think the links I have provided can resolve most of the issues presented here.

To avoid a BILLION EDITS, I suggest taking the time to fully understand the content that Jim Pingle presents on behalf of Netgate before hacking out a band-aid fix here.

The reason I suggest the videos is that the Netgate forum is not anywhere near as useful as this one.

In the meantime I'll digest the posted information and do my best to share what I know without violating this forum's policies to the greatest degree possible. (If my input is not desired, just say so and I'll step back and watch the show.)


Where does that happen?


If the default for the firewall is to drop things (as it should be), then you don't need to be specific about this - furthermore "reject" and "drop" are not the same thing.
You might actually be giving "information" to attackers by using "reject" instead of (the default) "drop".


I think you are changing more than one thing at a time and seeing them as related, because together "they" break or "allow" the client to work.
But I think if you did them one at a time, you would find that only one is breaking/fixing things.
Which I can only assume is the forced use of the local DNS.
[which is happening inbound and outbound]
So that when the local ACME client tries to reach CloudFlare DNS, it doesn't - it reaches the local pfSense DNS and that knows not what to do with the request to add a TXT record.

Although this is still technically "a guess" (I don't have all the items involved to lab this), it is at least a more educated one.
And I do thank you for providing enough detail for us to formulate such an opinion.
