Remote IPs for specific renewal

I have been trying to renew one of our domains. I get various error messages, but I suspect an issue with our site blocking one or more LE IP addresses. Is there a way to determine which IPs were used for HTTP queries for a specific renewal request? I was hoping these might be included in the debug logs, but I did not find them at first glance. (Perhaps I overlooked them.)

I know that in general LE does not publish a canonical list of IPs; I only want to know the IPs used for one transaction, so that I can ask my security group to look them up.

I tested our domain with letsdebug.net, and it responded “all OK”. And I can query our web server from various IP addresses, so I do not suspect a general firewall issue.

My domain is: rancher.berkeley.kbase.us

I ran this command:

docker run -it --rm --name certbot -v /certs/certs:/certs/certs -v "/certs/letsencrypt:/etc/letsencrypt" -v "/certs/var:/var/log/letsencrypt" certbot/certbot:latest renew -w /certs/certs --webroot --debug

It produced this output:

[attempt A; from letsencrypt.log:]

2020-02-25 23:45:14,994:DEBUG:certbot._internal.reporter:Reporting to user: The following errors were reported by the server:

Domain: rancher.berkeley.kbase.us
Type: connection
Detail: During secondary validation: Fetching http://rancher.berkeley.kbase.us/.well-known/acme-challenge/wqc0T2jTqQX9ZjErsGiMMzGa6uTRJcefmCJZ1vX2cYk:
Timeout during connect (likely firewall problem)

To fix these errors, please make sure that your domain name was entered correctly and the DNS A/AAAA record(s) for that domain contain(s) the right IP address. Additionally, please check that your computer has a publicly routable IP address and that no firewalls are preventing the server from communicating with the client. If you’re using the webroot plugin, you should also verify that you are serving files from the webroot path you provided.

[attempt B:]

IMPORTANT NOTES:

  • The following errors were reported by the server:

    Domain: rancher.berkeley.kbase.us
    Type: dns
    Detail: During secondary validation: DNS problem: SERVFAIL looking
    up A for rancher.berkeley.kbase.us - the domain’s nameservers may
    be malfunctioning

[attempt C:]

IMPORTANT NOTES:

My web server is (include version):

nginx version: nginx/1.13.8

The operating system my web server runs on is (include version):

“Debian GNU/Linux 9 (stretch)” (from docker container, using standard nginx image)

My hosting provider, if applicable, is:

NA

I can login to a root shell on my machine (yes or no, or I don’t know):

yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel):

no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot):

certbot 1.2.0

Is this (DNS) problem still occurring?

Yes, it is. I had my site whitelist outbound2.letsencrypt.org, but still get this error, which I replicated twice a few minutes ago.

I saw one connection on our webserver from outbound1.letsencrypt.org, but no other log entries.

I'm more concerned about the DNS failures.
If LE can't resolve your domain name to an IP, you will never see the HTTP authentication request.

I haven’t seen the DNS errors since this afternoon. But I haven’t tried very frequently, since I hit the failure rate limits pretty quickly. (FWIW, I have seen no evidence that our DNS is malfunctioning; I have checked with various tools and they are all fine.)

I also see a functional DNS.
This is what worries me - NOT seeing/knowing where/what the problem is...

LE has thought of this and provided a staging/testing environment.
Not 100% certain how to incorporate it into Docker; might be as simple as adding "--dry-run".

I apologize, I thought I mentioned this, but I see I did not. I used --dry-run many times this afternoon (when I first encountered the errors) with no problems at any time. This is one piece of evidence that led me to suspect my site may be blocking a subset, but not all, of LE’s validation hosts.

I can see how that would make sense for the HTTP failures.
But I fail to see how that can explain the DNS errors.

Since I haven’t been able to replicate the DNS errors, I have been focusing on the HTTP failures.

Ok, so focusing on the times it did get DNS, yes, I agree that:

Do you have access to the logs?
[firewall/IPS/etc.]

On a completely different (parallel) train of thought…
Do you know if your DNS provider is supported by any ACME client API?

Unfortunately I do not have direct access to the firewall logs, because the block is done at the border router. I have asked my security group to look for logs going to our host but they have not yet replied. I was hoping to make their job easier with the list of source IPs for a given transaction date/time.
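To narrow a log search to one transaction, it may help to turn the failure timestamp from letsencrypt.log into a time window. The following is only a sketch: it assumes the certbot container logs in UTC, that the other log uses nginx's default time format, and the helper names and the five-minute window are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_certbot_time(stamp):
    """Parse a letsencrypt.log timestamp such as '2020-02-25 23:45:14,994'.

    Assumption: the container clock is UTC, so UTC is attached explicitly.
    """
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


def parse_nginx_time(stamp):
    """Parse an nginx time_local value such as '25/Feb/2020:23:45:04 +0000'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def within_window(nginx_stamp, center, minutes=5):
    """Return True if the log entry is within +/- `minutes` of `center`."""
    return abs(parse_nginx_time(nginx_stamp) - center) <= timedelta(minutes=minutes)


# Timestamp of attempt A, taken from the log excerpt in the first post.
failure = parse_certbot_time("2020-02-25 23:45:14,994")

# The two entries below are invented to show the comparison.
print(within_window("25/Feb/2020:23:45:04 +0000", failure))
print(within_window("25/Feb/2020:22:10:00 +0000", failure))
```

The resulting start and end times can be handed to whoever owns the border router logs, together with the destination IP of the web server.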

You could specify that the pertinent requests should all contain the “/.well-known/acme-challenge/” path.
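On the web server side, the same path filter gives the source IPs that did get through. A minimal sketch, assuming nginx's default "combined" access log format; the function name and the sample lines are hypothetical (only 66.133.109.36 is an address actually mentioned in this thread).

```python
import re
from collections import Counter

ACME_PREFIX = "/.well-known/acme-challenge/"

# Assumes the default nginx "combined" format:
# remote_addr - remote_user [time_local] "METHOD path protocol" status ...
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)')


def acme_source_ips(lines):
    """Count HTTP-01 challenge requests per client IP."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("path").startswith(ACME_PREFIX):
            counts[match.group("ip")] += 1
    return counts


# Invented sample lines; in practice, pass an open access log file instead.
SAMPLE_LINES = [
    '66.133.109.36 - - [25/Feb/2020:23:45:04 +0000] "GET /.well-known/acme-challenge/EXAMPLE-TOKEN HTTP/1.1" 200 87 "-" "-"',
    '66.133.109.36 - - [25/Feb/2020:23:45:06 +0000] "GET /.well-known/acme-challenge/EXAMPLE-TOKEN HTTP/1.1" 200 87 "-" "-"',
    '203.0.113.7 - - [25/Feb/2020:23:45:09 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "-"',
]

for ip, hits in acme_source_ips(SAMPLE_LINES).most_common():
    print(ip, hits)
```

This only shows the validators that reached nginx. Requests dropped at the border router never appear here, so the missing addresses still have to come from the firewall logs.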

That said, and yet on another (parallel) train of thought…
Do you have access to any other Web Server on the Internet that is NOT behind that same firewall?

Try checking that none of these are blocked or throttled (assuming you only have IPv4 records for your domains):

18.196.17.13
18.216.110.187
18.219.177.57
3.120.126.223
54.202.29.69
66.133.109.36
34.222.229.130
52.15.254.228
64.78.149.164
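One way to use this list is to compare it against the addresses that show up in the web server log. The sketch below assumes you already have the set of IPs seen for challenge requests; the function name is made up, and `seen_in_logs` holds only the one address reported as seen in this thread.

```python
# Candidate validation addresses, copied from the list above.
CANDIDATES = [
    "18.196.17.13",
    "18.216.110.187",
    "18.219.177.57",
    "3.120.126.223",
    "54.202.29.69",
    "66.133.109.36",
    "34.222.229.130",
    "52.15.254.228",
    "64.78.149.164",
]


def split_by_seen(candidates, seen):
    """Split candidates into those present in the log and those absent."""
    seen = set(seen)
    reached = [ip for ip in candidates if ip in seen]
    absent = [ip for ip in candidates if ip not in seen]
    return reached, absent


# Replace with the addresses found in your own access log.
seen_in_logs = {"66.133.109.36"}

reached, absent = split_by_seen(CANDIDATES, seen_in_logs)
print("reached the web server:", reached)
print("never seen (ask the security group about these):", absent)
```

An address being absent from the log does not prove it is blocked, since not every validator is necessarily used for every attempt. It just gives the security group a short list to look up.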

I suspect, though, that building a comprehensive list of source IPs is going to be difficult because of how Let's Encrypt does multi-VA (validation from multiple vantage points).

If you haven't already read "ACME v1/v2: Validating challenges from multiple network vantage points", it might help you.

Yes, for example, https://crt.sh/?q=narrative.kbase.us , which was successfully renewed earlier this month. We have other hosts at both sites which have renewed successfully this month; for example, https://crt.sh/?q=ci.kbase.us is at the same site and was renewed successfully.

I will check on these IP addresses, thanks. I have read the docs on validating from multiple vantage points, which is another reason I suspected we are blocking some IPs but not others (I see 66.133.109.36 in our webserver logs, for example).

I presume in reply to:

Two thoughts:

  • Can you move the site to that other (working) location?
  • Can you move it temporarily (just long enough to get a cert)?
    [if even only just a DNS change and then change it back - should prepare by lowering TTL first]

Note: not a fix - just kicks the can down the road a few months...

This site is not movable, and it would be very disruptive to move it temporarily. If it came to that, I would probably try to renew by hand with DNS validation. That would be unpleasant but I think it should work. The cert still has almost two weeks till it expires so I am hoping to avoid the can kicking.

Then you need to get your "border" patrol to understand that HTTP should be allowed (at least to your IP).

I think this lesson needs to be better publicized: HTTP is required... GLOBALLY.
HTTP can, and should, be "managed/controlled/secured" (pick any word that lets you sleep at night) but needs to be allowed.
If only to redirect all HTTP to HTTPS.
[which can be done by a single dedicated device - for all IPs behind/within that network]

Do NOT wait until the last minute.