Remote IPs for specific renewal

I can see how that would make sense for the HTTP failures.
But I fail to see how that can explain the DNS errors.

Since I haven’t been able to replicate the DNS errors, I have been focusing on the HTTP failures.

Ok, so focusing on the times it did get DNS, yes, I agree that:

Do you have access to the logs?
[firewall/IPS/etc.]

On a completely different (parallel) train of thought…
Do you know if your DNS provider is supported by any ACME client API?

Unfortunately, I do not have direct access to the firewall logs, because the block is done at the border router. I have asked my security group to look for log entries for traffic going to our host, but they have not yet replied. I was hoping to make their job easier with a list of source IPs for a given transaction date/time.

You could specify that the pertinent requests should all contain the “/.well-known/acme-challenge/” path.
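
As a rough sketch of that filter, assuming the web server writes the common/combined access log format (client IP first, request line in quotes), something like the following could pull the source IPs of challenge requests out of a log. The function name and the sample lines are made up for illustration; the addresses in the sample are documentation-range IPs, not real validation servers.

```python
import re
from collections import Counter

# Assumption: common/combined log format, i.e.
#   IP ident user [timestamp] "METHOD PATH PROTOCOL" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)

ACME_PREFIX = "/.well-known/acme-challenge/"


def acme_challenge_sources(lines):
    """Count requests per source IP whose path starts with the ACME challenge prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(ACME_PREFIX):
            counts[match.group("ip")] += 1
    return counts


# Hypothetical sample lines, only to show the shape of the input.
sample = [
    '192.0.2.10 - - [01/Jan/2020:10:00:01 +0000] "GET /.well-known/acme-challenge/abc123 HTTP/1.1" 200 87 "-" "agent"',
    '198.51.100.7 - - [01/Jan/2020:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "browser"',
    '192.0.2.10 - - [01/Jan/2020:10:00:03 +0000] "GET /.well-known/acme-challenge/def456 HTTP/1.1" 200 87 "-" "agent"',
]

for ip, hits in acme_challenge_sources(sample).items():
    print(ip, hits)
```

The same path prefix could be handed to the security group as a search term for the border logs, since it does not depend on knowing the source IPs in advance.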

That said, and on yet another (parallel) train of thought…
Do you have access to any other Web Server on the Internet that is NOT behind that same firewall?

Try checking that none of these are blocked or throttled (assuming you only have IPv4 records for your domains):

18.196.17.13
18.216.110.187
18.219.177.57
3.120.126.223
54.202.29.69
66.133.109.36
34.222.229.130
52.15.254.228
64.78.149.164

I suspect though that building a comprehensive list of source IPs is going to be a bit difficult due to how Let’s Encrypt are doing multi-VA.

If you haven’t already read “ACME v1/v2: Validating challenges from multiple network vantage points”, it might help you.

Yes, for example, https://crt.sh/?q=narrative.kbase.us , which was successfully renewed earlier this month. We have other hosts at both sites which have renewed successfully this month; for example, https://crt.sh/?q=ci.kbase.us is at the same site and was renewed successfully.

I will check on these IP addresses, thanks. I have read the docs on validating from multiple vantage points, which is another reason I suspected we are blocking some IPs but not others (I see 66.133.109.36 in our webserver logs, for example).
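
A quick way to do that comparison, sketched under the assumption that the IPs seen in the web server log have already been collected into a set: the expected addresses are the ones listed earlier in this thread (they can change at any time), and `seen_in_log` here is only an illustrative value. An address missing from the log is a hint, not proof, of an upstream block, since not every vantage point necessarily contacts the server on every attempt.

```python
# Addresses quoted earlier in this thread; not an official or stable list.
EXPECTED = {
    "18.196.17.13",
    "18.216.110.187",
    "18.219.177.57",
    "3.120.126.223",
    "54.202.29.69",
    "66.133.109.36",
    "34.222.229.130",
    "52.15.254.228",
    "64.78.149.164",
}


def split_seen_and_missing(expected, seen):
    """Return (expected IPs present in the log, expected IPs absent from it)."""
    seen = set(seen)
    return sorted(expected & seen), sorted(expected - seen)


# Illustrative only: source IPs extracted from the web server log.
seen_in_log = {"66.133.109.36"}

present, missing = split_seen_and_missing(EXPECTED, seen_in_log)
print("reached the server:", present)
print("never seen (possibly blocked upstream):", missing)
```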

I presume in reply to:

Two thoughts:

  • Can you move the site to that other (working) location?
  • Can you move it temporarily (just long enough to get a cert)?
    [if even only just a DNS change and then change it back - should prepare by lowering TTL first]

Note: not a fix - just kicks the can down the road a few months…

This site is not movable, and it would be very disruptive to move it temporarily. If it came to that, I would probably try to renew by hand with DNS validation. That would be unpleasant but I think it should work. The cert still has almost two weeks till it expires so I am hoping to avoid the can kicking.

Then you need to get your “border” patrol to understand that HTTP should be allowed (at least to your IP).

I think this lesson needs to be better publicized: HTTP is required… GLOBALLY.
HTTP can, and should, be “managed/controlled/secured” (pick any word that lets you sleep at night) but needs to be allowed.
If only to redirect all HTTP to HTTPS.
[which can be done by a single dedicated device - for all IPs behind/within that network]

Do NOT wait until the last minute.

The DNS resolution issues were a bit ironic – Let’s Encrypt’s secondary validation servers are currently run in AWS. (Not in GovCloud, of course.) Even if your network was inaccessible, you’d think they’d be able to successfully resolve your DNS records using your DNS server in AWS.

The GovCloud IP is in Great Britain [not global/anycast]?
Not sure if that plays any part in the DNS disruption.

Fortunately, my organization’s security staff are still awake, and found that some of the above IP addresses had been blocked for probing port 80. I do not know all the details yet, but they temporarily whitelisted the IP addresses in question, and I was able to successfully renew the cert.

I am trying to get more information on exactly what triggered the block at the border, which I hope to have tomorrow.

…and I will probably attempt at least a dry run of a DNS validation on one of our hosts so that I will know how to do it if this crops up again. (Perhaps that will expose more details of the DNS failures that were reported this afternoon.)
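
For reference while preparing that dry run, here is a small sketch of what a DNS-01 validation expects to find, based on RFC 8555 (section 8.4): a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 digest of the key authorization (`token.thumbprint`). An ACME client normally computes this for you; the token and thumbprint below are placeholder strings, not real values.

```python
import base64
import hashlib


def dns01_txt_value(token, account_thumbprint):
    """Unpadded base64url of SHA-256 over the key authorization 'token.thumbprint'."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def dns01_record_name(domain):
    """Name of the TXT record the validation servers will query."""
    return "_acme-challenge." + domain.rstrip(".")


# Placeholder inputs; real ones come from the ACME server and the account key.
token = "placeholder-token"
thumbprint = "placeholder-thumbprint"

print(dns01_record_name("example.org"), "TXT", dns01_txt_value(token, thumbprint))
```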

Security never sleeps!

Sweet; congrats!

I’m also interested, please copy me with whatever they find - thanks.

From what I’ve been told, our organization’s IDS saw requests on port 80 to a large number of internal hosts at our site, coming from IPs whose PTR records did not point to letsencrypt.org names. It concluded that the IPs were an attack and triggered an automated block. The security group has said they will think about how to make the IDS less prone to this sort of blocking in the future.
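
To illustrate the kind of PTR check involved, here is a minimal sketch using Python's `socket.gethostbyaddr` for the reverse lookup. I do not know how the IDS actually does it; the helper names, the suffix test, and the fake lookup table (documentation-range IPs, invented hostnames) are all assumptions so the example runs without network access.

```python
import socket


def ptr_name(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if the reverse lookup fails."""
    try:
        return resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def ptr_matches(ip, suffix="letsencrypt.org", resolver=socket.gethostbyaddr):
    """True if the PTR name is the suffix itself or a subdomain of it."""
    name = ptr_name(ip, resolver)
    if name is None:
        return False
    name = name.rstrip(".").lower()
    return name == suffix or name.endswith("." + suffix)


def fake_resolver(ip):
    """Stand-in for a real reverse lookup; the table is entirely hypothetical."""
    table = {
        "192.0.2.1": "outbound.letsencrypt.org",
        "192.0.2.2": "host-192-0-2-2.cloud.example.net",
    }
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return (table[ip], [], [ip])


for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
    print(ip, ptr_matches(ip, resolver=fake_resolver))
```

A check like this would flag validation requests arriving from cloud-hosted vantage points whose reverse DNS carries the hosting provider's name, which seems consistent with what the IDS reacted to.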

I think it would be useful if the debug logs included which IPs failed to reach the server for validation, but it sounds like LE may be reluctant to do that.

We have another certificate which was expiring soon, and I was able to successfully renew it earlier today. So it seems like the issue is resolved, at least short term. At least I know what to look for if it does recur, and my security group is more aware of the details.

