Suggestion to Add a FromIP Header/Field to Boulder Responses

@lestaff

In a recent topic, it became evident that readily knowing the IP address from which Boulder attempted to verify a challenge would be vastly beneficial for anyone digging through firewall logs for evidence of blocked attempts from Boulder. This is not a suggestion to publish a list or range of IP addresses ahead of time. The verifying IP address would simply appear in the headers or body of the response from Boulder, and thus in the certbot output (via -v). I believe this would entail a small implementation effort and have no foreseeable compatibility impact on existing ACME CA or client functionality. I'm more than happy to make the PR suggestion myself, but I'm not yet familiar enough with the Boulder architecture to know what to target.
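To make the suggestion concrete, here is a minimal sketch (in Go, since Boulder is written in Go) of what an ACME problem document extended with such a field might look like. The `validationAddress` field name and its placement are purely illustrative; nothing like it exists in RFC 8555 or in Boulder today.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProblemDetails mirrors the general shape of an ACME error document
// (RFC 7807 / RFC 8555). ValidationAddress is the hypothetical addition
// being proposed here; the name and placement are illustrative only.
type ProblemDetails struct {
	Type              string `json:"type"`
	Detail            string `json:"detail"`
	Status            int    `json:"status"`
	ValidationAddress string `json:"validationAddress,omitempty"` // hypothetical field
}

func main() {
	p := ProblemDetails{
		Type:              "urn:ietf:params:acme:error:connection",
		Detail:            "Connection refused",
		Status:            400,
		ValidationAddress: "192.0.2.10", // example value: the address the validator connected from
	}
	out, _ := json.MarshalIndent(p, "", "  ")
	fmt.Println(string(out))
}
```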

2 Likes

This could be a complex nightmare to implement and deploy. How would Boulder know what IP to put in this field? When running behind a firewall, Boulder would have a LAN IP, not a WAN IP. And the concern isn't really Boulder's own IP, but the IP the failed challenge appeared to come from (which could be proxied traffic). There are also concerns that people may collect and share this data to reverse engineer the LetsEncrypt network, which brings up all the concerns LE staff have previously raised about publishing IP addresses.

That being said, it might make sense as a compromise to:

  • Update the error messages with an identifier of which global node(s) failed verification. This could let people figure out aggregate issues without exposing the node itself (a sketch of one way to derive such an identifier follows after this list).

  • Indicate which region(s) or highest-assigned ranges the failed nodes are in. This wouldn't need to be a range/block small enough to undermine LE's concerns, just something that could help users associate the issue with a firewall rule.
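As a hedged sketch of the first idea: derive an opaque, stable label for each validation node (for example, an HMAC of the node's address under an operator-held secret), so error messages can say which node failed without revealing it. Everything below, including the label format, is invented for illustration and reflects nothing about how Boulder actually works.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// nodeLabel derives an opaque, stable label for a validation node without
// revealing its address. The secret and the truncation length are arbitrary
// choices for this sketch.
func nodeLabel(secret []byte, nodeIP string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(nodeIP))
	return "va-" + hex.EncodeToString(mac.Sum(nil))[:8]
}

func main() {
	secret := []byte("operator-held secret")
	// The same node always maps to the same label, so users can correlate
	// repeated failures without learning the underlying IP.
	fmt.Println(nodeLabel(secret, "192.0.2.10"))
}
```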

1 Like

The connecting IP address is always known anyhow. A bad actor could collect it now with little effort, since it will appear in their webserver logs regardless. So what's the harm in it also appearing in the headers/body?

I'm not asking to publish any type of list here. I only want the servicing IP address included with the response.

Is it really difficult to know your own outbound IP address? I really want to know.

1 Like

I haven't yet thought through whether there's actually a threat here, but one needs to be mindful of the cases where the person sending the request to Boulder doesn't actually control the web server. A bad actor would be able to collect some number of Boulder validation IPs (up until they hit rate limits) regardless of whether they're actually trying to get a real certificate for a real web server. Again, not sure how much of a problem it'd be in practice.

Yeah, you pretty much need to hit some known-good outside service and ask what they see your IP as. (Or, you know, move to IPv6 where you don't need to use NAT.)
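For what it's worth, the "ask an outside service" approach looks roughly like this sketch. The echo service URL (api.ipify.org here) is just one well-known example; any endpoint that returns the caller's address as plain text would behave the same way.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// publicIP asks an external echo service what source address it sees.
// The specific service is an assumption of this sketch, not a recommendation.
func publicIP() (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("https://api.ipify.org")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	ip, err := publicIP()
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("public address as seen from outside:", ip)
}
```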

1 Like

How does the IP address being in the content make that any easier than just looking at the connecting IP address? Is Boulder benefiting from security through obscurity somehow?

1 Like

If you control the network where the web server is, you'll see the IPs from Boulder's validation servers in your network logs, yes.

If you don't control the network where the web server is, this feature would tell it to you.

1 Like

I think I see your concern, maybe. Are you concerned that someone would be attempting to acquire a certificate for someone else and seeing the origin of the requests? Wouldn't they see that anyhow in their webserver logs?

1 Like

Or try to send something to yourself, perhaps? Not sure if that's reasonable.

1 Like

I guess what I'm saying is:

The IP address from which a response comes is already tied to the response. Is there harm in also including it in the content?

1 Like

There's also a social element. While today people who get LE certs could collaborate and pool their logs to compile a comprehensive list of LE validator IPs, they generally don't. This might be in part because LE has expressed their desire that such a list not circulate, and if LE starts publishing IPs directly, people might feel differently about it?

If I request a cert for griffin.example (which is a web server you host and I have no control over), then while I have no expectation of being able to satisfy the challenge, your proposal will tell me a few IPs that are used to originate validation checks. While they'll show up in the griffin.example network/webserver logs as coming from those IPs too, this shows them to me even though I don't own that name. And I could probably easily create some accounts, try from several IPs (from cloud services I've rented, say), attempt validation of a bunch of names I don't own, and end up with a pretty good-sized list of most or all of the IPs used for checks.

Suffice it to say that NAT can get complicated, and it's not that simple. And if Boulder is designed (in theory) for any CA to be able to use, not just Let's Encrypt, it might be good for it to work in whatever network infrastructure it ends up deployed in. Let's Encrypt may want to add more cloud or other hosting providers on short notice, too.

1 Like

The LetsEncrypt staff have written volumes on this topic. Part of it is security; other parts are about usability, best practices, and overall policy. IIRC, one stated reason was that they did not want to support people tailoring firewalls to block large regions while carving out exceptions for LetsEncrypt.

There have been dozens of discussions on this, and they've made their position clear. Using a disguised identifier for errors could be a middle ground.

That approach creates an issue if that external service is down, and it complicates Continuous Integration and testing. You could also configure the server to present a specific IP, which means seeding that IP onto the server instance (it could be an env var or similar). There are a lot of ways to implement this feature, each with its own drawbacks and advantages; none of them is very simple once you account for all the work to implement it properly and write a range of unit tests covering the potential failures.
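A minimal sketch of the "seed the IP onto the instance" option: read the address from a configuration source such as an environment variable (the variable name VA_PUBLIC_IP is invented here; Boulder has no such setting), and decide separately what to do when it is unset.

```go
package main

import (
	"fmt"
	"os"
)

// advertisedIP returns the address the operator has configured this instance
// to report. VA_PUBLIC_IP is a hypothetical variable name for this sketch.
func advertisedIP() (string, bool) {
	return os.LookupEnv("VA_PUBLIC_IP")
}

func main() {
	if ip, ok := advertisedIP(); ok {
		fmt.Println("reporting configured address:", ip)
		return
	}
	// Fallback options: query an external echo service (as sketched earlier)
	// or simply omit the field - each choice carries the drawbacks discussed above.
	fmt.Println("no configured address; field would be omitted")
}
```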

2 Likes

Couldn't you just do that against your own servers anyhow? Why do mine need to be involved? I see no difference between you requesting validations for your own server and collecting the validating IPs from your own logs. Are there specific servers allocated to validating specific domain names? I would think it would be quasi-random.

1 Like

I never knew it was this complicated to discern the address on your own outbound envelope.

1 Like

It'd probably let me get around the rate limits faster. I don't even need to have my own hostname or server, I can just send a bunch of requests for other names.

And yeah, given some time I can probably derive the IP list myself "legitimately" even without this proposed feature, just by checking a bunch of names I do control and tracking my network logs. But that doesn't mean making it easier is worth it.

I do understand that anyone digging into their firewall logs is already having a frustrating time, but I would think that looking for all blocked connections from around the right time would usually be good enough. In order to get a certificate from Let's Encrypt, one needs to have a firewall somewhere with a port open to the entire world (either port 80 for HTTP, port 53 for DNS, or port 443 for TLS-ALPN). If you can't open one of those ports up to everybody, then you can't get a Let's Encrypt certificate reliably.
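As a rough illustration of "look for blocked connections around the right time", here is a sketch that filters a firewall log for dropped connections to the validation ports within a window around a failed challenge. The log format, file name, and timestamps are all invented for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// Scans a firewall log for dropped connections to the ports Let's Encrypt
// validation can use (80, 443, 53) within a window around a failed challenge.
// The assumed (hypothetical) log line format:
//   2024-05-01T12:34:56Z DROP src=203.0.113.7 dst_port=80
func main() {
	challengeTime, _ := time.Parse(time.RFC3339, "2024-05-01T12:35:00Z") // example timestamp
	window := 5 * time.Minute
	validationPorts := map[string]bool{"80": true, "443": true, "53": true}

	f, err := os.Open("firewall.log") // hypothetical log path
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 4 || fields[1] != "DROP" {
			continue
		}
		ts, err := time.Parse(time.RFC3339, fields[0])
		if err != nil {
			continue
		}
		port := strings.TrimPrefix(fields[3], "dst_port=")
		if validationPorts[port] && ts.After(challengeTime.Add(-window)) && ts.Before(challengeTime.Add(window)) {
			fmt.Println("candidate blocked validation attempt:", scanner.Text())
		}
	}
}
```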

IPv4 is an old prototype and has had a lot of hacks added onto it over the years in order to make it last longer. This is one of the consequences. The source IP you put on your packets is likely changed at least once by a NAT system before it gets to its destination.

1 Like

I see what you mean now. Separation of functions. The machine running the client need not be the machine being validated.

1 Like

Well, it's easy to discern your own outbound address, but it's not easy to discern your entire network's or application's "effective" address as seen from the perspective of a consumer on the public internet. There could be a half dozen hops between a Boulder server and the address the LAN presents to the public internet.

For internal troubleshooting, the LAN IPs are relevant; for external usage, the WAN IPs are relevant. Things get even more complex when you have gateways that operate with multiple IPs!
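To make the LAN-versus-WAN gap concrete, this small sketch lists the addresses the host itself knows about; on a typical NATed or multi-homed network, none of them need match the source address a remote server or firewall actually observes.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Enumerate the addresses this host knows about locally. Behind NAT or a
	// multi-IP gateway, these generally differ from the "effective" public
	// address seen by the other end of a connection.
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		fmt.Println("could not list interfaces:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println("local address:", a.String())
	}
}
```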

A few years ago, I kept getting blocked by some APIs and SSH servers, they all seemed to ignore my allowlist configurations. I eventually learned my office had a bridged network, and traffic could be routed through any one of four different IPs. We had to track down the list of potential IPs and the infrastructure managers had to put in place systems that would alert everyone when the list of active IPs changed. Over the next two years, the IPs changed several times, the proactive notification systems rarely worked, and people only learned of this when things started to break.

3 Likes

One would think that being aware of the public IP address(es) of your own packets would be fundamental in internetwork operation. I suppose, as was already suggested, bouncing the packets off some public entity is possibly the most evident way. Seems to me like solving this problem has a great many benefits (and possibly many more not yet mentioned here).

1 Like

It seems to really be a problem of "binding" a response to an IP address from end-to-end (ish), which may not be reasonable.

1 Like

Yup. That's (part of) why moving to IPv6 is so important. You usually know your own actual IP address(es) in most (and any "correct" if I can be opinionated about it) implementations of it. (And why I'm disheartened when I see someone having trouble here where it's suggested to "remove the AAAA record" instead of "point the AAAA record to the correct server".)

2 Likes

Amen to that. I must admit that I'm not yet well-versed on IPv6 and its implications/ramifications.

1 Like