Firstly I should say that we've been using LetsEncrypt for over 3 years now without any issues, and we're grateful for the great service provided up to this point.
The expiration of DST Root CA X3 and the switchover to ISRG Root X1 and the new R3 intermediary have caused us a world of pain. Our use case requires the root CA to exist in the OS CA store (required for IPSec+IKEv2 based VPNs). We've been noticing that the ISRG root simply does not exist in seemingly random versions of Windows 10, including the most recent fully updated builds. We've been able to temporarily solve the issue by presenting the cross-signed intermediary and the DST root to the clients, but come Sept 2021, this will no longer be feasible, and our service will break down for an unknown portion of the Windows user base (our biggest user base). The key here is the "unknown" part, as we've noticed no pattern in the ISRG root being present or missing from the OS certificate store. The browser root stores are all okay, but this doesn't really matter for the use case presented above.
Is there any recourse here, or perhaps some misunderstanding on our part? Is LE aware of this problem? At this point we're experimenting with ZeroSSL, which appears to suit all our needs because it uses a well-established Sectigo root CA. We'd prefer to stick with LE if at all possible, but so far this appears to be impossible come Sept 2021.
EDIT: Summary of the issue after discussion below and more fiddling around.
Some Windows 10 versions don't ship with the ISRG root baked in.
If a browser (IE, Edge or Chrome) encounters a certificate that chains to the ISRG root, it DOES lazy load the ISRG root, and the root DOES then appear in the root store.
That being said, non-browser OS subsystems (like rasdial, maybe others) do NOT trigger this flow, and the chain cannot be verified until the ISRG root has been lazy loaded through a browser.
Ok, my old PC has that certificate. But I checked the servers from server-daten.de (a not-so-old Windows Server 2019), and they have the ISRG root.
Use Find Certificates, then search
008210cfb0d240e3594463e0bb63828b00
PS: Oh, what's that: the DbServer has the ISRG root, the Webserver does not. Curious. Ok, I have the same problem. But why does the DbServer have that root? Maybe it's the result of an installation.
I've been looking in certmgr, in the Trusted Root Certification Authorities tab. ISRG root is missing on random installs of Windows, we've checked all the way down to version 1803. Some have it, some don't.
I'm unable to perform a search in certmgr, and I think the instructions you provided are for Windows Server, which is not the issue here. We're using strongSwan under Linux, and present the chain to the connecting client. The issue is the Windows rasdial client, which uses the OS certificate store. When the ISRG root cert is missing, rasdial throws error 13801, which means the certificate chain cannot be verified. As soon as we present the DST + R3 chain, it works immediately. The same goes for switching to ZeroSSL, which uses the Sectigo root.
Here is a screenshot from v2004, not the absolute latest one, but still from 2020 (don't have access to other devices from here). ISRG is missing.
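For anyone who wants to script the certmgr check instead of eyeballing the Trusted Root list, here is a minimal sketch. It assumes Python is available on the Windows machine and that you pass in the SHA-1 thumbprint of the root you are looking for (copy it from the certificate details; it is deliberately not hardcoded here). The helper names `normalize_thumbprint`, `store_contains` and `windows_root_certs` are made up for this example. It only reads the store and does not go through chain validation.

```python
import hashlib
import sys


def normalize_thumbprint(text):
    """Strip spaces/colons and lowercase a hex thumbprint as shown in certmgr."""
    return "".join(ch for ch in text if ch not in " :").lower()


def store_contains(der_certs, thumbprint):
    """Return True if any DER-encoded certificate has the given SHA-1 thumbprint."""
    wanted = normalize_thumbprint(thumbprint)
    return any(hashlib.sha1(der).hexdigest() == wanted for der in der_certs)


def windows_root_certs():
    """Yield DER certificates from the Windows "ROOT" system store (Windows only)."""
    import ssl  # ssl.enum_certificates is only available on Windows

    for cert_bytes, encoding, _trust in ssl.enum_certificates("ROOT"):
        if encoding == "x509_asn":  # skip PKCS#7 blobs, keep plain X.509 DER
            yield cert_bytes


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) > 1:
    # Hypothetical usage: python check_root.py <SHA-1 thumbprint of the root>
    print(store_contains(windows_root_certs(), sys.argv[1]))
```

Running this before and after visiting a site that chains to the root in question should show whether the root has landed in the on-disk store.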
Windows 10 v1803 (i.e. the April 2018 update) was released before ISRG Root X1 was incorporated into the Microsoft Root Program. But its root store should have been updated many times (they release updates monthly) since then. Are these Windows hosts prevented from updating?
Yes, I'm aware of the inclusion date, however that's not what we're seeing in practice. We have millions of Windows customers who use our product and we've been getting a steady stream of reports ever since December 2nd, when we started seeing the R3 intermediary for newly generated certificates that were deployed to the endpoints.
I'd expected Windows versions 1809 and above to work, even without any updates, but this is not the case. Additionally, my home computer, which is running a fresh install of Win 10 Pro 20H2, fully updated, does not have the ISRG root. As soon as I import it manually, everything works as expected.
We've tested this with over a dozen VMs running different versions (1803, 1809, 1903, 1909, 2004, 20H2) and editions (Pro, Home, LTSC). We do not see any pattern. Sometimes it's there, sometimes it's not. Hence my confusion (and frustration).
@sahsanu Chrome, Edge and Firefox use their own built in certificate stores and ISRG root exists in those, even if the OS store does not. In Internet Explorer, I'm seeing the DST root being used on that check page.
Okay, this is SUPER bizarre. After going to https://valid-isrgrootx1.letsencrypt.org/ in Internet Explorer, the ISRG root CA immediately appears in certmgr, and the issue is "fixed". Rasdial is able to verify the ISRG chain.
I have no explanation for this. I tried it in 2 different VMs just now, and that appears to be the behavior.
You can reproduce by removing the ISRG root, restarting Internet Explorer, and visiting that page.
The explanation is that Windows fetches certificates into its root store lazily. Most programs (including Chrome and Edge) delegate their certificate validation to the operating system: they just hand it an end-entity and the provided chain and say "validate this for me please?". Then Windows does all of the chain building and validation, including fetching any necessary intermediates and including fetching any necessary roots from the Microsoft Trusted Root Program. Then it caches these to make future validations faster.
The real "issue" here is that Rasdial is referencing the on-disk root store without using the OS's built in validation routines, and therefore isn't able to take advantage of the OS's lazy-loading.
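To make the difference between the two code paths concrete, here is a toy model. It is not how Windows implements anything: the class `ToyRootStore` and its methods are invented for illustration, and the "remote" set merely stands in for the list of roots the OS can fetch on demand.

```python
class ToyRootStore:
    """Toy model of a local root store backed by a remote trusted-root list."""

    def __init__(self, local_roots, remote_roots):
        self.local = set(local_roots)    # roots already on disk
        self.remote = set(remote_roots)  # roots that can be fetched on demand

    def validate_via_os(self, issuer_root):
        """Model of the OS validation path: fetch a missing root, then cache it."""
        if issuer_root not in self.local and issuer_root in self.remote:
            self.local.add(issuer_root)  # lazy load and cache for next time
        return issuer_root in self.local

    def validate_local_only(self, issuer_root):
        """Model of a client that only consults the roots already on disk."""
        return issuer_root in self.local


store = ToyRootStore(
    local_roots={"DST Root CA X3"},
    remote_roots={"DST Root CA X3", "ISRG Root X1"},
)

before = store.validate_local_only("ISRG Root X1")  # VPN-style check, root missing
browser = store.validate_via_os("ISRG Root X1")     # browser-style check fetches it
after = store.validate_local_only("ISRG Root X1")   # same VPN-style check again
print(before, browser, after)  # prints: False True True
```

The same local-only check fails before the browser-style validation and succeeds after it, which matches the behavior reported above with Internet Explorer and rasdial.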
As far as I understand it, this complex mechanism is one of the reasons that Firefox ships its own root store (among many other reasons as well), and that Chrome has announced they will do the same.
It appears Internet Explorer is able to trigger the lazy loading flow, but other parts of the OS, like rasdial, are not. This is a bit of a pickle for us, as "load this URL in Internet Explorer" is not a viable solution for millions of consumers.
Your help page should include a link like: click here if you are having trouble connecting your VPN with error: 13801
And just ensure they use IE and provide a similarly configured site.
[which just says: Your system should now be updated]
This is not a practical solution. 99% of people do not bother with contacting support, they will simply go "This product doesn't work, I'm uninstalling and getting something else".
I fail to follow the logic here.
If they aren't connected to the Internet, what good does it do them to know about any new trusted roots that are on the Internet?
But I don't really need the use case for that.
Here is what you are looking for:
I'm not interested in the logic, I must say. I think having an online root store to pick the certs from when required is a rather stupid idea for a root store. It does of course have advantages such as quick updates, but a root store should be a complete local root store. Updates can be handled with other methods.