yup acme.sh with ZeroSSL works fine from my testing
I went with GoDaddy since I know it well; it took all of 10 minutes to get the cert and another few minutes to install it. Things are finally back to normal. However, it's left a huge mess that I now need to untangle and document. I'm hoping I'll have a bit of help on this tomorrow.
Once things calm down, I'll spend some time looking into the alternatives mentioned here.
Thank you to all who helped, again, it was truly appreciated even if we didn't solve the problem.
What a mess. Now I'm finding other servers that are having problems because they use Let's Encrypt certs too. I have to buy yet more GoDaddy certs.
I'm not sure I understand you correctly, but if you are able to update your remote devices to trust the ISRG Root X1 certificate, you probably won't have to update the other servers.
He's in a catch-22 situation where he can't update the remote devices unless they can connect to the central server. But they can't connect to the central server until they get updated to trust ISRG Root X1... which is why the workaround is to change CAs on the server to something that is already trusted by the devices.
It doesn't seem like it should have been a catch-22; @jsha posted Help thread for DST Root CA X3 expiration (September 2021) on April 6th. To me it feels a bit like the theater is on fire and people are yelling fire, yet not everyone got out of their seats until the flames actually reached them. The issue is that the web world as a whole didn't take on all the duties and responsibilities to be ready for this expiration; the LE community did do its due diligence in addressing the issue and informing the powers that be.
As far as I understood the last message, he already got the devices back online with the GoDaddy cert on at least one server. Maybe I understood that wrong, and he is not able to update the devices through that channel ...
Correct, remote devices stopped communicating as soon as the change was implemented with LE.
I found testing units and was able to confirm the problem and once I found out what it was, started asking how I might be able to add something back into the server, even temporarily.
Nothing worked so I went with a GD SSL cert. The moment I restarted the web server using GD certs, all remotes started communicating.
Someone will have to build updated versions for the devices but now I'm reading in the openwrt forums that only source code users will be able to use curl with updated SSL while those using image builder will be SOL unless the curl maintainer updates.
I'm not sure, just reading as I look into this now that things are a little calmer.
And yes, it's easy to blame that someone overlooked something this important but there is a lot to technology and some things can fall between the cracks. Usually not this big but even the largest companies make mistakes and learn from each one.
@Bruce5051 that's not how I'd put it. Maybe more like we picked up the Internet and shook it around and found a few loose bolts here and there, which we're now helping people tighten. In an ideal world everyone would get our announcements about upcoming changes and know exactly how to test them, but this type of change (expiration on a future date) is quite hard to test accurately in advance.
That is probably true for some users, but your issue won't be solved by updating curl or SSL alone. You also need to get an updated ca-certificates bundle onto all your devices.
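For what it's worth, here is a minimal Python sketch of how one might check whether a given CA bundle already contains the new root. It assumes you can copy the bundle off a device onto a machine with Python 3; the helper names `has_root` and `load_bundle` are mine for illustration, not anything OpenWrt or ca-certificates ships.

```python
import ssl


def has_root(ca_certs, common_name):
    """Return True if any CA entry has the given subject commonName.

    ca_certs is a list of dicts in the shape returned by
    ssl.SSLContext.get_ca_certs().
    """
    for cert in ca_certs:
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName" and value == common_name:
                    return True
    return False


def load_bundle(path):
    """Load a PEM CA bundle from disk and return its CA certificates as dicts."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # starts with an empty trust store
    ctx.load_verify_locations(cafile=path)
    return ctx.get_ca_certs()
```

Something like `has_root(load_bundle("ca-bundle.crt"), "ISRG Root X1")` would then tell you whether that bundle needs replacing (the file name is a placeholder for wherever you saved the device's bundle).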
Yes, indeed, that will be updated in the packages on the new firmware.
I know so far that someone did see the notice about the upcoming change but there was also a mention that for the most part, everything should keep working so it didn't seem like anything all that urgent. No one took into account the openwrt software not being well updated or even mentioned for this scenario.
From what I see in the openwrt forums, and I could be wrong, the developers are basically taunting those who don't know how to use source that it's their own fault.
Not everyone has the same skill set. Some use source, others don't but get by just fine except for that one time where everything breaks. Projects need to understand that not everyone who uses their solution is a high-level techie with time to learn everything about everything. Most of us know a little about a lot of things and leave the rest to the pros when possible.
I won't mention whom but in the initial answers to my own question, I felt like I was being chastised for not knowing everything about SSL and the person just kept on going with high level information that folks like me will not understand because SSL is a small part of our lives. We configure machines, generate certs, install them and we don't need to learn everything about SSL.
I'm happy this was a temporary problem and love that organizations like LE exist. There are a lot of things that should not cost people on the Internet and basic security that slows hackers down is one of them.
I'll keep using LE and recommending it.
I'm sorry to hear that. That sort of experience - chastising someone for not knowing enough - is all too common in tech forums, and it's something we actively try to avoid here. Sounds like we didn't do a great job of that today, but we'll try to do better in the future.
Everyone was amazing. Only one person sounded that way and I just looked past it because as you said, there are always those types in forums.
I have a few OpenWrt devices for which the maintainers and developers will not update OpenSSL to the 1.1.1 branch. Basically, since OpenSSL 1.0.2 is no longer supported, neither is my product. But OpenWrt isn't the only one; my QNAP NAS, for example, is in the same position.
uname -a
Linux NAS3BA281 4.2.8 #2 SMP Thu Sep 23 06:02:16 CST 2021 armv7l unknown
[~] # openssl version
OpenSSL 1.0.2za 24 Aug 2021
I am not chastising individuals in my comment above, I was trying to say if the community had taken this more seriously the impact could have been reduced. I apologize for being rough and / or harsh.
It wasn't you I was talking about but all good :).
Should I ask?
Was I not myself during that extended sleep/beer/chocolate/ice cream withdrawal experience?
Was it me?
This might be wishful thinking and/or crazy talk, but were these devices set up so that they synchronize their time from a remote NTP server that you control? Or can you trick their DNS resolvers into using a different NTP server? If so, you could perhaps set the time back for them all, and then communicate with them again to install the new ISRG Root X1 certificate, and then put the time right again. Of course, that might have adverse effects.
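To illustrate why the clock matters here, a small Python sketch. It only models the expiry comparison a client makes against its own clock; the `notAfter` value is the published expiry of DST Root CA X3, and the function name and the two clock readings are made up for the example.

```python
import ssl
from datetime import datetime, timezone

# notAfter of the expired DST Root CA X3 root, in the format OpenSSL prints.
DST_ROOT_X3_NOT_AFTER = "Sep 30 14:01:15 2021 GMT"


def is_expired(not_after, now):
    """Compare a certificate's notAfter string with a timezone-aware clock reading."""
    return now.timestamp() > ssl.cert_time_to_seconds(not_after)


real_clock = datetime(2021, 10, 1, tzinfo=timezone.utc)
rolled_back = datetime(2021, 9, 1, tzinfo=timezone.utc)

print(is_expired(DST_ROOT_X3_NOT_AFTER, real_clock))   # True
print(is_expired(DST_ROOT_X3_NOT_AFTER, rolled_back))  # False
```

One adverse effect I'd expect: the same comparison applies to `notBefore`, so a server certificate issued after the rolled-back date would look "not yet valid" to the device.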
It's a pity (and fairly surprising) that you don't have SSH access. I imagine it's a fairly normal way to communicate with OpenWRT devices, and it would be unaffected by this problem.
Maybe not so much wishful thinking but a lot of creative ideas came up on that day, trying to figure out how to regain control of them.
SSH is available on them but only if someone put it directly onto the Internet with a forwarding port. Otherwise, there are no services accessible on them which is specifically how they were built so that they are safer, more secure for the end location.
They do use NTP but ntp.org.
DNS, there would have been no way to do anything there since we could not send them any controls to do anything because they were simply not communicating as of the LE change.
They function perfectly but in this case, this change was not caught in time or overlooked and this happened.
The question for me at this point is why did all this happen to begin with? While it was a nightmare, the solution was to buy a GoDaddy cert and everything went back to normal. Why didn't LE support whatever GD is doing so that this would not have happened?
While this was happening, I came across countless posts across many sites from people who were experiencing all kinds of problems too so we were not alone.
Short answer from an expert:
This happened because root certs need to expire at some point. The root certificate that expired claimed to be issued in the year 2000 - that's older than pretty much every computer I currently use. It is unwise to use such old things forever, because the security of the entire PKI/web depends on them. They need to be removed at some point, and September 30 2021 was this day.
Other CAs like GoDaddy simply use other root certificates that don't expire this year - but they will some day. Maybe in 2022, maybe in 2030, maybe in 2040 - but they will at some point.
Since root certs expire, new root certs are issued to replace them. However, if systems do not get updates, they won't trust the new roots - now what do you do? This will inevitably break at some point, until the industry learns that not updating internet connected devices for 5+ years is not an option.
Let's Encrypt's root certificate was accepted into root store programs roughly in 2016 - that left 5 years for software updates to distribute this certificate. For 5 years some systems were not updated, so it broke now.
In addition to that, Let's Encrypt deliberately decided to keep serving a chain up to this (now expired) root certificate. This fixed older Androids (those not having a new enough trust store - the update story again), but broke libraries with incorrect chain verification logic. This was a calculated risk Let's Encrypt decided to take, to help Android users. This issue can be mitigated by serving Let's Encrypt's alternate chain, but it won't help systems without an updated trust store.
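A toy model in Python may make the difference clearer. This is a deliberately simplified sketch, not how OpenSSL or any real library is implemented: certificates are just subject/issuer pairs, there are no signatures, and `example.org`, the class names, and the two verifier functions are invented for illustration. R3 is the Let's Encrypt intermediate that was in use at the time. Old Android's behavior (not checking expiry on trust anchors) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


@dataclass(frozen=True)
class Root:
    name: str
    expired: bool


def verify_chain_end_only(chain, trust_store):
    """Buggy model: only the issuer of the LAST served cert is looked up."""
    root = trust_store.get(chain[-1].issuer)
    return root is not None and not root.expired


def verify_trusted_first(chain, trust_store):
    """Better model: succeed at the first cert issued by a valid trusted root."""
    for cert in chain:
        root = trust_store.get(cert.issuer)
        if root is not None and not root.expired:
            return True
    return False


LEAF = Cert("example.org", "R3")
R3 = Cert("R3", "ISRG Root X1")
X1_CROSS = Cert("ISRG Root X1", "DST Root CA X3")  # cross-signed by the old root

LONG_CHAIN = [LEAF, R3, X1_CROSS]  # default chain, ends at the expired root
SHORT_CHAIN = [LEAF, R3]           # alternate chain, ends at ISRG Root X1

OLD_STORE = {"DST Root CA X3": Root("DST Root CA X3", expired=True)}
UPDATED_STORE = {**OLD_STORE, "ISRG Root X1": Root("ISRG Root X1", expired=False)}

print(verify_chain_end_only(LONG_CHAIN, UPDATED_STORE))   # False: trips over the expired root
print(verify_trusted_first(LONG_CHAIN, UPDATED_STORE))    # True: stops at ISRG Root X1
print(verify_chain_end_only(SHORT_CHAIN, UPDATED_STORE))  # True: alternate chain avoids the problem
print(verify_trusted_first(SHORT_CHAIN, OLD_STORE))       # False: no trusted, unexpired root at all
```

The last line is the case in this thread: with a trust store that never received ISRG Root X1, neither chain nor verifier helps.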
Looking at this thread, your issue apparently was that while you intended to ship reasonably up-to-date software, you actually weren't. These types of supply chain issues are hard to catch, I know that. I also want to point out that I don't intend to blame you or your company in any way. Rather, I blame poor industry standards. You're not alone with these issues. I own a Smart TV by a well-known company that hasn't gotten an update to its root store since about 2012-2013, even though it is much more recent and has gotten software updates since then. Needless to say, it won't work with any LE site as of a certain day.
I fully agree, security should be upgraded on a regular basis and in our case, we didn't take into account that openwrt would not be keeping non-source-based packages updated, which led to this. Even the newest version doesn't work unless you know how to use source.
To me, that is not a responsible way to offer a project and any project worth its salt should have dealt with this for its masses. I'm sure that anyone who develops with openwrt will not agree but they need to keep in mind that in all projects, there are different levels of knowledge.
BTW, the reason I believe that our devices were overlooked is because they aren't directly connected to the Internet, they are merely clients on the LAN and really not any kind of security issue. They only need to communicate outward so no one was thinking about something like this.
I imagine that a large number of companies with embedded devices across the Internet will have been hugely affected by this.
And now I better understand about GD too, it's only a matter of time. In that case, I hope openwrt updates all of the packages that use openssl without forcing people to have to change out countless devices out in the field.
it's outside of this forum's subject but what's your device? I have openwrt buildroot installed for my potato in pocket so maybe I'm able to compile something for you if you want - if you trust me for compiled image/ipk file