Revocation Issues with CRL for R3 (was: r3.o.lencr.org)

Agreed, I couldn't see any issues with curl either. I can't pinpoint the root cause as such; I can only describe the symptoms (Java isn't very forthcoming with details).

All of the affected applications in our case are running JRE 1.8.0; specifically:
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

They are trying to communicate with a service that has a Let's Encrypt cert on the frontend.

We're working to re-roll out the affected applications with OCSP validation disabled, but I'd really rather not. Annoyingly, Java doesn't seem to be falling back to CRL verification for this issue.

FWIW, an example endpoint: https://management.api.umbrella.com
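
For anyone weighing the same trade-off, here is a minimal sketch of a middle ground between strict revocation checking and turning it off entirely. It uses the JDK's PKIXRevocationChecker with the SOFT_FAIL option, which lets validation pass when revocation status cannot be obtained because of a network error. The class and method names are made up for illustration, and nothing in this thread establishes that soft-fail would have masked this particular failure, so treat it as an assumption to test rather than a fix.

```java
import java.security.NoSuchAlgorithmException;
import java.security.cert.CertPathValidator;
import java.security.cert.PKIXRevocationChecker;
import java.security.cert.PKIXRevocationChecker.Option;
import java.util.EnumSet;

// Hypothetical helper class; the name is not from the thread.
final class LenientRevocation {

    /** Builds a revocation checker that tolerates unreachable revocation sources. */
    static PKIXRevocationChecker softFailChecker() throws NoSuchAlgorithmException {
        CertPathValidator validator = CertPathValidator.getInstance("PKIX");
        PKIXRevocationChecker checker =
                (PKIXRevocationChecker) validator.getRevocationChecker();
        // SOFT_FAIL: do not reject the chain when revocation status cannot be
        // determined because of a network error (or an OCSP tryLater/internalError).
        // NO_FALLBACK is deliberately left unset so the second mechanism is still tried.
        checker.setOptions(EnumSet.of(Option.SOFT_FAIL));
        return checker;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(softFailChecker().getOptions());
    }
}
```

The returned checker can be added to a PKIXParameters instance with addCertPathChecker before validating a chain. Leaving NO_FALLBACK unset keeps the JDK default of trying OCSP first and CRLs second.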

It appears (from other people's reporting about the error you've seen, but in another context) that OpenJDK can emit this error (a java.security.cert.CertPathValidatorException: Certificate does not specify OCSP responder) when it's not actually the cause at all.

The text of that error suggests it is examining a certificate in which it expected to find an OCSP responder listed, and the certificate doesn't have one. But the certificates haven't changed (as you say, it's Friday).

Misleading error messages are a pain, but I thought I'd highlight this result of some trawling because it's possible that something quite different is wrong and knowing the error could be wrong might help you find that.

For me, most of my systems aren't noticing the issue except for things connecting to MongoDB using a LE cert. It worked fine until ~4pm ET today, when everyone else started reporting it, and no changes were made on my side to cause it. It's happening both on systems where I use LE certs and on cloud services we connect to that use LE.

OK, so two of our locations have spontaneously recovered without us doing anything (South Africa + Brazil), so... maybe someone is rolling out a fix somewhere? Starting with less-prime sites?

and now the Netherlands + Italy too...

Any feedback on what the issue was would be much appreciated :)

And now all DCs are back and happy. Whatever you did, thanks.

Hi, folks,

Thanks for posting about this, and we're sorry about the trouble. We just identified and fixed the problem, and created a retroactive status report. The short version is that although our OCSP responder was working correctly, there was another validation problem for some clients: the CRL.

https://letsencrypt.status.io/pages/incident/55957a99e800baa4470002da/608c9dd384a5cf052fc6ed24

To summarize and confirm some ways troubleshooting in this thread was tricky:

  • SSL Labs does tend to incorrectly report OCSP failures. In this case, OCSP was working fine.
  • OCSP is always served over HTTP, so a mismatched TLS certificate at the edge CDN won't cause any failures.
  • The JRE's exception messages can identify the wrong type of validation failure entirely.

This was also unfortunate timing for us. The start of the incident (that is, the previous CRL's expiration) coincided with our troubleshooting a database performance problem. We didn't believe it was affecting more than a tiny fraction of OCSP requests - so few that it didn't reach our threshold to declare an incident - but our initial troubleshooting efforts assumed that it was related.

Thank you @JamesLE ! Much appreciated.

Just to add info: the issue also affected all .NET SslStream connections that were using the default RemoteCertificateValidationCallback. The callback would give sslPolicyError=RemoteCertificateChainErrors, accompanied by a ChainStatus with the following statuses:

  1. RevocationStatusUnknown "The revocation function was unable to check revocation for the certificate."
  2. OfflineRevocation "The revocation function was unable to check revocation because the revocation server was offline."

We had about 45 satellite servers go down simultaneously for a period of about 4.5 hours. We were really thrown by the fact that there was no indication of an issue on https://letsencrypt.status.io/. That, along with browsers being unaffected, led us to think it was some kind of weird issue in the guts of dotnet.

We didn't find this thread until after the fact, but it is quite a relief to see it acknowledged and explained. Thanks!

Oh no, three terrible hours.

At 00:15 (Berlin) I had a terrible problem in my system (Server-Daten with customers). The webserver was ok, but the DbServer wasn't able to send mails to the smtp server running on the webserver.

With a curious error message - "Could not establish a ssl connection".

Checked logs, that started 21:45 (that's 2021-04-30 19:41 - 2021-05-01 00:04 UTC).

Two reboots later and a changed configuration (added some registry keys) it worked.

But that was 02:05 (00:05 UTC).

Ok, so I know, it wasn't a local problem.

Yep, that's my situation. Must check it next day. Normally, I don't want to ignore certificate errors.

But I think, I have to change that.

Thanks for the update. Just so people are clear, was this some sort of failure on the part of IdenTrust, which would affect any system that checked the CRL signed by DST Root CA X3 for the validity of Let's Encrypt's R3 (when using the R3-signed-by-DST chain)? Is there not already monitoring around CRL (and OCSP) expirations?

I'm guessing that the cause of the error message some people were seeing about certificates not having OCSP was that R3-signed-by-DST only has a CRL listed, so when that failed the system was hoping to use OCSP instead but couldn't because there was no OCSP for that certificate? It'd sure be helpful if more system error messages said exactly which certificate was having the problem (I was looking at end-entity certs when it doesn't seem like they were the ones with the problem), but it's amazing how error messages seem to always tell you bazillions of details but never the one detail that would actually help you.
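
To make the "which certificate?" question easier to answer by hand, here is a rough Java sketch that reports, for each certificate in a chain, whether it lists a CRL distribution point and whether its Authority Information Access extension mentions an OCSP responder. The class and method names are invented for illustration, and the OCSP detection is a byte-search heuristic for the id-ad-ocsp OID rather than a real ASN.1 parse, so treat the result as a hint only.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.security.cert.X509Extension;

// Hypothetical helper class; the name is not from the thread.
final class RevocationPointers {
    static final String AIA_OID = "1.3.6.1.5.5.7.1.1"; // Authority Information Access
    static final String CRL_DP_OID = "2.5.29.31";      // CRL Distribution Points
    // DER encoding of the id-ad-ocsp access method OID (1.3.6.1.5.5.7.48.1).
    private static final byte[] OCSP_METHOD =
            {0x06, 0x08, 0x2B, 0x06, 0x01, 0x05, 0x05, 0x07, 0x30, 0x01};

    static boolean listsCrl(X509Extension cert) {
        return cert.getExtensionValue(CRL_DP_OID) != null;
    }

    /** Heuristic: true if the AIA extension bytes contain the id-ad-ocsp OID. */
    static boolean listsOcsp(X509Extension cert) {
        byte[] aia = cert.getExtensionValue(AIA_OID);
        return aia != null && contains(aia, OCSP_METHOD);
    }

    private static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    static String describe(X509Extension cert) {
        return "OCSP=" + listsOcsp(cert) + ", CRL=" + listsCrl(cert);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: RevocationPointers <pem-or-der-chain-file>");
            return;
        }
        // Reads every certificate in the file and prints its revocation pointers.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (Certificate c : CertificateFactory.getInstance("X.509").generateCertificates(in)) {
                X509Certificate x = (X509Certificate) c;
                System.out.println(x.getSubjectX500Principal().getName() + " -> " + describe(x));
            }
        }
    }
}
```

Run against a saved chain file, a certificate that prints OCSP=false, CRL=true is one where a client has nothing to fall back on when the CRL is unusable.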

Yes, exactly.

We monitor the CRLs we generate, and our OCSP responder, but DST's operator monitors their own. It's likely that we'll explore extending our own monitoring (for the few months that the DST chain remains unexpired).

Oddly enough, CRL expiration causes the same or similar error messages even when OCSP is also published and available (at least on some clients).
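
The kind of check being discussed can be sketched in a few lines of Java: parse a CRL, read its nextUpdate field, and flag it when it is stale or about to become stale. This is only an illustration. The class name, the three-day warning window, and the idea of passing a downloaded CRL file on the command line are all assumptions, not a description of anyone's actual monitoring.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.CertificateFactory;
import java.security.cert.X509CRL;
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

// Hypothetical helper class; the name is not from the thread.
final class CrlFreshness {

    /** Time left until nextUpdate; negative once the CRL is stale. */
    static Duration timeLeft(Instant nextUpdate, Instant now) {
        return Duration.between(now, nextUpdate);
    }

    /** True when the CRL is stale or will become stale within the warning window. */
    static boolean needsAttention(Instant nextUpdate, Instant now, Duration warnWindow) {
        return timeLeft(nextUpdate, now).compareTo(warnWindow) < 0;
    }

    /** Parses a DER or PEM CRL from a stream and returns its nextUpdate field. */
    static Instant nextUpdateOf(InputStream crlStream) throws Exception {
        X509CRL crl = (X509CRL) CertificateFactory.getInstance("X.509").generateCRL(crlStream);
        Date nextUpdate = crl.getNextUpdate(); // may be null if the field is absent
        if (nextUpdate == null) {
            throw new IllegalStateException("CRL has no nextUpdate field");
        }
        return nextUpdate.toInstant();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: CrlFreshness <path-to-crl-file>");
            return;
        }
        Duration warnWindow = Duration.ofDays(3); // assumed threshold, tune to taste
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Instant nextUpdate = nextUpdateOf(in);
            System.out.println("nextUpdate=" + nextUpdate + " needsAttention="
                    + needsAttention(nextUpdate, Instant.now(), warnWindow));
        }
    }
}
```

Keeping the comparison separate from the parsing means the alerting rule can be exercised with fixed timestamps, without fetching a real CRL.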

Thanks again.

Hmm. So once the DST Root expires, the CRL won't get updated anymore? I wonder if that will cause this same kind of issue again for the clients that broke during this incident, unless the server they connect to switches to the "alternate" chain before then? I know there are some expected issues with old OpenSSL once the expiration happens, but I'm wondering if people might see something more widespread with these other clients that are checking CRLs? Or will they stop checking the CRLs when they see the root is expired, as long as ISRG Root X1 is in their trust store?

I was trying to play around with this scenario in the staging environment (which has an expired DST-Root-X3-equivalent to help with testing this, right?), and the Staging-Pretend-Pear-X1-signed-by-Staging-Doctored-Durian-X3 cert lists a CRL of http://stg-dst3.c.lencr.org/ but that URL returns a 404 for me. Is that what will happen for http://crl.identrust.com/DSTROOTCAX3CRL.crl once the root expires, that it will turn into a 404? Or will there be some CRL there "forever" just with an expired signature? Or do we not know yet?

This is currently intended. We set up the new staging hierarchy to match Production but did not implement all of the details, like CRLs. There was a balance between making this change in a timely fashion and providing enough similarity with Production for testing. Initially, we considered issuing a hierarchy where all the URIs were "fake" and not fetchable, but decided that fetchable URIs were an important aspect of testing. We don't have a timeline for configuring CRLs for Staging.

For anybody interested in following IdenTrust's incident report on this (at least as reported to the Mozilla root program):

So I appreciate the information here; it explains why our certs show a broken chain of trust. I am assuming this is a transient thing and that the issues with IdenTrust and OCSP stapling will resolve themselves. Is there anything I need to do on my end to assure our users that the cert IS still OK? Do we have to reissue a new cert, or will the OCSP stapling errors we are encountering be resolved by the parties listed above?

@e1haskins, the issue in this thread was just from 2021-04-30 19:41 - 2021-05-01 00:04 UTC (per Let's Encrypt's status page), and was about the CRL of IdenTrust (validating that Let's Encrypt's R3 intermediate wasn't revoked) not being updated. It's got nothing to do with OCSP stapling, so I'm a bit confused by your questions. There's no need to issue new certificates based on this incident.

But if you're having some sort of OCSP problem, I recommend starting your own new thread in the Help section and filling out the template the forum gives you there, including as much detail as you can about the problems you're seeing.
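
As a quick sanity check on the incident window quoted above, the arithmetic can be done with java.time. This is just a sketch: the class name is invented, and the two timestamps are the ones from the status page as cited in this thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper class; the name is not from the thread.
final class IncidentWindow {
    // Start and end of the incident in UTC, as quoted in this thread.
    static final Instant START = Instant.parse("2021-04-30T19:41:00Z");
    static final Instant END = Instant.parse("2021-05-01T00:04:00Z");

    /** Length of the incident window. */
    static Duration length() {
        return Duration.between(START, END);
    }

    /** End of the incident expressed in a given time zone. */
    static ZonedDateTime endIn(String zone) {
        return END.atZone(ZoneId.of(zone));
    }

    public static void main(String[] args) {
        Duration d = length();
        System.out.println(d.toHours() + "h " + (d.toMinutes() % 60) + "m"); // prints "4h 23m"
        System.out.println(endIn("Europe/Berlin").toLocalTime());            // prints "02:04"
    }
}
```

That is in line with the roughly 4.5 hours of downtime and the Berlin-time recovery reported earlier in the thread.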

I will do that. My issue was ALMOST identical to the one in the opening post of this thread, except that mine started on the 3rd of May, not April 30th. I literally just checked again and no more issue :)

The first post of this thread has a screenshot from SSL Labs, and SSL Labs often can't connect to Let's Encrypt's OCSP server (which looks to be an issue on the SSL Labs side of things). It really has nothing to do with the problems in this thread; it just confuses people a lot, since SSL Labs reports an error there even though everything's actually working fine.
