Revocation Issues with CRL for R3 (was: r3.o.lencr.org)

We're seeing application issues (Veeam) due to revocation checks failing on our LE SSL certificates. It appears to maybe relate to [r3.o.lencr.org] being down at present?

Maybe the incorrect SSL certificate on that site?

Anyone else aware of this, any ETA or knowledge by Lets Encrypt of this issue being investigated?

2 Likes

Hi @MShar9e

that's a known Ssllabs bug. Completely independend from your domain / configuration.

See

same with a GoDaddy OCSP check.

2 Likes

Yeah, ssllabs can't seem to connect to it for quite some time (I think I've seen reports on this forum for weeks about it), though I don't know if it's because one end or the other is blocking it.

I don't see any problems checking OCSP through crt.sh.

You probably need to be clearer about what actual problem you're seeing, including the domain/certificate.

3 Likes

This isn't a bug in SSL Labs. Akamai is legit presenting a bad cert for r4.o.lencr.org which is resulting in global SSL certificate errors. Does anyone know if Let's Encrypt is engaging with Akamai to fix it?

The OCSP info embedded into certificates is a http: link, not https:, so I don't see how Akamai could be serving a certificate for it? And R4 hasn't issued any certificates ever, so why are you looking at the R4 OCSP? I don't see any issues from my side connecting to http://r3.o.lencr.org. Again, can you be really clear about what problems you're seeing? Maybe include steps to reproduce with openssl or other tools?

3 Likes

Please read the next reply - OCSP Error when testing new website - #3 by rg305

Some minutes later - it had worked. So it's a temporary Ssllabs problem - sometimes visible (with different OCSP domains), sometimes invisible.

2 Likes

This is the application error and the URL originally mentioned listed on the certificate....

Seems to be getting traction with other people on Twitter as an active issue.

We have a few applications which enforce OCSP validation with Java 1.8.0, and these are currently failing with:

Caused by: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Certificate does not specify OCSP responder

No changes on our side (especially bearing in mind that's pretty late in the day here, and on a friday :slight_smile: )

Started about 20:57:48 BST (+1) // 19:57:48 UTC

1 Like

Well,when I try from my computer I don't see a problem…

$ curl -O https://letsencrypt.org/certs/lets-encrypt-r3.pem
$ curl -O https://letsencrypt.org/certs/isrgrootx1.pem
$ curl -o cert.pem https://crt.sh/?d=4316789025
$ openssl11 ocsp -no_nonce -url http://r3.o.lencr.org -header Host=r3.o.lencr.org -issuer lets-encrypt-r3.pem -CAfile isrgrootx1.pem -cert cert.pem
Response verify OK
cert.pem: good
        This Update: Apr 29 18:00:00 2021 GMT
        Next Update: May  6 18:00:00 2021 GMT

I'm curious if what you're seeing is related to the specific certificates, or something different about how your system is validating, or what. Hmm…

1 Like

Agreed, I couldn't see any issues with curl either. I can't pinpoint the root cause as such; other than tell the symptoms (Java isn't very forthcoming with details).

All of the affected applications in our case are JRE 1.8.0; specifically:
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

And are trying to communicate with a service which has a letsencrypt cert on the frontend.

We're working to re-rollout the affected applications with disabled OSCP validation, but i'd rather not really. Annoyingly, java doesn't seem to be falling back to CRL verification for this issue.

FWIW: an endpoint for example: https://management.api.umbrella.com

1 Like

It appears (from other people's reporting about the error you've seen, but in another context) that OpenJDK can emit this error (a java.security.cert.CertPathValidatorException: Certificate does not specify OCSP responder) when it's not actually the cause at all.

The text of that error suggests it is examining a certificate which it wanted to see an OCSP responder listed in, and the certificate doesn't have one. But the certificates haven't changed (as you say it's Friday)

Misleading error messages are a pain, but I thought I'd highlight this result of some trawling because it's possible that something quite different is wrong and knowing the error could be wrong might help you find that.

2 Likes

For me, most of my systems aren't noticing the issue except for things connecting to MongoDB using a LE cert. Worked fine until ~4pm ET today when everyone else started reporting it and no changes were made on my side to cause it. Happening both on systems that I use LE certs on as well as cloud services we connect to that are LE.

1 Like

OK so, we've had two of our locations have spontaneously recovered without us doing anything (South Africa + Brazil); so.... maybe someone is rolling out a fix somewhere? Starting with less-prime sites?

1 Like

and now the Netherlands + Italy too...

Any feedback on what the issue was would be much appreciated :slight_smile:

And now all DCs are back and happy. Whatever you did, thanks.

1 Like

Hi, folks,

Thanks for posting about this, and we're sorry about the trouble. We just identified and fixed the problem, and created a retroactive status report. The short version is that although our OCSP responder was working correctly, there was another validation problem for some clients: the CRL.

https://letsencrypt.status.io/pages/incident/55957a99e800baa4470002da/608c9dd384a5cf052fc6ed24

To summarize and confirm some ways troubleshooting in this thread was tricky:

  • SSL Labs does tend to incorrectly report OCSP failures. In this case, OCSP was working fine.
  • OCSP is always served over HTTP, so a mismatched TLS certificate at the edge CDN won't cause any failures.
  • The JRE's exception messages can identify the wrong type of validation failure entirely.

This was also unfortunate timing for us. The start of the incident (that is, the previous CRL's expiration) coincided with our troubleshooting a database performance problem. We didn't believe it was affecting more than a tiny fraction of OCSP requests - so few that it didn't reach our threshold to declare an incident - but our initial troubleshooting efforts assumed that it was related.

8 Likes

Thank you @JamesLE ! Much appreciated.

2 Likes

Just to add info. The issue also affected all .NET SslStream connections which were using the default RemoteCertificateValidationCallback. The callback would give sslPolicyError=RemoteCertificateChainErrors accompanied by ChainStatus with the following statuses

  1. RevocationStatusUnknown "The revocation function was unable to check revocation for the certificate."
  2. OfflineRevocation "The revocation function was unable to check revocation because the revocation server was offline."

We had about 45 satellite servers go down simultaneously for a period of about 4.5 hours. We were really thrown by the fact that there was no indication of an issue on https://letsencrypt.status.io/ That along with browsers being unaffected lead us to think it was some kind of weird issue in the guts of dotnet.

We didn't find this thread until after the fact but it is quite a relief to see it acklowledged and explained. Thanks!

3 Likes

Oh no, three terrible hours.

At 00:15 (Berlin) I had a terrible problem in my system (Server-Daten with customers). The webserver was ok, but the DbServer wasn't able to send mails to the smtp server running on the webserver.

With a curious error message - "Could not establish a ssl connection".

Checked logs, that started 21:45 (that's 2021-04-30 19:41 - 2021-05-01 00:04 UTC).

Two reboots later and a changed configuration (added some registry keys) it worked.

But that was 02:05 (00:05 UTC).

Ok, so I know, it wasn't a local problem.

Yep, that's my situation. Must check it next day. Normally, I don't want to ignore certificate errors.

But I think, I have to change that.

1 Like

Thanks for the update. Just so people are clear, was this some sort of failure on the part of IdenTrust, which would affect any system that checked the CRL signed by DST Root CA X3 of the validity of Let Encrypt's R3 (when using the R3-signed-by-DST chain)? Is there not already monitoring around CRL (and OCSP) expirations?

I'm guessing that the cause of error message some people were seeing around certificates not having OCSP were that R3-signed-by-DST only has a CRL listed, so when that failed the system was hoping to use OCSP instead but couldn't because there was no OCSP for that certificate? It'd sure be helpful if more system error messages said exactly which certificate was having the problem (as I was looking at end-entity certs when it doesn't seem like they were the ones with the problem), but it's amazing how error messages seem to always tell you bazillions of details but never the one detail that would actually help you.

2 Likes