Hey folks; searching the forums, I see that this is a problem which has occurred in the past and the admins have been stomping down on the root-cause triggers each time.
For the Exim MTA project’s mailhub, we use a Let’s Encrypt cert, CN=mx.exim.org, SAN DNS:hummus.csx.cam.ac.uk, DNS:mx.exim.org; we refetch an OCSP staple via a cron job every two days and also immediately after renewing the cert (a monthly cron job).
Since the latest renewal, we’re getting Responder Error: unauthorized (6) from openssl ocsp; this is 100% reproducible … reading the previous forum responses, I’m guessing it’s a cached error?
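For reference, the number in "unauthorized (6)" is the OCSPResponseStatus code defined in RFC 6960 (section 4.2.1). Below is a minimal Python sketch of how a refresh script might classify those codes. The retry policy is an assumption, not anything Let's Encrypt documents: `unauthorized` is treated as retryable only because, as in this thread, a responder may briefly not know about a freshly issued certificate.

```python
from enum import IntEnum


class OCSPResponseStatus(IntEnum):
    """OCSPResponseStatus values from RFC 6960, section 4.2.1 (value 4 is unused)."""
    SUCCESSFUL = 0
    MALFORMED_REQUEST = 1
    INTERNAL_ERROR = 2
    TRY_LATER = 3
    SIG_REQUIRED = 5
    UNAUTHORIZED = 6


# Assumption: statuses worth retrying after a delay. UNAUTHORIZED is included
# only to cover a responder that has not yet caught up with a new certificate.
RETRYABLE = {
    OCSPResponseStatus.INTERNAL_ERROR,
    OCSPResponseStatus.TRY_LATER,
    OCSPResponseStatus.UNAUTHORIZED,
}


def should_retry(code: int) -> bool:
    """Return True if an OCSP response status code suggests trying again later."""
    try:
        status = OCSPResponseStatus(code)
    except ValueError:
        return False  # unknown or unused code (e.g. 4): treat as a hard failure
    return status in RETRYABLE
```

With this policy, `should_retry(6)` is true, so the error above would lead to a delayed retry rather than an immediate failure.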
Is this anything we’ve done wrong to trigger it? Would a sleep N between getting the new cert and requesting the first OCSP staple for it help reduce the likelihood of problems? Anything we can do to get this resolved, other than my posting here?
This has been working fine for the past couple of months, invoked every other day, so the core functionality works.
This is not used with a web server (except for handling the challenge) but with the Exim MTA, as stated: a mail server, speaking SMTP. No changes have been made recently other than the mentioned cert renewal.
The output is some text from ed splitting the certs (at two months old, this setup is new enough that I’m still reading the cron output mails instead of making them less verbose), followed by the exact error I cited in my post:
Reading the articles, there seemed to be two key issues:
1. Service availability (the server not responding at all because it was busy)
2. A slow OCSP response (the reply taking longer than the requesting server was willing to wait)
I am assuming you have submitted multiple OCSP requests?
If so, I am not sure how Boulder and Let's Encrypt deal with these.
For now, a sleep 1 after getting the cert and before requesting OCSP for the first time will help avoid caching issues like this. We're planning to fix this soonish by making the OCSP generation a blocking part of certificate generation.
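The suggested workaround (wait briefly after issuance, then fetch the first staple) can be combined with a few backed-off retries so that one cached failure does not sink the whole renewal run. A minimal Python sketch, assuming a hypothetical `fetch` callable (for example, a wrapper around `openssl ocsp`) that returns the DER-encoded response on success or None on failure; the delay and attempt counts are illustrative, not recommended values.

```python
import time
from typing import Callable, Optional


def fetch_staple_with_retry(
    fetch: Callable[[], Optional[bytes]],
    initial_delay: float = 1.0,
    attempts: int = 5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Wait before each OCSP fetch for a new cert, doubling the wait after failures.

    `fetch` is a hypothetical callable returning DER bytes, or None on failure.
    Returns the staple, or None if every attempt failed.
    """
    delay = initial_delay
    for _ in range(attempts):
        sleep(delay)  # the first wait is the "sleep 1" before the first request
        staple = fetch()
        if staple is not None:
            return staple
        delay *= backoff
    return None
```

Passing `sleep` in as a parameter keeps the function testable without real waiting; with the defaults, the waits are 1, 2, 4, 8 and 16 seconds.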
In the meantime, the every-two-days cron job succeeded at 17:28 UTC today in picking up a staple generated at 20:00 UTC yesterday, so we have working OCSP, and our OCSP status monitoring stopped whining at me once that succeeded. So this particular incident resolved itself.
(It's currently a cron job running every two days so that it can ride out temporary failures, but the cert renewal is a special case where we can't just ride it out, because the old staple is immediately invalid for the new cert; somewhere on my long todo list is a standalone service using DHCP-style timer-driven retries.)
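One way to read "DHCP-style timer-driven retries" is to borrow the lease timers from RFC 2131: first try to refresh at 50% of the staple's validity window (thisUpdate to nextUpdate), start retrying more urgently at 87.5%, and after each failure wait half the remaining time to the next deadline, down to a 60-second minimum. The Python sketch below applies that scheme to staples; the mapping from DHCP leases to OCSP validity is our assumption, not an existing tool.

```python
from datetime import datetime, timedelta
from typing import Tuple

# DHCP (RFC 2131) defaults: renew at 50% of the lease, rebind at 87.5%.
T1_FRACTION = 0.5
T2_FRACTION = 0.875
MIN_RETRY = timedelta(seconds=60)


def refresh_times(this_update: datetime, next_update: datetime) -> Tuple[datetime, datetime]:
    """Return (T1, T2) for a staple valid from this_update to next_update."""
    if next_update <= this_update:
        raise ValueError("next_update must be after this_update")
    lifetime = next_update - this_update
    return (this_update + lifetime * T1_FRACTION,
            this_update + lifetime * T2_FRACTION)


def next_attempt(now: datetime, this_update: datetime, next_update: datetime) -> datetime:
    """When to try fetching a fresh staple next (hypothetical DHCP-style policy)."""
    t1, t2 = refresh_times(this_update, next_update)
    if now < t1:
        return t1  # nothing to do until the first refresh point
    # Before T2, aim at T2; after it, aim at the staple's expiry.
    target = t2 if now < t2 else next_update
    # Wait half the remaining time, but never less than 60 seconds.
    return now + max((target - now) / 2, MIN_RETRY)
```

After a successful fetch, the timers restart from the new staple's thisUpdate and nextUpdate, so a healthy service only wakes up about once per half validity period.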
Note: If you depend on OCSP stapling, one way to make renewals more robust is to renew the certificate, but if for any reason you can’t get OCSP, keep the old certificate in place until you succeed in fetching OCSP for the new one.
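The "keep the old certificate until OCSP for the new one succeeds" approach can be sketched as a promotion step that runs after renewal. This is a minimal Python illustration with hypothetical file paths and a hypothetical `fetch_ocsp` callable; private-key handling and any reload of the mail server are left out.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def promote_if_stapled(
    new_cert: Path,
    live_cert: Path,
    live_staple: Path,
    fetch_ocsp: Callable[[Path], Optional[bytes]],
) -> bool:
    """Swap in new_cert only once an OCSP response for it has been fetched.

    fetch_ocsp is a hypothetical callable returning DER bytes, or None on failure.
    Returns True if the new certificate and its staple went live.
    """
    staple = fetch_ocsp(new_cert)
    if staple is None:
        return False  # keep serving the old cert with its still-valid staple
    tmp_staple = live_staple.with_name(live_staple.name + ".tmp")
    tmp_staple.write_bytes(staple)
    # Each os.replace is atomic on POSIX within one filesystem, but the two
    # renames together are not: a reader may briefly see a mismatched pair.
    os.replace(tmp_staple, live_staple)
    os.replace(new_cert, live_cert)
    return True
```

If the brief window between the two renames matters, one option is to keep each cert/staple pair in its own directory and atomically repoint a symlink instead.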