Current official status page for API?

webprofusion · February 20, 2021, 12:50am

Hey folks, I've gotten a couple of reports from users whose renewals are failing overnight, and it seems to be a timeout talking to the Let's Encrypt production API.

What's the official status reporting page for LE?

Looking at https://letsencrypt.status.io/ doesn't show anything recent, but StatusGator's Let's Encrypt status page shows a bunch of things.

webprofusion · February 20, 2021, 12:54am

Looks like this is the page I was after: Let's Encrypt Status

jillian · February 20, 2021, 1:05am

The official status page is https://letsencrypt.status.io

The Let’s Encrypt SRE team does their best to keep it up to date with maintenances and incidents. Usually, Let’s Encrypt knows about an incident from internal alerting, but it takes a bit to confirm, assess the impact, and update the page. The status is currently ‘Operational’ and our internal metrics and alerting confirm that. If you can get users to provide the specific errors, we can help assess the problem in Help.

webprofusion · February 20, 2021, 1:15am

Thanks Jillian, I did a new release of my app yesterday and got complaints of renewal errors today, so I'm in firefighting mode currently. In particular, during new certificate orders they didn't get any http challenges in the API response, which my code wasn't really expecting.

https://acme-v02.api.letsencrypt.org/acme/authz-v3/10663697360

I'm guessing this coincided with API maintenance, which is absolutely fine. I probably need to hook into the status.io API to see if I can inform users of maintenance dynamically.

griffin · February 20, 2021, 1:54am

This was EXACTLY why I ~~keep~~ kept pushing for an Under Maintenance page with a 503 return code for the directory endpoint during maintenance for both staging and production. When end-users see this on-screen and/or in their error logs, they ~~will~~ get the picture INSTANTLY and thus hopefully won't flood developers (and this community) with unnecessary help requests.

Edit: Thanks Let's Encrypt for implementing this!

webprofusion · February 20, 2021, 2:16am

I guess what was interesting here was that the API was returning stuff, but it was apparently impaired (no http challenges), or at least that's my impression. Hard to tell without a trace of the http responses at the time (which I don't have).

Graceful degradation is a cool feature, but you have to know about it in order to build a client that expects it to happen (i.e. you can talk to the API, but all might not be well and you may not know that).
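A minimal sketch of that kind of defensive check, in Python for illustration. It assumes the authorization object has the shape described in RFC 8555 (a `status` field and a `challenges` list whose entries carry a `type`). The names `find_challenge` and `MissingChallengeError` are hypothetical and not taken from any client mentioned in this thread.

```python
from typing import Any, Optional


class MissingChallengeError(Exception):
    """Raised when an authorization offers no challenge of the wanted type."""


def find_challenge(
    authz: dict[str, Any], wanted_type: str = "http-01"
) -> Optional[dict[str, Any]]:
    """Return the first challenge of wanted_type from an ACME authorization.

    Returns None when the authorization is already valid, since no
    challenge needs to be completed in that case. Raises
    MissingChallengeError with a descriptive message otherwise.
    """
    if authz.get("status") == "valid":
        return None

    challenges = authz.get("challenges") or []
    for challenge in challenges:
        if challenge.get("type") == wanted_type:
            return challenge

    # Report what was offered so the user-facing error is actionable.
    offered = sorted({c.get("type", "?") for c in challenges})
    raise MissingChallengeError(
        f"No {wanted_type} challenge offered "
        f"(authorization status: {authz.get('status', 'unknown')}; "
        f"offered: {offered or 'none'})"
    )
```

Raising a specific error here, rather than failing later on a missing value, at least tells the user that the API answered but the response was not what the client expected.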

webprofusion · February 20, 2021, 2:34am

For info, the current staging downtime did indeed return a 503, which is great.

StatusCode: 503, ReasonPhrase: 'Service Temporarily Unavailable'

With response body:

{
  "type": "urn:acme:error:serverInternal",
  "detail": "The service is down for maintenance or had an internal error. Check https://letsencrypt.status.io/ for more details."
}
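For anyone wanting to surface that response to users, here is a rough Python sketch of turning the status code and body above into a readable message. The `AcmeFailure` type and `describe_failure` function are hypothetical names, and treating every 503 as likely maintenance is an assumption made for illustration, not documented Let's Encrypt behaviour.

```python
import json
from dataclasses import dataclass


@dataclass
class AcmeFailure:
    status_code: int
    problem_type: str
    detail: str
    likely_maintenance: bool  # assumption: any 503 is treated as maintenance


def describe_failure(status_code: int, body: str) -> AcmeFailure:
    """Summarise a failed ACME response for display to a user."""
    try:
        problem = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, e.g. an HTML error page from an intermediary.
        problem = {}
    if not isinstance(problem, dict):
        problem = {}

    return AcmeFailure(
        status_code=status_code,
        problem_type=str(problem.get("type", "")),
        detail=str(problem.get("detail", "")) or "No detail provided by the server.",
        likely_maintenance=(status_code == 503),
    )


body = (
    '{"type": "urn:acme:error:serverInternal", '
    '"detail": "The service is down for maintenance or had an internal error. '
    'Check https://letsencrypt.status.io/ for more details."}'
)
failure = describe_failure(503, body)
if failure.likely_maintenance:
    print(f"Service unavailable ({failure.status_code}): {failure.detail}")
```

Falling back to an empty problem document keeps the same code path usable when the body is not JSON.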

griffin · February 20, 2021, 2:44am

Awesome!

That's precisely what I was hoping for.

griffin · February 20, 2021, 2:50am

@webprofusion

I think CTW should be able to easily handle that return and convey the information to users, I suspect without modification.

I know that CertSage (my own client) reflects this directly and in the response history.

webprofusion · February 20, 2021, 2:54am

While we do handle the error overall, we could report it more specifically. Unfortunately the library we use hides that initial failure (fetching the directory), but not for long!

griffin · February 20, 2021, 2:56am

Ah... I ran into that too the other day.

@jillian

I see you down there.

Did Let's Encrypt recently change the Content-Type header away from "application/problem+json"? I noticed my detailed error-handling stopped working.

webprofusion · February 20, 2021, 3:01am

The staging 503 error response headers included it:

{
  Connection: keep-alive
  Date: Sat, 20 Feb 2021 02:16:59 GMT
  ETag: "5f76372f-b2"
  Server: nginx
  Content-Length: 178
  Content-Type: application/problem+json
}
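If a client keys its detailed error handling off that header, one defensive option is to compare only the media type, ignoring case and any parameters such as a charset. This is a sketch of that idea in Python; `is_problem_json` is a hypothetical helper, and nothing here is meant to suggest this is what caused the behaviour described above.

```python
from typing import Optional

PROBLEM_JSON = "application/problem+json"


def is_problem_json(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value names problem+json.

    Only the media type is compared: parameters after ';' are dropped,
    and the comparison is case-insensitive.
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == PROBLEM_JSON
```

An exact string comparison against the header value would reject a value that carries a parameter, which is why the media type is split out first.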

jillian · February 20, 2021, 3:03am

Let’s Encrypt will return a 503 when we are certain the infrastructure is unavailable, but this is only possible when our load balancers are still up and accessible. There will always be some maintenances where we make changes to our networking gear and our datacenter is essentially offline. This usually only affects Staging because we don’t have a secondary datacenter that we can fail over to. As a result, the errors returned to the users are from our CDN. We think this is ok because we rarely take Staging entirely offline to the point where we cannot serve a proper 503 response.

In production, we mostly do non-interruptive rolling restarts and rarely turn off all access to the API. On the occasions where we do stop Production API access, we make sure to return a 503 from our frontends whenever possible and provide a maintenance notice on the status page about the downtime.

griffin · February 20, 2021, 3:03am

Cool. Wonder why I didn't get that type during some of my testing for certain errors. I'll need to look into this more. Osiris has been generously helping to integrate cPanel support into CertSage, so we've encountered a few odd things along the way (from our own doing).

griffin · February 20, 2021, 3:07am

That sounds like excellent logic and strategy.

Comically, first I and then Osiris smacked right into the two unscheduled staging maintenances during testing for CertSage. His reaction was priceless. Basically mirrored my own. Just unfortunate timing in our testing.

system · March 22, 2021, 3:07am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
