Cert renew returned 500, repeat attempt successful

My domain is:

staging.eastwood.com

I ran this command:

ACME certificate renewal using custom client, 2022-09-12 19:46:06,584 UTC

It produced this output:

{'type': 'urn:ietf:params:acme:error:serverInternal', 'detail': 'Problem getting authorization', 'status': 500}

Specifically, we are using the DNS-01 challenge, and our ACME client verified that the TXT records were available for query. Our client state machine then transitioned to waiting for challenge verification, at which point we received a 500 error:

AcmeClient 631f8c6373e36c706a7059f0: process_request: ---->> current state: WAIT_CHALLENGE_VFY. Calling <function AcmeClient.wait_challenge_vfy at 0x7fb8209b2730>

AcmeClient 631f8c6373e36c706a7059f0: request: url:https://acme-v02.api.letsencrypt.org/acme/authz-v3/152663732077 method: POST

AcmeClient 631f8c6373e36c706a7059f0: response: HTTP Error 500: Internal Server Error upcalling

AcmeClient 631f8c6373e36c706a7059f0: response details: `{'type': 'urn:ietf:params:acme:error:serverInternal', 'detail': 'Problem getting authorization', 'status': 500}`
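For anyone handling the same response in their own client, here is a minimal sketch of classifying an ACME problem document as a server-side fault. It assumes the problem document has already been parsed into a dict like the one logged above; `is_transient` is a hypothetical helper name, not part of any ACME library.

```python
# Prefix used for ACME error types in RFC 8555.
ACME_ERROR_PREFIX = "urn:ietf:params:acme:error:"


def is_transient(problem):
    """Return True if the problem document looks like a server-side fault.

    Hypothetical helper: treats 'serverInternal' or any 5xx status as
    something the client did not cause and may reasonably retry later.
    """
    error_type = problem.get("type", "")
    status = problem.get("status", 0)
    return error_type == ACME_ERROR_PREFIX + "serverInternal" or 500 <= status < 600


problem = {
    "type": "urn:ietf:params:acme:error:serverInternal",
    "detail": "Problem getting authorization",
    "status": 500,
}
print(is_transient(problem))  # True
```

Client-side errors such as `unauthorized` (403) fall through to `False`, since retrying those unchanged is unlikely to help.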

Attempting the certificate renewal approximately 50 minutes later resulted in successful certificate renewal:

INFO 2022-09-12 20:38:02,983 acme_manager 754 140429479524096 - AcmeClient 631f988673e36c706a7059f2: finish: Cert retrieved OK.
Cert:
 -       subject: {b'CN': b'staging.eastwood.com'}
 -       issuer: {b'C': b'US', b'O': b"Let's Encrypt", b'CN': b'R3'}
 -       serialNumber: 4c11f6b21a956bad998b74daf4d4d2ac9c8
 -       version: 2
 -       notBefore: 2022-09-12 19:38:01
 -       notAfter: 2022-12-11 19:38:00

I can check the Boulder source code to see what causes the 500 / 'Problem getting authorization' response, but I thought I would post here first in case anyone else has encountered this.

thanks!!

Is this a chronic problem? Because transient problems are expected from time to time.

Boulder experts may have better insight into the cause. Still, the frequency of the problem is always helpful to know.

this is the first time encountering that 500 response that I am aware of - we have been using LE for cert generation/renewal for 2+ years.

hmm...

I'd try that on staging.
And check for any MiTM.
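One rough way to look for interception is to inspect who issued the certificate that the ACME endpoint presents to your host. The sketch below uses only the Python standard library; `issuer_fields` and `fetch_issuer` are hypothetical helper names, and what counts as an "expected" issuer is left to the reader to judge.

```python
import socket
import ssl


def issuer_fields(cert):
    """Flatten the 'issuer' entry of ssl.getpeercert() into a plain dict."""
    return {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}


def fetch_issuer(host, port=443, timeout=10.0):
    """Connect over TLS and return the issuer of the server certificate.

    The default context validates the chain against the system trust
    store, so an interception CA that is not installed locally raises
    ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_fields(tls.getpeercert())
```

Calling `fetch_issuer("acme-v02.api.letsencrypt.org")` from the machine that runs the ACME client returns a dict with keys such as `organizationName` and `commonName`. An issuer you do not recognize, for example a corporate proxy CA, would suggest the connection is being intercepted.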

2 Likes

we have nightly regressions that run against LE Staging doing a fair amount of cert creates/renews/revokes, no issues seen with those, just BTW.

The function probs.ServerInternal("Problem getting authorization") can be called from six different locations in /wfe2/wfe.go, making it quite impossible to determine what the exact reason for this internal server error was with the info provided by the response from the server. (Of course LE sysops could dig into their logs, but we can't.)

That said, an internal server error is just that: an internal error. Internal to Boulder. By definition almost, it's not something you would have triggered or something you could do to solve it.

I would only worry about it if this error repeated itself multiple times over a longer time span and you were the only user affected.

LE keeps count of all the errors per unit of time, and their sysops would get a notification if something were structurally wrong.
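Since an occasional 500 is outside the client's control, one common way to cope is to retry with exponential backoff. The following is only a sketch under stated assumptions: `AcmeServerError` is a hypothetical exception a client might raise with the parsed problem document, and `operation` stands in for whatever call polls the authorization.

```python
import time


class AcmeServerError(Exception):
    """Hypothetical exception carrying the parsed ACME problem document."""

    def __init__(self, problem):
        super().__init__(problem.get("detail", ""))
        self.problem = problem


def retry_on_server_error(operation, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on 5xx problems with exponential backoff.

    Non-5xx problems and the final failed attempt are re-raised unchanged.
    The sleep function is injectable so the delays can be tested.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except AcmeServerError as err:
            status = err.problem.get("status", 0)
            if status < 500 or attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With the defaults this waits 2, 4, 8 and 16 seconds between the five attempts, which is a modest amount of patience for a short server-side blip. The limits are arbitrary choices for illustration.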

thanks @osiris (and @rg305 and @MikeMcQ) - i looked at wfe.go/sa-wrappers.go/pb-marshalling.go as well and concluded that knowing the exact cause wasn't possible from the error info returned to the client. i will watch for additional 500s but am guessing this was a one-off. thanks again!

I actually got paged for this: There was a brief outage yesterday as a restart of a component of our database stack led to some 500s as connections were re-established to the database. Sorry for any inconvenience we may have caused.

no worries, thanks for that info @mcpherrinm!

all the hard work LE does making certs available is very much appreciated...

I'm trying to create certificates but they are failing. Does this outage still persist? @mcpherrinm

No, that outage was for about 1 minute. If you're having problems, please create a new thread.
