Specifically, we are using the DNS challenge, and the TXT records were verified to be available for query by our ACME client. Our client's state machine transitioned to waiting for challenge verification, and we then received a 500 error:
AcmeClient 631f8c6373e36c706a7059f0: process_request: ---->> current state: WAIT_CHALLENGE_VFY. Calling <function AcmeClient.wait_challenge_vfy at 0x7fb8209b2730>
AcmeClient 631f8c6373e36c706a7059f0: request: url:https://acme-v02.api.letsencrypt.org/acme/authz-v3/152663732077 method: POST
AcmeClient 631f8c6373e36c706a7059f0: response: HTTP Error 500: Internal Server Error upcalling
AcmeClient 631f8c6373e36c706a7059f0: response details: `{'type': 'urn:ietf:params:acme:error:serverInternal', 'detail': 'Problem getting authorization', 'status': 500}`
Attempting the renewal again approximately 50 minutes later succeeded.
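For anyone hitting the same thing, here is a minimal sketch of how a client might treat this kind of failure as transient and retry with backoff instead of failing the renewal outright. This is not our actual client: `AcmeProblem`, `poll_once`, and the retry parameters are invented for illustration, and the real client would be making signed POST-as-GET requests to the authz URL under the hood.

```python
import time

# Hypothetical exception raised when the ACME server returns a problem
# document; a real client's error type will look different.
class AcmeProblem(Exception):
    def __init__(self, status, problem):
        super().__init__(problem.get("detail", ""))
        self.status = status                 # HTTP status, e.g. 500
        self.type = problem.get("type", "")  # e.g. urn:ietf:params:acme:error:serverInternal

# Problem types we consider server-side and therefore worth retrying.
RETRIABLE_TYPES = {
    "urn:ietf:params:acme:error:serverInternal",
}

def poll_with_retry(poll_once, attempts=5, initial_delay=60, max_delay=3600):
    """Call poll_once() (a callable that polls the authorization URL and
    returns the authorization object, raising AcmeProblem on an error
    response) and retry transient server-side failures with backoff."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return poll_once()
        except AcmeProblem as err:
            transient = err.status >= 500 or err.type in RETRIABLE_TYPES
            if not transient or attempt == attempts:
                raise  # a real problem on our side, or out of retries
            print(f"attempt {attempt}: {err.type} ({err.status}); retrying in {delay}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

In our case a manual retry about 50 minutes later was enough, so a backoff on the order of minutes to an hour would have ridden this out without intervention.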
I can check the Boulder source code to see what causes the 500/'Problem getting authorization' response, but thought I would post here first in case anyone else has encountered this.
The function `probs.ServerInternal("Problem getting authorization")` can be called from six different locations in `/wfe2/wfe.go`, making it practically impossible to determine the exact reason for this internal server error from the information in the server's response. (Of course LE sysops could dig into their logs, but we can't.)
That said, an internal server error is just that: an error internal to Boulder. Almost by definition, it's not something you triggered, nor something you could fix on your end.
I would only worry about it if the error repeated itself multiple times over a longer time span and you were the only user affected.
LE keeps count of all errors per unit of time, and their sysops would get a notification if something were structurally wrong.
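In client-side terms, that advice might look something like the sketch below (entirely illustrative; the class name, threshold, and window are invented): note a lone serverInternal and move on, and only flag it if it keeps recurring over a longer window.

```python
import time
from collections import deque

class ServerErrorWatch:
    """Track serverInternal responses client-side and flag only a sustained
    pattern, not a one-off."""
    def __init__(self, threshold=3, window_seconds=6 * 3600):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()

    def record(self, problem_type):
        """Return True only if serverInternal errors keep recurring."""
        if problem_type != "urn:ietf:params:acme:error:serverInternal":
            return False
        now = time.time()
        self.events.append(now)
        # Drop events that fell out of the observation window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

Something like `watch.record(problem["type"])` after each failed poll, escalating only when it returns True, would match the "only worry if it repeats" advice.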
Thanks @osiris (and @rg305 and @MikeMcQ) - I looked at wfe.go, sa-wrappers.go, and pb-marshalling.go as well and concluded that knowing the exact cause wasn't possible from the error info returned to the client. I will watch for additional 500s, but I'm guessing this was a one-off. Thanks again!
I actually got paged for this: there was a brief outage yesterday when a restart of a component of our database stack led to some 500s while connections to the database were re-established. Sorry for any inconvenience we may have caused.