Cert-Manager Issuance Backoff

Hi,

My domain is: hflawreport.com

We have had an issue with cert-manager running in our k8s cluster that has led to many failed requests. We believe the issue is now resolved and have been able to renew one certificate. However, a number of others, including the domain mentioned above, are now reporting "Backing off from issuance due to previously failed issuance(s). Issuance will next be attempted at ....".

Is there any way to reset this failure count?

That's probably a very good question for the cert-manager support channel(s). There are some volunteers here with knowledge about k8s and cert-manager, but I don't think those are daily visitors of the Community.

Thanks, I guess I wasn't sure if cert-manager was just respecting a back-off from an API.

Thanks, I will check with the cert-manager channel.

Well, from the limited information you've provided I'm not sure either :man_shrugging:t2: But the text mentions a previously failed issuance, so it's already backing off without trying again, right? To me that would mean the current back-off is due to some kind of built-in cert-manager mechanism. But I don't have any experience with cert-manager, so I'm not sure about that. My reading of the text might be incorrect.

Without proper logs it's just an educated guess.

Hello @IanJ, welcome to the Let's Encrypt community. :slightly_smiling_face:

When you opened this thread in the Help section, you should have been provided with a questionnaire. Maybe you didn't get it somehow (which is weird), or you've decided to delete it. In any case, all the answers to this questionnaire are required:

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is:

I ran this command:

It produced this output:

My web server is (include version):

The operating system my web server runs on is (include version):

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don't know):

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):

Thank you for assisting us in helping YOU!

@IanJ, using the online tool Let's Debug yields these results:
https://letsdebug.net/hflawreport.com/1924029

There are several of these warnings: "HTTP response 503 Service Unavailable"

UnexpectedHttpResponse
WARNING
Sending an ACME HTTP validation request to hflawreport.com results in unexpected HTTP response 503 Service Unavailable. This indicates that the webserver is misconfigured or misbehaving.
503 Service Unavailable

<!DOCTYPE html>
<html>
<head>
<title>Service Unavailable</title>
</head>
<body>
<h1>Service Unavailable</h1>
<p>
The site you requested is currently unavailable.
</p>
</body>
</html>

Trace:
@0ms: Making a request to http://hflawreport.com/.well-known/acme-challenge/letsdebug-test (using initial IP 151.101.1.208)
@0ms: Dialing 151.101.1.208
@3ms: Server response: HTTP 301 Moved Permanently
@3ms: Received redirect to https://www.hflawreport.com/.well-known/acme-challenge/letsdebug-test
@3870ms: Dialing 151.101.129.208
@3933ms: Server response: HTTP 503 Service Unavailable

Thanks, yes, the domain is hosted by Fastly CDN, which is returning the error because of the invalid certificate from the k8s cluster.

I saw in the cert-manager code that there is an exponential back-off here: cert-manager/pkg/controller/certificates/trigger/trigger_controller.go at aa17b34edea7d0cce6efaa099053b03606d3f84e · cert-manager/cert-manager · GitHub

We have cleared any old requests we can find, but it's stuck in this back-off process.
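
As a rough illustration of how an exponential back-off like this behaves, here is a minimal sketch. It is not cert-manager's actual code: the function names are made up, and the 1-hour starting delay, the doubling, and the 32-hour cap are assumptions that may not match the commit linked above.

```python
from datetime import datetime, timedelta, timezone

# Assumed constants, for illustration only; the linked commit may differ.
BASE_DELAY = timedelta(hours=1)
MAX_DELAY = timedelta(hours=32)


def backoff_delay(failed_attempts: int) -> timedelta:
    """Delay before the next attempt after `failed_attempts` consecutive failures."""
    if failed_attempts < 1:
        return timedelta(0)
    # Clamp the exponent so a large failure count cannot produce a huge number.
    exponent = min(failed_attempts - 1, 5)
    return min(BASE_DELAY * (2 ** exponent), MAX_DELAY)


def next_attempt(last_failure: datetime, failed_attempts: int) -> datetime:
    """Earliest time a new issuance would be attempted under this schedule."""
    return last_failure + backoff_delay(failed_attempts)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for n in range(1, 8):
        print(n, backoff_delay(n), next_attempt(last, n).isoformat())
```

Under these assumptions the wait depends only on the stored failure count and the time of the last failure, which would explain why deleting old requests alone does not shorten it.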

However, the HTTP-01 challenge documentation states:
"Our implementation of the HTTP-01 challenge follows redirects, up to 10 redirects deep. It only accepts redirects to “http:” or “https:”, and only to ports 80 or 443. It does not accept redirects to IP addresses. When redirected to an HTTPS URL, it does not validate certificates (since this challenge is intended to bootstrap valid certificates, it may encounter self-signed or expired certificates along the way)."

I would think that for any proper ACME HTTP-01 validation through a CDN, the CDN should behave the same way Let’s Encrypt does for the HTTP-01 challenge.

I suggest checking with Fastly CDN as well.
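
The redirect rules quoted above can be sketched as a small check. This is only an illustration of the quoted text, not Let's Encrypt's implementation; the function names `redirect_allowed` and `chain_allowed` are made up here.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_REDIRECTS = 10
ALLOWED_PORTS = {80, 443}
DEFAULT_PORTS = {"http": 80, "https": 443}


def redirect_allowed(url: str) -> bool:
    """Check one redirect target against the quoted HTTP-01 rules."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        return False  # only "http:" and "https:" are accepted
    host = parts.hostname
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # redirects to IP addresses are not accepted
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    try:
        port = parts.port
    except ValueError:
        return False  # malformed or out-of-range port
    if port is None:
        port = DEFAULT_PORTS[parts.scheme]
    # No certificate check happens here: per the quote, the validator
    # does not validate certificates when following HTTPS redirects.
    return port in ALLOWED_PORTS


def chain_allowed(redirects: list[str]) -> bool:
    """Check a whole redirect chain, including the depth limit."""
    return len(redirects) <= MAX_REDIRECTS and all(
        redirect_allowed(u) for u in redirects
    )
```

For example, the redirect in the trace above, to https://www.hflawreport.com/.well-known/acme-challenge/letsdebug-test, passes this check, so the redirect itself is not the problem.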

For clarity, the 503 comes from the CDN, not from Let's Encrypt: the CDN has determined that the cert the backend serves to it (the cert is served to the CDN, not to Let's Encrypt) is invalid.

As someone with direct experience with this type of situation (Cloudflare in front of AKS in my case), I recommend either temporarily reducing the "strictness" of Fastly to acquire your cert with HTTP-01 then adjusting back and ensuring in future the cert always renews correctly OR switching to DNS-01 to acquire your certs (a more robust solution).
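
To illustrate why DNS-01 sidesteps the CDN: the proof is a TXT record rather than an HTTP response, so no request ever passes through Fastly or reaches the backend. A minimal sketch of how the record is derived per RFC 8555 follows. The function name and the placeholder key authorization are made up for illustration; in practice cert-manager builds and publishes this record itself through a configured DNS provider.

```python
import base64
import hashlib


def dns01_record(domain: str, key_authorization: str) -> tuple[str, str]:
    """Return the (name, value) of the TXT record for a DNS-01 challenge."""
    # Value: unpadded base64url encoding of the SHA-256 digest
    # of the key authorization string.
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    name = "_acme-challenge." + domain.rstrip(".")
    return name, value


if __name__ == "__main__":
    # "token.thumbprint" is a placeholder, not a real key authorization.
    print(dns01_record("hflawreport.com", "token.thumbprint"))
```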

Correct. I suggest that if the CDN is going to sit in the path of the HTTP-01 challenge, it too should behave like Let's Encrypt does for the HTTP-01 challenge with regard to validating a certificate.

I agree, @Bruce5051. :slightly_smiling_face: I think I misinterpreted your recommendation earlier, and I have just modified my post above in a similar light.

That makes sense. It was failing the challenge because the CDN rejected the request. We got ourselves into a bad situation because the certificate expired.

Thanks @Bruce5051 & @griffin for the pointer!

Not how I wanted the week to end!

Well, this is where the situation stands, and we are trying to work out how to move forward and resolve it. :slight_smile:
And how to avoid this situation in the future.

I hear you.
