Troubleshoot authorization error

Hi, I am having trouble authorizing the domain pp.myplan.on.bluecross.ca. I am using kube-lego for automatically providing LE certificates for my ingress services in Google Kubernetes Engine. I have another domain, bluecross.demo.direct.getbreathe.life, which is on the same IP and it works correctly. The authorization error I get doesn’t provide any information:

error while authorizing: waiting for authorization failed: acme: authorization error for : " context=acme domain=pp.myplan.on.bluecross.ca

pp.myplan.on.bluecross.ca is not a domain I control myself; it is controlled by our client. They initially created an A record pointing to our IP address. When that didn’t work, I asked them to create a CNAME record that points to our own domain bluecross.demo.direct.getbreathe.life instead, but it still doesn’t work.

According to this: https://check-your-website.server-daten.de/?q=pp.myplan.on.bluecross.ca
The http-01 challenge should work.

Is there a way to get more information about the authorization error on LE side?

Hi @fallard84

Is 34.98.76.127 the IP address your kube-lego is running on?

http + https have a

Server: nginx/1.17.2

http + /.well-known/acme-challenge doesn't have a Server header; instead, there is a

Visible Content: kube-lego (version 0.1.6-61705680) - 404 not found

Hi,
kube-lego is not running directly on 34.98.76.127, this is the GCE load balancer IP address that is used by the Kubernetes ingress. The root of the domain is going to a nginx server with basic authentication, and the /.well-known/acme-challenge is served by kube-lego directly.

But it's the correct network, not a completely different one?

So the different headers are expected, that's good (and not a problem).

I'm not very familiar with kubenet. But then it looks like an internal kubenet problem.

Yes, it is the correct network.

Is there a way to have more details on the authorization error from LE perspective when it is doing the challenge?

Is there any particular reason you’re using kube-lego? According to https://github.com/jetstack/kube-lego the project is in maintenance mode and Jetstack recommends switching over to their other tool called cert-manager which has debugging steps https://docs.cert-manager.io/en/latest/reference/orders.html.

To debug kube-lego, per the documentation, you can set LEGO_LOG_LEVEL=debug. By default it’s set to LEGO_LOG_LEVEL=info.

We started using kube-lego a while ago and it worked fine for other applications that use the same design. We were planning to migrate to cert-manager but haven’t had a chance yet. We need to get this certificate working ASAP, so migrating to cert-manager wasn’t my first choice to resolve this issue, and it might not fix it if the problem is elsewhere. But this issue will definitely accelerate our migration.

We already have LEGO_LOG_LEVEL=debug but we don’t get any detail regarding the authorization error…

Also, I just tried adding another domain to the certificate request: bluecross.demo.tripnik.com. It got authorized right away without any issue. I only added an A record to point to the same IP address 34.98.76.127. So the problem must be with the DNS of pp.myplan.on.bluecross.ca but I can’t see what could be causing the issue…

The problem is that your client doesn't show the error message.

Is there an order url you can share?

I don’t see any order URL in the kube-lego logs. I only get a URL for a successful authorization with the validationRecord, but nothing when it fails. Is it possible to get it some other way than from the client?

Something strange that I’m noticing with your DNS server is that sometimes queries take 10+ seconds and sometimes return within a second or two. Any query longer than 10 seconds will hit the boulder-va dnsTimeout and fail. As a result, you may be encountering a rate limit for too many failed authorizations. https://letsencrypt.org/docs/rate-limits
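To get a rough feel for that variance yourself, here is a minimal sketch that times repeated lookups and flags any that exceed the 10-second figure mentioned above. The function names and the `VA_DNS_TIMEOUT` constant are made up for illustration. It also goes through your local resolver (and its cache) and looks up addresses rather than CAA records, so treat the numbers as a hint and not as what the Let's Encrypt validation servers actually see.

```python
import socket
import time

# 10 seconds is the timeout quoted in the post above; the name is hypothetical.
VA_DNS_TIMEOUT = 10.0


def time_lookups(lookup, name, attempts=5):
    """Call lookup(name) several times and return each elapsed time in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            lookup(name)
        except OSError:
            # A failed lookup still counts: only the elapsed time matters here.
            pass
        durations.append(time.monotonic() - start)
    return durations


def too_slow(durations, limit=VA_DNS_TIMEOUT):
    """Return the durations that exceeded the limit."""
    return [d for d in durations if d > limit]


if __name__ == "__main__":
    times = time_lookups(socket.gethostbyname, "pp.myplan.on.bluecross.ca")
    print("durations:", [round(t, 2) for t in times])
    print("over the limit:", too_slow(times))
```

Querying the authoritative name servers directly for CAA records would be a closer match, but that needs a DNS library or a tool like dig.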

If kube-lego isn’t giving you the verbosity you need, it should be dumped. Especially considering that cert-manager is the path forward from kube-lego and that bluecross.ca is a medical company.

Are there any challenge or authz URLs? You can open them in a browser (for now) and see what error messages Let's Encrypt might have returned.

Not really. Let's Encrypt (almost always) returns detailed error information to your client. If your client chooses to discard everything, the only other option is for Let's Encrypt staff to manually check the server-side records.
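If you do get hold of an authz URL, the error details sit inside the JSON it returns. Below is a minimal sketch of pulling them out. It assumes the usual ACME authorization shape (a `challenges` list whose failed entries carry an `error` object with a `detail` string) and that a plain GET on the URL is allowed, which may not stay true. `fetch_authz`, `challenge_errors` and the `SAMPLE` document are hypothetical, not part of kube-lego or any Let's Encrypt tooling.

```python
import json
import urllib.request


def fetch_authz(url, timeout=15):
    """Fetch an authorization object with a plain GET (assumed to be permitted)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def challenge_errors(authz):
    """Return (challenge type, status, error detail) for every failed challenge."""
    failures = []
    for challenge in authz.get("challenges", []):
        error = challenge.get("error")
        if error:
            failures.append(
                (challenge.get("type"), challenge.get("status"), error.get("detail"))
            )
    return failures


# Hypothetical sample, shaped like an ACME authorization object.
SAMPLE = {
    "identifier": {"type": "dns", "value": "pp.myplan.on.bluecross.ca"},
    "status": "invalid",
    "challenges": [
        {
            "type": "http-01",
            "status": "invalid",
            "error": {
                "detail": "DNS problem: query timed out looking up CAA "
                          "for myplan.on.bluecross.ca"
            },
        },
        {"type": "dns-01", "status": "pending"},
    ],
}

if __name__ == "__main__":
    for ctype, status, detail in challenge_errors(SAMPLE):
        print(ctype, status, detail)
```

To use it against a real authorization, pass the URL from your client's logs to `fetch_authz` and feed the result to `challenge_errors`.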

I modified the kube-lego code so that it now outputs the authorization URL, so now I have something to work with :slight_smile:

https://acme-v01.api.letsencrypt.org/acme/authz-v3/399439278

DNS problem: query timed out looking up CAA for myplan.on.bluecross.ca

Any reason it is looking for the CAA of myplan.on.bluecross.ca instead of pp.myplan.on.bluecross.ca?


Let’s Encrypt examines the CAA records at every level: pp.myplan.on.bluecross.ca, myplan.on.bluecross.ca, on.bluecross.ca, bluecross.ca and even ca.

It uses the most specific (in other words, farthest left) CAA records found. (If none are found, Let’s Encrypt and every other CA are permitted to issue.)

If, going left to right, Let’s Encrypt gets a DNS error before it gets any CAA records, Let’s Encrypt fails the validation and returns an error.

(As an implementation detail, it sends all the DNS queries simultaneously, but the algorithm makes decisions in a specific order.)

If your DNS service is unreliable, it might help to create a CAA record set like 0 issue "letsencrypt.org" for pp.myplan.on.bluecross.ca. That way only one query has to succeed, instead of all of them, so less luck is required.

Edit: I accidentally wrote myplan.on.bluecross.ca instead of pp.myplan.on.bluecross.ca.
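To make the decision order concrete, here is a small sketch of the tree-climbing logic described in this post, run against a simulated zone instead of real DNS. It is an illustration and not Let's Encrypt's actual code: `DNSFailure`, `candidate_names`, `relevant_caa` and `make_lookup` are made-up names, and details such as CNAME handling are left out.

```python
class DNSFailure(Exception):
    """Stands in for a timeout or SERVFAIL from the resolver."""


def candidate_names(domain):
    """List the names to check, from most specific to least specific."""
    labels = domain.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def relevant_caa(domain, lookup):
    """Return (name, records) for the most specific non-empty CAA record set.

    lookup(name) returns a list of records (possibly empty) or raises
    DNSFailure. A failure reached before any records are found propagates,
    which models the validation error. Returns (None, []) if nothing is found.
    """
    for name in candidate_names(domain):
        records = lookup(name)
        if records:
            return name, records
    return None, []


def make_lookup(zone, failing=()):
    """Build a fake lookup: names in `failing` time out, others read from `zone`."""
    def lookup(name):
        if name in failing:
            raise DNSFailure("query timed out looking up CAA for " + name)
        return zone.get(name, [])
    return lookup


DOMAIN = "pp.myplan.on.bluecross.ca"
FLAKY = {"myplan.on.bluecross.ca"}  # simulate the unreliable parent lookup

if __name__ == "__main__":
    # With a record set on the exact name, the flaky parent is never consulted.
    with_record = make_lookup({DOMAIN: ['0 issue "letsencrypt.org"']}, FLAKY)
    print(relevant_caa(DOMAIN, with_record))

    # Without it, the climb reaches the flaky parent and the check fails.
    try:
        relevant_caa(DOMAIN, make_lookup({}, FLAKY))
    except DNSFailure as exc:
        print("validation would fail:", exc)
```

The second case mirrors the error in this thread: with no CAA records on pp.myplan.on.bluecross.ca, the outcome depends on the lookup for myplan.on.bluecross.ca succeeding.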


Thanks @mnordhoff. From a security perspective, it makes plenty of sense that it checks every level until a record is found or none exists. We will make the DNS change to add the CAA record, and I am quite confident it will work afterward. If not, I now have access to the authorization URL :slight_smile:

Thank you @JuergenAuer, @Phil for your support! :slight_smile:


Ah, now debugging is possible! :+1:

Just to confirm, the issue has now been resolved with the new CAA record :smiley:
Cheers!
