Unable to create certificate - Problems with CAA records using ACME terraform provider

Hi folks, I am getting errors trying to renew a certificate (it has renewed correctly for the last 4 years with no issues). I would really appreciate help fixing this.

My domain is: *.us-west1.gcp.cloud.es.io (*.ent.us-west1.gcp.cloud.es.io, *.es.us-west1.gcp.cloud.es.io, *.fleet.us-west1.gcp.cloud.es.io, *.kb.us-west1.gcp.cloud.es.io, and 34 more domains)

I ran this command: terraform apply (with the vancluever Terraform provider 2.41.0, also 2.10.0; DNS challenge)

It produced this output:

│ [*.apm.psc.us-west1.gcp.cloud.es.io] acme: error: 403 :: POST :: https://acme-v02.api.letsencrypt.org/acme/finalize/125525485/466805120136 :: urn:ietf:params:acme:error:caa :: Error finalizing order :: Rechecking CAA for "*.profiling.us-west1.gcp.cloud.es.io" and 1 more identifiers failed. Refer to sub-problems for more information, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.profiling.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.app-search.psc.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for app-search.psc.us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning

The operating system my web server runs on is (include version): "hashicorp/terraform:1.14.2" docker image

My hosting provider, if applicable, is: GCP

I can login to a root shell on my machine (yes or no, or I don't know): no

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): vancluever terraform provider 2.41.0 ( also 2.10.0 )

I have tried creating new CAA records to fix the issue, but it is still there:

Adding a CAA record at us-west1.gcp.cloud.es.io with the values 0 issue "letsencrypt.org" and 0 issue "pki.goog"

The error I get:


│ *.us-west1.gcp.elastic-cloud.com: acme: error: 403 :: POST :: https://acme-v02.api.letsencrypt.org/acme/finalize/125525485/466805120136 :: urn:ietf:params:acme:error:caa :: Error finalizing order :: Rechecking CAA for "*.app-search.psc.us-west1.gcp.cloud.es.io" and 1 more identifiers failed. Refer to sub-problems for more information, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.app-search.psc.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.profiling.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for profiling.us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning

Adding the CAA record at gcp.cloud.es.io instead, with the values 0 issue "letsencrypt.org" and 0 issue "pki.goog", I get this output:

.us-west1.gcp.elastic-cloud.com: acme: error: 403 :: POST :: https://acme-v02.api.letsencrypt.org/acme/finalize/125525485/466805120136 :: urn:ietf:params:acme:error:caa :: Error finalizing order :: Rechecking CAA for ".ent.psc.us-west1.gcp.cloud.es.io" and 5 more identifiers failed. Refer to sub-problems for more information, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.ent.psc.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for ent.psc.us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.apm.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.es.psc.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for es.psc.us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.app-search.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for app-search.us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.ent-search.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning, problem: "urn:ietf:params:acme:error:caa" :: Error finalizing order :: rechecking caa: While processing CAA for *.apm.psc.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL looking up CAA for us-west1.gcp.cloud.es.io - the domain's nameservers may be malfunctioning
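With this many identifiers in one order, it can help to reduce the error text to a list of which CAA lookups actually failed. The following is a minimal sketch, assuming the message keeps the "While processing CAA for ... DNS problem: ... looking up CAA for ..." wording seen above; the function name failed_caa_lookups is hypothetical and not part of any ACME client.

```python
import re

# Assumption: each sub-problem in the ACME error text follows the wording
# seen in the output above. If the wording changes, the pattern needs updating.
_CAA_FAILURE = re.compile(
    r"While processing CAA for (?P<identifier>\S+): DNS problem: "
    r"(?P<status>\w+) looking up CAA for (?P<queried>\S+)"
)


def failed_caa_lookups(error_text):
    """Return sorted, de-duplicated (identifier, queried name, status) tuples."""
    hits = {
        (m["identifier"], m["queried"], m["status"])
        for m in _CAA_FAILURE.finditer(error_text)
    }
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "rechecking caa: While processing CAA for "
        "*.profiling.us-west1.gcp.cloud.es.io: DNS problem: SERVFAIL "
        "looking up CAA for us-west1.gcp.cloud.es.io - the domain's "
        "nameservers may be malfunctioning"
    )
    for identifier, queried, status in failed_caa_lookups(sample):
        print(f"{identifier}: {status} at {queried}")
```

Note that the queried name is not always the identifier with the wildcard removed; in several sub-problems above it is a parent name such as us-west1.gcp.cloud.es.io.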

It appears that for some reason, DNS lookups for cloud.es.io are returning NXDOMAIN which indicates that there are no subdomains. If you're able to do so, try adding any DNS record at cloud.es.io.
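For context on why a parent name like cloud.es.io matters here: a CAA check does not stop at the requested name. Per RFC 8659, the CA strips the wildcard label and then climbs the DNS tree toward the root until it finds a CAA record set, so every ancestor along the way has to answer cleanly. A minimal sketch of that search order follows (the helper name caa_lookup_chain is hypothetical; this only lists candidate names and does not send DNS queries):

```python
def caa_lookup_chain(identifier):
    """List the names a CAA search may query, most specific first.

    The search stops at the first name that returns a CAA record set,
    so names later in the list are only reached when earlier ones are empty.
    """
    name = identifier.rstrip(".").lower()
    if name.startswith("*."):
        # The wildcard label itself is not queried.
        name = name[2:]
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


if __name__ == "__main__":
    for candidate in caa_lookup_chain("*.profiling.us-west1.gcp.cloud.es.io"):
        print(candidate)
```

A SERVFAIL at any name in this chain fails the check, which is different from an empty answer that lets the search continue upward.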

Thanks. We manage es.io. We have tried adding a TXT record, and the dig output looks better:

dig cloud.es.io

; <<>> DiG 9.20.17 <<>> cloud.es.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13772
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;cloud.es.io.                   IN      A

but we are still getting the same CAA errors unfortunately.

To add some more context, we have 4 more certificates that are identical except for the region name, and those were created successfully.

I waited 7 days for the original request to expire (https://acme-v02.api.letsencrypt.org/acme/finalize/125525485/466805120136).
After it expired, I tested again, but the new request still failed: [*.us-west1.gcp.elastic-cloud.com] acme: error: 403 :: POST :: https://acme-v02.api.letsencrypt.org/acme/finalize/125525485/469263491186 :: urn:ietf:params:acme:error:caa :: Error finalizing order :: Rechecking CAA for

Then I created a copy of the Terraform workspace, which forced the creation of a new registration key. With that, the certificate was created successfully.

For some reason, it failed for more than 7 days with the same registration key, but it worked fine with a clean new key. Why is that?

Don't quote me on this as this is a guess and not something I know for sure, but I would assume the old key was associated with the failed request.

It may not be related to the changed key specifically.

The new key means you would have gotten a new order. The sequence of DNS queries for a new order is different from the one behind the CAA recheck error you saw originally.

The CAA recheck is done when an existing order has authorizations that can be reused, but LE needs to (re-)check CAA to ensure it can still issue for those domains.
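The recheck decision described above can be pictured roughly like this. This is only an illustrative sketch, not Let's Encrypt's actual code: the function name needs_caa_recheck is hypothetical, and the 8-hour window is an assumption used for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: CAA results older than about 8 hours are
# considered stale and must be checked again before issuance.
CAA_RECHECK_AFTER = timedelta(hours=8)


def needs_caa_recheck(validated_at, now, window=CAA_RECHECK_AFTER):
    """Return True when a reused authorization is old enough to need a fresh CAA check."""
    return now - validated_at > window


if __name__ == "__main__":
    now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
    fresh = now - timedelta(hours=1)   # validated moments ago: no recheck
    reused = now - timedelta(days=3)   # authorization reused days later: recheck
    print(needs_caa_recheck(fresh, now), needs_caa_recheck(reused, now))
```

Under that model, an account that keeps reusing old authorizations would keep hitting the recheck path, while a new account has nothing to reuse and goes through full validation instead.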

A new order will have all the DNS queries related to new authorizations (ex: TXT records for your wildcard names) and also the CAA checks.

I realize this isn't actionable info. Just thought it might allow other insights.

There are a fairly large number of domain names in your cert, which makes it difficult to do a detailed assessment of any other changes. Some of your domain names do have oddities in their DNS config (per https://dnsviz.net), but nothing obvious that relates to the above situation. Still, when a system is suffering DNS query errors, it is generally worth reviewing any DNS config problems.

If the problem repeats it may be worth reducing the number of domain names in a single cert. If nothing else it might help debug later.

That's all I have. It is a bit strange.

One potential explanation is that the combination of multiple DNS providers, with name servers in multiple zones that are served by multiple DNS servers, is causing some of Let's Encrypt's resolvers to fail because they need to send so many queries.
delv +ns us-west1.gcp.cloud.es.io caa consistently fails because it would need to send over 50 queries to resolve your domain.

This limit is reset on CNAME and DNAME referrals, so you might be able to rename your gcp subdomain to gcp-1 and add the DNS record gcp.cloud.es.io. DNAME gcp-1.cloud.es.io. However, I don't know what effect this will actually have.

Thanks for the replies.
As a next step, we will reduce each certificate to a single domain instead of multiple domains. That will shrink the certificates and isolate each one to a single domain. Hopefully this will simplify things and reduce the number of operations per certificate.
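The split itself is simple to express. Below is a minimal sketch of grouping a long name list into per-certificate batches; the helper name split_into_certificates and the per_cert parameter are hypothetical, and how the batches are fed into the Terraform configuration is left out.

```python
def split_into_certificates(names, per_cert=1):
    """Group domain names into batches, one batch per certificate.

    Duplicates are dropped while keeping the original order.
    """
    if per_cert < 1:
        raise ValueError("per_cert must be at least 1")
    unique = list(dict.fromkeys(names))
    return [unique[i:i + per_cert] for i in range(0, len(unique), per_cert)]


if __name__ == "__main__":
    names = [
        "*.us-west1.gcp.cloud.es.io",
        "*.ent.us-west1.gcp.cloud.es.io",
        "*.es.us-west1.gcp.cloud.es.io",
    ]
    for batch in split_into_certificates(names):
        print(batch)
```

With one name per certificate, a CAA failure for one name no longer blocks issuance for the others, and the error output points at a single identifier.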
