Rechecking CAA fails with 99 identical subproblems

I’m surprised to see 100 subproblems that are all identical. Does this mean that the LE system tried to recheck CAA for this domain 100 times before giving up?

I ran this command:

certbot certonly --webroot --staging --csr /var/cert_manager/certs/san132.csr -w /var/www/html -d ace.pusd.org -d admin.ace.pusd.org -d admin.alcott.pusd.org -d admin.anderson.moreland.org -d admin.armstrong.pusd.org -d admin.arroyo.pusd.org -d admin.baker.moreland.org -d admin.barfield.pusd.org -d admin.bradfield.hpisd.org -d admin.cortez.pusd.org -d admin.countrylane.moreland.org -d admin.decker.pusd.org -d admin.dellagoacademy.org -d admin.es.moodyisd.org -d admin.es.nscougars.com -d admin.hms.huntingtonisd.com -d admin.hpisd.org -d admin.hs.hpisd.org -d admin.hs.nscougars.com -d admin.hsd153.org -d admin.hubbardisd.com -d admin.huntingtonisd.com -d admin.hyer.hpisd.org -d admin.jameshart.hsd153.org -d admin.kellogg.pusd.org -d admin.kingsley.pusd.org -d admin.leadership.kippneworleans.org -d admin.lincoln.pusd.org -d admin.lopez.pusd.org -d admin.marshall.pusd.org -d admin.mcems.cherokee.k12.nc.us -d admin.mesacharter.org -d admin.mhs.cherokee.k12.nc.us -d admin.mishpms.hpisd.org -d admin.mms.cherokee.k12.nc.us -d admin.montvue.pusd.org -d admin.mrhs.hwrsd.org -d admin.ms.moodyisd.org -d admin.ms.nscougars.com -d admin.nbfacademy.org -d admin.nscougars.com -d admin.parkwest.pusd.org -d admin.pes.cherokee.k12.nc.us -d admin.popcs.org -d admin.ranchhills.pusd.org -d admin.sacredheartacademy.org -d admin.sanjose.pusd.org -d admin.tcec.cherokee.k12.nc.us -d admin.toa.cherokee.k12.nc.us -d admin.up.hpisd.org -d aes.cherokee.k12.nc.us -d ams.cherokee.k12.nc.us -d ar.hpisd.org -d armstrong.hpisd.org -d arroyo.pusd.org -d bookertwashington.kippneworleans.org -d br.hpisd.org -d bradfield.hpisd.org -d centralcityacademy.kippneworleans.org -d cherokee.k12.nc.us -d churchill.hsd153.org -d es.moodyisd.org -d es.nscougars.com -d hdems.cherokee.k12.nc.us -d hdhs.cherokee.k12.nc.us -d hes.huntingtonisd.com -d hhs.huntingtonisd.com -d hms.huntingtonisd.com -d hpisd.org -d hpms.hpisd.org -d hs.hpisd.org -d hs.moodyisd.org -d hs.nscougars.com -d hsd153.org -d huntingtonisd.com -d 
hy.hpisd.org -d hyer.hpisd.org -d jameshart.hsd153.org -d mcems.cherokee.k12.nc.us -d mis.hpisd.org -d mishpms.hpisd.org -d mms.cherokee.k12.nc.us -d ms.nscougars.com -d mt.hwrsd.org -d nscougars.com -d pes.cherokee.k12.nc.us -d popcs.org -d sacredheartacademy.org -d tcec.cherokee.k12.nc.us -d toa.cherokee.k12.nc.us -d universitypark.hpisd.org -d up.hpisd.org -d wms.hwrsd.org -d www.hpisd.org -d www.inspiredteachingschool.org -d www.moodyisd.org -d www.nscougars.com -d www.payne.moreland.org -d www.popcs.org -d www.sacredheartacademy.org

It produced this output:

Error finalizing order :: Rechecking CAA for "admin.mrhs.hwrsd.org" and 99 more identifiers failed. Refer to sub-problems for more information

Since the command that generates my SAN certificate includes 100 domains in total, I read the message “... and 99 more identifiers failed” as saying that all 100 had failed the CAA recheck.

However, in the log, all 100 subproblems are about admin.mrhs.hwrsd.org, and none are about any of the other domains in the original command:

Content-Type: application/problem+json
Transfer-Encoding: chunked
Connection: keep-alive
Boulder-Requester: 12404031
Cache-Control: public, max-age=0, no-cache
Link: <https://acme-staging-v02.api.letsencrypt.org/directory>;rel="index"
Replay-Nonce: 0002Cyc6K095FkRG2eFShQx1DKIrJJt8azieih4SXFg7xIk

{
  "type": "urn:ietf:params:acme:error:caa",
  "detail": "Error finalizing order :: Rechecking CAA for \"admin.mrhs.hwrsd.org\" and 99 more identifiers failed. Refer to sub-problems for more information",
  "status": 403,
  "subproblems": [
    {
      "type": "urn:ietf:params:acme:error:urn:ietf:params:acme:error:caa",
      "detail": "Error finalizing order :: While processing CAA for admin.mrhs.hwrsd.org: DNS problem: SERVFAIL looking up CAA for admin.mrhs.hwrsd.org - the domain's nameservers may be malfunctioning",
      "status": 403,
      "identifier": {
        "type": "dns",
        "value": "admin.mrhs.hwrsd.org"
      }
    },
    {
      "type": "urn:ietf:params:acme:error:urn:ietf:params:acme:error:caa",
      "detail": "Error finalizing order :: While processing CAA for admin.mrhs.hwrsd.org: DNS problem: SERVFAIL looking up CAA for admin.mrhs.hwrsd.org - the domain's nameservers may be malfunctioning",
      "status": 403,
      "identifier": {
        "type": "dns",
        "value": "admin.mrhs.hwrsd.org"
      }
    },

That identical subproblem for admin.mrhs.hwrsd.org is repeated 99 times.

My web server is (include version):
Apache

The operating system my web server runs on is (include version):
Ubuntu 18

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot):
certbot 0.31.0

I’m about to write a PR for our system that will let it retry without the hostname named in the error message Error finalizing order :: Rechecking CAA for "admin.mrhs.hwrsd.org", but I’m a little concerned that the log output may be wrong and that all 100 domains are actually failing.
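
A minimal sketch of the retry idea described above, assuming the problem document has already been extracted from the Certbot log as a JSON string. The helper names (`failing_identifiers`, `domains_to_retry`) are hypothetical and are not part of Certbot or Boulder. Repeated subproblems are collapsed, so the duplication seen in this thread does not affect which names get dropped.

```python
import json


def failing_identifiers(problem_json):
    """Return the distinct identifier values named in the subproblems, in order."""
    problem = json.loads(problem_json)
    seen = []
    for sub in problem.get("subproblems", []):
        value = sub.get("identifier", {}).get("value")
        if value and value not in seen:
            seen.append(value)
    return seen


def domains_to_retry(all_domains, problem_json):
    """Return the requested domains minus every name reported as failing."""
    failed = set(failing_identifiers(problem_json))
    return [d for d in all_domains if d not in failed]


# Trimmed-down version of the problem document from the log above.
EXAMPLE_PROBLEM = json.dumps({
    "type": "urn:ietf:params:acme:error:caa",
    "status": 403,
    "subproblems": [
        {"status": 403, "identifier": {"type": "dns", "value": "admin.mrhs.hwrsd.org"}},
        {"status": 403, "identifier": {"type": "dns", "value": "admin.mrhs.hwrsd.org"}},
    ],
})

if __name__ == "__main__":
    requested = ["mt.hwrsd.org", "admin.mrhs.hwrsd.org", "wms.hwrsd.org"]
    print(failing_identifiers(EXAMPLE_PROBLEM))
    print(domains_to_retry(requested, EXAMPLE_PROBLEM))
```

Reading names from the `identifier` field rather than from the `detail` string avoids depending on the exact wording of the error message.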

I’m confused…
You are using a .csr file and also detailing all the domains with -d in the same call.
That seems redundant.

There do seem to be a couple of real problems in here:

First, a simple one. A subproblem’s type should not include the urn:ietf:params:acme:error: prefix twice.

Second, the duplicate subproblems do seem like a bug. While Let’s Encrypt does look up the CAA records multiple times per authorization (multi-VA, and assuming 8 hours have elapsed since the authz first went into valid status), it should not produce multiple subproblems.
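
As a rough illustration of the recheck condition mentioned above, here is a sketch that assumes the 8-hour figure from this post. The function and constant names are hypothetical, this is not Boulder’s code, and the real threshold and comparison may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 8-hour window quoted in the post above.
CAA_RECHECK_WINDOW = timedelta(hours=8)


def needs_caa_recheck(validated_at, now=None, window=CAA_RECHECK_WINDOW):
    """Return True if the authorization became valid longer ago than the window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - validated_at > window


if __name__ == "__main__":
    finalize_time = datetime(2020, 2, 1, 12, 0, tzinfo=timezone.utc)
    fresh_authz = finalize_time - timedelta(hours=1)
    old_authz = finalize_time - timedelta(days=3)
    print(needs_caa_recheck(fresh_authz, now=finalize_time))
    print(needs_caa_recheck(old_authz, now=finalize_time))
```

Under this reading, an order that reuses an authorization validated days earlier has its CAA records looked up again at finalization, which is the step that failed here.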

I have a slight suspicion about the cause. I think that your CSR contains admin.mrhs.hwrsd.org (or some other identifier) multiple times, and then Let’s Encrypt is failing to do proper checking/filtering of it before using it to recheck CAA records. Would you be able to post the contents of the CSR file?

I managed to reproduce this problem myself locally …

{
  "type": "urn:ietf:params:acme:error:caa",
  "detail": "Error finalizing order :: Rechecking CAA for \"xoo.foo.monkas.xyz\" and 1 more identifiers failed. Refer to sub-problems for more information",
  "status": 403,
  "subproblems": [
    {
      "type": "urn:ietf:params:acme:error:urn:ietf:params:acme:error:caa",
      "detail": "Error finalizing order :: While processing CAA for xoo.foo.monkas.xyz: CAA record for xoo.foo.monkas.xyz prevents issuance",
      "status": 403,
      "identifier": {
        "type": "dns",
        "value": "xoo.foo.monkas.xyz"
      }
    },
    {
      "type": "urn:ietf:params:acme:error:urn:ietf:params:acme:error:caa",
      "detail": "Error finalizing order :: While processing CAA for xoo.foo.monkas.xyz: CAA record for xoo.foo.monkas.xyz prevents issuance",
      "status": 403,
      "identifier": {
        "type": "dns",
        "value": "xoo.foo.monkas.xyz"
      }
    }
  ]
}

By creating a CSR with these SANs and sending it into finalization:

X509v3 Subject Alternative Name:
    DNS:foo.monkas.xyz, DNS:foo.monkas.xyz, DNS:xoo.foo.monkas.xyz
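
To check your own CSR for the same condition, a small sketch like the following may help. It assumes you have copied the Subject Alternative Name line from the text dump of the CSR (for example from `openssl req -noout -text -in your.csr`). The function name `duplicate_sans` is made up for illustration.

```python
from collections import Counter


def duplicate_sans(san_line):
    """Return the DNS names that appear more than once in a SAN line, sorted."""
    names = [
        entry.strip()[len("DNS:"):].lower()
        for entry in san_line.split(",")
        if entry.strip().startswith("DNS:")
    ]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# The SAN line from the reproduction above.
SAN_LINE = "DNS:foo.monkas.xyz, DNS:foo.monkas.xyz, DNS:xoo.foo.monkas.xyz"

if __name__ == "__main__":
    print(duplicate_sans(SAN_LINE))  # ['foo.monkas.xyz']
```

Names are lowercased before counting because DNS names are case-insensitive, so `Foo.example` and `foo.example` count as the same entry.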

@lestaff any confirmation? (I’m wary that this might actually be possible to apply as a CAA re-checking bypass … maybe I should delete and send to security@ …)

@_az, it seems you’ve determined this is a bug in your system and have steps to reproduce. Is it really a bug on your end, or have changes on your side simply revealed a misuse on our end? Should we not specify the domains with the -d flag if our CSR already includes them?

Trying to figure out if we should make a change on our end or wait for LE to make a change on your end.

I don’t work for Let’s Encrypt but in my opinion it is a problem on their end, regardless of what you did in Certbot (improper use or not).

Any chance you can post the contents of /var/cert_manager/certs/san132.csr? It would be very helpful.

I don’t work for Let’s Encrypt

Whoops! You seem so knowledgeable on this forum :sweat_smile:

The CSR looks exactly as you’d expect. Here are the contents after decoding, with cryptographic info redacted:

Certificate Request:
    Data:
        Version: 0 (0x0)
        Subject: 
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                <redacted>
        Attributes:
        Requested Extensions:
            X509v3 Subject Alternative Name: 
                DNS:ace.pusd.org, DNS:admin.ace.pusd.org, DNS:admin.alcott.pusd.org, DNS:admin.anderson.moreland.org, DNS:admin.armstrong.pusd.org, DNS:admin.arroyo.pusd.org, DNS:admin.baker.moreland.org, DNS:admin.barfield.pusd.org, DNS:admin.bradfield.hpisd.org, DNS:admin.cortez.pusd.org, DNS:admin.countrylane.moreland.org, DNS:admin.decker.pusd.org, DNS:admin.dellagoacademy.org, DNS:admin.es.moodyisd.org, DNS:admin.es.nscougars.com, DNS:admin.hms.huntingtonisd.com, DNS:admin.hpisd.org, DNS:admin.hs.hpisd.org, DNS:admin.hs.nscougars.com, DNS:admin.hsd153.org, DNS:admin.hubbardisd.com, DNS:admin.huntingtonisd.com, DNS:admin.hyer.hpisd.org, DNS:admin.jameshart.hsd153.org, DNS:admin.kellogg.pusd.org, DNS:admin.kingsley.pusd.org, DNS:admin.leadership.kippneworleans.org, DNS:admin.lincoln.pusd.org, DNS:admin.lopez.pusd.org, DNS:admin.marshall.pusd.org, DNS:admin.mcems.cherokee.k12.nc.us, DNS:admin.mesacharter.org, DNS:admin.mhs.cherokee.k12.nc.us, DNS:admin.mishpms.hpisd.org, DNS:admin.mms.cherokee.k12.nc.us, DNS:admin.montvue.pusd.org, DNS:admin.mrhs.hwrsd.org, DNS:admin.ms.moodyisd.org, DNS:admin.ms.nscougars.com, DNS:admin.nbfacademy.org, DNS:admin.nscougars.com, DNS:admin.parkwest.pusd.org, DNS:admin.pes.cherokee.k12.nc.us, DNS:admin.popcs.org, DNS:admin.ranchhills.pusd.org, DNS:admin.sacredheartacademy.org, DNS:admin.sanjose.pusd.org, DNS:admin.tcec.cherokee.k12.nc.us, DNS:admin.toa.cherokee.k12.nc.us, DNS:admin.up.hpisd.org, DNS:aes.cherokee.k12.nc.us, DNS:ams.cherokee.k12.nc.us, DNS:ar.hpisd.org, DNS:armstrong.hpisd.org, DNS:arroyo.pusd.org, DNS:bookertwashington.kippneworleans.org, DNS:br.hpisd.org, DNS:bradfield.hpisd.org, DNS:centralcityacademy.kippneworleans.org, DNS:churchill.hsd153.org, DNS:es.moodyisd.org, DNS:es.nscougars.com, DNS:hdems.cherokee.k12.nc.us, DNS:hdhs.cherokee.k12.nc.us, DNS:hes.huntingtonisd.com, DNS:hhs.huntingtonisd.com, DNS:hms.huntingtonisd.com, DNS:hpisd.org, DNS:hpms.hpisd.org, DNS:hs.hpisd.org, DNS:hs.moodyisd.org, 
DNS:hs.nscougars.com, DNS:hsd153.org, DNS:hy.hpisd.org, DNS:hyer.hpisd.org, DNS:jameshart.hsd153.org, DNS:mcems.cherokee.k12.nc.us, DNS:mis.hpisd.org, DNS:mishpms.hpisd.org, DNS:mms.cherokee.k12.nc.us, DNS:ms.nscougars.com, DNS:mt.hwrsd.org, DNS:nscougars.com, DNS:pes.cherokee.k12.nc.us, DNS:popcs.org, DNS:sacredheartacademy.org, DNS:tcec.cherokee.k12.nc.us, DNS:toa.cherokee.k12.nc.us, DNS:universitypark.hpisd.org, DNS:up.hpisd.org, DNS:wms.hwrsd.org, DNS:www.hpisd.org, DNS:www.inspiredteachingschool.org, DNS:www.moodyisd.org, DNS:www.nscougars.com, DNS:www.payne.moreland.org, DNS:www.popcs.org, DNS:www.sacredheartacademy.org
    Signature Algorithm: <redacted>

Thanks for the first look at this, @_az. I agree this looks like a bug on our side. I’ve filed https://github.com/letsencrypt/boulder/issues/4681 to look into it.

I’m pretty confident this isn’t a CAA re-checking bypass, but of course will keep a sharp eye on the possibility as I check the code.

I filed https://github.com/letsencrypt/boulder/issues/4682 as well for the error namespace thing.

:+1:. I got spooked by seeing two duplicate CAA checks at the VA during finalize and none for the other domain in the order, but it turned out to be because of tree climbing lol.
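
For readers unfamiliar with tree climbing: the CA looks for CAA records at the requested name and, if none are found, walks up one label at a time toward the root, using the first non-empty record set it finds. A simplified sketch follows. The function names are hypothetical, the `lookup` callable stands in for a real DNS query, and CNAME handling is ignored.

```python
def caa_lookup_order(fqdn):
    """List the names queried for CAA, from the full name up to the TLD."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def relevant_caa_set(fqdn, lookup):
    """Return (name, records) for the first ancestor with CAA records, else (None, [])."""
    for name in caa_lookup_order(fqdn):
        records = lookup(name)
        if records:
            return name, records
    return None, []


if __name__ == "__main__":
    # Fake zone data for illustration only; not the real records for this domain.
    fake_zone = {"hwrsd.org": ['0 issue "letsencrypt.org"']}
    print(caa_lookup_order("admin.mrhs.hwrsd.org"))
    print(relevant_caa_set("admin.mrhs.hwrsd.org", lambda n: fake_zone.get(n, [])))
```

This is why two names under the same parent can end up triggering CAA queries for the same ancestor name, as in the reproduction above.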

Sorry, this is totally off-topic here; feel free to split the thread.
In my humble opinion, DNS tree climbing for CAA records is a huge design mistake in the RFC. I have to write it down somewhere, because this thing really frustrates me. Originally, I put CAA records into my DNS zones, then removed them all.

@jsha Do you understand the root cause of this “Rechecking” error that I’m hitting, or are you merely verifying that it’s not working as intended?

We’re unable to renew certs right now and have about 8 more days until we hit our potential “point of no return” and certs start expiring for production customers. All the renewal failures we get are either this one or the usual “DNS Server may be malfunctioning”, because all of our SAN certs contain at least one web.com domain. However, I’m noticing that the domains hitting this error are also web.com domains.

So my question: does this error message indicate an actual problem with customer CAA, or is it just a symptom of CAA checking against a web.com domain?

The problem I was looking at in this thread is that one CAA error (for admin.mrhs.hwrsd.org) was duplicated many times into other subproblems. We’re going to be working on that issue this sprint.

I actually hadn’t realized that the nameservers for hwrsd.org are ns56.worldnic.com. and ns55.worldnic.com. That suggests the reason this one hostname is failing the CAA recheck is probably indeed related to the thread “Numerous inexplicable challenge failures across disparate domains with unreproducable SERVFAILs”. I suspect that if you removed that domain (assuming the other domains on this certificate are not using worldnic), the certificate overall should issue fine.

An example domain:

https://unboundtest.com/m/CAA/br.hpisd.org/MUXWOXO6

They apparently have bad CAA records (letsencrypt.org is not present). They must have had them set up correctly at some point for us to have generated certs for them… Is it really a bad CAA record, or just a failed attempt to query the web.com server? I feel like my own inexperience with DNS is getting in the way.

Thank you. That solution isn’t easy for us, but we may not have a choice.

According to my resolver, they have no CAA records, which allows Let’s Encrypt and every other CA to issue certificates.

That error message means what it says (though it can also happen for other reasons, like a DNSSEC misconfiguration or problems at the TLD).

If there were CAA records blocking Let’s Encrypt from issuing, and Let’s Encrypt successfully resolved them, the error message would have been different.

Given the Boulder bug, it’s hard to completely trust the error message it’s reporting, but I’d be surprised if it’s incorrect: duplicating errors is very different from modifying them. The error message is generated almost immediately after the error happens, and the errors are probably being duplicated at a different layer. Mangling a pointer at that later stage shouldn’t be able to change the message itself.

I’ve also confirmed from logs that this really was a SERVFAIL specifically for admin.mrhs.hwrsd.org.
