Undocumented: challenge hangs for dns-01 on the apex domain w/ valid http-01

I’m rolling my own client for a project that has about 2,000 domains. We’re migrating from http-01 based challenges to dns-01.

Something I discovered along the way: if an order contains both example.com and *.example.com, but example.com already has a solved http-01 challenge, the corresponding dns-01 challenge for example.com will just hang and never get solved.

My workaround was to reverse the order of solving: focus on *.example.com first, and then check the status of the order at each authorization challenge. Because example.com already had one solved challenge (http-01) and the wildcard now had a solved dns-01 challenge, the order became ready. Tricky tricky.
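
For readers following along, the workaround described above might be sketched roughly like this. It is only an illustration, not the actual client code: `solveUntilOrderReady`, `$solveChallenge`, and `$fetchOrderStatus` are hypothetical names, and the two callables stand in for whatever the client uses to complete a challenge and to fetch the order's current status. Checking the order before each challenge is an assumption about where the check happens.

```php
<?php
/**
 * Sketch: solve wildcard identifiers first and stop as soon as the order
 * is no longer pending.
 *
 * $solveChallenge and $fetchOrderStatus are hypothetical callables; they
 * are not part of acmephp or any other library.
 *
 * @param string[] $identifiers e.g. ['example.com', '*.example.com']
 * @return string[] identifiers for which a challenge was actually attempted
 */
function solveUntilOrderReady(array $identifiers, callable $solveChallenge, callable $fetchOrderStatus): array
{
    // Put wildcard identifiers at the front of the list.
    usort($identifiers, function (string $a, string $b): int {
        return (strpos($b, '*.') === 0) <=> (strpos($a, '*.') === 0);
    });

    $attempted = [];
    foreach ($identifiers as $identifier) {
        // An order that has left "pending" needs no further challenges.
        if ($fetchOrderStatus() !== 'pending') {
            break;
        }
        $solveChallenge($identifier);
        $attempted[] = $identifier;
    }

    return $attempted;
}
```

With example.com already authorized via http-01, only the wildcard would be attempted here, because the order leaves the pending state as soon as the wildcard's dns-01 challenge is solved.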

I’m not sure if this is a bug or working as intended on the Let’s Encrypt side of things, but it was certainly unintuitive for me as the developer of a custom ACME client. The behavior I would have expected is that multiple challenge types can be solved for a given identifier, rather than only one challenge being allowed to become valid per domain/subdomain/wildcard while the rest hang forever in a pending state. That’s mostly because I’m doing a migration, and being able to validate that my new dns-01 challenges all succeed gives warm-fuzzies in the cutover.

Hi @jackdpeterson :wave: welcome to the community forum

Do you have finer-grained logs you can share? Ideally with indications of which resource URLs you're POSTing to and copies of the response bodies.

I ran into the opposite problem a few weeks ago in the staging environment. Domains that had already been validated via dns-01 got stuck in pending when I tried to validate them via http-01. I was only able to solve the problem by deleting the ACME account and registering a new one.

Unfortunately I don’t have any logs from this event (I was doing some testing in a Docker container, which has since been blown away) so I’m not sure how helpful this is. But the client was greenlock v3.

This sounds like a case where the ACME client is polling at the authz/challenge level and not at the order level and so it doesn't realize the order is already valid by way of the reused DNS-01 authorization. Please provide logs if you're able to reproduce.

There was a separate report about a similar "stuck in pending" problem on the forum today. I was able to confirm from our server-side logs that it was client misbehaviour and confusion from polling at too fine-grained a level: DNS Challenge is "pending" 2 days - #12 by cpu

Thanks, I suspected it was client misbehavior since I didn’t see any posts about server-side issues on this forum. I’ll try reproducing it once I have some spare time and get in touch with the developer.

Hey all, thanks for the replies on this topic. To provide some background, I’m implementing a custom ACME client based off of the acmephp project.

https://github.com/acmephp/acmephp/blob/master/src/Core/AcmeClient.php#L180-L204 is where the challenge authorization flow is happening and things are stuck in a loop until the timeout is met.

What I’m implementing on my end that seems to correct things is as follows:

/**
 * @test
 */
public function filtersOutDns01WhenValidHttp01ChallengeExists()
{
    $domain = 'example.com';

    $challenges = [
        $domain => [
            new AuthorizationChallenge($domain, "valid", 'http-01', $domain, "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'dns-01', $domain, "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'tls-alpn-01', $domain, "this is a token", random_bytes(32))
        ],
        "*.$domain" => [
            new AuthorizationChallenge($domain, "pending", 'http-01', "*.$domain", "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'dns-01', "*.$domain", "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'tls-alpn-01', "*.$domain", "this is a token", random_bytes(32))
        ]
    ];
    $this->assertCount(1, RequestANewCertificate::filterAuthorizationChallenges($challenges));
}

/**
 * @test
 */
public function filtersAuthorizationChallengeReturnsTwoDns01ChallengesWhenAllChallengesArePending()
{
    $domain = 'example.com';

    $challenges = [
        $domain => [
            new AuthorizationChallenge($domain, "pending", 'http-01', $domain, "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'dns-01', $domain, "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'tls-alpn-01', $domain, "this is a token", random_bytes(32))
        ],
        "*.$domain" => [
            new AuthorizationChallenge($domain, "pending", 'http-01', "*.$domain", "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'dns-01', "*.$domain", "this is a token", random_bytes(32)),
            new AuthorizationChallenge($domain, "pending", 'tls-alpn-01', "*.$domain", "this is a token", random_bytes(32))
        ]
    ];
    $this->assertCount(2, RequestANewCertificate::filterAuthorizationChallenges($challenges));
}

The tests are the most pertinent detail here; my filtering implementation itself isn’t particularly interesting. The point is that a challenge group that already has a solved challenge, even one of a different type, counts as solved, because that is enough for the order to become valid.

By pre-filtering the already valid authorizations at a per-identifier (subdomain / wildcard) level, the problem is obviated and the order succeeds. I was able to issue the remaining 600 certs we needed last night / this morning. Since I’m relying on the upstream acmephp implementation of challengeAuthorization, I’m guessing the problem is related to some of the previous comments. For now, though, I have a working implementation because I’m filtering in a way that satisfies the tests above. Perhaps something like this could or should be implemented in the AcmePHP core, or perhaps it’s up to implementing parties to do that. Either way, with that filtering in place I end up with a small flat list of challenges to loop through and complete.
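
The filtering code itself was not posted, so the following is only a guess at what a filter satisfying the two tests above could look like. `SketchChallenge` is a hypothetical, stripped-down stand-in for acmephp's `AuthorizationChallenge`, and `filterPendingChallenges` is an invented name rather than the poster's `RequestANewCertificate::filterAuthorizationChallenges`.

```php
<?php
// Hypothetical stand-in for acmephp's AuthorizationChallenge, reduced to
// the fields this sketch needs. It is not the real class.
final class SketchChallenge
{
    /** @var string */
    public $identifier;
    /** @var string */
    public $status;
    /** @var string */
    public $type;

    public function __construct(string $identifier, string $status, string $type)
    {
        $this->identifier = $identifier;
        $this->status = $status;
        $this->type = $type;
    }
}

/**
 * Assumed behavior: skip every group that already contains a valid
 * challenge of any type, then keep only the challenges of the preferred
 * type from the remaining groups, returned as one flat list.
 *
 * @param array<string, SketchChallenge[]> $groups challenges keyed by identifier
 * @return SketchChallenge[]
 */
function filterPendingChallenges(array $groups, string $preferredType = 'dns-01'): array
{
    $todo = [];
    foreach ($groups as $challenges) {
        foreach ($challenges as $challenge) {
            if ($challenge->status === 'valid') {
                // One valid challenge is enough; move on to the next group.
                continue 2;
            }
        }
        foreach ($challenges as $challenge) {
            if ($challenge->type === $preferredType) {
                $todo[] = $challenge;
            }
        }
    }

    return $todo;
}
```

The result is a flat list, which lines up with the `assertCount(1, …)` and `assertCount(2, …)` expectations in the tests.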

Aha, so that's interesting because the other thread I linked to above, where there was confusion about challenge state, also involved acmephp. Overall I'm feeling confident that a "hung" or stuck challenge isn't the right diagnosis here and that it's a client quirk.

That makes sense given the code linked. It will poll a specific challenge URL without paying attention to the state of the associated authorization.
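
As a rough illustration of the alternative, a client could poll the authorization itself and stop as soon as it leaves the pending state, whichever challenge got it there. This is a sketch under assumptions, not acmephp code: `waitForAuthorization` and `$fetchAuthorization` are hypothetical names, and the callable is assumed to return the decoded JSON authorization object, whose `status` field is defined by RFC 8555.

```php
<?php
/**
 * Sketch: poll an authorization until it leaves the "pending" state.
 *
 * $fetchAuthorization is a hypothetical callable that requests the
 * authorization URL and returns the decoded JSON body as an array.
 *
 * @return string the last status seen ("pending" if attempts ran out)
 */
function waitForAuthorization(callable $fetchAuthorization, int $maxAttempts = 10, int $delaySeconds = 3): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $authorization = $fetchAuthorization();
        // Decide on the authorization's status, not on a single challenge.
        if ($authorization['status'] !== 'pending') {
            return $authorization['status'];
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }

    return 'pending';
}

// An authorization already validated via http-01: the dns-01 challenge is
// still pending, but the authorization as a whole is valid.
$alreadyValid = function (): array {
    return [
        'status' => 'valid',
        'challenges' => [
            ['type' => 'http-01', 'status' => 'valid'],
            ['type' => 'dns-01', 'status' => 'pending'],
        ],
    ];
};
$status = waitForAuthorization($alreadyValid);
```

Here the loop returns on the first request instead of waiting on a dns-01 challenge that will never be attempted.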

In the coming week or two we'll be shipping a change on the ACME server side that will remove the unused pending challenges from valid or invalid authorizations. I think this may also solve the problem you're experiencing. I'll make a note to try and update this thread when the change is live in the staging environment so that you can test explicitly.

Thanks for following up with more detail.

The change you’re describing sounds like it will eliminate the unnecessary challenge attempt cycle behind the hanging challenges I was observing.

Thanks for the reply!
