DNS TXT challenge verification

My domain is: testing-8.us-east-1.test.netflix.ne
(authzr: https://acme-staging-v02.api.letsencrypt.org/acme/authz/dJnmpiLw9WfFut1-hVUvN06d_wiWGyUZtQLlEJnTBa4 )

I ran this command: (I maintain Lemur’s acme client and recently made some changes that aren’t working as expected) - https://github.com/Netflix/lemur/blob/master/lemur/plugins/lemur_acme/plugin.py#L124

‘acme==0.33.1’


Hello,

For reference, I am using Lemur, which has a handy Let’s Encrypt integration (for dns-01 authorization): https://github.com/Netflix/lemur/blob/master/lemur/plugins/lemur_acme/plugin.py#L124 . We’re also using ‘acme==0.33.1’.

I am trying to figure out the optimal order of operations when using the acme Python client to request a certificate with dns-01. Previously, I would poll and validate DNS myself before using the acme client’s built-in polling. In most cases this worked fine; there were just a few problems, caused by our unique DNS environment (different internal/external DNS, slow or nonexistent syncing), that made me want to rely solely on ACME’s polling.

After setting the DNS TXT record, I run simple_verify on my DNS challenges. Once that returns True, I call answer_challenge. However, if I do this too quickly, my challenge ends up ‘invalid’.
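For reference, the TXT value the CA looks for is derived only from the challenge token and the account key (RFC 8555, section 8.4). If I'm reading the acme source correctly, `DNS01Response.simple_verify` in releases around 0.33 only re-derives and compares this key authorization and does not query DNS at all (its docstring says it "no longer checks DNS records"), which would explain why it returns True immediately. A minimal standard-library sketch of the computation; `dns01_txt_value` and `dns01_record_name` are illustrative names, not acme APIs (acme's own equivalents are `DNS01.validation` and `DNS01.validation_domain_name`).

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    # Unpadded base64url encoding, as used throughout ACME (RFC 8555)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint_b64: str) -> str:
    # Key authorization = token "." base64url(JWK thumbprint of the account key)
    key_authorization = f"{token}.{account_thumbprint_b64}"
    # dns-01 TXT value = base64url(SHA-256(key authorization)), RFC 8555 section 8.4
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def dns01_record_name(domain: str) -> str:
    # Name the CA queries (for a wildcard identifier, pass the domain without "*.")
    return f"_acme-challenge.{domain}"


if __name__ == "__main__":
    print(dns01_record_name("example.com"), dns01_txt_value("some-token", "some-thumbprint"))
```

Nothing in this computation touches DNS, so a check built on it can pass the moment the record is created through the provider's API, well before the CA's resolvers can see it.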

1). Is it correct that once a challenge is deemed ‘invalid’, that challenge is immutable? (We can’t “re-answer” the challenge to make it valid?)

2). When am I supposed to poll (with acme_client.poll) for the DNS change? Before I call ‘answer_challenge’? If I add the DNS record and then call answer_challenge without waiting, the challenge status is “invalid” with an error type of “urn:ietf:params:acme:error:unauthorized” and detail of “Incorrect TXT record “v=spf1 -all” found at _acme-challenge.”. However, if I poll for changes too early, before calling ‘answer_challenge’, the challenge never gets updated. If I manually sleep for a while, or verify DNS on my own before calling answer_challenge, things usually work fine.

3). Looking at Lemur’s ACME logic, is there anything you would recommend?
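On question 2, one ordering that avoids answering too early is: publish the record, confirm the expected TXT value is visible, call answer_challenge, and only then use acme_client.poll to watch the authorization. A minimal sketch of the visibility step, assuming a hypothetical `lookup` callable that returns the TXT strings currently served for a name (for example, a dnspython query aimed at the zone's authoritative nameservers rather than an internal resolver, given the split internal/external DNS mentioned above). `wait_for_txt` is an illustrative name, not part of acme or Lemur.

```python
import time
from typing import Callable, Iterable


def wait_for_txt(
    lookup: Callable[[str], Iterable[str]],  # hypothetical resolver function
    fqdn: str,
    expected: str,
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once `expected` is among the TXT values for `fqdn`,
    or False if it has not appeared by the deadline."""
    deadline = clock() + timeout
    while True:
        # Other TXT records (e.g. an SPF string) may coexist; only look for ours
        if expected in set(lookup(fqdn)):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Only after this returns True would answer_challenge be called. Even then, recursive resolvers on the CA side may still hold a cached answer for the name, so a short extra delay can still be worthwhile.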

For reference, this is the code I’m playing with for polling:

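```python
import time

attempts_per_authz = 60
for authzr in authz_record.authz:
    attempts = 0  # reset the budget for each authorization
    while attempts < attempts_per_authz:
        attempts += 1
        # Re-fetch the authorization from the ACME server to get its current status
        authzr, authzr_response = acme_client.poll(authzr)
        challenge = self.get_dns_challenge(authzr)
        if challenge.status.name in ("valid", "invalid"):
            # Both statuses are final; "invalid" will never turn "valid"
            break
        time.sleep(1)
```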

Edit: Out of necessity, I’ve re-added our own DNS validation logic before the point where we call answer_challenge: https://github.com/Netflix/lemur/blob/master/lemur/plugins/lemur_acme/plugin.py#L140

Hi @CertIssuer

Yes, that’s correct. If a challenge is invalid, you have to start over with a new order.
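Since an invalid challenge can't be re-answered, recovering means creating a new order (which comes with new authorizations and challenges) and repeating the DNS step. A minimal sketch of that retry shape, assuming a hypothetical `attempt_order` callable that creates a fresh order, publishes the TXT record, waits, answers, and returns the final authorization status. Keep `max_orders` small: Let’s Encrypt applies a rate limit on failed validations, so retrying in a tight loop mostly burns through it.

```python
from typing import Callable


def issue_with_retries(attempt_order: Callable[[], str], max_orders: int = 3) -> str:
    """Retry by creating fresh orders; never re-answer an invalid challenge.

    attempt_order is a hypothetical callable that creates a NEW order, publishes
    the TXT record, waits, answers the challenge, polls, and returns the final
    authorization status ("valid" or "invalid").
    """
    status = "invalid"
    for _ in range(max_orders):
        status = attempt_order()
        if status == "valid":
            break
        # "invalid" is final for that order's challenge, so loop around and
        # start over with a brand-new order and brand-new challenges.
    return status
```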

I don’t really understand your question.

My own client (unpublished, because it is too limited) creates the TXT entry. It supports only one explicit DNS provider, INWX, and one domain; my other domains don’t need a wildcard.

Then I wait a few minutes.

Then I try to confirm the challenge.

Then I check the new status.

So changing the TXT entry and immediately confirming the challenge normally can’t work.
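The sequence above, written out as a sketch. Every callable here is a hypothetical placeholder (in Lemur they might wrap the DNS provider plugin, a fixed sleep or a DNS check, `acme_client.answer_challenge`, and `acme_client.poll`); the point is only the ordering, with polling the authorization coming last.

```python
import time
from typing import Callable


def dns01_sequence(
    create_txt: Callable[[], None],   # 1. publish the _acme-challenge TXT record
    wait: Callable[[], None],         # 2. fixed delay, or a DNS visibility check
    answer: Callable[[], None],       # 3. tell the CA the challenge is ready
    poll_status: Callable[[], str],   # 4. re-fetch the authorization status
    max_polls: int = 30,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    create_txt()
    wait()
    answer()
    for _ in range(max_polls):
        status = poll_status()
        if status in ("valid", "invalid"):
            return status  # both are final
        sleep(interval)
    return "pending"  # gave up polling; the authorization is still in progress
```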

Thank you for the context. What is the acme client’s polling functionality actually used for? I figured it would have Let’s Encrypt poll for the appropriate DNS/HTTP records and update the status of the authorizations accordingly. https://letsencrypt.readthedocs.io/projects/acme/en/stable/api/client.html#acme.client.ClientV2.poll

I thought the same for poll_and_finalize.

These are for polling the ACME order and authorization resources, i.e. they translate to POST-as-GET requests to the order and authorization URLs on the ACME server. They aren’t related to polling challenge responses (DNS TXT records, .well-known/acme-challenge files, etc.). The resource polling is done mostly to synchronize with server-side changes to the status fields (e.g. for poll_and_finalize, the order is polled to watch for its status to become “ready” so that a finalization request can be sent).
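To make the distinction concrete: per RFC 8555, an order's status moves through "pending", "ready", "processing", "valid" and "invalid", and a client only learns about changes by re-fetching the resource. A minimal sketch of that kind of resource polling, assuming a hypothetical `fetch_order` callable that performs the POST-as-GET to the order URL and returns the parsed JSON; it is not how the acme module implements `poll_and_finalize`. Note that nothing in it looks at DNS.

```python
import time
from typing import Callable, Dict


def poll_order_until_ready(
    fetch_order: Callable[[], Dict],  # hypothetical POST-as-GET of the order URL
    max_attempts: int = 30,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Re-fetch the order until it can be finalized (or already has been)."""
    for _ in range(max_attempts):
        order = fetch_order()
        status = order["status"]
        if status in ("ready", "valid"):
            return order
        if status == "invalid":
            raise RuntimeError("order is invalid; a new order is required")
        sleep(interval)  # "pending" or "processing": the server isn't done yet
    raise TimeoutError("order did not become ready in time")
```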

I work on Boulder/ACME on the server side and don’t have a lot of experience with the Certbot acme module, but I believe that simple_verify isn’t very reliable or recommended. It tries to replicate some of the server-side validation logic as a “precheck”, but it isn’t a 1:1 match. @bmw @schoen Am I wrong here?

I think different clients have different philosophies about this. It comes with a false negative vs. false positive tradeoff because some servers where people run clients have a different view of the Internet than the Let’s Encrypt CA does, so the servers can’t necessarily always confirm that they’ve set up an ACME challenge correctly!

Currently Certbot doesn’t attempt to test its own challenges, but there are other clients that do perform a self-test. We’ve had a number of forum threads about false positives in other clients’ self-tests, but it’s reasonable to believe that there were far more true positives that were useful to users and that they successfully acted on.


I’ve toyed around with putting some sort of DNS-based pre-check in my client many times, but this type of situation is ultimately what keeps me from doing it. There are so many edge cases where you just can’t trust the local DNS resolver the client is running on.

I’ve longed for a REST service that exists purely to check TXT record values from clients. It could be as simple as a GET method that takes an FQDN and a TXT value and returns HTTP 200 if the record exists or HTTP 404 if it doesn’t, like this:

https://example.com/check/_acme-challenge.mydomain.com/xxxTheTxtValuexxx

The scope of what you can do is purposefully limited so that it can’t really be abused as a general purpose DNS resolver. Clients could then use this as an alternative to just waiting for a timeout (though the timeout would still likely be needed as a fall back).
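A minimal sketch of the handler logic for such a service, using the URL shape from the post. The `lookup` callable is a hypothetical resolver returning the TXT strings for a name as seen from the service's own vantage point; restricting queries to `_acme-challenge.` names and answering malformed requests with 400 are assumed design choices for keeping the scope narrow, not part of the original proposal.

```python
from typing import Callable, Iterable
from urllib.parse import unquote


def check_txt_request(path: str, lookup: Callable[[str], Iterable[str]]) -> int:
    """Return an HTTP status code for GET /check/<fqdn>/<txt-value>.

    lookup is a hypothetical resolver returning the TXT strings for a name,
    as seen from the service's own (external) vantage point.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "check":
        return 400  # malformed request
    fqdn, expected = unquote(parts[1]), unquote(parts[2])
    # Assumed scope limit: only ACME challenge names, so the service
    # can't be used as a general-purpose DNS resolver.
    if not fqdn.lower().startswith("_acme-challenge."):
        return 400
    return 200 if expected in set(lookup(fqdn)) else 404
```

A real deployment would wrap this in an HTTP server and add rate limiting; the 200/404 mapping is the part the post describes.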

I imagine that @JuergenAuer might be willing to create something like this because it seems somewhat related to his existing testing tools.
