Seeking technical clarification on certbot certificate creation using DNS-01 and HTTP-01 challenges

Dear Let's Encrypt Community,

I have run into an interesting issue while testing certificate installation and renewal on a whitelisted domain (acme-challenge-test.domain_name.com) that only allows traffic from a specific range of IP addresses. During this testing, the http-01 challenge failed with the following error:

Command used: certbot certonly --standalone -d acme-challenge-test.domain_name.com --http-01-port=8888 --debug-challenges -v

Saving debug log to /var/log/letsencrypt/letsencrypt.log
...
Waiting for verification...
Challenge failed for domain acme-challenge-test.domain_name.com
http-01 challenge for acme-challenge-test.domain_name.com
Certbot failed to authenticate some domains (authenticator: standalone). The Certificate Authority reported these problems:
  Domain: acme-challenge-test.domain_name.com
  Type:   connection
  Detail: xxx.xxx.xxx.xxx: Fetching http://acme-challenge-test.domain_name.com/.well-known/acme-challenge/xUNTJyXuqVJM7CYhFWrWrrN4sKZPUqVfqlfadsfdfdsf: Error getting validation data

Hint: The Certificate Authority failed to download the challenge files from the temporary standalone webserver started by Certbot on port 8888. Ensure that the listed domains point to this machine and that it can accept inbound connections from the internet.

Cleaning up challenges
Some challenges have failed.
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.

To work around this, I created the certificate with the DNS challenge instead, which completed successfully. Here's the command I used:

certbot -d acme-challenge-test.domain_name.com --manual --preferred-challenges dns certonly --debug-challenges -v

Interestingly, after the DNS challenge succeeded, I gave the http-01 challenge another shot and this time it worked without any issues (same command as above).

I would appreciate your expertise in clarifying the following:

Question 1: Why did the http-01 challenge succeed on the second attempt? Could it be possible that the authentication was cached, and the second time it renewed the certificate without re-authentication?

Question 2: If caching is involved, where is this saved? And is there a way to clear the authentication cache to replicate the initial failure and investigate it further?

I checked and flushed the local DNS cache with "resolvectl statistics" and "resolvectl flush-caches" (Ubuntu 22.04), but the behaviour persisted. I also looked for cache-related files in the certbot directories but couldn't find any.

Question 3: What could be the reason behind the successful completion of the http-01 challenge after the DNS challenge? Is there a connection between these two challenges, especially considering that the http-01 challenge failed before the DNS challenge?

I would appreciate any insights or explanations that can help me better understand this behavior.

Thank you for your assistance!

Best regards,

ChandrGupt

trainingbycoding@gmail.com

As you saw, restricting the domain to specific IP ranges won't let you get a certificate through http-01: Let's Encrypt needs to verify that you control the name as seen from everywhere on the Internet, so it checks from many places.

Note that --http-01-port only changes the port that the standalone server listens on; the validation itself still happens over port 80. That option is designed for unusual setups where some NAT device maps incoming port 80 to a different port on the server, and it isn't useful nearly as often as people try to use it.
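
The CA's error message above shows the URL it fetched with no explicit port, so the validator connects to the scheme's default port (80) no matter what --http-01-port is set to. A minimal Python sketch of that reasoning, standard library only; effective_port is a hypothetical helper for illustration, not part of certbot:

```python
from urllib.parse import urlsplit

# URL reported by the CA in the error above (hostname and token as posted).
CHALLENGE_URL = ("http://acme-challenge-test.domain_name.com"
                 "/.well-known/acme-challenge/xUNTJyXuqVJM7CYhFWrWrrN4sKZPUqVfqlfadsfdfdsf")
STANDALONE_PORT = 8888  # value passed with --http-01-port

def effective_port(url: str) -> int:
    """Return the TCP port a client connects to for this URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # No explicit port in the URL: fall back to the scheme's default port.
    return {"http": 80, "https": 443}[parts.scheme]

port = effective_port(CHALLENGE_URL)
print(port)  # 80
if port != STANDALONE_PORT:
    print(f"Something must forward port {port} to {STANDALONE_PORT}, e.g. a NAT rule.")
```

With the standalone authenticator, providing that forwarding is up to you; without it, the CA's request on port 80 never reaches the server listening on 8888.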

Yes, Let's Encrypt saves successful validations for 30 days, though they're considering reducing it. During that time, you can get a certificate for the name without needing to re-authorize.

In Let's Encrypt's database, the "authorization" object for your name is marked as valid and carries an expiration time, after which it can no longer be used and your ACME account would need to validate again.
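
A simplified model of the reuse rule described above, assuming only that a "valid" authorization is reused until its expiration time. The 30-day window is the figure quoted in this thread and the dates are hypothetical; this is a sketch, not Let's Encrypt's implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Simplified stand-in for an ACME authorization object (RFC 8555, section 7.1.4).
# The reuse rule below is an illustrative assumption, not Let's Encrypt's code.
@dataclass
class Authorization:
    identifier: str
    status: str        # "pending", "valid", "invalid", "deactivated", ...
    expires: datetime

REUSE_WINDOW = timedelta(days=30)  # period quoted in this thread; may change

def can_reuse(authz: Authorization, now: datetime) -> bool:
    """A valid, unexpired authz lets the same account skip a new challenge."""
    return authz.status == "valid" and now < authz.expires

# Hypothetical validation time, for illustration only.
validated_at = datetime(2023, 7, 18, tzinfo=timezone.utc)
authz = Authorization("acme-challenge-test.domain_name.com", "valid",
                      validated_at + REUSE_WINDOW)

print(can_reuse(authz, validated_at + timedelta(days=1)))   # True
print(can_reuse(authz, validated_at + timedelta(days=31)))  # False
```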

In theory, yes: your ACME client can explicitly deactivate the authorization. I don't think certbot exposes that functionality directly, but when you use --dry-run to test against staging, it should deactivate any existing authorizations, so the challenges are actually exercised. If you're trying to do testing, definitely use the staging environment; that's what it's for.

The connection is just that they're for the same domain name.

Thank you very much, Peter, for the detailed and very prompt response. This clarifies most of my questions.

I tried running certbot with --dry-run and with --staging too, but it kept reusing the cached authz object. I couldn't find a way to invalidate or bypass it and make the http-01 challenge fail. I used this command:
sudo certbot certonly --standalone -d acme-challenge-test.domain_name.com --http-01-port=8888 --staging --dry-run --debug-challenges -v

Anyway, I was doing this out of technical curiosity and to understand the internals better. If it's difficult to replicate, I'll probably leave it at that. But if anything comes up that can help me invalidate or ignore the cached authz for a dry run in staging, I'd love to try it out.

To provide just a little bit more context here: The ACME protocol specifically supports "authorization deactivation", which prevents an authorization from being re-used for a future order. Some ACME clients (such as acme.sh) expose this functionality directly, allowing the user to run a command which causes the client to make the appropriate authorization deactivation requests. Certbot, with its emphasis on full automation, does not.
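
Per RFC 8555 (section 7.5.2), deactivation is a signed POST to the authorization URL whose payload is {"status": "deactivated"}. The Python sketch below builds only the unsigned pieces of that JWS; the URLs, nonce and "ES256" algorithm are placeholders, and the signing step a real ACME client performs with the account key is deliberately left out:

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS (RFC 7515) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Inverse of b64url: restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def deactivation_request(authz_url: str, account_url: str, nonce: str) -> dict:
    """Unsigned flattened-JWS body for deactivating one authorization."""
    protected = {"alg": "ES256", "kid": account_url, "nonce": nonce, "url": authz_url}
    payload = {"status": "deactivated"}
    return {
        "protected": b64url(json.dumps(protected).encode()),
        "payload": b64url(json.dumps(payload).encode()),
        # Placeholder: a real client signs "<protected>.<payload>" with the account key.
        "signature": "",
    }

# Hypothetical URLs and nonce, for illustration only.
req = deactivation_request("https://acme.example/authz/7356118554",
                           "https://acme.example/acct/1",
                           "hypothetical-nonce")
print(json.loads(b64url_decode(req["payload"])))  # {'status': 'deactivated'}
```

If the server accepts the request, it replies with the updated authorization object, now in the "deactivated" state, and that authorization can no longer satisfy a new order.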

Thanks, but I had thought that certbot's --dry-run did the invalidation, to ensure that it was testing whether authorizations can currently be completed. I only know enough about certbot to be dangerous, though (I use it sometimes for testing but it's not my "daily driver" client), so maybe someone else knows better how to make sure it isn't reusing cached authorizations. I might suggest trying --dry-run without also specifying --staging, but again, it's not something I've tried.

It does, but Certbot doesn't expose a way for the user to deactivate valid authzs; it's only used internally by --dry-run. See:

You can see it being used (and only being used) at:

If one were to remove the "and self.config.dry_run" condition, Certbot would always deactivate any valid authz :stuck_out_tongue: Hack hack hack..

Thank you very much, Aaron, this clarifies things.
We started with certbot and stayed with it, as it's the default recommended option and it has served all our purposes to date. This is probably the first time we've needed to try another ACME client. We'll try out acme.sh for this particular use case and see how it works out. Thanks again for your suggestion.

Why's that exactly? IMO there isn't any reason to deactivate valid authzs in the production environment. What use case would you have for that?

Yes, I tried both options: --dry-run together with --staging, and --dry-run on its own. In both cases the observation was the same: the cached authz wasn't invalidated or ignored.
Thanks again for all your inputs.

I highly doubt that a valid cached authz was not deactivated when using --dry-run. Do you have a log to support that? I'd be very interested to see it.

Unless perhaps your Certbot version predates this feature? That would mean your version is older than 0.40.0, the release that introduced authz deactivation for --dry-run almost 4 years ago. The most recent version is currently 2.6.0... I'm not sure when authz reuse itself was introduced, though; I can't find it in the changelog.

Fair point, Osiris; I agree, and my earlier point probably needs rephrasing.
In production there isn't any valid use case for this. However, given the scenario described above, I wanted to dig deeper and establish that http-01 succeeds after dns-01 only because of the cached authz object, and that it fails once that object is invalidated.

Yes, I just checked that after seeing your response above about the --dry-run code.
The installed certbot version is 1.21.0, so it should have the deactivation feature for --dry-run.
However, let me still upgrade to the latest version and check whether the behavior changes. Will update shortly.
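
Whether an installed certbot is new enough can be checked by comparing versions as integer tuples rather than as strings (as strings, "0.9.0" would sort after "0.40.0"). A small Python sketch, assuming the input looks like the output of certbot --version, e.g. "certbot 1.21.0", and contains only plain numeric releases; the helper names are hypothetical:

```python
def parse_version(text: str) -> tuple:
    """Turn 'certbot 1.21.0' or '1.21.0' into (1, 21, 0).

    Assumes a plain numeric release; suffixes such as '.dev0' are not handled.
    """
    version = text.split()[-1]
    return tuple(int(part) for part in version.split("."))

# Release cited in this thread as introducing authz deactivation for --dry-run.
DRY_RUN_DEACTIVATION_SINCE = parse_version("0.40.0")

def has_dry_run_deactivation(version_output: str) -> bool:
    return parse_version(version_output) >= DRY_RUN_DEACTIVATION_SINCE

print(has_dry_run_deactivation("certbot 1.21.0"))  # True
print(has_dry_run_deactivation("certbot 0.39.0"))  # False
```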

Version 1.21.0 should be fine with deactivating valid authzs when using --dry-run. I'd be interested to see a log where --dry-run didn't deactivate a valid authz :slight_smile:

letsencrypt.txt (33.0 KB)
Here you go: the log file excerpt containing the --dry-run output.
I can see "Recreating order after authz deactivations" (line 212); however, the dry run still completed successfully.

It looks to me like the dry-run completed successfully because the standalone web server is in fact responding to the challenges and making a new valid authorization. What makes you think there's a problem?

To reiterate: http-01 succeeds only after dns-01 has, because it finds a cached authz object. However, if --dry-run with http-01 really deactivates the existing cached authz and creates a new one, then it should have failed, because the http-01 challenge originally fails for this domain: it is whitelisted and only allows traffic from designated CIDRs.

Let's Encrypt uses multi-perspective validation; see the Let's Encrypt post "Multi-Perspective Validation Improves Domain Validation Security".

The log you posted showed the http-01 challenges being responded to by the standalone web server, though.

2023-07-19 00:02:04,299:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Incoming request
2023-07-19 00:02:04,299:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Serving HTTP01 with token 'M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk'
2023-07-19 00:02:04,299:DEBUG:acme.standalone:::ffff:127.0.0.1 - - "GET /.well-known/acme-challenge/M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk HTTP/1.1" 200 -
2023-07-19 00:02:04,398:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Incoming request
2023-07-19 00:02:04,399:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Serving HTTP01 with token 'M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk'
2023-07-19 00:02:04,399:DEBUG:acme.standalone:::ffff:127.0.0.1 - - "GET /.well-known/acme-challenge/M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk HTTP/1.1" 200 -
2023-07-19 00:02:04,405:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Incoming request
2023-07-19 00:02:04,406:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Serving HTTP01 with token 'M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk'
2023-07-19 00:02:04,406:DEBUG:acme.standalone:::ffff:127.0.0.1 - - "GET /.well-known/acme-challenge/M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk HTTP/1.1" 200 -
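
A quick way to confirm from a debug log that the standalone server actually answered the challenge requests is to pull the token and HTTP status out of each request line. A minimal Python sketch over the excerpt above; the regular expression is written for these particular acme.standalone lines and is an assumption about their format, not a documented log schema:

```python
import re
from collections import Counter

# Excerpt copied from the log above.
LOG = """\
2023-07-19 00:02:04,299:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Incoming request
2023-07-19 00:02:04,299:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Serving HTTP01 with token 'M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk'
2023-07-19 00:02:04,299:DEBUG:acme.standalone:::ffff:127.0.0.1 - - "GET /.well-known/acme-challenge/M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk HTTP/1.1" 200 -
2023-07-19 00:02:04,398:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Incoming request
2023-07-19 00:02:04,399:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Serving HTTP01 with token 'M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk'
2023-07-19 00:02:04,399:DEBUG:acme.standalone:::ffff:127.0.0.1 - - "GET /.well-known/acme-challenge/M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk HTTP/1.1" 200 -
2023-07-19 00:02:04,405:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Incoming request
2023-07-19 00:02:04,406:DEBUG:acme.standalone:::ffff:127.0.0.1 - - Serving HTTP01 with token 'M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk'
2023-07-19 00:02:04,406:DEBUG:acme.standalone:::ffff:127.0.0.1 - - "GET /.well-known/acme-challenge/M4-KxboYi3rIIt5kKNdjP2ec9Ef0sN-XIRwObOsxJDk HTTP/1.1" 200 -
"""

# Matches the quoted request line and the status code that follows it.
REQUEST = re.compile(
    r'"GET /\.well-known/acme-challenge/(?P<token>[A-Za-z0-9_-]+) HTTP/1\.[01]" (?P<status>\d{3})'
)

def served_challenges(log_text: str) -> Counter:
    """Count (token, HTTP status) pairs for challenge requests in the log."""
    hits = Counter()
    for line in log_text.splitlines():
        m = REQUEST.search(line)
        if m:
            hits[(m["token"], int(m["status"]))] += 1
    return hits

for (token, status), n in served_challenges(LOG).items():
    print(f"{token}: HTTP {status} x{n}")
```

Three 200 responses for the same token mean the challenge file was fetched successfully three times, which is why the firewall does not appear to be blocking these requests.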

So perhaps you intend for your firewall to be blocking the traffic, but it doesn't look like it is.

I concur with @petercooperjr. Let's walk through the log together, shall we? (The relevant parts, that is.)

  • 00:02:00,026: Certbot requests the ACME servers directory
  • 00:02:00,027: Certbot retrieves the directory
  • 00:02:00,851: Certbot requests a new order for acme-challenge-test.domain_name.com
  • 00:02:01,125: Certbot retrieves an order already in the "ready" state with an authz "7356118554"
  • 00:02:01,127: Certbot requests the authz "7356118554"
  • 00:02:01,383: Certbot retrieves the authz "7356118554" already in the "valid" state
  • 00:02:01,385: Certbot makes a POST to authz "7356118554" with content "status": "deactivated", thereby deactivating the already valid authz
  • 00:02:01,645: Log notes "Recreating order after authz deactivations"
  • 00:02:01,648: Certbot requests a new order
  • 00:02:01,923: Certbot retrieves a new order with status "pending" containing a new authz "7356423174"
  • 00:02:01,925: Certbot requests the authz "7356423174"
  • 00:02:02,183: Certbot retrieves authz "7356423174" in the "pending" state, containing three challenges, all also in the "pending" state
  • 00:02:02,184: Certbot fires up the standalone authenticator
  • 00:02:03,794: Certbot makes a POST to the http-01 challenge
  • 00:02:04,299 to 00:02:04,406: Certbot serves the three tokens
  • 00:02:05,056: Certbot checks the authz by sending an empty POST to the authz URI
  • 00:02:05,311: Certbot retrieves the now valid authz containing the now valid http-01 challenge
  • 00:02:05,410: Certbot sends the CSR to the finalize URI of the order, triggering the ACME server to generate the certificate
  • 00:02:05,677: Certbot retrieves the order in the "processing" state as a response
  • 00:02:06,680: Certbot polls the order
  • 00:02:06,938: Certbot retrieves the order poll and gets the order with "valid" state and containing a certificate URI
  • 00:02:06,940: Certbot requests the certificate
  • 00:02:07,198: Certbot retrieves the certificate
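
The walkthrough is consistent with the status state machines in RFC 8555 (section 7.1.6). A small Python sketch that encodes the relevant subset of those transitions and checks the sequences observed above. The new order's "ready" state is inferred rather than shown in the log, since the server only accepts finalization for an order in the "ready" state:

```python
# Subset of the allowed status transitions from RFC 8555, section 7.1.6.
ORDER_TRANSITIONS = {
    "pending": {"ready", "invalid"},
    "ready": {"processing", "invalid"},
    "processing": {"valid", "invalid"},
}
AUTHZ_TRANSITIONS = {
    "pending": {"valid", "invalid"},
    "valid": {"expired", "deactivated", "revoked"},
}

def is_legal_path(path: list, transitions: dict) -> bool:
    """True if every consecutive pair of statuses is an allowed transition."""
    return all(after in transitions.get(before, set())
               for before, after in zip(path, path[1:]))

# Status sequences taken from the walkthrough above.
observed = {
    "authz 7356118554 (cached)": (["valid", "deactivated"], AUTHZ_TRANSITIONS),
    "authz 7356423174 (new)": (["pending", "valid"], AUTHZ_TRANSITIONS),
    # "ready" is inferred: the log shows finalize being accepted, not the ready order.
    "new order": (["pending", "ready", "processing", "valid"], ORDER_TRANSITIONS),
}

for name, (path, table) in observed.items():
    print(f"{name}: {' -> '.join(path)} legal={is_legal_path(path, table)}")
```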

Aaaaaand done.

Thank you @petercooperjr and @Osiris for your prompt and precise assistance on this topic.

Sorry, my mistake in the last test run: due to an oversight, a required ACL was misconfigured for the domain being tested. After fixing that, I can confirm that --dry-run works as expected: the certificate renewal with the http-01 challenge failed, because --dry-run deactivates the valid cached authz object.

Summarizing the findings here, for anyone who might refer to this thread in the future:

  • http-01 fails for a whitelisted domain that's accessible only from specific CIDRs
  • the dns-01 challenge validates successfully for the same domain
  • retrying http-01 on this domain then succeeds, because the authz object is cached in Let's Encrypt's database for a set period (currently 30 days, though they're considering reducing it)
  • retrying the http-01 challenge with --dry-run fails, as expected, because --dry-run deactivates the valid cached authz object.
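
The four observations can be condensed into a small decision function. This is a simplified Python model, not certbot or Let's Encrypt code; it assumes, as in this thread, that the CA's validators cannot reach the whitelisted host, and it ignores every other check the client and CA perform:

```python
def http01_attempt_succeeds(ca_can_reach_port_80: bool,
                            has_valid_authz: bool,
                            dry_run: bool) -> bool:
    """Simplified model of the outcomes summarized above (illustrative only)."""
    if has_valid_authz and not dry_run:
        return True  # cached authz is reused, so no challenge is attempted
    # --dry-run deactivates any valid authz, so a fresh http-01 challenge must pass,
    # which requires the CA's validators to reach the server on port 80.
    return ca_can_reach_port_80

# The whitelisted domain rejects the CA's validators, so reachability is False.
# The dns-01 step is not modeled; it is what leaves a valid authz behind.
steps = [
    ("first http-01 attempt", dict(has_valid_authz=False, dry_run=False)),
    ("http-01 retry after dns-01 succeeded", dict(has_valid_authz=True, dry_run=False)),
    ("http-01 retry with --dry-run", dict(has_valid_authz=True, dry_run=True)),
]
for label, kwargs in steps:
    ok = http01_attempt_succeeds(ca_can_reach_port_80=False, **kwargs)
    print(f"{label}: {'succeeds' if ok else 'fails'}")
```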

Your insights were invaluable, thank you very much for your support.