The existing (working) setup uses python3.9 with certbot v1.22 and certbot_dns_route53 v1.17. These dependencies are deployed within the lambda zip file. The process completes and certificates are successfully renewed.
The problem arises when I try to update the Lambda function to use python3.14 (the current supported version in AWS Lambda), plus updated certbot and certbot_dns_route53, both at v5.2.2. When I deploy and run this updated version (no code changes, just updated dependencies), certbot fails to authenticate the domain:
Certbot failed to authenticate some domains (authenticator: dns-route53). The Certificate Authority reported these problems:
Domain: deltaxml.com
Type: unauthorized
Detail: No TXT record found at _acme-challenge.deltaxml.com
I have debugged the DNS side and can confirm that the _acme-challenge record does get created in the correct DNS zone within route53, which to me would suggest some kind of propagation timing issue. I'm just wondering if there's something I'm missing here? I'm curious as to why this works in the older version but not the new one. The only related things I found seemed to point towards using the --dns-route53-propagation-seconds argument, but I've seen it mentioned that this is deprecated anyway. Trying to add this causes an 'unrecognized argument' error in any case.
Any help with this would be very appreciated, thanks!
That's a clever approach. I am not a Python packaging expert and have limited experience with Lambda. With that in mind ...
The reason the propagation-seconds setting is deprecated is that the Route53 DNS plugin uses the GetChange API for Route53. You can poll that API until R53 says the change has synced across all its DNS servers worldwide. It is similar to what you see when making changes in the Console, where it shows the status and later "in sync".
I use that API regularly and R53 gets in sync in about 20 seconds.
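For anyone who wants to watch that sync happen outside of Certbot, here is a minimal sketch of the polling loop. It is not the plugin's actual code; the function name wait_for_insync and its timeout/interval defaults are my own assumptions. It only relies on the client having a get_change(Id=...) method that returns a ChangeInfo.Status of PENDING or INSYNC, which is what boto3's Route 53 client does.

```python
import time


def wait_for_insync(client, change_id, timeout=120.0, interval=5.0, sleep=time.sleep):
    """Poll Route 53 GetChange until the change reports INSYNC.

    `client` is expected to behave like boto3.client("route53"); only its
    get_change(Id=...) method is used. Returns the number of polls made.
    The timeout and interval defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        response = client.get_change(Id=change_id)
        status = response["ChangeInfo"]["Status"]
        if status == "INSYNC":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"change {change_id} still {status} after {timeout}s")
        # Still PENDING: wait before asking again.
        sleep(interval)
```

With boto3, the change_id would come from the ChangeInfo.Id field in the response to change_resource_record_sets. Logging a timestamp when this returns gives you a reference point to compare against the time the ACME challenge was requested.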
I agree that the error suggests a propagation issue. But, because of the above, that is not likely the reason. How did you check that the TXT record update was performed? 20s is a pretty small window.
I've been able to verify in a couple of ways that the record does get added:
The logging output from Lambda shows the 'PENDING' and then finally 'INSYNC' GetChangeResponse messages that it gets back from its polling. I've also opened up route53 within the AWS console alongside testing the Lambda function, and seen the record appear in the relevant zone by refreshing quickly enough. (The cleanup part of the code runs successfully after the certbot auth failure, so the TXT record gets removed.)
Are you able to check the Certbot log? It is normally in /var/log/letsencrypt, but I'm not sure if it's the same with Lambda. I'd carefully review the timestamps to ensure the ACME API challenge doesn't get sent until after the DNS is in sync. If you need help interpreting that log, just post it here.
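On Lambda only /tmp is writable, so if the function does not already redirect Certbot's directories, something like the sketch below may help get at the log. The helper names and the /tmp/letsencrypt base path are assumptions for illustration; --config-dir, --work-dir and --logs-dir are regular Certbot options, and letsencrypt.log is the file Certbot writes in its logs directory.

```python
import os


def certbot_dir_args(base_dir="/tmp/letsencrypt"):
    """Build Certbot CLI arguments that keep all state under base_dir.

    base_dir is an assumed location; /tmp is the writable path on Lambda.
    """
    return [
        "--config-dir", os.path.join(base_dir, "config"),
        "--work-dir", os.path.join(base_dir, "work"),
        "--logs-dir", os.path.join(base_dir, "logs"),
    ]


def read_certbot_log(base_dir="/tmp/letsencrypt"):
    """Return the contents of Certbot's log file, or "" if it is missing."""
    path = os.path.join(base_dir, "logs", "letsencrypt.log")
    if not os.path.exists(path):
        return ""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

Appending certbot_dir_args() to the existing argument list and calling print(read_certbot_log()) after the Certbot run, even when it fails, should put the full log with its timestamps into the function's CloudWatch output.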
The only other thing I can think of is that there is a missing glue record for one of your name servers. I don't see how it would produce the "not found" error. But it is unusual to see that with R53, so you should find out why that is happening. It might point to a problem with the current name servers. Let's Encrypt walks the authoritative tree and may choose any path. Every path must produce the correct result.
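One way to test the "every path" point is to ask each authoritative name server directly while the TXT record exists. This is a rough sketch, assuming the third-party dnspython package is available; the function names are hypothetical, and you would supply the name server IP addresses yourself (for example from an NS lookup on the zone). It does not check the delegation or glue at the parent zone, only whether each listed server returns the expected value.

```python
def query_txt(server_ip, name, timeout=5.0):
    """Ask one name server directly for TXT records (needs dnspython).

    Returns a list of TXT strings. Any DNS error (NXDOMAIN, no answer,
    timeout) is reported as an empty list to keep the sketch short.
    """
    import dns.exception
    import dns.resolver  # third-party: pip install dnspython

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server_ip]
    resolver.lifetime = timeout
    try:
        answer = resolver.resolve(name, "TXT")
    except dns.exception.DNSException:
        return []
    return [b"".join(rdata.strings).decode() for rdata in answer]


def servers_missing_value(server_ips, name, expected, query=query_txt):
    """Return the name server IPs whose TXT answer lacks `expected`."""
    missing = []
    for ip in server_ips:
        if expected not in query(ip, name):
            missing.append(ip)
    return missing
```

An empty result from servers_missing_value means every server you listed returned the challenge value. Any IP it returns is worth a closer look.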