Hello,
We have had this issue for about two months, and I have exchanged several emails with AWS Support and the Route53 team while trying to debug it. As there is a lot of material, I will publish only the latest information; if you have any questions, just let me know.
The problem
When you generate or renew a certificate for several domains using the certbot-dns-route53 plugin (I don't know the exact number, but it certainly happens with 15+ domains), certbot enters a loop displaying the following message: Resetting dropped connection: route53.amazonaws.com
How to reproduce it
TL;DR:
Jump to the "Run the command" step and use the Route53 plugin to request a certificate for 15+ domains or subdomains.
Our instance
Ubuntu 16.04 LTS, chosen directly from the EC2 marketplace.
$ uname -a
Linux bristol 4.4.0-1055-aws #64-Ubuntu SMP Thu Apr 5 17:06:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Register a domain and host its DNS zone in Route53
I think the issue is also reproducible with a single domain and 20 subdomains, so the team doesn't need 20 different domains.
Install the AWS CLI and configure credentials (instance role, environment variables, config file, ...)
Install certbot and the dns-route53 plugin
sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install certbot
sudo apt-get install python3-certbot-dns-route53
More information here https://certbot.eff.org/lets-encrypt/ubuntuxenial-other
Configure an AWS IAM policy for Route53 that grants the dns-route53 plugin the access it needs:
{
  "Version": "2012-10-17",
  "Id": "certbot-dns-route53 sample policy",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ListHostedZones",
        "route53:GetChange"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets"
      ],
      "Resource": [
        "arn:aws:route53:::hostedzone/YOURHOSTEDZONEID"
      ]
    }
  ]
}
More information in the certbot-dns-route53 documentation.
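When the certificate covers names in several hosted zones, the ChangeResourceRecordSets statement needs one ARN per zone. Generating the policy avoids copy-paste mistakes with the zone IDs. A minimal Python sketch; `build_certbot_route53_policy` and the zone ID `Z0000000EXAMPLE` are hypothetical placeholders, and the statements simply mirror the sample policy above:

```python
import json


def build_certbot_route53_policy(hosted_zone_ids):
    """Return the certbot-dns-route53 sample policy as a dict.

    `hosted_zone_ids` is a list of Route53 hosted zone IDs; one ARN is
    emitted per zone in the ChangeResourceRecordSets statement.
    """
    return {
        "Version": "2012-10-17",
        "Id": "certbot-dns-route53 sample policy",
        "Statement": [
            {
                # Read-only calls: find the hosted zone and poll the change status.
                "Effect": "Allow",
                "Action": ["route53:ListHostedZones", "route53:GetChange"],
                "Resource": ["*"],
            },
            {
                # Write access restricted to the zones that hold the TXT records.
                "Effect": "Allow",
                "Action": ["route53:ChangeResourceRecordSets"],
                "Resource": [
                    f"arn:aws:route53:::hostedzone/{zone_id}"
                    for zone_id in hosted_zone_ids
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical zone ID; replace with the real IDs from the Route53 console.
    print(json.dumps(build_certbot_route53_policy(["Z0000000EXAMPLE"]), indent=2))
```

The printed JSON can be pasted into the IAM console or passed to the AWS CLI when creating the policy.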
- Run the command
/usr/bin/certbot certonly --non-interactive --dns-route53 --cert-name domain.com --domain domain.com --domain 1.domain.com --domain 2.domain.com --domain 3.domain.com --domain 4.domain.com --domain 5.domain.com --domain 6.domain.com --domain 7.domain.com --domain 8.domain.com --domain 9.domain.com --domain 10.domain.com --domain 11.domain.com --domain 12.domain.com --domain 13.domain.com --domain 14.domain.com --domain 15.domain.com --domain 16.domain.com --domain 17.domain.com --domain 18.domain.com --domain 19.domain.com --domain 20.domain.com --keep-until-expiring --renew-with-new-domains --rsa-key-size 2048 --email any@email.com --agree-tos --test-cert
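To try different domain counts (for example, to narrow down the number of domains at which the loop starts), the command can be generated instead of typed by hand. A minimal Python sketch; `build_certbot_command` is a hypothetical helper and only reproduces the flags already used in the command above:

```python
import shlex


def build_certbot_command(base_domain, subdomain_count, email, staging=True):
    """Build the certbot argv used in the reproduction above.

    Emits one --domain flag for the apex plus one per numbered subdomain
    (1.<base>, 2.<base>, ...), mirroring the command in this report.
    """
    argv = [
        "/usr/bin/certbot", "certonly", "--non-interactive", "--dns-route53",
        "--cert-name", base_domain, "--domain", base_domain,
    ]
    for i in range(1, subdomain_count + 1):
        argv += ["--domain", f"{i}.{base_domain}"]
    argv += [
        "--keep-until-expiring", "--renew-with-new-domains",
        "--rsa-key-size", "2048", "--email", email, "--agree-tos",
    ]
    if staging:
        # --test-cert requests the certificate from the Let's Encrypt staging server.
        argv.append("--test-cert")
    return argv


if __name__ == "__main__":
    print(shlex.join(build_certbot_command("domain.com", 20, "any@email.com")))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues.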
What we found after debugging
TL;DR:
Debugging the network traffic and the certbot log (using the --debug flag) revealed that certbot is trying to reuse an already established TCP connection to send a new HTTP request. Since that TCP connection may already have timed out, certbot ends up reusing a connection that is already closed and has to reset it, hence the "Resetting dropped connection" log line.
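The connection-reuse mechanism described above can be illustrated with a simplified model of a keep-alive pool. This is a hypothetical sketch, not urllib3's actual implementation: `idle_timeout` stands in for whatever idle limit the server or network applies, and the model uses a timer where a real pool typically inspects the socket state instead.

```python
import logging

log = logging.getLogger("pool_model")


class Connection:
    """Stand-in for a pooled TCP connection; only records when it was last used."""

    def __init__(self, now):
        self.last_used = now


class KeepAlivePool:
    """Hypothetical single-connection keep-alive pool (simplified model)."""

    def __init__(self, host, idle_timeout):
        self.host = host
        self.idle_timeout = idle_timeout  # assumed idle limit before the peer closes the socket
        self.conn = None
        self.resets = 0

    def get_conn(self, now):
        """Return a connection for a request issued at time `now` (seconds)."""
        if self.conn is None:
            self.conn = Connection(now)  # first request: open a new connection
        elif now - self.conn.last_used > self.idle_timeout:
            # Idle too long: treat the cached socket as dropped and reconnect.
            log.info("Resetting dropped connection: %s", self.host)
            self.resets += 1
            self.conn = Connection(now)
        self.conn.last_used = now  # mark the (reused or fresh) connection as active
        return self.conn


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    pool = KeepAlivePool("route53.amazonaws.com", idle_timeout=4)
    for t in [0, 2, 10, 12]:  # request times in seconds
        pool.get_conn(t)
    print(pool.resets)  # 1: only the request at t=10 follows an 8-second gap
```

In this model, closely spaced requests share one connection, while a request issued after a long pause (such as a slow polling interval) triggers the reset message.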
Technical information
Looking at the log files, we see a successful connection as well as one that was initially dropped:
Successful:
2018-10-10 08:51:49,371:DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [GET]>
2018-10-10 08:51:49,461:DEBUG:requests.packages.urllib3.connectionpool:"GET /2013-04-01/change/C2IWBA1RLANJZ5 HTTP/1.1" 200 314
2018-10-10 08:51:49,462:DEBUG:botocore.parsers:Response headers: {'Date': 'Wed, 10 Oct 2018 08:51:48 GMT', 'x-amzn-RequestId': 'b9b5027d-cc69-11e8-ae8a-03d834cd6f4e', 'Content-Type': 'text/xml',
Dropped:
2018-10-10 08:51:44,033:DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [GET]>
2018-10-10 08:51:44,033:INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: route53.amazonaws.com
2018-10-10 08:51:44,361:DEBUG:requests.packages.urllib3.connectionpool:"GET /2013-04-01/change/C2IWBA1RLANJZ5 HTTP/1.1" 200 314
2018-10-10 08:51:44,362:DEBUG:botocore.parsers:Response headers: {'Date': 'Wed, 10 Oct 2018 08:51:43 GMT', 'x-amzn-RequestId': 'b6a9e55e-cc69-11e8-ae8a-03d834cd6f4e', 'Content-Type': 'text/xml',
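To quantify how often this happens over a whole run, the --debug log can be scanned for the reset message and for the GetChange polls. A minimal Python sketch, assuming the `timestamp:LEVEL:logger:message` line format shown above; `summarize` is a hypothetical helper:

```python
import re
from collections import Counter

# Prefix used by the log lines above: "<timestamp>:<LEVEL>:<logger>:<message>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r":(?P<level>[A-Z]+):(?P<logger>[\w.]+):(?P<msg>.*)$"
)
# Route53 GetChange polls as logged by urllib3.
GET_CHANGE_RE = re.compile(r'"GET /2013-04-01/change/(\w+) HTTP/1\.1"')


def summarize(lines):
    """Return (number of connection resets, Counter of GetChange polls per change ID)."""
    resets = 0
    polls = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # continuation lines (e.g. multi-line responses) are skipped
        msg = match.group("msg")
        if msg.startswith("Resetting dropped connection"):
            resets += 1
        change = GET_CHANGE_RE.search(msg)
        if change:
            polls[change.group(1)] += 1
    return resets, polls


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python summarize_log.py /var/log/letsencrypt/letsencrypt.log
    with open(sys.argv[1]) as f:
        resets, polls = summarize(f)
    print(f"resets={resets}")
    for change_id, count in polls.most_common():
        print(change_id, count)
```

A rapidly growing poll count for a single change ID, together with a matching number of resets, would be consistent with the loop described in this report.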
Here is more of the feedback from AWS Support; I quote:
In a successful request, we see the library being able to get a TCP connection that is most likely in an open state, and then it sends the HTTP request:
2018-10-10 08:51:49,461:DEBUG:requests.packages.urllib3.connectionpool:"GET /2013-04-01/change/C2IWBA1RLANJZ5 HTTP/1.1" 200 314
Based on that, I don't think there's an issue with the Route53 endpoint. It's more likely that certbot is trying to reuse an expired TCP connection, which then gets reset by the Route53 endpoint.
Also, some HTTP 405 responses were observed from the Let's Encrypt endpoint, which means that requests with a "not allowed" HTTP method are being received:
2018-10-10 08:43:40,613:DEBUG:acme.client:Sending HEAD request to https://acme-staging-v02.api.letsencrypt.org/acme/new-order .
2018-10-10 08:43:40,811:DEBUG:requests.packages.urllib3.connectionpool:"HEAD /acme/new-order HTTP/1.1" 405 0
2018-10-10 08:43:40,812:DEBUG:acme.client:Received response:
HTTP 405
Content-Length: 103
Server: nginx
Connection: keep-alive
Expires: Wed, 10 Oct 2018 08:43:40 GMT
Allow: POST
Content-Type: application/problem+json
Pragma: no-cache
Date: Wed, 10 Oct 2018 08:43:40 GMT
Cache-Control: max-age=0, no-cache, no-store
Our Route53 team recommends performing this task with the AWS CLI, because certbot is a third-party tool and it is really difficult to guess what is happening under the hood (how certbot sends the API requests to the Route53 endpoint, and how to force it to always open a new TCP connection). Thus, we recommend performing the record verification (using the GetChange API) with the AWS CLI. We also recommend keeping the packet capture and log collection (using --debug) running, as it will help us troubleshoot if we run into similar issues. Doing this would help us verify whether the issue is specific to certbot.
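Following that recommendation, the GetChange polling seen in the logs can be reproduced outside certbot; with the AWS CLI, a single poll is `aws route53 get-change --id <change-id>`. A minimal Python sketch of the polling loop, assuming a caller-supplied `get_status` callable that returns the change status; `wait_for_insync` is a hypothetical helper, and the boto3 wiring shown in the comments assumes boto3 is installed and credentials are configured:

```python
import time


def wait_for_insync(get_status, interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll a Route53 change until its status is INSYNC.

    `get_status` is any zero-argument callable returning the change status
    string ("PENDING" or "INSYNC"). Returns the number of polls made and
    raises TimeoutError if the change is not INSYNC within `timeout` seconds.
    """
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        if get_status() == "INSYNC":
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"change not INSYNC after {polls} polls")
        sleep(interval)


# Assumed real-world wiring (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("route53")
# wait_for_insync(
#     lambda: client.get_change(Id="C2IWBA1RLANJZ5")["ChangeInfo"]["Status"]
# )
```

Injecting `sleep` and `clock` keeps the loop testable without network access or real delays; creating a fresh client for each run is one way to compare behavior against certbot's long-lived connection.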
Conclusion
So far, we have not been able to solve the issue, and after a long time debugging together with the AWS team, the issue appears to be on the certbot or certbot-dns-route53 side. Solving it is really outside our scope.
As this affects our production server (we use CloudFront and a fairly complicated setup, which is why we can no longer use the automated http-01 challenge as we did before), I was able to generate the certificate by manually bypassing the HTTP challenge. Now, however, the issue affects certificate renewal, producing several errors and loops.
I was able to reproduce this issue on several different EC2 instances, but I think the result will be the same on any server using Route53 and certbot with the plugin. It happens with both the staging and production Let's Encrypt servers.
If you need the full log, the network capture, or the command output, I can provide them.
Thanks,
J.