We are currently having problems renewing our certificates with certbot. Our infrastructure consists of two gateways (GWs) running nginx as load balancers.
Both GWs have the IPs 1.1.1.1 and 1.1.1.2 configured as VIPs on their respective NICs.
In DNS, testsite.de points to both 1.1.1.1 and 1.1.1.2.
We use keepalived for failover: GW1 uses GW2 as its backup, and GW2 uses GW1 as its backup.
If GW1 fails, GW2 takes over GW1's IP as well. That is why both GWs have the two IPs configured on their NICs.
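To make the failover layout easier to follow, here is a rough sketch of it as a small Python model. It is not our keepalived config. The names PREFERRED_MASTER, BACKUP and vip_holders are made up for illustration, and the mapping assumes GW1 is the preferred master for 1.1.1.1 and GW2 for 1.1.1.2.

```python
# Hypothetical model of the VIP failover described above (not real keepalived config).
# Assumption: GW1 is the preferred master for 1.1.1.1, GW2 for 1.1.1.2.
PREFERRED_MASTER = {"1.1.1.1": "GW1", "1.1.1.2": "GW2"}
BACKUP = {"GW1": "GW2", "GW2": "GW1"}


def vip_holders(up):
    """Return which gateway answers for each VIP, given the set of gateways that are up."""
    holders = {}
    for vip, master in PREFERRED_MASTER.items():
        if master in up:
            holders[vip] = master          # normal case: preferred master holds the VIP
        elif BACKUP[master] in up:
            holders[vip] = BACKUP[master]  # failover: the backup takes the VIP over
        else:
            holders[vip] = None            # neither gateway is up
    return holders


both_up = vip_holders({"GW1", "GW2"})
only_gw2 = vip_holders({"GW2"})
print(both_up)
print(only_gw2)
```

With both GWs up, each VIP is answered by a different machine. With only one GW up, both VIPs end up on the same machine.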
If we start a dry run on GW1, we get the following output:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for ext-services.testsite.de
http-01 challenge for www.testsite.de
http-01 challenge for testsite.de
Using the webroot path OBSCUREDPATH for all unmatched domains.
Waiting for verification...
Challenge failed for domain ext-services.testsite.de
Challenge failed for domain www.testsite.de
Challenge failed for domain testsite.de
http-01 challenge for ext-services.testsite.de
http-01 challenge for www.testsite.de
http-01 challenge for testsite.de
Cleaning up challenges
All challenges have failed.
IMPORTANT NOTES:
- The following errors were reported by the server:
Domain: ext-services.testsite.de
Type: unauthorized
Detail: 1.1.1.2: Invalid response from
http://ext-services.testsite.de/.well-known/acme-challenge/generatedNounce:
404
Domain: www.testsite.de
Type: unauthorized
Detail: 1.1.1.2: Invalid response from
http://www.testsite.de/.well-known/acme-challenge/generatedNounce:
404
Domain: testsite.de
Type: unauthorized
Detail: 1.1.1.2: Invalid response from
http://testsite.de/.well-known/acme-challenge/generatedNounce:
404
To fix these errors, please make sure that your domain name was
entered correctly and the DNS A/AAAA record(s) for that domain
contain(s) the right IP address.
We think the issue stems from certbot choosing the wrong IP address: 1.1.1.2 is being used, even though 1.1.1.1 should have been used.
By default, 1.1.1.1 should be in the master state, as configured in the keepalived config, and that was the case at the time of the test. For these challenges, both IPs are provided and show up in the response, but the wrong one is used.
{
"identifier": {
"type": "dns",
"value": "www.testsite.de"
},
"status": "invalid",
"expires": "2022-09-28T09:39:00Z",
"challenges": [
{
"type": "http-01",
"status": "invalid",
"error": {
"type": "urn:ietf:params:acme:error:unauthorized",
"detail": "1.1.1.1: Invalid response from http://www.testsite.de/.well-known/acme-challenge/generatedNounce: 404",
"status": 403
},
"url": "https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/3703319624/ddjOvg",
"token": "generatedToken",
"validationRecord": [
{
"url": "http://www.testsite.de/.well-known/acme-challenge/generatedNounce",
"hostname": "www.testsite.de",
"port": "80",
"addressesResolved": [
"1.1.1.1",
"1.1.1.2"
],
"addressUsed": "1.1.1.1"
}
],
"validated": "2022-09-21T09:39:01Z"
}
]
}
This is part of the output from the letsencrypt.log file on GW2. In this case, 1.1.1.2 should have been used, since requests addressed to 1.1.1.1 are forwarded to GW1.
On GW1, the addressesResolved part looks like this:
"addressesResolved": [
"1.1.1.2",
"1.1.1.1"
],
with 1.1.1.2 being used for all the requests, as well as being the value of "addressUsed".
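For anyone who wants to compare the two gateways the same way, this is roughly how the relevant fields can be pulled out of such a JSON body. It is only a sketch: the function name validation_addresses is made up, and it assumes the JSON object has already been copied out of letsencrypt.log as a string (the sample below is a trimmed copy of the response shown above).

```python
import json


def validation_addresses(authz_json):
    """Return (hostname, addressesResolved, addressUsed) for each validation record."""
    authz = json.loads(authz_json)
    results = []
    for challenge in authz.get("challenges", []):
        for record in challenge.get("validationRecord", []):
            results.append(
                (record["hostname"], record["addressesResolved"], record["addressUsed"])
            )
    return results


# Trimmed copy of the authorization object from the GW2 log above.
sample = """
{
  "identifier": {"type": "dns", "value": "www.testsite.de"},
  "status": "invalid",
  "challenges": [
    {
      "type": "http-01",
      "status": "invalid",
      "validationRecord": [
        {
          "hostname": "www.testsite.de",
          "port": "80",
          "addressesResolved": ["1.1.1.1", "1.1.1.2"],
          "addressUsed": "1.1.1.1"
        }
      ]
    }
  ]
}
"""

for hostname, resolved, used in validation_addresses(sample):
    print(hostname, "resolved:", resolved, "used:", used)
```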
If we shut down one of the GWs, the dry run finishes without any errors.
Is there a setting we are missing? Why won't certbot try all the resolved addresses?
I obscured most of the data, but I think it should suffice for the questions asked. If anything is missing or unclear, feel free to ask.
We are using certbot 0.40.0.