Certificate works in staging but fails in production


#1

Hello,

I successfully installed certificates on one of my web servers, for two subdomains.
I’m now trying to install another certificate on my production server for the domain “offshadow.com”.

Both servers are hosted by OVH. But for the production one, the domain “offshadow.com” is managed by Google Domains (the other domains are managed by OVH directly).

I set “A” DNS records in Google Domains to point to my OVH IP address.

Important note: I use a Docker setup with a “certbot” container.

My problem: the certificate is generated successfully when I set the --staging option. But when I remove it, the validation request gets a 404 error from my web server. The DNS-01, HTTP-01 and TLS-ALPN-01 challenges all come back as “invalid”.

I’m going crazy because I cannot figure out why everything works fine in the staging environment and not in production. I checked the permissions on my www directory and they seem to be OK.
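
Besides permissions, a 404 during HTTP-01 validation can be narrowed down by checking whether the web server really serves the directory that certbot writes into. Below is a minimal sketch of such a check. The function names are hypothetical, and it assumes the webroot path passed in is the directory shared between the certbot and web server containers.

```shell
#!/bin/sh
# Sketch only: make_probe and probe_url are hypothetical helper names.

# Create a small probe file where HTTP-01 challenge files would be written,
# and print the name of that file.
make_probe() {
    webroot="$1"
    name="probe-$$"
    mkdir -p "$webroot/.well-known/acme-challenge" || return 1
    echo "ok" > "$webroot/.well-known/acme-challenge/$name" || return 1
    echo "$name"
}

# Print the URL at which a validation request would look for that file.
probe_url() {
    echo "http://$1/.well-known/acme-challenge/$2"
}

# Example usage (not executed here):
#   name=$(make_probe /var/www/certbot)
#   curl -i "$(probe_url offshadow.com "$name")"
```

If the curl request returns 404 while the file exists on disk, the web server is probably not serving the same directory that certbot uses as its webroot.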

Is there any difference between staging and production challenges? Do I have to add some records to my Google Domains configuration?

My domain is:
offshadow.com

I ran this command:
certbot certonly --verbose --webroot -w /var/www/certbot -d offshadow.com

It produced this output:

{
  "identifier": {
    "type": "dns",
    "value": "offshadow.com"
  },
  "status": "invalid",
  "expires": "2018-11-30T09:53:11Z",
  "challenges": [
    {
      "type": "dns-01",
      "status": "invalid",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/oJ3qQ1hYk0ElgI4DBo4U4OFu6BOYoOPzJ3HEb1YgROA/9567469168",
      "token": "d--3sY6l-Jxq0w04iZlQRWdzCazRA5O2u6nwePGKV6U"
    },
    {
      "type": "tls-alpn-01",
      "status": "invalid",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/oJ3qQ1hYk0ElgI4DBo4U4OFu6BOYoOPzJ3HEb1YgROA/9567469171",
      "token": "fw_2nPCae_RWY7HqIVuB-SiBFFSo1FZCLXfgF3oj1pY"
    },
    {
      "type": "http-01",
      "status": "invalid",
      "error": {
        "type": "urn:ietf:params:acme:error:unauthorized",
        "detail": "Invalid response from http://offshadow.com/.well-known/acme-challenge/2OhKY8ljXxGxT5-2m5wKPQJUhI9UwtxMgQT9_EC6XdI: \"\u003chtml\u003e\r\n\u003chead\u003e\u003ctitle\u003e404 Not Found\u003c/title\u003e\u003c/head\u003e\r\n\u003cbody bgcolor=\"white\"\u003e\r\n\u003ccenter\u003e\u003ch1\u003e404 Not Found\u003c/h1\u003e\u003c/center\u003e\r\n\u003chr\u003e\u003ccenter\u003e\"",
        "status": 403
      },
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/oJ3qQ1hYk0ElgI4DBo4U4OFu6BOYoOPzJ3HEb1YgROA/9567469172",
      "token": "2OhKY8ljXxGxT5-2m5wKPQJUhI9UwtxMgQT9_EC6XdI",
      "validationRecord": [
        {
          "url": "http://offshadow.com/.well-known/acme-challenge/2OhKY8ljXxGxT5-2m5wKPQJUhI9UwtxMgQT9_EC6XdI",
          "hostname": "offshadow.com",
          "port": "80",
          "addressesResolved": [
            "51.68.83.89"
          ],
          "addressUsed": "51.68.83.89"
        }
      ]
    }
  ]
}

My web server is (include version): OVH Public Cloud

The operating system my web server runs on is (include version): Debian 8

My hosting provider, if applicable, is: OVH, with the domain managed by Google Domains

I can login to a root shell on my machine (yes or no, or I don’t know): YES

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): NO

Thanks a lot!


#2

Nope.

It’s possible that staging could be caching a previous successful authorization which may lead you to believe that staging is working, but it’s hard to say.

What’s the actual Docker command you are running? With the -v mounts and everything.


#3

OK, you helped me with this point: I tried removing everything in my certbot www folder, and the certificate generation is now failing in staging too…

My Docker configuration is a bit tricky because I have an Nginx container that cannot start without certificates, but has to be running to generate them. I reuse a shell script from the web that generates a dummy certificate using openssl to start Nginx, and then runs the certbot certonly command.

The tutorial is here: https://medium.com/@pentacent/nginx-and-lets-encrypt-with-docker-in-less-than-5-minutes-b4b8a60d3a71

And the script I run can be found here: http://tpcg.io/yREEQt
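
For readers who do not want to open the links, the dummy-certificate step can be sketched roughly as follows. This is only an illustration of the idea, not the linked script: the function name and the output directory are hypothetical, and it assumes openssl is available wherever the function runs.

```shell
#!/bin/sh
# Sketch only: make_dummy_cert and the target directory are hypothetical.

# Write a short-lived self-signed certificate and key into the given
# directory, so that Nginx has something to load on its first start.
make_dummy_cert() {
    dir="$1"
    mkdir -p "$dir" || return 1
    openssl req -x509 -nodes -newkey rsa:2048 -days 1 \
        -keyout "$dir/privkey.pem" \
        -out "$dir/fullchain.pem" \
        -subj "/CN=localhost"
}

# Example usage (the path is an assumption, not taken from the script):
#   make_dummy_cert ./certbot/conf/live/offshadow.com
```

The dummy files only let Nginx start; they have to be replaced by the real certificate once certbot succeeds.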

Thanks again :)


#4

OK, so if I clear all the cache and previous certbot data, I get the error even in staging.
But if I run the command twice, the second try succeeds…

Dammit, I’m lost.


#5

OK, it’s getting stranger and stranger.
If I run my command many times, there is a very regular pattern.

Attempt #1: working
Attempt #2: failing
Attempt #3: working
Attempt #4: failing
etc…

I thought about a permissions issue on the files and folders generated by the script, but even if I run chmod 777 on them, the problem remains. If I clean the whole certbot cache after every attempt, this pattern keeps looping, as if it came from the server or the challenge itself…!


#6

OK found my issue and a solution.

In fact, the issue was that the Nginx container had to be stopped before running the script. If the container was still running, it made the openssl command fail, and then the following steps failed too.

I just added a docker-compose stop nginx at the beginning of the script, and everything now works well in staging and production.
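
The resulting order of steps can be sketched like this. It is a simplified outline under assumptions, not the actual script: the service names “nginx” and “certbot”, the helper names, and the DRY_RUN switch are all hypothetical, and the dummy-certificate step is left as a comment.

```shell
#!/bin/sh
# Sketch only: service names, helper names and DRY_RUN are assumptions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

issue_certificate() {
    domain="$1"

    # Stop Nginx first, so the dummy-certificate step does not collide
    # with a container that is still running.
    run docker-compose stop nginx || return 1

    # (The dummy certificate would be generated with openssl here.)

    # Start Nginx again so it can serve the HTTP-01 challenge files.
    run docker-compose up -d nginx || return 1

    # Request the real certificate through the shared webroot.
    run docker-compose run --rm --entrypoint certbot certbot \
        certonly --webroot -w /var/www/certbot -d "$domain" || return 1

    # Make Nginx load the newly issued certificate.
    run docker-compose exec nginx nginx -s reload || return 1
}

# Example usage (prints the commands without running them):
#   DRY_RUN=1 issue_certificate offshadow.com
```

Running it with DRY_RUN=1 first is a cheap way to confirm the order of the commands before touching the running containers.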