I'm attempting to find an answer to this over on the main Discourse forum as well, but I see there have been similar questions asked here in the past, so here we go ...
Some time back (it's not clear exactly how long, but at least several months) Let’s Encrypt renewals started failing on my Discourse forum, after running fine for years. The cert being served is expired. Oddly, it initially showed as expired on August 22, and after some rebuilds, nginx restarts, and such, it now reads expired Dec 22. Still not current, obviously. I believe HSTS may be adding confusion -- I've cleared that in Chrome and Firefox for now.
Manually running acme.sh to force a renewal (within the discourse container) is yielding this error (where [site] is my site address, of course):
[site]:Verify error:Fetching http://[site]/.well-known/acme-challenge/[long alpha challenge string]: Error getting validation data
Testing the verify using wget returns a 404. However, I do not know where this data is configured into nginx for Discourse in a container and how it relates to nginx proxied outside the container.
If you are running nginx outside the container and using that to proxy HTTP & HTTPS, you should expect a valid certificate to be required there (and kept updated there).
The container itself might not even need HTTPS.
OR it might also be ok to use a snakeoil cert.
Hi. I'm just using the default docker-based version of Discourse and never set up nginx explicitly. I don't know why renewals would suddenly stop this way. I haven't located the configuration where .well-known would be specified or where that directory would be located. The configuration inside the container (in /etc/nginx) is just at default settings. When I run nginx -t outside the container, nginx does not appear to be installed at all. Thanks.
I think I may see the problem. But I don't know how to fix it. It appears that all attempts to access the forum on http port 80 are (as expected) being redirected to https on port 443. Right. But this means that when Let's Encrypt attempts to validate for the renewal, it fails, because the current certificate has expired. I can see the redirect with wget. So the question is, how do I disable the redirect temporarily so that Let's Encrypt can validate and get me a new non-expired cert? An additional possible complication is that the redirect is a 301 permanent. Thanks.
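A small aside for anyone who wants to inspect the redirect rather than eyeball wget output: curl can save response headers with --dump-header, and when -L is also given it writes one header block per hop. Below is a minimal Python sketch that lists each hop's status code and Location header from such a dump. The function name and the sample headers (including forum.example.com) are made up for illustration and do not come from a real run.

```python
from typing import List, Optional, Tuple


def redirect_hops(header_dump: str) -> List[Tuple[int, Optional[str]]]:
    """Return (status_code, Location) for each response in a curl header dump."""
    hops: List[Tuple[int, Optional[str]]] = []
    status: Optional[int] = None
    location: Optional[str] = None
    for raw in header_dump.splitlines():
        line = raw.strip()
        if line.upper().startswith("HTTP/"):
            # A status line starts a new response; store the previous one first.
            if status is not None:
                hops.append((status, location))
            status = int(line.split()[1])
            location = None
        elif line.lower().startswith("location:"):
            location = line.split(":", 1)[1].strip()
    if status is not None:
        hops.append((status, location))
    return hops


# Hypothetical dump: a 301 to https followed by a 404 on the final URL.
sample = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://forum.example.com/.well-known/acme-challenge/abc\r\n"
    "\r\n"
    "HTTP/1.1 404 Not Found\r\n"
    "\r\n"
)

for status_code, target in redirect_hops(sample):
    print(status_code, target)
```

If the last hop is a 404, the redirect itself worked and the missing file is the thing to chase.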
Error getting validation data is a catch-all error that can be caused by many different things; maybe try https://letsdebug.net/ to see if it can identify anything in particular that's likely to be going wrong here?
In fact, I was able to disable the redirect and the acme challenge access is still failing with a 404. The file it's looking for apparently just isn't there, wherever exactly "there" is for nginx in this Discourse-inside-docker configuration. To the outside world it's an expired certificate error.
I'm getting closer to solving an expired cert problem under Discourse. The ultimate problem is that the (apparently curl) GET of the verification token always seems to time out in acme.sh when I try to renew.
I've been testing using the extra --force and --renew-all parameters.
One aspect I believe I've solved. The machine with the Discourse docker container has to use a different IP address to access itself, due to routing issues (it can't use the public address). I've now temporarily appended a local address in /etc/hosts in the Discourse container. I can now manually (using wget or curl) get the token from inside the container, so long as redirects are followed and expired certs are ignored. So, for example, this means for curl using -L -k.
I can also successfully get the token from external sites, again using wget or curl.
However, acme.sh is still apparently timing out on the same URL. I'm assuming that acme.sh invokes curl in a manner that permits following redirects and ignoring expired certs, but I'm not sure. Given that only acme.sh is now failing to get from that URL, it is a bit puzzling.
Here is the exact error. Though in later tests I'm seeing code 60 (cert validation failure). I had thought acme.sh would skip validation so it could deal with renewing expired certs. After adding --insecure to the acme.sh command, things are back to the generic "Error getting validation data" error after the URL fetch again.
[Sun 26 Dec 2021 09:40:25 PM UTC] url='http://[site]/.well-known/acme-challenge/IeRRV2ra9DwgUa4ZORL2_OC2iVG2Dw-PQxRmRieceJE'
[Sun 26 Dec 2021 09:40:25 PM UTC] timeout=1
[Sun 26 Dec 2021 09:40:25 PM UTC] _CURL='curl --silent --dump-header /shared/letsencrypt/http.header -L -g --connect-timeout 1'
[Sun 26 Dec 2021 09:40:25 PM UTC] Please refer to https://curl.haxx.se/libcurl/c/libcurl-errors.html for error code: 56
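For reference, here is a rough lookup of a few curl exit codes that are relevant to this kind of fetch, taken from the libcurl error list that the log line links to. It is only a convenience sketch: the table is deliberately partial, and the function name is made up for illustration.

```python
# Partial table of curl exit codes; see the libcurl error list for the rest.
CURL_EXIT_CODES = {
    7: "CURLE_COULDNT_CONNECT: failed to connect to host or proxy",
    28: "CURLE_OPERATION_TIMEDOUT: the specified timeout was reached",
    56: "CURLE_RECV_ERROR: failure receiving network data",
    60: "CURLE_PEER_FAILED_VERIFICATION: server certificate could not be verified",
}


def explain_curl_exit(code: int) -> str:
    """Return a short description for a curl exit code, if this table has one."""
    return CURL_EXIT_CODES.get(code, f"code {code} is not in this partial table")


for code in (56, 60):
    print(code, explain_curl_exit(code))
```

Note that 56 is a receive failure after the connection was made, which is a different condition from a connect timeout (28) or a refused connection (7).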
Looking more closely at the detailed debug output, it appears that the token IS being successfully retrieved. So now I'm completely stymied -- why is the validation error still occurring? This is a real problem.
[Mon 27 Dec 2021 04:48:37 PM UTC] url='https://acme-v02.api.letsencrypt.org/acme/chall-v3/62518011250/DoV0bA'
[Mon 27 Dec 2021 04:48:37 PM UTC] payload
[Mon 27 Dec 2021 04:48:37 PM UTC] POST
[Mon 27 Dec 2021 04:48:37 PM UTC] _post_url='https://acme-v02.api.letsencrypt.org/acme/chall-v3/62518011250/DoV0bA'
-----> WHAT IS THIS? >>> [Mon 27 Dec 2021 04:48:37 PM UTC] _CURL='curl --silent --dump-header /shared/letsencrypt/http.header -L -g --insecure '
[Mon 27 Dec 2021 04:48:37 PM UTC] _ret='0'
[Mon 27 Dec 2021 04:48:37 PM UTC] code='200'
-----> STILL AN ERROR! >>> [Mon 27 Dec 2021 04:48:37 PM UTC] [site]:Verify error:Fetching http://[site]/.well-known/acme-challenge/2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA: Error getting validation data
[Mon 27 Dec 2021 04:48:37 PM UTC] Debug: get token url.
[Mon 27 Dec 2021 04:48:37 PM UTC] GET
[Mon 27 Dec 2021 04:48:37 PM UTC] url='http://[site]/.well-known/acme-challenge/2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA'
[Mon 27 Dec 2021 04:48:37 PM UTC] timeout=1
[Mon 27 Dec 2021 04:48:37 PM UTC] _CURL='curl --silent --dump-header /shared/letsencrypt/http.header -L -g --insecure --connect-timeout 1'
-----> HERE IT IS! >>> 2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA.vKA8FrKeyX8nJdc_2PzikCV_PLNTM5SEBXrmNKVtz3o[Mon 27 Dec 2021 04:48:37 PM UTC] ret='0'
[Mon 27 Dec 2021 04:48:37 PM UTC] Debugging, skip removing: /var/www/discourse/public/.well-known/acme-challenge/2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA
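For anyone comparing the fetched body against the URL in a log like this: under the HTTP-01 challenge (RFC 8555), the expected response body is the token from the URL, then a dot, then a thumbprint of the account key. Here is a minimal Python sketch that checks only that shape. It cannot confirm that the thumbprint belongs to the right account, and it says nothing about what the Let's Encrypt servers saw when they fetched the URL from outside. The function names are made up, and forum.example.com stands in for [site].

```python
from urllib.parse import urlparse

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def token_from_url(url: str) -> str:
    """Extract the challenge token from an HTTP-01 challenge URL."""
    path = urlparse(url).path
    if not path.startswith(CHALLENGE_PREFIX):
        raise ValueError("not an acme-challenge URL: " + url)
    return path[len(CHALLENGE_PREFIX):]


def looks_like_key_authorization(url: str, body: str) -> bool:
    """True if body has the form '<token from url>.<non-empty thumbprint>'."""
    token = token_from_url(url)
    head, sep, thumbprint = body.strip().partition(".")
    return head == token and sep == "." and thumbprint != ""


# Token and body copied from the debug log; the hostname is a placeholder.
url = (
    "http://forum.example.com/.well-known/acme-challenge/"
    "2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA"
)
body = (
    "2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA"
    ".vKA8FrKeyX8nJdc_2PzikCV_PLNTM5SEBXrmNKVtz3o"
)
print(looks_like_key_authorization(url, body))
```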
Substitute the site domain name you are using for [site]
This challenge file was left in place by acme.sh, so you should still see it. It looks to me like the folder where acme.sh places the file is not the one your server uses to answer public requests. That is, it does not match the root folder value in this server's conf.
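To make the folder question concrete, here is a minimal Python sketch that builds the path where a challenge file would sit under a given webroot and reports whether it exists. It assumes the webroot is /var/www/discourse/public, as the "skip removing" line in the debug log suggests. Whether nginx actually serves from that folder is exactly the open question, so this only checks the acme.sh side. The function names are made up for illustration.

```python
from pathlib import Path


def challenge_file(webroot: str, token: str) -> Path:
    """Path where an HTTP-01 challenge file would be written under webroot."""
    return Path(webroot) / ".well-known" / "acme-challenge" / token


def report(webroot: str, token: str) -> str:
    """Say whether the challenge file for token is present under webroot."""
    path = challenge_file(webroot, token)
    if path.is_file():
        return f"present: {path}"
    return f"missing: {path}"


# Webroot assumed from the debug log; token copied from the same log.
print(report(
    "/var/www/discourse/public",
    "2pSLG_7W9sDrfIT-F2kM-0daQLNaKca1iI2Nwg6CejA",
))
```

Run inside the container, a "present" result alongside a public 404 would point at a mismatch between this folder and the root that nginx serves.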
The diagnostics indicate that the token IS being pulled by curl. But the script is still throwing a failed authentication error. I can (apparently) see the token in the debug output after the curl command executes. Thanks.
I am sorry but I do not understand your explanation. Can you just show the results of the curl command I suggested? You can edit it to remove your domain name if you must but otherwise show it as it is.
Well, I don't know what to try next. I think you may be better off asking on the acme.sh github. You have some unusual errors in the log and your manual curl retrieves a file that the acme.sh curl cannot. The acme.sh developers are at the github (and less often show up here).
I would have liked to see the entire output of the curl command. And your need to add -k indicates you are redirecting the original http request. But, as long as you used the URL as I showed, it should work the same manually as in acme.sh.
Well, Discourse automatically redirects everything to https. The fundamental problem is that a cert that was renewing fine automatically for years suddenly stopped renewing. Of course curl can't get past the expired cert unless -k (or --insecure) is used. Very frustrating to have this just suddenly break this way.