Hi all! My company has a platform where customers can purchase custom domains for their shop. We utilize a "Cert Service" with the AcmePHP client to request certificates. This process is triggered at the time of domain purchase or from a cronjob to update expiring certs.
About a week and a half ago, we noticed an uptick in failed authorizations (see example output below). Our service is not in active development and has been working properly since the ACME v1 deprecation. I'm curious whether anyone else has experienced this type of issue before.
My domain is: woollastudio.com but we have many others that are also getting an invalid authorization.
I ran this command: Automated task that utilizes AcmePHP to order, authorize, and issue certs.
It produced this output: Invalid response from https://woollastudio.com:443/.well-known/acme-challenge/dgSHTK5GET4_5S3VY9k_bBbPGs3mBbh4HNycLH_dxTs: 400
Note: our cronjob is still active and may result in this link becoming invalid. I can always provide an updated acme-challenge URL as needed.
My web server is (include version): Apache 2.4.6
The operating system my web server runs on is (include version): CentOS 7
I can login to a root shell on my machine (yes or no, or I don't know): yes
I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no
Yes, but, they do acquire a Let's Encrypt cert under that domain name and one is even valid for another 10 days. It is odd they don't send it out. I should have been more explanatory with my last sentence so I am glad you brought this up.
You are probably right in thinking it is more likely a firewall problem. But the collection of odd things I saw pointed at their "custom domain platform" as a possible culprit.
Q1. Has the redirection changed recently?
Q2. Do you really need to redirect to HTTPS (and add :443 to the URL)?
Q3. Since it is Apache, have you verified there is no name:port overlap?
[apachectl -t -D DUMP_VHOSTS]
Thanks for the logs! These look like ACME client logs. I'm also particularly interested in logs (both error and access) from your Apache instance.
Being slightly pedantic: If it were a traditional firewall, we'd probably see a "Timeout during connect" problem. I suspect this is at the web server level (Apache); though products that operate on the web server level are often called "Web Application Firewalls (WAF)".
A1. Redirection has not changed recently.
A2. It is our security policy to always redirect to HTTPS.
A3. There doesn't appear to be any overlap. This particular server is only used for this Cert Service to make the calls to Let's Encrypt.
Have you been able to inspect your Apache access and error logs? It is an important clue whether we see the ACME challenge requests in them. For example, if they don't appear there, then something in front of that server is blocking the request.
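In case it helps with the log check, here is a minimal Python sketch for pulling the challenge requests out of an access log. It assumes the common/combined Apache log format (quoted request line followed by the status code); the sample lines, the IP address, and the token in them are made up for illustration and are not from your server. If your LogFormat differs, the regex will need adjusting.

```python
import re

# Matches the quoted request line and the status code that follows it,
# as found in Apache's common/combined log formats (an assumption here).
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def find_challenge_requests(lines):
    """Return (path, status) pairs for requests that hit the challenge path."""
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("path").startswith(CHALLENGE_PREFIX):
            hits.append((match.group("path"), int(match.group("status"))))
    return hits


# Hypothetical sample lines, only to show the shape of the input.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2024:00:00:00 +0000] '
    '"GET /.well-known/acme-challenge/token123 HTTP/1.1" 400 226 "-" "test-agent"',
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "test-agent"',
]

if __name__ == "__main__":
    for path, status in find_challenge_requests(SAMPLE_LINES):
        print(status, path)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LINES`. An empty result for the time of a failed authorization would suggest the request never reached Apache.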
Also, a Let's Debug test just now revealed a wrongly formatted CAA record. Is that new? The wrong format will prevent LE from issuing a cert to that domain (I think so anyway). Do you have this CAA record on the other domains?
Then there seems to be even less reason to redirect the HTTP challenge requests to HTTPS.
It only delays what could have been dealt with then.
Further, you overlooked the "and add :443 to the URL". https://any.site/
is technically equal to: https://any.site:443/
but they may not be handled exactly equally (especially by systems outside your control).
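To illustrate the point about the two forms being equal, here is a small Python sketch that strips an explicit default port from a URL. The function name and the port table are my own, and it deliberately ignores userinfo and IPv6 literals, so treat it as an illustration rather than a full URL normalizer.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the two schemes discussed in this thread.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url):
    """Drop an explicit port when it is the scheme's default.

    Sketch only: userinfo and IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(strip_default_port("https://any.site:443/"))
    print(strip_default_port("https://any.site/"))
    print(strip_default_port("https://any.site:8443/"))
```

After this step the first two URLs compare equal as strings, while the non-default port is kept. Anything that compares or routes on the raw string without such a step could treat them differently.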
I checked with curl -v and the client's headers don't mention the port number on which the connection took place (and this information also isn't expressed inside of the TLS protocol). I think it would be challenging to construct a situation in which https://example.com/ and https://example.com:443/ URLs produce detectably different behavior on the wire. (Maybe if a client is using an application-layer proxy and the user-agent passes the complete URL to that proxy. A reverse proxy or WAF on the server side will apparently not be able to detect this distinction, though.)
If I were to make a program (like a proxy or browser) where HTTPS defaults to use some other port...
Then any use of HTTPS forced to port 443 would fail.
Unless :443 was explicitly stripped off or ignored.
Which raises the question: Why have it there at all?
You're correct. The :443 was causing an issue on our end. I'm curious where the port is being added. Our codebase doesn't appear to add it anywhere. Could something have changed in the requests Let's Encrypt is making for the acme-challenges?