Embedded devices can't connect to MQTT server after certificate renewal

Earlier, we were unable to connect to our MQTT server at all. The cause turned out to be an expired Let's Encrypt TLS certificate, which we renewed; broker.avasmartgardens.com-0003 was generated as the new lineage. Now we can connect to the MQTT server, but the TLS handshake between our embedded Linux devices and the MQTT server fails, so we can't receive any information back from the embedded devices.

My domain is: broker.avasmartgardens.com

The operating system my web server runs on is (include version): ubuntu-xenial-16.04-amd64

My hosting provider, if applicable, is: AWS

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 1.22.0

I ran these commands:

OUTPUT OF sudo certbot certificates cmd:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Renewal configuration file /etc/letsencrypt/renewal/broker.avasmartgardens.com-0001.conf produced an unexpected error: expected /etc/letsencrypt/live/broker.avasmartgardens.com-0001/cert.pem to be a symlink. Skipping.
Renewal configuration file /etc/letsencrypt/renewal/broker.avasmartgardens.com-0002.conf produced an unexpected error: expected /etc/letsencrypt/live/broker.avasmartgardens.com-0002/cert.pem to be a symlink. Skipping.
Renewal configuration file /etc/letsencrypt/renewal/broker.avasmartgardens.com.conf produced an unexpected error: expected /etc/letsencrypt/live/broker.avasmartgardens.com/cert.pem to be a symlink. Skipping.

Found the following certs:
Certificate Name: broker.avasmartgardens.com.com-0003
Serial Number: 4c80c531572ccb97fc7cf5ff693e4fe96be
Key Type: RSA
Domains: broker.avasmartgardens.com
Expiry Date: 2022-04-12 15:38:50+00:00 (VALID: 89 days)
Certificate Path: /etc/letsencrypt/live/broker.avasmartgardens.com-0003/fullchain.pem
Private Key Path: /etc/letsencrypt/live/broker.avasmartgardens.com-0003/privkey.pem
The following renewal configurations were invalid:
/etc/letsencrypt/renewal/broker.avasmartgardens.com-0001.conf
/etc/letsencrypt/renewal/broker.avasmartgardens.com-0002.conf
/etc/letsencrypt/renewal/broker.avasmartgardens.com.conf

==================================================================

OUTPUT of ls -l /etc/letsencrypt/{archive,live}/* cmd:
ls: cannot access '/etc/letsencrypt/archive/*': Permission denied
-rw-r--r-- 1 root root 740 Oct 13 23:25 /etc/letsencrypt/live/README
/etc/letsencrypt/live/broker.avasmartgardens.com:
total 20
-rw-r--r-- 1 root root 1874 Oct 14 04:20 cert.pem
-rw-r--r-- 1 root root 1826 Oct 14 04:33 chain.pem
-rw-r--r-- 1 root root 3700 Oct 14 04:25 fullchain.pem
-rw-r--r-- 1 root root 1704 Oct 14 04:20 privkey.pem
-rw-r--r-- 1 root root 543 Oct 14 04:20 README
/etc/letsencrypt/live/broker.avasmartgardens.com-0003:
total 8
lrwxrwxrwx 1 root root 55 Jan 12 16:38 cert.pem -> ../../archive/broker.avasmartgardens.com-0003/cert2.pem
lrwxrwxrwx 1 root root 56 Jan 12 16:38 chain.pem -> ../../archive/broker.avasmartgardens.com-0003/chain2.pem
lrwxrwxrwx 1 root root 60 Jan 12 16:38 fullchain.pem -> ../../archive/broker.avasmartgardens.com-0003/fullchain2.pem
lrwxrwxrwx 1 root root 58 Jan 12 16:38 privkey.pem -> ../../archive/broker.avasmartgardens.com-0003/privkey2.pem
-rw-r--r-- 1 root root 692 Jan 12 16:27 README

Well, I am a little puzzled by your description. Here's what I see:

You have some damaged Certbot folders, but the latest folder, -0003, seems to be fine. So, as long as your server references the certs in that folder, it should be OK. Fixing the mess in the other folders is a job for later.
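The "expected ... cert.pem to be a symlink" errors above can be checked per folder with a short script. This is only a sketch: the function name `check_live_dir` is made up for illustration, and it assumes nothing beyond the four file names visible in the listing. It must run as a user who can read /etc/letsencrypt/live.

```python
import os

# The four files that appear in each live/ folder in the listing above.
EXPECTED = ("cert.pem", "chain.pem", "fullchain.pem", "privkey.pem")


def check_live_dir(live_dir):
    """Return {filename: status} describing each expected file in live_dir."""
    report = {}
    for name in EXPECTED:
        path = os.path.join(live_dir, name)
        if os.path.islink(path):
            # Healthy case: a symlink pointing into ../../archive/...
            report[name] = "symlink -> " + os.readlink(path)
        elif os.path.exists(path):
            # Damaged case: someone copied a real file over the symlink.
            report[name] = "regular file (Certbot expects a symlink)"
        else:
            report[name] = "missing"
    return report
```

Going by the listing, calling `check_live_dir("/etc/letsencrypt/live/broker.avasmartgardens.com")` should flag all four files as regular files, while the -0003 folder should report four symlinks.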

I can connect to your domain over HTTP and see an Apache server response:

curl -I http://broker.avasmartgardens.com

HTTP/1.1 200 OK
Date: Thu, 13 Jan 2022 03:20:49 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Mon, 18 May 2020 15:28:41 GMT
ETag: "2c39-5a5edd1537892"
Accept-Ranges: bytes
Content-Length: 11321
Vary: Accept-Encoding
Content-Type: text/html

I cannot connect to that server using HTTPS - the attempt times out.

Have you checked to make sure your EC2 Security Group allows inbound requests to port 443? That is my #1 guess as to what is wrong.

That said, I do not understand how this domain name is involved with MQTT or embedded devices. I only see the Apache response. If what I described above is not helpful, please describe more specifically what is wrong. For example, show a URL that is failing and the failure message.
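A quick way to test from outside whether a given port accepts inbound connections at all (the Security Group question above) is a plain TCP connect. A minimal sketch, with a hypothetical helper name `port_is_open`; it says nothing about TLS, only about reachability:

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all end up here.
        return False
```

For instance, `port_is_open("broker.avasmartgardens.com", 443)` returning False while port 80 returns True would match the timeout described above.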

Thanks for the response! I'm part of Stella's team. We have never allowed inbound requests to port 443, so I do not think that is the issue.

MQTT is involved as follows: port 8883 is the connection from the instance to the MQTT broker, and port 1883 is the connection from the embedded device to the instance.

After we updated the certificate, we can connect to the MQTT broker again, but the embedded devices can no longer connect. When I SSH into an embedded device and it tries to connect, the Paho library reports the error code MQTTASYNC_DISCONNECTED.

We have a different MQTT configuration for our internal development devices. We did not update the certificates there (they have not expired yet), and those devices are still fine, which leads me to think that something related to installing the new certificates caused this problem.
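Since MQTTASYNC_DISCONNECTED by itself does not say why the connection dropped, it may help to attempt the TLS handshake separately and look at the verification error. The sketch below uses only the Python standard library, not Paho. The helper name `probe_tls` is hypothetical; the port is a parameter because the thread does not state which listener terminates TLS; and `cafile` is assumed to be a copy of whatever CA bundle the devices trust.

```python
import socket
import ssl


def probe_tls(host, port, cafile=None, timeout=5.0):
    """Attempt a TLS handshake with host:port and return (ok, detail).

    If cafile is given, only the CAs in that file are trusted, which
    approximates a device with its own trust store. If it is None, the
    system default trust store is used instead.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # The chain sent by the server did not validate against the trust store.
        return False, "certificate verification failed: " + exc.verify_message
    except ssl.SSLError as exc:
        return False, "TLS error: " + str(exc)
    except OSError as exc:
        return False, "connection error: " + str(exc)
```

Running it once with the system trust store and once with the devices' CA bundle would show whether the failure is specific to what the devices trust.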

Well, I do not know MQTT well enough to advise further. Perhaps another volunteer will help but you might find better assistance on an MQTT support forum. You don't have any trouble getting Let's Encrypt certs (I see a bunch <g>). It is configuring your system that is problematic.

I went searching this forum and see your org sought and got a resolution once before for this issue. Perhaps reviewing that thread will lead you to a solution. Maybe when you renewed your certs you used a different chain than previously? I can't check as I cannot connect to your EC2 instance on port 1883.

My guess is your org installed the certificate incorrectly on the server.

Looking at the CT logs (crt.sh | broker.avasmartgardens.com), I confirmed your last Certbot certs were issued on 2021-10-13 (and not copied). That was after the Certbot default chain had switched, so your clients had been able to connect with that setup for months, and a simple certbot renew would have used the same chain then as it did now, with the failed configuration files present in both cases.

Since there is an issue with the Certbot archive on your machine, it's possible/likely the last person to issue the cert manually overwrote some files.

I would not be surprised if the last person manually switched the chain to use the ISRG Root instead of the (default) expired DST cross-sign that is used for Android compatibility. Your embedded machines probably trust the ISRG root, and don't have a version of OpenSSL or another library that implements the short-circuit logic to build alternate trust chains.
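One way to test this theory is to compare the chain in the old, manually edited fullchain.pem with the one in the -0003 folder. The sketch below lists how many certificates a PEM bundle contains and the SHA-256 fingerprint of each, in file order (leaf first). It uses only the standard library, so it does not decode subject or issuer names; the function name `chain_fingerprints` is made up for illustration.

```python
import base64
import hashlib
import re

# Matches each PEM-encoded certificate block in a bundle such as fullchain.pem.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def chain_fingerprints(pem_text):
    """Return the SHA-256 hex fingerprint of every certificate in pem_text."""
    prints = []
    for body in PEM_BLOCK.findall(pem_text):
        # Strip line breaks, then decode the base64 body to DER bytes.
        der = base64.b64decode("".join(body.split()))
        prints.append(hashlib.sha256(der).hexdigest())
    return prints
```

If the two files yield a different number of fingerprints, or different intermediates after the first entry, the broker is now sending a different chain than the one the devices connected with before.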

Since your clients are not connecting I would take this opportunity to modernise your broker service (OS etc) and get a handle on automatically renewing your certificates rather than using a config you don't fully understand.

If, after you have your certificate installed, your clients still have trust issues with recent Let's Encrypt certificates, you would need to either update the clients as required or switch to a different CA.
