Well this stinks, with very little notice. I have about 10 websites impacted, because of using Traefik's TLS challenge.
In my case, I looked at the acme.json in each of my servers' Traefik configs and searched for "acct". There's an entry in the JSON that looks like:
Here is what I did: grep for the account number. Then remove the acme.json (assuming that is the name of the traefik-generated cert/key file) and bounce traefik.
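If it helps anyone, here is a minimal Python sketch of that "search for acct" step. It walks the parsed JSON and prints any string containing "/acct/", so it does not depend on the exact layout of acme.json. The sample data, the resolver name "myresolver" and the account number are made up for illustration; it assumes the account URL has the usual Let's Encrypt form ending in /acme/acct/<number>.

```python
import json
from typing import Any, Iterator


def find_account_urls(node: Any, needle: str = "/acct/") -> Iterator[str]:
    """Yield every string value in the parsed JSON that contains `needle`."""
    if isinstance(node, dict):
        for value in node.values():
            yield from find_account_urls(value, needle)
    elif isinstance(node, list):
        for item in node:
            yield from find_account_urls(item, needle)
    elif isinstance(node, str) and needle in node:
        yield node


# Hypothetical stand-in for a real acme.json (layout is an assumption).
sample = json.loads("""
{
  "myresolver": {
    "Account": {
      "Email": "admin@example.com",
      "Registration": {
        "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/123456"
      }
    },
    "Certificates": []
  }
}
""")

for url in find_account_urls(sample):
    print(url)
```

To run it against a real file, replace the sample with json.load(open("acme.json")), adjusting the path to wherever your Traefik stores it.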
I stopped my traefik container, made a backup of my acme.json file, then deleted everything inside, and restarted the container. An entirely new certificate was generated, the account number in the acme.json file is different now, and the website still works with no warnings, and the date of validity has been extended. So while it wasn't a renewal, it solved my problem.
Is there any other option that doesn't require deleting the acme.json and restarting Traefik? I have lots of certificates that were created using DNS-01, so they don't need to be renewed, but also lots of TLS-ALPN-01 certs that will take some time to renew. Deleting and restarting means all those sites will be down for some time, with users getting certificate errors.
Hmm, I think I understand what needs to be done. But looks like we're dealing with some amount of downtime while Traefik restarts and waits for new certificates to arrive. I guess I'll be up tonight.
For people who need help cleaning their acme.json file, I quickly created a simple tool that removes your certificates from the acme.json (Traefik v2 only).
The process:
run the tool with the right arguments
copy the content of the generated file to your acme.json file
restart Traefik
The readme contains examples for all the options (only 3 options).
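For anyone curious what a selective cleanup could look like, here is a rough Python sketch. It is not the tool mentioned above and makes no claim about how that tool works. It assumes the Traefik v2 layout where each top-level key is a resolver name holding a "Certificates" list, and each entry has a "domain" object with a "main" field. The function name drop_certificates and the sample data are hypothetical.

```python
import copy


def drop_certificates(acme: dict, domains: set) -> dict:
    """Return a copy of `acme` without certificates whose main domain is in `domains`.

    Assumes a Traefik v2 style layout: {resolver: {"Certificates": [...]}}.
    """
    cleaned = copy.deepcopy(acme)
    for resolver in cleaned.values():
        if not isinstance(resolver, dict) or "Certificates" not in resolver:
            continue
        # "Certificates" may be null in the file, so fall back to an empty list.
        certs = resolver["Certificates"] or []
        resolver["Certificates"] = [
            cert for cert in certs
            if cert.get("domain", {}).get("main") not in domains
        ]
    return cleaned


# Hypothetical data: one certificate to drop, one to keep.
sample = {
    "myresolver": {
        "Account": {"Email": "admin@example.com"},
        "Certificates": [
            {"domain": {"main": "a.example.com"}, "certificate": "...", "key": "..."},
            {"domain": {"main": "b.example.com"}, "certificate": "...", "key": "..."},
        ],
    }
}

result = drop_certificates(sample, {"a.example.com"})
print([c["domain"]["main"] for c in result["myresolver"]["Certificates"]])
```

Working on a deep copy keeps the original data untouched, so you can compare before and after prior to writing anything back to disk.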
After replacing the value of the certificates key with an empty array, "Certificates": [ ], I bounced Traefik with docker service update platform_traefik --force, and everything came back with fresh certificates. On one of my busier swarms it took a minute, since there were many more domains to process, so I did see certificate errors briefly. But it cleared up before any of my monitors noticed.
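A minimal Python sketch of that edit, with a backup first, might look like the following. It assumes each top-level key in acme.json is a resolver whose "Certificates" value can simply be replaced with an empty list, leaving the account data alone. The function names and the ".bak" suffix are my own choices, and it is meant to be run while Traefik is stopped.

```python
import json
import shutil
from pathlib import Path


def clear_certificates(acme: dict) -> dict:
    """Return a copy of `acme` with every resolver's "Certificates" set to []."""
    cleared = {}
    for name, resolver in acme.items():
        if isinstance(resolver, dict) and "Certificates" in resolver:
            # Keep everything else (e.g. "Account") and empty the list.
            cleared[name] = {**resolver, "Certificates": []}
        else:
            cleared[name] = resolver
    return cleared


def backup_and_clear(path: Path) -> Path:
    """Copy `path` to `<name>.bak`, then rewrite it with empty certificate lists."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # copy2 also copies the file mode
    data = json.loads(path.read_text())
    # Overwriting in place keeps the existing permissions on acme.json.
    path.write_text(json.dumps(clear_certificates(data), indent=2))
    return backup
```

Usage would be something like backup_and_clear(Path("acme.json")), followed by restarting Traefik so it requests new certificates.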
I did this, and I found it to be easiest for me. I did all the steps except touch acme.json to create the new, blank file. It seemed to confuse Traefik, so I just made the backup "revoked_acme.json", exited the Docker container, and restarted the container. When I re-entered the container, the new "acme.json" was there, and Traefik created new certificates.
Awesome thanks! I ended up just deleting the acme.json file and restarting Traefik, which worked really well for me, but I will save this for future reference!
Following the instructions provided, we're getting the following error when trying to renew the certificates in question (quite a bunch of them).
time="2022-01-29T16:28:11Z" level=error msg="Unable to obtain ACME certificate for domains "mydomain.com": unable to generate a certificate for the domains [mydomain.com]: acme: Error -> One or more domains had a problem:\n[mydomain.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Incorrect validation certificate for tls-alpn-01 challenge. Requested mydomain.com from 188.166.200.61:443. Received 1 certificate(s), first certificate had identifiers "e415182936ebe9b8d955d8479985a8c5.89f80cd55880258102a1c2cb655a436a.traefik.default, traefik default cert"; got error dNSName does not match expected identifier, url: \n" routerName=default-my-app-name-8c772840d127f0d16fa8@kubernetescrd rule="(Host(mydomain.com)) && PathPrefix(/)" providerName=default.acme
Using Traefik 2.1.9 in Kubernetes.
What does "dNSName does not match expected identifier" really mean?
We have also patched our CA software, Boulder, to correctly respond with an error when client TLS challenge certificates are not compliant with the RFC.
If you are receiving this error, then your client has likely implemented the TLS-ALPN-01 challenge type incorrectly. Your error indicates that the TLS challenge certificate contains incorrect or unexpected information, such that it is not compliant with the RFC. You will need to use another challenge type to issue a certificate until your client is fixed. If you are able to open an issue on their source code repository, you should do so. cc @elDez