Looks like @jcjones may have found the culprit: an unintended switch to serving the short chain by default for renewals.
I'm confused about why that would leave App Service unable to find the issuer, though, since we confirmed the self-signed ISRG Root X1 is trusted and it doesn't expire until 2035. It doesn't bode well for the change currently scheduled for next February that will do this on purpose:
That would be my first guess... and what sticks out to me is the following:
@robertg123 should make sure the chains are not hardcoded into their client. In the past, many clients have had issues during similar API changes because the developers hardcoded the chains. Some end users have also had issues because their web server integrations hardcoded a chain.
I will need some advice on what to look for regarding hardcoded chains in the code. I would like to fully understand this so I can future-proof the code for the upcoming changes.
As stated, this is based on some code I/we did not write.
Look at the last step of the ACME protocol, when the certificate is downloaded.
What you should see is the following: the client should download a fullchain PEM and split it into the Leaf/End-Entity certificate (for your domain) and the chain (everything else). The client may also inspect the headers of the ACME response to look for alternate chains, and download/process/store them. The End-Entity certificate should be the same for everything under Let's Encrypt, as the alternates are provided by cross-signed versions of the same upstream certificate, so only the upstream chains will change. However, this is not guaranteed by the specification, and the certificates could be different.
The earliest versions of many clients did not store the full chain, and many still do not handle alternates. What many clients erroneously did was distribute the source with a copy of the chains and then rebuild the full cert by adding the end-entity certificate onto it.
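To make the split concrete, here is a minimal Python sketch of that step. It is not taken from Certes or any other client; `split_fullchain` is a hypothetical helper name, and it assumes the download is the usual `application/pem-certificate-chain` body with the end-entity certificate first (which is the order RFC 8555 requires).

```python
import re

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_fullchain(fullchain_pem):
    """Return (leaf_pem, chain_pems) from a fullchain PEM string.

    Hypothetical helper: assumes the first certificate in the download
    is the end-entity (leaf) certificate and the rest form the chain.
    """
    certs = PEM_CERT.findall(fullchain_pem)
    if not certs:
        raise ValueError("no certificates found in PEM input")
    # Everything after the leaf is whatever the CA served for this download;
    # nothing here is hardcoded or bundled with the client.
    return certs[0], certs[1:]
```

The point of the sketch is that the chain comes entirely from what the server returned for this order, so a change in the served chain flows through without any code change.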
The production change which led to this has been fixed. If you renew your certificate now (or simply re-download the existing certificate) it should come with the chain that you expect.
That said, this same change will become permanent soon: the short chain you got this time will become the default in Feb 2024, and the longer compatibility chain you expected will go away entirely in June 2024.
Thanks - we have our own version of the extension and I upgraded it to the latest version of Certes, and also to the latest version of the Azure Web Site management libraries. I will be deploying over the next few days.
As far as I can see from the source code of Certes, it is not shortening the chain when converting to PEM. This needs to be tested as well.
The bit I don't understand about this situation is what happens when the change to what Let's Encrypt returns comes into play: which root certificate will Let's Encrypt certs chain to, and will it be installed on Azure by MS so the certificate is validated correctly?
I don't think your error is coming from Azure itself, I think it's coming from your extension - "Can not find issuer" is the literal error message embedded in Certes.
When the order for the cert completes with Let's Encrypt, the download API offers the default chain, which is normally Leaf > R3 > ISRG Root X1 > DST Root CA X3 (expired) - but the response also includes a Link header pointing to an alternative chain download that your client could select instead (Leaf > R3 > ISRG Root X1).
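As a rough illustration of how a client could find those alternative downloads, here is a small Python sketch that pulls the `rel="alternate"` URLs out of the Link header values on the certificate response. `alternate_chain_urls` is a hypothetical name, the example URLs are made up, and the parsing is deliberately simple (it does not handle quoted parameter values that contain commas or semicolons).

```python
import re

# One link-value: <url> followed by zero or more ;param=value pairs.
LINK_PART = re.compile(r"<([^>]*)>\s*((?:;\s*[^,;]+)*)")


def alternate_chain_urls(link_header_values):
    """Return URLs marked rel="alternate" from a list of Link header values.

    Hypothetical helper: pass in every Link header value from the
    certificate download response (there may be several).
    """
    urls = []
    for value in link_header_values:
        for url, params in LINK_PART.findall(value):
            for param in params.split(";"):
                name, _, val = param.strip().partition("=")
                rels = val.strip().strip('"').lower().split()
                if name.strip().lower() == "rel" and "alternate" in rels:
                    urls.append(url)
    return urls
```

Each URL returned this way can be fetched like the default certificate URL to get the same leaf with a different chain.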
Certes has a "preferred chain" API: `var cert = await orderCtx.Download(preferredChain);`, but I don't know if your extension allows you to specify that. You need to tell it you want "ISRG Root X1" if possible, and if that all works then you will be set until the next time Let's Encrypt change their root. Certes also had embedded resources for common roots, so upgrading may help, but it won't future-proof you for newer roots or if you switch CA.
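For anyone wondering what a "preferred chain" option generally boils down to, here is a hedged Python sketch of the selection logic only. It is not how Certes implements it. `pick_chain` and the `root_issuer`/`pem` keys are hypothetical, and it assumes you have already downloaded each offered chain and read the issuer name of its topmost certificate with an X.509 library (that parsing is left out here).

```python
def pick_chain(chains, preferred_root, default_index=0):
    """Pick the chain whose topmost issuer matches preferred_root.

    chains: list of dicts with hypothetical keys
        'root_issuer' - issuer name of the last certificate in the chain
        'pem'         - the downloaded fullchain PEM text
    Falls back to the CA's default chain when nothing matches, so an
    unknown or retired root name does not break issuance.
    """
    wanted = preferred_root.casefold()
    for chain in chains:
        if chain["root_issuer"].casefold() == wanted:
            return chain
    return chains[default_index]
```

Usage would look something like `pick_chain(offered, "ISRG Root X1")`, where `offered` holds the default download first followed by each alternate.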
The code you linked to is for PEM conversion of the chain, but I'd suspect that the extension is working with the PFX which is built via BouncyCastle and is subject to the constraints of knowing about the required root (the code literally has to pass the root into the call to have the PFX build properly). The PFX build will be the part that failed.
Azure can be a little unpredictable, as some services are Windows-based (using the path validation that Windows uses) and some are Linux, while some things like Application Gateway use entirely custom cert validation.
You should possibly consider whether you can switch to Azure App Service managed certificates, as these are provided by the platform and I'm pretty sure they're free now, so this would remove some complexity and potentially brittle moving parts.