Were the retired x1 and x2 pem certificates properly formatted?

jvanasco · February 19, 2021, 8:32pm

I recently updated some certificate parsing routines/tests and ran them against the LetsEncrypt certificates.

I understand x1 and x2 are retired, but I'd like to know whether these 2 files are technically valid. I could not find anything online about them.

Both PEM certificates use the following header/footer, in which the underscore (_) is a space.

-----BEGIN CERTIFICATE-----_
-----END CERTIFICATE-----_

I believe the formatting is supposed to conform to RFC 7468 (RFC 7468 - Textual Encodings of PKIX, PKCS, and CMS Structures), which provides for ignoring whitespace between the encapsulation boundaries and outside of them, but defines the encapsulation boundaries as the entirety of a line -- which suggests to me the space is non-conformant. Perhaps another RFC covers this though?

No other LetsEncrypt/ISRG certs present this detail. Running the certs through a handful of random libraries gave mixed results: some parsed them and others rejected them because of the space. Without the space they always parse.

I'm adjusting my tools to support this variant, but I still wonder if this is correct.
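One way to support the variant is to normalize the PEM before handing it to a strict parser. This is a minimal sketch, assuming that only trailing spaces or tabs on the boundary lines need tolerating. The name `normalize_pem_boundaries` is hypothetical and not taken from any library, and the label pattern is a simplification of the RFC 7468 grammar.

```python
import re

# Matches a BEGIN/END boundary line that is followed by trailing spaces or tabs.
# Group 1 is the boundary itself, group 2 an optional CR from a CRLF line ending.
_BOUNDARY_WITH_TRAILING_WS = re.compile(
    rb"^(-----(?:BEGIN|END) [A-Z0-9 ]+-----)[ \t]+(\r?)$",
    re.MULTILINE,
)


def normalize_pem_boundaries(pem: bytes) -> bytes:
    """Strip trailing spaces/tabs from encapsulation boundary lines only."""
    return _BOUNDARY_WITH_TRAILING_WS.sub(rb"\1\2", pem)
```

The base64 body and the line endings are left untouched, so the decoded DER is unchanged.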

jvanasco · February 19, 2021, 8:50pm

Adding to the above, one library that would consider these certs invalid is Certbot:

github.com

certbot/certbot/blob/master/certbot/certbot/crypto_util.py#L516-L524


# Finds one CERTIFICATE stricttextualmsg according to rfc7468#section-3.
# Does not validate the base64text - use crypto.load_certificate.
CERT_PEM_REGEX = re.compile(
    b"""-----BEGIN CERTIFICATE-----\r?
.+?\r?
-----END CERTIFICATE-----\r?
""",
    re.DOTALL # DOTALL (/s) because the base64text may include newlines
)
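
A small reproduction of the behavior described, using the regex as quoted above. The payload `AAAA` is a placeholder rather than a real certificate body; the regex does not validate the base64 text, so it is enough to show the effect of the trailing space.

```python
import re

# Regex as quoted in the post above.
CERT_PEM_REGEX = re.compile(
    b"""-----BEGIN CERTIFICATE-----\r?
.+?\r?
-----END CERTIFICATE-----\r?
""",
    re.DOTALL,
)

# Placeholder PEM blocks; "AAAA" stands in for the base64 body.
clean = b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
spaced = b"-----BEGIN CERTIFICATE----- \nAAAA\n-----END CERTIFICATE----- \n"

print(CERT_PEM_REGEX.search(clean) is not None)   # True: strict form matches
print(CERT_PEM_REGEX.search(spaced) is not None)  # False: trailing space blocks the match
```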

aarongable · February 19, 2021, 9:00pm

Interesting! This is news to me personally, but I wasn't around when those certificates were issued.

I would read RFC 7468 Section 2 slightly differently:

Furthermore, parsers SHOULD ignore whitespace and other non-
base64 characters and MUST handle different newline conventions.

Empty space can appear
between the pre-encapsulation boundary and the base64, but generators
SHOULD NOT emit such any such spacing.

Most extant parsers ignore blanks at the ends of lines;

My reading is that whitespace SHOULD be ignored no matter where it appears, and at the same time whitespace SHOULD NOT appear anywhere. Basically "be strict in what you output, but generous in what you accept" (which is... not always a good idea in general, but appears to be what the RFC is setting forth).

But that's just my take on an RFC that I have admittedly never read before

Osiris · February 19, 2021, 9:44pm

Well, actually, the term "label" in the RFC is used for the text directly behind "BEGIN " and "END ", such as "CERTIFICATE".

I'm more inclined to agree with @aarongable and read the part about whitespace in the RFC more as a general thing, including directly after the pre- and post-encapsulation boundaries.

jvanasco · February 19, 2021, 9:45pm

Our readings of the RFC diverge slightly: I place emphasis on the RFC defining those lines as "encapsulation boundaries" and being fairly strict about their definition.

In any event, LetsEncrypt and Certbot have differing views on this, and the only thing that really matters is that the two groups come to a consensus. I created a ticket on Certbot (CERT_PEM_REGEX potentially incorrect · Issue #8672 · certbot/certbot · GitHub) for that.

@Osiris I meant to write "encapsulation boundary" there; I will edit.

aarongable · February 19, 2021, 10:07pm

Thanks for filing that! At the same time, though, we can simply remove those spaces. The canonical form of the certificate is the DER, not the PEM (and the certs in question are just the certs served by the website, not the DER-encoded certs served at their respective AIA URLs), so we can remove that space without affecting any encoding or fingerprints or anything. Change to do that here:
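
To illustrate why the fingerprint is unaffected, here is a sketch that decodes a PEM block to DER and hashes the result. The helper `pem_to_der` is hypothetical, and `fake_der` is placeholder bytes standing in for a real certificate; the point is only that whitespace on the boundary lines never reaches the DER.

```python
import base64
import hashlib


def pem_to_der(pem: str) -> bytes:
    """Decode the base64 body of a single PEM block, skipping boundary lines."""
    body = [
        line.strip()
        for line in pem.splitlines()
        if line.strip() and not line.startswith("-----")
    ]
    return base64.b64decode("".join(body))


# Placeholder bytes standing in for real DER; not an actual certificate.
fake_der = bytes(range(48))
b64 = base64.b64encode(fake_der).decode("ascii")

clean = f"-----BEGIN CERTIFICATE-----\n{b64}\n-----END CERTIFICATE-----\n"
spaced = f"-----BEGIN CERTIFICATE----- \n{b64}\n-----END CERTIFICATE----- \n"

# Fingerprints are computed over the DER, so both forms hash identically.
fp_clean = hashlib.sha256(pem_to_der(clean)).hexdigest()
fp_spaced = hashlib.sha256(pem_to_der(spaced)).hexdigest()
print(fp_clean == fp_spaced)  # True
```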

jvanasco · February 19, 2021, 10:27pm

Thanks, @aarongable !

This popped up today, because we have a test that downloads all the LetsEncrypt certs from the website, and ensures conversions from PEM>DER>PEM and DER>PEM>DER roundtrip correctly and against the expected values. A setup routine for our developer environment imports an archive of production certs too.
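
A roundtrip check of this kind can be sketched with the standard library's `ssl` helpers. The function names `roundtrips_der` and `roundtrips_pem` are hypothetical, and this is not the actual test suite described above, only an illustration of the idea.

```python
import ssl


def roundtrips_der(der: bytes) -> bool:
    """DER > PEM > DER should return the original bytes."""
    return ssl.PEM_cert_to_DER_cert(ssl.DER_cert_to_PEM_cert(der)) == der


def roundtrips_pem(pem: str) -> bool:
    """PEM > DER > PEM should return the original text, character for character."""
    return ssl.DER_cert_to_PEM_cert(ssl.PEM_cert_to_DER_cert(pem)) == pem
```

Because `ssl.DER_cert_to_PEM_cert` emits boundary lines without trailing whitespace, a PEM carrying the extra space would be expected to decode fine but fail the PEM > DER > PEM comparison, which is how a test like this can surface the difference.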

Some underlying functions in our client/manager utilize Certbot's chain parsing, and we discovered a bug where Chains were being stored as Certificates (hence my posts yesterday about verifying chains). To protect against this, a new check was added to the system where Certificates are parsed (using Certbot's library for the initial trial) to ensure we have 1 Certificate when appropriate or more than one Certificate when otherwise appropriate. Everything started breaking when the legacy LetsEncrypt certs got brought in, because of that space and the Certbot regex. OpenSSL is fine with them... Python's cryptography is fine with them... it's just the "gatekeeper" routine saying "this does not look like a certificate!"
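
A gatekeeper of this sort might look roughly like the following. This is a sketch under assumptions: `LAX_CERT_PEM_REGEX` and the helper functions are hypothetical names, the pattern is a relaxed illustration rather than Certbot's `CERT_PEM_REGEX`, and it only counts PEM blocks without validating their contents.

```python
import re

# Lax variant: tolerates spaces/tabs after each boundary line.
# Illustration only; this is not Certbot's CERT_PEM_REGEX.
LAX_CERT_PEM_REGEX = re.compile(
    rb"-----BEGIN CERTIFICATE-----[ \t]*\r?\n"
    rb".+?\r?\n"
    rb"-----END CERTIFICATE-----[ \t]*\r?\n?",
    re.DOTALL,
)


def count_certificates(pem: bytes) -> int:
    """Count CERTIFICATE blocks; the base64 body is not validated."""
    return len(LAX_CERT_PEM_REGEX.findall(pem))


def looks_like_single_certificate(pem: bytes) -> bool:
    return count_certificates(pem) == 1


def looks_like_chain(pem: bytes) -> bool:
    return count_certificates(pem) > 1
```

Counting blocks separately from parsing them keeps the "is this a Certificate or a Chain?" question independent of how strict the downstream parser is about whitespace.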

Fun 48 hours! Fixing one bug surfaced 5 unknown bugs, and an incompatibility between old LetsEncrypt chains and Certbot!

_az · February 20, 2021, 8:55am

I'm sorry I missed out on all this action today!

I wrote the offending regex as part of a CRLF-related bugfix. Whitespaces were intentionally omitted from it, because the regex is only supposed to parse stricttextualmsg PEMs:

In stricttextualmsg, whitespace is not allowed on the encapsulation lines. Personally, I interpret this to cancel out all of the wording about ignoring whitespace.

Why is this regex limited to stricttextualmsg?

Because ACME says that the application/pem-certificate-chain format, which is the default format for certificate downloads, uses stricttextualmsg rather than the more relaxed textualmsg which would allow these whitespaces.

I hope I have not misread the documents and that the problem here is that the regex is being used for a purpose for which it was not intended, but please feel free to correct me! If it's indeed the case that a parser of stricttextualmsg should handle whitespace, then I'd be happy to bring that to Certbot.

jvanasco · February 20, 2021, 4:12pm

No, it sounds more like the intermediate that is published on the website has an erroneous space. I picked this up via tests that use the intermediates on the site, but after going through a lot of archives, it seems some client(s) I used in the past had hard-coded the web versions and used them.

I don’t think many other people use legacy certs, so it hasn’t come up.
