Public comments on "No Meaningful Subject Distinguished Name"

What does a typical affected certificate look like?

3 Likes

I guess revocation only happens on certs without a CN?

4 Likes

Yep, that's correct -- the affected certificates are those that have no Subject Common Name (and equivalently, have Subject Alternative Names that are all greater than 64 characters long).
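To make that rule concrete, here is a minimal Python sketch. It assumes the CA fills the Common Name with the first SAN that fits within the 64-character limit and leaves the CN empty when none does; the helper names `pick_common_name` and `is_affected` are hypothetical and are not taken from Let's Encrypt's code.

```python
from typing import List, Optional

# Upper bound on commonName length (ub-common-name in RFC 5280).
MAX_CN_LENGTH = 64


def pick_common_name(sans: List[str]) -> Optional[str]:
    """Return the first SAN short enough to be used as a CN, or None.

    Assumption for illustration: the CN is chosen from the SAN list in order.
    """
    for name in sans:
        if len(name) <= MAX_CN_LENGTH:
            return name
    return None


def is_affected(sans: List[str]) -> bool:
    """A certificate would end up with no CN when every SAN exceeds 64 characters."""
    return pick_common_name(sans) is None


if __name__ == "__main__":
    long_name = "a" * 60 + ".example.com"  # 72 characters, too long for a CN
    print(is_affected(["example.com"]))
    print(is_affected([long_name]))
```

Under these assumptions, a certificate is only affected when every one of its names is longer than 64 characters.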

8 Likes

How many certificates were revoked? Curious how many in the Caddy community might be experiencing the ARI renewal behavior.

3 Likes

Does anybody know how to search crt.sh for empty CNs? :thinking:

2 Likes

crt.sh supports searching by the sha1(subject), which you could theoretically use to search for these, but I wasn't able to get that to work without timing out.

This Censys search appears to work properly:

parsed.issuer.organization=`Let's Encrypt` and not parsed.subject.common_name: * and labels=`ever-trusted`

which says:

158.91K unexpired
6 Likes

Hm, that's roughly 0.03 to 0.04 % of all issued certs. (Assuming 5⋅10^6 certs/day issued, which obviously is not an accurate number.)
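For anyone who wants to check that estimate, here is a rough Python sketch of the arithmetic. It assumes a 90-day certificate lifetime (so the unexpired population is about 90 days of issuance) and takes the 158.91K Censys figure and the 5⋅10^6 certs/day guess at face value; none of these numbers are official.

```python
# All three inputs are assumptions taken from the discussion, not official figures.
CERTS_PER_DAY = 5e6           # guessed issuance rate
LIFETIME_DAYS = 90            # assumed certificate lifetime
AFFECTED_UNEXPIRED = 158_910  # Censys result quoted as 158.91K


def affected_share_percent(affected: float, per_day: float, lifetime_days: int) -> float:
    """Share of unexpired certificates that are affected, in percent."""
    unexpired_total = per_day * lifetime_days
    return 100 * affected / unexpired_total


share = affected_share_percent(AFFECTED_UNEXPIRED, CERTS_PER_DAY, LIFETIME_DAYS)
print(f"{share:.3f} %")
```

With these inputs the share lands between 0.03 % and 0.04 %, which matches the range above.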

4 Likes

Now that 133613 certs were revoked, I'm curious how many were renewed with the assistance of ARI.

6 Likes

Just being nosy :nose:, what exactly is the "Let’s Encrypt Policy Management Authority" and how is it different than "Let's Encrypt" (or ISRG, I guess)? The CP/CPS says that the PMA is what approves the document and handles revisions. Is it just a committee of people within ISRG, or is it someone external?

While I'm definitely curious about that too (and about ARI adoption in general), I'm also curious about how many were renewed at all. Last I checked, certbot checked OCSP and not ARI, so it would be renewing within a day of it being revoked if run on the recommended schedule. (And of course, roughly a third had hopefully already been replaced anyway, just based on scheduled expiration.)

And was there an email sent to affected subscribers too?

5 Likes

Yes, the "PMA" is a committee inside ISRG.

5 Likes

Thanks! I already saw it :stuck_out_tongue:

Question though:

When LE decided to halt issuance, why was that 36 minutes after the incident was declared? I know it's a very small amount of time, but if one appreciates the fact there was just 19 minutes between the halting of issuance and restarting issuance, that latter amount of time is even smaller!

To me this indicates LE was and is very efficient in fixing the CP/CPS, which of course is a great thing. However, for myself I don't have an answer to why the latter window of no issuance was shorter than the time between the declaration of the incident and halting of issuance.

Possible reasons I thought of:

  • it simply takes a certain amount of time between the decision to halt issuance and the actual halting itself (buttons have to be pressed, things have to be set in motion, etc. One does not simply halt issuance at the largest CA in the world :stuck_out_tongue:);
  • perhaps the decision to halt issuance was made a certain amount of time after the incident was declared;
  • probably a combination of the above with perhaps a few other reasons I'm not familiar with.

Note that this is not intended as criticism, as I think LE acted very, very fast. Frankly too fast for my liking, because I was curious what kind of error the production server was generating during the incident, but look at that, I just got myself a worthless certificate because issuance was restarted already :rofl: I'm just curious how these kinds of things work in such a crisis setting :slight_smile:

Second question: is the use of "CN=none" for an empty subject even valid? I guess so... But personally I would read that as producing an invalid Subject with literally an empty CN :stuck_out_tongue:

Thirdly: props to @lenaunderwood for her first incident report :stuck_out_tongue:

4 Likes

PMA discovered the problem, declared an incident, communicated that fact to me (I was oncall).

I spent a few minutes understanding what the situation was, joining the incident video call, then a few more getting logged into production and flipping the switch.

In the meantime, they updated the CP/CPS. Since they're the ones who can do that, it was much faster for them to edit some text, then push and merge on GitHub. They were already reviewing the document after all, so editing it was very fast.

9 Likes

Aah, I see, PMA themselves declared the incident :slight_smile: Following that, I can see it would indeed take some time.

I thought PMA notified some other body within Let's Encrypt and that body would declare the incident. My assumption of who declared the incident was thus incorrect :slight_smile:

Curious to know what "flipping the switch" actually is :stuck_out_tongue: Hopefully surrounded with lots of safeguards!

2 Likes

Oh, probably all AI-driven these days.

(Sorry!)

6 Likes

It’s nothing exciting, just a script which disables the API via load balancer configuration. Returns a static error message instead of load balancing.

Our production access is tightly controlled, so only a few people can run it from specially privileged laptops.

8 Likes

One for you, one for the boss and one for the intern :rofl: :stuck_out_tongue:

J/K, I'm sure the script has a meaningful filename which wouldn't get run accidentally :slight_smile:

Let's hope this incident report gets praise and is dismissed without any fuss :slight_smile:

4 Likes

sudo ./stop_the_world.sh

7 Likes

I appreciate the fact that you think there are enough of us for there to be multiple bodies to be informed :smiley:

More seriously, anyone at LE can declare an incident, since anyone might be the person to discover one. In this case it just happened to be that the incident was discovered by PMA during document review.

8 Likes

"PMA" and "non-PMA" or perhaps some "operator" group, although I guess multiple people can have multiple functions :stuck_out_tongue:

3 Likes