One nit about ambiguity: is it the IP address that the VA connected to that is reported, or the IP address of the ACME client that requested the certificate?
My thinking was that a list of domain names might be most useful to people who don’t have a lot of domains or servers. E.g. “Is it my Synology or my home server?” It’s theoretically easy to check only a few servers, but sometimes they don’t have good logs, which actually makes it harder to figure out what’s going on.
Someone with hundreds of web servers and thousands of domains should usually know what their systems are doing anyway.
The current plan is to report the IP address the VA connected to, since that’s easiest to collect alongside the list of validations that used TLS-SNI-01: it’s in the validation logs.
To be really precise: for each account, I’ll have the most recent IP address validated using TLS-SNI-01 (a limitation of our log querying software). Then I’ll merge accounts with the same email address, so if you have clients on three different IP addresses (each with a different account ID), you’ll get a single email listing three IP addresses. I think it’s hard to convey that nuance precisely in the email, but I’m definitely interested in better ways to talk about it.
I’d like to hear what other people think, but this sentence concerns me. In some ways it’s easier to manage a flood of repetitive threads than a smaller flood of people necroing other, semi-related threads.
Edit: Confession: Discourse’s tool to split threads is fantastic, but coming up with titles is hard!
That’s a good point, I hadn’t thought of that! What do other folks think?
Also, what documentation can we add that will make it easier to point people at help?
Wrt the IP address thing, I’d perhaps update this copy to:
In the past 60 days, your Let’s Encrypt client used ACME TLS-SNI-01 domain validation to issue certificates for domains hosted on these IP addresses:
I think the How to stop using TLS-SNI-01 with Certbot post could be improved with an example of how to actually perform a dry run and confirm that TLS-SNI-01 is not being used during that run.
There was also one thread where a dry run gave a false positive on staging because of a cached authz (port 80 was clearly inaccessible, but there was a previously valid http-01 authz). Maybe it’s asking too much, but could disabling authz caching on staging for a few days help? I’d be pretty mad if Certbot lied to me like that and my cert expired :|.
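As a rough illustration of that kind of check, here is a minimal Python sketch that scans Certbot’s log after `sudo certbot renew --dry-run` and lists which challenge types were performed. It assumes the log lives at `/var/log/letsencrypt/letsencrypt.log` and that Certbot writes one line per challenge of the form `http-01 challenge for example.com`; both are assumptions to verify on your own system, and `challenges_used`, `uses_tls_sni`, and `report` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: Certbot's default log location on Linux; adjust for your system.
DEFAULT_LOG = Path("/var/log/letsencrypt/letsencrypt.log")

# Assumption: under "Performing the following challenges:", Certbot logs one
# line per challenge, e.g. "http-01 challenge for example.com".
CHALLENGE_RE = re.compile(
    r"\b(http-01|dns-01|tls-sni-01|tls-alpn-01) challenge for (\S+)"
)


def challenges_used(lines):
    """Return sorted (challenge_type, domain) pairs found in the given log lines."""
    found = set()
    for line in lines:
        match = CHALLENGE_RE.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return sorted(found)


def uses_tls_sni(lines):
    """True if any logged challenge was TLS-SNI-01."""
    return any(ctype == "tls-sni-01" for ctype, _ in challenges_used(lines))


def report(log_path=DEFAULT_LOG):
    """Print every challenge found in the log and whether TLS-SNI-01 appeared."""
    lines = Path(log_path).read_text(errors="replace").splitlines()
    for ctype, domain in challenges_used(lines):
        print(f"{ctype:12} {domain}")
    print("TLS-SNI-01 used:", uses_tls_sni(lines))
```

Call `report()` (usually as root, since the log is root-owned) right after the dry run. If no challenge lines show up at all, the run probably reused a cached authz, so the dry run didn’t actually exercise any challenge.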
I think the Help template could temporarily be modified to include an explicit prompt for `certbot --version`, since we seem to need to ask for it in every Help thread.
In a similar vein, the How to stop using TLS-SNI-01 with Certbot post could promote `certbot --version` to a properly styled code block, with example output showing the correct version, so that it doesn’t get buried in the prose.
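To make the version check concrete, here is a minimal Python sketch that runs `certbot --version` and compares the result with a minimum version. The threshold `MIN_VERSION = (0, 28, 0)` is an assumption for illustration only; the authoritative number is whatever the How to stop using TLS-SNI-01 with Certbot post states. The helper names are hypothetical, and installs that use `certbot-auto` would pass that command name instead.

```python
import re
import subprocess

# Assumption: illustrative minimum version; confirm against the
# "How to stop using TLS-SNI-01 with Certbot" post.
MIN_VERSION = (0, 28, 0)


def parse_certbot_version(output):
    """Extract (major, minor, patch) from output such as 'certbot 0.28.0'."""
    match = re.search(r"certbot\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_certbot_version(command="certbot"):
    """Run `<command> --version` and parse the result."""
    # Some older installs print the version on stderr, so search both streams.
    result = subprocess.run([command, "--version"], capture_output=True, text=True)
    return parse_certbot_version(result.stdout + "\n" + result.stderr)


def needs_upgrade(version, minimum=MIN_VERSION):
    """Tuples compare element by element, so (0, 27, 1) < (0, 28, 0)."""
    return version < minimum
```

Usage: `needs_upgrade(installed_certbot_version())` returns `True` when the installed Certbot is older than the assumed threshold.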
Re: necro, I think it’s possible to greatly reduce that problem by making @bmw’s post a bit more “step by step” in nature, to match the audience who are looking for very prescriptive advice/instructions.
I’ll work on getting this added.
This is a cool idea! I think it’s doable; I’ll check with the team.
Done! I think I phrased this in a way that covers multiple clients, so it’s a pretty good candidate for keeping long term.
I like this. For now I put `--version` in a code block to make it stand out, but I’ll rework it some more in a bit.
After some more thinking and experimenting, I think the sensible way to organize this information is to report one domain and one IP address per account. This limits the size and complexity, is consistent and easy to explain, and, since most of the time one account is operated by one piece of software, it should be enough to tell people about all the places they need to update.
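One way to picture the reporting scheme described in this thread (keep each account’s most recent TLS-SNI-01 validation, giving one domain and one IP, then merge accounts that share an email address) is the Python sketch below. The `Validation` record shape and the function names are hypothetical; the real data comes from Let’s Encrypt’s validation logs, which aren’t shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Validation:
    # Hypothetical shape of one TLS-SNI-01 validation log record.
    account_id: int
    email: str
    domain: str
    ip: str
    time: datetime


def latest_per_account(validations):
    """Keep only each account's most recent validation: one domain, one IP."""
    latest = {}
    for v in validations:
        current = latest.get(v.account_id)
        if current is None or v.time > current.time:
            latest[v.account_id] = v
    return latest


def group_by_email(latest):
    """Merge accounts sharing a contact email into one list of (domain, ip) pairs."""
    by_email = defaultdict(list)
    for v in sorted(latest.values(), key=lambda v: v.account_id):
        by_email[v.email].append((v.domain, v.ip))
    return dict(by_email)
```

Each key of the result corresponds to one email, and each pair in its list corresponds to one account, so someone with three accounts on three servers gets a single email listing three domain/IP pairs.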
I’ve edited the sample post to reflect this. What do you think?
I think it may be worth providing a link about the staging environment here. https://letsencrypt.org/docs/staging-environment/ works, but I kind of want to remove the suggestion to use `--staging` with Certbot in favor of `--dry-run` if we’re going to be including a link to this page in the email. (If you naively use `--staging`, you can end up with staging certificates installed in Apache/Nginx!)
Thanks @jsha. I like your post a lot better than mine. I’m tempted to delete all/most of my post before the email goes out in favor of people reading jsha’s. What do people think?
That’s a great idea. Can you send a PR at https://github.com/letsencrypt/website?
Thanks! I would be totally fine if you wanted to hoist my post into yours with an edit, then we could just delete mine. Feel free to do that directly.
Great. I’m going to give people on the Certbot team who worked on that post with me a brief chance to object, but if they don’t, I’ll do that.
I also wanted to add that I think this is a good idea:
I’m not sure how many people will have reusable staging authzs, or how much extra load disabling authz reuse in staging (at least to some degree) would put on Let’s Encrypt, but I think authz reuse really hurts people’s ability to test here.
With Certbot, if you test against staging but have a valid authz, all that was tested is that Certbot can set up the challenge (e.g. standalone could bind to port 80, we could modify your Apache/Nginx config, etc.). You could have bad firewall rules, your ISP could be blocking port 80, etc., and you would have no idea.
I would expect other clients to have similar problems here.
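A client (or a curious user) can tell when this has happened: in ACME v2 (RFC 8555), an authorization that is reused already has `"status": "valid"` when it is fetched right after the order is created, before any challenge has been answered. The minimal Python sketch below flags those identifiers. It assumes the authorization objects have already been fetched and parsed from JSON (the HTTP and signing steps are omitted), and `reused_authorizations` is a hypothetical helper name.

```python
def reused_authorizations(authorizations):
    """Return identifiers whose authorization was already valid at order creation.

    `authorizations` is a list of parsed RFC 8555 authorization objects, fetched
    immediately after newOrder and before responding to any challenge. For these
    identifiers no challenge will run, so a "successful" test proves nothing about
    firewalls or port reachability.
    """
    return [
        authz["identifier"]["value"]
        for authz in authorizations
        if authz.get("status") == "valid"
    ]
```

A test harness could print a warning such as "authz for example.com was reused; challenge not exercised" for every identifier this returns.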
Yep, we’ve disabled authz reuse in staging as of today.
I’m not seeing it yet. I did a `--dry-run` on a certificate, and then I issued two staging certificates for a new hostname, and it still did normal authz reuse.
The `--dry-run`’s authzs’ original order was:
You’re right. I saw your comment, double-checked, and saw the same results. I checked the V1 API as well, and it had the intended effect there. I filed a bug to fix the ACME v2 valid authz reuse: https://github.com/letsencrypt/boulder/issues/4026
Hopefully the majority of people using TLS-SNI-01 with staging are using ACME v1 in the meantime.
Thanks again for flagging this @mnordhoff!
Quick update here: We deployed a fix outside of the usual cycle to make sure this was addressed. The staging environment is now properly not reusing valid authorizations for both the V1 and the V2 API.
I’ve updated the draft post, mainly to make this sentence simpler and clearer:
TLS-SNI-01 validation in the production environment is reaching end-of-life. It will stop working temporarily on February 13th, 2019, and permanently on March 13th, 2019. Any certificates issued before then will continue to work for 90 days after their issuance date.
(It used to try to explain how to check your certificate’s NotAfter date in browser developer tools.)
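The 90-day rule in that sentence is simple date arithmetic. The Python sketch below works it through at day granularity only; the real NotAfter field carries a time of day, so treat these dates as approximate. The constant and function names are illustrative.

```python
from datetime import date, timedelta

# Let's Encrypt certificates are valid for 90 days from issuance.
CERT_LIFETIME = timedelta(days=90)

# Dates taken from the draft sentence above.
TLS_SNI_TEMPORARY_STOP = date(2019, 2, 13)
TLS_SNI_PERMANENT_STOP = date(2019, 3, 13)


def approximate_expiry(issued):
    """Day-level estimate of when a certificate issued on `issued` stops working."""
    return issued + CERT_LIFETIME


# A certificate issued the day before the permanent shutoff:
print(approximate_expiry(TLS_SNI_PERMANENT_STOP - timedelta(days=1)))  # 2019-06-10
```

So even a certificate obtained with TLS-SNI-01 on the last possible day keeps working until roughly June 10th, 2019, which gives people time to switch validation methods before renewal fails.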
I also added a link for staging.
Heads up: This second round of emails will start going out today. We’ll be spacing them out slowly just in case someone reports a significant last-minute problem.
We hit a snag with the email sending yesterday but we’ve restarted it today.