Multi-domain Cert using Crypt::LE and DNS.pm Module

I’m trying to generate/renew multi-domain certs using Crypt::LE with the DNS.pm module for verification.

My domains are: *.wc.madbray.com, *.wc2.madbray.com

I ran this command:

le64.exe --key account-brad.key --csr MD_madbray.com\MD_madbray.com.csr --csr-key MD_madbray.com\MD_madbray.com_priv.key --crt MD_madbray.com\MD_madbray.com.pem --domains "*.wc.madbray.com,*.wc2.madbray.com" --generate-missing --handle-with DNS.pm --handle-as dns --api 2 --export-pfx xxxxxxxx

The first time I ran it, it produced the output below. I waited a week or more, ran the same command again, and got the same result: the last step fails with “Could not finalize an order”:

c:\ssl>le64.exe --key account-brad.key --csr MD_madbray.com\MD_madbray.com.csr --csr-key MD_madbray.com\MD_madbray.com_priv.key --crt MD_madbray.com\MD_madbray.com.pem --domains "*.wc.madbray.com,*.wc2.madbray.com" --generate-missing --handle-with DNS.pm --handle-as dns --api 2 --export-pfx xxxxxxxx
2019/02/27 14:01:20 [ ZeroSSL Crypt::LE client v0.32 started. ]
2019/02/27 14:01:20 Loading an account key from account-brad.key
2019/02/27 14:01:20 Generating a new CSR for domains *.wc.madbray.com,*.wc2.madbray.com
2019/02/27 14:01:20 New CSR will be based on a generated key
2019/02/27 14:01:22 Saving a new CSR into MD_madbray.com\MD_madbray.com.csr
2019/02/27 14:01:22 Saving a new CSR key into MD_madbray.com\MD_madbray.com_priv.key
2019/02/27 14:01:23 Registering the account key
2019/02/27 14:01:23 The key is already registered. ID: xxxxxxx
2019/02/27 14:01:24 Processing the 'dns' challenge for '*.wc.madbray.com' with DNS

Add TXT Record for wc.madbray.com.verify.schoolinsites.com. at schoolinsites.com.
Command completed successfully.

2019/02/27 14:01:24 Processing the 'dns' challenge for '*.wc2.madbray.com' with DNS

Add TXT Record for wc2.madbray.com.verify.schoolinsites.com. at schoolinsites.com.
Command completed successfully.

2019/02/27 14:01:26 Processing the 'dns' verification for '*.wc.madbray.com' with DNS
Domain verification results for '*.wc.madbray.com': success.
Deleting '_acme-challenge.wc.madbray.com.verify.schoolinsites.com' DNS record

Deleted TXT record(s) at schoolinsites.com.
Command completed successfully.

2019/02/27 14:01:28 Processing the 'dns' verification for '*.wc2.madbray.com' with DNS
Domain verification results for '*.wc2.madbray.com': success.
Deleting '_acme-challenge.wc2.madbray.com.verify.schoolinsites.com' DNS record

Deleted TXT record(s) at schoolinsites.com.
Command completed successfully.

2019/02/27 14:01:28 Requesting domain certificate.
2019/02/27 14:01:28 Could not finalize an order.

My web server is (include version): Testing from my machine running Windows 10 (IIS)

The operating system my web server runs on is (include version): Windows 10

I can login to a root shell on my machine (yes or no, or I don’t know): YES

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): NO

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): Crypt:LE (le64.exe) version 0.32.1.0

@leader It appears that I have DNS.pm set up correctly. Is there a known issue getting le64.exe to generate & download multi-domain and/or wildcard certs?

There definitely should be no issues with getting multi-domain and/or wildcard certs. I will need to have a closer look at this specific case though. Will PM shortly regarding the details.

nslookup -q=txt _acme-challenge.wc.madbray.com
Returns:
_acme-challenge.wc.madbray.com canonical name = wc.madbray.com.verify.schoolinsites.com

nslookup -q=txt wc.madbray.com.verify.schoolinsites.com
Returns:
*** UnKnown can't find wc.madbray.com.verify.schoolinsites.com: Non-existent domain

Yes, that is the way we are handling DNS verification for customers whose DNS we do not control. The idea was given to me in this post:

This DNS verification method has been working very well so far. Your nslookup didn't show anything, because the TXT record itself is created during verification by DNS.pm and then it gets removed from the DNS server. This has saved us some back-and-forth with new web hosting customers whose DNS we do not control.
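The delegation described here can be illustrated with a small sketch. It only derives the two record names seen in the nslookup output above; the function names are hypothetical and are not part of DNS.pm or Crypt::LE.

```python
def challenge_record(domain: str) -> str:
    """Return the TXT record name a dns-01 check looks up for `domain`."""
    # A wildcard is validated at its base name, so the "*." prefix is dropped.
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base.rstrip(".")


def delegated_target(domain: str, verify_zone: str) -> str:
    """Return the CNAME target inside the zone the hosting provider controls.

    The "<base name>.<verify zone>" scheme is taken from the nslookup
    output in this thread; other setups may name the target differently.
    """
    base = domain[2:] if domain.startswith("*.") else domain
    return base.rstrip(".") + "." + verify_zone.strip(".")


for name in ("*.wc.madbray.com", "*.wc2.madbray.com"):
    print(challenge_record(name), "CNAME",
          delegated_target(name, "verify.schoolinsites.com"))
```

The customer creates the CNAME once; after that, only the TXT record at the target needs to change for each validation.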

I wonder if something has changed in how APIv2 responds to a newOrder request. If I remember correctly, the behaviour previously was like this: if you issued newOrder with the same set of names as an existing order that was in the “Ready” state, you would get that “Ready” order back. It seems that right now, even if you have a valid order in the “Ready” state, a newOrder request for the same name(s) creates a completely new order (tied to new pending authz). @cpu, would you be able to confirm? Thanks.

We only reuse orders with status=pending, not status=ready. That hasn't changed recently as far as I know.

That said, if you have valid authorizations from a previously ready order, they should be reused with the new pending order.

Could I get an explanation of that in layman’s terms? If it doesn’t directly affect me (the end-user of Crypt::LE), then no worries. Just trying to understand what’s going on so I can be more helpful and not have to bother folks as much :slight_smile:

Thanks!

Basically what I’m observing (thanks to the log provided) is this:

  • New order is created in the pending state with the pending authz.
  • Challenge is successful; at this point the authz becomes valid and the order becomes ready.
  • At this point new order request creates a new order in pending state with new pending authz.

That new order can’t be finalized (since the authz is not yet valid, because it is new). I’m pretty sure that was not the case before though. As @cpu said, “If you have valid authorizations from a previously ready order they should be reused with the new pending order”, which makes sense. But that does not look like what is happening now. I’ll try to double-check and possibly reproduce over the weekend, provided I feel a bit better than at the moment :slight_smile:
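To make the relationship between authz status and order status concrete, here is a minimal sketch. It is a simplified reading of the order states in RFC 8555 (it ignores the processing and valid states), and both function names are hypothetical.

```python
FAILED_AUTHZ = ("invalid", "expired", "revoked", "deactivated")


def order_status(authz_statuses):
    """Derive a simplified order status from its authorizations' statuses."""
    if any(status in FAILED_AUTHZ for status in authz_statuses):
        return "invalid"
    if all(status == "valid" for status in authz_statuses):
        return "ready"
    return "pending"


def can_finalize(authz_statuses):
    """An order can only be finalized once it has reached 'ready'."""
    return order_status(authz_statuses) == "ready"


# First order after both challenges succeeded.
print(order_status(["valid", "valid"]), can_finalize(["valid", "valid"]))
# Second order, created with new pending authz.
print(order_status(["pending", "pending"]), can_finalize(["pending", "pending"]))
```

Under this model the second order in the list above stays pending until its own authz are validated, which is why the finalize request is rejected.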

Hmm! That's interesting. I will flag someone to take a look from our side as well. Perhaps there was a regression/bug.

This appears to be a combination of a regression of sorts in boulder and slightly strange client behavior.

Looking at our logs for the specified time period (all of this is from our staging server; you don’t mention whether this was against staging or prod, but I couldn’t find any logs for the mentioned names in production), the client appears to take the following steps:

  • Create a new order
  • Attempt to finalize the order (this fails as the order is pending)
  • Validate the authorizations
  • Create a new order (because the previous order is in the ‘ready’ state rather than the ‘pending’ state we create a new order)
  • Attempt to finalize the new order (this fails as the order is pending)

When we introduced the ‘ready’ state we failed to implement reuse of ‘ready’ orders; I’ve filed a bug to implement this. This change would, technically, allow the above client behavior to succeed, but in general the client seems to be acting rather strangely.
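The five steps listed above can be replayed against a toy in-memory model to see the effect of the reuse policy. This is a sketch under stated assumptions, not Boulder's code: `ToyServer`, `replay` and the `reuse_ready` switch are hypothetical, and every new order is given fresh pending authz, which matches what the client log in this thread showed.

```python
import itertools


class ToyServer:
    """In-memory stand-in for an ACME server (hypothetical, not Boulder)."""

    def __init__(self, reuse_ready: bool):
        self.reuse_ready = reuse_ready
        self.orders = []                  # each order: {"id", "names", "authz"}
        self._ids = itertools.count(1)

    @staticmethod
    def status(order) -> str:
        # An order is 'ready' once every one of its authz is valid.
        done = all(s == "valid" for s in order["authz"].values())
        return "ready" if done else "pending"

    def new_order(self, names):
        reusable = {"pending", "ready"} if self.reuse_ready else {"pending"}
        for order in self.orders:
            if order["names"] == frozenset(names) and self.status(order) in reusable:
                return order              # hand back the existing order
        # Simplification: a new order always starts with fresh pending authz.
        order = {"id": next(self._ids), "names": frozenset(names),
                 "authz": {name: "pending" for name in names}}
        self.orders.append(order)
        return order

    def validate(self, order):
        for name in order["authz"]:
            order["authz"][name] = "valid"

    def finalize(self, order) -> bool:
        return self.status(order) == "ready"


def replay(server, names):
    """Replay the five client steps; return (early, late, same_order)."""
    first = server.new_order(names)
    early = server.finalize(first)        # order is still pending here
    server.validate(first)
    second = server.new_order(names)
    late = server.finalize(second)
    return early, late, first["id"] == second["id"]


names = ["*.wc.madbray.com", "*.wc2.madbray.com"]
print(replay(ToyServer(reuse_ready=False), names))
print(replay(ToyServer(reuse_ready=True), names))
```

With `reuse_ready=False` the second newOrder call yields a different, pending order and the final finalize fails; with `reuse_ready=True` the same order comes back and the finalize succeeds.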

Hi @roland,

Thanks for the explanation. The immediate attempt to finalize a newly created order is there to avoid unnecessary verifications. As was always stated (and how it worked), a successfully verified authz is kept for a while, so you do not need to re-verify if you re-issue the certificate against the same set of names. If that was changed or broken, you need to verify the same set of names again, even if you successfully did so a minute ago.

The second request to create a new order might seem a bit odd, but it is in essence there to keep the client compatible with v1 and custom endpoints. I might change that, though.

Thanks for raising that bug (and @cpu for escalating this), subscribed, hopefully that gets fixed indeed.

My tests were indeed run against the staging server, since I was just testing. I don’t have any legitimate need for the certs I was trying to create. I’m just testing and verifying my automation scripts at this point.

Should I ever expect different results between the staging & production servers?

Yes - it can happen. They aren't a perfect mirror of each other. We typically vet new features in the staging environment before they are promoted to production and this introduces differences between the two. In general the results should be the same but it isn't a guarantee.
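For reference, the two environments are reached through different ACME v2 directory URLs. The sketch below only picks between them; the `live` parameter is a hypothetical name, and how Crypt::LE selects the endpoint internally is not shown in this thread.

```python
STAGING_DIRECTORY = "https://acme-staging-v02.api.letsencrypt.org/directory"
PRODUCTION_DIRECTORY = "https://acme-v02.api.letsencrypt.org/directory"


def directory_url(live: bool = False) -> str:
    """Return the Let's Encrypt ACME v2 directory URL, staging by default."""
    return PRODUCTION_DIRECTORY if live else STAGING_DIRECTORY


print(directory_url())           # staging, for testing automation scripts
print(directory_url(live=True))  # production, for real certificates
```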

I have changed the behaviour of the Crypt::LE code to account for the current issue with “ready orders” and in my tests it worked fine (while still remaining compatible with v1). This should be reflected in the binaries this week. It would still be nice to see that bug with not returning ready orders resolved sometime soon though :slight_smile:

Thanks! I will check out the new binaries when they are available.

Also, I just confirmed that I am able to renew a multi-domain cert with multiple wildcards and other SANs when I did it on the production server (--live). I had a cert that is expiring today, so I’m glad this worked :slight_smile:

One thing to note: the order and authorization reuse that your client was depending on before you changed the behaviour is not specified in RFC 8555.

If an ACME client relies on order/authorization reuse (and is affected by the related Boulder bug) then it was relying on an optimization specific to Let's Encrypt and not something described by the protocol. It probably indicates over-fitting Let's Encrypt's implementation of ACME and might cause problems down the road using other ACME based certificate authorities.

We'll fix the bug in the next few weeks but the priority is lower specifically because it's a bug that only breaks clients that are making out-of-spec assumptions about the issuance flow.
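One way for a client to avoid depending on reuse is to keep working on the single order it created and let that order's status drive the next step. A minimal sketch, assuming the five order statuses defined in RFC 8555; `next_action` is a hypothetical helper and not Crypt::LE's implementation.

```python
def next_action(order_status: str) -> str:
    """Map an ACME order status to the step a client should take next."""
    actions = {
        "pending": "complete the outstanding challenges, then poll the order",
        "ready": "send the CSR to the order's finalize URL",
        "processing": "poll the order until issuance completes",
        "valid": "download the certificate",
        "invalid": "give up on this order and start a new one",
    }
    try:
        return actions[order_status]
    except KeyError:
        raise ValueError(f"unknown order status: {order_status!r}")


for status in ("pending", "ready", "processing", "valid", "invalid"):
    print(f"{status}: {next_action(status)}")
```

Nothing in this flow calls newOrder a second time, so it does not matter whether the server hands back an existing order or creates a new one.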

It is a fair statement indeed. As I mentioned before, it was about the behaviour described for LE/boulder, rather than RFC specifically.

Doesn't the existence of that bug also mean that the following FAQ entry is not currently working as stated, and that new validation will be required every time?

I successfully renewed a certificate but validation didn’t happen this time - how is that possible?

Once you successfully complete the challenges for a domain, the resulting authorization is cached for your account to use again later. Cached authorizations last for 30 days from the time of validation. If the certificate you requested has all of the necessary authorizations cached then validation will not happen again until the relevant cached authorizations expire.
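The caching rule in that FAQ entry can be sketched as follows. The 30-day lifetime is the figure quoted in the FAQ text; the function name and the `validated_at` mapping are hypothetical.

```python
from datetime import datetime, timedelta

AUTHZ_LIFETIME = timedelta(days=30)  # figure quoted in the FAQ entry


def names_needing_validation(names, validated_at, now):
    """Return the names whose cached authorization is missing or expired.

    `validated_at` maps a name to the time it was last validated.
    """
    return [name for name in names
            if name not in validated_at
            or now - validated_at[name] >= AUTHZ_LIFETIME]


cache = {"wc.madbray.com": datetime(2019, 2, 27, 14, 1)}
wanted = ["wc.madbray.com", "wc2.madbray.com"]
print(names_needing_validation(wanted, cache, datetime(2019, 3, 10)))
print(names_needing_validation(wanted, cache, datetime(2019, 4, 1)))
```

An empty result means every requested name still has a cached authorization and no new challenge is expected.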

That's still working as intended in ACME v1. There is also still authorization re-use happening within orders for ACME v2, just not order re-use of ready orders.

For v1 everything seemed to be working as expected indeed. I'm trying to wrap my head around the statement above though regarding v2 authorization re-use happening, because it did not look so from the logs - newOrder for the already validated domain would have new pending authz on it. That effectively means that validation has to happen again, even though there is a valid non-expired authz. The valid ones are indeed cached, but where would they be re-used then - only for the orders having some new names on them in addition to already validated ones?

@roland Does this match what you saw when you investigated letsencrypt/boulder issue #4117 (“'ready' orders should be reused”)?