Starting around June 7th, we’ve noticed ACME v1 occasionally returning errors referencing “sub-problems”. I see now this is part of ACME v2 RFC 8555 (thanks @cpu for the post).
Here is an ACME v1 example: Rechecking CAA for "[www.example.com]" and 1 more identifiers failed. Refer to sub-problems for more information
Typically, we see this when the name servers for an apex domain are either unavailable or respond with REFUSED.
Was the change to ACME v1 intentional? We plan on re-writing our client to use ACME v2, but we’re relying on ACME v1 until that’s done.
Thanks for the bug report! You’re right that we probably should have kept the ACMEv1 behavior the same. It may wind up being fairly complicated to plumb that through, since only WFE/WFE2 know which API version they are serving, but this error is generated in the RA.
I’ll file a ticket and we’ll discuss on the team how to prioritize. To help us with prioritization, can you tell us more about what sort of problems this is causing you? Do you automatically retry issuance with the failed domains removed?
We serve multiple customers on shared certificates. We do error matching to determine if the failing domain has a transient failure or a permanent one. We have a safe-default of assuming errors are transitory in nature. Without having a matching case for the subproblems error, we manually investigate failing domains to determine the cause and manually decide if the domain should be removed. We automatically retry reissuance of the same set of domains while a manual removal decision is being made.
Fortunately, the number of occurrences has been fairly infrequent so far. As I mentioned before, it seems limited to a case where the NS for the apex are either unavailable or responding with “refused” (which means the site is already effectively broken and it’s safe for us to remove it from our certificates).
We talked about this some more as a team. The error message did change in ACMEv1, but the subproblems are also present in ACMEv1. So our plan is to not change the behavior, since the error message was never something we formally expected to stay the same. In fact, we developed the subproblem mechanism specifically to make it possible to handle situations like this without resorting to string parsing of error messages.
My recommendation is to implement subproblem handling in your ACMEv1 client. Hopefully that will be one less thing to worry about when you finish your ACMEv2 migration.
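For anyone wiring this up, here is a minimal sketch of what subproblem handling could look like on the client side. It assumes the error body is a JSON problem document shaped as RFC 8555 section 6.7.1 describes: a top-level "subproblems" array whose entries each carry their own "type", "detail", and "identifier". The helper names (short_type, failed_identifiers, classify) and the PERMANENT_TYPES policy set are hypothetical. Which error types you treat as permanent is your own policy decision, not something the server defines.

```python
# Hypothetical client policy: error types treated as permanent failures.
# Anything not listed here falls back to "transient", matching the
# safe default described earlier in this thread.
PERMANENT_TYPES = {"caa"}


def short_type(problem_type):
    """Return the last component of an error URN.

    Works for both 'urn:ietf:params:acme:error:caa' and the shorter
    'urn:acme:error:caa' form, so the same matching covers either prefix.
    """
    return problem_type.rsplit(":", 1)[-1]


def failed_identifiers(problem):
    """Map each identifier named in the subproblems to (short type, detail)."""
    result = {}
    for sub in problem.get("subproblems", []):
        value = sub.get("identifier", {}).get("value")
        if value is None:
            # A subproblem without an identifier cannot be tied to a domain.
            continue
        result[value] = (short_type(sub.get("type", "")), sub.get("detail", ""))
    return result


def classify(problem):
    """Split failed identifiers into (permanent, transient) sorted lists."""
    permanent, transient = [], []
    for name, (kind, _detail) in failed_identifiers(problem).items():
        if kind in PERMANENT_TYPES:
            permanent.append(name)
        else:
            transient.append(name)
    return sorted(permanent), sorted(transient)
```

Calling classify() on the parsed error body returns two lists of domain names: the ones your policy says can be dropped from the certificate, and the ones to keep retrying. A problem document with no "subproblems" field yields two empty lists, so the existing handling for single-identifier errors can stay as it is.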