This is just a general question to see if anyone else has recently seen Let's Encrypt production not populating the certificate URL for finalized, valid orders.
I've seen this reported a few times by my users over the last few weeks and initially thought it might be a new bug in my app, but it's happening across multiple versions.
That's not to say there's no bug on my side, but without digging into it much, it looks like we wait for the order to be finalized (order status valid) as usual, yet the certificate URL is occasionally not populated in the first valid order result that comes back.
We will add more polling when the URL is empty to allow for this, but I wondered whether this is expected behavior (I don't think it is).
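For anyone wanting to do the same in their client, here is a minimal sketch of that extra polling, not code from any particular client. It assumes a hypothetical fetch_order callable that performs the POST-as-GET on the order URL and returns the decoded order JSON as a dict; "status" and "certificate" are the order field names from RFC 8555, everything else is made up for illustration.

```python
import time
from typing import Callable, Optional


def wait_for_certificate_url(
    fetch_order: Callable[[], dict],
    max_attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the order is 'valid' AND carries a certificate URL.

    fetch_order is a hypothetical callable (an assumption of this sketch)
    that does the POST-as-GET and returns the order object as a dict.
    Returns the certificate URL, or None if it never showed up.
    """
    for attempt in range(max_attempts):
        order = fetch_order()
        status = order.get("status")
        if status == "invalid":
            raise RuntimeError("order became invalid")
        cert_url = order.get("certificate")
        # Require both conditions; 'valid' alone is not treated as done.
        if status == "valid" and cert_url:
            return cert_url
        # Don't sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return None
```

The sleep parameter is only there so the loop can be exercised without real delays; a real client would probably also honor a Retry-After header if the server sends one.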
I was curious, though, and had a quick look at the source code to see if there is a plausible code path for this scenario (e.g. a race condition when the database is updated?). But it doesn't seem like the scenario you describe is possible on the server side. The code that generates the response JSON has strict logic that always computes the certificate URL on the fly when the order is valid.
So that field isn't even coming from the database; the web front end just adds it on its own whenever the order status is valid. That JSON is returned directly over HTTP, so there is not really much room for an issue here.
(If order.CertificateSerial is not initialized, this URL will be incorrect, but the JSON field should be assigned a valid URL regardless.)
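To illustrate the point, here is a toy model in Python. This is not Boulder's code (Boulder is written in Go); the function name, parameters, and base URL are all invented for this sketch. It only shows what "the field is derived from the status rather than read from storage" implies for the response.

```python
def order_to_json(
    status: str,
    certificate_serial: str,
    base_url: str = "https://acme.example.test/cert/",
) -> dict:
    """Hypothetical serializer: the certificate URL depends only on status."""
    body = {"status": status}
    if status == "valid":
        # Always set for a valid order, even if the serial is empty.
        # An empty serial yields a wrong URL, but not a missing field.
        body["certificate"] = base_url + certificate_serial
    return body
```

Under this model a response with status "valid" and no certificate field cannot be produced at all, which is why the reported behavior is surprising.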
I've had this comment in my client around the finalization code for quite a while, so I must have run into it.
Boulder's ACME implementation (at least on Staging) currently doesn't quite follow the spec at this point. What I've observed is that the response to the finalize request is indeed the order object, and it appears to have 'valid' status and a URL for the certificate. But it skips the 'processing' status entirely, which according to the spec we shouldn't rely on.
So we start polling the order directly, and the first response comes back with 'valid' status but no certificate URL. Not sure if that means the previous certificate URL was invalid. But we ultimately need to check for both 'valid' status and a certificate URL before returning.
So basically, the order object returned by the finalize request has the cert URL. But if you POST-as-GET the order object, it's missing for some period of time. Maybe it has to do with internal DB propagation across nodes?
Good thinking. I suspect it's this. During the /finalize/ request, we write to the database primary, then tweak the in-memory Order object to add the certificate serial and set status to "valid" before returning it to the user.
On polling the order, we query a read-only replica. That replica could be lagging behind the primary, in which case you'd see an order in "pending" state with no certificate URL.
So, I'd be very surprised to ever see an order with "valid" status and no certificate URL. But a situation where /finalize/ returns a "valid" order but subsequent polling gives a "pending" order again is very plausible.
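The sequence is easier to see in a toy simulation. This is only a sketch of the mechanism described above, not how Boulder is implemented: the store class, the function names, and the explicit catch_up step standing in for replication are all assumptions for illustration.

```python
class ToyOrderStore:
    """Hypothetical primary database plus a lagging read-only replica."""

    def __init__(self) -> None:
        self.primary: dict = {}
        self.replica: dict = {}

    def write_primary(self, order_id: str, order: dict) -> None:
        self.primary[order_id] = dict(order)

    def catch_up(self) -> None:
        # Stand-in for replication; until this runs, the replica is stale.
        self.replica = {k: dict(v) for k, v in self.primary.items()}


def finalize(store: ToyOrderStore, order_id: str, cert_url: str) -> dict:
    """Write to the primary and return the updated in-memory order."""
    order = dict(store.primary[order_id])
    order["status"] = "valid"
    order["certificate"] = cert_url
    store.write_primary(order_id, order)
    return order


def poll(store: ToyOrderStore, order_id: str) -> dict:
    """Polling reads from the replica, which may lag the primary."""
    return dict(store.replica[order_id])


store = ToyOrderStore()
store.write_primary("order-1", {"status": "pending"})
store.catch_up()

finalized = finalize(store, "order-1", "https://example.test/cert/abc")
stale = poll(store, "order-1")   # replica has not caught up yet
store.catch_up()
fresh = poll(store, "order-1")   # replica now matches the primary
```

Here finalized is already "valid" with a certificate URL, stale is still "pending" with no URL, and fresh is "valid" again. At no point does the toy model yield "valid" without a URL.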
It would be more consistent to have /finalize/ always return a "pending" order, and require subsequent polling to see the order become "valid". This is the intent behind our AsyncFinalize feature flag, which we've tried turning on in the past, but too many deployed clients had trouble with it.