Anyone else having problems with the new profile selection feature?
I get the following error (staging and production):
Error creating new order :: unrecognized profile name "classic" (urn:ietf:params:acme:error:invalidProfile)
I'm seeing the same. The staging directory endpoint currently advertises both classic and tlsserver, but accepts neither. Prod advertises only classic but doesn't accept it either.
Not sure what's going on.
Boulder on production currently is running eda49660, which is fairly recent. A few commits back we can see some profile related commits.
Maybe some bug was introduced.
Thanks for the confirmation!
There were some changes around handling profiles, so it is definitely possible something has broken. It’s the weekend so I think it likely won’t be resolved til Monday.
But I assume the working of profiles is tested in some integration test?
As far as I can tell, the problem is not in the code but in the config (which we don't see).
When using the "classic" profile for example:
The call to wfe.validateCertificateProfileName(newOrderRequest.Profile) succeeds.
In the next step, ra.profiles.get(req.CertificateProfileName) fails with:
Error creating new order :: unrecognized profile name "classic"
So it seems the Web Frontend (wfe2/wfe.go) has the profiles configured but the Registration Authority (ra/ra.go) has not.
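To make the two-step flow above concrete, here is a minimal sketch of how a request can pass the frontend check and still fail at the RA when the two components hold different profile sets. The names profileSet, check, and newOrder are hypothetical and are not Boulder's actual types or functions.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidProfile is a hypothetical stand-in for the ACME invalidProfile error.
var errInvalidProfile = errors.New("invalidProfile")

// profileSet is a hypothetical stand-in for the profile names one component
// has loaded from its own config file.
type profileSet map[string]bool

// check returns an error when the named profile is not in this component's set.
func (p profileSet) check(component, name string) error {
	if !p[name] {
		return fmt.Errorf("%s: unrecognized profile name %q: %w", component, name, errInvalidProfile)
	}
	return nil
}

// newOrder mimics the flow from the post: the frontend validates the profile
// first, then the RA looks it up again in its own, separately loaded set.
func newOrder(wfe, ra profileSet, profile string) error {
	if err := wfe.check("wfe", profile); err != nil {
		return err
	}
	return ra.check("ra", profile)
}

func main() {
	wfe := profileSet{"classic": true} // the frontend knows the profile
	ra := profileSet{}                 // the RA config has no profiles yet
	fmt.Println(newOrder(wfe, ra, "classic"))
}
```

With these inputs the frontend check passes and the error comes from the RA lookup, which matches the symptom reported in this thread.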
If the config somehow has disabled profiles or not configured the classic profile, why is it announced in the directory? Sure also sounds like a bug in the code to me. Unless perhaps there are two separate places in the configuration where profiles are "enabled", but that sounds kinda strange to me.
I would also assume integration tests are using (mostly) the same configuration as staging and/or production.
There is more than one config.
The Web Frontend uses c.WFE.CertProfiles, which is read from "wfe2.json".
The Registration Authority uses c.RA.ValidationProfiles, which is read from "ra.json".
This is all in the "config-next" folder. However, in the "config" folder itself, the "validationProfiles" key is missing from ra.json.
The directory works since it belongs to the Web Frontend.
And there isn't a check in place that makes sure certain items of both JSONs are congruent? Guess not.
I would think Boulder would refuse to stop/reload/restart/start if the configurations aren't compatible with each other.
There could be times when they'd diverge, like when a profile is retired: it'd be removed from the WFE but would stay on the RA side until old orders with that profile expire.
Have left a suggestion on the boulder repo for additional automated production testing to cover this: "Implement Testing in Production to cover new feature sets" (Issue #8001, letsencrypt/boulder on GitHub).
Hi folks, we've tracked down the bug, and indeed it was a deployability bug -- some of the code deployed in this past week's code deploy inadvertently expected certain configuration to be in place, when that config was not yet in place.
For the curious, the bug is right here:
	func (vp *validationProfiles) get(name string) (*validationProfile, error) {
		if name == "" {
			name = vp.defaultName
		}
		profile, ok := vp.byName[name]
		if !ok {
			return nil, berrors.InvalidProfileError("unrecognized profile name %q", name)
		}
		return profile, nil
	}
This function should have had an additional check at the top:
	if vp.defaultName == UnconfiguredDefaultProfileName {
		return vp.byName[vp.defaultName], nil
	}
to ensure that all requested profiles are accepted and given the default values, as long as the RA's configuration hasn't yet been updated with specific per-profile settings.
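For readers who want to try the behaviour out, here is a self-contained sketch of the lookup with the extra check at the top. It is an approximation under stated assumptions, not the actual fix: the value of UnconfiguredDefaultProfileName and the description field are made up for illustration, and fmt.Errorf stands in for berrors.InvalidProfileError.

```go
package main

import "fmt"

// UnconfiguredDefaultProfileName uses an assumed value; the real constant's
// value is not shown in this thread.
const UnconfiguredDefaultProfileName = "unconfigured-default"

// validationProfile carries a placeholder field only; the real struct holds
// per-profile settings that are not shown in this thread.
type validationProfile struct {
	description string
}

type validationProfiles struct {
	defaultName string
	byName      map[string]*validationProfile
}

// get returns the profile for name. While the RA has no per-profile config,
// every requested name resolves to the single default profile.
func (vp *validationProfiles) get(name string) (*validationProfile, error) {
	if vp.defaultName == UnconfiguredDefaultProfileName {
		return vp.byName[vp.defaultName], nil
	}
	if name == "" {
		name = vp.defaultName
	}
	profile, ok := vp.byName[name]
	if !ok {
		// fmt.Errorf replaces berrors.InvalidProfileError in this sketch.
		return nil, fmt.Errorf("unrecognized profile name %q", name)
	}
	return profile, nil
}

func main() {
	// An RA that has not yet been given per-profile settings.
	legacy := &validationProfiles{
		defaultName: UnconfiguredDefaultProfileName,
		byName: map[string]*validationProfile{
			UnconfiguredDefaultProfileName: {description: "defaults"},
		},
	}
	p, err := legacy.get("classic")
	fmt.Println(p.description, err)
}
```

Once the RA is configured with named profiles, the early return no longer triggers and unknown names are rejected as before.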
I've put together a fix. Thanks for your patience, everybody!
Good analysis! Note that the configs found in the config and config-next folders are not the configs that our ACME Server is deployed with: they're test configs for Boulder's integration tests. So while they are intended to be similar to our prod (config) and staging (config-next) configuration, one can't generally use them as an indication of exactly what the current production configs look like.
If boulder services refused to start unless their configs were congruent, we'd never be able to do rolling-restarts where individual components (and even individual replicas of each component) are brought up with new config one-by-one so we don't have any downtime during a deploy.
So instead we defend against these with the config-loading code. In this case, you can see that the RA was made happy to start with either the new-style or old-style configs, because we know that we can't deploy the new-style configs at the exact same instant as the new code. I just happened to write a bug a little bit further down the line, and didn't catch it in unit or integration tests.
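As a rough illustration of config-loading code that tolerates both shapes, the sketch below accepts a config with or without a "validationProfiles" key. Only that key name comes from this thread; the raConfig and profileConfig types, the "defaultProfileName" and "maxNames" fields, the fallback value of 100, and the sentinel name are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unconfiguredDefault is a hypothetical sentinel for "no per-profile config yet".
const unconfiguredDefault = "unconfigured-default"

// profileConfig holds one made-up per-profile setting for illustration.
type profileConfig struct {
	MaxNames int `json:"maxNames"`
}

// raConfig is a hypothetical, trimmed-down RA config.
type raConfig struct {
	DefaultProfileName string                   `json:"defaultProfileName"`
	ValidationProfiles map[string]profileConfig `json:"validationProfiles"`
}

// loadProfiles accepts both config shapes, so new code can start before the
// new config has been deployed. It returns the default profile name and the
// profiles keyed by name.
func loadProfiles(raw []byte) (string, map[string]profileConfig, error) {
	var c raConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return "", nil, err
	}
	if len(c.ValidationProfiles) == 0 {
		// Old-style config: synthesize a single default profile.
		fallback := map[string]profileConfig{unconfiguredDefault: {MaxNames: 100}}
		return unconfiguredDefault, fallback, nil
	}
	if _, ok := c.ValidationProfiles[c.DefaultProfileName]; !ok {
		return "", nil, fmt.Errorf("default profile %q is not configured", c.DefaultProfileName)
	}
	return c.DefaultProfileName, c.ValidationProfiles, nil
}

func main() {
	name, profiles, err := loadProfiles([]byte(`{}`))
	fmt.Println(name, len(profiles), err)
}
```

The trade-off of this approach is that every later lookup has to remember the fallback case too, which is the kind of spot where the bug in this thread crept in.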
Hm, makes sense. I shouldn't compare Boulder with stuff like Apache and nginx, which are (as far as I know in my little home server environments) single component services.
What should have happened here is that our automated issuance testing should have caught that it couldn’t issue certificates, but we haven’t got it using profiles yet. That’s on the TODO list before we are ready to launch multiple profiles in production.