Profile selection

stewe · February 8, 2025, 2:25am

Anyone else having problems with the new profile selection feature?

I get the following error (staging and production):

Error creating new order :: unrecognized profile name "classic" (urn:ietf:params:acme:error:invalidProfile)

rmbolger · February 8, 2025, 7:17am

I'm seeing the same. Staging directory endpoint currently advertises both classic and tlsserver, but accepts neither. Prod advertises only classic but doesn't accept it either.
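
For anyone who wants to check what a directory advertises, here is a minimal sketch. It assumes profiles are listed under a `meta.profiles` object mapping profile name to description (as in the ACME profiles draft), and it parses a made-up sample document instead of fetching a real endpoint. The type and function names are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// directory models only the part of an ACME directory document that this
// sketch needs. The "meta.profiles" layout is an assumption.
type directory struct {
	Meta struct {
		Profiles map[string]string `json:"profiles"`
	} `json:"meta"`
}

// advertisedProfiles returns the sorted profile names found in a directory
// document.
func advertisedProfiles(raw []byte) ([]string, error) {
	var d directory
	if err := json.Unmarshal(raw, &d); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(d.Meta.Profiles))
	for name := range d.Meta.Profiles {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Made-up sample; the descriptions are placeholders.
	sample := []byte(`{"meta": {"profiles": {"classic": "placeholder", "tlsserver": "placeholder"}}}`)
	names, err := advertisedProfiles(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Note that a name appearing here only shows what the directory announces, not what new-order requests will accept, which is exactly the mismatch reported above.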

Not sure what's going on.

Osiris · February 8, 2025, 7:42am

Boulder on production currently is running eda49660, which is fairly recent. A few commits back we can see some profile related commits.

Maybe some bug was introduced.

stewe · February 8, 2025, 9:06am

Thanks for the confirmation!

mcpherrinm · February 8, 2025, 5:23pm

There were some changes around handling profiles, so it is definitely possible something has broken. It’s the weekend so I think it likely won’t be resolved til Monday.

Osiris · February 8, 2025, 9:19pm

But I assume the working of profiles is tested in some integration test?

stewe · February 8, 2025, 10:55pm

As far as I can tell, the problem is not in the code but in the config (which we don't see).

When using the "classic" profile for example:

The call to

wfe.validateCertificateProfileName(newOrderRequest.Profile)

succeeds.

In the next step

ra.profiles.get(req.CertificateProfileName)

fails with

Error creating new order :: unrecognized profile name "classic"

So it seems the Web Frontend (wfe2/wfe.go) has the profiles configured but the Registration Authority (ra/ra.go) has not.
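
The two-step failure can be reproduced in a toy model. This is only a sketch of the situation described above, not Boulder's code: the `frontend` and `registrationAuthority` types, their fields, and `newOrder` are hypothetical stand-ins for two components that each keep their own list of profiles.

```go
package main

import "fmt"

// frontend stands in for a component that knows profile names and descriptions.
type frontend struct {
	certProfiles map[string]string
}

// registrationAuthority stands in for a component with its own, separately
// configured set of profiles.
type registrationAuthority struct {
	validationProfiles map[string]bool
}

func (f frontend) validateProfileName(name string) error {
	if _, ok := f.certProfiles[name]; !ok {
		return fmt.Errorf("unrecognized profile name %q", name)
	}
	return nil
}

func (ra registrationAuthority) getProfile(name string) error {
	// Reading from a nil map is safe in Go and simply reports "not found".
	if !ra.validationProfiles[name] {
		return fmt.Errorf("unrecognized profile name %q", name)
	}
	return nil
}

// newOrder runs both checks in the order described in the post.
func newOrder(f frontend, ra registrationAuthority, profile string) error {
	if err := f.validateProfileName(profile); err != nil {
		return fmt.Errorf("frontend rejected order: %w", err)
	}
	if err := ra.getProfile(profile); err != nil {
		return fmt.Errorf("RA rejected order: %w", err)
	}
	return nil
}

func main() {
	f := frontend{certProfiles: map[string]string{"classic": "placeholder description"}}
	ra := registrationAuthority{} // no profiles configured on this side
	fmt.Println(newOrder(f, ra, "classic"))
}
```

With the frontend configured and the RA left empty, the first check passes and the second one fails, matching the behaviour seen on staging and production.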

Osiris · February 9, 2025, 9:15am

If the config somehow has disabled profiles or not configured the classic profile, why is it announced in the directory? That sure also sounds like a bug in the code to me. Unless perhaps there are two separate places in the configuration where profiles are "enabled", but that sounds kinda strange to me.

I would also assume integration tests are using (mostly) the same configuration as staging and/or production.

stewe · February 9, 2025, 10:25am

There is more than one config.

The Web Frontend uses c.WFE.CertProfiles, which is read from "wfe2.json".

github.com/letsencrypt/boulder, test/config-next/wfe2.json at eda496606:

		"certProfiles": {
			"legacy": "The normal profile you know and love",
			"modern": "Profile 2: Electric Boogaloo"
		},

The Registration Authority uses c.RA.ValidationProfiles, which is read from "ra.json".

github.com/letsencrypt/boulder, test/config-next/ra.json at eda496606:

		"validationProfiles": {
			"legacy": {
				"pendingAuthzLifetime": "168h",
				"validAuthzLifetime": "720h",
				"orderLifetime": "168h"
			},
			"modern": {
				"pendingAuthzLifetime": "7h",
				"validAuthzLifetime": "7h",
				"orderLifetime": "7h"
			}
		},

This is all in the "config-next" folder. However, in the "config" folder itself, the "validationProfiles" key is missing from ra.json:

github.com/letsencrypt/boulder, test/config/ra.json at eda496606:

{
	"ra": {
		"rateLimitPoliciesFilename": "test/rate-limit-policies.yml",
		"limiter": {
			"redis": {
				"username": "boulder-wfe",
				"passwordFile": "test/secrets/wfe_ratelimits_redis_password",
				"lookups": [
					{
						"Service": "redisratelimits",
						"Domain": "service.consul"
					}
				],
				"lookupDNSAuthority": "consul.service.consul",
				"readTimeout": "250ms",
				"writeTimeout": "250ms",
				"poolSize": 100,
				"routeRandomly": true,
				"tls": {
					"caCertFile": "test/certs/ipki/minica.pem",

This file has been truncated.

The directory works since it belongs to the Web Frontend.
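
A quick way to spot this kind of drift between two config files is to diff the profile names. The sketch below is an offline helper of my own, not something Boulder ships. It uses the "certProfiles" and "validationProfiles" keys quoted above, but assumes flattened fragments (the real files nest these keys inside a larger document), and `missingFromRA` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// missingFromRA returns the profile names present under "certProfiles" in the
// frontend fragment but absent under "validationProfiles" in the RA fragment.
func missingFromRA(wfeJSON, raJSON []byte) ([]string, error) {
	var wfe struct {
		CertProfiles map[string]string `json:"certProfiles"`
	}
	var ra struct {
		ValidationProfiles map[string]json.RawMessage `json:"validationProfiles"`
	}
	if err := json.Unmarshal(wfeJSON, &wfe); err != nil {
		return nil, fmt.Errorf("parsing frontend config: %w", err)
	}
	if err := json.Unmarshal(raJSON, &ra); err != nil {
		return nil, fmt.Errorf("parsing RA config: %w", err)
	}
	var missing []string
	for name := range wfe.CertProfiles {
		if _, ok := ra.ValidationProfiles[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	wfe := []byte(`{"certProfiles": {"legacy": "The normal profile you know and love", "modern": "Profile 2: Electric Boogaloo"}}`)
	ra := []byte(`{}`) // like test/config/ra.json: no "validationProfiles" key
	missing, err := missingFromRA(wfe, ra)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing)
}
```

For the fragments above, both "legacy" and "modern" are reported as missing on the RA side.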

Osiris · February 9, 2025, 10:27am

And there isn't a check in place that makes sure certain items of both JSONs are congruent? Guess not.

I would think Boulder would refuse to stop/reload/restart/start if the configurations aren't compatible with each other.

orangepizza · February 9, 2025, 10:55am

There could be times when they'd legitimately diverge: for example, when a profile is retired, it would be removed from the WFE but would have to stay on the RA side until old orders with that profile expire.

webprofusion · February 10, 2025, 2:44am

Have left a suggestion on the Boulder repo for additional automated production testing to cover this: "Implement Testing in Production to cover new feature sets" (Issue #8001, letsencrypt/boulder).

aarongable · February 10, 2025, 4:43pm

Hi folks, we've tracked down the bug, and indeed it was a deployability bug -- some of the code deployed in this past week's code deploy inadvertently expected certain configuration to be in place, when that config was not yet in place.

For the curious, the bug is right here:

func (vp *validationProfiles) get(name string) (*validationProfile, error) {
	if name == "" {
		name = vp.defaultName
	}
	profile, ok := vp.byName[name]
	if !ok {
		return nil, berrors.InvalidProfileError("unrecognized profile name %q", name)
	}
	return profile, nil
}

This function should have had an additional check at the top:

	if vp.defaultName == UnconfiguredDefaultProfileName {
		return vp.byName[vp.defaultName], nil
	}

to ensure that all requested profiles are accepted and given the default values, as long as the RA's configuration hasn't yet been updated with specific per-profile settings.

I've put together a fix. Thanks for your patience, everybody!

aarongable · February 10, 2025, 4:47pm

Good analysis! Note that the configs found in the config and config-next folders are not the configs that our ACME Server is deployed with: they're test configs for Boulder's integration tests. So while they are intended to be similar to our prod (config) and staging (config-next) configuration, one can't generally use them as an indication of exactly what the current production configs look like.

aarongable · February 10, 2025, 4:51pm

If boulder services refused to start unless their configs were congruent, we'd never be able to do rolling-restarts where individual components (and even individual replicas of each component) are brought up with new config one-by-one so we don't have any downtime during a deploy.

So instead we defend against these with the config-loading code. In this case, you can see that the RA was made happy to start with either the new-style or old-style configs, because we know that we can't deploy the new-style configs at the exact same instant as the new code. I just happened to write a bug a little bit further down the line, and didn't catch it in unit or integration tests.

Osiris · February 10, 2025, 5:07pm

Hm, makes sense. I shouldn't compare Boulder with stuff like Apache and nginx, which are (as far as I know in my little home server environments) single component services.

mcpherrinm · February 10, 2025, 5:19pm

What should have happened here is that our automated issuance testing should have caught that it couldn’t issue certificates, but we haven’t got it using profiles yet. That’s on the TODO list before we are ready to launch multiple profiles in production.

system · March 12, 2025, 5:19pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
