"Unable to deactivate account :: only valid registrations can be deactivated" since 2024-12-18

Hello, long time no see.

I have a nightly CI job that tests a plugin of mine against the staging server.

It is as follows:

sudo certbot --staging register -n --no-eff-email \
  --agree-tos --email certbot-dns-multi-nightly@zorin.au
sudo certbot certonly --staging \
  -d "*.zorin.au" -a dns-multi \
  --dns-multi-credentials=/etc/letsencrypt/dns-multi.ini
sudo certbot --staging show_account -vvv
sudo certbot --staging unregister -n -vvv

During the unregister step, I get the following error:

{
  "type": "urn:ietf:params:acme:error:malformed",
  "detail": "Unable to deactivate account :: only valid registrations can be deactivated",
  "status": 400
}

I hadn't seen this error before, and it has happened on 100% of runs since it first occurred around 2024-12-18 04:31:37 UTC (I added the account email address after that, sorry).

In the show_account step, I get:

JWS payload:
b'{\n  "onlyReturnExisting": true,\n  "contact": []\n}'
Sending POST request to https://acme-staging-v02.api.letsencrypt.org/acme/new-acct:
{
  "protected": "***",
  "signature": "j_GFoWUDflb99-kBowQSjLxUI5-ZPGjdEpeEkNlnSc_bDVzzi4NfDkDKL4Gu0fc2MjEq7VE17nE7S2NZsFOSORI_62_bie5vMIkPCSnCjAGhVozbjFNn9yztcoTkBvhbcsQGAwbNjEuhSdRGa5xFgdetx6u6mc3epmN3W1v46JRTD2g69GNhFJroR6GW3bmCtpTxG5OewLbdOViIehp_gAmfTr0HOt2LrnInzcNLRdXz_Jk0w1IsmRYcn3t1f2I5uG7GZ-5BWtDa2-KmfVcRVfcEsr49rwhKX6A2BUfGisyK5nFETWsSvANnkK2wV5bV6lFk1CHxa0cR7SmH_fXiqQ",
  "payload": "ewogICJvbmx5UmV0dXJuRXhpc3RpbmciOiB0cnVlLAogICJjb250YWN0IjogW10KfQ"
}
https://acme-staging-v02.api.letsencrypt.org:443 "POST /acme/new-acct HTTP/1.1" 200 535
Received response:
HTTP 200
Server: nginx
Date: Wed, 18 Dec 2024 07:32:59 GMT
Content-Type: application/json
Content-Length: 535
Connection: keep-alive
Boulder-Requester: 176584664
Cache-Control: public, max-age=0, no-cache
Link: <https://acme-staging-v02.api.letsencrypt.org/directory>;rel="index"
Location: https://acme-staging-v02.api.letsencrypt.org/acme/acct/176584664
Replay-Nonce: XCuYn79es3B1HR38m_Qg2dZyzmmIFZik8yq8gu9PKnMj_URmM88
X-Frame-Options: DENY
Strict-Transport-Security: max-age=604800

{
  "key": {
    "kty": "RSA",
    "n": "pHwmQh7xug9DWHLWNdaXQEVo0KKlRFVvFSJS9Ka4TwpEZdSzeRsIBISn4uD1nKrvDwAyDoVbpy04h3vWSza3nu8Ych8u6fFxya2n20KXkUeCHl_u-YtKopAx_N5h-hJgxeXd3kNupe5eUg5a5k5ap8g52NJ9jj3X0Bzq-XB-fDTV6FU3KVuVZSuYC79yo_dMfzjNSQ7Yb3qTxXD3f60_NNpIkXVEw_vPegBhsTS7Vksb5kAvAlOCmG_DH78OcL2_DOX_6ifzNxof8ULdbVEOE9dC1M2tmvsGIxX1PXfOZ84ITvUT0EqhdHbuXhfsnVKrCHA2cHafOBInIBnoMEc8Ew",
    "e": "AQAB"
  },
  "contact": [
    "mailto:certbot-dns-multi-nightly@zorin.au"
  ],
  "createdAt": "2024-12-18T07:31:52Z",
  "status": "valid"
}

So the account status reported by the server is "valid", but the subsequent deregistration step fails:

JWS payload:
b'{\n  "status": "deactivated"\n}'
Sending POST request to https://acme-staging-v02.api.letsencrypt.org/acme/acct/176584664:
{
  "protected": "***",
  "signature": "TT9TvxSTdaMIIw202kniqv09sQnp9r-VMZh43kQ9pHV7izNWY3kgmXP_tp5SdQ-L8bIfYH3_ZKsxzOk4TV5n6fFFTay98F_jlPkmYEJCgldXdLvhPLaOgJpPluvl70j-YpLGICW9kpslt2NorQ5S2sPhCVq7fvoQM-SVMnaTInvGQzQmGt_1roedtUyxe9qYl-RHux7t-ULnWHNeENV8kVS_RAeYUcpDNWlNvwatsLEUNvYK-SQN-JoKFbfTMPuJUjlD8ZUXmZalrCL4bdVR87rGIXkbP7YQKugeaYX7zLPJ3ViDmGKEfyXO5I9zIy_lmiXXy5RT7MDPctCpxNo44g",
  "payload": "ewogICJzdGF0dXMiOiAiZGVhY3RpdmF0ZWQiCn0"
}
https://acme-staging-v02.api.letsencrypt.org:443 "POST /acme/acct/176584664 HTTP/1.1" 400 160
Received response:
HTTP 400
Server: nginx
Date: Wed, 18 Dec 2024 07:33:00 GMT
Content-Type: application/problem+json
Content-Length: 160
Connection: keep-alive
Boulder-Requester: 176584664
Cache-Control: public, max-age=0, no-cache
Link: <https://acme-staging-v02.api.letsencrypt.org/directory>;rel="index"
Replay-Nonce: J6d90a-tWaq4ts2NmVA-2Ko0abSAxtU8UXJLUsqR0b8l4sGk_B4

{
  "type": "urn:ietf:params:acme:error:malformed",
  "detail": "Unable to deactivate account :: only valid registrations can be deactivated",
  "status": 400
}

It looks like I'm not getting a consistent read-after-write: show_account reports the account as "valid", yet the deactivation request sent immediately afterwards is rejected as if it weren't.

Any tips? I can always update the job to skip deregistration, but I don't want to clutter the staging server with a bunch of valid accounts. I'm not too keen on keeping long-term account credentials in the CI job either.
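
In case a minimal reproducer is useful: here's an untested sketch of the same register-then-deactivate flow using golang.org/x/crypto/acme instead of certbot. Per RFC 8555 §7.3.6, deactivation is just a POST of {"status": "deactivated"} to the account URL, which matches the request in the logs above.

package main

import (
	"context"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"log"

	"golang.org/x/crypto/acme"
)

func main() {
	// Throwaway account key, so nothing has to be stored after the run.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}

	client := &acme.Client{
		Key:          key,
		DirectoryURL: "https://acme-staging-v02.api.letsencrypt.org/directory",
	}
	ctx := context.Background()

	// Register a fresh account...
	if _, err := client.Register(ctx, &acme.Account{}, acme.AcceptTOS); err != nil {
		log.Fatalf("register: %v", err)
	}

	// ...and immediately deactivate it. This POSTs {"status": "deactivated"}
	// to the account URL; if the server-side issue is what it looks like,
	// this should currently fail with the same 400 as above.
	if err := client.DeactivateReg(ctx); err != nil {
		log.Fatalf("unregister: %v", err)
	}
}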

10 Likes

If I "Github blame" wfe.go where the error is produced, I see a recent PR wfe: Use separate UpdateRegistrationContact & UpdateRegistrationKey methods by jprenken · Pull Request #7827 · letsencrypt/boulder · GitHub which has modified some code, see Blaming boulder/wfe2/wfe.go at 11d543bd9878f0624b3929a691ae6aa4838d451b · letsencrypt/boulder · GitHub. Staging currently runs 11d543bd, and that PR is included.

Maybe the culprit?
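
To be concrete about the kind of regression I'm imagining, here's a purely hypothetical sketch (not Boulder's actual code, and all the names are made up) of how splitting a combined account-update handler can break a status guard:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical bug pattern: the deactivation path compares the
// client-supplied status (always "deactivated" for this request)
// against "valid", instead of checking the account's stored status,
// so every deactivation fails with exactly this error.

type account struct{ Status string }

type updateRequest struct{ Status string }

func deactivateAccount(acct *account, req updateRequest) error {
	// The guard should inspect acct.Status; req.Status can never be "valid" here.
	if req.Status != "valid" {
		return errors.New("only valid registrations can be deactivated")
	}
	acct.Status = "deactivated"
	return nil
}

func main() {
	acct := &account{Status: "valid"} // the server-side record is fine
	err := deactivateAccount(acct, updateRequest{Status: "deactivated"})
	fmt.Println(err) // only valid registrations can be deactivated
}

No idea whether that's the actual mechanism, but it would match the symptoms.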

8 Likes

Staging was updated to 11d543bd around 2024-12-17 23:00Z, which is a bit earlier than when _az reported the first failure. I also had a look at CI, and there's a successful run from earlier on 2024-12-17, before the update, so the timeline fits quite well with this update causing the error.

8 Likes

Where did you get this info? :slight_smile: There's nothing on the LE status.io page about staging updates.

5 Likes

My own monitoring :smile:

https://germancoding.com/boulder-version-history/

9 Likes

Noice 

6 Likes

Good eye, and thanks for the heads up! I suspect the diagnosis here is exactly correct; we'll investigate, fix it, and make sure this doesn't go to prod.

12 Likes

The staging boulder has been rolled back downhill.

14 Likes

Just how long have you been waiting to use that pun? Or is it standard terminology amongst the Boulder devs? :slight_smile:

9 Likes

The magic flowed out of my fingers into the reply as I was typing. Like Michael Scott says, "Sometimes I'll start a sentence, and I don't even know where it's going. I just hope I find it along the way. Like an improv conversation. An improversation."

11 Likes

Speaking of... yes, and... :wink:

5 Likes

@Nummer378 Is it a deliberate choice to keep just a single entry per commit, even across rollbacks? I see you use "First seen" and "Last seen", but to me it would make more sense to add a new entry for a rollback, listing that commit a second time with its own fresh "From" and "To" timestamps.

Maybe even add an exclamation mark with a "rollback" tooltip, so it's clear it was rolled back :slight_smile:

3 Likes

I don't recall exactly, but I think it was easier this way to cover different scenarios. I do recall a time when LE did some type of A/B testing* where clients would frequently see the reported version flip-flop between two different Boulder builds. Describing every reappearance of an old version as a "rollback" would quickly overload the table, so I went with just a single entry per version. The drawback is, as you've noticed, that multiple rollbacks are very difficult to reconstruct from the data set.

*It may also have been different DCs running slightly different build IDs; I don't know for sure.

5 Likes

Hmm, but how often would that be the case?

Also, maybe some "grace period" might help there? Just thinking out loud :slight_smile:

That said, I suspect there would be a third commit after a rollback, so it should be possible to reconstruct where the rollback happened.

I.e., if consecutive commits were A -> B -> C, and B was rolled back to A with the reason for the rollback fixed in C, then B's "First seen" and "Last seen" times would fall between A's "First seen" and "Last seen": A's timestamps would bracket B's. Moreover, C's "First seen" would correspond to A's new "Last seen", while B's "First seen" would correspond to A's earlier, now-overwritten "Last seen". So it is possible to reconstruct the events.
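
A quick sketch of that reconstruction (the struct fields are made up; the real table only exposes a commit plus its two timestamps): treat each entry as an interval, and any commit whose window sits strictly inside another commit's window must have been rolled back:

package main

import (
	"fmt"
	"time"
)

type entry struct {
	Commit    string
	FirstSeen time.Time
	LastSeen  time.Time
}

// rollbacks reports every commit whose observation window is strictly
// contained in another commit's window, which can only happen if the
// outer commit was seen again after the inner one was withdrawn.
func rollbacks(history []entry) []string {
	var found []string
	for _, inner := range history {
		for _, outer := range history {
			if outer.Commit != inner.Commit &&
				outer.FirstSeen.Before(inner.FirstSeen) &&
				outer.LastSeen.After(inner.LastSeen) {
				found = append(found,
					fmt.Sprintf("%s was rolled back to %s", inner.Commit, outer.Commit))
			}
		}
	}
	return found
}

func main() {
	ts := func(s string) time.Time {
		t, _ := time.Parse(time.RFC3339, s)
		return t
	}
	history := []entry{
		// A's window brackets B's, because A was seen again after the rollback.
		{Commit: "A", FirstSeen: ts("2024-12-10T00:00:00Z"), LastSeen: ts("2024-12-18T22:00:00Z")},
		{Commit: "B", FirstSeen: ts("2024-12-17T23:00:00Z"), LastSeen: ts("2024-12-18T20:00:00Z")},
	}
	for _, r := range rollbacks(history) {
		fmt.Println(r) // prints: B was rolled back to A
	}
}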

4 Likes

We "Canary" a new release for a few moments and verify it's working properly before doing a full deploy, and the deploy itself is rolling, so there is often periods where two releases are running. Even the public endpoint is just the "WFE" component, not the full stack.

The deploy process for each DC is also independent, so there may be a period of time where two different releases are running in prod (we only have one staging environment).

Here's what it looks like in terms of running processes (though old processes that are still shutting down don't show up here, so it's not 100% accurate):

You can see the big vertical spike as the canaries come up, a (mostly) flat line as we wait out the canary period, and then a big yellow drop as we start swapping out the old release.

5 Likes

Looks like that's all done within the hour, perhaps even on a scale of minutes if the next value on the X-axis is "23:20".

2 Likes

Yes, those are times at the bottom, so that staging deploy started at 23:13 and finished at 23:18 UTC. So about 5 minutes. Double that in prod, assuming we do the DCs sequentially, plus any time we might wait in between.

5 Likes

Sounds like an hour as a "grace period" for @Nummer378's history application would be more than enough (probably overkill) to prevent erroneous entries in that history.

Assuming no A/B testing is going to happen any time soon without prior warning.


On another note (and more on topic): it seems @aarongable fixed this bug in Fix error when deactivating an account by aarongable · Pull Request #7899 · letsencrypt/boulder · GitHub. :partying_face:

4 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.