Since Sept 2024, previously successful HTTP-01 renewals are failing with "Cannot negotiate ALPN" errors

My domain is: saflyfishers.asn.au

I ran this command: wucme.exe --uacme issue saflyfishers.asn.au

It produced this output:

2024-09-11 00:20:17.31: the server reported the following error:
{
"type": "urn:ietf:params:acme:error:unauthorized",
"detail": "Cannot negotiate ALPN protocol \"acme-tls/1\" for tls-alpn-01 challenge",
"status": 403
}
2024-09-11 00:20:17.31: failed to authorize order at https://acme-v02.api.letsencrypt.org/acme/order/17298823/303971410646

My web server is (include version): WASD v12.2.4

The operating system my web server runs on is (include version): VMS 8.4

My hosting provider, if applicable, is: unknown

I can login to a root shell on my machine (yes or no, or I don't know): yes (equivalent)

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): n/a

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): WUCME IA64-1.1.8 (1.1.2) (OpenSSL 3.0.7 9 Nov 2022)

There has been no change of server or LE client in over 12 months.

Well here's a turnaround ... the specified host just (automatically, on schedule) successfully renewed:

2024-09-12 00:20:00.54: (4) $10$dkd1:[wasd_root.ia64-bin.][000000]wucme.exe --uacme issue saflyfishers.asn.au
2024-09-12 00:20:06.90: challenge=http-01 ident=saflyfishers.asn.au token=ud6gJa83FaYwITQ5*********************5mbumnv2k
key_auth=ud6gJa83FaYw******************************Z8vC8Vm-TLKOBJmZUstG7ffjBvhNmd_G-Ls8
2024-09-12 00:20:21.76: reading private key /*********************wucme_k_saflyfishers_asn_au.pem
2024-09-12 00:20:22.89: wuCME successful saflyfishers.asn.au renewal

No changes to server or client. As if the LE server last night decided TLS-ALPN-01 was the way to go, then tonight reverted back to HTTP-01 !!!???

It's not the ACME server deciding that, it's the ACME client.

Well if that's the case it must be a latent bug in the client as it was last updated 15-JAN-2023. Not impossible but this is the first report from multiple sites in those 20 months.

This is not the saflyfishers.asn.au example I opened with as that has resolved itself without intervention.

The client has been stable over 20+ months and over that period renewed LE (probably) hundreds of certs over multiple systems. The client uses HTTP-01 challenge. Here is a single example (this has started happening over three systems that I know of).

The previous 24hr renewal check failed on host resolution, clearly reported as follows (there may well have been name service changes underway during that 24hr period - not my turf).

2024-09-10 00:20:05.30: the server reported the following error:
{
    "type": "urn:ietf:params:acme:error:dns",
    "detail": "DNS problem: NXDOMAIN looking up TXT for _acme-challenge.******.******.org - check that a DNS record exists for this domain",
    "status": 400
}

The very next 24hr renewal (without change to any of the LE data) must have resolved the name OK but reports:

2024-09-11 00:20:11.83: the server reported the following error:
{
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "Cannot negotiate ALPN protocol \"acme-tls/1\" for tls-alpn-01 challenge",
    "status": 403
}

Speculating here: as the client and cert data have not changed, perhaps the name resolved OK but port 80 could not be contacted for some reason (it was and is always open and operating), and the LE server fell back to TLS-ALPN-01 in the absence of a responding cleartext port. Possible?

WUCME IA64-1.1.8 (1.1.2) (OpenSSL 3.0.7 9 Nov 2022)

Is that really an Intel Itanium in a production server?

I think the client has always been broken in that it wrongly selects a random challenge (the server looked up the _acme-challenge subdomain, which it only does for a dns-01 challenge, and we also have an error about tls-alpn), but it tries often enough that it eventually picks the right one and succeeds.

Regarding the client (GitHub - ndilieto/uacme: ACMEv2 client written in plain C with minimal dependencies), I think it might be cleverly trying tls-alpn-01 because it couldn't reserve TCP port 80 (which your webserver was probably using). If it then can't in turn reserve port 443 (which your webserver is also using), then it probably gets tricky.

I imagine that internally this client messages the webserver process to tell it what to do and work cooperatively, how well the server performs that would be a question for the developer. I'm assuming the system has had a recent reboot.

As an aside, while it's very much a good thing that there are alternative web servers like WASD, you're doing things the hard way by going so far off the beaten path, presumably because your CPU architecture isn't well catered for (or you specifically prefer these tools).

It started a DNS challenge on 9/10, so it's unlikely to be intended behavior.

The previous 24hr renewal check failed on host resolution, clearly reported as follows (there may well have been name service changes underway during that 24hr period - not my turf).

2024-09-10 00:20:05.30: the server reported the following error:
{
    "type": "urn:ietf:params:acme:error:dns",
    "detail": "DNS problem: NXDOMAIN looking up TXT for _acme-challenge.******.******.org - check that a DNS record exists for this domain",
    "status": 400
}

The log uses the timestamp YYYY-MM-DD HH:MM:SS.hh

Is that really an Intel Itanium in a production server?

Yes.

I think the client has always been broken in that it wrongly selects a random challenge (the server looked up the _acme-challenge subdomain, which it only does for a dns-01 challenge, and we also have an error about tls-alpn), but it tries often enough that it eventually picks the right one and succeeds.

I do not think so. I have just been back through the 20+ months of logs of the aforementioned Itanium server and the only 403 reported by LE is this one.

2024-09-11 00:20:07.66: challenge=tls-alpn-01 ident=saflyfishers.asn.au token=E9xcXN******dQpao key_auth=48HV-OHq1T******UpTkNC1t7Y8
2024-09-11 00:20:17.31: challenge https://acme-v02.api.letsencrypt.org/acme/chall-v3/401861930006/6GgWeA failed with status invalid
2024-09-11 00:20:17.31: the server reported the following error:
{
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "Cannot negotiate ALPN protocol \"acme-tls/1\" for tls-alpn-01 challenge",
    "status": 403
}

and as reported the very next renewal period succeeds

2024-09-12 00:20:06.90: challenge=http-01 ident=saflyfishers.asn.au token=ud6gJa8******nv2k key_auth=ud6gJa83FaYwITQ51******stG7ffjBvhNmd_G-Ls8
2024-09-12 00:20:21.76: reading private key /wasd_root/local/wucme_k_saflyfishers_asn_au.pem
2024-09-12 00:20:22.89: wuCME successful saflyfishers.asn.au renewal

The logs contain all activity by the automated ACME client we named wuCME. All other renewals in the logs were successful. Up until recently it only tried the once.
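
For anyone wanting to audit their own logs the same way, here is a minimal sketch in C that pulls the timestamp and challenge type out of a log line of the form shown above. The function name `parse_challenge_line` is hypothetical (it is not part of wuCME or uacme), and the sketch assumes the fixed 22-character `YYYY-MM-DD HH:MM:SS.hh` timestamp followed by `: challenge=<type>`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of wuCME or uacme).
 * Extracts the timestamp and challenge type from one log line such as
 *   "2024-09-11 00:20:07.66: challenge=tls-alpn-01 ident=... token=..."
 * stamp must hold at least 23 bytes, type at least 32 bytes.
 * Returns 1 if the line is a "challenge=" line, 0 otherwise. */
int parse_challenge_line(const char *line, char stamp[23], char type[32])
{
    assert(line != NULL && stamp != NULL && type != NULL);

    /* %22[...] reads up to 22 timestamp characters (digits, space,
     * colon, dot, hyphen); %31s reads the challenge type up to the
     * next whitespace. */
    int n = sscanf(line, "%22[0-9 :.-]: challenge=%31s", stamp, type);
    return n == 2;
}

/* Returns 1 if the challenge type is the one wuCME supports. */
int is_http01(const char *type)
{
    return strcmp(type, "http-01") == 0;
}
```

Feeding each log line through `parse_challenge_line` and flagging those where `is_http01` returns 0 would list every attempt that used a challenge other than http-01.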

Ah I see you're the author of the ACME client, so you'll know if it should be trying that tls-alpn-01 challenge or not. As shown by https://acme-v02.api.letsencrypt.org/acme/chall-v3/401861930006/6GgWeA your client did ask LE to attempt to validate that challenge.

For some reason your client selects the tls-alpn challenge. Can we get more of the log from before that point, to help work out why?

Ah I see you're the author of the ACME client, so you'll know if it should be trying that tls-alpn-01 challenge or not. As shown by https://acme-v02.api.letsencrypt.org/acme/chall-v3/401861930006/6GgWeA your client did ask LE to attempt to validate that challenge.

What characteristic of that challenge string indicates ALPN vs HTTP? Or is it the mere fact of the challenge itself?

The URL is just a link to the unique challenge info (and it could be an HTTP challenge, DNS validation or TLS-ALPN), the JSON response shows us the actual challenge details and resulting status.

When your client creates the order it's given a list of challenges (by the CA) to attempt for each DNS identifier you want to prove control of. Your client then picks one (per identifier), prepares to complete the challenge, then tells the CA you're ready for them to check. If your chosen challenge fails to validate then so does the rest of your order.
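
As a rough illustration of the "client picks one" step, here is a minimal C sketch that chooses the first offered challenge type the client can actually complete, regardless of the order the CA lists them in. The function name `pick_challenge` and the array-of-strings representation are assumptions for illustration only; a real client works on the JSON authorization object instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the challenge types offered by the CA and
 * the types this client supports, return the index (into offered[]) of
 * the first offered type that is supported, or -1 if there is none. */
int pick_challenge(const char *const offered[], size_t n_offered,
                   const char *const supported[], size_t n_supported)
{
    assert(offered != NULL && supported != NULL);

    for (size_t i = 0; i < n_offered; i++) {
        for (size_t j = 0; j < n_supported; j++) {
            if (strcmp(offered[i], supported[j]) == 0)
                return (int)i;   /* first match wins */
        }
    }
    return -1;                   /* nothing usable was offered */
}
```

With `offered = {"tls-alpn-01", "dns-01", "http-01"}` and `supported = {"http-01"}`, the function returns index 2, whereas a client that blindly takes index 0 would attempt tls-alpn-01.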

Just realised ... dumb question.

2024-09-11 00:20:07.66: challenge=tls-alpn-01

Thank you people. I need to look at the code (again). Could be a latent bug. The original (wrapped) acme.c code still contains

                if (strcmp(type, "dns-01") == 0 ||
                        strcmp(type, "tls-alpn-01") == 0)
                    key_auth = sha2_base64url(256, "%s.%s", token, thumbprint);
                else if (asprintf(&key_auth, "%s.%s", token, thumbprint) < 0)
                    key_auth = NULL;

which may - MAY - be being triggered by something unforeseen, BUT after 20+ months and hundreds of successes, now all of a sudden small explosions. Such is life. I will report back anything further discovered.

Just a courtesy reply to these comments.

Regarding the client (GitHub - ndilieto/uacme: ACMEv2 client written in plain C with minimal dependencies), I think it might be cleverly trying tls-alpn-01 because it couldn't reserve TCP port 80 (which your webserver was probably using). If it then can't in turn reserve port 443 (which your webserver is also using), then it probably gets tricky.

As mentioned, the uacme is wrapped in some further C code to make it amenable to our web server and O/S. The wrapper actually checks for access to port 80 and emulates one if not. So it can be used on a system that does not have a cleartext service.

I imagine that internally this client messages the webserver process to tell it what to do and work cooperatively, how well the server performs that would be a question for the developer. I'm assuming the system has had a recent reboot.

Yes, indeed. An independent, semi-autonomous process that is under the control of the server but also does its own thing. It's the bit that executes wuCME (the wrapper for acme).

As an aside, while it's very much a good thing that there are alternative web servers like WASD, you're doing things the hard way by going so far off the beaten path, presumably because your CPU architecture isn't well catered for (or you specifically prefer these tools).

Well, it was a project begun thirty years ago and has proved valuable to various users, systems, even enterprises, and so I've just kept plugging away at it. The O/S, DEC's VMS, has a special place in many hearts, and is of course fodder for derision for many others. Choose your poison.

webprofusion wrote the ah-ha comment:

When your client creates the order it's give[n] a list of challenges (by the CA) to attempt for each DNS identifier you want to prove control of. Your client then picks one (per identifier), prepares to complete the challenge, then tells the CA you're ready for them to check. If your chosen challenge fails to validate then so does the rest of your order.

The following is a simplified response to the WASD wuCME community, by way of explanation and bug fix, provided here to close the issue. The fix has been trialled on three independent systems and is confirmed. Thanks to the LE community for its inputs.

---------------

Now the "nor really come to grips with the internals", has returned like the Ghost of Christmas Past.

The wuCME version log begins

  01-JUN-2020  MGD  v1.0.0, initial release
  08-AUG-2019  MGD  initial development

so just over four years in production. Four years, and hundreds of Let's Encrypt certificates issued across multiple systems (with a few enhancements and tweaks along the way) ... basically all due to serendipity.

Within a handful of days some fifty months on, multiple reports emerged of strange challenge failures. In particular

challenge=tls-alpn-01 ident=www.******.au token=bxf_******aPg8 key_auth=ZYPv******EMxBc

and also the likes of

challenge=dns-01 ident=www.******.au token=oMj3Vk******A3VU key_auth=cayvE2 ******QbAuY

but challenges that were successful as well

challenge=http-01 ident=www.******.au token=bxf******jbaPg8 key_auth=bxf_y******d9_G-L8

I had coded wuCME's use of the core uacme.c on the presumption that unless you specified an alternate, the challenge defaulted to the simplest, "http-01". Nope! Incorrect! Not even close! wuCME defaults to the first offered by the Let's Encrypt servers. LE currently offers three.

It just so happened that for the first four years of wuCME production LE must have offered "http-01" first, which wuCME supported. Recently LE's algorithm evidently changed from a fixed order of challenges to a varying order. wuCME's simple grab-the-first-and-run approach was suddenly broken as often as not (probably, with a large enough sample, 2 in 3 attempts broken :-)
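
The "2 in 3" estimate can be checked with a small C sketch that enumerates every ordering of the three challenge types and counts those where "http-01" is not first. It assumes each ordering is equally likely, which is a guess about LE's behaviour and not something confirmed here; the function name `count_broken_orderings` is made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical illustration: enumerate all orderings of the three
 * challenge types and count those a "grab the first" client that only
 * supports http-01 would fail on. Stores the number of orderings in
 * *total and returns the number of failing ones. */
int count_broken_orderings(int *total)
{
    static const char *const types[3] = { "http-01", "dns-01", "tls-alpn-01" };
    int broken = 0;

    assert(total != NULL);
    *total = 0;

    for (int a = 0; a < 3; a++) {
        for (int b = 0; b < 3; b++) {
            for (int c = 0; c < 3; c++) {
                if (a == b || b == c || a == c)
                    continue;            /* not a permutation */
                (*total)++;
                /* the naive client takes whatever is listed first */
                if (strcmp(types[a], "http-01") != 0)
                    broken++;
            }
        }
    }
    return broken;
}
```

Under that equal-likelihood assumption the function reports 4 failing orderings out of 6, which matches the 2 in 3 figure.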

This bug had remained latent for all those years.

The fix, an embarrassing

                if (strcmp (type, "http-01"))
                {
                   /* discard unsupported challenge */
                   free(key_auth);
                   continue;
                }

And to rub salt into the wound, the uacme.c default behaviour (right there in adjacent code) is to similarly step through the challenges allowing the user to accept or decline each provided by LE.

                msg(0, "type 'y' followed by a newline to accept challenge"
                        ", anything else to skip");
                if (scanf(" %c", &c) != 1 || tolower(c) != 'y') {
                    free(key_auth);
                    continue;
                }