Let's Encrypt doesn't verify dns-01 and leaves the challenge in "pending" status

We have a problem where the Let's Encrypt server occasionally doesn’t verify dns-01 challenges and keeps them in the “pending” state forever (or until we send “resource”: “challenge” again).

For example, we request a new authz for psql02.trm-trans.beep.pl:

2019-03-12 15:09:58,236 - DEBUG - JWS payload:
b'{\n  "identifier": {\n    "type": "dns",\n    "value": "psql02.trm-trans.beep.pl"\n  },\n  "resource": "new-authz"\n}'
2019-03-12 15:09:58,277 - DEBUG - Sending POST request to https://acme-v01.api.letsencrypt.org/acme/new-authz:
{
  […]
  "payload": "ewogICJpZGVudGlmaWVyIjogewogICAgInR5cGUiOiAiZG5zIiwKICAgICJ2YWx1ZSI6ICJwc3FsMDIudHJtLXRyYW5zLmJlZXAucGwiCiAgfSwKICAicmVzb3VyY2UiOiAibmV3LWF1dGh6Igp9"
}
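For readers following the logs: the "payload" field is just the base64url encoding (with padding stripped) of the JWS payload printed on the line above it. A minimal sketch to confirm that, using only the Python standard library; the helper name `decode_jws_payload` is made up for illustration and is not part of any ACME client:

```python
import base64
import json


def decode_jws_payload(payload_b64url: str) -> dict:
    """Decode the base64url "payload" field of a JWS into a JSON object."""
    # JWS strips the trailing "=" padding, so restore it before decoding.
    padded = payload_b64url + "=" * (-len(payload_b64url) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# "payload" value copied from the new-authz request logged above.
NEW_AUTHZ_PAYLOAD = (
    "ewogICJpZGVudGlmaWVyIjogewogICAgInR5cGUiOiAiZG5zIiwKICAgICJ2YWx1ZSI6ICJwc3FsMDIudHJtLXRyYW5zLmJlZXAucGwiCiAgfSwKICAicmVzb3VyY2UiOiAibmV3LWF1dGh6Igp9"
)

decoded = decode_jws_payload(NEW_AUTHZ_PAYLOAD)
print(decoded["resource"], decoded["identifier"]["value"])
```

The same helper works on the "payload" of the later challenge requests, which makes it easy to check what was actually sent.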

The Let's Encrypt server answers fine:

2019-03-12 15:09:58,684 - DEBUG - Received response:
HTTP 201
Server: nginx
Content-Type: application/json
Content-Length: 1280
Boulder-Requester: 1732128
Link: <https://acme-v01.api.letsencrypt.org/acme/new-cert>;rel="next"
Location: https://acme-v01.api.letsencrypt.org/acme/authz/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes
Replay-Nonce: UpJHtxZ_2Dpi986_vGKSobDTvh8vklCmclALEsp5oSM
X-Frame-Options: DENY
Strict-Transport-Security: max-age=604800
Expires: Tue, 12 Mar 2019 14:09:58 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Tue, 12 Mar 2019 14:09:58 GMT
Connection: keep-alive

b'{\n  "identifier": {\n    "type": "dns",\n    "value": "psql02.trm-trans.beep.pl"\n  },\n  "status": "pending",\n  "expires": "2019-03-19T14:09:58Z",\n  "challenges": [\n    {\n      "type": "http-01",\n      "status": "pending",\n      "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958532",\n      "token": "urZi6CE5T2T_yy9yTyKCC6xLvhPYUdVX512XRas5Jvs"\n    },\n    {\n      "type": "dns-01",\n      "status": "pending",\n      "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958534",\n      "token": "5BsMA6xfGs1h5tVPL-C7fbv_nIK2lpeai_hfs_QrsPA"\n    },\n    {\n      "type": "tls-sni-01",\n      "status": "pending",\n      "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958535",\n      "token": "hqTsmgpFur34lSeN8fydIn56yyK6hhqOrEaaq9XYMcM"\n    },\n    {\n      "type": "tls-alpn-01",\n      "status": "pending",\n      "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958536",\n      "token": "ckjl5F13rLlbq3VOvOr7P4pAlWg3KIqjehXGsbq6koA"\n    }\n  ],\n  "combinations": [\n    [\n      1\n    ],\n    [\n      0\n    ],\n    [\n      2\n    ],\n    [\n      3\n    ]\n  ]\n}'

We chose dns-01, put the proper records in our DNS zones, and told the Let's Encrypt server about that:

2019-03-12 15:15:05,484 - DEBUG - JWS payload: b'{\n  "resource": "challenge",\n  "keyAuthorization": "5BsMA6xfGs1h5tVPL-C7fbv_nIK2lpeai_hfs_QrsPA.ndBNik9Qn4ddsVca8VLjaHENcFnRCFa1Rg30N_p3M8w",\n  "type": "dns-01"\n}'
2019-03-12 15:15:05,516 - DEBUG - Sending POST request to https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958534:
{
  […]
  "payload": "ewogICJyZXNvdXJjZSI6ICJjaGFsbGVuZ2UiLAogICJrZXlBdXRob3JpemF0aW9uIjogIjVCc01BNnhmR3MxaDV0VlBMLUM3ZmJ2X25JSzJscGVhaV9oZnNfUXJzUEEubmRCTmlrOVFuNGRkc1ZjYThWTGphSEVOY0ZuUkNGYTFSZzMwTl9wM004dyIsCiAgInR5cGUiOiAiZG5zLTAxIgp9"
}
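For context on what "proper records" means for dns-01: per the ACME specification, the TXT record lives at `_acme-challenge.<domain>` and its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. A rough sketch of that computation, using the key authorization from the request above; the function name `dns01_txt_record` is hypothetical and this is not the poster's client code:

```python
import base64
import hashlib
from typing import Tuple


def dns01_txt_record(domain: str, key_authorization: str) -> Tuple[str, str]:
    """Return (record name, TXT value) expected for a dns-01 challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without the trailing "=" padding
    value = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"_acme-challenge.{domain}", value


name, value = dns01_txt_record(
    "psql02.trm-trans.beep.pl",
    "5BsMA6xfGs1h5tVPL-C7fbv_nIK2lpeai_hfs_QrsPA.ndBNik9Qn4ddsVca8VLjaHENcFnRCFa1Rg30N_p3M8w",
)
print(name, "TXT", value)
```

Comparing this output with what the authoritative name servers actually return is a quick way to rule out a client-side record problem before suspecting the server.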

Where letsencrypt accepted that:

2019-03-12 15:15:05,816 - DEBUG - Received response:
HTTP 202
Server: nginx
Content-Type: application/json
Content-Length: 336
Boulder-Requester: 1732128
Link: <https://acme-v01.api.letsencrypt.org/acme/authz/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes>;rel="up"
Location: https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958534
Replay-Nonce: tdoYpnwBNxMYB-6g9hZxgiCdGQEka4apknkD7OjhN2s
Expires: Tue, 12 Mar 2019 14:15:05 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Tue, 12 Mar 2019 14:15:05 GMT
Connection: keep-alive

b'{\n  "type": "dns-01",\n  "status": "pending",\n  "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958534",\n  "token": "5BsMA6xfGs1h5tVPL-C7fbv_nIK2lpeai_hfs_QrsPA",\n  "keyAuthorization": "5BsMA6xfGs1h5tVPL-C7fbv_nIK2lpeai_hfs_QrsPA.ndBNik9Qn4ddsVca8VLjaHENcFnRCFa1Rg30N_p3M8w"\n}'

But so far it has not verified the DNS records, and the status stays “pending”.

Now we are stuck.

We can get unstuck if we send “resource”: “challenge” again. Then the Let's Encrypt server does the DNS validation. It is as if Let's Encrypt didn’t save the information about the first challenge response.
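The manual workaround described here could be automated roughly as follows. This is only a sketch under assumptions: `fetch_status` and `resend_challenge` are hypothetical callables that a client would have to supply (for example, a GET on the challenge URL and a repeat of the signed challenge POST), and the poll counts and intervals are arbitrary.

```python
import time
from typing import Callable


def wait_for_validation(
    fetch_status: Callable[[], str],
    resend_challenge: Callable[[], None],
    polls_before_resend: int = 10,
    max_resends: int = 3,
    poll_interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a challenge; re-send the response if it stays "pending" too long."""
    resends = 0
    polls = 0
    while True:
        status = fetch_status()
        if status in ("valid", "invalid"):
            return status
        polls += 1
        if polls >= polls_before_resend:
            if resends >= max_resends:
                # Still "pending" after all retries; let the caller decide.
                return status
            resend_challenge()
            resends += 1
            polls = 0
        sleep(poll_interval)
```

Capping the number of re-sends keeps a genuinely broken challenge from being retried forever and avoids hammering the API.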

Why is that happening?

Recently we have been seeing this problem a few times per day. Earlier it was about once per week.

Hi @arek

I'm not very familiar with ACME v1; I've only used v2.

But did you send a confirmation that the challenge is done?

Your link

https://acme-v01.api.letsencrypt.org/acme/authz/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes

says that Let's Encrypt is waiting for you to confirm the challenge.

Isn't the request above doing that?

Yes, it would appear to be doing the right thing.

ACME v1 never implemented the processing state on challenge resources. It could only ever be (pending, valid, invalid).

So it's hard to distinguish between a challenge getting 'stuck' at the RA and any other problem.
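Since v1 exposes only those three statuses, about the best a client can do is a time-based guess. A small illustrative heuristic; the five-minute grace period and the function name `looks_stuck` are assumptions, not anything Let's Encrypt documents:

```python
from datetime import datetime, timedelta, timezone

# The only challenge statuses ACME v1 exposed, per the post above.
V1_CHALLENGE_STATUSES = ("pending", "valid", "invalid")


def looks_stuck(
    status: str,
    responded_at: datetime,
    now: datetime,
    grace: timedelta = timedelta(minutes=5),  # arbitrary threshold
) -> bool:
    """Guess whether a challenge we already responded to is stuck."""
    if status not in V1_CHALLENGE_STATUSES:
        raise ValueError(f"unexpected ACME v1 challenge status: {status!r}")
    # "pending" long after we responded is suspicious, but not proof.
    return status == "pending" and now - responded_at > grace


# Time of the 202 response in the first example (from its Date header).
responded_at = datetime(2019, 3, 12, 14, 15, 5, tzinfo=timezone.utc)
print(looks_stuck("pending", responded_at, responded_at + timedelta(hours=1)))
```

This cannot tell a lost validation result apart from a slow one; it only flags candidates worth re-sending or reporting.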

Maybe @lestaff would be able to check what happened to the https://acme-v01.api.letsencrypt.org/acme/challenge/vwKBbydMj09GXl6Cw-L3awe-gEoczQ0M71NWd4JSWes/13564958534 challenge?

What was the response to this request? What ACME client are you using?

See the "Where letsencrypt accepted that:" part of the initial post. All requests to and replies from the Let's Encrypt server are there.

It's our own client (based on the acme library from an older version of certbot).

Thanks, I will ask the on-call engineer to investigate.

@lestaff Any progress with checking?

I’ve reminded someone to follow up. Thanks.

Ok.

Another case for comparison:

2019-03-11 16:15:18,604 - DEBUG - JWS payload:
b'{\n "identifier": {\n "type": "dns",\n "value": "mysql07.bcsolutions.beep.pl"\n },\n "resource": "new-authz"\n}'
2019-03-11 16:15:18,633 - DEBUG - Sending POST request to https://acme-v01.api.letsencrypt.org/acme/new-authz:
{
[…]
"payload": "ewogICJpZGVudGlmaWVyIjogewogICAgInR5cGUiOiAiZG5zIiwKICAgICJ2YWx1ZSI6ICJteXNxbDA3LmJjc29sdXRpb25zLmJlZXAucGwiCiAgfSwKICAicmVzb3VyY2UiOiAibmV3LWF1dGh6Igp9"

}

2019-03-11 16:15:18,933 - DEBUG - Received response:
HTTP 201
Server: nginx
Content-Type: application/json
Content-Length: 1283
Boulder-Requester: 1732128
Link: <https://acme-v01.api.letsencrypt.org/acme/new-cert>;rel="next"
Location: https://acme-v01.api.letsencrypt.org/acme/authz/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg
Replay-Nonce: DRqjrqWvV80M4nuyydO5aK6fzWaxr0tjsLwa-A0JAv4
X-Frame-Options: DENY
Strict-Transport-Security: max-age=604800
Expires: Mon, 11 Mar 2019 15:15:18 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Mon, 11 Mar 2019 15:15:18 GMT
Connection: keep-alive

b'{\n "identifier": {\n "type": "dns",\n "value": "mysql07.bcsolutions.beep.pl"\n },\n "status": "pending",\n "expires": "2019-03-18T15:15:18Z",\n "challenges": [\n {\n "type": "tls-sni-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888525",\n "token": "KQm5oAPChGrURAVdBelP7B_lwSHM78FGrlHOn8wQrTE"\n },\n {\n "type": "http-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888526",\n "token": "mPE10-7Hu_a82IVDz-2iNjkBfRRLY2sPSqGFSLrJIrM"\n },\n {\n "type": "tls-alpn-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888527",\n "token": "4PUPYAm8BQMKhV--WBHbyswxVlaYkltSEBckuLWhKFQ"\n },\n {\n "type": "dns-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888529",\n "token": "hc6TeXtmuuBY9x0yisNxb53Dwtqh0DSc8B2ghzIJf-M"\n }\n ],\n "combinations": [\n [\n 1\n ],\n [\n 0\n ],\n [\n 2\n ],\n [\n 3\n ]\n ]\n}'


2019-03-11 16:20:03,616 - DEBUG - JWS payload:
b'{\n "resource": "challenge",\n "keyAuthorization": "hc6TeXtmuuBY9x0yisNxb53Dwtqh0DSc8B2ghzIJf-M.ndBNik9Qn4ddsVca8VLjaHENcFnRCFa1Rg30N_p3M8w",\n "type": "dns-01"\n}'

2019-03-11 16:20:03,649 - DEBUG - Sending POST request to https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888529:
{
[…]
"payload": "ewogICJyZXNvdXJjZSI6ICJjaGFsbGVuZ2UiLAogICJrZXlBdXRob3JpemF0aW9uIjogImhjNlRlWHRtdXVCWTl4MHlpc054YjUzRHd0cWgwRFNjOEIyZ2h6SUpmLU0ubmRCTmlrOVFuNGRkc1ZjYThWTGphSEVOY0ZuUkNGYTFSZzMwTl9wM004dyIsCiAgInR5cGUiOiAiZG5zLTAxIgp9"
}

2019-03-11 16:20:04,957 - DEBUG - Received response:
HTTP 202
Server: nginx
Content-Type: application/json
Content-Length: 336
Boulder-Requester: 1732128
Link: <https://acme-v01.api.letsencrypt.org/acme/authz/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg>;rel="up"
Location: https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888529
Replay-Nonce: Yty99rtb0v2SfFuje2ob54pmV27MCn8gXUNJNlDKXLo
Expires: Mon, 11 Mar 2019 15:20:04 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Mon, 11 Mar 2019 15:20:04 GMT
Connection: keep-alive

b'{\n "type": "dns-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/j0ho5Q_BDBFDiHPZMzgQNTHO2kHKRNL1ink6t7g1cAg/13524888529",\n "token": "hc6TeXtmuuBY9x0yisNxb53Dwtqh0DSc8B2ghzIJf-M",\n "keyAuthorization": "hc6TeXtmuuBY9x0yisNxb53Dwtqh0DSc8B2ghzIJf-M.ndBNik9Qn4ddsVca8VLjaHENcFnRCFa1Rg30N_p3M8w"\n}'

Sorry for the delay here. It looks like our RPCs to store the updated challenge information after validation are failing some fraction of the time. Our SRE team is looking into what’s causing an elevated error rate there. We’ll update you when we know more.

Hi Folks,

I don’t have any conclusions yet, but I wanted to let you know that this issue has been receiving attention. I have confirmed for the examples you’ve provided here that Jacob’s point regarding our RPCs to store updated challenge information is correct.

In both of the cases listed here, we verified CAA records, successfully performed a domain validation lookup, and enacted the RPC to store the “valid” authorization result, but those RPCs failed.
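To illustrate why a lost store RPC would look exactly like what the client sees, here is a toy model. It is emphatically not Boulder's code: `process_challenge`, `StoreError`, and both store functions are invented for this sketch, which only assumes the sequence described above (validate first, then persist the result).

```python
class StoreError(Exception):
    """Stands in for a failed storage RPC (hypothetical)."""


def process_challenge(challenge: dict, validate, store) -> None:
    """Validate a challenge, then try to persist the outcome."""
    new_status = "valid" if validate(challenge) else "invalid"
    try:
        store(challenge, new_status)
    except StoreError:
        # The validation result is lost; the stored status stays "pending".
        pass


def failing_store(challenge: dict, status: str) -> None:
    raise StoreError("rpc failed")


def working_store(challenge: dict, status: str) -> None:
    challenge["status"] = status


challenge = {"type": "dns-01", "status": "pending"}
process_challenge(challenge, lambda c: True, failing_store)
status_after_failure = challenge["status"]  # still "pending"
process_challenge(challenge, lambda c: True, working_store)
status_after_retry = challenge["status"]  # now "valid"
print(status_after_failure, status_after_retry)
```

In this model the client gets no error in either case, and re-sending the challenge response simply triggers a second attempt that succeeds, which matches the behaviour reported in this thread.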

However, the underlying cause of the RPC failure is not consistent, and I’m still tracing data and metrics to identify a cause. Thank you for the valuable data you’ve provided! I’ll update again when I have more information.

Another case to look at:

We request a new authz for imap.fryzer87.beep.pl:

2019-04-10 06:40:12,182 - DEBUG - JWS payload:
b'{\n "identifier": {\n "type": "dns",\n "value": "imap.fryzer87.beep.pl"\n },\n "resource": "new-authz"\n}'
2019-04-10 06:40:12,210 - DEBUG - Sending POST request to https://acme-v01.api.letsencrypt.org/acme/new-authz:
{
[...]
"payload": "ewogICJpZGVudGlmaWVyIjogewogICAgInR5cGUiOiAiZG5zIiwKICAgICJ2YWx1ZSI6ICJpbWFwLmZyeXplcjg3LmJlZXAucGwiCiAgfSwKICAicmVzb3VyY2UiOiAibmV3LWF1dGh6Igp9"
}

The Let's Encrypt server answers fine:

2019-04-10 06:40:12,431 - DEBUG - Headers used in request:
[...]
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Type: application/jose+json
Content-Length: 1753
2019-04-10 06:40:12,431 - DEBUG - Received response:
HTTP 201
Server: nginx
Content-Type: application/json
Content-Length: 1003
Boulder-Requester: 1732128
Link: <https://acme-v01.api.letsencrypt.org/acme/new-cert>;rel="next"
Location: https://acme-v01.api.letsencrypt.org/acme/authz/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o
Replay-Nonce: orwJLkONR8WjppZG1wRAJRfFVbbdJgPNkH7We2_ig68
X-Frame-Options: DENY
Strict-Transport-Security: max-age=604800
Expires: Wed, 10 Apr 2019 04:40:12 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Wed, 10 Apr 2019 04:40:12 GMT
Connection: keep-alive

b'{\n "identifier": {\n "type": "dns",\n "value": "imap.fryzer87.beep.pl"\n },\n "status": "pending",\n "expires": "2019-04-17T04:40:12Z",\n "challenges": [\n {\n "type": "dns-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o/14594194671",\n "token": "UxNzaTLJwJKBdlJU3LKm2s4H2k1oGIx_wSbA5TaJK_U"\n },\n {\n "type": "tls-alpn-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o/14594194672",\n "token": "faS_cAqmz-wuWBP7kYY43YMv2PX3wzsnr7xnlZ576tc"\n },\n {\n "type": "http-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o/14594194674",\n "token": "pkLuY38H1c53ZxKKml12AhBf_DUpFnfw7SzRq5MDpn0"\n }\n ],\n "combinations": [\n [\n 1\n ],\n [\n 2\n ],\n [\n 0\n ]\n ]\n}'

We chose dns-01, put the proper records in our DNS zones, and told the Let's Encrypt server about that:

2019-04-10 06:45:06,091 - DEBUG - Sending POST request to https://acme-v01.api.letsencrypt.org/acme/challenge/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o/14594194671:
{
[...]
"payload": "ewogICJyZXNvdXJjZSI6ICJjaGFsbGVuZ2UiLAogICJrZXlBdXRob3JpemF0aW9uIjogIlV4TnphVExKd0pLQmRsSlUzTEttMnM0SDJrMW9HSXhfd1NiQTVUYUpLX1UubmRCTmlrOVFuNGRkc1ZjYThWTGphSEVOY0ZuUkNGYTFSZzMwTl9wM004dyIsCiAgInR5cGUiOiAiZG5zLTAxIgp9"
}

Where letsencrypt accepted that:

2019-04-10 06:45:06,527 - DEBUG - Headers used in request:
User-Agent: Domena.pl ACME client/Questions: it-admin@domena.pl
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Type: application/jose+json
Content-Length: 1825
2019-04-10 06:45:06,527 - DEBUG - Received response:
HTTP 202
Server: nginx
Content-Type: application/json
Content-Length: 336
Boulder-Requester: 1732128
Link: <https://acme-v01.api.letsencrypt.org/acme/authz/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o>;rel="up"
Location: https://acme-v01.api.letsencrypt.org/acme/challenge/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o/14594194671
Replay-Nonce: N7-P7EswEBGrm-tr9TAHjCT5fk0-dhwcCiq_hHP-cpM
Expires: Wed, 10 Apr 2019 04:45:06 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Wed, 10 Apr 2019 04:45:06 GMT
Connection: keep-alive

b'{\n "type": "dns-01",\n "status": "pending",\n "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/b909NEj9J3u08H83_LkwSAMfT6aAWd1CaRT8TAxsI6o/14594194671",\n "token": "UxNzaTLJwJKBdlJU3LKm2s4H2k1oGIx_wSbA5TaJK_U",\n "keyAuthorization": "UxNzaTLJwJKBdlJU3LKm2s4H2k1oGIx_wSbA5TaJK_U.ndBNik9Qn4ddsVca8VLjaHENcFnRCFa1Rg30N_p3M8w"\n}'

And another one:

ftp.tbbis.beep.pl
https://acme-v01.api.letsencrypt.org/acme/challenge/g7qWBo2Vmkr7ntD59APOXuN1YLlbcv0h4EN0XY-CHqk/15440948668

(Not pasting full logs, as the URL above should make it easy to get all the data.)

Isn’t this a major problem that has gone unnoticed by Let's Encrypt users?

If we hit it this often, then either it is a major problem globally for dns-01 users, or there is some special reason why we hit it so often.

Another 3 cases:

psql01.marsb.beep.pl
https://acme-v01.api.letsencrypt.org/acme/challenge/tT16SnlalsjxymeGabuffU4uJsViJDHAXTQGbuWPofA/15625575269

pop3.goset.beep.pl
https://acme-v01.api.letsencrypt.org/acme/challenge/Gc8tGiIdIXAlSjah2yx-kRtxRdBRxQwGUpblP0CKoH4/15694471265

mysql09.peims-dns.beep.pl
https://acme-v01.api.letsencrypt.org/acme/challenge/tkqPiuZteZtWVsO28WUcnQhV4BguXbxzTh9oHYzQ4Ps/15658848973
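For anyone wanting to look up the parent authorization for challenge URLs like these: in the logs earlier in this thread, a challenge at `/acme/challenge/<authz-id>/<challenge-id>` has its authz at `/acme/authz/<authz-id>` (see the `Link: ...;rel="up"` headers). A small sketch that relies on that observed URL layout, which is an assumption about the v1 endpoint rather than a documented guarantee; the function name is made up:

```python
from urllib.parse import urlsplit


def authz_url_from_challenge(challenge_url: str) -> str:
    """Derive the authz URL from a v1 challenge URL (observed layout only)."""
    parts = urlsplit(challenge_url)
    segments = parts.path.strip("/").split("/")
    # Expected path shape: acme/challenge/<authz-id>/<challenge-id>
    if len(segments) != 4 or segments[:2] != ["acme", "challenge"]:
        raise ValueError(f"unexpected challenge URL: {challenge_url}")
    authz_id = segments[2]
    return f"{parts.scheme}://{parts.netloc}/acme/authz/{authz_id}"


print(authz_url_from_challenge(
    "https://acme-v01.api.letsencrypt.org/acme/challenge/"
    "tT16SnlalsjxymeGabuffU4uJsViJDHAXTQGbuWPofA/15625575269"
))
```

A client should still prefer the `rel="up"` Link header when it has the response at hand; string manipulation is only a fallback for URLs pasted into a report.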

Thank you for posting these, I'll have a look at our systems for these occurrences.

Occurrences of this problem are a very small fraction of our overall challenge handling volume. That is why it is very helpful to have this data.

@arek is there an indicative error message you receive upon retry attempts?

If we tell Let's Encrypt to do the DNS validation again, it does its job correctly. We do that manually for the problematic cases.

We never saw any API error in these problematic cases.
