Server 500 but certificate still issued

Yep, I’m hitting the limit too.

Haven’t tried yet, will save my tries for the next test)

Hi @nit, all,

Sorry for the troubles. We rolled out some production changes that caused an increase in latency, which in turn resulted in an increase in 500s due to timeouts. We’ve rolled back those changes for now.

The relevant certificates are stored in our database and logs. In the case of crans.org, I’m investigating why they weren’t originally logged to CT. For now I’ve logged them: https://crt.sh/?q=hostnames-a-m.crans.org&iCAID=7395. @nit, if you have your original keys you should be able to download one of those certs via the “certificate” link on its page, and use that.

So for now everything should work properly?

It doesn’t work for me. I wanted to expand my SANs, so I ran certonly with all SANs. It seems that when I choose to expand and replace the old cert, a new privkey is used. When I download the cert from crt.sh and try to bind it in Apache, there is a startup error because the privkey doesn’t match the cert. :c
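
A quick way to confirm this kind of mismatch before restarting Apache is to compare the public key embedded in the certificate with the public key derived from the private key file. The sketch below is only an illustration: it assumes the `openssl` command-line tool is installed and on the PATH, that both files are PEM-encoded, and the function names are made up for this example.

```python
import subprocess


def _openssl_public_key(args):
    """Run an openssl subcommand and return the PEM public key it prints."""
    result = subprocess.run(
        ["openssl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def key_matches_cert(cert_path, key_path):
    """Return True if the certificate and the private key share a public key."""
    # Public key as recorded inside the certificate.
    cert_pub = _openssl_public_key(["x509", "-in", cert_path, "-noout", "-pubkey"])
    # Public key recomputed from the private key file.
    key_pub = _openssl_public_key(["pkey", "-in", key_path, "-pubout"])
    return cert_pub == key_pub
```

Calling `key_matches_cert("downloaded-cert.pem", "privkey.pem")` (both paths are placeholders) returning `False` would mean the downloaded certificate was issued for a different key than the one on disk, which is what a newly generated privkey would cause.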

  1. If the bug has truly been fixed, would it be possible for you to temporarily increase the request limit so that those who ran into the bug without the original keys could retry without waiting for up to 7 days which could potentially be past the certificate’s expiration date?

  2. Why has this bug not existed on the staging server? We only run commands on the live server after verifying that they work on the staging server. Does this mean some development work is being pushed to production without also being pushed to the staging server?

Thanks, I was able to retrieve a valid (key, cert) pair.
I’ll just have to wait 7 days before issuing hostnames-n-z.crans.org with the SANs for the other half of our domains, hopefully without any issue.

7 posts were split to a new topic: Retrieving issued certificates from CT

Update on this topic: We found the issue that caused CT logging not to be retried right away for some certs and are working on a fix. We also have a committed fix for what we think was the core issue causing 500s. This should go out with Thursday's release, at which time we'll try again.

We always push code to staging before production. Unfortunately, staging will never be exactly like prod - it has different data in its DB, different query patterns, and different latencies between components. We always try to make staging as much like prod as possible, but this was a case where the differences meant this bug only showed up in prod.

I tried generating the certificate using crans.fr, which is a DNAME of crans.org, and it works. So this issue is solved for me.

Thanks for the quick fix!

Just wanted to let you guys know: getting cert is working again, with 10 more names added (38 in total now).

Hi all,

I’m still experiencing similar problems. When issuing a whole bunch of certificates, it fails on a single domain with the internal server error but does issue the .pem file. When I run certbot --apache --debug on that single domain, this is the last part of the log. I’m a newbie and can’t seem to find the problem. I took out all the cryptographic output and replaced it with redacted in the post below. Any help would be welcome!

2017-09-11 17:18:22,223:INFO:certbot.auth_handler:Cleaning up challenges
2017-09-11 17:18:23,739:INFO:certbot.crypto_util:Generating key (2048 bits): /etc/letsencrypt/keys/0025_key-certbot.pem
2017-09-11 17:18:23,752:INFO:certbot.crypto_util:Creating CSR: /etc/letsencrypt/csr/0025_csr-certbot.pem
2017-09-11 17:18:23,752:DEBUG:certbot.client:CSR: CSR(file='/etc/letsencrypt/csr/0025_csr-certbot.pem', data='**redacted**'), domains: ['stichtingleenaertboon.nl', 'www.stichtingleenaertboon.nl']
2017-09-11 17:18:23,753:DEBUG:acme.client:Requesting issuance...
2017-09-11 17:18:23,754:DEBUG:acme.client:JWS payload:
{
  "resource": "new-cert", 
  "csr": "**redacted**"
}
2017-09-11 17:18:23,766:DEBUG:root:Sending POST request to https://acme-v01.api.letsencrypt.org/acme/new-cert:
{
  "header": {
    "alg": "RS256", 
    "jwk": {
      "e": "AQAB", 
      "kty": "RSA", 
      "n": "**redacted**"
    }
  }, 
  "protected": "**redacted**", 
  "payload": "**redacted**", 
  "signature": "**redacted**"
}
2017-09-11 17:18:26,531:DEBUG:requests.packages.urllib3.connectionpool:"POST /acme/new-cert HTTP/1.1" 500 101
2017-09-11 17:18:26,537:DEBUG:acme.client:Received response:
HTTP 500
Server: nginx
Content-Type: application/problem+json
Content-Length: 101
Boulder-Request-Id: ZzOCGMgKwzUBAwXRlRdxtNUosYnd1JMaLTW5ocidfo4
Boulder-Requester: 8125549
Replay-Nonce: tEkh_ku5vrNb4UFMDLAoWRuDU85cHB6puvsti_NGaEY
Expires: Mon, 11 Sep 2017 17:18:26 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Mon, 11 Sep 2017 17:18:26 GMT
Connection: close

{
  "type": "urn:acme:error:serverInternal",
  "detail": "Error creating new cert",
  "status": 500
}
2017-09-11 17:18:26,539:DEBUG:acme.client:Storing nonce: tEkh_ku5vrNb4UFMDLAoWRuDU85cHB6puvsti_NGaEY
2017-09-11 17:18:26,541:DEBUG:certbot.main:Exiting abnormally:
Traceback (most recent call last):
  File "/usr/bin/certbot", line 11, in <module>
    load_entry_point('certbot==0.10.2', 'console_scripts', 'certbot')()
  File "/usr/lib/python2.7/dist-packages/certbot/main.py", line 849, in main
    return config.func(config, plugins)
  File "/usr/lib/python2.7/dist-packages/certbot/main.py", line 575, in run
    action, lineage = _auth_from_available(le_client, config, domains, certname)
  File "/usr/lib/python2.7/dist-packages/certbot/main.py", line 107, in _auth_from_available
    lineage = le_client.obtain_and_enroll_certificate(domains, certname)
  File "/usr/lib/python2.7/dist-packages/certbot/client.py", line 291, in obtain_and_enroll_certificate
    certr, chain, key, _ = self.obtain_certificate(domains)
  File "/usr/lib/python2.7/dist-packages/certbot/client.py", line 272, in obtain_certificate
    return (self.obtain_certificate_from_csr(domains, csr, authzr=authzr)
  File "/usr/lib/python2.7/dist-packages/certbot/client.py", line 243, in obtain_certificate_from_csr
    authzr)
  File "/usr/lib/python2.7/dist-packages/acme/client.py", line 318, in request_issuance
    headers={'Accept': content_type})
  File "/usr/lib/python2.7/dist-packages/acme/client.py", line 671, in post
    return self._post_once(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/acme/client.py", line 684, in _post_once
    return self._check_response(response, content_type=content_type)
  File "/usr/lib/python2.7/dist-packages/acme/client.py", line 570, in _check_response
    raise messages.Error.from_json(jobj)
Error: urn:acme:error:serverInternal :: The server experienced an internal error :: Error creating new cert
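
For readers scanning logs like this one, the useful part is the JSON problem document in the response body. The sketch below parses that body and flags whether the error type points at the server side; the function name and the returned dictionary layout are invented for illustration and are not part of Certbot or the acme library.

```python
import json


def parse_acme_problem(body):
    """Extract the main fields of an ACME problem document (JSON text)."""
    doc = json.loads(body)
    problem_type = doc.get("type", "")
    return {
        "type": problem_type,
        "detail": doc.get("detail", ""),
        "status": doc.get("status"),
        # serverInternal means the CA failed, not that the request was malformed.
        "server_side": problem_type.endswith(":serverInternal"),
    }


# Body copied from the log above.
body = """{
  "type": "urn:acme:error:serverInternal",
  "detail": "Error creating new cert",
  "status": 500
}"""

problem = parse_acme_problem(body)
print(problem["server_side"], problem["detail"])
```

A `server_side` result of `True` suggests there is little to fix on the client, and that retrying in a tight loop mostly burns rate limit.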

@cpu, could you look at this new internal error?

@joeyboon, redacting this stuff is a very good intuition, but as it happens what you redacted is basically representations of your public key, which will be visible to everyone who connects to your server once the certificate is issued. An RSA private key has additional parameters p, q, and d, which do need to be kept totally secret, but which are never sent to the certificate authority. In Certbot, these parameters will all be stored in a PEM file called privkey.pem whose contents also should not be posted or shared anywhere.
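
To make the public/private split concrete, here is a toy sketch with textbook-sized primes (far too small for real use; the numbers are chosen only for illustration). It also decodes the `"e": "AQAB"` value that appears unredacted in the log, using a helper name invented for this example. It needs Python 3.8 or newer for the modular inverse.

```python
import base64


def decode_jwk_int(value):
    """Decode a base64url-encoded JWK integer (such as "e") into an int."""
    padded = value + "=" * (-len(value) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


# Toy primes: these are the secret parameters p and q.
p, q = 61, 53
e = 17                       # public exponent
n = p * q                    # public modulus, safe to publish
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)          # private exponent, must stay secret

public_key = (n, e)          # what the CA and every visitor can see
private_parts = (p, q, d)    # what lives only in privkey.pem

message = 65
ciphertext = pow(message, e, n)    # anyone can do this with the public key
recovered = pow(ciphertext, d, n)  # only the holder of d can undo it

print(public_key, recovered == message, decode_jwk_int("AQAB"))
```

The `"n"` and `"e"` fields of the JWK in the log correspond to `public_key` here, which is why publishing them does no harm.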

Sure thing: This was a CAA "recheck" failure:

Rechecking CAA: DNS problem: SERVFAIL looking up CAA for www.stichtingleenaertboon.nl

We only recently started checking CAA at issuance time when the original CAA lookup for the authorization is no longer considered usable by the baseline requirements. There was a bug with the recheck implementation that turned the failures into Internal Server Errors instead of a better error representation. We've fixed that bug but the fix hasn't made its way to production yet.
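
The recheck decision described here amounts to a freshness test on the original CAA lookup. The sketch below is a rough illustration only: the 8-hour window is an assumption based on a reading of the baseline requirements and is not stated in this thread, and the function and constant names are hypothetical rather than taken from Boulder.

```python
from datetime import datetime, timedelta

# Assumed freshness window; the CA's real threshold may differ.
CAA_MAX_AGE = timedelta(hours=8)


def needs_caa_recheck(caa_checked_at, now, max_age=CAA_MAX_AGE):
    """Return True if the earlier CAA lookup is too old to rely on."""
    return now - caa_checked_at > max_age


validated = datetime(2017, 9, 10, 9, 0, 0)   # when the authorization was created
issuing = datetime(2017, 9, 11, 17, 18, 23)  # when the certificate was requested
print(needs_caa_recheck(validated, issuing))
```

An authorization that was validated a day earlier, as in this made-up example, would trigger a fresh CAA lookup at issuance time, and a DNS failure during that lookup is what surfaced as the 500.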

@schoen Thanks! I did not know whether any of this info was sensitive, so I just redacted everything.

@cpu Thanks for the help. If I understand correctly, the bug only changed how the error was reported and had nothing to do with the CAA lookup failure itself. Do you have any pointers on that failure? The hosting provider for that domain does not yet support CAA records; could that be the problem? Or do I need to look elsewhere? Any help would be much appreciated.

Hi @joeyboon - that's correct.

Sure! We have a documentation page on CAA that is probably the best resource. In your case your DNS provider is falling into the "SERVFAIL" bucket described on that docs page. They don't need to support adding CAA records but they do need to respond the correct way when Let's Encrypt asks if you have a CAA record. You could contact your DNS provider to ask about why they return the incorrect SERVFAIL status or you could switch to an alternative DNS provider.
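
The distinction between "no CAA record" and "broken answer" can be sketched as a small classifier. This is a simplified model under stated assumptions, not Let's Encrypt's implementation: it looks at a single name, ignores `issuewild` and tree climbing, and the function name and return strings are hypothetical.

```python
def caa_outcome(rcode, issue_values, ca_domain="letsencrypt.org"):
    """Classify one CAA lookup result for a single name.

    rcode        -- DNS response code as text, e.g. "NOERROR" or "SERVFAIL"
    issue_values -- CA domains named in the `issue` records that came back
    """
    if rcode not in ("NOERROR", "NXDOMAIN"):
        # SERVFAIL and similar: the CA cannot tell whether a restriction exists.
        return "error"
    if not issue_values:
        # A clean "nothing here" answer places no restriction at this name.
        return "allowed"
    return "allowed" if ca_domain in issue_values else "forbidden"


print(caa_outcome("SERVFAIL", []))                  # error
print(caa_outcome("NOERROR", []))                   # allowed
print(caa_outcome("NOERROR", ["letsencrypt.org"]))  # allowed
```

The first case is the one hit in this thread: the provider does not need to host CAA records, but it has to answer the question cleanly.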

Hope that helps!

I solved it! The problem was at my domain registrar: they had to reset DNSSEC for the domain. Now everything works like a charm! Thanks again for helping out a newbie :wink:

Looks like recent changes broke this again. The libreswan.* domains, which are DNAMEs to libreswan.org, all broke :frowning:

We changed all libreswan.* domains to not use DNAME for now, and left libreswan.net broken so it can be used to diagnose and test. We do want to go back to using DNAMEs for all domains though…

Hi @letoams,

While working through the legacy CAA implementation that we were required to deploy to meet the baseline requirements, we made a choice not to support DNAMEs.

I believe this will work when we return to an erratum 5065 CAA tree climbing algorithm. We're petitioning various parties/root programs to try and get back to this state ASAP.
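
For context, tree climbing means checking the requested name first and then each parent in turn, stopping at the first name that has CAA records. The sketch below shows only that climbing order. It assumes, as one reading of erratum 5065, that any CNAME or DNAME handling happens inside the lookup itself and that the parents of an alias target are not climbed; the `lookup` callback and both function names are hypothetical.

```python
def caa_search_path(domain):
    """List the names to check, from the full name up toward the root."""
    labels = domain.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def find_relevant_caa(domain, lookup):
    """Return (name, records) for the closest name that has CAA records.

    `lookup` is a caller-supplied function mapping a name to a list of
    CAA records (an empty list when there are none).
    """
    for name in caa_search_path(domain):
        records = lookup(name)
        if records:
            return name, records
    return None, []


# Hypothetical zone data standing in for real DNS answers.
fake_zone = {"libreswan.org": ['0 issue "letsencrypt.org"']}
print(caa_search_path("www.libreswan.org"))
print(find_relevant_caa("www.libreswan.org", lambda n: fake_zone.get(n, [])))
```

How aliases are followed at each step is exactly where the legacy and erratum algorithms differ, so this sketch deliberately leaves that to the lookup function.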