Renewal failed after months of success

My domain is:
server2.orshost.com (subdomain on different server from parent domain)

I ran this command:
virtualmin automatic cert renewal

It produced this output (as an email):

    An error occurred requesting a new certificate for server2.orshost.com from Let's Encrypt : Failed to request certificate : Parsing account key...
    Parsing CSR...
    Registering account...
    Already registered!
    Verifying server2.orshost.com...
    Traceback (most recent call last):
      File "/usr/share/webmin/webmin/acme_tiny.py", line 203, in <module>
        main(sys.argv[1:])
      File "/usr/share/webmin/webmin/acme_tiny.py", line 199, in main
        signed_crt = get_crt(args.account_key, args.csr, args.acme_dir, log=LOGGER, CA=args.ca)
      File "/usr/share/webmin/webmin/acme_tiny.py", line 154, in get_crt
        domain, challenge_status))
    ValueError: server2.orshost.com challenge did not pass: {u'status': u'invalid', u'validationRecord': [{u'url': u'http://server2.orshost.com/.well-known/acme-challenge/SozygQTiQyel_XlN4nNbicU26wXNs-1LKQEpJvl_sYI', u'hostname': u'server2.orshost.com', u'addressUsed': u'', u'port': u'80', u'addressesResolved': []}], u'keyAuthorization': u'SozygQTiQyel_XlN4nNbicU26wXNs-1LKQEpJvl_sYI.SOGQnW2mFnjDAvaYJN68ntbdTsQXRXkehj8cGBIU9E0', u'uri': u'https://acme-v01.api.letsencrypt.org/acme/challenge/c2jFsKnap1BD-VK8fKTgWlHSqQOEmNJPQCVQWf8IQF0/851037874', u'token': u'SozygQTiQyel_XlN4nNbicU26wXNs-1LKQEpJvl_sYI', u'error': {u'status': 400, u'type': u'urn:acme:error:connection', u'detail': u'DNS problem: SERVFAIL looking up A for server2.orshost.com'}, u'type': u'http-01'}
    

My operating system is (include version):
Ubuntu 14.04
My web server is (include version):
Apache 2.4.7

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel):
Virtualmin version 5.04.gpl

Extra notes:
I have been receiving failure emails all day, all of a sudden.
The ‘well-known’ links are always available. http://server2.orshost.com/.well-known/acme-challenge/SozygQTiQyel_XlN4nNbicU26wXNs-1LKQEpJvl_sYI
https://dnschecker.org/#A/server2.orshost.com shows no errors on my side.

The main error seems to be with your (great!) service:
urn:acme:error:connection', u'detail': u'DNS problem: SERVFAIL looking up A for server2.orshost.com
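
For anyone digging through a similar email, here is a minimal Python sketch of pulling the useful fields out of the challenge status that acme_tiny.py prints. The dict is a trimmed copy of the one in the traceback above, and `summarize_challenge` is a hypothetical helper written for illustration, not part of acme_tiny.py or Virtualmin.

```python
# Trimmed copy of the challenge status from the traceback above.
challenge_status = {
    "status": "invalid",
    "type": "http-01",
    "validationRecord": [{
        "hostname": "server2.orshost.com",
        "port": "80",
        "addressUsed": "",
        "addressesResolved": [],
    }],
    "error": {
        "status": 400,
        "type": "urn:acme:error:connection",
        "detail": "DNS problem: SERVFAIL looking up A for server2.orshost.com",
    },
}


def summarize_challenge(status):
    """Hypothetical helper: condense a challenge status dict into one line."""
    error = status.get("error", {})
    records = status.get("validationRecord", [])
    # Collect every address the CA managed to resolve across all validation records.
    resolved = [addr for rec in records for addr in rec.get("addressesResolved", [])]
    return "{0} {1}: {2} ({3}); addresses resolved by CA: {4}".format(
        status.get("type"), status.get("status"),
        error.get("detail"), error.get("type"),
        ", ".join(resolved) or "none")


print(summarize_challenge(challenge_status))
```

The empty `addressesResolved` list is the telling part: the CA never obtained an IP address for the hostname, so the failure happened at the DNS stage, before any HTTP request was made.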

Ideas? Suggestions? I’ve renewed this server’s certificate automatically and successfully for several months (once every 3 months). Today it’s being ornery, for some reason.
VirtualMin is now reporting the cert as being 3.01 months since last renewal.
(I also tried to do a new ‘request certificate’ & it responded with the same general errors: DNS)

Update: Virtualmin is retrying the renewal every 5 minutes, which has filled up my inbox with these errors.

hi @cmroanirgo

I do troubleshooting for a living, and the first question I ask when someone says "it used to work but now it doesn’t" is: what has changed?

I can confirm that I can access your well-known file and can resolve your server name.

There are two potential root causes I can think of: something funny with DNS, or something wrong with the client.

Do you have any further logs you can provide apart from the email?

I have had a look at the Virtualmin forums and suspect the client may be misbehaving.

I can see a new version has been released (https://www.virtualmin.com/), so I’m not sure whether the two are related.

@ahaw021, the error indicated there is probably the problem. If you check with

you can see that there is an authority delegation (SOA) missing. This is a serious DNS problem even though it doesn’t prevent software from doing the lookup. The Let’s Encrypt CA will require this to be correct before issuing a certificate.

@cmroanirgo, you apparently need to get the DNS provider to fix the SOA records. I think that will clear up this problem quickly.
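
One way to see what an outside resolver sees is to ask the domain's authoritative nameserver directly and look at the response code. The following is a rough standard-library Python sketch, not a replacement for a proper DNS tool. The helper names are made up for illustration, and the nameserver IP in the usage note is a placeholder you would replace with your own.

```python
import socket
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (plain query, no recursion), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        struct.pack("!B", len(label)) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def parse_rcode(response):
    """Return the response code name from the low 4 bits of the flags field."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE %d" % code)


def query_nameserver(server_ip, name, timeout=3.0):
    """Send the query over UDP to one nameserver and report the outcome."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(512)
        return parse_rcode(response)
    except socket.timeout:
        return "TIMEOUT"
    finally:
        sock.close()
```

For example, `query_nameserver("192.0.2.53", "server2.orshost.com")`, where `192.0.2.53` stands in for the authoritative nameserver's address. It is worth running this from a machine outside your own network: a `TIMEOUT` from outside while local lookups work suggests the queries are being filtered, and a recursive resolver that cannot reach any authoritative server will typically report that as SERVFAIL.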

Hi All,

I had a quick look at the Virtualmin file /usr/share/webmin/webmin/acme_tiny.py to see how and where it was logging. I soon learnt that nothing was getting out to SYSLOG, so no more information there. :sigh:

Regarding SOA errors for a subdomain: that’s to be expected when doing a DNS check on a subdomain. That said, I went and made sure that the parent domain (orshost.com) was responding properly and without warnings.

Nevertheless, the problem was found: it was firewall related.

I have CSF installed and for some reason it was blocking it. I have no idea what IP Let’s Encrypt was using to check, but it wasn’t one of the IPs behind v01.api.letsencrypt.org or api.letsencrypt.org.
Ironically, this was one thing I checked before coming to this community with the problem.
My solution was to temporarily remove all IP blocks (temporary & permanent) on the DNS server (not a big deal for me).

It would be nice if Let’s Encrypt published a set of IPs that should be whitelisted; then this problem wouldn’t have escalated this far. (I searched for one but couldn’t find anything.)

Thanks everyone for your time.

this is an age old discussion

cloud services don’t usually publish IPs as they can change

application-aware firewalls and layer 7 firewalls help with this

it’s also interesting that your firewall blocked letsencrypt

do you have an idea as to what behavior caused this?

Unfortunately I have little forensic evidence to work with. With rejection emails coming in thick and fast, and the FQDNs of Let’s Encrypt not directly being the reason for the problem, it’s a bit of a needle-in-a-haystack problem.

If the error I got back from Let’s Encrypt included an IP (or reverse PTR) that was useful, I could track causes at my end. All I could test was ‘acme-v01.api.letsencrypt.org’, & it wasn’t blocked.

Hi @cmroanirgo,

Not facilitating the whitelisting of validation addresses is intentional:

I'll make a note to mention this topic in our documentation.

depending on the firewall a pre-hook may be the best way of doing this

i.e. do a DNS lookup of the letsencrypt API and whitelist it in the firewall
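
A rough Python sketch of that pre-hook idea, under some loud assumptions: the function names are invented for illustration, and the `csf -a <ip> <comment>` form is assumed to match your CSF version. The commands are only built, not executed. As reported earlier in this thread, the validation requests did not come from the addresses behind the API hostname, so this may not be enough on its own.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = resolver(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def allow_commands(hostname, resolver=socket.getaddrinfo):
    """Build (but do not run) one 'csf -a' command per resolved address."""
    comment = "pre-hook allow for " + hostname
    # Assumption: 'csf -a <ip> <comment>' adds an allow entry; check your CSF docs.
    return [["csf", "-a", ip, comment] for ip in resolve_ipv4(hostname, resolver)]
```

Calling `allow_commands("acme-v01.api.letsencrypt.org")` returns a list of argument lists that a hook script could review or pass to `subprocess` before the renewal runs. The `resolver` parameter exists only so the lookup can be swapped out when trying the code without network access.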

@schoen @jsha - some of the customers we work with have web services but don’t expose them to the entire internet, to reduce the attack surface (i.e. only trusted parties get access), so I can see why some customers may want to know the IPs of the letsencrypt service

a lot of these customers do use smart firewalls from Palo Alto, CheckPoint and F5, which can recognize that traffic comes from a particular host and is of a trusted type

I agree with the points made about frequently changing IPs. It’s a very common practice for software-as-a-service companies.

Yep, I acknowledge that, but we will never commit to a set IP range for validation. Our intent is actually to change IPs frequently in order to make it more difficult to MITM the validation. Our main failure here is that we haven't actually followed through on changing them frequently enough to break sites that do try to firewall everything but our validation IP. The options are:

  • Use the DNS challenge, or
  • Use the TLS-SNI or HTTP challenge, with port 443 or port 80 open to the world.
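
For the DNS challenge option, the value published in the TXT record is derived from the key authorization rather than served over HTTP. Below is a minimal Python sketch of that derivation, assuming the usual ACME scheme (SHA-256 of the key authorization, base64url-encoded without padding, published at `_acme-challenge.<domain>`). `dns01_txt_record` is a hypothetical helper name, and the key authorization is the one from the failed http-01 attempt earlier in the thread, reused purely as sample input; a real DNS challenge would come with its own token.

```python
import base64
import hashlib


def dns01_txt_record(domain, key_authorization):
    """Return the (record name, TXT value) pair for a DNS challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url encoding with the trailing '=' padding stripped.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "_acme-challenge." + domain.rstrip("."), value


# Sample input only: key authorization copied from the http-01 failure above.
name, value = dns01_txt_record(
    "server2.orshost.com",
    "SozygQTiQyel_XlN4nNbicU26wXNs-1LKQEpJvl_sYI."
    "SOGQnW2mFnjDAvaYJN68ntbdTsQXRXkehj8cGBIU9E0",
)
print(name, "TXT", value)
```

Because validation then happens entirely through DNS, ports 80 and 443 can stay restricted, though the authoritative nameservers still have to be reachable from wherever the CA queries them.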

Although I do understand the need to have moving IPs, as you mention, to reduce MITM, I would be ‘wary’ of new IPs that you acquire: they might have been recently used for some webbot/spam trickery and be blocked by various servers already. I presume you are careful that new IPs don’t appear on RBLs, but even then there could be issues. Damned if you do… :wink:

Nevertheless, it would be great if the IP was reported back in the error message (if appropriate to the error). It would have alleviated issues, at least in my case.

That’s a good point about knowing where our IPs have been. I’ll talk it over with the team!
