DNS problem: SERVFAIL looking up A, but only with --dry-run

Hello.

My domain is:
javagala.ru
My web server is (include version):
nginx version: nginx/1.12.1

The operating system my web server runs on is (include version):
CentOS Linux release 7.3.1611 (Core)

I can login to a root shell on my machine (yes or no, or I don't know):
yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
no

We have run into a very strange issue during a regular certificate renewal. We wanted to switch the renewal workflow from the nginx plugin to webroot, so we edited /etc/letsencrypt/renewal/javagala.ru.conf accordingly. Before anything else, we tried a dry run:
certbot renew --dry-run

It produced the following error:

IMPORTANT NOTES:
- The following errors were reported by the server:

   Domain: javagala.ru
   Type:   None
   Detail: DNS problem: SERVFAIL looking up A for javagala.ru

After several minutes of googling, we found that the typical causes of this kind of error are a DNSSEC problem (for example https://community.letsencrypt.org/t/getting-dns-problem-servfail-looking-up/41956) or a DNS resolution problem.

Our system administrator assures me that we don't use DNSSEC; see http://dnsviz.net/d/javagala.ru/dnssec/.

As for DNS resolution, we checked that our site is reachable from various locations around the world using http://ping-admin.ru/free_ping.

The certbot log shows that Let's Encrypt resolved our site's IP address correctly:

  "validationRecord": [
    {
      "url": "https://javagala.ru/.well-known/acme-challenge/XbaKSLg0e1OIefNWR0yWU2dqOGRqSQapii8fQ6GFIjY",
      "hostname": "javagala.ru",
      "port": "443",
      "addressesResolved": [
        "95.172.133.90"
      ],
      "addressUsed": "95.172.133.90"
    },
    {
      "url": "http://javagala.ru/.well-known/acme-challenge/XbaKSLg0e1OIefNWR0yWU2dqOGRqSQapii8fQ6GFIjY",
      "hostname": "javagala.ru",
      "port": "80",
      "addressesResolved": [
        "95.172.133.90"
      ],
      "addressUsed": "95.172.133.90"
    }
  ]
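To check this mechanically rather than by eye, here is a minimal Python sketch that loads the validationRecord entries and confirms each one both resolved to and connected to the expected address. The excerpt is an abridged copy of the log above (URLs dropped), and the helper name summarize_validation is hypothetical, not part of certbot.

```python
import json

# Abridged copy of the "validationRecord" array from the certbot log above,
# keeping only the fields this check needs.
LOG_EXCERPT = """
[
  {"hostname": "javagala.ru", "port": "443",
   "addressesResolved": ["95.172.133.90"], "addressUsed": "95.172.133.90"},
  {"hostname": "javagala.ru", "port": "80",
   "addressesResolved": ["95.172.133.90"], "addressUsed": "95.172.133.90"}
]
"""


def summarize_validation(records, expected_ip):
    """Return (port, ok) pairs; ok is True when the validator both resolved
    and actually connected to expected_ip for that record."""
    results = []
    for rec in records:
        ok = (expected_ip in rec.get("addressesResolved", [])
              and rec.get("addressUsed") == expected_ip)
        results.append((rec["port"], ok))
    return results


records = json.loads(LOG_EXCERPT)
for port, ok in summarize_validation(records, "95.172.133.90"):
    print(f"port {port}: {'OK' if ok else 'MISMATCH'}")
```

Both records pass here, which matches the observation that resolution looked fine once the HTTP validation actually ran.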

Even more interesting, the nginx access log shows the validation request and a correct response:

66.133.109.36 - - [04/May/2018:14:32:17 +0700] "GET /.well-known/acme-challenge/XbaKSLg0e1OIefNWR0yWU2dqOGRqSQapii8fQ6GFIjY HTTP/1.1" 200 87 "http://javagala.ru/.well-known/acme-challenge/XbaKSLg0e1OIefNWR0yWU2dqOGRqSQapii8fQ6GFIjY" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
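A minimal sketch for picking such lines out of the access log with Python. It assumes the default CentOS nginx "main" log_format (remote address, remote user, time, request, status, bytes, referer, user agent, X-Forwarded-For); adjust the regex if your log_format differs. The helper name is_successful_challenge is hypothetical.

```python
import re

# Assumed layout: nginx "main" log_format as shipped in the CentOS package.
LOG_RE = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<xff>[^"]*)"'
)

# The log line quoted above, split across string literals for readability.
SAMPLE = (
    '66.133.109.36 - - [04/May/2018:14:32:17 +0700] '
    '"GET /.well-known/acme-challenge/XbaKSLg0e1OIefNWR0yWU2dqOGRqSQapii8fQ6GFIjY HTTP/1.1" 200 87 '
    '"http://javagala.ru/.well-known/acme-challenge/XbaKSLg0e1OIefNWR0yWU2dqOGRqSQapii8fQ6GFIjY" '
    '"Mozilla/5.0 (compatible; Let\'s Encrypt validation server; +https://www.letsencrypt.org)" "-"'
)


def is_successful_challenge(line):
    """True if the line is an ACME HTTP-01 challenge fetch by the
    Let's Encrypt validator that nginx answered with HTTP 200."""
    m = LOG_RE.match(line)
    if m is None:
        return False
    return (m["path"].startswith("/.well-known/acme-challenge/")
            and m["status"] == "200"
            and "Let's Encrypt validation server" in m["agent"])


print(is_successful_challenge(SAMPLE))
```

Running it over the whole access log (for example, line by line from /var/log/nginx/access.log, if that is where your log lives) lists every challenge the validator fetched successfully.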

A few minutes later we tried a regular run:
certbot renew

No errors appeared, and we received our certificate.

Could someone clarify what is going on? Should we worry about future renewals?

I can’t find any reason for the DNS failure.
And I would not worry too much about the missed attempt.
Typically, certbot is set to run twice a day and starts attempting renewal 30 days before expiration.
That’s 60 attempts before the certificate expires.
Your DNS would have to be pretty bad for it to miss 60 times.
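The arithmetic behind that, as a small Python sketch. The 50% failure rate is a made-up illustration value, and the calculation assumes every attempt fails independently with the same probability, which real DNS outages (long-lasting blocks, for example) often do not.

```python
def renewal_attempts(runs_per_day: int, renew_window_days: int) -> int:
    """Scheduled renewal attempts between the start of the renewal window and expiry."""
    return runs_per_day * renew_window_days


def prob_all_fail(p_fail: float, attempts: int) -> float:
    """Chance that every attempt fails, ASSUMING independent attempts
    that each fail with probability p_fail (illustrative only)."""
    return p_fail ** attempts


n = renewal_attempts(2, 30)
print(n)  # 60
# Hypothetical: even if half of all attempts hit a DNS failure...
print(prob_all_fail(0.5, n))
```

If the failures are correlated instead, the independence estimate is far too optimistic, which is why a persistent problem is still worth tracking down.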

Is the Russian government still blocking millions of Amazon IP addresses?

The Let’s Encrypt production environment runs entirely on Let’s Encrypt’s servers, but --dry-run uses the staging environment, which currently uses a combination of Let’s Encrypt’s servers and AWS.

I suspect some of the staging AWS DNS resolvers are having trouble reaching your domain’s authoritative nameservers.

I can’t reach either of them from an EC2 instance.

$ mtr -brwz ns1.javagala.ru
Start: Fri May  4 12:11:48 2018
HOST: rush                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???   ???                                           100.0    10    0.0   0.0   0.0   0.0   0.0
  2. AS???   ???                                           100.0    10    0.0   0.0   0.0   0.0   0.0
  3. AS???   ???                                           100.0    10    0.0   0.0   0.0   0.0   0.0
  4. AS???   ???                                           100.0    10    0.0   0.0   0.0   0.0   0.0
  5. AS???   ???                                           100.0    10    0.0   0.0   0.0   0.0   0.0
  6. AS???   100.65.10.161                                  0.0%    10    0.6   1.0   0.4   2.9   0.6
  7. AS16509 52.95.1.41                                     0.0%    10    0.6   1.3   0.6   6.1   1.6
  8. AS16509 52.95.2.26                                     0.0%    10    9.7   6.7   1.3   9.7   2.7
  9. AS16509 52.95.2.33                                     0.0%    10    0.6   2.6   0.6  20.5   6.2
 10. AS???   100.91.39.116                                  0.0%    10   11.0  16.1  11.0  26.3   4.8
 11. AS???   54.239.45.239                                  0.0%    10   11.6  12.6  11.5  16.6   1.7
 12. AS???   100.91.0.23                                    0.0%    10   10.8  11.3  10.8  12.7   0.5
 13. AS16509 54.239.108.146                                 0.0%    10   14.9  30.1  14.9  57.6  12.4
 14. AS16509 54.239.111.239                                 0.0%    10   10.9  11.6  10.9  15.5   1.3
 15. AS???   cat01.frankfurt.beeline.ru (206.126.236.130)   0.0%    10  102.0  98.9  98.0 102.0   1.5
 16. AS???   pe02.Krasnoyarsk.gldn.net (79.104.247.43)      0.0%    10  200.3 199.3 199.0 200.3   0.3
 17. AS???   ???                                           100.0    10    0.0   0.0   0.0   0.0   0.0

(EC2’s network architecture always makes mtr a bit shaky, though.)
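One way to test this from a given network without dig is to send a single non-recursive A query straight to the authoritative server and see whether anything comes back. The Python sketch below builds the packet by hand following RFC 1035; the function names, query ID, and 3-second timeout are my own choices, and ns1.javagala.ru is simply the server from the mtr run above.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234, qtype: int = 1) -> bytes:
    """Minimal DNS query for `name` (QTYPE 1 = A, QCLASS 1 = IN).
    Flags 0x0000 leave RD unset, since we ask the authoritative server directly."""
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(resp: bytes):
    """Return (id, rcode, answer_count) from a DNS response header."""
    qid, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", resp[:12])
    return qid, flags & 0x000F, an


def probe(server: str, name: str, timeout: float = 3.0, qid: int = 0x1234):
    """Send one UDP query to server:53.
    Returns (rcode, answer_count), or None if no valid reply arrived."""
    query = build_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(query, (server, 53))
            resp, _addr = s.recvfrom(512)
        except OSError:  # timeout, unreachable, or server name not resolvable
            return None
    if len(resp) < 12:
        return None
    resp_id, rcode, answers = parse_header(resp)
    if resp_id != qid:
        return None  # not a reply to our query
    return rcode, answers
```

For example, probe("ns1.javagala.ru", "javagala.ru") returning None on an affected network would be consistent with the unreachable path in the mtr output, while a result with rcode 0 and a non-zero answer count means that server answered normally from there.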


Thanks for the comments and quick responses.

Unfortunately yes.

That makes sense. Thanks for the clarification.
