LE with Terraform and Route53 domains having public and private zones


#1

My domain is: *.app-kamehameha.anaplan.io

I ran this command:

terraform apply

It produced this output:

* acme_certificate.acme: error creating certificate: acme: Error -> One or more domains had a problem:
[app-kamehameha.anaplan.io] Time limit exceeded. Last error: NS ns-1536.awsdns-00.co.uk. returned REFUSED for _acme-challenge.app-kamehameha.anaplan.io.

My web server is (include version): N/A

The operating system my web server runs on is (include version): N/A

My hosting provider, if applicable, is: N/A

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): no

I’m using the following:
Terraform v0.11.10
which uses terraform-provider-acme
terraform-provider-acme is built on xenolf/lego

Issues raised:


My DNS provider is Route53, and I have both public and private zones. I had assumed that when I provide a fully qualified name, the DNS source to query would be determined on the issuer’s side, but it appears that something additional is being supplied, or needs to be supplied. When I look up the NS records publicly, I get the following list:

; <<>> DiG 9.10.6 <<>> NS anaplan.io @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51686
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;anaplan.io.			IN	NS

;; ANSWER SECTION:
anaplan.io.		21599	IN	NS	ns-1344.awsdns-40.org.
anaplan.io.		21599	IN	NS	ns-1818.awsdns-35.co.uk.
anaplan.io.		21599	IN	NS	ns-349.awsdns-43.com.
anaplan.io.		21599	IN	NS	ns-710.awsdns-24.net.

From the error above, somehow the internal NS entries are what’s being queried:

; <<>> DiG 9.10.6 <<>> NS anaplan.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31817
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;anaplan.io.			IN	NS

;; ANSWER SECTION:
anaplan.io.		36	IN	NS	ns-1024.awsdns-00.org.
anaplan.io.		36	IN	NS	ns-1536.awsdns-00.co.uk.
anaplan.io.		36	IN	NS	ns-0.awsdns-00.com.
anaplan.io.		36	IN	NS	ns-512.awsdns-00.net.

From the error above you can see that it’s querying the internal DNS server ns-1536.awsdns-00.co.uk. I’m not sure how it finds this server, or whether the client is querying without specifying a valid NS to look up against, but some help debugging this would be greatly appreciated. The failure seemed random at first: it used to work, but now it fails consistently.
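
To make the comparison between the two dig answers concrete, here is a small Python sketch. It only checks set membership on the NS names copied from the outputs above and performs no DNS queries. The function name `classify_nameserver` and the labels it returns are hypothetical, chosen for illustration.

```python
# NS names copied verbatim from the two dig outputs above.
PUBLIC_NS = {
    "ns-1344.awsdns-40.org.",
    "ns-1818.awsdns-35.co.uk.",
    "ns-349.awsdns-43.com.",
    "ns-710.awsdns-24.net.",
}
LOCAL_NS = {
    "ns-1024.awsdns-00.org.",
    "ns-1536.awsdns-00.co.uk.",
    "ns-0.awsdns-00.com.",
    "ns-512.awsdns-00.net.",
}


def classify_nameserver(name):
    """Report which dig answer (if any) lists the given nameserver."""
    name = name.lower()
    if not name.endswith("."):
        name += "."  # normalise to the fully qualified form dig prints
    in_public = name in PUBLIC_NS
    in_local = name in LOCAL_NS
    if in_public and in_local:
        return "both"
    if in_public:
        return "public (8.8.8.8)"
    if in_local:
        return "local resolver"
    return "neither"


# The nameserver named in the error message.
failing = "ns-1536.awsdns-00.co.uk."
print(failing, "->", classify_nameserver(failing))
```

The server from the error message appears only in the answer from my local resolver, not in the answer from 8.8.8.8, which is why I suspect the lookup is happening on the client side.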

The TF code that performs the certificate issuance is here:

variable "acme" {
  type        = "map"
  description = "ACME provider settings"

  default = {
    bool  = true
    email = "valid@email.tld"
    prod  = "https://acme-v02.api.letsencrypt.org/directory"
    stage = "https://acme-staging-v02.api.letsencrypt.org/directory"
  }
}

variable "domain" {
  type        = "string"
  description = "Domain name"
}

variable "environment" {
  type        = "string"
  description = "Environment name"
}

provider "aws" {
  version = "~> 1.27"

  profile = "default"
  region  = "us-east-1"
}

data "aws_route53_zone" "route53_public" {
  name         = "${var.domain}."
  private_zone = false
}

provider "acme" {
  version    = "~> 1.0"
  server_url = "${var.acme["bool"] ? var.acme["prod"] : var.acme["stage"]}"
}

resource "tls_private_key" "acme" {
  algorithm = "RSA"
}

resource "acme_registration" "acme" {
  account_key_pem = "${tls_private_key.acme.private_key_pem}"
  email_address   = "${var.acme["email"]}"
}

resource "acme_certificate" "acme" {
  account_key_pem = "${acme_registration.acme.account_key_pem}"
  common_name     = "*.app-${var.environment}.${var.domain}"
  key_type        = 2048
  must_staple     = true

  dns_challenge {
    provider = "route53"

    config {
      AWS_HOSTED_ZONE_ID = "${data.aws_route53_zone.route53_public.zone_id}"
    }
  }
}

#2

I guess my real question is: once the DNS TXT record is created (and I do see it being put in the PUBLIC zone), what performs the actual validation that the TXT record exists, and from where is that check done?


#3

Let’s Encrypt just performs validation from their servers relying on the public DNS. The ACME client doesn’t – and can’t – tell it what to do, except for specifying what the hostname is, and when to validate (now).

Some ACME clients implement some form of separate checks on their own, to try to verify that everything is correct before asking Let’s Encrypt to validate it. When this goes awry, it does more harm than good.
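
As a rough illustration of how such a client-side check can go wrong, here is a simplified Python model. It is not the code of lego or of any real ACME client, and it makes no network calls. The names `precheck`, `find_nameservers`, and `query_txt`, and the placeholder `TOKEN`, are hypothetical. The model assumes the client first asks a resolver for the zone's nameservers and then queries each of them for the TXT record, and that the servers from the local answer refuse the query, as the error message in this thread reports for one of them.

```python
from typing import Callable, List, Tuple

# Hypothetical model of one DNS answer: (response code, TXT values).
Answer = Tuple[str, List[str]]


def precheck(
    fqdn: str,
    expected: str,
    find_nameservers: Callable[[str], List[str]],
    query_txt: Callable[[str, str], Answer],
) -> Tuple[bool, str]:
    """Return (ok, detail); ok is True only if every nameserver serves `expected`."""
    for ns in find_nameservers(fqdn):
        rcode, values = query_txt(ns, fqdn)
        if rcode != "NOERROR":
            return False, f"NS {ns} returned {rcode} for {fqdn}"
        if expected not in values:
            return False, f"NS {ns} did not return the expected TXT value"
    return True, "record visible on all nameservers"


FQDN = "_acme-challenge.app-kamehameha.anaplan.io."
TOKEN = "example-token"  # placeholder, not a real challenge value

# NS names taken from the two dig outputs earlier in this thread.
PUBLIC_NS = ["ns-1344.awsdns-40.org.", "ns-1818.awsdns-35.co.uk.",
             "ns-349.awsdns-43.com.", "ns-710.awsdns-24.net."]
LOCAL_NS = ["ns-1024.awsdns-00.org.", "ns-1536.awsdns-00.co.uk.",
            "ns-0.awsdns-00.com.", "ns-512.awsdns-00.net."]


def public_view(_fqdn: str) -> List[str]:
    return PUBLIC_NS  # what a public resolver reports


def local_view(_fqdn: str) -> List[str]:
    return LOCAL_NS  # what the client's own resolver reports


def query_txt(ns: str, _fqdn: str) -> Answer:
    # Assumption: only the public servers hold the TXT record; the
    # others refuse the query, as in the reported error.
    if ns in PUBLIC_NS:
        return "NOERROR", [TOKEN]
    return "REFUSED", []


print(precheck(FQDN, TOKEN, public_view, query_txt))
print(precheck(FQDN, TOKEN, local_view, query_txt))
```

In this model the record is in place on the public servers in both runs. The only thing that changes the outcome is which resolver the client used to discover the nameservers.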

This issue and the error message are coming from the ACME client you’re using. There’s no indication anything is wrong with Let’s Encrypt – or with your DNS setup, for that matter. There might be a bug in the ACME client, or a mistake in your configuration.

I hope that’s part of what you were asking?