I'm running into an issue generating certificates with the Terraform provider vancluever/acme (2.7.0). The domain pacts.cloud is public and under my control, and I have a public Route 53 hosted zone for it.
Here is my code for the certificate creation:
However, I can see the validation record during the 3 minutes that Terraform is running.
To me, the error suggests that the validation record is not propagating in time. However, the delay set in the certificate resource seems to be ignored.
Normally, rerunning the script would either delete the old resource and recreate it (if it was properly saved in the Terraform state file) or run into conflicts (when trying to create another validation record).
Now you simply need to start a timer, pause the script, and check all (four) of your authoritative DNS servers to see how long it takes for them all to show that new TXT record.
Do that timer test three times. Then take the average update time and multiply it by three.
And use that number in your line:
with that calculated number OR 1200 [whichever is larger].
If the number was less than 1200, then we may have a real (as yet unknown) problem and must keep searching for clues about it.
If the number was larger, then try the new number and report back your findings.
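The timer test and the arithmetic above could be sketched like this. It is a minimal sketch, assuming you supply your own `has_record(server)` check (hypothetical; for example a wrapper that runs `dig @server TXT _acme-challenge.<name>` and looks for the new value). The 1200-second floor comes from the advice above; the server names are placeholders.

```python
import time
from statistics import mean


def time_until_all_visible(servers, has_record, poll_interval=5.0, timeout=1800.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Seconds until every server in `servers` shows the new TXT record.

    `has_record(server)` is a hypothetical callback you provide; it must return
    True once that server answers with the new record. A server that has shown
    the record once is assumed to keep showing it and is not re-checked.
    """
    start = clock()
    pending = set(servers)
    while True:
        pending = {s for s in pending if not has_record(s)}
        if not pending:
            return clock() - start
        if clock() - start > timeout:
            raise TimeoutError(f"record still missing on: {sorted(pending)}")
        sleep(poll_interval)


def suggested_pre_check_delay(samples_seconds, floor=1200):
    """Average of the timing runs, times three, but never less than `floor`."""
    if not samples_seconds:
        raise ValueError("need at least one timing sample")
    return max(3 * mean(samples_seconds), floor)
```

For example, three runs of 40, 55 and 70 seconds average 55, and 3 times 55 is 165, so `suggested_pre_check_delay([40, 55, 70])` falls back to 1200; runs of 500, 450 and 400 seconds give 1350.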
[remember to always use the staging system when conducting such obvious testing]
What should happen is that the ACME client should be checking Route 53's API to see if the DNS servers are in sync (that is, if the change set is done) before proceeding. That'd be more reliable than just waiting for a while. I don't know if that particular client's Route 53 implementation does so.
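For reference, this is roughly what "checking whether the change set is done" means. Route 53 reports a change as `PENDING` until it has been applied to all of its DNS servers and `INSYNC` afterwards. The sketch below assumes you pass in a `get_status` callable; with boto3 that could be `lambda cid: route53.get_change(Id=cid)["ChangeInfo"]["Status"]` (boto3 also ships a `resource_record_sets_changed` waiter for this). Whether the vancluever/acme client does anything like this is not confirmed here.

```python
import time


def wait_for_insync(get_status, change_id, timeout=600.0, interval=10.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll a Route 53 change until its status is INSYNC; return seconds waited.

    `get_status(change_id)` must return the change's status string
    ("PENDING" or "INSYNC" for Route 53).
    """
    start = clock()
    while True:
        status = get_status(change_id)
        if status == "INSYNC":
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"change {change_id} still {status} after {timeout}s")
        sleep(interval)
```

Only once this returns would it make sense to ask the CA to validate; a fixed delay is a guess, while the change status is what Route 53 itself reports.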
I'm also quite confused as to why, even if the systems weren't in sync, a REFUSED response would be involved at all. Something about what's going on seems weird, especially if that REFUSED status is reproducible. That error message doesn't look like it's coming from the Let's Encrypt servers, even.
@FlorianGerdes Do you have a support contact at AWS? They might be able to look at the logs to see why the Route 53 DNS sent a Refused response.
I see in the RFC (RFC 1035, section 4.1.1) that the Refused response (RCODE 5) is for:
"Refused - The name server refuses to perform the specified operation for policy reasons. For example, a name server may not wish to provide the information to the particular requester, or a name server may not wish to perform a particular operation (e.g., zone transfer) for particular data."
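One way to see which server, if any, actually answers REFUSED is to query each authoritative name server directly and read the RCODE (the low 4 bits of the DNS header flags). A minimal standard-library sketch, assuming you have already resolved the name server's IP address (for example with `socket.gethostbyname`). Recursion is left off because the target is authoritative. It only reports the RCODE; it does not parse answers or check that the response ID matches the query.

```python
import socket
import struct

# RCODE values defined in RFC 1035, section 4.1.1.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
          4: "NOTIMP", 5: "REFUSED"}


def build_txt_query(name, query_id=0x1234, recursion_desired=False):
    """Build a minimal DNS query packet asking for the TXT records of `name`."""
    flags = 0x0100 if recursion_desired else 0x0000  # 0x0100 is the RD bit
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)  # one question
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", 16, 1)  # QTYPE=TXT, QCLASS=IN


def rcode_of(response):
    """Return the symbolic RCODE from the low 4 bits of the flags field."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def ask(server_ip, name, timeout=5.0):
    """Send one UDP query to `server_ip` and return the RCODE it answered with."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_txt_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(4096)  # response ID is not verified here
    return rcode_of(response)
```

Running `ask(ip, "_acme-challenge.vault.pacts.cloud")` against each of the four Route 53 name servers would show whether REFUSED ever comes from them or from somewhere else in the chain.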
I have no idea what policy is being violated, especially since we can see the TXT record now. AWS should know, or at least be able to tell you more.
I am not at all an expert at DNS, so maybe the others here will still resolve it. I am just providing more clues.
UPDATE: Some more random ideas:
I see Route 53 offers Traffic Policies. Could you have one that interferes with requests from the LE servers?
I saw a post via Google (which I have since lost) where someone said they got a "Refused" error when the name servers listed in the Registered Domain section in AWS were all valid Route 53 name servers but were not identical to the ones listed in the hosted zone's NS record. I would think that if this were your problem, many things would fail. Still, "Refused" seems rare, so ...
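Checking for that registrar/hosted-zone mismatch is a simple set comparison once you have both lists. They can be copied from the Route 53 console or, with boto3, taken from `route53domains.get_domain_detail(DomainName=...)["Nameservers"]` (each entry has a `"Name"`) and `route53.get_hosted_zone(Id=...)["DelegationSet"]["NameServers"]`. The sketch below only does the comparison; the example names in the test are placeholders.

```python
def normalize_ns(names):
    """Lower-case name server names and drop the trailing root dot."""
    return {n.strip().lower().rstrip(".") for n in names}


def delegation_mismatch(registrar_ns, hosted_zone_ns):
    """Return (only_at_registrar, only_in_zone); two empty lists mean they match."""
    reg, zone = normalize_ns(registrar_ns), normalize_ns(hosted_zone_ns)
    return sorted(reg - zone), sorted(zone - reg)
```

Normalizing first matters because the two APIs can differ in letter case and in whether a trailing dot is present, which would otherwise show up as a false mismatch.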
So I just did the test. I removed the pre_check_delay and re-ran the script. It turns out that (as far as I can tell) the TXT record propagates to the Route 53 hosted zone's name servers immediately.
I have applied Terraform with TF_LOG=TRACE and passed recursive_nameservers = ["188.8.131.52:53"] to my acme_certificate resource to try to get more data. Here is the part of the output related to the acme_certificate resource creation:
Additionally, we have tried setting AWS_PROPAGATION_TIMEOUT = 600, but got the same result.
Why do you not have a DNS record for vault.pacts.cloud?
I can still see your TXT record for it just fine, but how do you plan to access that domain name without a DNS record?
One of my DNS lookup tools refused to show the TXT record because that record was missing. Could Terraform be failing and showing its own odd "refused" message for a similar reason? That might be why we can't reproduce it with direct DNS inspection, and maybe why AWS cannot offer guidance either.
Just out of curiosity, what is the Let's Encrypt (LE) cert for vault.pacts.cloud going to be used for? I ask because I see that pacts.cloud is served through AWS CloudFront and you have certs through AWS ACM for that (as normal). Some of your other subdomains also run through CloudFront (but some just on EC2). Nothing unusual.
But I also see an AWS ACM cert for vault.pacts.cloud from several days ago. It would help to understand the context for the LE cert a bit better. Is it for HTTPS between CF and your origin server, for example? Thanks.
The only other case I found was for a different product (Concourse, not Terraform), but with the same "refused" message when looking up the TXT record for _acme-challenge. That one used Google's name servers, and it was apparently resolved by fixing poorly configured DNS.
So first off, on the use case:
With AWS ACM, the issue is that Amazon fully manages the certificates for you, so there is no way to get your hands on the certificate's private key yourself and have an EC2 instance access it. That was my use case: I wanted an EC2 auto-scaling group to access the certificate directly.
My current workaround is to put a load balancer in between, which can hold the ACM certificate. That way there is no need to get the certificate onto the EC2 instances.
Now to the question of why there is no record for vault.pacts.cloud: because I was struggling with the certificate in the first place. I would set up such a record once I have the rest of the pieces in place. Is that a problem?
My first instinct is to agree that this is the best path anyway. Usually you want EC2 spin-ups to be fast and reliable. If you were going to acquire a new cert for each new spin-up, that would work against those goals and could run into LE rate limits. To avoid that, you would need to acquire certs in a separate, dedicated process and store them durably for quick loading on spin-up, and occasionally refresh the certs in long-running instances. Maybe this was your plan all along, but the LB does that for you.
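The "acquire once, store durably, refresh occasionally" approach boils down to a scheduled job that only requests a new certificate when the stored one is close to expiry. A minimal sketch of that decision, assuming you already have the stored certificate's expiry time as a timezone-aware `not_after`. The 30-day window is an assumption; if I recall correctly it matches the default of the acme provider's `min_days_remaining` argument, and it leaves a wide margin on a 90-day Let's Encrypt certificate.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now=None, renew_before=timedelta(days=30)):
    """True when the stored certificate expires within `renew_before`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= renew_before
```

A refresh job would call this on each run and, only when it returns True, request a new certificate and write it to storage for instances to pick up, so spin-ups never talk to the CA themselves.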
As to the DNS, I was just poking around. This is an extremely rare error message; even AWS could not explain why Route 53 would return it. Sometimes asking questions reveals helpful info. The TXT record works sometimes, and many of us have retrieved it. But the error persists, so something is likely not set up quite right. It remains a mystery.