We have rolled out the change and issuing is working on staging. There is one issue I want to fix with the NS record for stripe-terminal-local-reader.net
but I can't do that until next week due to holidays. That record change seems to be non-blocking since issuing on staging works.
Thanks for letting me know! We're going to hold off until next week on re-rolling this to prod just to be on the safe side.
@gurjit: I'm currently planning to go to production with the upgrade tomorrow, 28 Nov, approximately 16:30 UTC.
Thanks for the heads up. I assume it will be a gradual rollout like before? We will be watching on our end and will let you know how it goes.
Is the upgrade in progress or complete?
Just starting, actually...
Amazing, issuing is working like normal for us!
Thank you so much for working with us @jcjones and @mcpherrinm !
We started seeing these errors today, beginning around 19:00 UTC:
certificate obtainer failed to renew certificate using acme client: error: one or more domains had a problem:
[*.cert-intl-9d8pqgr5.ca-central-1.aws.glb.confluent.cloud] acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: SERVFAIL looking up CAA for ca-central-1.aws.glb.confluent.cloud - the domain's nameservers may be malfunctioning
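Note that the SERVFAIL is reported for a parent name, not for the requested wildcard itself. As I understand RFC 8659, the CA looks for CAA at the requested name and then at each parent until it finds a record set, so a failing nameserver anywhere along that path can block issuance. A minimal sketch of which names get visited (the function name is made up for illustration, and the real lookup stops early once a CAA record set is found):

```python
def caa_lookup_names(domain: str) -> list[str]:
    """List the names a CAA check may visit, most specific first.

    Illustrative only: a real CA stops climbing as soon as it finds
    a CAA record set, and treats a lookup failure as an error.
    """
    name = domain.rstrip(".").lower()
    # For a wildcard, the lookup starts at the name without the "*." label.
    if name.startswith("*."):
        name = name[2:]
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


for candidate in caa_lookup_names(
    "*.cert-intl-9d8pqgr5.ca-central-1.aws.glb.confluent.cloud"
):
    print(candidate)
```

The second name printed is ca-central-1.aws.glb.confluent.cloud, which matches the name in the error above.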
@vpadma was this the remediation?
- Add SOA RR to Authority Section on DNS response
- Add AA flag to all dns answers
- Create email target for SOA record
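For anyone running their own authoritative DNS and wanting to check the first two items, here is a rough sketch that reads the 12-byte header of a raw DNS response and reports whether the AA flag is set and how many records are in the Authority section. The function name is hypothetical, and it only counts authority records; it does not verify that any of them is actually an SOA.

```python
import struct

QR_BIT = 0x8000  # set on responses (RFC 1035 header)
AA_BIT = 0x0400  # Authoritative Answer flag
RCODE_MASK = 0x000F  # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN


def summarize_dns_header(response: bytes) -> dict:
    """Decode the fixed DNS header of a raw response message."""
    if len(response) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", response[:12]
    )
    return {
        "id": txid,
        "is_response": bool(flags & QR_BIT),
        "authoritative": bool(flags & AA_BIT),
        "rcode": flags & RCODE_MASK,
        "answer_count": ancount,
        "authority_count": nscount,  # an SOA would be counted here
    }
```

An empty NOERROR answer with "authoritative" true and "authority_count" of at least 1 is roughly what the list above is aiming for.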
@gurjit Would you mind sharing what changes you needed to make?
Hi everyone, we are also seeing the same as @theduderog. We currently have 11 multi-SAN certificates and are seeing multiple, seemingly random CAA errors when checking and rechecking customers' domains, starting with this message: urn:ietf:params:acme:error:caa: Error finalizing order
Does anyone know if this is something that is being looked into?
Have a great day
DNS related reports seem inconsistent today:
@ITNiels, is the problem persistent or somewhat cyclical?
Hi @rg305, our last run was 3 hours ago, and we again saw lots of sporadic errors. Looking the domains up with Let's Debug, the check sometimes failed and sometimes did not.
We have paused all retrying for now to make sure we do not rate-limit ourselves.
I will possibly try again later today if others are reporting that it is fixed, and I will report back here regardless of the result.
Thanks.
Is there any common link in the ones that do fail?
TLD, CAA, DSP, etc.
[I'm assuming all are CAA related - but prefer to ask for more detailed specifics (if any)]
Anytime
All errors, without exception, are CAA failed to validate.
TLDs I have seen fail:
.ch
.com
.hr
.dk
.no
.com.br
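If it helps to look for a pattern, the failed names can be tallied by suffix. This is only a sketch: the function name and the example domains are invented, and multi-label suffixes such as com.br have to be listed by hand here rather than taken from the Public Suffix List.

```python
from collections import Counter


def count_failures_by_suffix(domains, multi_label_suffixes=("com.br",)):
    """Count failed domains per suffix, e.g. '.ch' or '.com.br'."""
    counts = Counter()
    for domain in domains:
        name = domain.rstrip(".").lower()
        # Prefer a known multi-label suffix, else fall back to the last label.
        suffix = next(
            (s for s in multi_label_suffixes if name.endswith("." + s)),
            None,
        )
        if suffix is None:
            suffix = name.rsplit(".", 1)[-1]
        counts["." + suffix] += 1
    return counts


# Hypothetical failed names, not taken from the real system.
failed = ["a.example.ch", "b.example.com", "c.example.com.br", "d.example.com.br"]
for suffix, total in count_failures_by_suffix(failed).most_common():
    print(suffix, total)
```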
@ITNiels, are you able to do any testing with staging?
Unfortunately it is a custom-built system where this was not designed into the process.
I can maybe try writing a small script that can try individual domains, but it would not be representative of the system or the AWS infrastructure it is running on!
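A small per-domain check along those lines could look like the sketch below. It uses only the Python standard library to send a CAA query over UDP and return the response code. The function names and the 1.1.1.1 resolver are assumptions for illustration, and a public resolver will not necessarily see what the CA's own resolvers see, so treat the result as a hint rather than a reproduction of the issuance path.

```python
import socket
import struct

CAA_TYPE = 257  # CAA resource record type (RFC 8659)
CLASS_IN = 1


def build_caa_query(name: str, txid: int = 0x1A2B) -> bytes:
    """Build a raw DNS query message asking for the CAA records of `name`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", CAA_TYPE, CLASS_IN)


def caa_rcode(name: str, resolver: str = "1.1.1.1", timeout: float = 3.0) -> int:
    """Send the query and return the RCODE (0 = NOERROR, 2 = SERVFAIL)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_caa_query(name), (resolver, 53))
        response, _ = sock.recvfrom(4096)
    # The low 4 bits of the flags word hold the response code.
    return struct.unpack("!H", response[2:4])[0] & 0x000F
```

Calling caa_rcode("example.com") a few times in a row would show whether a given name fails consistently or only intermittently from that vantage point.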
Is there something specific you have in mind @rg305 ?
And thank you so much for helping
I was hoping to better compare staging with production.
Several changes have recently been made to staging that appear to have improved DNS-related issues, but not all of them have reached production (yet).
Rest assured, this is being looked into.
@rg305 One thing to note is that the system is built around the fszlin/certes library on GitHub (a client implementation for the Automated Certificate Management Environment (ACME) protocol), which is getting a bit old, and issuing Staging certificates also fails with it.
Thank you for the reassurance, very kind of you!