Certbot in AWS Fargate task

I'm trying to write a Python task to (re)issue a certificate from inside a Fargate task. This has proven to be more daunting than I anticipated, but I'm making progress. The basic idea is that I have an ALB with listeners on ports 443 and 80. The port 443 listener has the certificate; the port 80 listener forwards traffic to a target group which points to my task. The task is launched by:

aws ecs run-task --cluster myCluster --launch-type FARGATE \
     --task-definition certbot-task:2 \
     --network-configuration '{
            "awsvpcConfiguration": {
                "assignPublicIp": "ENABLED",
                "securityGroups": ["sg-mysg"],
                "subnets": ["subnet-mysubnet"]}}'

The task figures out what privateIPv4Address and availabilityZone mysubnet points at and registers a target:

newtargets = [{'Id': privateIPv4Address,'Port': 80,'AvailabilityZone': availabilityZone}]
response = elb_client.register_targets(TargetGroupArn=tg_arn,Targets=newtargets)
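For anyone wondering where those two values can come from: one option, which is an assumption here and not necessarily what the code above does, is the ECS task metadata endpoint (version 4) that Fargate exposes through the ECS_CONTAINER_METADATA_URI_V4 environment variable. The helper names below are made up for illustration.

```python
import json
import os
import urllib.request


def fetch_task_metadata():
    """Fetch task-level metadata from the ECS task metadata endpoint (v4)."""
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base}/task", timeout=5) as resp:
        return json.load(resp)


def target_from_metadata(metadata, port=80):
    """Build a register_targets() entry from a task metadata document.

    Assumes awsvpc networking (always the case on Fargate) and that the
    first container's first network carries the task's private IP.
    """
    network = metadata["Containers"][0]["Networks"][0]
    return {
        "Id": network["IPv4Addresses"][0],
        "Port": port,
        "AvailabilityZone": metadata["AvailabilityZone"],
    }
```

Inside the task this would be used as newtargets = [target_from_metadata(fetch_task_metadata())].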

This works, but the problem is that the target is not available yet:

TargetHealth = elb_client.describe_target_health(TargetGroupArn=tg_arn,Targets=newtargets)['TargetHealthDescriptions'][0]['TargetHealth']
print(f"TargetHealth: {TargetHealth}")
TargetHealth: {'State': 'initial', 'Reason': 'Elb.RegistrationInProgress', 'Description': 'Target registration is in progress'}

No problem, I fire up a socket listener on port 80 and return 'HTTP/1.0 200 OK\n\nOK' for 'GET /', the target becomes healthy, and I can respond to curl from an external address. So now I'm ready to call certbot.main() to get my certificate, but I have one minor problem; if I don't respond to my health checks, the target listener will die an ignoble death. I can see 3 possibilities:

  1. Release port 80 and run certbot --standalone, hoping that the server asks for the challenge before my next health check comes along
  2. Keep listening to the socket and return the challenge myself by running certbot --webroot
  3. Bring up nginx in my container and let it handle both health checks and authentication.

I haven't figured out how --standalone works, but I can't see the security risk of the standalone server responding with 'HTTP/1.0 200 OK\n\nOK' in response to 'GET /'. I'd prefer not to use 3), as I'm trying to keep the Docker image small. 2) will probably require the socket listener to be in a subprocess.
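To make option 2 concrete, here is a minimal sketch of a listener that answers the health check on "/" and serves HTTP-01 challenge files from the directory that certbot --webroot writes to. It uses Python's http.server in a background thread instead of a raw socket in a subprocess, which is a substitution on my part; the names make_handler and start_listener are hypothetical.

```python
import http.server
import os
import threading

ACME_PREFIX = "/.well-known/acme-challenge/"


def make_handler(webroot):
    """Return a handler class that answers health checks and HTTP-01 challenges."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/":
                self._reply(200, b"OK")  # ALB health check
                return
            if self.path.startswith(ACME_PREFIX):
                # Use only the final path component to block directory traversal.
                token = os.path.basename(self.path)
                challenge = os.path.join(webroot, ACME_PREFIX.strip("/"), token)
                if token and os.path.isfile(challenge):
                    with open(challenge, "rb") as fh:
                        self._reply(200, fh.read())
                    return
            self._reply(404, b"not found")

        def _reply(self, status, body):
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep health-check noise out of the task logs

    return Handler


def start_listener(webroot, port=80):
    """Serve in a daemon thread so the main thread is free to run certbot."""
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), make_handler(webroot))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this running, certbot could be pointed at the same directory via certonly --webroot -w <webroot>, and the listener never has to give up port 80 between health checks.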

Before I started working on this, I wondered why no one had done this before. Now I know. I get that I can get a certificate for an Amazon-managed domain, but that isn't an option for me right now. Suggestions?

Your domain doesn't need to live in Route53 for you to get a free Amazon managed certificate. You can request an ACM public cert and it'll just have you create a validation CNAME wherever your domain's DNS is hosted.

For what sounds like a mostly AWS-native environment, it seems like way more trouble than it's worth to try to use certbot and an LE cert.
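The DNS-validated request described above can be sketched with boto3 roughly as follows. The function names are hypothetical, and the client is passed in so the sketch does not assume any particular region or credentials setup.

```python
def request_dns_validated_cert(acm_client, domain):
    """Request a public ACM certificate validated through DNS and return its ARN."""
    response = acm_client.request_certificate(
        DomainName=domain,
        ValidationMethod="DNS",
    )
    return response["CertificateArn"]


def validation_records(acm_client, cert_arn):
    """Return the (name, type, value) records ACM wants created.

    ResourceRecord can be missing for a short while right after the request,
    so callers may need to retry until the list is non-empty.
    """
    cert = acm_client.describe_certificate(CertificateArn=cert_arn)["Certificate"]
    records = []
    for option in cert.get("DomainValidationOptions", []):
        rr = option.get("ResourceRecord")
        if rr:
            records.append((rr["Name"], rr["Type"], rr["Value"]))
    return records
```

Typical use would be acm_client = boto3.client("acm"), then creating each returned CNAME at whatever provider hosts the domain's DNS.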


Right now I need to have my domain the way it is; I have some subdomains pointing at AWS and others to my development server. At some point that will change and I'll have a separate domain exclusively for the stuff running on AWS. In any event, there are probably plenty of people who struggle with renewing LE certificates on AWS, so I thought "what better way to kill a weekend?"

The solution that worked was actually 3). Adding nginx to python:3.11-alpine didn't increase the image size that much, and the configuration was fairly easy. The trick was to launch nginx in a subprocess inside the python task and wait for the target to become healthy:

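import subprocess
import time

# Run nginx in the foreground as a child process so it answers the health checks
nginx_process = subprocess.Popen(['/usr/sbin/nginx', '-g', 'daemon off;'])
# Poll until the ALB reports the target as healthy
while True:
    state = elb_client.describe_target_health(TargetGroupArn=tg_arn, Targets=newtargets)['TargetHealthDescriptions'][0]['TargetHealth']['State']
    if state == "healthy":
        break
    time.sleep(5)  # pause between polls instead of hammering the API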

Once the target is healthy, then calling certbot --nginx is fairly simple. I need to clean up the code a bit and let it run for a while to make sure it is stable before considering release. If anyone has suggestions, let me know.
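One possible cleanup of the polling loop above, offered as a sketch and not as the final code: give it a timeout so the task cannot spin forever if the target never turns healthy. The function name and its default values are assumptions; sleep and clock are parameters only to make the helper easy to test.

```python
import time


def wait_until_healthy(elb_client, tg_arn, targets, timeout=300, interval=5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll describe_target_health until the first target is healthy.

    Returns True once healthy, False if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        descriptions = elb_client.describe_target_health(
            TargetGroupArn=tg_arn, Targets=targets
        )["TargetHealthDescriptions"]
        if descriptions and descriptions[0]["TargetHealth"]["State"] == "healthy":
            return True
        sleep(interval)
    return False
```

boto3 also ships an elbv2 waiter named target_in_service (elb_client.get_waiter('target_in_service')) that may do the same job with less code; I have not compared the two here.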

I read the documentation on ACM public certs, and realized this was the far easier solution. I may still post the code, as it may help someone trying to launch a FARGATE task that needs incoming access.
