Dockerized infrastructure suddenly cannot renew the certificate

Hello everybody.
I have the following dockerized structure.

  1. Reverse proxy (nginx)
  2. Nextcloud service
  3. Custom certbot service (running with cron)

The certbot service and the reverse proxy share one volume for the certificate files (e.g. the .pem files). The reverse proxy talks to the Nextcloud instance over plain HTTP, not HTTPS.

I'm pretty sure this same setup worked last time and renewed the certificate successfully, but now I'm suddenly getting the following:

/var/log/letsencrypt/letsencrypt.log

2023-04-12 17:47:04,027:DEBUG:acme.client:Storing nonce: B37CfrJCEbMZQ8SJfuCzFEJJHPj2cyRUVhwP-3DwTf_gRRE
2023-04-12 17:47:04,028:INFO:certbot._internal.auth_handler:Challenge failed for domain cloud.eigenval.xyz
2023-04-12 17:47:04,028:INFO:certbot._internal.auth_handler:http-01 challenge for cloud.eigenval.xyz
2023-04-12 17:47:04,029:DEBUG:certbot._internal.display.obj:Notifying user:
Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
  Domain: cloud.eigenval.xyz
  Type:   unauthorized
  Detail: 94.130.148.89: Invalid response from https://cloud.eigenval.xyz/.well-known/acme-challenge/PZFnLMgdXShlupzOXi1BvOSAqWR7pmDfu8tVGO_VJJI: 404

Hint: The Certificate Authority failed to verify the temporary nginx configuration changes made by Certbot. Ensure the listed domains point to this nginx server and that it is accessible from the internet.

2023-04-12 17:47:04,030:DEBUG:certbot._internal.error_handler:Encountered exception:
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/certbot/_internal/auth_handler.py", line 90, in handle_authorizations
    self._poll_authorizations(authzrs, max_retries, best_effort)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/auth_handler.py", line 178, in _poll_authorizations
    raise errors.AuthorizationError('Some challenges have failed.')
certbot.errors.AuthorizationError: Some challenges have failed.

2023-04-12 17:47:04,030:DEBUG:certbot._internal.error_handler:Calling registered functions
2023-04-12 17:47:04,030:INFO:certbot._internal.auth_handler:Cleaning up challenges
2023-04-12 17:47:05,164:ERROR:certbot._internal.renewal:Failed to renew certificate cloud.eigenval.xyz with error: Some challenges have failed.
2023-04-12 17:47:05,167:DEBUG:certbot._internal.renewal:Traceback was:
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/certbot/_internal/renewal.py", line 475, in handle_renewal_request
    main.renew_cert(lineage_config, plugins, renewal_candidate)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/main.py", line 1386, in renew_cert
    renewed_lineage = _get_and_save_cert(le_client, config, lineage=lineage)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/main.py", line 122, in _get_and_save_cert
    renewal.renew_cert(config, domains, le_client, lineage)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/renewal.py", line 335, in renew_cert
    new_cert, new_chain, new_key, _ = le_client.obtain_certificate(domains, new_key)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/client.py", line 389, in obtain_certificate
    orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/client.py", line 439, in _get_order_and_authorizations
    authzr = self.auth_handler.handle_authorizations(orderr, self.config, best_effort)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/auth_handler.py", line 90, in handle_authorizations
    self._poll_authorizations(authzrs, max_retries, best_effort)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/auth_handler.py", line 178, in _poll_authorizations
    raise errors.AuthorizationError('Some challenges have failed.')
certbot.errors.AuthorizationError: Some challenges have failed.

2023-04-12 17:47:05,167:DEBUG:certbot._internal.display.obj:Notifying user:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2023-04-12 17:47:05,167:ERROR:certbot._internal.renewal:All simulated renewals failed. The following certificates could not be renewed:
2023-04-12 17:47:05,167:ERROR:certbot._internal.renewal:  /etc/letsencrypt/live/cloud.eigenval.xyz/fullchain.pem (failure)
2023-04-12 17:47:05,168:DEBUG:certbot._internal.display.obj:Notifying user: 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2023-04-12 17:47:05,168:DEBUG:certbot._internal.log:Exiting abnormally:
Traceback (most recent call last):
  File "/usr/bin/certbot", line 33, in <module>
    sys.exit(load_entry_point('certbot==1.21.0', 'console_scripts', 'certbot')())
  File "/usr/lib/python3.9/site-packages/certbot/main.py", line 15, in main
    return internal_main.main(cli_args)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/main.py", line 1574, in main
    return config.func(config, plugins)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/main.py", line 1460, in renew
    renewal.handle_renewal_request(config)
  File "/usr/lib/python3.9/site-packages/certbot/_internal/renewal.py", line 500, in handle_renewal_request
    raise errors.Error("{0} renew failure(s), {1} parse failure(s)".format(
certbot.errors.Error: 1 renew failure(s), 0 parse failure(s)
2023-04-12 17:47:05,168:ERROR:certbot._internal.log:1 renew failure(s), 0 parse failure(s)

The nginx config looks like:

server {

  server_name cloud.eigenval.xyz;

  listen [::]:443 ssl http2 ipv6only=on;
  listen 443 ssl http2;

  ssl_certificate /etc/letsencrypt/live/cloud.eigenval.xyz/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/cloud.eigenval.xyz/privkey.pem;

  include snippets/ssl-params.conf;
  client_max_body_size 100M;

  location / {
    proxy_pass http://10.0.0.3:80;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-NginX-Proxy true;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_ssl_session_reuse off;
    proxy_set_header Host $http_host;
    proxy_pass_header Server;
    proxy_cache_bypass $http_upgrade;
    proxy_hide_header X-Frame-Options;
    proxy_redirect off;
  }

  # Make a regex exception for `/.well-known` so that clients can still
  # access it despite the existence of the regex rule
  # `location ~ /(\.|autotest|...)` which would otherwise handle requests
  # for `/.well-known`.
  location ^~ /.well-known {
    # The rules in this block are an adaptation of the rules
    # in `.htaccess` that concern `/.well-known`.

    location = /.well-known/carddav { return 301 /remote.php/dav/; }
    location = /.well-known/caldav  { return 301 /remote.php/dav/; }

    location /.well-known/acme-challenge  { try_files $uri $uri/ =404; }
    location /.well-known/pki-validation    { try_files $uri $uri/ =404; }

    # Let Nextcloud's API for `/.well-known` URIs handle all other
    # requests by passing them to the front-end controller.
    return 301 /index.php$request_uri;
  }

}

Finally, the command that cron runs is certbot renew --quiet

What could possibly be going wrong?

Thanks in advance.

The good news is that there are several ways to debug and resolve a 404 with the certbot --nginx plug-in. I don't see anything obviously wrong so far (although I see a couple of things that are less than optimal).

I see you redirect all HTTP requests to HTTPS and expect to handle the Let's Encrypt HTTP Challenge in your port 443 server block. That's OK, but the certbot nginx plug-in does not rely on the location statement as you have it. Instead, it adds its own changes to your nginx server block(s).

Two suggestions to resolve:

  1. copy the /var/log/letsencrypt/letsencrypt.log file to a .txt file and upload it for us to evaluate
  2. run certbot renew --dry-run and look at the nginx access logs. You should see several (probably 3) requests from the Let's Encrypt Staging Server. If you don't then something isn't routing properly to that nginx.
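
For step 2, the log output can be narrowed down to just the challenge requests. This is only a sketch: acme_requests is a hypothetical helper name, and the container name reverse_proxy as well as nginx logging to the container's stdout/stderr (so that docker logs shows it) are assumptions about the setup.

```shell
# Hypothetical helper: keep only the log lines (read from stdin) that
# mention the ACME HTTP-01 challenge path. "|| true" keeps the exit
# status at 0 when nothing matches.
acme_requests() {
  grep -F '/.well-known/acme-challenge/' || true
}

# Assumed usage, run on the Docker host right after the dry run
# (container name and stdout logging are assumptions):
#   docker logs --since 5m reverse_proxy 2>&1 | acme_requests
```

If nothing is printed, the validation requests most likely never reached this nginx instance.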

Also, what version is certbot? certbot --version

Put a test file where you think your challenges are being read from (where you are pointing your /.well-known/acme-challenge/ stuff), then browse to that file. If you can't (you still get a 404), investigate and fix that, and everything will start working again.

Chris, the --nginx plug-in does not rely on a file on disk, so that test is not as helpful as it is for --webroot, for example. The plug-in makes temporary changes such that an explicit return statement provides the response value. A 404 indicates these temporary changes are not being processed by nginx, most likely due to routing problems or a funky nginx config, although there are other possibilities (like two nginx systems).

Thanks @MikeMcQ, I was being fooled by the location block 🙂

So here is what you asked for:

reverse proxy's logs when running certbot renew --dry-run:

18.119.143.74 - - [13/Apr/2023:06:41:30 +0000] "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1" 301 169 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
52.12.250.117 - - [13/Apr/2023:06:41:30 +0000] "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1" 301 169 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
23.178.112.107 - - [13/Apr/2023:06:41:30 +0000] "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1" 301 169 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
2023/04/13 06:41:30 [error] 32#32: *8821 open() "/etc/nginx/html/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE" failed (2: No such file or directory), client: 18.119.143.74, server: cloud.eigenval.xyz, request: "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1", host: "cloud.eigenval.xyz", referrer: "http://cloud.eigenval.xyz/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE"
18.119.143.74 - - [13/Apr/2023:06:41:30 +0000] "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1" 404 153 "http://cloud.eigenval.xyz/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
2023/04/13 06:41:30 [error] 32#32: *8822 open() "/etc/nginx/html/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE" failed (2: No such file or directory), client: 52.12.250.117, server: cloud.eigenval.xyz, request: "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1", host: "cloud.eigenval.xyz", referrer: "http://cloud.eigenval.xyz/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE"
52.12.250.117 - - [13/Apr/2023:06:41:30 +0000] "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1" 404 153 "http://cloud.eigenval.xyz/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
2023/04/13 06:41:30 [error] 32#32: *8823 open() "/etc/nginx/html/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE" failed (2: No such file or directory), client: 23.178.112.107, server: cloud.eigenval.xyz, request: "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1", host: "cloud.eigenval.xyz", referrer: "http://cloud.eigenval.xyz/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE"
23.178.112.107 - - [13/Apr/2023:06:41:30 +0000] "GET /.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE HTTP/1.1" 404 153 "http://cloud.eigenval.xyz/.well-known/acme-challenge/FaRMrrF6xRtPhy0mLFFzFHOpRwvvKRebwEPWHHrc1iE" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"

letsencrypt.log file hosted here

and the certbot version is 1.21.0

And yes, I do indeed redirect all traffic to 443. Here is my default.conf for nginx, which I forgot to paste in the initial post.

server {
    listen       80 default_server;
    listen  [::]:80 default_server;
    listen 443 default_server;
    listen [::]:443 default_server;
    ssl_reject_handshake on;
    return 301 https://$host$request_uri;

    server_name  _;
    return 444;
}

Thanks for that. The Certbot log is very unusual. It doesn't update the HTTPS server block with the required rewrite/return lines. Your certbot version is new enough that it should do that; it just isn't.

Also, it tries to insert an HTTP server block for that domain because you don't have one but that doesn't seem to work either.

There are two ways to proceed. One is to convert to using the --webroot authentication. This method won't need to modify your nginx config. Two, we can debug your nginx config in detail trying to find what is causing a problem.

Switching to webroot is probably easier. You can try this command as a test. In both places (the command and the server block further down), change /your/webroot/path to something proper for your system.

sudo certbot certonly --dry-run --webroot -w /your/webroot/path -d cloud.eigenval.xyz

(omit sudo if you don't use that)

And, you should add a server block for HTTP like below. This will still redirect everything except the ACME challenge and more clearly sets the root value for the challenge.

server {
    server_name cloud.eigenval.xyz;
    listen       80;
    listen  [::]:80;

    location /.well-known/acme-challenge/ {
        root /your/webroot/path;
    }
    location / {
        return 301 https://$host$request_uri;
    }
}
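
Before running certbot, it may help to confirm that nginx really serves files from the chosen webroot. The sketch below writes a throwaway file into the challenge directory and fetches it over plain HTTP. The helper names and the token name test-token are made up for illustration, /your/webroot/path is the same placeholder as above, and the directory is assumed to be visible to both the certbot and the nginx containers.

```shell
# Hypothetical helper: create a throwaway file where the HTTP-01
# challenge files would live under the given webroot.
make_test_token() {
  webroot=$1
  token=$2
  mkdir -p "$webroot/.well-known/acme-challenge"
  printf 'ok\n' > "$webroot/.well-known/acme-challenge/$token"
}

# Hypothetical helper: request the file over plain HTTP and print only
# the HTTP status code that curl received (redirects are not followed).
fetch_test_token() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    "http://$1/.well-known/acme-challenge/$2"
}

# Assumed usage:
#   make_test_token /your/webroot/path test-token
#   fetch_test_token cloud.eigenval.xyz test-token
```

A 200 suggests the webroot is wired up correctly. A 404 usually points at the root value or a volume that is not shared, and a 301 means the request was redirected before it reached the challenge location.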

Thanks for helping out.

I got a little bit confused though with the webroot method.

What should the path to my webroot be, given that the actual nginx server runs in a different container (the reverse proxy one) from the one where certbot runs? I mean, it doesn't make any sense (at least to me) to set the webroot argument to something like /var/www/html/.

It's not that I don't like the easy solution you provide, but the curious part of me wants to debug the nginx config :-p (if you can help further with that).

Your Certbot must be able to interact with the nginx container. That's true even with your --nginx method, as Certbot makes changes to the actual nginx config files. In addition, when successful it gets the cert files which are used by nginx. So, you must be at least sharing a volume for these files. For --webroot you may need to set up another shared volume, but it shouldn't be much more difficult than what you are already doing.

But, maybe your container interaction isn't right anymore. And, maybe that's why the letsencrypt.log file doesn't match what you show as your nginx config. Did you change your container layout recently? You said this worked at one time.

Hihihi it seems that we're getting closer. So here is the docker-compose.yml

version: "3.3"
services:
    reverse_proxy:
        ports:
            - "80:80"
            - "443:443"
        volumes:
            - "/home/user/docker-volumes/reverse_proxy/sites-enabled:/etc/nginx/conf.d"
            - "/home/user/docker-volumes/reverse_proxy/letsencrypt:/etc/letsencrypt"
        container_name: reverse_proxy
        image: reverse_proxy
        networks:
            nextcloud_net:
                ipv4_address: 10.0.0.2

    cron_letsencrypt:
        volumes:
            - "/home/user/docker-volumes/reverse_proxy/letsencrypt:/etc/letsencrypt"
            - "/home/user/docker-volumes/reverse_proxy/cron-jobs:/etc/periodic/weekly"
        container_name: cron_letsencrypt
        image: cron_letsencrypt
        networks:
            nextcloud_net:
                ipv4_address: 10.0.0.5

So the only common volume is the /home/user/docker-volumes/reverse_proxy/letsencrypt:/etc/letsencrypt while it seems there should also be the /home/user/docker-volumes/reverse_proxy/sites-enabled:/etc/nginx/conf.d. Maybe I deleted it accidentally.

I'll check it out and let you know.

So I made some changes in the docker-compose.yml and I'm still getting the same error, but with some differences. This time, in the letsencrypt.log file I can see my original nginx configuration for that domain listed. It seems a good sign to me.

docker-compose.yml

version: "3.3"
services:
    reverse_proxy:
        ports:
            - "80:80"
            - "443:443"
        volumes:
            - "/home/user/docker-volumes/reverse_proxy/letsencrypt:/etc/letsencrypt"
            - "/home/user/docker-volumes/reverse_proxy/nginx/conf.d:/etc/nginx/conf.d"
            - "/home/user/docker-volumes/reverse_proxy/nginx/snippets:/etc/nginx/snippets"
        container_name: reverse_proxy
        image: reverse_proxy
        networks:
            nextcloud_net:
                ipv4_address: 10.0.0.2

    cron_letsencrypt:
        volumes:
            - "/home/user/docker-volumes/reverse_proxy/letsencrypt:/etc/letsencrypt"
            - "/home/user/docker-volumes/reverse_proxy/nginx/conf.d:/etc/nginx/conf.d"
            - "/home/user/docker-volumes/reverse_proxy/nginx/snippets:/etc/nginx/snippets"
            - "/home/user/docker-volumes/reverse_proxy/cron-jobs:/etc/periodic/weekly"
        container_name: cron_letsencrypt
        image: cron_letsencrypt
        networks:
            nextcloud_net:
                ipv4_address: 10.0.0.5

Is there any chance I'm missing anything else that these two containers need to have in common? Right now they share:

- /etc/letsencrypt
- /etc/nginx/conf.d <---- this contains the nginx's configurations (default.conf & cloud.eigenval.xyz.conf)
- /etc/nginx/snippets <---- this contains some snippets for the ssl 
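
One way to double-check this list against what Docker actually mounted is to compare the host-side mount sources of both containers. This is a sketch: mount_sources and shared_paths are hypothetical helper names, the temporary file paths are arbitrary, and the container names are taken from the docker-compose.yml above.

```shell
# Hypothetical helper: print the host-side source path of every mount
# of a container, one per line, sorted (needs access to the Docker daemon).
mount_sources() {
  docker inspect -f '{{range .Mounts}}{{println .Source}}{{end}}' "$1" \
    | sed '/^$/d' | sort
}

# Hypothetical helper: print the lines that two sorted files have in common.
shared_paths() {
  comm -12 "$1" "$2"
}

# Assumed usage on the Docker host:
#   mount_sources reverse_proxy    > /tmp/proxy-mounts.txt
#   mount_sources cron_letsencrypt > /tmp/certbot-mounts.txt
#   shared_paths /tmp/proxy-mounts.txt /tmp/certbot-mounts.txt
```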

Yes, definitely some progress. But this is a problem:

2023-04-13 17:30:20,007:DEBUG:certbot_nginx._internal.configurator:nginx reload failed:
nginx: [error] invalid PID number "" in "/var/run/nginx.pid"
23/04/13 17:30:19 [error] 17#17: invalid PID number "" in "/var/run/nginx.pid"

Ignoring Docker, this kind of error occurs when nginx is started both natively and with systemctl.

Was nginx already running in the Docker container when running certbot? Because if Certbot needs to start nginx it does not use systemctl and that can cause problems.
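
A quick way to see whether the PID file itself is the problem is to check that it is non-empty and names a live process. This is a sketch: pid_file_ok is a hypothetical helper, and it has to run in the container (or on the host) whose /var/run/nginx.pid Certbot is complaining about.

```shell
# Hypothetical helper: succeed only if the PID file exists, is non-empty,
# and names a process that is running and that this user may signal.
pid_file_ok() {
  pid=$(cat "$1" 2>/dev/null || true)
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Assumed usage:
#   pid_file_ok /var/run/nginx.pid && echo "nginx is running" \
#     || echo "missing, empty, or stale PID file"
```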

Again, we can try to parse through this or you could reconsider webroot authentication. It does not make nginx conf changes and just relies on the running nginx to satisfy the HTTP Challenge. Now that you have the volumes sharing better this might be easiest 🙂

Hey! I got you 🙃

xx.xx.xx.xx - - [13/Apr/2023:18:26:57 +0000] "HEAD /.well-known/acme-challenge/MikeTest1 HTTP/1.1" 301 0 "-" "curl/7.81.0" "-"
xx.xx.xx.xx - - [13/Apr/2023:18:27:05 +0000] "HEAD /.well-known/acme-challenge/MikeTest1 HTTP/2.0" 404 0 "-" "curl/7.81.0" "-"
2023/04/13 18:27:05 [error] 31#31: *54 open() "/etc/nginx/html/.well-known/acme-challenge/MikeTest1" failed (2: No such file or directory), client: xx.xx.xx.xx, server: cloud.eigenval.xyz

I also noticed exactly what you mentioned about the invalid PID, and I thought the same. But if you look carefully above that error, where the log prints my cloud.eigenval.xyz configuration, you'll see I had forgotten this line:

location /.well-known/acme-challenge    { allow all; }#{ try_files $uri $uri/ =404; }

I had replaced the try_files with allow all while testing various things. I then removed the allow all and left it as:

location /.well-known/acme-challenge    { try_files $uri $uri/ =404; }

I then restarted the container and for some reason the invalid PID error disappeared. Here is the proof. Don't know if that is related somehow but I just wanted to note it.

I really want to figure it out but at the same time I don't want to waste your time. I'll try to use the webroot idea and check if it's going to work.

I'll change nginx's default.conf to be like:

server {
    listen       80 default_server;
    listen  [::]:80 default_server;
    listen 443 default_server;
    listen [::]:443 default_server;
    ssl_reject_handshake on;
    server_name  _;

    location /.well-known/acme-challenge/ {
        root /usr/share/nginx/html/;
    }
    location / {
        return 301 https://$host$request_uri;
    }
}

And as you said...

/ # certbot certonly --dry-run --webroot -w /usr/share/nginx/html/ -d cloud.eigenval.xyz
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Simulating renewal of an existing certificate for cloud.eigenval.xyz
The dry run was successful.

So I suppose the crontab should run

certbot certonly --webroot -w /usr/share/nginx/html/ -d cloud.eigenval.xyz

But to be honest, I don't feel good leaving the other approach unsolved. 🙂

crontab should stick to:
certbot renew

The webroot would be remembered by certbot after it has used it to obtain a cert.
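
Certbot records these settings in a per-certificate renewal file under /etc/letsencrypt/renewal/. As a sketch, the hypothetical helper below prints the authenticator stored there, which makes it possible to confirm that certbot renew will use webroot rather than nginx. The file name is assumed to follow the usual pattern of certificate name plus .conf.

```shell
# Hypothetical helper: print the value of the "authenticator" setting
# from a Certbot renewal configuration file.
recorded_authenticator() {
  sed -n 's/^authenticator *= *//p' "$1"
}

# Assumed usage (inside the certbot container):
#   recorded_authenticator /etc/letsencrypt/renewal/cloud.eigenval.xyz.conf
```

If this still prints nginx, one possible explanation is that the certificate has so far only been requested with --webroot as a dry run, which is not expected to save anything.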

Shouldn't those two be the same?

Yes they should. I just edited.

This confirms that nginx got into a bad state: either it was not fully stopped or, as I noted, someone started nginx in different ways (pointing to different PID files).
