Restarting an unhealthy docker container based on healthcheck

小开

Docker has a couple of ways to get details on container health. You can configure health checks and how often they run. Also, health checks can be run on applications running inside a container, like http (this would use curl --fail option.) You can view the health_status event to get details.

For detailed information on an unhealthy container the inspect command comes in handy, docker inspect --format='\{\{json .State.Health}}' container-name (see https://blog.newrelic.com/2016/08/24/docker-health-check-instruction/ for more details.)

You should resolve the error condition causing the "unhealthy" tag (anytime the health check command runs and gets an exit code of 1) first. This may or may not require that Docker restart the container, depending on the error. If you are starting/restarting your containers automatically, then either trapping the start errors or logging them and the health check status can help address errors quickly. Check the link if you are interested in auto start.

小开

最佳答案

Restarting of unhealty container feature was in the original PR (https://github.com/moby/moby/pull/22719), but was removed after a discussion and considered to be done later as enhancement of RestartPolicy.

At this moment you can use this workaround to automatically restarting unhealty containers: https://hub.docker.com/r/willfarrell/autoheal/

Here is a sample compose file:

version: '2'
services:
autoheal:
restart: always
image: willfarrell/autoheal
environment:
- AUTOHEAL_CONTAINER_LABEL=all
volumes:
- /var/run/docker.sock:/var/run/docker.sock

Simply execute docker-compose up -d on this

小开

According to https://codeblog.dotsandbrackets.com/docker-health-check/

Create container and add " restart: always".

In the use of healthcheck, pay attention to the following points:

For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.

小开

For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.

小开

You can try put in your Dockerfile something like this:

HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1

Don't forget --restart always option.

kill 1 will kill process with pid 1 in container and force container exit. Usually the process started by CMD or ENTRYPOINT has pid 1.

Unfortunally, this method likely don't change container's state to unhealthy, so be careful with it.

小开

You can restart automatically an unhealthy container by setting a smart HEALTHCHECK and a proper restart policy.

The Docker restart policy should be one of always or unless-stopped.

The HEALTHCHECK instead should implement a logic that kills the container when it's unhealthy.

In the following example I used curl with its internal retry mechanism and piped it (in case of failure/service unhealthy) to the kill command.

HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'

The important step to understand here is that the retry logic is self-contained in the curl command, the Docker retry here actually is mandatory but useless. Then if the curl HTTP request fails 3 times, then kill is executed. First it sends a SIGTERM to all the processes in the container, to allow them to gracefully stop, then after 10 seconds it sends a SIGKILL to completely kill all the processes in the container. It must be noted that when the PID1 of a container dies, then the container itself dies and the restart policy is invoked.

kill docs: https://linux.die.net/man/1/kill
curl docs: https://curl.haxx.se/docs/manpage.html
docker restart docs: https://docs.docker.com/compose/compose-file/compose-file-v2/#restart

Gotchas: kill behaves differently in bash than in sh. In bash you can use -1 to signal all the processes with PID greater than 1 to die.

小开

Unhealthy docker containers may be restarted with simple crontab rule:

* * * * * docker ps -f health=unhealthy --format "docker restart \{\{.ID}}" | sh