Restarting an unhealthy docker container based on healthcheck
I am using Docker version 17.09.0-ce, and I see that containers are marked as unhealthy. Is there an option to get the container restart instead of keeping the container as unhealthy?
Docker has a couple of ways to get details on container health. You can configure health checks and how often they run. Also, health checks can be run on applications running inside a container, like http (this would use curl --fail option.) You can view the health_status event to get details.
You should resolve the error condition causing the "unhealthy" tag (anytime the health check command runs and gets an exit code of 1) first. This may or may not require that Docker restart the container, depending on the error. If you are starting/restarting your containers automatically, then either trapping the start errors or logging them and the health check status can help address errors quickly. Check the link if you are interested in auto start.
Restarting of unhealty container feature was in the original PR (https://github.com/moby/moby/pull/22719), but was removed after a discussion and considered to be done later as enhancement of RestartPolicy.
In the use of healthcheck, pay attention to the following points:
For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.
For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.
The important step to understand here is that the retry logic is self-contained in the curl command, the Docker retry here actually is mandatory but useless. Then if the curl HTTP request fails 3 times, then kill is executed. First it sends a SIGTERM to all the processes in the container, to allow them to gracefully stop, then after 10 seconds it sends a SIGKILL to completely kill all the processes in the container. It must be noted that when the PID1 of a container dies, then the container itself dies and the restart policy is invoked.