Docker 数据量与挂载的主机目录

我们可以在 Docker 上建立一个数据量:

$ docker run -v /path/to/data/in/container --name test_container debian
$ docker inspect test_container
...
Mounts": [
{
"Name": "fac362...80535",
"Source": "/var/lib/docker/volumes/fac362...80535/_data",
"Destination": "/path/to/data/in/container",
"Driver": "local",
"Mode": "",
"RW": true
}
]
...

但是如果数据卷位于 /var/lib/docker/volumes/fac362...80535/_data中,那么它与使用 -v /path/to/data/in/container:/home/user/a_good_place_to_have_data将数据挂载到文件夹中有什么不同吗?

88633 次浏览

is it any different from having the data in a folder mounted using -v /path/to/data/in/container:/home/user/a_good_place_to_have_data?

It is because, as mentioned in "Mount a host directory as a data volume"

The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile because built images should be portable. A host directory wouldn’t be available on all potential hosts.

If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it’s best to create a named Data Volume Container, and then to mount the data from it.

You can combine both approaches:

 docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata

Here we’ve launched a new container and mounted the volume from the dbdata container.
We’ve then mounted a local host directory as /backup.
Finally, we’ve passed a command that uses tar to backup the contents of the dbdata volume to a backup.tar file inside our /backup directory. When the command completes and the container stops we’ll be left with a backup of our dbdata volume.

The difference between host directory and a data volume is in that that Docker manages the latter by placing it into the $DOCKER-DATA-DIR/volumes directory and attaching a reference to it (names or randomly generated ids). That is you get a little bit of convenience.

Both host directories and data volumes are directories on the host. Both are host dependent. You can't reference either of them in a Dockerfile; the VOLUME directive creates a new nameless (with randomly generated id) volume every time you launch a new container and cannot reference an existing volume.

* $DOCKER-DATA-DIR is /var/lib/docker here unless you changed the defaults.

Although using volumes and bind mounts feels the same (with the only change being the location of the directory), there are differences in behavior.

Volumes vs Bind Mounts

  • With Bind Mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full or relative path on the host machine.
  • With Volume, a new directory is created within Docker's storage directory on the host machine, and Docker manages that directory's content.

Volumes advantages over bind mounts:

  • Volumes are easier to back up or migrate than bind mounts.
  • You can manage volumes using Docker CLI commands or the Docker API.
  • Volumes work on both Linux and Windows containers.
  • Volumes can be more safely shared among multiple containers.
  • Volume drivers allow you to store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
  • A new volume’s contents can be pre-populated by a container.

EDIT (9.9.2019):
According to @Sebi2020 comment, Bind mounts are much easier to backup. Docker doesn't provide any command to backup volumes. You have to use temporary containers with a bind mount to create backups.

Volumes

Created and managed by Docker. You can create a volume explicitly using the docker volume create command, or Docker can create a volume during container or service creation.

When you create a volume, it is stored within a directory on the Docker host. When you mount the volume into a container, this directory is what is mounted into the container. This is similar to the way that bind mounts work, except that volumes are managed by Docker and are isolated from the core functionality of the host machine.

A given volume can be mounted into multiple containers simultaneously. When no running container is using a volume, the volume is still available to Docker and is not removed automatically. You can remove unused volumes using docker volume prune.

When you mount a volume, it may be named or anonymous. Anonymous volumes are not given an explicit name when they are first mounted into a container, so Docker gives them a random name that is guaranteed to be unique within a given Docker host. Besides the name, named and anonymous volumes behave in the same ways.

Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities.

enter image description here

Bind mounts

Available since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full path on the host machine. The file or directory does not need to exist on the Docker host already. It is created on demand if it does not yet exist. Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available. If you are developing new Docker applications, consider using named volumes instead. You can’t use Docker CLI commands to directly manage bind mounts.

enter image description here

There is also tmpfs mounts.
tmpfs mounts

A tmpfs mount is not persisted on disk, either on the Docker host or within a container. It can be used by a container during the lifetime of the container, to store non-persistent state or sensitive information. For instance, internally, swarm services use tmpfs mounts to mount secrets into a service’s containers.
enter image description here

Reference:
https://docs.docker.com/storage/

Yes, this is quite different from a few perspectives. Like you wrote in the question's title, it is about understanding why we need data volumes vs bind mount to host.

Part 1 - Basic scenarios with examples

Lets take 2 scenarios.

Case 1: Web server
We want to provide our web server a configuration file that might change frequently.
For example: exposing ports according to the current environment.
We can rebuild the image each time with the relevant setup or create 2 different images for each environment. Both of this solutions aren’t very efficient.

With Bind mounts Docker mounts the given source directory into a location inside the container.
(The original directory / file in the read-only layer inside the union file system will simply be overridden).

For example - binding a dynamic port to nginx:

version: "3.7"
services:
web:
image: nginx:alpine
volumes:
- type: bind #<-----Notice the type
source: ./mysite.template
target: /etc/nginx/conf.d/mysite.template
ports:
- "9090:8080"
environment:
- PORT=8080
command: /bin/sh -c "envsubst < /etc/nginx/conf.d/mysite.template >
/etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"

(*) Notice that this example could also be solved using Volumes.

Case 2 : Databases
Docker containers do not store persistent data -- any data that will be written to the writable layer in container’s union file system will be lost once the container stops running.

But what if we have a database running on a container, and the container stops - that means that all the data will be lost?

Volumes to the rescue.
Those are named file system trees which are managed for us by Docker.

For example - persisting Postgres SQL data:

services:
db:
image: postgres:latest
volumes:
- "dbdata:/var/lib/postgresql/data"
volumes:
- type: volume #<-----Notice the type
source: dbdata
target: /var/lib/postgresql/data
volumes:
dbdata:

Notice that in this case, for named volumes, the source is the name of the volume (for anonymous volumes, this field is omitted).

Part 2 - Comparison

Differences in management and isolation on the host

Bind mounts exist on the host file system and being managed by the host maintainer.
Applications / processes outside of Docker can also modify it.

Volumes can also be implemented on the host, but Docker will manage them for us and they can not be accessed outside of Docker.

Volumes are a much wider solution

Although both solutions help us to separate the data lifecycle from containers, by using Volumes you gain much more power and flexibility over your system.

With Volumes we can design our data effectively and decouple it from other parts of the system by storing it in dedicated remote locations (e.g., in the cloud) and integrate it with external services like backups, monitoring, encryption and hardware management.