Data persistence in Docker

Containers are, by nature, stateless: data written inside a container exists only as long as the container itself. On the one hand, this allows for quick and repeatable deployments; on the other hand, it complicates data migration, and the data is lost whenever the container has to be deleted. Additionally, writing to a container’s writable layer requires a storage driver to manage the filesystem, and this extra abstraction reduces performance compared with writing directly to the host filesystem.

Docker offers three main options for storing data outside a container’s writable layer:

Volumes – Volumes store data in a part of the host filesystem that is managed by Docker and mount it into the container. Because the data lives outside the container, anything your application writes there is not lost when the container is removed.
Bind mounts – When you want to share files between the host’s filesystem and a container so that changes are immediately visible on both sides, use bind mounts.
tmpfs mounts – Data is stored in the host’s memory, outside the container’s writable layer, and is lost when the container stops or is deleted. This is useful for sensitive data that should be persisted neither on the host filesystem nor in the container’s writable layer. This option is available only on Linux.

Regardless of which option you choose, the data looks the same from inside the container: it appears as a directory or an individual file in the container’s filesystem.

Volumes

Volumes are stored in a part of the host filesystem managed by Docker. On Linux this is /var/lib/docker/volumes/ by default.
One volume can be mounted into multiple containers at the same time.
You can create a volume explicitly using the command below, or let Docker create it automatically when it creates a container:

docker volume create [name of the volume]

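Whether or not the volume was created explicitly, you attach it to a container with the --mount option when starting the container (a named volume that does not exist yet is created on the fly):

# while starting a container:
docker run --mount type=volume,src=[name of the volume],target=[path where to mount volume in container] [image]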

To display information about a volume (its driver, mount point on the host, labels, and so on), use:

docker volume inspect [name of the volume]

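Named volumes are not removed automatically, even when no container uses them anymore. You can delete a specific volume by name, or remove all unused volumes at once (since Docker Engine 23.0, docker volume prune removes only unused anonymous volumes unless you add --all):

docker volume rm [name of the volume]
docker volume prune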

If you don’t specify a name for a volume (Docker then assigns it a random, unique name) and you start the container with the --rm flag, this anonymous volume is deleted together with the container.
You can use a volume driver to store data on a remote host or with a cloud provider instead of on the local machine.
When you mount an empty volume into a directory in a container that already contains data, this data is copied into the volume. You can use this to pre-populate a volume with data that another container needs.
When you mount a non-empty volume into a directory in a container that already contains data, this data is hidden (but not deleted) while the volume is mounted.

When to use volumes?

to share persistent data between multiple containers
when you want to decouple the host’s configuration (such as its directory structure) from the container runtime
when you want to store data on a remote host or with a cloud provider
when you want to back up/restore/migrate data

Bind mount

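A bind mount can point to any file or directory on the host filesystem.
If the referenced path doesn’t exist on the host, the -v/--volume flag creates it (always as a directory), whereas --mount reports an error instead of creating it.
Bind mounts can’t be managed directly with Docker CLI commands (there is no counterpart to docker volume create or docker volume rm).
To attach a bind mount, use the following option when starting a container:

docker run --mount type=bind,src=[path_in_host_filesystem],target=[path_in_container] [image]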

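When you bind-mount into a directory in a container that already contains data, this data is hidden (but not deleted) while the bind mount is in place. Unlike with an empty volume, nothing is copied from the container into the host directory.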

When to use bind mounts?

to share data between the host and a container, e.g. configuration files or source code

tmpfs

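data is stored only in the host system’s memory and is never written to the host filesystem; it is removed when the container stops, and a tmpfs mount can’t be shared between containers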

When to use tmpfs?

when your app needs to write a large amount of non-persistent state data
when you don’t want sensitive data to persist

