Reproducing Deployments with Docker-in-Docker

Debugging deployment issues locally by falling down the Docker-in-Docker (DinD) rabbit hole.

TL;DR:

You can run a # dockerd daemon inside a Docker container and use it with $ docker-compose ... to (almost) completely reproduce hosted deployment environments on your development laptop.

As noted in a recent post, I seem to have become the Docker wrangler on the team for my recent projects. Lately I'm also getting the distinct impression that containerization has reached, or is beginning to reach, that special point in the life-cycle of a technology where it is widely adopted enough to be misused and has been in use long enough for that misuse to hurt. That's one of those ironic signs of success and utility, but knowing it doesn't lessen the pain or cost of any individual case you run into.

Most recently, I was dealing with problematically fragile container-based deployments where the team was getting bitten over and over again by deployment details that weren't causing problems in our local container-based development environments but were causing problems when CD pushed our changes to real hosted deployments. Specifically, the hosted deployments were failing on such things as filesystem paths and IP addresses. To make things worse, the internal CI/CD pipeline was under-resourced and very slow, making the inner loop of testing changes painfully slow and wasteful.

Before you think it: yes, this is a project where many things need to be re-worked to align with current best practices and thus be less fragile, from how containers are used, to keeping CI/CD responsive, to how deployment is orchestrated. Deployments certainly should not be sensitive to changes in filesystem paths and IP addresses. The project was a Python 3 upgrade, however, and we'd done just about all there was appetite for in terms of cleanup work that wasn't directly related to the upgrade. As software developers, it's our job to figure out how to solve difficult technical problems, including the difficulty of getting sub-optimal technology usage to work.

At any rate, the tax we were paying for this deployment sensitivity was so high, and being paid so often, that I decided it was past time to find a way to reproduce the real hosted deployment environment more completely, preferably locally. Reproducing deployment hosts as thoroughly as possible without turning my development laptop into the deployment hosts is fundamentally an isolation problem, so I reached for containers and Docker. Specifically, I wanted to start with a container image as close to the VM image used by the deployment hosts as possible and figure out how to run the actual application container images within those “host containers”. In other words, can I run Docker containers within a Docker container? Yup, and it has been possible for some time now.
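Before building anything project-specific, it's worth a quick smoke test that nested Docker works at all on your machine. A minimal sketch, assuming a local Docker daemon; dind-test is just a throwaway container name I'm using for illustration:

```
$ docker run --privileged --name dind-test -d docker:dind
$ sleep 5  # give the nested dockerd a moment to start
$ docker exec dind-test docker run --rm hello-world
$ docker rm -f dind-test
```

The --privileged flag is what allows the nested dockerd to run at all; the same flag shows up in the docker-compose.yml below.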

It turns out that Docker now provides an official Docker-in-Docker (DinD) image. After reading its ./Dockerfile and some blog posts detailing how to run DinD, I decided it would be more efficient and maintainable to use the official docker:dind image as my base image and extend it to match the project's deployment hosts, rather than the other way around (starting with a base image close to the deployment hosts and getting DinD working in it). A quick ./Dockerfile reproduced the deployment host content in an image, including deployment host filesystem paths and users, e.g. /home/ec2-user/:

FROM docker:dind
# Defensive shell settings, avoid silent failures
SHELL ["/bin/ash", "-xeu", "-c"]

# Install required OS host packages
RUN \
        apk --no-cache add 'py-pip' 'python3-dev' 'libffi-dev' 'openssl-dev' 'gcc' \
        'libc-dev' 'rust' 'cargo' 'make' && \
        pip3 install 'docker-compose'

# Duplicate the AWS EC2 user
RUN adduser --disabled-password --gecos 'EC2 User,,,' ec2-user

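To spot-check that image before wiring it into Compose, you can build it and confirm the duplicated user exists. A hypothetical check; the tag matches the one used in the docker-compose.yml below:

```
$ docker build -t "bar-company/foo-host:dev" .
$ docker run --rm --entrypoint id "bar-company/foo-host:dev" ec2-user
```

If the adduser step worked, id reports the ec2-user uid/gid rather than an “unknown user” error.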
A ./docker-compose.yml file reproduces the running deployment hosts including network topology such as subnet and host IP addresses. I also abuse the ./docker-compose.yml file to build the application images:

version: "3.8"

networks:
  default:
    ipam:
      driver: "default"
      config:
        # Match the hosted deployment network
        - subnet: "10.81.82.0/24"

services:

# Build custom container image, to be loaded into the nested host Docker daemons.
  foo-app:
    build:
      context: "./foo-app/"
    image: "bar-company/foo-app:dev"
    # Build only, don't actually run as a service
    entrypoint: ["true"]

# Emulate deployment hosts running Docker daemons

  foo-host:
    build:
      context: "./"
    image: "bar-company/foo-host:dev"
    # Just build the image, don't actually run as a service
    entrypoint: ["true"]

  foo-host-corge:
    image: "bar-company/foo-host:dev"
    depends_on:
      - "foo-host"
    privileged: true
    networks:
      default:
        # Match the hosted deployment IP
        ipv4_address: "10.81.82.100"
    volumes:
      # Docker-in-Docker requires data on a real filesystem
      - "./var/lib/foo-host-corge/docker/:/var/lib/docker/"
      # Reproduce where project data is stored in hosted deployments
      - "./var/lib/foo-data/:/home/ec2-user/foo-data/"
      # Make source editable
      - "./foo-app/:/srv/foo-app/"
      # Reproduce deployment containers using a `$ docker-compose ...` configuration
      # specific to the deployment host
      - "./foo-host-corge/:/srv/foo-host-corge/"
    working_dir: "/srv/foo-host-corge/"

  foo-host-grault:
    image: "bar-company/foo-host:dev"
    depends_on:
      - "foo-host"
    privileged: true
    networks:
      default:
        # Match the hosted deployment IP
        ipv4_address: "10.81.82.101"
    volumes:
      # Docker-in-Docker requires data on a real filesystem
      - "./var/lib/foo-host-grault/docker/:/var/lib/docker/"
      # Reproduce where project data is stored in hosted deployments
      - "./var/lib/foo-data/:/home/ec2-user/foo-data/"
      # Make source editable
      - "./foo-app/:/srv/foo-app/"
      # Reproduce deployment containers using a `$ docker-compose ...` configuration
      # specific to the deployment host
      - "./foo-host-grault/:/srv/foo-host-grault/"
    working_dir: "/srv/foo-host-grault/"

Nested ./foo-host-corge/docker-compose.yml and ./foo-host-grault/docker-compose.yml files reproduce how the application containers are run on the real deployment hosts, but run instead within the nested DinD containers that reproduce the deployment hosts:

version: "3.8"

services:

  postgres:
    image: "postgres"
    ports:
      - "5432:5432"
    env_file:
      - "./.env"

  foo-app:
    image: "bar-company/foo-app:dev"
    ports:
      - "80:80"
    extra_hosts:
      - "redis:10.81.82.101"
    volumes:
      # Make source editable
      - "/srv/foo-app/:/srv/foo-app/"
      # Reproduce where project data is stored in hosted deployments
      - "/home/ec2-user/foo-data/:/srv/foo-data/"

And the nested ./foo-host-grault/docker-compose.yml file:

version: "3.8"

services:

  redis:
    image: "redis"
    ports:
      - "6379:6379"

  foo-app:
    image: "bar-company/foo-app:dev"
    ports:
      - "80:80"
    extra_hosts:
      - "postgres:10.81.82.100"
    volumes:
      # Make source editable
      - "/srv/foo-app/:/srv/foo-app/"
      # Reproduce where project data is stored in hosted deployments
      - "/home/ec2-user/foo-data/:/srv/foo-data/"

A few $ docker ... commands and a bit of shell push the application images into the nested # dockerd daemons, and $ docker-compose ... commands then run containers from those images along with the rest of the deployment configuration. I use a ./Makefile to stitch this all together, but use whatever you like:

# Reproduce hosted deployments in nested Docker (DinD)

### Defensive settings for make:
#     https://tech.davis-hansson.com/p/make/
SHELL := bash
.ONESHELL:
.SHELLFLAGS:=-xeu -o pipefail -O inherit_errexit -c
.SILENT:
.DELETE_ON_ERROR:
MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules


# Top-level targets

.PHONY: all
all: var/log/docker-compose-build.log foo-host-corge/.env

.PHONY: run
run: all
	docker-compose exec -T "foo-host-corge" docker-compose up -d
	docker-compose exec -T "foo-host-grault" docker-compose up -d
	sleep 1
	docker-compose exec -T "foo-host-corge" docker-compose ps
	docker-compose exec -T "foo-host-grault" docker-compose ps

.PHONY: test
test: run
# Demonstrate that deployment host network topology and hostnames are reproduced
	docker-compose exec foo-host-corge \
	    docker-compose exec foo-app nc -vz redis 6379
	docker-compose exec foo-host-grault \
	    docker-compose exec foo-app nc -vz postgres 5432
# Demonstrate that deployment host filesystem paths and data are reproduced
	sudo rm -rfv ./var/lib/foo-data/*
	ls -al "./var/lib/foo-data/"
	docker-compose exec foo-host-corge \
	    ls -al "/home/ec2-user/foo-data/"
	docker-compose exec foo-host-grault \
	    ls -al "/home/ec2-user/foo-data/"
	docker-compose exec foo-host-corge docker-compose exec foo-app \
	    ls -al "/srv/foo-data/"
	docker-compose exec foo-host-grault docker-compose exec foo-app \
	    ls -al "/srv/foo-data/"
	docker-compose exec foo-host-grault docker-compose exec foo-app \
	    test ! -e "/srv/foo-data/bar.txt"
	sudo touch "./var/lib/foo-data/bar.txt"
	docker-compose exec foo-host-corge docker-compose exec foo-app \
	    ls -al "/srv/foo-data/"
	docker-compose exec foo-host-grault docker-compose exec foo-app \
	    ls -al "/srv/foo-data/"
	docker-compose exec foo-host-grault docker-compose exec foo-app \
	    test -e "/srv/foo-data/bar.txt"

.PHONY: clean
clean:
	docker-compose down --rmi local -v
	sudo rm -rf "./var/lib/"
	mkdir -pv "./var/log/backups/"
	test ! -e "./var/log/docker-compose-build.log" ||
	    mv --backup=numbered -v "./var/log/docker-compose-build.log" \
	        "./var/log/backups/"
	test ! -e "./foo-host-corge/.env" ||
	    mv --backup=numbered -v "./foo-host-corge/.env" \
	        "./var/log/backups/"

# Real targets

var/log/docker-compose-build.log: foo-app/Dockerfile Dockerfile docker-compose.yml
	mkdir -pv "./$(dir $(@))/"
	sudo docker-compose build --pull | tee -a "./$(@)"
# Wait for the nested deployment host `# dockerd` daemons to become available
	docker-compose up -d
	for host in foo-host-corge foo-host-grault
	do
	    timeout --foreground -v 20 $(SHELL) $(.SHELLFLAGS) "\
		while ! docker-compose exec -T "$${host}" docker ps; do sleep 0.1; done"
# Load the built image into the deployment host Docker daemons
	    docker save "bar-company/foo-app:dev" |
		docker-compose exec -T "$${host}" docker load
	done

foo-host-corge/.env:
	echo "POSTGRES_PASSWORD=$$(apg -M NCL -n 1)" >"./$(@)"

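One portability note: the foo-host-corge/.env recipe above assumes the apg password generator is installed. Where it isn't, openssl (a stand-in I'm substituting here, not what the Makefile uses) can generate a similar throwaway alphanumeric password:

```shell
# Generate a 16-character alphanumeric password, roughly equivalent to
# `apg -M NCL -n 1`; openssl is a stand-in for hosts without apg
openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | cut -c1-16
```
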
Once it’s up and running, we can see that the network topology, hostnames, and containerized services and applications are running as they would in the real hosted deployment:

$ make test
+ docker-compose exec -T foo-host-corge docker-compose up -d
foo-host-corge_foo-app_1 is up-to-date
foo-host-corge_postgres_1 is up-to-date
+ docker-compose exec -T foo-host-grault docker-compose up -d
foo-host-grault_redis_1 is up-to-date
foo-host-grault_foo-app_1 is up-to-date
+ sleep 1
+ docker-compose exec -T foo-host-corge docker-compose ps
          Name                         Command               State           Ports
-------------------------------------------------------------------------------------------
foo-host-corge_foo-app_1    python -m http.server --di ...   Up      0.0.0.0:80->80/tcp
foo-host-corge_postgres_1   docker-entrypoint.sh postgres    Up      0.0.0.0:5432->5432/tcp
+ docker-compose exec -T foo-host-grault docker-compose ps
          Name                         Command               State           Ports
-------------------------------------------------------------------------------------------
foo-host-grault_foo-app_1   python -m http.server --di ...   Up      0.0.0.0:80->80/tcp
foo-host-grault_redis_1     docker-entrypoint.sh redis ...   Up      0.0.0.0:6379->6379/tcp
+ docker-compose exec foo-host-corge docker-compose exec foo-app nc -vz redis 6379
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.81.82.101:6379.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
+ docker-compose exec foo-host-grault docker-compose exec foo-app nc -vz postgres 5432
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.81.82.100:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
+ sudo rm -rfv './var/lib/foo-data/*'
+ ls -al ./var/lib/foo-data/
total 8
drwxr-xr-x 2 root root 4096 Apr  5 08:49 .
drwxr-xr-x 5 root root 4096 Apr  5 08:45 ..
+ docker-compose exec foo-host-corge ls -al /home/ec2-user/foo-data/
total 8
drwxr-xr-x    2 root     root          4096 Apr  5 15:49 .
drwxr-sr-x    1 ec2-user ec2-user      4096 Apr  5 15:45 ..
+ docker-compose exec foo-host-grault ls -al /home/ec2-user/foo-data/
total 8
drwxr-xr-x    2 root     root          4096 Apr  5 15:49 .
drwxr-sr-x    1 ec2-user ec2-user      4096 Apr  5 15:45 ..
+ docker-compose exec foo-host-corge docker-compose exec foo-app ls -al /srv/foo-data/
total 8
drwxr-xr-x 2 root root 4096 Apr  5 15:49 .
drwxr-xr-x 1 root root 4096 Apr  5 15:45 ..
+ docker-compose exec foo-host-grault docker-compose exec foo-app ls -al /srv/foo-data/
total 8
drwxr-xr-x 2 root root 4096 Apr  5 15:49 .
drwxr-xr-x 1 root root 4096 Apr  5 15:45 ..
+ docker-compose exec foo-host-grault docker-compose exec foo-app test '!' -e /srv/foo-data/bar.txt
+ sudo touch ./var/lib/foo-data/bar.txt
+ docker-compose exec foo-host-corge docker-compose exec foo-app ls -al /srv/foo-data/
total 8
drwxr-xr-x 2 root root 4096 Apr  5 15:50 .
drwxr-xr-x 1 root root 4096 Apr  5 15:45 ..
-rw-r--r-- 1 root root    0 Apr  5 15:50 bar.txt
+ docker-compose exec foo-host-grault docker-compose exec foo-app ls -al /srv/foo-data/
total 8
drwxr-xr-x 2 root root 4096 Apr  5 15:50 .
drwxr-xr-x 1 root root 4096 Apr  5 15:45 ..
-rw-r--r-- 1 root root    0 Apr  5 15:50 bar.txt
+ docker-compose exec foo-host-grault docker-compose exec foo-app test -e /srv/foo-data/bar.txt

There you have it. The demo code is publicly available. I think we can find more uses for DinD. In particular, I have always found local clean-build issues to be a persistent plague in development: “It worked for me when I committed it!”. Next up, I’m working on using DinD as a route to a generic local clean-build test.
