Smaller Docker containers for Go apps

Update January 2018: Multi-stage builds, which were introduced in Docker 17.05, are an easier way to achieve the same small Docker images.
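
For reference, a multi-stage version of the build described in this post might look something like the sketch below. It is written as a small shell function that writes out a Dockerfile, and it assumes the same golang:1.4.2 and joeshaw/busybox-nonroot images used later in the post. The function name, the "builder" stage name, and the output path are hypothetical, and the resulting image has not been measured against the numbers that follow.

```shell
# Writes a multi-stage Dockerfile to the given path (default: ./Dockerfile).
# Building it requires Docker 17.05 or later.
write_multistage_dockerfile() {
    cat > "${1:-Dockerfile}" <<'EOF'
# Stage 1: build the binary with the full Go toolchain.
FROM golang:1.4.2 AS builder
RUN go get github.com/joeshaw/qotd

# Stage 2: copy only the binary into a small glibc-based image.
FROM joeshaw/busybox-nonroot
EXPOSE 1717
COPY --from=builder /go/bin/qotd /bin/qotd
USER nobody
CMD qotd
EOF
}

# Usage (assumes Docker is installed):
#   write_multistage_dockerfile Dockerfile && docker build -t qotd .
```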

At litl we use Docker images to package and deploy our Room for More services, using our Galaxy deployment platform. This week I spent some time looking into how we might reduce the size of our images and speed up container deployments.

Most of our services are in Go, and thanks to the fact that compiled Go binaries are mostly-statically linked by default, it’s possible to create containers with very few files within. It’s surely possible to use these techniques to create tighter containers for other languages that need more runtime support, but for this post I’m only focusing on Go apps.

The old way

We built images in a very traditional way, using a base image built on top of Ubuntu with Go 1.4.2 installed. For my examples I’ll use something similar.

Here’s a Dockerfile:

FROM golang:1.4.2
EXPOSE 1717

RUN go get github.com/joeshaw/qotd

# Don't run network servers as root in Docker
USER nobody

CMD qotd

The golang:1.4.2 base image is built on top of Debian Jessie. Let’s build this bad boy and see how big it is.

$ docker build -t qotd .
...
Successfully built ae761b93e656

$ docker images qotd
REPOSITORY     TAG         IMAGE ID          CREATED           VIRTUAL SIZE
qotd           latest      ae761b93e656      3 minutes ago     520.3 MB

Yikes. Half a gigabyte. Ok, what leads us to a container this size?

$ docker history qotd
IMAGE               CREATED BY                                      SIZE
ae761b93e656        /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "qotd"]   0 B
b77d0ca3c501        /bin/sh -c #(nop) USER [nobody]                 0 B
a4b2a01d3e42        /bin/sh -c go get github.com/joeshaw/qotd       3.021 MB
c24802660bfa        /bin/sh -c #(nop) EXPOSE 1717/tcp               0 B
124e2127157f        /bin/sh -c #(nop) COPY file:56695ddefe9b0bd83   2.481 kB
69c177f0c117        /bin/sh -c #(nop) WORKDIR /go                   0 B
141b650c3281        /bin/sh -c #(nop) ENV PATH=/go/bin:/usr/src/g   0 B
8fb45e60e014        /bin/sh -c #(nop) ENV GOPATH=/go                0 B
63e9d2557cd7        /bin/sh -c mkdir -p /go/src /go/bin && chmod    0 B
b279b4aae826        /bin/sh -c #(nop) ENV PATH=/usr/src/go/bin:/u   0 B
d86979befb72        /bin/sh -c cd /usr/src/go/src && ./make.bash    97.4 MB
8ddc08289e1a        /bin/sh -c curl -sSL https://golang.org/dl/go   39.69 MB
8d38711ccc0d        /bin/sh -c #(nop) ENV GOLANG_VERSION=1.4.2      0 B
0f5121dd42a6        /bin/sh -c apt-get update && apt-get install    88.32 MB
607e965985c1        /bin/sh -c apt-get update && apt-get install    122.3 MB
1ff9f26f09fb        /bin/sh -c apt-get update && apt-get install    44.36 MB
9a61b6b1315e        /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
902b87aaaec9        /bin/sh -c #(nop) ADD file:e1dd18493a216ecd0c   125.2 MB
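
To see how those layers add up, the sizes can be totaled with a little awk. This is a rough sketch that assumes the "number, then unit" layout shown in the listing above and decimal units (1 MB = 1000 kB). Newer Docker versions format this output differently, and sum_layer_sizes is a name made up for this example.

```shell
# Reads `docker history` output on stdin and prints the total size in MB.
# Assumes the last two fields of each layer line are a number and a unit.
sum_layer_sizes() {
    awk '
        NF >= 2 {
            n = $(NF - 1); unit = $NF
            if (unit == "B")       total += n / 1000000
            else if (unit == "kB") total += n / 1000
            else if (unit == "MB") total += n
            else if (unit == "GB") total += n * 1000
        }
        END { printf "%.1f MB\n", total }
    '
}

# Usage:
#   docker history qotd | sum_layer_sizes
```

Fed the listing above, it prints 520.3 MB, which matches the size reported by docker images.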

This is not a very lean container, and it has a lot of intermediate layers. To reduce the size of our containers, we took two additional steps:

(1) Every repo has a clean.sh script that is run inside the container after it is initially built. Here’s part of a script for one of our Ubuntu-based Go images:

apt-get purge -y software-properties-common byobu curl git htop man unzip vim \
python-dev python-pip python-virtualenv \
python2.7 libpython2.7-stdlib:amd64 libpython2.7-minimal:amd64 \
libgcc-4.8-dev:amd64 cpp-4.8 libruby1.9.1 perl-modules vim-runtime \
vim-common vim-tiny libpython3.4-stdlib:amd64 python3.4-minimal xkb-data \
xml-core libx11-data fonts-dejavu-core groff-base eject python3 locales \
python-software-properties supervisor git-core make wget cmake gcc bzr mercurial \
libglib2.0-0:amd64 libxml2:amd64

apt-get clean autoclean
apt-get autoremove -y

rm -rf /usr/local/go
rm -rf /usr/local/go1.*.linux-amd64.tar.gz
rm -rf /var/lib/{apt,dpkg,cache,log}/
rm -rf /var/{cache,log}

(2) We run Jason Wilder’s excellent docker-squash tool. It is especially helpful when combined with the clean.sh script above.

These steps are time-intensive. Cleaning and squashing take minutes and dominate the overall build and deploy time.

In the end, we have built a mostly-statically linked Go binary sitting alongside an entire Debian or Ubuntu operating system. We can do better.

Separating containers for building and running

A handful of good blog posts have already covered how to do this, including one by Atlassian this week. Here’s another one from Xebia, and another from Codeship.

However, all these posts focus on building a completely static Go binary. This means you eschew cgo, and the benefits that go along with it, by setting CGO_ENABLED=0. On OS X, you lose access to the system’s SSL root CA certificates. On Linux, user.Current() from the os/user package no longer works. And in both cases you must use the Go DNS resolver rather than the one provided by the operating system. If you are not testing your application with CGO_ENABLED=0 prior to building a Docker container with it, then you are not testing the code you ship.
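
One way to act on that last point is to run the test suite under both settings. The helper below is a minimal sketch. The name run_in_both_cgo_modes is hypothetical, and it simply runs whatever command it is given twice, once with CGO_ENABLED=0 and once with CGO_ENABLED=1, stopping at the first failure.

```shell
# Runs the given command once per cgo setting and fails fast.
run_in_both_cgo_modes() {
    for mode in 0 1; do
        echo "== CGO_ENABLED=$mode: $*" >&2
        CGO_ENABLED=$mode "$@" || return 1
    done
}

# Usage (assumes the Go toolchain is on PATH):
#   run_in_both_cgo_modes go test ./...
```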

We can use a few purpose-built base Docker images and the tricks from Jamie McCrindle’s Dockerception to build two separate Docker containers: one larger container to build our software and another smaller one to run it.

The builder

We create a Dockerfile.build, which is responsible for initializing the build environment and building the software:

FROM golang:1.4.2

RUN go get github.com/joeshaw/qotd
COPY Dockerfile.run /

# This command outputs a tarball which can be piped into
# `docker build -f Dockerfile.run -`
CMD tar -cf - -C / Dockerfile.run -C $GOPATH/bin qotd

This container, when run, will output a tarball to standard out, containing only our qotd binary and Dockerfile.run, used to build the runner.
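
It can be worth checking that the tarball really contains what the runner build expects before piping it into docker build. Here is a small sketch, assuming a tar that can read an archive from standard input. check_build_context is a hypothetical helper name.

```shell
# Reads a tar stream on stdin and verifies that it contains both files
# the runner build needs: Dockerfile.run and the qotd binary.
check_build_context() {
    listing=$(tar -tf -) || return 1
    for want in Dockerfile.run qotd; do
        if ! printf '%s\n' "$listing" | grep -qx "$want"; then
            echo "missing from build context: $want" >&2
            return 1
        fi
    done
    echo "build context ok"
}

# Usage:
#   docker run --rm qotd-builder | check_build_context
```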

Dynamically linked binary

Notice that we did not set CGO_ENABLED=0 here, so our binary is still dynamically linked against GNU libc:

$ ldd $GOPATH/bin/qotd
	linux-vdso.so.1 (0x00007ffea6b8a000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6e76e50000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6e76aa7000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6e7706d000)

We need to run this binary in an environment that has glibc available. That means we cannot use stock BusyBox (which uses uClibc) or Alpine (which uses musl). However, the BusyBox distribution that ships with Ubuntu is linked against glibc, and that’ll be the foundation for our running container.
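
A quick way to tell which case a given binary falls into is to look for glibc’s libc.so.6 in the ldd output. The sketch below assumes ldd output in the format shown above. links_glibc is a name invented for this example, and matching on the library name is a heuristic rather than a definitive test.

```shell
# Reads `ldd` output on stdin; succeeds if the binary links against
# glibc (libc.so.6), fails otherwise (static binary, musl, uClibc, ...).
links_glibc() {
    grep -q 'libc\.so\.6'
}

# Usage:
#   if ldd "$GOPATH/bin/qotd" | links_glibc; then
#       echo "needs a glibc-based runtime image"
#   fi
```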

The busybox:ubuntu-14.04 image only has a root user, but you should never run network-facing servers as root, even in a container. Instead, use my joeshaw/busybox-nonroot image, which adds a nobody user with UID 1.

The runner

Now we create a Dockerfile.run, which is responsible for creating the environment in which to run our app:

FROM joeshaw/busybox-nonroot
EXPOSE 1717

COPY qotd /bin/qotd

USER nobody
CMD qotd

Putting them together

First, create the builder image:

docker build -t qotd-builder -f Dockerfile.build .

Next, run the builder container, piping its output into the creation of the runner container:

docker run --rm qotd-builder | docker build -t qotd -f Dockerfile.run -

Now we have a qotd container which has the basic BusyBox environment, plus our qotd binary. The size?

$ docker images qotd
REPOSITORY     TAG         IMAGE ID          CREATED           VIRTUAL SIZE
qotd           latest      92e7def8f105      3 minutes ago     8.611 MB

Under 9 MB. Much improved. Better still, it doesn’t require squashing, which saves us a lot of time.
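
The two build commands above are easy to wrap in a script. This is a sketch rather than our actual build tooling. build_small_image is a hypothetical function, and it assumes Dockerfile.build and Dockerfile.run sit in the current directory, as in the examples above.

```shell
# Builds "<name>-builder" from Dockerfile.build, then pipes the tarball
# it emits into a second build that produces the small "<name>" image.
build_small_image() {
    name=$1
    docker build -t "$name-builder" -f Dockerfile.build . || return 1
    docker run --rm "$name-builder" | docker build -t "$name" -f Dockerfile.run -
}

# Usage:
#   build_small_image qotd
```

Note that the pipeline reports only the exit status of the final docker build, so a failure in the builder container shows up as a broken or empty build context rather than as its own error.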

Conclusion

In this example, we were able to go from a 520 MB image built from golang:1.4.2, containing a whole Debian installation, down to a 9 MB image of just BusyBox and our binary. That’s a 98% reduction in size.
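
The arithmetic behind that figure, as a quick sketch using awk (the function name is made up for this example):

```shell
# Prints the percentage reduction from a "before" size to an "after" size.
percent_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - after / before) }'
}

# Usage, with the sizes reported by `docker images` above:
#   percent_reduction 520.3 8.611    # prints 98.3
```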

For one of our real services at litl, we reduced the image size from 300 MB (squashed) to 25 MB and the time to build and deploy the container from 8 minutes to 2. That time is now dominated by building the container and software, and not by cleaning and squashing the resulting image. We didn’t have to give up on using cgo and glibc, as some of their features are essential to us. If you’re using Docker to deploy services written in Go, this approach can save you a lot of time and disk space. Good luck!