Building Minimal Docker Containers for Python Applications

A best practice when creating Docker containers is keeping the image size to a minimum. The fewer bytes you have to shunt over the network or store on disk, the better. Keeping the size down generally means it is faster to build and deploy your container.

Each container should contain the application code, language-specific dependencies, OS dependencies, and that's it. Anything more is a waste and a potential security issue. If you have tools like gcc inside a container that is deployed to production, then an attacker with shell access can easily build tools to access other internal systems. Having layers of security minimises the damage one attack can cause.

Python in Docker

I was recently working on a Python webserver. The requirements.txt looked something like:

Flask>=0.12,<0.13
flask-restplus>=0.9.2,<0.10
Flask-SSLify>=0.1.5,<0.2
Flask-Admin>=1.4.2,<1.5
gunicorn>=19,<20

Fat image

If you search Google you will find examples of Dockerfiles that look like:

FROM python:3.6
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["gunicorn", "-w 4", "main:app"]

This container image weighs in at 714MB!!

If you’re like me, then you’re scratching your head wondering “this is just a simple Python web app, why is it that big??” Let’s find a way to reduce that.
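
To see for yourself where those bytes go, build the image and ask Docker for a layer-by-layer breakdown (flask-app is just an example tag):

# Tag name is illustrative
docker build -t flask-app .
# Shows the size contributed by each layer of the image
docker history flask-app

Most of that weight comes from the python:3.6 base image itself, which is built on Debian and ships with a full build toolchain.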

Alpine

Minimalism is important, but too small can be harmful as well. You could build every container from scratch (the empty base image), but then you have to do without even low-level OS primitives like a shell, cat, find, etc. That very quickly becomes tedious and distracts from getting code in front of customers as fast as possible (one of our mantras). I have found that a pragmatic balance is using a base image such as Alpine. At the time of writing, the latest Alpine image (v3.7) weighs in at a very respectable 4.15MB, and you get a minimal POSIX environment with which to build your application.

FROM python:3.6-alpine
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["gunicorn", "-w 4", "main:app"]

Building this container results in an image size of 115MB. Of this, the base image is 89.2MB (at time of writing). That means that our app is responsible for the additional 25.8MB.
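
Running docker history over this build attributes the numbers (the tag is again illustrative): the layers inherited from python:3.6-alpine account for the 89.2MB, and the pip install layer carries most of the remaining 25.8MB.

# flask-app-alpine is an example tag
docker build -t flask-app-alpine .
docker history flask-app-alpine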

Development

Those with keen eyes and Docker experience will spot an issue with the Dockerfile above. Because COPY . /app runs before pip install, any change to the source code invalidates the cache for every subsequent layer, so each rebuild re-downloads and re-installs all of the dependencies. This is not good: it makes iterative development painfully slow. Let's rewrite the Dockerfile to take advantage of layer caching.

Layer caching

FROM python:3.6-alpine
COPY requirements.txt /
RUN pip install -r /requirements.txt
COPY src/ /app
WORKDIR /app
CMD ["gunicorn", "-w 4", "main:app"]

Rewriting our Dockerfile this way makes use of Docker's layer caching: as long as requirements.txt has not changed, rebuilds skip the pip install step entirely.

This makes our build fast, but it has no impact on the overall image size.
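
You can watch the cache at work by changing a source file without touching requirements.txt and rebuilding. Assuming the layout above (src/main.py is an assumed filename), something like:

# Modify application code only
touch src/main.py
# On rebuild, Docker reports "Using cache" for the pip install layer
docker build -t flask-app-alpine .

Only the COPY src/ /app layer and those after it are rebuilt; the dependency layer is reused.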

Cached dependencies

If you look closely at the output of the Docker build above, you should see something along the lines of:

Building wheels for collected packages: Flask-SSLify, Flask-Admin, itsdangerous, wtforms, MarkupSafe
Running setup.py bdist_wheel for Flask-SSLify: started
Running setup.py bdist_wheel for Flask-SSLify: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/70/14/5b/fbd15774657c5cadc661a66236d121640c60dd9382f2a28469
Running setup.py bdist_wheel for Flask-Admin: started
Running setup.py bdist_wheel for Flask-Admin: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/3f/0f/33/5e27d4e7ba9459198695c28f879659197e33be5d5338a07a1b
Running setup.py bdist_wheel for itsdangerous: started
Running setup.py bdist_wheel for itsdangerous: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/fc/a8/66/24d655233c757e178d45dea2de22a04c6d92766abfb741129a
Running setup.py bdist_wheel for wtforms: started
Running setup.py bdist_wheel for wtforms: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/36/35/f3/7452cd24daeeaa5ec5b2ea13755316abc94e4e7702de29ba94
Running setup.py bdist_wheel for MarkupSafe: started
Running setup.py bdist_wheel for MarkupSafe: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/88/a7/30/e39a54a87bcbe25308fa3ca64e8ddc75d9b3e5afa21ee32d57

When pip install ran, it also stored a copy of the downloaded dependencies in /root/.cache. This cache is useful when we're doing local development outside of Docker, but inside the image it is space that the application will never touch. This directory takes up 5MB of our 25MB 'app' image. Let's eliminate it by taking advantage of another Docker feature: multistage builds.
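
Before we do, you can measure the cache's footprint yourself by running du inside a container built from the current image (the tag is an example):

# BusyBox du in Alpine understands -s (summary) and -h (human-readable)
docker run --rm flask-app-alpine du -sh /root/.cache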

Multistage builds

Docker 17.05 added support for multistage builds. This means that dependencies can be built in one image and then imported into another. Rewritten to use a multistage build, our Dockerfile looks like this:

FROM python:3.6-alpine as base

# Build stage: install the dependencies into an isolated prefix
FROM base as builder
RUN mkdir /install
WORKDIR /install
COPY requirements.txt /requirements.txt
RUN pip install --prefix=/install -r /requirements.txt

# Final stage: copy across only the installed packages, leaving pip's cache behind
FROM base
COPY --from=builder /install /usr/local
COPY src /app
WORKDIR /app
CMD ["gunicorn", "-w", "4", "main:app"]

This Docker image weighs in at 105MB, with the compiled Python dependencies accounting for 20.4MB (I worked this out by running du -h /install from within the build container).
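
If you want to reproduce that measurement, Docker can stop the build at a named stage, leaving you an image of just the builder to poke around in (tags are examples):

# Build only the builder stage (--target has been supported since Docker 17.05)
docker build --target builder -t app-builder .
# Measure the compiled dependencies
docker run --rm app-builder du -sh /install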

105MB is a significant improvement over the 714MB that we started with!

Further down the rabbit hole

It is definitely possible to reduce the image size further, by installing Python from Alpine's own package repository (rather than using the python:3.6-alpine base image) and removing extraneous files like docs and tests at container build time (sketched below). I was able to get the image size to less than 70MB. But is it worth it? If you're doing many deploys per day to the same set of VMs, then there is a high chance that the majority of the layers that make up the image will already be cached on disk, meaning that there are diminishing returns as the image size approaches zero. Additionally, the resulting Dockerfile was complex and not simple to grok (simplicity is king). Pragmatism is important: letting someone else shoulder the maintenance of the base image allows you to focus on the business problem.
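
As a sketch of the "removing extraneous files" idea, an extra RUN in the builder stage can prune tests and compiled bytecode before /install is copied into the final image; exactly which paths are safe to drop is application-specific:

# In the builder stage, after pip install: delete files the app never reads at runtime
RUN find /install -depth \
      \( \( -type d -a \( -name test -o -name tests \) \) \
      -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) \
      -exec rm -rf '{}' +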

Conclusion

Docker is a powerful tool that allows us to bundle up our application alongside its language and OS dependencies. This is extremely valuable when we roll the image out to production, as it guarantees that the image we tested is the image that runs in production.

The Docker build system can produce images that are very large when written naively, but small, lightweight, and cacheable when done correctly.
