As more and more applications are developed in Python, containerizing them has become an important task for developers. Docker, the popular containerization technology, allows developers to package their applications into lightweight, portable containers that can run on any system that supports Docker. In this article, we will discuss how to optimize and secure a Python application with Docker by analyzing the sample Dockerfile provided below:

The Code

Repository available here

# Use an official Python runtime as a parent image
FROM python:3.9-slim@sha256:2bac43769ace90ebd3ad83e5392295e25dfc58e58543d3ab326c3330b505283d as build_stage

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY app.py /app
COPY .env /app

# Install pipreqs
RUN pip install --no-cache-dir pipreqs

# Generate requirements.txt file based on app.py
RUN pipreqs /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Cleanup after ourselves
RUN pip uninstall -y pipreqs
RUN pip cache purge

FROM build_stage as run_stage

# Create a new user with a specific user ID
RUN useradd --uid 1001 app_admin

# Switch to the new user
USER app_admin

COPY . /app

# Run app.py when the container launches
CMD ["python", "app.py"]

This Dockerfile achieves three things:

  • Uses a minimal image, an explicit tag with a pinned digest, and a staged build process to increase security
  • Automatically finds and installs dependencies using pipreqs
  • Optimizes the build process by separating dependencies from source code

Below is a breakdown of each line of the Dockerfile and how it contributes to a secure, optimized, and easy-to-use containerized environment for Python applications.

Stage 1: Build the Requirements

This Dockerfile uses a multi-stage approach.

There are several benefits of writing a multi-stage Dockerfile:

  1. Smaller image size: Multi-stage builds allow you to build a final image that only contains the necessary files and dependencies for your application to run. By separating the build process from the final runtime image, you can reduce the size of the final image.
  2. Faster build times: When you use a multi-stage build, Docker can reuse previously built stages if the contents of those stages haven’t changed. This can significantly speed up the build process, especially when building large or complex images.
  3. Improved security: By separating the build process from the final runtime image, you can reduce the risk of including unnecessary files and dependencies in your final image. This can help improve the security of your application.
  4. Better organization: Multi-stage builds can help you organize your Dockerfile into logical stages, making it easier to understand and maintain.

Overall, multi-stage builds can help you create more efficient and secure Docker images while also reducing the time it takes to build and deploy your application.
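To make the pattern concrete, here is a minimal illustrative sketch of a multi-stage build (not the article's Dockerfile; the stage names and paths here are hypothetical). Dependencies are installed in a builder stage, and only the installed packages are copied into the slim final stage:

```dockerfile
# Hypothetical example: install dependencies in a builder stage,
# then copy only the installed packages into the slim runtime image.
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install into the user site so it is easy to copy out of this stage
RUN pip install --no-cache-dir --user -r requirements.txt

FROM python:3.9-slim AS runtime
WORKDIR /app
# Copy only the installed packages, not pip's build tooling or caches
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
COPY app.py .
CMD ["python", "app.py"]
```

Everything that existed only in the builder stage (pip's download cache, build tooling) is left behind, so the runtime image carries just the application and its dependencies.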

A Minimal Image with Explicit Tags

This line specifies the base image to use for the container. In this case, it uses the official Python 3.9 slim image, a lightweight build of Python that includes only the essential packages. It also pins the image to a specific digest with “@sha256:2bac43769ace90ebd3ad83e5392295e25dfc58e58543d3ab326c3330b505283d”. This practice ensures that every time we rebuild the Docker image for this Python application, the same underlying operating system and library versions are used. This provides a deterministic build.

# Use an official Python runtime as a parent image
FROM python:3.9-slim@sha256:2bac43769ace90ebd3ad83e5392295e25dfc58e58543d3ab326c3330b505283d as build_stage

To find the digest, we have several options:

  1. Grab the base image digest from Docker Hub.
  2. Pull the image onto our computer with docker pull python:3.9-slim, which prints the image digest:
3.9-slim: Pulling from library/python
7d63c13d9b9b: Pull complete
6ad2a11ca37b: Pull complete
1d79bc863ed3: Pull complete
c72b5f03bec8: Pull complete
0c3b0c5ce69b: Pull complete
Digest: sha256:2bac43769ace90ebd3ad83e5392295e25dfc58e58543d3ab326c3330b505283d
Status: Downloaded newer image for python:3.9-slim
docker.io/library/python:3.9-slim

If we already have the Python Docker image on our computer, we can get the digest of the existing image on disk with the command docker images --digests | grep python:

python    3.9-slim    sha256:2bac43769ace90ebd3ad83e5392295e25dfc58e58543d3ab326c3330b505283d

Once we have the base image digest, we can just add it to the aforementioned Dockerfile.
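Once the digest is pinned, it can be worth sanity-checking the FROM line. As a small illustrative sketch (not part of the article's repository), this Python snippet extracts and validates an `image@sha256:<digest>` reference from a Dockerfile:

```python
import re

# A pinned reference looks like: FROM python:3.9-slim@sha256:<64 hex chars> [as name]
PINNED_FROM = re.compile(
    r"^FROM\s+(?P<image>[\w./:-]+)@sha256:(?P<digest>[0-9a-f]{64})\b",
    re.IGNORECASE | re.MULTILINE,
)

def find_pinned_base(dockerfile_text: str):
    """Return (image, digest) for the first digest-pinned FROM line, or None."""
    match = PINNED_FROM.search(dockerfile_text)
    if match is None:
        return None
    return match.group("image"), match.group("digest")

example = "FROM python:3.9-slim@sha256:" + "ab" * 32 + " as build_stage"
print(find_pinned_base(example))  # prints the (image, digest) tuple
```

A check like this can run in CI to fail the build whenever a FROM line uses a bare tag instead of a pinned digest.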

Setting the Working Directory

# Set the working directory to /app
WORKDIR /app

This line sets the working directory inside the container to /app, which is where the application code will be copied to.

Copying Files with Requirements

# Copy the current directory contents into the container at /app
COPY app.py /app
COPY .env /app

These lines copy the app.py file and the .env file from the local machine into the /app directory inside the container.

Installing Requirements

# Install pipreqs
RUN pip install --no-cache-dir pipreqs

This line installs the pipreqs package, which will be used later to generate a requirements.txt file. The --no-cache-dir flag stops pip from writing to (or reading from) its cache directory, which keeps the image layer smaller.

# Generate requirements.txt file based on app.py
RUN pipreqs /app

This line generates a requirements.txt file based on the Python packages imported in app.py.
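Conceptually, pipreqs works by scanning source files for import statements and mapping them to installable packages. A much-simplified sketch of that idea (not pipreqs' actual implementation) using Python's ast module:

```python
import ast

def top_level_imports(source: str) -> set:
    """Collect the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.level == 0:  # skip relative imports
                modules.add(node.module.split(".")[0])
    return modules

sample = "import requests\nfrom flask import Flask\nimport os.path\n"
print(sorted(top_level_imports(sample)))  # ['flask', 'os', 'requests']
```

The real tool does more on top of this: it filters out standard-library modules (like os here) and maps import names to their PyPI package names before writing requirements.txt.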

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

This line installs the required packages specified in the requirements.txt file.

Cleanup

# Cleanup after ourselves
RUN pip uninstall -y pipreqs
RUN pip cache purge

These lines remove the pipreqs library and purge pip's cache. Note that because each RUN instruction creates a new layer, uninstalling in a later layer does not shrink the layers created earlier; the cleanup mainly keeps the final environment tidy.

Stage 2: Set Up the Application

In this stage we set up the app to run in the container.

FROM build_stage as run_stage

This line begins the second stage, run_stage, building on top of build_stage. As before, we are using the same minimal image pinned by digest to ensure a deterministic build.

Create an Application-Admin User Account

To keep the attack surface of our container small, we should run the application as a non-root user.

# Create a new user with a specific user ID
RUN useradd --uid 1001 app_admin

# Switch to the new user
USER app_admin

Copy Application Files

COPY . /app

This line copies the entire project directory into the /app directory inside the container. Had we done this earlier, before installing dependencies, any change to a file in our project folder would have invalidated the COPY layer and, with it, every layer built after it. Deferring the copy to this stage gives us an opportunity for optimization and speed-up.
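One caveat worth noting: COPY writes files owned by root by default, even after a USER instruction, so app_admin may not be able to modify the copied files at runtime. If the application needs write access, a common variant (sketched here under the assumption that useradd also created an app_admin group, as it does by default on Debian-based images) is to set ownership at copy time:

```dockerfile
# Copy the project and hand ownership to the non-root user in one step
COPY --chown=app_admin:app_admin . /app
```

Setting ownership in the COPY instruction itself avoids a separate RUN chown layer, which would duplicate the copied files in a new layer.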

On subsequent builds, Docker checks whether each layer can be reused. If requirements.txt has not changed, the dependency-installation layers are taken from cache and only this COPY instruction (and the layers after it) are re-executed. This speeds up the build considerably: no more waiting minutes between builds each time we modify something in our code.
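The cache-friendly ordering can be sketched as follows; the key point is that the dependency layers sit above the source-code COPY, so they are rebuilt only when requirements.txt itself changes (an illustrative fragment, not the exact Dockerfile above):

```dockerfile
# Dependencies first: this layer is cached until requirements.txt changes
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r /app/requirements.txt

# Source code last: editing app code only invalidates layers from here down
COPY . /app
```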

Set Up the Container Command

# Run app.py when the container launches
CMD ["python", "app.py"]

This line specifies the command to run when the container starts. In this case, it runs the app.py script with the Python interpreter, resolved relative to the /app working directory.

Conclusion

Optimizing and securing a Dockerfile is an essential step in deploying a Python application in a containerized environment. By following the tips outlined above, you can create a Dockerfile that is lightweight, fast, and secure.