At Jetstack Consult, we have seen a variety of setups for building, running, and storing containers. These are some things we recommend as best practices for creating and maintaining your container images.
Key takeaways:
- Paint with small brushes: Keep your images minimal
- There is a backstage: Use multi-stage images
- Don’t be a copycat: Copy carefully
- The layers of a cake: Working with container layers
- Rooting for safety: Avoiding running things as root
- Contain your excitement: Store your containers in OCI registries
- Base-ing for success: Keep your base images up-to-date
- Top secret: Don’t store secrets in containers
- Latest isn’t greatest: Use digests
- Autographs are cool: Sign your images
- Recipe for success: Generate SBOMs for your containers
Let us start with a small reminder of what a container is.
What is a container?
Starting with the basics. When we talk about a container, we are talking about a format: one that holds your application and its dependencies. It is a format that allows you to hand your application to another program, a container runtime, which can maintain many containers, all running in parallel and sharing local resources without interfering with each other.
The format consists of a collection of directories and an accompanying JSON file. The container runtime uses these when you’re running the application on a host. Sometimes a daemon manages the runtime on the host.
A container runtime provides virtualization at the operating-system level. It relies on Linux kernel features such as namespaces (not to be confused with Kubernetes Namespaces), cgroups, and network interfaces. It’s the thing that lets you run something packed into a container on a host machine.
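You can see those kernel features in action straight from the command line. A quick illustration, assuming you have podman installed (the same tool used later in this article):
# cgroups: cap the container's memory and CPU share
$ podman run --rm --memory=256m --cpus=0.5 busybox echo "resource-limited hello"
# namespaces: the container has its own PID namespace, so ps only sees the container's processes
$ podman run --rm busybox ps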
But what is inside a container?
A container is a collection of chained .tar files and JSON files; these are called layers. The JSON files give you some metadata about the filesystem present in each “layer”, where each layer is stored as a .tar file.
For interacting with containers, you can use something like docker or podman.
If you want to look inside a container, you can try using podman save <image> and then extract the resulting .tar file to see for yourself. Here’s what an exploded view of the busybox container image looks like:
$ tree
.
├── 2d0d8216f525405eec5284066d6d371456026e0bdb088ea813b300481d9a6b05.json
├── 7ac10b182f32d02dd60492491e7ffa59f60388418edde260daa2013d0a0fb818
│ ├── VERSION
│ ├── json
│ └── layer.tar # <- this thing contains a directory tree
├── manifest.json
└── repositories
The busybox image comes shipped with some Linux-native things. If we extract the layer.tar, we can see what those things are:
$ tar -tf 7ac10b182f32d02dd60492491e7ffa59f60388418edde260daa2013d0a0fb818/layer.tar
...
bin/awk
bin/base32
bin/base64
bin/basename
bin/bc
bin/busybox
bin/bzcat
bin/bzip2
bin/cal
bin/cat
bin/chat
bin/chattr
bin/chgrp
bin/chmod
bin/chown
bin/chpasswd
...
What this means is that there are things inside the busybox container. Things that aren’t the busybox application itself and which may or may not be useful to it. If we have a look at the Dockerfile used to build the image in the first place, over on GitHub, we can see that it’s based on Debian:
FROM debian:bullseye-slim
It uses a slimmed-down version of the Debian image, but it still includes unnecessary components. Those components consume resources, and when you bring in more dependencies than necessary, you also increase the attack surface of your application: you’re importing dependencies that might have vulnerabilities, ones not yet discovered but that could be at a later stage. Keeping up with security updates can be costly and time-consuming, so it’s better to avoid it altogether where we can.
So, a container image is a collection of files and directories. When used with a container runtime, such as containerd or the Docker daemon, these images can be “mounted” on shared hardware without the containers being aware of each other’s existence.
And now back to the main topic of this article 🚀
How to make your container images not drive you crazy:
Paint with small brushes: Keep your images minimal
An image is worth a thousand words, and sometimes fewer words are better. When creating a container image, its final size matters. If you’re able to reduce it, there are significant improvements to be had, especially in the efficiency of your applications.
A smaller image will be cheaper to store; quicker to build; quicker to start and deploy. Even though we live in an age of decreasing storage and network costs, there is still a point where optimizing image size can result in significant performance and cost savings.
Where that point lies is something you have to decide for yourself, but you should consider the long-term storage cost of your images. You have two vectors you can alter: how big the things you want to store are, and how long you want to store them. If you can reduce one of those, you might be opening the door to a lot of savings, and even more so if you can reduce both.
There is a backstage: Use multi-stage images
When creating a container image, reducing its size can improve performance with minimal effort. Smaller images are cheaper to store and quicker to construct, start, and deploy. One way to reduce the size of your container images is to use multi-stage builds, which let you build your application in one image and ship only the result in another, smaller one.
For example, if you want to package a Go application inside a container, your Dockerfile might look something like this:
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install -y golang-go
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o main .
EXPOSE 8080
CMD ["./main"]
But, as I mentioned earlier, this approach means that we’re including a lot of low-level overhead by using ubuntu:latest. The ubuntu image doesn’t come with Go installed, which in this scenario is the only thing we need from the operating system. So we have to install golang with apt-get and add it ourselves.
A better approach would be to use a dedicated base image for golang, instead of installing it on the ubuntu image:
FROM golang:1.19.5-bullseye
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o main .
EXPOSE 8080
CMD ["./main"]
But, when running our application, we do not need Golang itself; we only need the built binary. Meaning that we can reduce the size of the image even further. Instead of using the Golang image, we can use something like the Distroless image. A base image built by Google that contains no shells, package managers, or other unnecessary tools. So now we’re doing the build in an environment where we have access to golang, but then we’re getting rid of that environment for the final image that we’ll use and store:
FROM golang:1.19.5-bullseye as builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o main .
FROM gcr.io/distroless/base-debian11
COPY --from=builder /app/main /
EXPOSE 8080
CMD ["/main"]
If we’re comparing the size of the images, the difference is stark:
$ podman images | grep example
example-large latest 357bdcc6a3de 53 seconds ago 859MB
example-small latest 93c28a8e05d5 6 minutes ago 19.3MB
The images are functionally the same. We’ve removed some of the unnecessary files that would be included in the base image for building.
You might be asking why we’re using distroless instead of building with Docker’s scratch image. The reason is that when you’re using the scratch image, you have to bring everything yourself. This gives you control over every dependency you include in your images, but it can become cost-prohibitive at a certain point. Let’s say you want to run a Go application with HTTPS: then you need to bring and manage your own CA certificates as well.
You could also explore using NixOS in your images. There’s some inspiration around the internet, but the major drawback is the steep learning curve. If you decide to use it, you’ll probably have to spend quite a lot of time learning its nuances. It’s nonetheless interesting and shows that there’s still room for improvement.
Working with slimmed-down images
Don’t fret that your images are left without a shell and the other tools you might want when working with containers. With containers, you can add such things temporarily when you need to do something specific.
For example, with Docker you can attach a privileged container to the one you want to debug, which allows you to use a shell “on” that image. You can use the following to “add” a shell to a running Docker container:
docker run -it --rm --pid=container:<container-id> --privileged alpine sh
If you’re using distroless, you can also use the :debug tag, which includes a shell in the image. But if you want to test with the exact image that your container is using, you can mount a shell in a volume. I’ve taken the liberty of using podman instead of docker in my example below; it doesn’t matter, but I wanted to use podman. Use whichever you’re comfortable with:
# 1. We'll borrow the shell from busybox and mount it into our container:
$ podman create --name debugger busybox
$ mkdir debugger
$ podman export debugger -o debugger.tar
$ tar -C debugger -xf debugger.tar
# 2. Mount the shell as a volume in the image without the shell:
$ podman run -d --rm \
--volume $(pwd)/debugger:/.debugger \
--name container-with-shell-volume <image-without-shell>
# 3. Start the debugging session:
$ podman exec -it container-with-shell-volume /.debugger/bin/sh
# 3.1 We can add the stuff we brought in with the volume to the PATH of the container:
$ export PATH=${PATH}:/.debugger/bin
If you want to debug an image running in a Kubernetes environment, and you’re running version 1.25 or later, you can use kubectl debug <pod> like this:
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
Don’t be a copycat: Copy carefully
When you’re creating your image, you’re going to copy things from a filesystem into the container. It’s not uncommon to see things like this in Dockerfiles:
COPY . .
When building your container image, it’s important to be mindful of what you include. You want to make sure you’re only including necessary items in your container. Including unnecessary files can lead to security vulnerabilities and bloated images. One example is including the .git folder in your container: it contains your entire git history, which can be sensitive information.
To avoid this, don’t include everything without thinking it through. It’s also important to set up a good .dockerignore file, which explicitly excludes certain paths so you can be sure they are not included in the container.
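As a starting point, a minimal .dockerignore could look something like this (the entries are only examples; adjust them to your project):
# keep the git history and local clutter out of the build context
.git
.gitignore
.env
node_modules/
*.md
Dockerfile
.dockerignore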
The layers of a cake: Working with container layers
It’s not uncommon to treat a container’s filesystem as you would the one on your machine. But, the process of adding and removing files is a different one when working with containers.
A container is made up of layers. A layer is a snapshot of a filesystem. When we compose a container image, we start with what we call a “base image”, which is a filesystem with some things included by default. We then add layers on top of that base with each instruction in our Dockerfile, meaning that every COPY or RUN produces a new layer.
The thing that’s easy to miss or misunderstand with this setup is that each layer is persisted inside the image. So even if you try to change a layer in a later step when creating the image, you still keep the old layer. You can’t see it in the final filesystem, but it’s still present in the image. It’s weird, I know.
Let’s illustrate this with an example. We can add a big file to a base image, remove the file, and then build the image. We expect the final image to not contain the large file, right? But that isn’t the case.
# Make a 1gb file
$ mkfile -n 1g bigfile
# We'll create two images. One with a big file inside it which
# gets deleted and one where the file is never added.
$ cat Dockerfile-bigfile
FROM alpine
COPY bigfile .
RUN rm bigfile
CMD [""]
$ cat Dockerfile-nobigfile
FROM alpine
CMD [""]
# Build the images
$ docker build -t example-big-file -f Dockerfile-bigfile ./
$ docker build -t example-no-big-file -f Dockerfile-nobigfile ./
# Inspect the image size
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
example-no-big-file latest 0189f39b67b4 36 hours ago 30.9MB
example-big-file latest d7bcfbc21ff1 36 hours ago 1.1GB
The image is still bigger than it should be if we had indeed removed the file. This is also a good reason to be using multi-stage builds: you can leave those big but necessary files out of the final image. If we check the image’s layers we can see that there’s a big layer present, and inside that layer, the file is still there:
$ mkdir bigfile && cd bigfile
$ docker save example-big-file -o bigfile.tar
$ tar -xf bigfile.tar
$ tree -sh
[ 384] bigimage
├── [7.4M] 26cbea5cba74143fbe6f584f5fc5321543155aedc4a434fcaa63b643877b5a74.tar
├── [ 160] 509b4fd55d294eda09e22619eb5ac949390504fb2b36bd925542b25ac37688a6
│ ├── [ 3] VERSION
│ ├── [ 149] json
│ └── [ 71] layer.tar -> [Error reading symbolic link information]
# this layer right here
├── [1.0G] 53a54b53cef65b43686ca79fe937709295991a423adb2b78e6806d100d0e1d34.tar
├── [ 160] 6567b1a77a5b44c7b98faeac98f023d76a35f6a86c07795482a9404979c90413
│ ├── [ 3] VERSION
│ ├── [1.2K] json
│ └── [ 71] layer.tar -> [Error reading symbolic link information]
├── [5.0K] 8396d7b134f17813d33299990e09b312b353b82ff1076ca4d7c33ede628ab346.tar
├── [ 160] 9d57e2699cd2ce461696a31e96335b1bb586335440da69181ecd4b57dc0cfeb1
│ ├── [ 3] VERSION
│ ├── [ 73] json
│ └── [ 71] layer.tar -> [Error reading symbolic link information]
├── [1.0G] bigfile.tar
├── [2.1K] d7bcfbc21ff113faf8098d2942733b8c96a7f7d0acbb89911e69c58b65a99308.json
├── [ 365] manifest.json
└── [ 116] repositories
This is also the reason that you should never include sensitive data in your container image assuming that you can delete it after it’s used. If you want to explore even more of the content of your images, you could use something like Dive which lets you explore the layers of the container.
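If you don’t want to unpack the archive by hand, docker history shows the size of each layer directly. A quick sketch, reusing the image name from the example above:
# Each line is a layer; the layer created by `COPY bigfile .` still weighs in at around 1GB,
# even though a later `RUN rm bigfile` removed the file from the final filesystem view.
$ docker history example-big-file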
Rooting for safety: Avoiding running things as root
Although using containers provides some separation of concerns, it is still a good idea to avoid running things as root whenever possible, just like on your machine. Always strive to follow the principle of least privilege.
User management and permissions are central components of operating system security. Even within a container, it is important to consider these aspects. Anything you mount into your container is subject to the rules set inside the container, not those enforced outside it. If you include sensitive parts of your filesystem inside your container, a root user inside the container will have access to them.
If a malicious or faulty process infiltrates your container, having root access within the container could allow it to import dependencies, hijack resources, and even upload data. Any action that can be performed by a root user can potentially be executed by such a process as well. Here’s a good article for understanding how root inside and outside of your container interact.
In our previous example, where we used the distroless base image, we could have looked on GitHub to see what types of users were available on that base image. distroless has a tag that gives us a base image with a non-root user, namely :nonroot. We can use the following snippet to ensure that we are not running the application inside the container as root:
...
FROM gcr.io/distroless/base-debian11:nonroot
USER nonroot
...
If your image doesn’t come with a user other than root, you can create one from inside your Dockerfile. If you do this, ensure the permissions on your application are set so that the user can only interact with the components it should have access to.
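Here’s a minimal sketch of what that could look like on a Debian-based image (the user name, paths, and binary are placeholders):
FROM debian:bullseye-slim
# Create a dedicated system user and group for the application
RUN groupadd --system app && useradd --system --gid app --create-home app
# Only give that user ownership of the files it actually needs
COPY --chown=app:app ./main /home/app/main
USER app
CMD ["/home/app/main"]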
Contain your excitement: Store your containers in OCI registries
If you are using any of the cloud-hosted container storage solutions by Google, Amazon, or Microsoft, someone has already thought of this for you. For those who have to self-host their registries, it is important to use one that is OCI compliant. Options include JFrog Artifactory, Red Hat Quay, Docker Hub, or an open-source alternative like Harbor.
One reason to use OCI compliant registries is that you will be able to make use of new features as they are implemented within the specification. For example, the OCI artifact manifest specification now allows you to house attestations, SBOMs, and signatures together with your images, making it easy to manage them all from one location. To be clear, not all OCI registries support this yet, but since it’s in the specification, it will become a staple feature in the future.
Since you will accumulate quite a few containers over time, you should also decide on a retention policy. You need to decide what works for your scenario. Can you remove them after a period of time? Should you keep images that are still being pulled? What do you do with containers that have outdated base images?
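If you’re self-hosting, tools like skopeo can help you implement such a policy by inspecting and deleting images in a remote registry. A rough sketch (the registry and repository names are placeholders, and your registry must be configured to allow deletes):
# Check when an image was created and which tags it carries
$ skopeo inspect docker://registry.example.com/team/app:1.2.3
# Remove an image that falls outside your retention window
$ skopeo delete docker://registry.example.com/team/app:1.0.0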
Base-ing for success: Keep your base images up-to-date
Base images are like the canvases that you use to create your application image. These images can be more or less feature-complete, and pre-built base images are constantly updated. Keeping your base images up-to-date can prevent the problems that arise from unpatched environments. Regularly updating your base images will help you stay current with the latest security patches and avoid any compatibility issues.
Top secret: Don’t store secrets in containers
Most applications rely on something that should be kept safe and out of the hands of malicious applications and actors. It could be a key to a database, an API key, or a set of credentials. Regardless, it has no business living inside your container.
The problem with keeping secrets in your Dockerfiles is that your Dockerfiles will most likely live in something like a GitHub repository, which is no place to store those much-valued secrets. On top of that, a secret that’s made its way into one of the layers of your container can easily be extracted by somebody with access to your container image.
You must ensure that the secrets aren’t being built into your containers. Most container orchestration tools have some form of functionality that allows you to inject the secrets at runtime.
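If you do need a secret at build time, for example to pull private dependencies, BuildKit’s secret mounts let you use it without baking it into a layer. A minimal sketch, where the secret id and the credentials file are placeholders:
# syntax=docker/dockerfile:1
FROM golang:1.19.5-bullseye as builder
WORKDIR /app
COPY . .
# The credentials file is mounted only for this RUN step and is never written into a layer
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    go build -o main .
# Build with: DOCKER_BUILDKIT=1 docker build --secret id=netrc,src=$HOME/.netrc .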
For example, Kubernetes allows you to mount secrets by defining them in a Kubernetes Secret resource and mounting them to the filesystem of the pod that your container is in. You could also look at something like Sealed Secrets, which can encrypt your Kubernetes secret resources when they are stored in your version control system (VCS).
You could also have a look at using Mozilla SOPS if you’re using something like Flux or one of the vault setups that are usually available through your Kubernetes engine hosting platform. If you need to run a vault yourself, there’s Hashicorp Vault.
Latest isn’t greatest: Use digests
Using the :latest tag when pulling container images is not a best practice. There are better options: version tags (semantic versions and release tags) or digests. The reason you want to be specific is that you want to be able to easily check what you have inside your containers.
Version tags should be immutable, but depending on the setup, they sometimes aren’t. Say a dependency has released an image with version 1.2.3 and a SHA256 digest of abcdef. You would expect that every time you pull the image with the 1.2.3 tag, you get the abcdef image. But if the version tag is treated as mutable, the digest behind that version could change. In the best of worlds, each version or release should only ever point to a single digest.
A downside of using digests is that they can make it harder to debug issues if an image is removed from the registry; with mutable version tags, you would still find something when pulling the container.
You also have frameworks like SLSA that put a lot of weight on things like reproducible and hermetic builds, which are harder to guarantee without pinning dependencies to specific digests. In essence, pinning by digest in your Dockerfile is always a best practice.
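In practice, pinning by digest looks something like this (the digest below is a placeholder; you can look up the real one with docker images --digests or in your registry):
# Find the digest of an image you have already pulled
$ docker images --digests gcr.io/distroless/base-debian11
# Pin it in your Dockerfile instead of a tag
FROM gcr.io/distroless/base-debian11@sha256:<digest>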
When specifying digests directly, it’s important to have a strategy in place for bumping them. If you previously relied on :latest to automatically pull in the latest patches, you can no longer do that. There are tools available, such as Dependabot, that can help you keep track of upstream changes, or Dexter (made by a Jetstack engineer) to help you figure out which digests to use.
If you’re building images for others to consume, ensure that you have a coherent tagging strategy and adhere to it. Treat tags as immutable: do not update tags that are meant to remain stable, and make sure that tags that are intended to change are updated when necessary.
Autographs are cool: Sign your images
Ensuring the authenticity and integrity of the container images you use is crucial in today’s digital landscape. By verifying the origin of images, you can protect your organization against supply chain attacks. You also provide users with the confidence that the images they are using are safe.
One approach to achieving this is by signing your images with a digital signature. This can help ensure that the images have not been tampered with and that they are indeed from a trusted source. Moreover, it also helps in verifying the integrity of the image.
Sigstore Cosign is a great tool that can be used for signing your OCI containers. It is a free and open-source project that provides a simple and secure way to sign container images.
It’s important to note that when signing an image, you should ensure that cosign signs an image referenced by digest rather than by tag, for security reasons: switching out the image under a tag between build and sign creates an attack vector.
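With cosign, a key-based signing flow looks roughly like this (a sketch; the image reference and digest are placeholders, and cosign also supports keyless signing):
# Generate a key pair (cosign.key and cosign.pub)
$ cosign generate-key-pair
# Sign the image by digest, not by tag
$ cosign sign --key cosign.key registry.example.com/team/app@sha256:<digest>
# Consumers can then verify the signature with the public key
$ cosign verify --key cosign.pub registry.example.com/team/app@sha256:<digest>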
Recipe for success: Generate SBOMs for your containers
This recommendation may seem daunting to some, but given the current state of cybersecurity and the growing need for software supply chain transparency, it is becoming increasingly necessary. In fact, with the issuance of an executive order on this topic, it’s only going to gain more traction as time goes on.
To help you create your SBOMs, there are several tools available that you can utilize. For example, you might consider using Syft or Trivy, both of which are readily available on GitHub. Alternatively, you might opt to use one of the native tools provided by the CycloneDX project. Regardless of which tool you choose, the important thing is to prioritize the creation of your SBOMs to ensure the security of your setup.
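As a starting point, both tools can generate an SBOM straight from an image reference. A quick sketch (flags and output formats can vary between versions):
# Syft: produce an SPDX SBOM for an image
$ syft gcr.io/distroless/base-debian11 -o spdx-json > sbom.spdx.json
# Trivy: produce a CycloneDX SBOM for the same image
$ trivy image --format cyclonedx --output sbom.cdx.json gcr.io/distroless/base-debian11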
If you feel like you need help, we have developed a toolkit for supply chain-related concerns, with the insights and advice you need to find and fix vulnerabilities in your software supply chains.
Cover photo by Daniel Gregoire on Unsplash