Versioning Docker Images

If you're running Docker containers in the cloud, you're probably pushing your images to a registry. If you're using Google Cloud, that would be gcr.io (Google Container Registry).

As you iterate on your application, you'll need to push new Docker images to the registry. A natural question that comes up is: how do you version these images? You don't want to overwrite an existing image by reusing its tag, and it's cumbersome to keep track of increasing version numbers. A good versioning scheme is to use the git commit hash, so your image name might look like this: gcr.io/kubehub-172120/simple:88d38d9. If you check out your git repository at this hash, you'll find the exact files that produced this image.
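
To get a tag like this, all you need is the short hash of the latest commit. A minimal sketch (the hash shown is the hypothetical one from the example above):

$ git rev-parse --short HEAD
88d38d9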

This sounds simple to implement: you get the last commit's hash, build the image using the hash as a tag, and push it to the registry. There's one big inconvenience to this scheme though: you have to commit each change if you want a new hash (and you do, since you don't want to overwrite your production image), and when you're iterating on an image, that gets tiresome quickly. One possible solution is to push "debug" images, tagged with something like 88d38d9-debug. Such an image is produced by taking the git repo at the 88d38d9 hash and making some modifications on top of it. You'll know not to reference these images in your production files, and it's OK to overwrite them as you're iterating.
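
Debug images are also disposable, so once you're done with one, you can remove it from the registry. A sketch of what that might look like (at the time of writing this command lives under gcloud beta; check gcloud beta container images delete --help for the exact flags your version wants):

$ gcloud beta container images delete gcr.io/kubehub-172120/simple:88d38d9-debug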

So let's look at how all of this can be implemented. Let's say you're putting all your images in one directory, with one subdirectory per image. The contents of this directory might look like this:

$ tree
.
├── build-image.sh
└── simple
    ├── app.py
    ├── Dockerfile
    └── requirements.txt

1 directory, 4 files
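
For context, here's roughly what simple/Dockerfile contains. I've reconstructed it from the build output shown later in this post, so treat it as a sketch:

FROM ubuntu:latest
MAINTAINER Petko Minkov "pminkov@gmail.com"
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD python app.py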

The build-image.sh script builds the Docker image and pushes it to gcr.io.

The script itself looks like this:

#!/bin/bash

if [ -z "$1" ]; then
  echo "Usage: $0 <image_dir> [--debug]"
  exit 1
fi

IMAGE_NAME=$1

if [ "$2" == "--debug" ]; then
  # If we're debugging, we can push code that's not committed.
  APPEND="-debug"
else
  IMAGE_PATH=/$IMAGE_NAME/

  if git status . --porcelain | grep $IMAGE_PATH > /dev/null; then
    echo "You have uncommited changes to your Docker image. Please commit them"
    echo "before building and populating. This helps ensure that all docker images"
    echo "are traceable back to a git commit."
    echo
    echo "Or if you're just building a debug image, use the --debug flag."
    exit 1
  fi
fi

# Set image tag.
GIT_REV=$(git log -n 1 --pretty=format:%h -- "./$IMAGE_NAME/")

if [ -z "$GIT_REV" ]; then
  echo "You're trying to build an image that has never been committed." \
    "You need to commit at least one version."
  exit 1
fi

TAG="$GIT_REV""$APPEND"

# Set image repo.
PROJECT_ID=$(gcloud config get-value project 2>/dev/null)
DOCKER_REPO="gcr.io/$PROJECT_ID"

# Full image name.
IMAGE_SPEC="$DOCKER_REPO/$IMAGE_NAME:$TAG"

cd "$IMAGE_NAME" || exit 1

if [ ! -f Dockerfile ]; then
  echo "No such file: $IMAGE_NAME/Dockerfile"
  exit 1
fi

echo "$IMAGE_SPEC"

docker build -t "$IMAGE_SPEC" .
gcloud docker -- push "$IMAGE_SPEC"

echo "Pushed $IMAGE_SPEC"

One thing to pay attention to is that the hash we're using is the hash of the last commit that touched the directory containing the image files, not the repository's HEAD. This way, if you want to push a production-ready (non-debug) image, you only have to commit the files inside this directory; if you're still working on files outside of it, you can continue doing so.
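
You can see this behavior with git directly: git log with a path argument returns the last commit that touched that path, which may be older than the latest commit in the repo (the hashes here are illustrative):

$ git log -n 1 --pretty=format:%h
e8bc006
$ git log -n 1 --pretty=format:%h -- ./simple/
12430ce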

Let's run the build-image.sh script:

$ ./build-image.sh simple
gcr.io/kubehub-172120/simple:12430ce
Sending build context to Docker daemon 4.096 kB
Step 1/9 : FROM ubuntu:latest
 ---> 14f60031763d
Step 2/9 : MAINTAINER Petko Minkov "pminkov@gmail.com"
 ---> Using cache
 ---> 5a371036a9e3
Step 3/9 : RUN apt-get update -y
 ---> Using cache
 ---> 8992277faa20
Step 4/9 : RUN apt-get install -y python-pip python-dev build-essential
 ---> Using cache
 ---> 9c0937facaf0
Step 5/9 : COPY . /app
 ---> Using cache
 ---> dd9f289c1f55
Step 6/9 : WORKDIR /app
 ---> Using cache
 ---> d93c62ac371a
Step 7/9 : RUN pip install --upgrade pip
 ---> Using cache
 ---> cb2f0a65c93f
Step 8/9 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> d8fd659127d9
Step 9/9 : CMD python app.py
 ---> Using cache
 ---> 8493c8ad1a01
Successfully built 8493c8ad1a01
The push refers to a repository [gcr.io/kubehub-172120/simple]
dacb974e8350: Layer already exists 
6c4d57527510: Layer already exists 
5348dff0fc19: Layer already exists 
738da70fc9f8: Layer already exists 
f665434eb0ee: Layer already exists 
26b126eb8632: Layer already exists 
220d34b5f6c9: Layer already exists 
8a5132998025: Layer already exists 
aca233ed29c3: Layer already exists 
e5d2f035d7a4: Layer already exists 
12430ce: digest: sha256:51cd80db604d1ffa5230289c1f3fe40d19b3b8dc2afb0a0c003713360b07d2c6 size: 2411
Pushed gcr.io/kubehub-172120/simple:12430ce
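
If you're iterating and want to push a debug build instead, you'd pass the --debug flag. The script then skips the uncommitted-changes check and appends the suffix to the tag, so the image spec would come out like this (a sketch, with the rest of the output omitted):

$ ./build-image.sh simple --debug
gcr.io/kubehub-172120/simple:12430ce-debug
...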

Great, now the images are pushed. But I always like to apply a "trust but verify" policy, so let's see how we can dig into what's going on at the registry.

My image's name is gcr.io/kubehub-172120/simple. Here's how I can see the tags I've uploaded to gcr.io:

$ gcloud beta container images list-tags gcr.io/kubehub-172120/simple
DIGEST        TAGS                   TIMESTAMP
9b424f849df2  88d38d9-debug,e8bc006  2017-07-30T23:03:14
51cd80db604d  12430ce                2017-07-30T23:06:42

If you want to inspect the contents of the image, you can just run a shell in it, like this:

$ gcloud docker -- run -i -t gcr.io/kubehub-172120/simple:12430ce /bin/bash
root@27cfb042d947:/app# ls
Dockerfile  app.py  requirements.txt
root@27cfb042d947:/app# 
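
And to close the loop on traceability: since the tag is a commit hash, you can go back to git and see exactly which changes went into the image:

$ git show --stat 12430ce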

I've used this workflow when working with a Kubernetes deployment and it worked well for me. Hope it's useful for someone else too. Enjoy.
