I had a simple need recently. I wanted to install a bunch of software in a container, discover information or metadata about it, and then label the container with that metadata. I also wanted to build it across a few architectures. Okay, that made it slightly less than simple! Why would I want to do this? Because the labels end up in the container's config, which can be inspected (e.g., see this config). With this strategy I could generate matrices of builds for a container based on its contents without needing to pull it.

Too long; didn’t read

See the repository vsoch/post-build-container for how I went about doing this. There are two example workflows: one is a “simple” approach intended for a single architecture (it seems to work reliably and quickly), and the second builds for multiple architectures, with a build speed that I’ve seen vary. For the multi-architecture approach I used buildx and a strategy that re-builds and labels only on push to main, but you might try something else better suited to your needs. You can also derive build args (which might feed into environment variables too). I just happened to want labels!

How does it work?

This is fairly simple, so I don’t need a very long post! You basically start with a Dockerfile of interest. In my case I used spack as a dummy example, because it’s pretty handy for research around package managers or generally scientific software. Okay - so let’s say that this is a container base that I want to be able to inspect, and easily discover compilers or packages inside to do more interesting things with. I can do that with labels as we saw above, but how do those labels get there? It’s a bit of a catch-22 because you need the labels before the build, but the labels are derived during the build.

Build, Label, Build, Deploy!

The solution is that ☝️!

Build

We first build the container, which might look like this:

$ docker build -t the-container .

Label

Then we might issue some special command to extract a label we want. In the case of spack:

$ docker run -i --rm the-container spack compiler list --flat > compilers.txt

Note that “--flat” is a special flag I have added to a personal branch of spack, and it does not exist in spack proper. Yes, we probably don’t need to dump the output into a file, but I find this easier to do. We then generate our labels environment variable and save it for the next step:

labels=$(echo $(tr '\r\n' ',' < compilers.txt))
labels="org.spack.compilers=${labels}"
printf "Saving compiler labels %s\n" "${labels}"
echo "compiler_labels=${labels}" >> "$GITHUB_ENV"
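One detail worth knowing: the `tr` approach above turns every newline into a comma, including the last one, so the label value ends with a trailing comma. If that matters to you, here is a minimal sketch of an alternative, assuming compilers.txt holds one compiler per line. The helper names `join_lines` and `make_compiler_label` are made up for illustration and are not part of the repository.

```shell
# Hypothetical helper: join the lines on stdin into one comma-separated string.
join_lines() {
    # Strip carriage returns, drop blank lines, then join what is left with commas.
    tr -d '\r' | grep -v '^[[:space:]]*$' | paste -sd, -
}

# Hypothetical helper: build the full "key=value" label from a file of compilers.
make_compiler_label() {
    printf 'org.spack.compilers=%s\n' "$(join_lines < "$1")"
}
```

You could then write `labels=$(make_compiler_label compilers.txt)` in place of the first two lines of the snippet above.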

Build

And now for the cool part! We create a new Dockerfile that builds from the image we just created, and add the labels during that second build.

$ echo "FROM the-container" > Dockerfile.tmp
$ docker build -t the-container -f Dockerfile.tmp --label "${labels}" .

Importantly, notice we are using the Dockerfile.tmp generated on the fly. You don’t technically need to re-use/overwrite the container name, but I didn’t need it again so I did.
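If you want to reuse this second build, it can be wrapped in a small function. This is a sketch and not the workflow from the repository: `relabel_image` and its arguments are names invented here, and it assumes the image already exists locally and that the label is passed as a single `key=value` string.

```shell
# Hypothetical helper: rebuild an existing local image with one extra label.
# Usage: relabel_image <image> <key=value>
relabel_image() {
    image="$1"
    label="$2"
    # Scratch directory, so the build context holds only the generated Dockerfile.
    workdir=$(mktemp -d) || return 1
    # The generated Dockerfile does nothing except start from the existing image.
    printf 'FROM %s\n' "$image" > "$workdir/Dockerfile.tmp"
    # -f selects the generated Dockerfile, --label attaches the metadata.
    docker build -t "$image" -f "$workdir/Dockerfile.tmp" --label "$label" "$workdir"
    status=$?
    rm -rf "$workdir"
    return "$status"
}
```

For example, `relabel_image the-container "org.spack.compilers=gcc@11.4.0"` (the compiler value is only a placeholder). Afterward, `docker inspect --format '{{ index .Config.Labels "org.spack.compilers" }}' the-container` should print the label value if the build picked it up.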

Deploy

And then you can push that container!

# note that "the-container" will obviously not work :)
$ docker push the-container

That’s it! You can check out the GitHub action in the repository linked above, or see the package generated. Check out an example workflow run if you are interested.

Challenges

The above steps (without buildx or different architectures) worked quite easily, but when I added buildx I ran into some issues. When the build step used an action to perform the build and a platform was specified, the image didn’t seem to be readily available afterward (e.g., docker images did not show what was just built). I tried with load, but the only thing that seemed to work easily was to just push it. Giving up the different arches would have made this much easier, but I didn’t want to, so my solution is simply to label and do the final deployment only on merge or workflow dispatch. I suspect there is a better way, and please open an issue if you find it!
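For the multi-architecture case, one possible shape of that final step is sketched below. It assumes the unlabeled image was already pushed for every platform (pushing being what made the image usable above), that a buildx builder is set up, and that you are logged in to the registry. The function name, the default platform list, and the image reference in the usage line are placeholders, not what the repository uses.

```shell
# Hypothetical helper: relabel an already-pushed multi-arch image and push the result.
# Usage: relabel_and_push <registry/image:tag> <key=value> [platforms]
relabel_and_push() {
    image="$1"
    label="$2"
    platforms="${3:-linux/amd64,linux/arm64}"
    workdir=$(mktemp -d) || return 1
    # FROM points at the registry copy, so each platform resolves its own base image.
    printf 'FROM %s\n' "$image" > "$workdir/Dockerfile.tmp"
    # Build for all platforms, attach the label, and push in one step.
    docker buildx build \
        --platform "$platforms" \
        --label "$label" \
        --tag "$image" \
        --file "$workdir/Dockerfile.tmp" \
        --push \
        "$workdir"
    status=$?
    rm -rf "$workdir"
    return "$status"
}
```

A call might look like `relabel_and_push ghcr.io/example/the-container:latest "org.spack.compilers=gcc@11.4.0"`. Because the result is pushed directly, this sidesteps the problem of the image not showing up locally, at the cost of overwriting the tag in the registry.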