- Mobility of compute. You can maintain a workflow and be confident that it will execute on different hosts and operating systems. Once your software environment is built as a container, you can run it on the RCC, the Open Science Grid, cloud providers such as DigitalOcean or Amazon, or any other environment that supports containers.
- Reproducibility and replicability. Your software can be run with the EXACT same libraries and parameters by other researchers on different compute resources.
- Control over your environment. Currently, if you wish to run custom software or use a custom configuration on the HPC or Spear, you must request that it be installed by a systems administrator. Containers give you complete control over your software, configuration, and environment. You are "root" inside your container.
What is a container?
A container is a configurable software environment for a computer process that you, the user, control.
Start with a container...
...install an operating system and all your software inside the container...
...and then put that container on the compute resource of your choice, where it will be run. The compute resource does not need to know anything about what is inside of the container. Such resources might include the HPC, the Open Science Grid, Amazon Cloud, or many other environments.
- Docker is a software technology providing containers.
- Singularity is also a software technology providing containers but it is specifically designed for High Performance Computing.
The HPC now supports running Singularity containers. Docker containers are easily converted into Singularity containers. This is the recommended workflow for using containers at the RCC.
Some zeitgeisty IT concepts
Before we begin, there are a few software development concepts you should be aware of:
- In software engineering, continuous integration (CI) is the practice of merging all developer working copies into a shared mainline frequently (e.g., several times a day).
- Continuous delivery (CD) is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time.
- DevOps (a clipped compound of "development" and "operations") is a software engineering culture and practice that aims at unifying software development (Dev) and software operation (Ops).
Exercise: Create a Docker Container and Deploy It
Here is an exercise that will introduce you to the workflow for using containers. Specifically, in this exercise, you will:
- Create a Docker container that runs a program on your personal computer.
- Create a build file for that Docker container and add it to a Git repository.
- Use that build file to create an up-to-date Singularity container for execution on the HPC.
- Run the container on several different resources.
You will need:
- An account at Docker Hub.
- A laptop or desktop computer with Docker installed.
In this video we'll create an instance (a container) of the latest Ubuntu image available on Docker Hub. We'll install and test some software. This is the 'Development Phase' of our software life cycle.
- Containers are meant to be stateless. So, writing results to a file inside the container is a bad idea. Later in the process we'll solve that problem by writing our results to the 'real' file system, external to the container.
- Docker uses a layered filesystem, and you can see this as the Ubuntu image is being pulled from Docker Hub. Images are additive, and they are built on top of each other. This can be a problem for a researcher who wants to ensure reproducibility and replicability.
- If someone changes a layer beneath your image, then your image has effectively changed. Singularity avoids this problem by saving the whole image as a single file.
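The development phase might look like the minimal sketch below. It assumes that sysbench (the benchmark used later in this exercise) is the software under test and that it is installed from the Ubuntu package repository. The `DOCKER` variable and the `dev_phase` function are hypothetical conveniences: by default the script only prints the command, and setting `DOCKER=docker` executes it.

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it.
# Set DOCKER=docker in the environment to run against a real Docker daemon.
DOCKER="${DOCKER:-echo docker}"

IMAGE="ubuntu:latest"   # the latest Ubuntu image on Docker Hub

dev_phase() {
    # Start a throwaway container (--rm deletes it on exit), install the
    # software, and run a quick test. sysbench is an assumed example program.
    $DOCKER run --rm "$IMAGE" sh -c \
        'apt-get update && apt-get install -y sysbench && sysbench cpu run'
}

dev_phase
```

For interactive exploration, `docker run -it ubuntu:latest /bin/bash` opens a shell inside the container instead.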
We can use Docker build files to automate the process that was done during the Development phase. A Docker build file is simply a text file that describes how to build a Docker Image. The Docker Image is then deployed as a container running in an environment.
This effectively removes the need to write a "how to install this software" guide for your software or environment. The Docker build file automates and documents all of the steps necessary to install your software and configure your environment.
By using Git we can share the build files, collaborate with others on the build process, and maintain an auditable history of changes to the build process for the software.
Docker build files utilize an 'entry point' or 'run command' which describes the main process designed to run inside the container.
- We will use the ENTRYPOINT directive for better integration with Singularity later on.
- It is considered best practice to wrap your entrypoint command in a custom shell script.
- An ENTRYPOINT or CMD is not required when building an image, but is most often included.
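A build file and entry point script along these lines could be generated as follows. This is a sketch under assumptions, not the files from the video: the file names `Dockerfile` and `entrypoint.sh`, the `WORKDIR` and `RESULTS_FILE` variables, the results path `/results/results.txt`, and the `sysbench cpu run` invocation (sysbench 1.0 syntax) are all illustrative.

```shell
#!/bin/sh
# Write an illustrative entry point script and Docker build file into a
# scratch directory. All names and paths below are assumptions.
WORKDIR="${WORKDIR:-$(mktemp -d)}"

# Entry point: run the benchmark and write the output to a results file.
cat > "$WORKDIR/entrypoint.sh" <<'EOF'
#!/bin/sh
set -e
# RESULTS_FILE is a hypothetical variable; the default path is meant to be
# mapped to a file on the host when the container is run.
sysbench cpu run > "${RESULTS_FILE:-/results/results.txt}"
EOF
chmod +x "$WORKDIR/entrypoint.sh"

# Build file: start from Ubuntu, install sysbench, and register the script
# as the main process of the container.
cat > "$WORKDIR/Dockerfile" <<'EOF'
FROM ubuntu:latest
RUN apt-get update && apt-get install -y sysbench \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /results
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
EOF

echo "Wrote build files to $WORKDIR"
```

Wrapping the command in a script keeps the `ENTRYPOINT` line stable: later changes to the benchmark options touch only `entrypoint.sh`.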
The video below demonstrates the following:
- Cloning a new, empty GitHub repository (https://github.com/dcs02d/singularity_workflows.git).
- Creating an entry point shell script to run sysbench inside of a Docker container.
- Creating a new Docker build file to build the image and run the entry point shell script.
- Committing these changes to the Git repository, and pushing them to GitHub.
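The clone, commit, and push cycle listed above can be sketched as follows. To keep the example runnable offline, a local bare repository stands in for the GitHub remote; in the real workflow `REMOTE` would be the repository's GitHub URL. The file contents, commit message, and user identity are placeholders.

```shell
#!/bin/sh
# Sketch of the clone / commit / push cycle. A local bare repository stands
# in for the GitHub remote so the example runs without network access.
set -e
SCRATCH="$(mktemp -d)"
REMOTE="$SCRATCH/remote.git"
git init --quiet --bare "$REMOTE"

# Clone the new, empty repository (git warns that it is empty).
git clone --quiet "$REMOTE" "$SCRATCH/work" 2>/dev/null
cd "$SCRATCH/work"

# Placeholder contents; the real files are the entry point script and
# the Docker build file (the file names here are assumptions).
printf '#!/bin/sh\nsysbench cpu run\n' > entrypoint.sh
printf 'FROM ubuntu:latest\n' > Dockerfile

# Commit the changes and push the current branch to the remote.
git add entrypoint.sh Dockerfile
git -c user.name="Example User" -c user.email="user@example.com" \
    commit --quiet -m "Add Docker build file and entry point script"
git push --quiet origin HEAD
```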
The next video demonstrates how to build a Docker image from the build file we created above and push the image to Docker Hub. Note that for this video, I used an account that I had already created on Docker Hub.
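A build and push might be scripted roughly as below. The image name `yourname/sysbench-demo:latest` is hypothetical, and the script assumes the build file sits in the current directory. As a safety measure it prints the commands by default; set `DOCKER=docker` to execute them.

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it.
# Set DOCKER=docker in the environment to run the commands for real.
DOCKER="${DOCKER:-echo docker}"

# Hypothetical image name: <Docker Hub user>/<repository>:<tag>.
IMAGE="${IMAGE:-yourname/sysbench-demo:latest}"

build_and_push() {
    # Build the image from the build file in the current directory.
    $DOCKER build -t "$IMAGE" . || return 1
    # Authenticate against Docker Hub (prompts for credentials).
    $DOCKER login || return 1
    # Upload the image so that other machines can pull it.
    $DOCKER push "$IMAGE"
}

build_and_push
```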
Now that our image is built and pushed to Docker Hub, let's run a test container on a desktop. To do this, you will need to install Docker on your personal computer.
Notice in the docker run command that we 'map' the results file to a file on the actual file system. This way, our container (which should be stateless) can disappear once execution completes, and we are left with only the results. You can map individual files or folders on the host system to a file or folder inside your container.
As a part of the test I'll delete the LOCAL image that was created during the build process and pull down my image from Docker Hub:
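The test described above could be sketched like this. The image name, the host file `results.txt`, and the container path `/results/results.txt` are assumptions; the container path has to match wherever the entry point script actually writes its output. The script prints the commands by default; set `DOCKER=docker` to execute them.

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it.
# Set DOCKER=docker in the environment to run the commands for real.
DOCKER="${DOCKER:-echo docker}"

IMAGE="${IMAGE:-yourname/sysbench-demo:latest}"   # hypothetical image name
HOST_FILE="$PWD/results.txt"                      # results file on the host
CONTAINER_FILE="/results/results.txt"             # assumed path in container

test_run() {
    # Remove the locally built image and fetch the published copy instead.
    $DOCKER rmi "$IMAGE" || return 1
    $DOCKER pull "$IMAGE" || return 1

    # The host file must exist first; otherwise Docker creates a directory
    # with that name when it sets up the bind mount.
    touch "$HOST_FILE"

    # -v HOST:CONTAINER maps the host file over the container path, and
    # --rm deletes the container once it exits, leaving only the results.
    $DOCKER run --rm -v "$HOST_FILE:$CONTAINER_FILE" "$IMAGE"
}

test_run
```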
Deploying to a cloud provider using Docker
We're ready to deploy using Docker. Any popular cloud provider that supports Docker will work. This video demonstrates the process for deploying our image and running a container on a DigitalOcean node:
Deploying to the RCC HPC with Singularity
Singularity is a container technology that makes it easy to run Docker containers in environments like the RCC HPC Cluster. While it is possible to create Singularity containers without Docker, we recommend maintaining Docker containers and importing them into Singularity for several reasons:
- The "mind share" around building and maintaining Docker containers is robust, and there is a fair amount of documentation online for using Docker (vs. Singularity, which is a niche container platform).
- Docker containers are more portable, since more platforms and providers can run Docker than can Singularity.
That said, Singularity is a very useful wrapper for Docker containers in HPC environments:
- When inside a Docker container, your privilege level is 'escalated to root'. In a Singularity container, your user identity is the same as on the host environment. This makes sharing data between your home directory on the host and your container easy. In fact, Singularity automatically mounts your home directory inside of the container.
- The systems team would not be happy about anyone accessing the storage system with root access!
- Singularity images can be saved as a single file, versus Docker images, which are saved as multiple layered files. Once you have saved a copy of a Singularity image, it will remain the same in perpetuity, which ensures that your workflow can run on any system that supports Singularity.
The following video demonstrates running a Docker container with a volume shared between the host and the container, and writing files to a directory. Notice that the file ownership on the host is "root" when the file is written by a process inside of the container.
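The difference in user identity can be checked with a sketch like the one below. It assumes Singularity 3.x command syntax, a hypothetical image name, and a scratch directory named `shared`. Both tools are in dry-run mode by default and only print their commands; set `DOCKER=docker` and `SINGULARITY=singularity` to execute them.

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it. Set
# DOCKER=docker and/or SINGULARITY=singularity to run the commands for real.
DOCKER="${DOCKER:-echo docker}"
SINGULARITY="${SINGULARITY:-echo singularity}"

IMAGE="${IMAGE:-yourname/sysbench-demo:latest}"   # hypothetical image name
SIF="sysbench-demo.sif"                           # hypothetical output file

compare_ownership() {
    mkdir -p shared

    # Docker: the process runs as root inside the container, so the file
    # it creates in the shared directory is owned by root on the host.
    $DOCKER run --rm -v "$PWD/shared:/shared" ubuntu:latest \
        touch /shared/from_docker
    ls -l shared

    # Singularity: convert the Docker image to a single image file, then
    # run a command in it. The process keeps the calling user's identity.
    $SINGULARITY pull "$SIF" "docker://$IMAGE"
    $SINGULARITY exec "$SIF" id -un
}

compare_ownership
```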
Deploying an image from Singularity Hub
- Singularity Hub builds automatically from the file named Singularity in your repository.
- The format for Singularity Recipe (build) files is similar to that of Docker build files.
- Singularity supports a more complex format that allows building and running multiple applications from a single container image (whereas Docker best practice is to run a single process from each container).
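For comparison with a Docker build file, a minimal recipe might look like the one written by the sketch below. The recipe contents are an assumption (an Ubuntu base image with sysbench installed), not the recipe used in the video, and `WORKDIR` is a hypothetical scratch directory.

```shell
#!/bin/sh
# Write an illustrative recipe to a file named "Singularity", the name that
# Singularity Hub looks for. The recipe contents are an assumption.
WORKDIR="${WORKDIR:-$(mktemp -d)}"

cat > "$WORKDIR/Singularity" <<'EOF'
Bootstrap: docker
From: ubuntu:latest

%post
    # Runs once at build time, like RUN lines in a Docker build file.
    apt-get update && apt-get install -y sysbench

%runscript
    # Runs when the container is executed, like a Docker ENTRYPOINT.
    exec sysbench cpu run
EOF

echo "Wrote recipe to $WORKDIR/Singularity"
```

The `Bootstrap: docker` header tells Singularity to start from a Docker image, which is how the recipe reuses the same Ubuntu base as the Docker workflow.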
The following video demonstrates how to log in to Singularity Hub using a GitHub account and set up a 'collection'.
When a collection is set up, Singularity Hub will watch my Git repository. Any time a commit is made to the Singularity file in the Git repository, Singularity Hub will automatically rebuild the image. This is a simple implementation of the continuous delivery principle mentioned at the beginning of this article.
Deploying a Singularity Container to the Open Science Grid
- Review the documentation on Singularity and the Open Science Grid.
- Create an account with OSG.
- Open a ticket and request that the cowsay Docker image created above be imported to the OSG library. (OSG does not allow users to download Singularity images directly; a support request must be filed. In contrast, FSU RCC does allow you to pull Singularity images onto the cluster.)