We are planning an upgrade to the HPC software stack during the week beginning August 16. Unlike previous software upgrades, we will perform a rolling upgrade to minimize downtime. This means that parts of the cluster will remain online while other parts are being upgraded.
The maintenance window is August 16 - 20. We will send out an email and update this webpage once a detailed draft schedule is ready, which we expect in 2-3 weeks.
We are making the following major changes to our software stack to increase stability, fix bugs, and enhance the usability of the HPC:
- Upgrade CentOS from v7 to v8.3
- Upgrade Slurm from v20.02 to v20.11 (release notes)
- Upgrade software packages to newer versions (full list forthcoming)
- Upgrade our Open OnDemand web portal from v1.7 to v2.0 (release notes)
- Make a major change in how we install user software packages, such as NetCDF and R (details below)
Changes to the upgrade process
The process for upgrading the cluster will require us to re-install the operating system on every compute node. To facilitate minimal downtime during the week of the upgrade, we will reinstall only small sets of nodes at a time, while keeping most nodes online. Once nodes have been reinstalled, they will immediately be put back into production and the next set of nodes will be reinstalled.
Because every node must be updated, jobs will be killed periodically throughout the week. For a short time, the cluster will also consist of two sets of nodes running different software stacks: the old CentOS 7 build and the new CentOS 8 build. If you have complicated scripts or workflows, you may want to wait until the upgrade is complete before submitting jobs, especially if your jobs use multiple queues/partitions.
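If you do submit jobs during the upgrade week, a job script can check which stack its node is running before it loads software. The following is only a minimal sketch, not an official RCC recipe. It assumes each node exposes its release in /etc/os-release, as CentOS 7 and 8 normally do. The function names os_major_version and stack_label are hypothetical helpers made up for this illustration.

```shell
# Hypothetical helper: print the major OS version (e.g. 7 or 8) read from
# an os-release style file. Defaults to /etc/os-release when no path is given.
os_major_version() {
    release_file="${1:-/etc/os-release}"
    # VERSION_ID typically looks like VERSION_ID="7" or VERSION_ID="8"
    sed -n 's/^VERSION_ID="\{0,1\}\([0-9]*\).*/\1/p' "$release_file"
}

# Hypothetical helper: map the major version to the stack described above.
stack_label() {
    case "$(os_major_version "$1")" in
        7) echo "old" ;;      # node not yet reinstalled (CentOS 7 build)
        8) echo "new" ;;      # node already reinstalled (CentOS 8 build)
        *) echo "unknown" ;;  # release file missing or unexpected format
    esac
}
```

Inside a job script, you could branch on the output of stack_label to pick the commands that match the node your job landed on.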
The login nodes will remain available, and the Slurm job scheduler will accept jobs throughout the upgrade. Our storage systems, GPFS and Archival, will also remain online throughout the upgrade process.
Changes to the HPC software stack
We are reorganizing our software libraries under a standard scheme. We already support multiple versions of some software on the HPC (e.g. R and MATLAB), but the folder structure has not been consistent, which can lead to confusion.
We encourage users to rely on environment modules when possible; RCC staff keep these up to date with the correct library paths. If your code needs libraries at compile time, we encourage you to use the pkg-config tool instead of specifying full paths in your Makefiles or scripts. Refer to the pkg-config documentation on our website for more details. pkg-config is available on the HPC now, so you can refactor your custom scripts right away.
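As an illustration of replacing hard-coded paths, a build script could look up compiler and linker flags like this. It is a sketch under assumptions. The function name pkg_flags is hypothetical, and the package name you pass (for example netcdf) must match a .pc file that is visible to pkg-config, which may require loading the relevant environment module first. The flags --exists, --cflags, and --libs are standard pkg-config options.

```shell
# Hypothetical helper: print the compile and link flags for one library,
# or fail with a hint if pkg-config does not know about it.
pkg_flags() {
    pkg="$1"
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$pkg"; then
        # Emits something like "-I<include dir> -L<lib dir> -l<name>"
        pkg-config --cflags --libs "$pkg"
    else
        echo "pkg-config cannot find '$pkg'; try loading its environment module first" >&2
        return 1
    fi
}
```

A compile line could then read `gcc main.c $(pkg_flags netcdf) -o main`, where main.c is a placeholder source file. If the library moves to a new folder, the script should keep working without changes.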
If you have any questions or comments, please reach out to us: email@example.com.