UPDATE: Jan 19, 2023 - 12pm - All residual issues with the new 3 Petabyte IBM SpectrumScale (GPFS) storage system have been resolved. Thanks for your patience.
UPDATE: Jan 5, 2023 - Thanks for your patience while we worked to get the Globus and Spear services back online. We are pleased to report that both are operational now.
Globus users, please note that we had to change the endpoint names. If you are an existing Globus user, you will need to update your client. The names have changed as follows:
fsurcc#gpfs_home ➡ fsurcc#GPFS#home
fsurcc#gpfs_research ➡ fsurcc#GPFS#research
fsurcc#archival ➡ fsurcc#archival#1
fsurcc#archival-2 ➡ fsurcc#archival#2
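If you reference the old endpoint names in your own scripts, the renames above can be captured in a small lookup table. The following is a minimal Python sketch, not an RCC-provided tool: the `ENDPOINT_RENAMES` table and the `new_endpoint_name` helper are hypothetical names introduced for illustration, and only the endpoint names themselves come from this announcement.

```python
# Old -> new Globus endpoint names, copied from the list above.
ENDPOINT_RENAMES = {
    "fsurcc#gpfs_home": "fsurcc#GPFS#home",
    "fsurcc#gpfs_research": "fsurcc#GPFS#research",
    "fsurcc#archival": "fsurcc#archival#1",
    "fsurcc#archival-2": "fsurcc#archival#2",
}


def new_endpoint_name(name: str) -> str:
    """Return the renamed endpoint, or the input unchanged if it was not renamed."""
    return ENDPOINT_RENAMES.get(name, name)


if __name__ == "__main__":
    # Print the full old -> new mapping for reference.
    for old in ENDPOINT_RENAMES:
        print(f"{old} -> {new_endpoint_name(old)}")
```

The lookup uses exact matches rather than text substitution, because `fsurcc#archival` is a prefix of `fsurcc#archival-2` and a naive search-and-replace could rewrite the second name incorrectly.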
In addition, due to new security requirements from the vendor, you will need to link your RCC user credentials with your Globus account. Instructions for doing so are on our website.
We are also currently experiencing an intermittent issue where the GPFS file system periodically freezes for a few seconds at a time. With the new storage system running over an InfiniBand RDMA fabric, we have noticed long-waiting processes on some nodes. There may be occasional interruptions while we work with the vendor to resolve this issue.
UPDATE: Jan 3, 2023 - We are pleased to report that the Spear cluster has been brought back into service. The only remaining service that is unavailable is Globus. We will let you know as soon as it is back online.
UPDATE: Jan 2, 2023 - We are pleased to report that the following services are back online as of now:
- The High Performance Computing Cluster
- Open OnDemand (https://ood.rcc.fsu.edu)
- Customer VMs and hosted hardware
The following services need some additional work; we expect to have these online by early to mid-week:
- Globus services
- Spear servers
UPDATE: Dec 30, 2022 - The storage system upgrade is mostly complete. All user and research data is intact, but we would like to have a few more days to test some of the service changes. So, we are extending the downtime to Monday, January 2.
UPDATE: Dec 16, 2022 - We previously reported that the Archival Storage System would be minimally impacted by the scheduled outage. Since then, we have decided that we need to shut down the filesystem export servers (export.rcc.fsu.edu) to upgrade their software. These servers expose the Archival Storage System via SFTP and Globus, so we are amending our prior announcement: the Archival system will also be unavailable.
We will post a notification as soon as the Export Nodes are returned to service.
We are pleased to report that we have received, installed, and configured a new 3 Petabyte IBM SpectrumScale (GPFS) storage system.
We have completed our initial file transfer from the current system to the new one. Over the coming months, we will need to perform periodic synchronizations of data from the current system to the new system. This may have a performance impact for jobs running on the cluster. We will make every effort to minimize this impact and appreciate your patience.
We will need to pause the old storage system for a final synchronization and to upgrade the software on the compute nodes. We have chosen the week after Christmas in order to minimize the impact on ongoing research. The tentative dates are Monday, December 26 through Friday, December 30.
This downtime will affect the following services:
- The HPC cluster
  - The Slurm scheduler will stop accepting new jobs at 5pm on December 23, and all running jobs will be killed at 8am on December 26.
- The GPFS storage system
  - The HPC login nodes, Open OnDemand, Spear nodes, Export nodes, and all other public-facing servers will be unavailable starting at 8am on December 26.
- The Archival storage system
  - The Export nodes will be unavailable starting at 8am on December 26. These may return to service sooner than the other services, depending on how things go.
The following services will see minimal impacts:
- Customer VMs
- Colocation customers' servers
If you have any concerns or issues about the proposed downtime, please let us know: firstname.lastname@example.org.