UPDATE 10/11 @ 10:15am:
Message from RCC Director, Paul van der Mark:
Dear Sliger colocation customers and RCC users,
Yesterday, we experienced an unplanned power outage in our Sliger data center. It was an unfortunate fluke of standard maintenance and an undocumented feature. During our renovation last spring, the contractor installed a new fire-suppression system. In addition, a new safety feature was added, connecting that system with the emergency power-off switch on our UPS. However, the contractor did not add that feature to the wiring diagrams, so when Orr Protection performed a routine test on the new system, it turned off the UPS and thereby turned off power in the server room.
RCC staff restored power to most colocation customers within 15 minutes. Unfortunately, many customers still had to come to the data center to reset or turn on their equipment. Because of its complexity, the RCC HPC system took several hours to become fully operational.
On the afternoon of Monday, October 10th, FSU's Department of Environmental Health and Safety put a permanent fix in place for the issue. We are therefore confident that this was a one-time occurrence. We are genuinely sorry for any inconvenience this has caused.
Best Regards,
Paul--
Paul van der Mark, PhD
Director, Research Computing Center
Information Technology Services
Florida State University
Phone: 850.644.0193
its.fsu.edu | rcc.fsu.edu | https://fsu.zoom.us/my/pvandermark
UPDATE 10/10 @ 3pm:
- The HPC and Spear clusters are online, and the Slurm scheduler is accepting jobs.
- The "/hpc" VPN profile for students, guests, and any other non-staff members is up.
- Open OnDemand is up.
- The self-service web portal and webservices (RCCTool) are up.
- All RCC-managed customer VMs and other hosted systems are up.
- Globus is up.
- Our storage export servers are up.
If you had jobs running before this morning's outage, you will need to resubmit them.
UPDATE 10/10 @ 12:45pm:
We are making progress restoring service.
ONLINE
- The HPC and Spear clusters are online, and the Slurm scheduler is accepting jobs.
OFFLINE
- The "/hpc" VPN profile for students, guests, and any other non-staff members is still down.
- Open OnDemand is down.
- The self-service web portal and webservices (RCCTool) are down.
- All RCC-managed customer VMs are down.
- Globus is down.
- Our export servers are down.
At approximately 9:05 AM today, the Sliger data center suffered a power outage. Power has been restored, but all RCC systems were affected, as were colocation customers. We are working to bring everything back up as quickly as possible.
More details to come. We will update this page throughout the day until everything is back online.
If you have any specific systems that you need addressed, please email support@rcc.fsu.edu.