Active Alerts

No current alerts. All systems operational.

Alerts Archive

  • Sliger Network Maintenance Thursday - 6-7:30am

    ITS Networking has informed us that they will be performing switch maintenance on the building from 6-7:30am tomorrow (Thursday, December 14). We do not expect RCC Resources to be affected. However, if your jobs read or write data to Lustre, or your jobs are running on certain …

  • HPC Login Node Maintenance - Saturday, Dec 7 from 7am - 9am

    This Saturday at 7am, we will conduct maintenance on two HPC login nodes.  We expect this maintenance to last two hours.  If you are logged in to hpc-login.rcc.fsu.edu via SSH around 7am, your session may be disconnected. If you do get disconnected, simply reconnect to another login node in our …

  • Planned Globus Downtime - Saturday, Dec 9

    We've received notice from our storage partner, Globus, that there will be a brief downtime on Saturday, December 9 from 11am to 3pm (EST).  Here is the notice: As we recently announced, we are working towards making Globus data management solutions suitable for use with protected data and …

  • HPC Maintenance - HPC Rack 6 - Nov 27 & 28 (some compute nodes unavailable)

    Our next (and final) round of HPC maintenance begins on Monday, November 27, and will affect nodes in Rack 6.  The maintenance will last for two days (Monday and Tuesday).  A list of affected partitions and nodes is below.  We have already begun to drain jobs from these nodes in preparation for the …

  • COMPLETE - HPC Slurm Maintenance

    UPDATE - Nov 21 (10:40am) - Maintenance is complete.  Slurm is back online.  Thanks for your patience. We will conduct maintenance on the HPC Slurm Controller on Tuesday, November 21 from 9am - 11am.  During this time, job submission and control commands (sbatch, squeue, etc.) will not …

  • HPC Maintenance - Mon Nov 13 thru Thurs Nov 16 - HPC Racks 8, 12, and 20

    Our next round of HPC maintenance begins on Monday, November 13, and will affect nodes in Racks 8, 12, and 20.  The maintenance will last for four days (Monday through Thursday).  A list of affected partitions and nodes is below.  We have already begun to drain jobs from these nodes in preparation …

  • ITS Network Maintenance (Sun, Nov 5 from 12am to 5am)

    ITS is conducting network maintenance this weekend.  On Sunday (Nov 5) from 12am until 5am, ITS staff will perform a router repair that will affect intracampus network traffic.  This includes RCC resources, particularly those in Dirac (VMs, Lustre, some HPC compute nodes).  We do not expect any …

  • HPC Maintenance - HPC Rack 7 (some compute nodes unavailable)

    Our next round of HPC maintenance begins on Monday, November 6, and will affect nodes in Rack 7.  The maintenance will last for two days (Monday and Tuesday).  A list of affected partitions and nodes is below.  We have already begun to drain jobs from these nodes in preparation for the downtime. …

  • COMPLETED: Spear and HPC Maintenance

    UPDATE - Oct 26 (12pm)  - Maintenance is complete, and the Spear system is back online.  Thanks for your patience. UPDATE - Oct 24 (4:20pm)  - HPC Rack is back online, and we are currently adding nodes back to the Slurm scheduler.  The Spear system will remain offline through Thursday, Oct …

  • Intermittent PanFS / HPC Login Issues

    UPDATE: Oct 16 - 2:30pm --  We've opened a support request with our storage vendor, and we are working with them to come to a resolution. We are experiencing intermittent issues with our PanFS storage system.  This is causing timeouts when users attempt to connect to the HPC login nodes.  It …

  • HPC Maintenance - hpc-4-[1-40] (some compute nodes unavailable)

    On Monday and Tuesday, October 16 and 17, we will upgrade networking in HPC Rack 4. Some HPC compute nodes will be unavailable during this time, but most of the HPC will be available. We anticipate that this maintenance will be complete no later than Tuesday, October 17 at 5pm. If you have …

  • Login Issues (VPN and Login Nodes)

    We are working on a few issues related to authentication today: If you are having trouble logging into hpc-login.rcc.fsu.edu, you can instead use hpc-login-35.rcc.fsu.edu.  A few other nodes in our login cluster are down today.  We anticipate that they will be back online within a few …

  • HPC Maintenance - hpc-3-[1-40] (some compute nodes unavailable)

    Today and tomorrow, we are upgrading networking in HPC Rack 3. Some HPC compute nodes will be unavailable during this time, but most of the HPC will be available. We anticipate that this maintenance will be complete no later than tomorrow, Tuesday, October 10 at 5pm. If you have access to an …

  • RCC Monitoring Tropical Storm Nate

    The Research Computing Center staff is monitoring the progress of Hurricane Nate. We do not expect a direct hit at this time, but we are preparing for any possible impact scenarios, including a potential landfall near Tallahassee on or about Sunday, October 8. Tomorrow, October 6, we will …

  • COMPLETE: Maintenance on Export Nodes and Globus

    Update Oct 5, 2:45pm - Maintenance is now complete.  Thank you for your patience. The RCC export nodes are offline from Wednesday, October 4 at 9am until Thursday, October 5 at 5pm for planned maintenance.  This includes Globus and any NFS-mounted shares on virtual machines. We …

  • Lustre Issues - Data Loss Incident

    UPDATE - Sep 18 (11:50am) -  Lustre Data Loss:  https://rcc.fsu.edu/news/lustre-data-loss UPDATE - Sep 18 (9:30am) - Lustre and Spear back online, but there has been some data loss.  We are drafting a message now to send to users with details. We are working on issues related to the …

  • Hurricane Irma Update (9/9 5pm) - All RCC Services Offline

    UPDATE 9/9 - 5pm - Since the hurricane threat to Tallahassee has continued to increase over the past 24 hours, we are obligated to turn off all RCC services.  Please stay tuned to our Twitter feed for updates. UPDATE 9/9 - 12:30pm - Hurricane Irma continues to pose a greater threat to …

  • Slurm Controller Issues

    UPDATE (9:15am - Tue, July 25) -  Slurm issues are resolved.  We are continuing to monitor the system today in case we see any residual problems. UPDATE (7:35pm) - Slurm issues persist, and job submissions are currently not working.  Currently running jobs will continue to run, but you may …

  • RESOLVED: Slurm Controller Issues

    We have corrected the issue with the Slurm controller, and the system is back online.  Thank you for your patience. We are currently experiencing issues with the Slurm controller.  Submitting jobs and other Slurm commands are unavailable.  We are looking into the issue and will resolve it as …

  • HPC Issues with backfill and backfill2 partitions

    We are currently examining issues on the backfill and backfill2 partitions.  Users attempting to submit jobs to either of these partitions may see their jobs wait indefinitely (or for a very long time), with the reason being shown as "Priority". As soon as we have some updates on the …

  • We are upgrading MATLAB

    We are currently working on upgrading MATLAB to the latest version (R2017a).  You may experience some issues if you or your jobs attempt to use it while we are working on it.

  • SYSTEMS ONLINE - HPC, Spear, and Lustre Export Node

    We have fully completed all of the planned maintenance for the HPC, Spear, and Lustre Export nodes.  All of our services are back online, including: HPC, Spear, Globus, and Lustre Export Nodes. This upgrade includes a number of end-user changes on our systems.  The major …

  • HPC, Spear, and Lustre Export Node Maintenance

    UPDATE: May 17 @ 2:45pm Globus is now available.  If you use Globus to transfer data to and from our storage systems, you can resume operations. UPDATE: May 16 @ 8:50am We are making progress on the software upgrade, and we are on-track to restore HPC availability early next …

  • General Access Spear Service Restored

    UPDATE Monday, February 20, 2016 - The General Access Spear nodes have been restored.  Thanks for your patience while we worked to bring these back online.  We disabled our General Access Spear nodes yesterday (Spear 1...8, available via spear-login.rcc.fsu.edu) to perform some maintenan…

  • Brief (< 15 min) Lustre downtime - Fri at 7am

    There will be a brief service disruption for our Lustre storage system on Friday, December 16 from 7am until 7:15am. The storage system itself will not be affected, but we need to reconfigure a network switch attached to the service.  This will require disconnecting the main distributed …

  • RESOLVED - Lustre Issues

    UPDATE - Nov 18 - 11am - Most Lustre-based services are now resolved.  Please let us know if you have any continuing issues: support@rcc.fsu.edu. UPDATE - Nov 18 - 10:45am - We have discovered the cause of the issue, and are working on resolving it. --- We are having issues with …

  • Maintenance on core router in Dirac data center 11/11/2016

    We will perform a software upgrade on our core Nexus router in the Dirac data center on Friday, November 11th around 7AM. We performed a similar upgrade on another Nexus router and did not encounter any problems. The total upgrade can be performed in 10 - 30 minutes, with an anticipated unavailabil…

  • Off-campus access slow or unreliable

    The FSU campus network has been experiencing periodic bouts of slow or unreliable connectivity with the Internet for the past week or two. Some of our off-campus VPN users may experience slow connectivity to our systems, or may not be able to connect.  If you experience this, please try again …

  • Hurricane Matthew Alert

    Update - October 7 - 10am - We don't expect any impact from Hurricane Matthew over the coming days, but RCC staff members will stay in standby mode in case the path of the hurricane changes. First Alert - October 4, 10am - While it looks like Hurricane Matthew will not have any …

  • Latest Update - Lustre Restored, other items

    We have completed restoration of the Lustre filesystem, and the system is now operational. Nearly all data was recovered during the restoration. The copy of data from our backup went much slower than we anticipated, but completed without error. A very small number of files on the system that …

  • VMs in Virtual Cluster RESOLVED

    UPDATE 11:45AM -  The issues with the VM cluster are resolved.  Thank you very much for your patience. There was an issue with the underlying storage system.  Systems staff are meeting today to evaluate ways to mitigate future instances of this particular storage issue.  We will keep you …

  • VMs in Virtual Cluster

    There is a storage issue on our systems this morning affecting several VMs in the virtual machine cluster.

  • Hurricane Hermine Recovery - Spear Online; Lustre recovery proceeding

    UPDATE - Thurs, Sept 15, 3:20pm - Lustre at 36% recovered At this time, only three services remain affected by Hermine: Lustre data -  We have recovered 36% of the data on Lustre that was affected by the loss of our OST.  This process is moving slower than expected, and will likely …

  • We're conducting HPC Maintenance July 13 - 20 #rccupgrade2015

    Our transition from MOAB to Slurm and upgrade occurs this week. Although the Login Nodes are available, the HPC scheduler will be offline during this period.