RCC News

Sign Up for Our Newsletter

We publish a monthly newsletter that showcases events and opportunities at the Research Computing Center.

Sign up »

View Archives

News Archive

HPC Status Update

We've been tuning, tweaking, and fixing the HPC since we upgraded the system in July, and we have lots of updates to report on.

We're Hiring (SysAdmin)!

The RCC is hiring a systems administrator to work on our team at FSU. If you're interested, you should apply!

Status Report on the HPC

Here are a few updates on the HPC, including the state of accounts, job preemption, and other items.

Slurm Scheduler Issues Resolved

UPDATE 7pm: The HPC issues are resolved. Thanks for your patience.

We are currently experiencing issues on the HPC where Slurm commands are not responding. Our Systems Team is working to restore the service, and we will post an update as soon as the issue is resolved.

OpenMPI: Major Memory Leak Bug

UPDATE: Fri, Jul 24 - 9:00pm - We have completed compiling and redeploying the new version of OpenMPI. All systems are now running OpenMPI v1.8.7.

We have just been notified of a major memory leak in OpenMPI v1.8.6 (the current version on the HPC). This is likely why many nodes have been crashing and disrupting jobs on the HPC this week.

HPC Upgrade Complete

The HPC is back online, and the new Slurm scheduler is generally available to all users.

If you experience any issues submitting your jobs, or if you have questions about using Slurm, please let us know: support@rcc.fsu.edu.

You may also want to refer to our online materials listed in our prior announcement.

Upgraded Software

Our Applications Team has been busy upgrading, recompiling, and testing software on our systems. We are pleased to provide upgraded versions of 126 packages.

In addition, we are upgrading the High Performance Computing cluster to Red Hat Enterprise Linux 7.1, which includes a new Linux kernel and many other enhancements.

Here are the packages, in alphabetical order:

System Upgrades (including Slurm) to occur Monday, July 13

We have scheduled our Slurm and RHEL7 upgrade for Monday, July 13 through Sunday, July 19. During this time, our HPC and Spear systems will be unavailable.

RESOLVED: NoleStor Issues

UPDATE 4:36 pm - These issues are resolved. Thank you for your patience.

If you use NoleStor, you may notice another service disruption today. This is due to an issue on our server that we are working with the vendor to correct.

Partially Resolved: SSH Issues with Spear and Owner-Based Login Nodes

Yesterday evening, we started experiencing issues with SSH on our Spear systems and owner-based login nodes. SSH is functional, but ECDSA keys are not.