RCC News

Sign Up for Our Newsletter

We publish a monthly newsletter that showcases events and opportunities at the Research Computing Center.

News Archive

Bin Chen publishes further evidence for General Relativity

Our very own Bin Chen has published a paper in Scientific Reports that offers further evidence for Einstein's general theory of relativity.

Blue Waters $50,000 Fellowship

Applications are now open for the NCSA Blue Waters Graduate Fellowship. The program includes a one-year, $38,000 fellowship plus a $12,000 tuition allowance and 50,000 node hours on the Blue Waters HPC system.

Consolidating Condor into Slurm

As of today, over 229,300 jobs have run successfully under the Slurm scheduler. Given the stability and flexibility of the new scheduler, we are consolidating the Condor system into Slurm. Jobs that you previously submitted to Condor should now be submitted to an HPC partition named Condor.
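A minimal Slurm batch script for the new partition might look like the sketch below; the script name, resource requests, and program are illustrative assumptions, and only the partition comes from this announcement (check sinfo or rcctool for the exact partition name on your account).

    #!/bin/bash
    #SBATCH --job-name=condor_example   # any job name you like
    #SBATCH --partition=condor          # submit to the Condor partition in Slurm
    #SBATCH --ntasks=1                  # single task; adjust for your workload
    #SBATCH --time=01:00:00             # one-hour wall-clock limit (illustrative)

    ./my_program                        # hypothetical executable

Submit it with sbatch as usual, e.g. sbatch condor_example.sh.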

Announcing the rcctool

We have created a CLI tool that lets you view your partitions and account information and reset your password. Simply run rcctool when logged in to the HPC.
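For example, from a terminal session on the HPC (the login hostname below is an assumption for illustration):

    # connect to an HPC login node first
    ssh your_username@hpc-login.rcc.fsu.edu

    # then run the tool to view your partitions and account information
    rcctool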

Lustre and Spear Status

Update - Oct 14 - 4:45pm

Our Systems Team has been working hard today to restore the Lustre storage service. As of 4:45pm, the Lustre system is online but in recovery mode. It is currently working on Spear nodes, but not yet on export nodes.

This means that Spear is now online. Lustre access from the HPC or from other systems is not functional yet, though. We will keep you posted as to progress.

Thanks again for your patience.

Update - Oct 14

HPC Services Restored - Sliger Cooling Issues

A cooling issue occurred in our data center earlier today. As of 6pm, we are bringing nodes back online.

HPC Cheat Sheet

We've published a handy HPC Cheat Sheet. Download and print it if you want a quick reference.

Scheduler Update: Memory Limits

Recently, we noticed a substantial number of nodes crashing, causing job failures. We have been investigating and have determined that the problem is memory-related: jobs have been filling up all available RAM and swap. Under Moab and RHEL 6.5, this issue did not show up, since offending jobs would get killed by the Linux kernel. With the new scheduler, these jobs cause compute nodes to crash and reboot, and any job running on those nodes fails without a meaningful error message to the user ("node failure").
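One way to keep a job within a known memory footprint is to request memory explicitly in the batch script. The sketch below uses the standard Slurm --mem option with an illustrative 8 GB request; the actual limits and defaults on our partitions are not specified in this update.

    #!/bin/bash
    #SBATCH --job-name=mem_example   # illustrative job name
    #SBATCH --ntasks=1
    #SBATCH --mem=8G                 # request 8 GB of RAM for the job (illustrative value)
    #SBATCH --time=02:00:00

    ./my_memory_hungry_program       # hypothetical executable

Depending on how enforcement is configured, a job that exceeds its request can be terminated by the scheduler instead of exhausting the node's RAM and swap and taking the node down.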

HPC Status Update

We've been tuning, tweaking, and fixing the HPC since we upgraded the system in July, and we have lots of updates to report on.

We're Hiring (SysAdmin)!

The RCC is hiring a systems administrator to work on our team at FSU. If you're interested, you should apply!