This package is a set of C programs that perform K-Means clustering in parallel. It supports both OpenMP shared-memory parallel systems and MPI distributed-memory parallel systems.
Using parallel-kmeans on RCC Resources
The parallel-kmeans program comes in three varieties: a multicore version parallelized with OpenMP, a distributed version that uses MPI (provided here by OpenMPI), and a sequential version. After loading the module, each can be run on the HPC system with the following commands:
module load gnu-openmpi

# For OpenMP Parallel Version
omp_main OPTIONS -i INFILE -n N_CLUSTERS

# For MPI Parallel Version
mpi_main OPTIONS -i INFILE -n N_CLUSTERS

# For Sequential Version
seq_main OPTIONS -i INFILE -n N_CLUSTERS
A complete list of the available options can be found on the main website, or by running any of the above commands with the -h option and no INFILE or N_CLUSTERS.
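The three variants differ only in the binary name and, for MPI, the launcher. As a minimal sketch, the helper below prints the command line for a chosen variant so it can be checked before running. The function name kmeans_cmd, the file name data.txt, and the default of 4 processes are illustrative assumptions and not part of the package. The MPI form follows the mpirun invocation used in the SLURM example on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of parallel-kmeans): print the command line
# for one of the three variants without executing it.
# Arguments: VARIANT (seq|omp|mpi) INFILE N_CLUSTERS [NPROCS]
kmeans_cmd() {
    local variant="$1" infile="$2" nclusters="$3" nprocs="${4:-4}"
    case "$variant" in
        seq) echo "seq_main -i $infile -n $nclusters" ;;
        omp) echo "omp_main -i $infile -n $nclusters" ;;
        mpi) echo "mpirun -np $nprocs mpi_main -i $infile -n $nclusters" ;;
        *)   echo "unknown variant: $variant" >&2; return 1 ;;
    esac
}

# Example: show the MPI command for 8 clusters on 4 processes.
# data.txt is a placeholder input file name.
kmeans_cmd mpi data.txt 8 4
```

Only the -i and -n options documented above are used here. Any further options should be taken from the -h output.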
Running Parallel K-Means through SLURM
A parallel-kmeans job can be submitted through the SLURM scheduler with a submit script such as the following:
#!/bin/bash
#SBATCH --job-name="ParallelKmeans"
#SBATCH --mail-type=ALL
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 14-00:00:00
#SBATCH --mem-per-cpu=3900M

module load gnu-openmpi

# For the MPI Version
mpirun -np 4 mpi_main OPTIONS -i INFILE -n N_CLUSTERS

# For OpenMP Parallel Version
omp_main OPTIONS -i INFILE -n N_CLUSTERS
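The script above requests four tasks with -n 4, which suits the MPI version. For the OpenMP version, all threads must share one node, so a variant that requests a single task with several cores may be a better fit. The sketch below assumes that omp_main takes its thread count from the standard OMP_NUM_THREADS environment variable. The job name, the guard checks, and the choice of four cores are illustrative assumptions, while the partition, time limit, and memory settings are copied from the script above.

```shell
#!/bin/bash
#SBATCH --job-name="ParallelKmeansOMP"
#SBATCH --mail-type=ALL
#SBATCH -n 1                  # one task: a single omp_main process
#SBATCH --cpus-per-task=4     # four cores on one node for its threads
#SBATCH -p genacc_q
#SBATCH -t 14-00:00:00
#SBATCH --mem-per-cpu=3900M

# Load the module if the module command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load gnu-openmpi
fi

# Match the OpenMP thread count to the cores SLURM allocated.
# Falls back to 1 thread when run outside of a SLURM job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Run only if the binary is on the PATH. OPTIONS, INFILE and N_CLUSTERS
# are the same placeholders used in the script above.
if command -v omp_main >/dev/null 2>&1; then
    omp_main OPTIONS -i INFILE -n N_CLUSTERS
else
    echo "omp_main not found; was the module loaded?" >&2
fi
```

Setting OMP_NUM_THREADS from SLURM_CPUS_PER_TASK keeps the thread count and the allocation in step, so changing --cpus-per-task is the only edit needed to scale the job up or down.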