R (statistical computing)

Introduction

R is an open-source, popular, and fully featured statistical application and programming platform. We have multiple versions of R installed on HPC, including:

  • R 4.0.0
  • R 3.6.3 (for Intel systems; built with the Intel MKL library)
  • R 3.6.1
  • R 3.5.2
  • R 3.5.1
  • R 3.4.0
  • R 3.2.5
  • R 3.1.3

The default R module (module load R) loads R 3.5.2.

Running R on RCC Systems

To enter an interactive R session on HPC or Spear, simply type the R command. The default version is R 3.5.2. If you wish to run a different version, you must load the appropriate module before running R; e.g.:

# loads version 3.2.5
module load R/3.2.5

# loads version 3.2.0
module load R/3.2.0

# loads version 4.0.0
module load R/4.0.0

Type 'q()' to quit R.

To submit R jobs to SLURM, refer to the following example submission script:

#!/bin/bash

#SBATCH -n 1
#SBATCH -J "MyRJob"
#SBATCH -p backfill
#SBATCH -t 1:00:00
#SBATCH --mail-type=ALL

module load R/3.2.5

R CMD BATCH yourRscript

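Here, yourRscript is a text file containing the R commands to run. R CMD BATCH writes the output of those commands to yourRscript.Rout in the job's working directory (by default, the directory you submitted the job from). Submit the script with the sbatch command.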
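
To make this concrete, the following minimal sketch creates a small example R script and a submit script like the one above. The file names myscript.R and myjob.sh and the R commands inside them are illustrative assumptions; replace them with your own. Because the input file is named myscript.R, R CMD BATCH writes its output to myscript.Rout.

```shell
#!/bin/bash
# Sketch: create an example R script and a SLURM submit script that runs it.
# The file names (myscript.R, myjob.sh) and the R code are hypothetical examples.

# The R script: plain R commands, exactly as you would type them at the R prompt.
cat > myscript.R <<'EOF'
x <- rnorm(1000)                  # 1000 standard normal draws
print(summary(x))                 # printed output ends up in myscript.Rout
write.csv(data.frame(x = x), "draws.csv", row.names = FALSE)
EOF

# The submit script: same layout as the example submission script above.
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH -n 1
#SBATCH -J "MyRJob"
#SBATCH -p backfill
#SBATCH -t 1:00:00
#SBATCH --mail-type=ALL

module load R/3.2.5

# Output is written to myscript.Rout in the job's working directory.
R CMD BATCH myscript.R
EOF

echo "Created myscript.R and myjob.sh; submit with: sbatch myjob.sh"
```

After running these commands, submit the job with sbatch myjob.sh. When it finishes, myscript.Rout contains the echoed commands and the printed summary, and draws.csv holds the generated values.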

For more information about R, refer to the online R documentation.

Installing R Packages in Your Home Directory

Though RCC hosts many packages on our systems (see list below), there may be times when you need a specific package we do not currently have. In this case, you can install this package locally in your home directory using the following instructions (no need to open a ticket!):

  1. Type: module load R
  2. Type: R
  3. Type: install.packages("PACKAGE_NAME_HERE")
  4. R will print: Installing package into ‘/opt/hpc/R/R-3.5.2/share/R/library’ (as ‘lib’ is unspecified)
  5. You will then see a warning:  Warning in install.packages("abc") : 'lib = "/opt/hpc/R/R-3.5.2/share/R/library"' is not writable. This is normal.
  6. You will then be asked:  Would you like to use a personal library instead? (yes/No/cancel). Type yes
  7. You will then be asked:  Would you like to create a personal library ‘~/R/x86_64-redhat-linux-gnu-library/3.5’ to install packages into? (yes/No/cancel). Again, type yes
  8. You will then be shown: --- Please select a CRAN mirror for use in this session ---
  9. This will bring up a numbered list of CRAN mirrors. Enter the number of a mirror to download and install your package.
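
The prompts above require an interactive session. If you prefer to install packages without prompts (for example, from a script), one approach, sketched below under stated assumptions, is to install into the directory named by R's standard R_LIBS_USER variable, which is the same personal library R offers in step 7, and to pass both lib and repos to install.packages(). The function name install_r_pkg and the cloud.r-project.org mirror are illustrative assumptions, not RCC-provided tools. Load an R module before calling it.

```shell
# Hypothetical helper (not an RCC-provided command): install one or more CRAN
# packages into R's default personal library without interactive prompts.
# Run `module load R` (or load a specific version) before calling it.
install_r_pkg() {
  if [ "$#" -eq 0 ]; then
    echo "usage: install_r_pkg PACKAGE [PACKAGE ...]" >&2
    return 2
  fi
  # R_LIBS_USER defaults to ~/R/<platform>-library/<major.minor>; take its first
  # entry, expand "~", and create the directory if needed.
  # Package names are passed as trailing arguments and read with commandArgs(),
  # so no R code is built from user input.
  # The CRAN mirror URL is an assumption; any CRAN mirror works.
  Rscript -e 'lib <- path.expand(strsplit(Sys.getenv("R_LIBS_USER"), .Platform$path.sep)[[1]][1])' \
          -e 'dir.create(lib, recursive = TRUE, showWarnings = FALSE)' \
          -e 'install.packages(commandArgs(trailingOnly = TRUE), lib = lib, repos = "https://cloud.r-project.org")' \
          "$@"
}
```

For example, after module load R, running install_r_pkg data.table installs data.table into your personal library. Later R sessions of the same R minor version find it automatically, because R adds the R_LIBS_USER directory to its library path when that directory exists. Since the default path includes the R version (for example 3.5), switching to another R version generally means installing the package again.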

Available Packages

RCC has an extensive list of packages for R available.

List of Available R Packages

  • akima
  • acepack
  • ade4
  • ald
  • assertthat
  • backports
  • base
  • base64enc
  • BH
  • bitops
  • boot
  • brew
  • caTools
  • checkmate
  • chron
  • class
  • cluster
  • coda
  • codetools
  • colorspace
  • compiler
  • crayon
  • curl
  • datasets
  • data.table
  • DBI
  • dichromat
  • digest
  • doParallel
  • doSNOW
  • evaluate
  • fdasrvf
  • fields
  • foreach
  • foreign
  • Formula
  • futile.logger
  • futile.options
  • gdata
  • grDevices
  • graphics
  • grid
  • ggplot2
  • ghyp
  • gplots
  • graph
  • gridExtra
  • gtable
  • gtools
  • hexbin
  • highr
  • Hmisc
  • htmlTable
  • htmltools
  • htmlwidgets
  • httr
  • hwriter
  • iterators
  • jsonlite
  • KernSmooth
  • knitr
  • labeling
  • lambda.r
  • lattice
  • latticeExtra
  • lava
  • lazyeval
  • locfit
  • MADAM
  • magrittr
  • manipulate
  • maps
  • markdown
  • MASS
  • Matrix
  • matrixcalc
  • MatrixModels
  • matrixStats
  • memoise
  • methods
  • mgcv
  • mime
  • mnormt
  • munsell
  • mvtnorm
  • numDeriv
  • nnet
  • openintro
  • openssl
  • parallel
  • plogr
  • plyr
  • praise
  • psych
  • qrLMM
  • quantreg
  • R6
  • RColorBrewer
  • RcppArmadillo
  • Rcpp
  • RCurl
  • reshape2
  • rjags
  • rlang
  • rpart
  • RSQLite
  • scales
  • scatterplot3d
  • sendmailR
  • snow
  • spam
  • SparseM
  • spatial
  • splines
  • statmod
  • stats
  • stats4
  • stringi
  • stringr
  • survival
  • swirl
  • testthat
  • tibble
  • timereg
  • tools
  • utils
  • viridisLite
  • viridis
  • XML
  • xtable
  • yaml
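
The list above may not reflect every installed R version, so it can be worth checking directly whether a package is available. A minimal sketch, assuming an R module is already loaded: the hypothetical shell function r_has_pkg below exits with status 0 if the loaded R can load the named package and 1 otherwise, using the base R function requireNamespace().

```shell
# Hypothetical helper: succeed (status 0) if the currently loaded R can load
# the package named in $1, fail (status 1) otherwise.
# Requires an R module to be loaded first (e.g. `module load R`).
r_has_pkg() {
  # The package name is passed as a trailing argument rather than spliced into R code.
  Rscript -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]' \
          -e 'quit(save = "no", status = if (requireNamespace(pkg, quietly = TRUE)) 0 else 1)' \
          "$1"
}
```

For example, r_has_pkg ggplot2 && echo "ggplot2 is available". Package names are case-sensitive, so check the exact spelling (for example, Matrix rather than matrix).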

Bioconductor on RCC Systems

Bioconductor is a large collection of R packages and tools for bioinformatics data analysis. RCC has many Bioconductor packages installed for use on RCC systems; a complete list follows below.

List of Available Bioconductor Packages

  • affyio
  • affy
  • annaffy
  • annotate
  • AnnotationDbi
  • Biobase
  • BiocInstaller
  • BiocGenerics
  • BiocParallel
  • biomaRt
  • Biostrings
  • DelayedArray
  • DESeq
  • DESeq2
  • DEXSeq
  • gcrma
  • genefilter
  • geneplotter
  • GenomeInfoDbData
  • GenomeInfoDb
  • GenomicAlignments
  • GenomicFeatures
  • GenomicRanges
  • GO.db
  • IRanges
  • KEGG.db
  • KEGGgraph
  • limma
  • made4
  • multtest
  • preprocessCore
  • qvalue
  • Rgraphviz
  • Rsamtools
  • rtracklayer
  • S4Vectors
  • SPIA
  • SummarizedExperiment
  • vsn
  • webbioc
  • XVector
  • zlibbioc

Parallel Computing with R

R has a number of powerful tools for performing computations in parallel, which are vital for using the full power of RCC's systems in your research. The R parallel computing packages currently supported by RCC are listed below.

List of Available R Parallel Computing Packages

  • parallel
  • doParallel

Using R Parallel Computing Packages on HPC

To run a parallel R job on HPC, first choose one of the supported parallel computing packages. The parallel and doParallel packages are intended for single-node, multicore computations (that is, they run on one machine using multiple cores). Other packages that support multi-node computations may become available in the future. Next, write or convert your code to use that package (see the package's documentation for details). When your code is ready, create a submit script based on the example below and submit your job with the sbatch command.

 

The R parallel Package

If you are using the parallel or doParallel packages for R, your submit script should look something like the following:

#!/bin/bash
#SBATCH -J myRjob
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 10:00:00

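# Start one R process (srun would launch one copy per task);
# the R code itself uses the 4 allocated cores
R CMD BATCH myRjob.R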

When ready to submit, save the above submit script as, for example, myRjob.sh. Then, from the directory that contains both the submit script and myRjob.R, type sbatch myRjob.sh. You can name the file anything; the .sh extension is just a convention.
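
The submit script above expects an R script named myRjob.R. As a minimal sketch (the computation is a placeholder assumption), the commands below write such a script using mclapply() from the parallel package. Instead of detectCores(), which reports every core on the node, it sizes the worker pool from the SLURM_NTASKS environment variable, which sbatch sets from the #SBATCH -n value, and falls back to 1 core outside a SLURM job.

```shell
#!/bin/bash
# Sketch: write myRjob.R, a single-node multicore R script using the parallel package.
# The slow_square() workload is a hypothetical stand-in for your own computation.
cat > myRjob.R <<'EOF'
library(parallel)

# Use the number of tasks requested with "#SBATCH -n"; default to 1 outside SLURM.
n_cores <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))

# Placeholder work: each element is handled by one of the forked worker processes.
slow_square <- function(i) {
  Sys.sleep(0.1)   # stand-in for a real computation
  i^2
}

results <- mclapply(1:40, slow_square, mc.cores = n_cores)
cat("Cores used:", n_cores, "\n")
cat("Sum of squares:", sum(unlist(results)), "\n")
EOF
```

mclapply() forks worker processes, which works on Linux compute nodes. If you use doParallel instead, the equivalent pattern is registerDoParallel(cores = n_cores) followed by a foreach(...) %dopar% loop.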

 

Version
4.0.0 / Several Others