TotalView

Versions: 8.13.0, 2018.2.6, 2019.2.12, 2021.3.9

Documentation Update in Progress

Introduction

TotalView is a dynamic source-code and memory debugger for C, C++, and Fortran applications. It includes memory debugging and analysis, support for the Xeon Phi coprocessor, and OpenACC / CUDA debugging capabilities. This documentation focuses on using TotalView's parallel debugging capabilities. The RCC has a 64-token license for TotalView.

Using TotalView on the HPC

A Note about TotalView 2018.2.6 and newer

TotalView 2018.2.6 and newer may not always display the source code when the program is launched from the command line using totalview srun -a -n 16 -ppdebug MYPARALLELCODE, where MYPARALLELCODE is the name of your program. To get the source code to appear, start the debugging session manually from within the UI: go to File > Debug a Parallel Program, select OpenMPI or MPICH from the drop-down list under "Parallel System", and browse for the executable under "File Name". The RCC team is aware of this issue and is working to resolve it.

Preparing the program

Compile your C/C++/Fortran program with the -g option for debugging. For example, a Fortran program named mat_mul_par.f90 should be compiled as

$ mpif90 -g -o mat_mul_par mat_mul_par.f90
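The same -g flag applies to C and C++ sources; only the compiler wrapper changes. The following is a minimal sketch, assuming the usual MPI wrapper names (mpif90, mpicc, mpicxx) are provided by the MPI module you load. The function name debug_build_cmd is hypothetical and is not part of TotalView or MPI; it only prints the command so you can review it before running it.

```shell
# Print (do not run) a debug build command for a single source file.
# debug_build_cmd is a hypothetical helper used for illustration only.
debug_build_cmd() {
    src=$1
    exe=${src%.*}    # executable name = source name without its extension
    case "$src" in
        *.f90|*.f)  wrapper=mpif90 ;;   # Fortran
        *.c)        wrapper=mpicc ;;    # C
        *.cpp|*.cc) wrapper=mpicxx ;;   # C++
        *) echo "unsupported source file: $src" >&2; return 1 ;;
    esac
    # -g adds the debugging information TotalView needs to show source lines
    echo "$wrapper -g -o $exe $src"
}

debug_build_cmd mat_mul_par.f90
```

For the Fortran example above, this prints the same mpif90 command shown earlier.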

Interactively debugging a program with TotalView

The TotalView GUI needs X11 forwarding and interactive access to the machine(s) on which the program being debugged runs. You therefore need to request an interactive computing session from the Slurm resource manager. Below is an example of a request.

$ srun --pty -t 30:00 -n 4 -p genacc_q /bin/bash

The --pty option requests an interactive session so that the TotalView GUI can start. The -p option specifies the queue name, -t the walltime, and -n the number of processes. Choose the walltime and process count to suit the job, but request the minimum you need: you cannot use TotalView until the interactive job starts, and requesting too many resources may greatly increase your wait time (especially in the general access queue).

When your interactive session has started (i.e., when you get your terminal prompt back), you are ready to run your program and do some debugging. First, load the appropriate module for the TotalView version you would like to use.

# Available versions: 
$ module load totalview/8.13
$ module load totalview/2018.2.6
$ module load totalview/2019.2.12
$ module load totalview/2021.3.9
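Before launching the GUI, it can help to confirm that X11 forwarding is active and that the module has put totalview on your PATH. The following is a minimal sketch: preflight is a hypothetical helper name, and the check assumes that a set DISPLAY variable is a reasonable indicator of working X11 forwarding in your session.

```shell
# Report whether the session looks ready for the TotalView GUI.
# preflight is a hypothetical helper; DISPLAY is the standard X11 variable.
preflight() {
    status=0
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: X11 forwarding does not appear to be active" >&2
        status=1
    fi
    if ! command -v totalview >/dev/null 2>&1; then
        echo "totalview is not on PATH: load one of the totalview modules first" >&2
        status=1
    fi
    return "$status"
}

preflight || echo "fix the issues above before starting TotalView"
```

If the check fails on DISPLAY, reconnecting to the cluster with X11 forwarding enabled is the usual remedy.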

Then, change to the directory where you compiled your program and load the same MPI module used to compile it. An example session is below; it includes an alternative example program, dijkstra_openmp.c. You can download the code here.

The -n option specifies the number of workers, which should be less than or equal to the number of processes requested when you started the session. For the OpenMP example, you may need to run the command export OMP_NUM_THREADS=4 (or any number from 2 to 24) to get more threads to show up in the GUI's Processes & Threads tab.
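The worker-count rule can be checked before launching. The following is a minimal sketch, assuming that Slurm has set the SLURM_NTASKS environment variable inside your interactive session (it normally reflects the -n value of the request). The function name workers_fit is hypothetical.

```shell
# Succeed only if the requested worker count fits the current allocation.
# workers_fit is a hypothetical helper; SLURM_NTASKS is assumed to be set
# by Slurm inside the interactive session. If it is unset, nothing fits.
workers_fit() {
    workers=$1
    allocated=${SLURM_NTASKS:-0}
    [ "$workers" -le "$allocated" ]
}

if workers_fit 4; then
    echo "4 workers fit in this allocation"
else
    echo "4 workers exceed the ${SLURM_NTASKS:-0} tasks allocated" >&2
fi
```

Run inside a session requested with -n 4, the check passes for 4 workers or fewer and fails for anything larger.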

$ module load gnu openmpi
$ mpif90 -g -o mat_mul_par mat_mul_par.f90
$ totalview srun -a -n 4 -p genacc_q ./mat_mul_par

# alternative example program:
$ gcc -g -fopenmp dijkstra_openmp.c -o dijkstra_openmp
$ totalview dijkstra_openmp

This will open three windows, seen below.
https://acct.rcc.fsu.edu/files/rccimp/totalview_1.jpg

Select OK in the Startup Parameters window. If you are going to do memory debugging or need to record the program state while running, check the relevant boxes. You can ignore the messages in the terminal window. Click the Go button in the Process window to start running your program. Click OK when the following message box appears:

https://acct.rcc.fsu.edu/files/rccimp/totalview_2.jpg

Your program will start running and you will see details about the workers (ID and Rank information) appear on the root window.
https://acct.rcc.fsu.edu/files/rccimp/totalview_3.5.jpg

You may notice that TotalView is displaying information about mpirun itself. Click on main in the Stack Trace window to reveal your program.
https://acct.rcc.fsu.edu/files/rccimp/totalview_4.jpg

You can add breakpoints to your code by clicking on any of the line numbers contained in the box. You can also drill down into your other functions by right-clicking on a function and selecting Dive. In the example below, two breakpoints are inserted, one before and one after the MPI_BCAST. After you click Go, execution will pause at the breakpoint, and you can check variable values by hovering your mouse over a variable name or by right-clicking it and choosing Dive. Arrays will be opened in new windows. You can view the value of a variable across all the processes by right-clicking and selecting Across Processes.
https://acct.rcc.fsu.edu/files/rccimp/totalview_6.jpg

https://acct.rcc.fsu.edu/files/rccimp/totalview_7.jpg

The Process window shows the master node. You can view other processes by selecting a different process in the root window, right-clicking, and choosing Dive in new window. A separate window will open as shown above.

TotalView allows you to view the run-time state of your MPI program's message passing by selecting Process Window > Tools Menu > Message Queue. However, only pending or unexpected messages are shown in this window. A graphical representation of this information can be obtained from Process Window > Tools Menu > Message Queue Graph.

For more information about TotalView, please refer to the official documentation for the version you are using: 2021.3.9, 2019.2.12, 2018.2.6, and 8.13.0.