Parallel IPython Programming on HPC and Spear

This page introduces IPython (interactive Python) and the basics of running parallel Python jobs on the FSU HPC and Spear clusters.

An Overview of IPython Architecture for Parallel Computing

IPython is a command shell for interactive computing with Python. It supports interactive data visualization and the use of GUI toolkits. It also provides easy-to-use, high-performance tools for parallel computing.

A parallel IPython cluster is composed of four components:

  1. client,
  2. hub,
  3. schedulers, and
  4. engines.

The hub and schedulers together are referred to as the controller. The controller works as an interface between the client and the parallel IPython engines. These components live in the IPython.parallel package and are installed with IPython.
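
To see how these pieces fit together, the short sketch below connects a client to the controller and asks every engine to report its hostname. It assumes a cluster is already running for your profile; starting one is described next.

    from IPython import parallel

    # The client talks only to the controller (hub + schedulers); the
    # controller relays the request to every engine registered with it.
    rc = parallel.Client()               # connect to the controller (pass profile='name' if not using the default profile)
    print(rc.ids)                        # ids of the engines currently registered

    def where_am_i():
        import socket
        return socket.gethostname()

    dview = rc[:]                        # a DirectView spanning all engines
    print(dview.apply_sync(where_am_i))  # one hostname per engine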

Before you can run parallel Python jobs, you need to start a controller and a set of engines. The controller must be initialized before the engines, but it can run on a machine separate from the engines.

The ipcluster command provides a simple way of starting both the controller and engines for you:

   $ ipcluster start --profile=profile_name --n=num_of_engines &

where profile_name is the name of the cluster profile (defined below), and the --n option specifies the number of engines to start.

The ipcluster utility has a notion of Launchers that can start controllers and engines with various remote execution schemes. Currently supported schemes include ssh, mpiexec, batch systems (PBS/Torque, SGE, LSF, SLURM), and Windows HPC Server. We will focus on the SLURM launcher for HPC applications, since it integrates most closely with our systems.
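
Launchers are selected in a profile's ipcluster_config.py, which is an ordinary Python file. As a rough illustration of the idea (the actual SLURM setup used on our HPC is configured step by step in the HPC section below), swapping the default local launchers for batch-queue launchers is just a matter of changing two class names:

    # ipcluster_config.py (illustrative sketch; the object c is supplied by
    # IPython's configuration loader when the file is read)

    # Defaults: start the controller and the engines on the local machine.
    #c.IPClusterStart.controller_launcher_class = 'LocalControllerLauncher'
    #c.IPClusterEngines.engine_launcher_class = 'LocalEngineSetLauncher'

    # Batch-queue alternative: submit the controller and engines as SLURM jobs.
    c.IPClusterStart.controller_launcher_class = 'SLURMControllerLauncher'
    c.IPClusterEngines.engine_launcher_class = 'SLURMEngineSetLauncher'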

HPC or Spear?

Since the Spear cluster is designed to facilitate interactive data analysis and visualization of large data sets, it is an ideal system on which to run IPython jobs.

When running IPython jobs on Spear, both the controller and the engines (see previous section) will be created locally on the node you are working on.

You can run IPython jobs on the HPC cluster in a parallel fashion, but there are some caveats:

  1. You have to create SLURM submit scripts and submit the controller and engine jobs separately to the HPC queues.
  2. You won't be able to start working on your IPython jobs until both the controller and engine jobs are running.

Consequently, it is much easier and faster to run IPython jobs on the Spear cluster than on the HPC cluster. If you are an experienced IPython programmer and need more than 48 cores/engines, you may consider using the HPC cluster instead of Spear.

Running IPython on Spear

Below are the steps to run IPython in parallel on Spear.

Step 1. Connect to Spear

See our documentation for connecting to Spear. For IPython, you can simply use SSH to connect.

Step 2. Start a local IPython cluster

  $ ipcluster start --profile=default --n=12 &
  • Note. The default cluster profile, profile_default, will be used if no profile is specified when starting the cluster. Since we are running IPython locally, the default profile will suffice.
  • Note. You can check how many CPUs are available using the following command:

    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                12
    ....
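
You can also query the core count from within Python itself; this is just an optional aside:

    # Optional: report the number of CPUs from Python instead of lscpu.
    import multiprocessing
    print(multiprocessing.cpu_count())   # e.g., 12 on a Spear node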
    

Step 3. Start the interactive IPython client:

   $ ipython --profile=default

Step 4. Run your parallel Python job interactively.

For example, try the following commands:

In [1]: from IPython import parallel

In [2]: rc = parallel.Client()

In [3]: rc.block= True

In [4]: rc.ids
Out[4]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [5]: def pow(a,b):
   ...:     return a**b
   ...:

In [6]: rc[:].apply(pow,2,10)
Out[6]:  [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024]

In [7]: view = rc.load_balanced_view()

In [8]:  view.map(pow,[2]*12,range(12))
Out[8]: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

In [9]: exit()

Step 5. Stop the IPython cluster.

  $ ipcluster stop --profile=default

Running IPython on HPC

Running IPython on the HPC cluster allows you to start the controller and the engines on multiple nodes. Before you can do so, you need to create and configure a cluster profile and prepare template submit scripts for starting the controller and engine jobs.

Step 1. Create an IPython cluster profile.

Log in to the HPC and run the following command at the prompt:

$ ipython profile create --parallel --profile=slurm

This will create a cluster profile called profile_slurm (the name you specify is automatically prefixed with profile_, e.g., slurm → profile_slurm). The profile directory will be created at either:

  ~/.config/ipython/profile_slurm

or

   ~/.ipython/profile_slurm

(where ~ is your home directory on the HPC file system).

The profile directory contains several important cluster configuration files; in particular,

   $ ls ~/.config/ipython/profile_slurm/
   ipcluster_config.py
   ipcontroller_config.py
   ipengine_config.py
   ...
  • Note. The name of the profile is arbitrary. We called the profile slurm in this example because we will create an IPython cluster using the SLURM job scheduler. If you want to create a small cluster on your local host, you could call it local. You can create as many profiles as you want, and they are reusable.
  • Note. The default profile is profile_default; it will be used if no profile option is specified when starting an IPython cluster with the ipcluster command (see Step 6 below).

Step 2. Prepare a template submit script for starting the IPython cluster controller.

To start the cluster controller, we need to create a SLURM submit script for the controller. Below is an example (slurm.controller.template):

#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=1       # run one task on one node
#SBATCH -t 04:00:00               # max runtime is 4 hours
#SBATCH --mem-per-cpu=2000        # request 2000 MB of memory per CPU
#SBATCH -J ipython-controller     # job name
#SBATCH -p backfill               # backfill queue (specify your preferred queue here)
#SBATCH --mail-type=ALL           # send emails for job start/stop/error
#SBATCH -o ipc.controller-%J.out

ipcontroller --profile=slurm

Save the submit script in your profile directory (e.g., ~/.ipython/profile_slurm/). In the last line of the above script, ipcontroller --profile=slurm, the ipcontroller utility starts the controller on the node assigned by the SLURM scheduler. The profile name must match the one you created in Step #1.

Step 3. Prepare a template submit script for starting the IPython cluster engines.

We need a second SLURM submit script as a template for starting the engines. Here is an example (slurm.engine.template):

#!/bin/bash
#SBATCH -N {n/4}
#SBATCH --ntasks-per-node=4
#SBATCH -t 04:00:00
#SBATCH -J ipython-engine
#SBATCH -p backfill
#SBATCH --mail-type=ALL
#SBATCH -o ipc.engine-%J.out

which ipengine                            # print the path of ipengine (useful for debugging)
mpiexec -np {n} ipengine --profile=slurm
  • Note. In this example, the parameter n is the number of engines you want to start. It will be substituted into the template when you create the cluster with the ipcluster command, i.e.,

    $ ipcluster start --profile=slurm --n=num_of_engines

We have requested n/4 nodes with 4 engines per node (--ntasks-per-node=4; adjust these numbers to your needs). For example, with --n=8 the template requests 2 nodes running 4 engines each.

  • Note. In the last line of the above script, we run IPython's ipengine utility in parallel via mpiexec to start n engines on the HPC cluster. The profile name must match the one you created in Step #1.
  • Note. The above script should be saved in your profile directory, say, ~/.ipython/profile_slurm/.

Step 4. Modify the configuration files in your profile directory.

Several files and directories were generated automatically when you created the cluster profile using the ipython profile create command in Step #1.

A few of these files need to be modified before we can use them; in this example, they live in the ~/.ipython/profile_slurm directory.

  1. We select the SLURM launchers for the controller and engines by adding the following two lines to ipcluster_config.py:

    c.IPClusterStart.controller_launcher_class = 'SLURMControllerLauncher'

    c.IPClusterEngines.engine_launcher_class = 'SLURMEngineSetLauncher'

  2. We also need to specify the names of our template submit scripts for the controller and the engines. For this we add the following two lines to ipcluster_config.py:

    c.SLURMEngineSetLauncher.batch_template_file = "slurm.engine.template"

    c.SLURMControllerLauncher.batch_template_file = "slurm.controller.template"

  3. Next, we need to modify ipcontroller_config.py by adding the following line:

    c.HubFactory.ip = '*'

This will instruct the controller to listen for network connections on all its interfaces.

Step 5. Load the MPI module

The IPython engines are started with the mpiexec utility (see slurm.engine.template). Consequently, we have to load an MPI module before we can start an IPython cluster. For example, to use the Intel toolchain:

  $ module load intel
  $ module load intel-mvapich2

If you prefer the GNU toolchain, you can use

 $ module load gnu-openmpi 

Note. Consider adding these lines to your ~/.bashrc file so that you do not have to execute them every time you start a cluster.

Step 6. Start an IPython cluster.

Start an IPython cluster by issuing the following commands:

$ cd $HOME/.ipython/profile_slurm     # change to the profile directory

and then run

 $ ipcluster start --profile=slurm -n 8  &

or

 $ ipcluster start --profile=slurm --n=8  &

The ipcluster start command will submit the slurm.controller.template and slurm.engine.template to SLURM for you, and will create the controller and engines.

The ipcluster start command must be run from within your profile directory (e.g., ~/.ipython/profile_slurm/). Otherwise you will see an error message containing lines like:

 ERROR:root:Error in periodic callback
 ....
 IOError: [Errno 2] No such file or directory: u'slurm.controller.template' 

You will see output similar to the following example if the ipcluster start command runs successfully:

$ ipcluster start --profile=slurm &
[1] 6207
[IPClusterStart] Using existing profile dir: u'/panfs/storage.local/home-4/bchen3/.ipython/profile_slurm'
[IPClusterStart] Starting ipcluster with [daemon=False]
[IPClusterStart] Creating pid file: /panfs/storage.local/home-4/bchen3/.ipython/profile_slurm/pid/ipcluster.pid
[IPClusterStart] Starting SLURMControllerLauncher: ['sbatch', u'./slurm_controller']
[IPClusterStart] Writing instantiated batch script: ./slurm_controller
[IPClusterStart] Job submitted with job id: '7742035'
[IPClusterStart] Process 'sbatch' started: '7742035'
[IPClusterStart] Starting 8 engines
[IPClusterStart] Starting 8 engines with SLURMEngineSetLauncher: ['sbatch', u'./slurm_engines']
[IPClusterStart] Writing instantiated batch script: ./slurm_engines
[IPClusterStart] Job submitted with job id: '7742036'
[IPClusterStart] Process 'sbatch' started: '7742036'
  • Note. You have to specify the profile to use; otherwise, profile_default will be created (if it does not exist yet) and used. You also need to specify the parameter n, the number of parallel engines you want to create.
  • Note. The cluster will be shut down automatically when the wall-clock limit is reached. If you finish your calculations before then, stop the cluster by hand using $ ipcluster stop --profile=slurm

Step 7. Start an IPython client on the command line.

Before you create an IPython client, make sure that both the controller and the engine jobs are running. For example, you can check the status of the jobs with squeue:

$ squeue
JOBID    PARTITION   NAME                 USER    ST   TIME   NODELIST
653      backfill    ipython-controller   $user   PD   0:00   hpc-6-1
654      backfill    ipython-engine       $user   PD   0:00   hpc-6-[2,3]

(The state PD indicates that both jobs are still pending in the backfill queue, i.e., not running yet.) Wait until you see output like:

$ squeue
JOBID    PARTITION   NAME                 USER    ST   TIME   NODELIST
653      backfill    ipython-controller   $user   R    0:00   hpc-6-1
654      backfill    ipython-engine       $user   R    0:00   hpc-6-[2,3]

(The state R indicates that the jobs are now running.)

After the controller and engines have started, you can open an interactive IPython session with:

$ ipython --profile=slurm

You will see output similar to the following:

Python 2.6.6 (r266:84292, Aug 28 2012, 10:55:56) 
Type "copyright", "credits" or "license" for more information.
IPython 0.11 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features. 
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

IPython profile: slurm

In [1]: 

Step 8. Run Parallel Python Jobs

If the ipcluster command ran successfully, you will have 8 Python engines ready for use. For example, try the following commands:

In [1]: from IPython import parallel

In [2]: rc = parallel.Client()

In [3]: rc.block = True

In [4]: rc.ids
Out[4]: [0, 1, 2, 3, 4, 5, 6, 7]

In [5]: def pow(a,b):
   ...:     return a**b
   ...:

In [6]: rc[:].apply(pow,2,10)
Out[6]: [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024]

In [7]: view = rc.load_balanced_view()

In [8]: view.map(pow,[2]*10,range(10))
Out[8]: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

In [9]: view.map(pow,[2]*16,range(16))
Out[9]: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768]

Step 9. Stop the IPython Cluster

You can stop the IPython cluster by issuing the following command

  $ ipcluster stop  --profile=slurm
  • Note. When you start and stop the IPython cluster, you have to specify the profile of the cluster.

An Example: Computing Pi Using Monte-Carlo Simulation

The following function monte_carlo_pi can be used to estimate the constant pi.

Algorithm. Generate a random point P = (x, y) in the unit square (0 < x < 1 and 0 < y < 1) using the random number generator provided by NumPy. If the point P lies within the unit circle (x*x + y*y < 1), accept it (i.e., increment n_shots). Repeat this for n points. Because a uniformly distributed point lands inside the quarter circle with probability equal to the ratio of the areas, (pi/4)/1, the acceptance probability is pi/4, so n_shots/n ≈ pi/4. Consequently, 4*n_shots/n is an estimate of pi; the larger n is, the more accurate the estimate.

from numpy import *

def monte_carlo_pi(n):
    "estimate pi using Monte Carlo dart-shooting"
    # n is the number of darts to throw
    if n < 1:
        return 0

    n_shots = 0
    for i in range(n):
        x = random.random()       # one random point (x, y) in the unit square
        y = random.random()
        r = sqrt(x*x + y*y)       # distance of the point to the origin
        if r <= 1:
            n_shots = n_shots + 1

    return 4.0*float(n_shots)/float(n)
  • Note. The package NumPy must be imported before calling the function monte_carlo_pi.
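
Before going parallel, you can sanity-check the function serially; the exact value will vary slightly from run to run:

    # Quick serial check: one million darts should give a value close to 3.14.
    from numpy import *        # monte_carlo_pi relies on numpy's random and sqrt

    print(monte_carlo_pi(1000000))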

Now let us run the above code in parallel on a Spear node with 12 engines.

Method 1. Define the function locally and send it to the engines.

In [1]: from IPython.parallel import Client

In [2]: rc = Client()

In [3]: rc.ids
Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [4]: dv=rc[:]

In [5]: def monte_carlo_pi(n):
...:     if (n < 1) : return 0
...:     n_shots = 0
...:     for i in range(n):
...:         x = random.random()
...:         y = random.random()
...:         r = sqrt(x*x+y*y)
...:         if ( r<=1 ): n_shots = n_shots + 1
...:     return 4.0*float(n_shots)/float(n)    
...: 

 In [6]: dv.execute('from numpy import *')
Out[6]: <AsyncResult: _execute>

In [7]: ar = dv.apply_async(monte_carlo_pi, 1000000)

In [8]: ar.ready()
Out[8]: True

In [9]: pi_list = ar.get()

In [10]: print pi_list
[3.143116, 3.1435599999999999, 3.1430880000000001, 3.1429719999999999, 3.143068, 3.1420400000000002, 3.141092, 3.1412520000000002, 3.1407080000000001, 3.1407240000000001, 3.141588, 3.139068]

In [11]: print "my simulation gives pi = %f " % average(pi_list)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/lustre/home-4/bchen3/<ipython-input-10-86bcc7348329> in <module>()
----> 1 print "my simulation gives pi = %f " % average(pi_list)

NameError: name 'average' is not defined


In [12]: from numpy import *

In [13]: print "my simulation gives pi = %f " % average(pi_list) 
my simulation gives pi = 3.141856 

In [14]: ar = dv.apply_async(monte_carlo_pi, 10000000)

In [15]: ar.ready()
Out[15]: False

In [16]: ar.ready()
Out[16]: True

In [17]: pi_list = ar.get()

In [18]: print pi_list
[3.1410276000000001, 3.1421383999999999, 3.1418712000000002, 3.1418940000000002, 3.1425808000000002, 3.1413988000000002, 3.1426428, 3.1411123999999999, 3.1414312, 3.1419348, 3.1416088000000002, 3.1410212]

In [19]: print "my simulation gives pi = %f " % average(pi_list)
my simulation gives pi = 3.141722 
  • Note. The local IPython session and the engines are separate Python processes (and, in general, may run on different machines), so the package numpy has to be imported separately in the local session and on the engines. To import numpy on the engines, we used the command dv.execute('from numpy import *'), where dv is the DirectView object created from the client rc. (An alternative, sync_imports(), is sketched after these notes.)

  • Note. The function monte_carlo_pi was defined locally. To run it on the remote engines, we used the command dv.apply_async(monte_carlo_pi, 10000000).
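
As mentioned in the first note, IPython's DirectView also provides sync_imports(), which performs an import both in the local session and on every engine. A minimal sketch, assuming the dv object from the session above:

    # Runs "import numpy" both locally and on every engine behind dv.
    # Note that it imports whole modules; it is not a star-import.
    with dv.sync_imports():
        import numpy

With a plain import numpy, the function body would of course need to refer to numpy.random.random() and numpy.sqrt() instead of the bare names used above.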

Method 2. Define the function globally for the engines.

You can also define parallel functions using the @view.remote() decorator. For example, the following code does the same thing as above.

In [1]: from IPython.parallel import Client

In [2]: rc = Client()

In [3]: dv= rc[:]

In [4]: @dv.remote(block=True)
...: def monte_carlo_pi(n):
...:     if ( n < 1 ) : return 0
...:     n_shots = 0
...:     for i in range(n) :
...:         x = random.random()
...:         y = random.random()
...:         r = sqrt(x*x+y*y)
...:         if ( r<=1 ) : n_shots = n_shots + 1
...:     return 4.0*float(n_shots)/float(n)    
...: 

In [5]: pi_list = monte_carlo_pi(1000000)

In [6]: print pi_list
[3.1408040000000002, 3.139564, 3.1445120000000002, 3.1392880000000001, 3.1420319999999999, 3.1397680000000001, 3.1437599999999999, 3.1422560000000002, 3.1407560000000001, 3.1434319999999998, 3.1421999999999999, 3.141864]

In [7]: from numpy import *

In [8]: print "my simulation gives pi = %f " % average(pi_list) 
my simulation gives pi = 3.141686 
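
A further variation you may want to try (not part of the session above; the name monte_carlo_pi_p is introduced here purely for illustration) is the @dv.parallel decorator. It builds a function whose map() method distributes the elements of its input sequence across the engines, so each engine computes its own estimate:

    # Sketch of a @dv.parallel variant; dv is the DirectView from Method 2.
    @dv.parallel(block=True)
    def monte_carlo_pi_p(n):
        from numpy import random          # make numpy's random available on the engine
        n_shots = 0
        for i in range(n):
            x = random.random()
            y = random.random()
            if x*x + y*y <= 1:
                n_shots = n_shots + 1
        return 4.0*float(n_shots)/float(n)

    # twelve independent estimates, one million darts each
    estimates = monte_carlo_pi_p.map([1000000]*12)
    print(sum(estimates)/len(estimates))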
