IMB Benchmark Example
The Intel MPI Benchmarks (IMB) provide a tool for testing MPI communication performance across one or many systems.
Install IMB
Installing IMB is outside the scope of this documentation; the installation procedure is described in the Intel documentation. IMB requires Intel MPI to be installed, which can be obtained by registering a system with Intel.
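As an illustrative sketch only, recent IMB releases can also be built from source against an existing MPI installation. The repository URL and make target below come from the upstream intel/mpi-benchmarks project and assume MPI compiler wrappers (mpicc/mpicxx) are already on the PATH:
git clone https://github.com/intel/mpi-benchmarks.git
cd mpi-benchmarks
# Build only the MPI-1 benchmark binary used in this example
CC=mpicc CXX=mpicxx make IMB-MPI1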
Run Benchmark
To use 64 CPU cores on a SLURM HPC environment for a single application, the directive --ntasks=64 (or its short form, -n 64) can be used.
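The same request can also be made on the sbatch command line rather than inside the job script; for example, using the imb.sh job script shown below:
sbatch --ntasks=64 imb.sh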
The following example shows launching the Intel MPI Benchmarks (IMB) across 64 cores on your research environment. The application is launched via the OpenMPI mpirun
command - the number of processes and the list of hosts are automatically assembled by the scheduler and passed to MPI at runtime. This job script loads the apps/imb
module before launching the application, which automatically loads the module for OpenMPI.
#!/bin/bash -l
# Request 64 tasks, name the job and write output to imb.out.<job ID>
#SBATCH -n 64
#SBATCH --job-name=imb
#SBATCH -D $HOME/
#SBATCH --output=imb.out.%j

# Load IMB (which also loads OpenMPI), then launch across all allocated cores
module load apps/imb
mpirun --prefix $MPI_HOME \
    IMB-MPI1
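By default, IMB-MPI1 runs its full suite of benchmarks. Individual benchmark names can be passed as arguments to run only a subset; the selection below is purely illustrative:
mpirun --prefix $MPI_HOME \
    IMB-MPI1 PingPong Allreduce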
We can then submit the IMB job script to the scheduler, which will automatically determine which nodes to use:
[flight@chead1 (mycluster1) ~]$ sbatch imb.sh
Submitted batch job 1162
[flight@chead1 (mycluster1) ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1162       all      imb   centos  R       0:01      8 ip-10-75-1-[42,45,62,67,105,178,233,250]
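While the job is queued or running, more detail is available through the standard SLURM tools (the job ID matches the submission above; sacct output depends on accounting being enabled on the cluster):
# Show detailed information about the job
scontrol show job 1162

# After completion, query the accounting records for the job
sacct -j 1162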
View Results
The results for the benchmark are available in the SLURM output file.
[flight@chead1 (mycluster1) ~]$ cat imb.out.1162
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 4.0, MPI-1 part
#------------------------------------------------------------
# Date                  : Tue Aug 30 10:34:08 2016
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-327.28.3.el7.x86_64
# Version               : #1 SMP Thu Aug 18 19:05:49 UTC 2016
# MPI Version           : 3.0
# MPI Thread Environment:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 62 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         3.17         0.00
            1         1000         3.20         0.30
            2         1000         3.18         0.60
            4         1000         3.19         1.19
            8         1000         3.26         2.34
           16         1000         3.22         4.74
           32         1000         3.22         9.47
           64         1000         3.21        19.04
          128         1000         3.22        37.92
          256         1000         3.30        73.90
          512         1000         3.41       143.15
         1024         1000         3.55       275.36
         2048         1000         3.75       521.04
         4096         1000        10.09       387.14
         8192         1000        11.12       702.51
        16384         1000        12.06      1296.04
        32768         1000        14.65      2133.32
        65536          640        19.30      3238.72
       131072          320        29.50      4236.83
       262144          160        48.17      5189.77
       524288           80        84.36      5926.88
      1048576           40       157.40      6353.32
      2097152           20       305.00      6557.31
      4194304           10       675.20      5924.16
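To extract just the message sizes and bandwidth figures for plotting, a short awk filter can be used; this sketch assumes the output file contains only the PingPong table shown above:
# Print the message size (bytes) and bandwidth (Mbytes/sec) columns
awk '/#bytes/ {found=1; next} found && $1 ~ /^[0-9]+$/ {print $1, $4}' imb.out.1162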