OpenMPI, MPICH, and/or MVAPICH

Scyld ClusterWare distributes several versions of OpenMPI, MPICH, and MVAPICH2, and other versions are available from 3rd-party providers. Different versions of the ClusterWare packages can coexist, and users can link applications to the desired libraries and execute the appropriate binary executables using module load commands. Typically one or more of these packages are installed in the compute node images for execution, as well as on any other server where OpenMPI (and similar) applications are built.
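
Once a package and its environment modules are installed, users can typically list the available MPI modules with the module command (the exact module names depend on which packages are installed on your system):

module avail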

View the available ClusterWare versions using:

yum clean all     # just to ensure you'll see the latest versions
yum list --enablerepo=scyld* | egrep "openmpi|mpich|mvapich" | egrep scyld

The OpenMPI, MPICH, and MVAPICH packages are named by their major-minor version numbers, e.g., 4.0, 4.1, and each has one or more available major-minor "point" releases, e.g., openmpi4.1-4.1.1 and openmpi4.1-4.1.4.
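
To see every available "point" release of a specific major-minor version, rather than only the latest, you can ask yum to show duplicates. This is a sketch; repo names may vary on your installation:

yum list available --showduplicates --enablerepo=scyld* openmpi4.1*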

A simple yum install will install the latest "point" release for the specified major-minor version, e.g.:

sudo yum install openmpi4.1 --enablerepo=scyld*

installs the default GNU libraries, binary executables, buildable source code for various example programs, and man pages for openmpi4.1-4.1.4. The openmpi4.1-gnu packages are equivalent to openmpi4.1.
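
After installation, a user can load the matching environment module and verify which compiler wrappers and runtime are in the path. This is a sketch; the module name follows the openmpi/<compiler>/<version> pattern used later in this section and may differ on your installation:

module load openmpi/gnu/4.1.4
which mpicc mpirun
mpirun --version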

Alternatively or additionally:

sudo yum install openmpi4.1-intel --enablerepo=scyld*

installs those same packages built with the Intel oneAPI compiler suite. These compiler-specific packages can coexist with the base GNU package. Similarly, you can additionally install openmpi4.1-nvhpc for libraries and executables built with the Nvidia HPC SDK suite, and openmpi-aocc for libraries and executables built with the AMD Optimizing C/C++ and Fortran Compilers (AOCC). Additionally, openmpi4.1-hpcx_cuda-${compiler} RPM sets are built against the Nvidia HPC-X and CUDA software packages, each available for the gnu, intel, nvhpc, and aocc compilers.
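
For example, to build and run a program against the Intel oneAPI build of OpenMPI (a sketch; hello.c is a placeholder source file, and the module name may differ on your installation):

module load openmpi/intel/4.1.4
mpicc hello.c -o hello
mpirun -np 4 ./hello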

Note

ClusterWare provides openmpi packages built with third-party software and compilers on a best-effort basis. If an openmpi RPM for a particular combination of compiler, software, OpenMPI version, and distro is missing, that combination either failed to build or the resulting package failed to run. Also, the third-party software and compilers required by those OpenMPI packages must be installed separately, in addition to the ClusterWare installation.

Important

The ClusterWare yum repo includes various versions of openmpi* RPMs that were built with different sets of options and by different compilers, each potentially requiring specific other 3rd-party packages. In general, avoid installing openmpi RPMs using a wildcard such as openmpi4*scyld; instead, carefully install only the specific RPMs you need from the ClusterWare yum repo together with their specific required 3rd-party packages.

Suppose openmpi4.1-4.1.1 is installed and you see a newer "point" release openmpi4.1-4.1.4 in the repo. If you do:

sudo yum update openmpi4.1 --enablerepo=scyld*

then yum updates 4.1.1 to 4.1.4 and removes 4.1.1. Suppose for some reason you want to retain 4.1.1, install the newer 4.1.4, and have both "point" releases coexist. For that you need to download the 4.1.4 RPMs and install (not update) them using rpm, e.g.,

sudo rpm -iv openmpi4.1-4.1.4*
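
If the 4.1.4 RPMs are not already on the system, one way to fetch them from the repo is yumdownloader, provided by the yum-utils package. This is a sketch; adjust the package names to the specific RPMs you need:

yumdownloader --enablerepo=scyld* openmpi4.1-4.1.4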

You can add OpenMPI (et al) environment settings to a user's ~/.bash_profile or ~/.bashrc file, e.g., add module load openmpi/intel/4.1.4 so that OpenMPI commands default to a particular release and compiler suite. Commonly a cluster uses shared storage of some kind for /home directories, so changes made by the cluster administrator or by an individual user are transparently reflected on all nodes that access the same shared /home storage.
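
For example, a user (or the administrator on the user's behalf) might append the module load line to a shell startup file so that it takes effect at every login. This sketch uses the release and compiler suite named above:

echo "module load openmpi/intel/4.1.4" >> ~/.bashrc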

For OpenMPI, consistent user uid/gid values and passphrase-less key-based access are required for a multi-threaded application to communicate between threads executing on different nodes using ssh as a transport mechanism. The administrator can inject all users, a selected list of users, or a single user into the compute node image using the sync-uids script. See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.
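
For the passphrase-less key-based access, a common approach is for the user to generate a key pair without a passphrase and authorize it for their own account; because /home is typically shared across nodes, the key then works cluster-wide. This is a generic sketch, not a ClusterWare-specific procedure:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys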

To use OpenMPI (et al) without installing either torque-scyld or slurm-scyld, you must configure the firewall that manages the private cluster network between the head node(s), server node(s), and compute nodes. See Firewall Configuration for details.
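
As one possible sketch using firewalld, the private cluster interface could be placed in a trusted zone so that MPI traffic between nodes is not blocked; the interface name eth1 here is an assumption, and the Firewall Configuration section remains the authoritative procedure:

sudo firewall-cmd --permanent --zone=trusted --change-interface=eth1
sudo firewall-cmd --reload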