Common Additional Configuration and Software

Following a successful initial install or update of Scyld ClusterWare, or as local requirements of your cluster dictate, you may need to make one or more configuration changes.

Configure Hostname

Verify that the head node hostname has been set as desired for permanent, unique identification across the network. In particular, ensure that the hostname is not localhost or localhost.localdomain.

Choosing An Alternate Database

Two backend databases are available: etcd and Couchbase. The current default is etcd. For a production cluster, the backend database should be chosen at the time of the first install; otherwise the default database will be used.

Database Differences

On head nodes with multiple IP addresses the current ClusterWare etcd implementation has no way to identify the correct network for communicating with other head nodes. By default the system will attempt to use the first non-local IP. Although this is adequate for single-head clusters and simple multihead configurations, a cluster administrator setting up a multihead cluster should specify the correct IP. This is done by setting the etcd.peer_url option in the /opt/scyld/clusterware/conf/base.ini file. A correct peer URL on a head node with the IP address of 10.24.1.1 where the 10.24.1.0/24 network should be used for inter-head communications might look like:

etcd.peer_url = http://10.24.1.1:52380

If this value needs to be set or changed on an existing cluster, update it on a single head node, run managedb recover on that head node, and then (re-)join the other heads to the now correctly configured one. The etcd.peer_url setting should only be necessary on the first head, as the proper network will be communicated to new heads during the join process.
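For example, after setting etcd.peer_url in base.ini on the chosen head node, run the recovery on that same node:

sudo /opt/scyld/clusterware/bin/managedb recover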

Unlike Couchbase, the ClusterWare etcd implementation does not allow the second-to-last head node in a multihead cluster to leave or be ejected. Instead a cluster administrator can run the managedb recover command on the remaining head node for the same effect. See Removing a Joined Head Node for details, and Managing Multiple Head Nodes for broader information about managing multiple head nodes.

Important

Prior to any manipulation of the distributed database, whether through managedb recover, joining head nodes to a cluster, removing head nodes from a cluster, or switching between database backends, the administrator is strongly encouraged to make a backup of the ClusterWare database using the managedb tool. See managedb in the Reference Guide.

The firewall requirements for etcd are much simpler, as only a single port needs to be opened between head nodes, whereas Couchbase requires six.

No etcd alternative to the Couchbase console exists, although the etcdctl command provides scriptable direct document querying and manipulation. ClusterWare provides a wrapped version of etcdctl located in the /opt/scyld/clusterware-etcd/bin/ directory. The wrapper should be run as root and automatically applies the correct credentials and connects to the local etcd endpoint. Note that direct manipulation of database JSON documents should only be done when directed by Penguin support.
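For example, to list the keys currently stored in the database, assuming the wrapper passes standard etcdctl v3 arguments through unchanged:

sudo /opt/scyld/clusterware-etcd/bin/etcdctl get "" --prefix --keys-only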

Switching Between Databases

The ClusterWare-provided headctl tool (found in /opt/scyld/clusterware/bin/) includes arguments to toggle between databases. We suggest taking a snapshot of the virtual machine before performing these operations, and preferably making a database backup with managedb as well.

For example, switching the database on a single head node cluster:

sudo /opt/scyld/clusterware/bin/headctl --use-<DATABASE>

where <DATABASE> is either etcd or couchbase. The command performs a series of steps:

  1. Install the clusterware-<DATABASE> package, if not already installed.

  2. Stop the clusterware service.

  3. Export the database to a temporary file.

  4. Toggle the base.ini plugins.database option.

  5. Purge any existing database.

  6. Load the exported database from the temporary file.

  7. Update the firewall for the appropriate <DATABASE>.

  8. Restart the clusterware service.

Once these steps complete, the head node will resume normal operations.

For a multihead cluster this should be performed on each head node in turn using headctl as described above.

For example, to switch from couchbase to etcd:

sudo /opt/scyld/clusterware/bin/headctl --use-etcd

As a side effect, this command will cause each head node to separate from the cluster, triggering a Couchbase rebalance that should complete before converting the next head node. Once all the now-independent head nodes are running the etcd database, pick one, ideally the first one converted, and join the others to it. Whenever joining head nodes to the cluster, restarting the clusterware service on all head nodes after the final join is a good idea:

sudo systemctl restart clusterware

You can then confirm the head nodes are again working together by executing on each head node:

scyld-nodectl status
sudo /opt/scyld/clusterware/bin/managedb --heads

and verify that scyld-nodectl and managedb agree.

When you are sure that everything is working as expected: if switching from Couchbase to etcd, remove the clusterware-couchbase package and delete /opt/couchbase on each head node; if switching from etcd to Couchbase, remove the clusterware-etcd package.

This ability to toggle between databases is intended for administrators interested in experimenting with the alternate database or when troubleshooting problems. For production clusters the database should be selected at installation time and currently defaults to etcd.

Couchbase auto-failover

After configuring a cluster containing multiple head nodes and Couchbase, the cluster administrator should enable auto-failover. See Configuring Support for Database Failover for details.

Configure Authentication

ClusterWare administrator authentication is designed to easily integrate with already deployed authentication systems via PAM. By default cluster administrators are authenticated through the pam_authenticator tool that in turn uses the PAM configuration found in /etc/pam.d/cw_check_user. In this configuration, administrators can authenticate using their operating system password as long as they have been added to the ClusterWare system using the scyld-adminctl command. For example, to add username "admin1":

scyld-adminctl create name=admin1
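To review the administrators already defined, the other scyld-*ctl tools shown in this chapter use an ls action; assuming scyld-adminctl follows the same convention:

scyld-adminctl ls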

If a ClusterWare administrator is running commands from a system account on the head node by the same name (i.e. ClusterWare administrator fred is also head node user fred), the system will confirm their identity via a Unix socket based protocol. Enabled by default, this mechanism allows the scyld tools to connect to a local socket to securely set a dynamically generated one-time password that is then accepted during their next authentication attempt. This takes place transparently, allowing the administrator to run commands without providing their password. The client code also caches an authentication cookie in the user's .scyldcw/auth_tkt.cookie for subsequent authentication requests.

Managing cluster user accounts is generally outside the scope of ClusterWare and should be handled by configuring the compute node images appropriately for your environment. In large organizations this usually means connecting to Active Directory, LDAP, or any other mechanism supported by your chosen compute node operating system. In simpler environments where no external source of user identification is available or it is not accessible, ClusterWare provides a sync-uids tool. This program can be found in the /opt/scyld/clusterware-tools/bin directory and can be used to push local user accounts and groups either to compute nodes or into a specified image. For example:

# push uids and their primary uid-specific groups:
sync-uids --users admin1,tester --image SlurmImage

# push uid with an additional group:
sync-uids --users admin1 --groups admins --image SlurmImage

The above pushes the users and groups into the compute node image for persistence across reboots. Then either reboot the node(s) to see these changes, or push the IDs into running nodes with:

sync-uids --users admin1,tester --nodes n[1-10]

The tool generates a shell script that is then executed on the compute nodes or within the image chroot to replicate the user and group identifiers on the target system. This tool can also be used to push ssh keys into the authorized_keys files for a user onto booted compute nodes or into a specified image. Please see the tool's --help output for more details and additional functionality, such as removing users or groups, and controlling whether home directories are created for injected user accounts.
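For example, to push a user account together with that user's ssh public key into an image (the username and key path are illustrative):

sync-uids --users admin1 --image SlurmImage \
          --sync-key admin1=/home/admin1/.ssh/id_rsa.pub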

Disable/Enable Chain Booting

The default ClusterWare behavior is to perform chain booting, which provides more efficient concurrency when servicing a flood of PXE-booting nodes that are requesting their large rootfs file. Without chain booting, the head node(s) serve the rootfs file for all PXE-booting nodes and thus become a likely bottleneck when hundreds of nodes are concurrently requesting their file. With chain booting, the head node(s) serve the rootfs files to the first compute node requesters, then those provisioned compute nodes offer to serve as temporary rootfs file servers for other requesters.

To disable chain booting, the cluster administrator, executing as user root, should edit the file /opt/scyld/clusterware/conf/base.ini to add the line:

chaining.enable = False

To re-enable chain booting, either change that False to True, or simply comment out that chaining.enable line to revert to the default enabled state.
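As with other base.ini changes, restarting the clusterware service is the simplest way to ensure the new setting takes effect:

sudo systemctl restart clusterware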

Installing Optional ClusterWare Software

scyld-install installs and updates the basic ClusterWare software. Additional software packages are available in the ClusterWare repository.

scyld-install manipulates the /etc/yum.repos.d/clusterware.repo file to automatically enable the scyld repos when the tool executes and disable the repos when finished. This is done to avoid inadvertent updating of ClusterWare packages when executing a simple yum update.

Note

If the cluster administrator has created multiple /etc/yum.repos.d/*.repo files that specify repos containing ClusterWare RPMs, then this protection against inadvertent updating is performed only for /etc/yum.repos.d/clusterware.repo, not for those additional repo files.

Accordingly, the --enablerepo=scyld* argument is required when using yum for listing, installing, and updating these optional ClusterWare packages on a head node. For example, these optional installable software packages can be viewed using yum list --enablerepo=scyld* | grep scyld. After installation, any available updates can be viewed using yum check-update --enablerepo=scyld* | grep scyld.

Specific install and configuration instructions for several of these packages, e.g., job managers and OpenMPI middleware, are detailed in this chapter.

Job Schedulers

The default Scyld ClusterWare installation for RHEL/CentOS 7 includes support for the optional job scheduler packages Slurm and PBS TORQUE, and for RHEL/CentOS 8 includes support for the optional packages Slurm and OpenPBS. These optional packages can coexist on a scheduler server, which may or may not be a ClusterWare head node. However, if job schedulers are installed on the same server, then only one at a time should be enabled and executing on that given server.

All nodes in the job scheduler cluster must be able to resolve hostnames of all other nodes as well as the scheduler server hostname. ClusterWare provides a DNS server in the clusterware-dnsmasq package, as discussed in Node Name Resolution. This dnsmasq will resolve all compute node hostnames, and the job scheduler's hostname should be added to /etc/hosts on the head node(s) in order to be resolved by dnsmasq. Whenever /etc/hosts is edited, please restart the clusterware-dnsmasq service with:

sudo systemctl restart clusterware-dnsmasq
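Putting these together, a hypothetical example that registers a scheduler server named jobsched (the hostname and IP address are placeholders):

echo "10.54.0.100  jobsched" | sudo tee -a /etc/hosts
sudo systemctl restart clusterware-dnsmasq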

Installing and configuring a job scheduler requires making changes to the compute node software. When using image-based compute nodes, we suggest first cloning the DefaultImage or creating a new image, leaving untouched the DefaultImage as a basic known-functional pristine image.

For example, to set up nodes n0 through n3, you might first do:

scyld-imgctl -i DefaultImage clone name=jobschedImage
scyld-bootctl -i DefaultBoot clone name=jobschedBoot image=jobschedImage
scyld-nodectl -i n[0-3] set _boot_config=jobschedBoot

When these nodes reboot after all the setup steps are complete, they will use the jobschedBoot and jobschedImage.

See https://slurm.schedmd.com/rosetta.pdf for a discussion of the differences between PBS TORQUE and Slurm. See https://slurm.schedmd.com/faq.html#torque for useful information about how to transition from OpenPBS or PBS TORQUE to Slurm.

The following sections describe the installation and configuration of each job scheduler type.

Slurm

See Job Schedulers for general job scheduler information and configuration guidelines. See https://slurm.schedmd.com for Slurm documentation.

First install Slurm software on the job scheduler server:

sudo yum install slurm-scyld --enablerepo=scyld*

Important

For RHEL/CentOS 8, install Slurm with an additional argument: sudo yum install slurm-scyld --enablerepo=scyld* --enablerepo=PowerTools

Now use the helper script slurm-scyld.setup to complete the initialization and to set up the job scheduler and its config file in the compute node image(s).

Note

The slurm-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>]. See Attribute Groups and Dynamic Groups for details.

slurm-scyld.setup init                        # default to all 'up' nodes
slurm-scyld.setup update-image slurmImage     # for permanence in the image
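For example, to run the init action against a subset of nodes rather than all up nodes:

slurm-scyld.setup init -i n[0-3]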

Reboot the compute nodes to bring them into active management by Slurm. Check the Slurm status:

slurm-scyld.setup status

# If the Slurm daemon and munge are not both executing, then:
slurm-scyld.setup cluster-restart

# And check the status again

This cluster-restart is a manual one-time setup that doesn't affect the slurmImage. The update-image is necessary for persistence across compute node reboots.

Generate new Slurm-specific config files with:

slurm-scyld.setup reconfigure      # default to all 'up' nodes

Add nodes by executing:

slurm-scyld.setup update-nodes     # default to all 'up' nodes

or add or remove nodes by directly editing the /etc/slurm/slurm.conf config file. Any such changes must be added to slurmImage by reexecuting:

slurm-scyld.setup update-image slurmImage

and then either reboot all the compute nodes with that updated image, or additionally execute:

slurm-scyld.setup cluster-restart

to manually push the changes to the up nodes without requiring a reboot.
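When editing /etc/slurm/slurm.conf directly, compute nodes are described by NodeName entries; for example (CPU and memory values are illustrative):

NodeName=n[0-3] CPUs=2 RealMemory=7900 State=UNKNOWN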

Inject users into the compute node image using the sync-uids script. The administrator can inject all users, or a selected list of users, or a single user. For example, inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i slurmImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

To view the Slurm status on the server and compute nodes:

slurm-scyld.setup status

The Slurm service can also be started and stopped cluster-wide with:

slurm-scyld.setup cluster-stop
slurm-scyld.setup cluster-start

Slurm executable commands and libraries are installed in /opt/scyld/slurm/. Each Slurm user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the Slurm commands. This is done automatically via the /etc/profile.d/scyld.slurm.sh script for users who log in while Slurm is running. Alternatively, each Slurm user can manually execute module load slurm or can add that command line to (for example) the user's ~/.bash_profile or ~/.bashrc.
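As a quick sanity check, a Slurm user can load the module and run a trivial job across nodes:

module load slurm
sinfo                        # list partitions and node states
srun -N 2 hostname           # run 'hostname' on two nodes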

PBS TORQUE

PBS TORQUE is only available for RHEL/CentOS 7 clusters. See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.adaptivecomputing.com/support/documentation-index/torque-resource-manager-documentation for PBS TORQUE documentation.

First install PBS TORQUE software on the job scheduler server:

sudo yum install torque-scyld --enablerepo=scyld*

Now use the helper script torque-scyld.setup to complete the initialization and to set up the job scheduler and its config file in the compute node image(s).

Note

The torque-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>]. See Attribute Groups and Dynamic Groups for details.

torque-scyld.setup init                       # default to all 'up' nodes
torque-scyld.setup update-image torqueImage   # for permanence in the image

Reboot the compute nodes to bring them into active management by TORQUE. Check the TORQUE status:

torque-scyld.setup status

# If the TORQUE daemon is not executing, then:
torque-scyld.setup cluster-restart

# And check the status again

This cluster-restart is a manual one-time setup that doesn't affect the torqueImage. The update-image is necessary for persistence across compute node reboots.

Generate new TORQUE-specific config files with:

torque-scyld.setup reconfigure      # default to all 'up' nodes

Add nodes by executing:

torque-scyld.setup update-nodes     # default to all 'up' nodes

or add or remove nodes by directly editing the /var/spool/torque/server_priv/nodes config file. Any such changes must be added to torqueImage by reexecuting:

torque-scyld.setup update-image torqueImage

and then either reboot all the compute nodes with that updated image, or additionally execute:

torque-scyld.setup cluster-restart

to manually push the changes to the up nodes without requiring a reboot.

Inject users into the compute node image using the sync-uids script. The administrator can inject all users, or a selected list of users, or a single user. For example, inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i torqueImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

To view the TORQUE status on the server and compute nodes:

torque-scyld.setup status

The TORQUE service can also be started and stopped cluster-wide with:

torque-scyld.setup cluster-stop
torque-scyld.setup cluster-start

TORQUE executable commands are installed in /usr/sbin/ and /usr/bin/, and TORQUE libraries are installed in /usr/lib64/, so all are accessible by the default search rules.
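As a quick check, the standard TORQUE client commands can confirm that the server sees its nodes and queues:

pbsnodes -a        # list all nodes and their states
qstat -q           # list the configured queues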

OpenPBS

OpenPBS is only available for RHEL/CentOS 8 clusters.

See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.openpbs.org for OpenPBS documentation.

First install OpenPBS software on the job scheduler server:

sudo yum install openpbs-scyld --enablerepo=scyld*

Use the helper script openpbs-scyld.setup to complete the initialization and to set up the job scheduler and its config file in the compute node image(s).

Note

The openpbs-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>]. See Attribute Groups and Dynamic Groups for details.

openpbs-scyld.setup init                      # default to all 'up' nodes
openpbs-scyld.setup update-image openpbsImage # for permanence in the image

Reboot the compute nodes to bring them into active management by OpenPBS. Check the OpenPBS status:

openpbs-scyld.setup status

# If the OpenPBS daemon is not executing, then:
openpbs-scyld.setup cluster-restart

# And check the status again

This cluster-restart is a manual one-time setup that doesn't affect the openpbsImage. The update-image is necessary for persistence across compute node reboots.

Generate new OpenPBS-specific config files with:

openpbs-scyld.setup reconfigure      # default to all 'up' nodes

Add nodes by executing:

openpbs-scyld.setup update-nodes     # default to all 'up' nodes

or add or remove nodes by executing qmgr.
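For example (node name illustrative):

qmgr -c "create node n4"
qmgr -c "delete node n4"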

Any such changes must be added to openpbsImage by reexecuting:

openpbs-scyld.setup update-image openpbsImage

and then either reboot all the compute nodes with that updated image, or additionally execute:

openpbs-scyld.setup cluster-restart

to manually push the changes to the up nodes without requiring a reboot.

Inject users into the compute node image using the sync-uids script. The administrator can inject all users, or a selected list of users, or a single user. For example, inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i openpbsImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

To view the OpenPBS status on the server and compute nodes:

openpbs-scyld.setup status

The OpenPBS service can also be started and stopped cluster-wide with:

openpbs-scyld.setup cluster-stop
openpbs-scyld.setup cluster-start

OpenPBS executable commands and libraries are installed in /opt/scyld/openpbs/. Each OpenPBS user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the OpenPBS commands. This is done automatically via the /etc/profile.d/scyld.openpbs.sh script for users who log in while OpenPBS is running. Alternatively, each OpenPBS user can manually execute module load openpbs or can add that command line to (for example) the user's ~/.bash_profile or ~/.bashrc.
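As a quick sanity check, an OpenPBS user can load the module, submit a trivial job, and query the queue:

module load openpbs
echo "sleep 10" | qsub       # submit a trivial job
qstat                        # check the queue
pbsnodes -a                  # list nodes known to the server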

Kubernetes

ClusterWare administrators wanting to use Kubernetes as a container orchestration layer across their cluster can either install Kubernetes manually following directions found online, or use scripts provided by the clusterware-kubeadm package. To use these scripts, first install the clusterware-kubeadm package on a server that is a Scyld ClusterWare head node, a locally installed ClusterWare compute node, or a separate non-ClusterWare server. Installing the control plane on a RAM-booted or otherwise ephemeral compute node is discouraged.

The provided scripts are based on the kubeadm tool and inherit both the benefits and limitations of that tool. If you prefer to use a different tool to install Kubernetes, please follow the appropriate directions available online from your chosen Kubernetes provider. The clusterware-kubeadm package is mandatory, and the clusterware-tools package is recommended:

sudo yum --enablerepo=scyld* install clusterware-kubeadm clusterware-tools

Important

For a server to function as a Kubernetes control plane, SELinux must be disabled (verify with getenforce) and swap must be turned off (verify with swapon -s, disable with swapoff -a -v).
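For example, to verify and apply these prerequisites (commenting out the swap entries in /etc/fstab is one common way to keep swap off across reboots):

getenforce                                 # should report Disabled
sudo swapoff -a -v                         # turn off swap now
sudo sed -i '/ swap / s/^/#/' /etc/fstab   # keep swap off after reboot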

After installing the software, as a cluster administrator execute the scyld-kube tool to initialize the Kubernetes control plane. To initialize on a local server:

scyld-kube --init

Or to initialize on an existing booted ClusterWare compute node (e.g., node n0):

scyld-kube --init -i n0

Note that a ClusterWare cluster can have multiple control planes and can use them in a Kubernetes High Availability (HA) configuration. See Appendix: Using Kubernetes for detailed examples.

You can validate this initialization by executing:

kubectl get nodes

which should show the newly initialized control plane server.

Next, join one or more booted ClusterWare nodes (e.g., nodes n[1-3]) as worker nodes of this Kubernetes cluster. The full command syntax accomplishes this by explicitly identifying the control plane node by its IP address:

scyld-kube -i n[1-3] --join --cluster <CONTROL_PLANE_IP_ADDR>

However, if the control plane node is a ClusterWare compute node, then the scyld-kube --init process has already defined Kubernetes-specific attributes, and a simpler syntax suffices:

scyld-kube -i n[1-3] --join

The simpler join command can find the control plane node without needing to be told its IP address as long as only one compute node is functioning as a Kubernetes control plane.

Note that scyld-kube --join also accepts admin-defined group names, e.g., for a collection of nodes joined to the kube_workers group:

scyld-kube -i %kube_workers --join --cluster <CONTROL_PLANE_IP_ADDR>

See Attribute Groups and Dynamic Groups for details.

For persistence across compute node reboots, modify the node image (e.g., kubeimg) used by Kubernetes worker nodes so that those nodes auto-join when booted. If multiple control planes are present, optionally specify the control plane by IP address:

scyld-kube --image kubeimg --join
    or
scyld-kube --image kubeimg --join --cluster <CONTROL_PLANE_IP_ADDR>

After rebooting these worker nodes, you can check Kubernetes status again on the control plane node and should now see the joined worker nodes:

kubectl get nodes

You can test Kubernetes by executing a simple job that calculates pi:

kubectl apply -f https://kubernetes.io/examples/controllers/job.yaml

(ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/)
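Once the job completes, the computed value of pi can be read from the pod logs; the example YAML creates a Job named pi:

kubectl get pods             # wait for the pi pod to reach Completed
kubectl logs job/pi          # print the computed digits of pi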

See Appendix: Using Kubernetes for detailed examples.

scyld-nss Name Service Switch (NSS) Tool

The optional package scyld-nss provides a Name Service Switch (NSS) tool that translates a hostname to its IP address or an IP address to its hostname(s), as specified in the /etc/scyld-nss-cluster.conf configuration file. These hostnames and their IP addresses (e.g., for compute nodes and switches) are those managed by the ClusterWare database, which automatically provides that configuration file at startup and thereafter if and when the cluster configuration changes.

Note

scyld-nss is currently only supported on head nodes.

Installing scyld-nss inserts the scyld function in the /etc/nsswitch.conf hosts line, and installs the symlink /lib64/libnss_scyld.so.2 and library /lib64/libnss_scyld-1.0.so to functionally integrate with the other NSS tools.

Benefits include expanded ClusterWare hostname resolution functionality and faster NSS queries for those hostnames. Install the nscd package for additional significant performance improvements on clusters with very high node counts.

The scyld-nss package includes a scyld-nssctl tool allowing a cluster administrator to manually stop or start the service by removing or reinserting the scyld function in /etc/nsswitch.conf. Any user can employ scyld-nssctl to query the current status of the service. See scyld-nssctl for details.
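For example, any user can exercise the resolver through the standard NSS interface and query the service state (the status action is assumed here; see scyld-nssctl for the exact syntax):

getent hosts n0        # resolve a compute node name via NSS
scyld-nssctl status    # query whether the scyld NSS service is active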

Firewall Configuration

If you are not using the torque-scyld or slurm-scyld packages, either of which transparently configures the firewall on the private cluster interface between the head node(s), job scheduler servers, and compute nodes, then you need to configure the firewall manually on both the head node(s) and all compute nodes.
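The exact rules depend on site policy; one common minimal approach is to place the private cluster interface into firewalld's trusted zone on each head and compute node (the interface name is a placeholder):

sudo firewall-cmd --zone=trusted --change-interface=<PRIVATE_IF>
sudo firewall-cmd --permanent --zone=trusted --change-interface=<PRIVATE_IF>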

Install OpenMPI, MPICH, and/or MVAPICH

Scyld ClusterWare distributes several versions of OpenMPI, MPICH, and MVAPICH, and other versions are available from 3rd-party providers. Different versions of the ClusterWare packages can coexist, and users can link applications to the desired libraries and execute the appropriate binary executables using module load commands. Typically one or more of these packages are installed in the compute node images for execution, as well as on any other server where OpenMPI (and similar) applications are built.

View the available ClusterWare versions using:

yum clean all     # just to ensure you'll see the latest versions
yum list --enablerepo=scyld* | egrep "openmpi|mpich|mvapich" | egrep scyld

The OpenMPI, MPICH, and MVAPICH packages are named by their major-minor version numbers, e.g., 3.0, 4.0, 4.1, and each has one or more available "point" releases, e.g., openmpi4.1-4.1.1 and openmpi4.1-4.1.4.

A simple yum install will install the latest "point" release for the specified major-minor version, e.g.:

sudo yum install openmpi4.1 --enablerepo=scyld*

installs the default GNU libraries, binary executables, buildable source code for various example programs, and man pages for openmpi4.1-4.1.4. The openmpi4.1-gnu packages are equivalent to openmpi4.1.

Alternatively or additionally:

sudo yum install openmpi4.1-intel --enablerepo=scyld*

installs those same packages built using the Intel compiler suite. These compiler-specific packages can co-exist with the base GNU package. Similarly you can additionally install openmpi4.1-pgi for libraries and executables built using the PGI compiler suite.

The openmpi*-psm2 packages are intended for use with QLogic Infiniband controllers.

Important

To install openmpi*-psm2 packages for RHEL/CentOS 8 and beyond, you must additionally enable the PowerTools repo, e.g., sudo yum install openmpi4.1-psm2 --enablerepo=scyld* --enablerepo=PowerTools

Important

The ClusterWare yum repo includes various versions of openmpi* RPMs which were built with different sets of options by different compilers, each potentially having requirements for specific other 3rd-party packages. In general, avoid installing openmpi RPMs using a wildcard such as openmpi4*scyld and instead carefully install only specific RPMs from the ClusterWare yum repo together with their specific required 3rd-party packages.

Suppose openmpi4.1-4.1.1 is installed and you see a newer "point" release openmpi4.1-4.1.4 in the repo. If you do:

sudo yum update openmpi4.1 --enablerepo=scyld*

then 4.1.1 updates to 4.1.4 and removes 4.1.1. Suppose for some reason you want to retain 4.1.1, install the newer 4.1.4, and have both "point" releases coexist. For that you need to download the 4.1.4 RPMs and install (not update) them using rpm, e.g.,

sudo rpm -iv openmpi4.1-4.1.4*

You can add OpenMPI (et al) environment variables to a user's ~/.bash_profile or ~/.bashrc file, e.g., add module load openmpi/intel/4.1.4 so that OpenMPI commands default to a particular release and compiler suite. Commonly a cluster uses shared storage of some kind for /home directories, so changes made by the cluster administrator or by an individual user are transparently reflected across all nodes that access that same shared /home storage.
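For example, a user could build and launch a simple MPI program (the module name, source file, and host list are illustrative):

module load openmpi/gnu/4.1.4
mpicc -o hello hello.c             # compile against the loaded OpenMPI
mpirun -np 2 -H n0,n1 ./hello      # launch one rank on each of nodes n0 and n1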

For OpenMPI, consistent user uid/gid values and passphrase-less key-based ssh access are required for a multi-threaded application to communicate between threads executing on different nodes using ssh as a transport mechanism.

For example, user root can set up access for each given username $user and target node $node:

# This script must execute as 'root'.
# Here we set up user "user1" on node n0:
user=user1
node=n0

user_uid=`id -u $user`
user_gid=`id -g $user`
user_gname=`id -gn $user`
ssh $node groupadd -g $user_gid $user_gname
ssh $node useradd -u $user_uid -g $user_gid $user
# copy the user's ssh public key and .bashrc
ssh $node mkdir -p -m 700 /home/$user/.ssh >/dev/null
scp /home/$user/.ssh/id_rsa.pub $node:/home/$user/.ssh/authorized_keys >/dev/null
ssh $node chmod 600 /home/$user/.ssh/authorized_keys
scp /home/$user/.bashrc $node:/home/$user/ >/dev/null
ssh $node chown -R $user_uid:$user_gid /home/$user/.ssh
ssh $node chown $user_uid:$user_gid /home/$user

To use OpenMPI (et al) without installing either torque-scyld or slurm-scyld, you must configure the firewall that manages the private cluster network between the head node(s), server node(s), and compute nodes. See Firewall Configuration for details.

Configure IP Forwarding

By default, the head node does not allow IP forwarding from compute nodes on the private cluster network to external IP addresses on the public network. If IP forwarding is desired, then it must be enabled and allowed through each head node's firewalld configuration.

On a head node, to forward internal compute node traffic through the <PUBLIC_IF> interface to the outside world, execute:

firewall-cmd --zone=public --remove-interface=<PUBLIC_IF>
firewall-cmd --zone=external --add-interface=<PUBLIC_IF>
# confirm it works as expected, then make the changes permanent
firewall-cmd --permanent --zone=public --remove-interface=<PUBLIC_IF>
firewall-cmd --permanent --zone=external --add-interface=<PUBLIC_IF>

Appropriate routing for compute nodes can be modified in the compute node image(s) (see scyld-modimg tool in the Reference Guide). Limited changes may also require modifying the DHCP configuration template /opt/scyld/clusterware-iscdhcp/dhcpd.conf.template.
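Note that the firewalld external zone provides masquerading, but it is worth confirming that kernel IP forwarding is enabled; a sysctl.d file is a common way to persist the setting:

sysctl net.ipv4.ip_forward                   # expect: net.ipv4.ip_forward = 1
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/90-ipforward.conf
sudo sysctl -p /etc/sysctl.d/90-ipforward.conf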

Install Additional Tools

Cluster administrators may wish to install additional software tools to assist in managing the cluster.

Name Service Cache Daemon (nscd)

The Name Service Cache Daemon (nscd) provides a cache for most common name service requests. The performance benefit for very large clusters is significant.

/usr/bin/jq

The jq tool can be downloaded from the Red Hat EPEL yum repository. It provides a command-line parser for JSON output.

For example, for the --long status of node n0:

[sysadmin@head1 /]$ scyld-nodectl -i n0 ls --long
Nodes
  n0
    attributes
      _boot_config: DefaultBoot
      _no_boot: 0
      last_modified: 2019-06-05 23:44:48 UTC (8 days, 17:09:55 ago)
    groups: []
    hardware
      cpu_arch: x86_64
      cpu_count: 2
      cpu_model: Intel Core Processor (Broadwell)
      last_modified: 2019-06-06 17:15:59 UTC (7 days, 23:38:45 ago)
      mac: 52:54:00:a6:f3:3c
      ram_total: 8174152
    index: 0
    ip: 10.54.60.0
    last_modified: 2019-06-14 16:54:39 UTC (0:00:04 ago)
    mac: 52:54:00:a6:f3:3c
    name: n0
    power_uri: none
    type: compute
    uid: f7c2129860ec40c7a397d78bba51179a

You can use jq to parse the JSON output to extract specific fields:

[sysadmin@head1 /]$ scyld-nodectl --json -i n0 ls -l | jq '.n0.mac'
"52:54:00:a6:f3:3c"

[sysadmin@head1 /]$ scyld-nodectl --json -i n0 ls -l | jq '.n0.attributes'
{
  "_boot_config": "DefaultBoot",
  "_no_boot": "0",
  "last_modified": 1559778288.879129
}

[sysadmin@head1 /]$ scyld-nodectl --json -i n0 ls -l | jq '.n0.attributes._boot_config'
"DefaultBoot"