Quickstart Guide

The following is a brief example of creating a minimal, yet functional, Scyld ClusterWare cluster. No attempt is made here to explain the full breadth and depth of ClusterWare, which is extensively discussed in the remainder of the documentation (see Documentation Overview). This Quickstart Guide assumes the reader is familiar with administering Red Hat RHEL or CentOS servers. For readers who are unfamiliar with clusters, see Cluster Architecture Overview.

Prerequisites for this minimal cluster:

  • A minimum of two x86_64 servers, and preferably three: one becomes a ClusterWare head node and the remainder become ClusterWare compute nodes. This Quickstart example uses three servers.

  • The head node can be a bare metal server, although there is greater flexibility if it is a virtual machine. It should have a minimum of 4GB of RAM and 16GB of storage, and it must have an Ethernet controller connected to a private cluster network to communicate with the compute node(s). These are minimal requirements, and a quick way to sanity-check them is sketched just after this list. See Required and Recommended Components for recommendations for a production cluster.

  • The head node must be running Red Hat RHEL or CentOS 7.6 (or newer) or an equivalent distribution (see Supported Distributions and Features). It must have access to a repo for that base distribution so that ClusterWare can yum install additional packages.

  • If you do not already have a ClusterWare repo for ClusterWare packages, then the scyld-install installer prompts for the appropriate userid/password credentials and builds the ClusterWare repo file /etc/yum.repos.d/clusterware.repo. Typically access to the ClusterWare packages is through a second Ethernet controller connected to the "outside world", e.g., the Internet.

  • The compute node(s) must have their BIOS configured to PXEboot by default, using either "Legacy" or "UEFI" mode. They should have a minimum of 4GB of RAM and one Ethernet controller that is also physically connected to the same private cluster network. See Required and Recommended Components for recommendations for a production cluster.
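
Before installing anything, it can help to sanity-check the head node against these prerequisites. The following is a minimal sketch using standard tools; it only reports information and changes nothing, so adapt it to your hardware as needed:

# Confirm the base distribution release (RHEL/CentOS 7.6 or newer)
cat /etc/redhat-release

# Confirm a minimum of 4GB of RAM and 16GB of storage
free -h
df -h /

# Confirm the Ethernet controllers, e.g., one on the private cluster
#  network and (typically) one connected to the outside world
ip -br link

# Confirm that yum can reach a repo for the base distribution
yum repolist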

A ClusterWare cluster administrator needs root privileges. Common practice is to create non-root administrators and give them sudo capability. For example, create an administrator user admin1:

useradd admin1       # create the user
passwd admin1        #  and give it a password

# Give "admin1" full root sudo privileges
echo "admin1 ALL=(root) NOPASSWD: ALL" >> /etc/sudoers

# Now execute as that user "admin1"
su - admin1
# And generate an SSH key pair (defaulting to /home/admin1/.ssh/id_rsa)
ssh-keygen

Create a cluster configuration text file that names the interface to the private cluster network, the IP address on that network for the first compute node, and a list of the MAC addresses of the servers to use as compute nodes. For example:

cat <<-EOF >/tmp/cluster-conf
iprange 10.54.60.0          # starting IP address of node 0
node 52:54:00:a6:f3:3c      # node 0 MAC address
node 40:8d:5c:fa:ea:c3      # node 1 MAC address
EOF

Now install the clusterware-installer package from the ClusterWare repo defined in /etc/yum.repos.d/clusterware.repo:

sudo yum install clusterware-installer

That package contains the scyld-install script, which you will execute to install the ClusterWare software and to create two objects: the DefaultImage, a basic compute node PXEboot image, and the DefaultBoot config file, which references that DefaultImage and contains boot-time information such as the kernel command line to pass to a booting node.

For a simple installation:

# Reminder: you should be executing as user "admin1"
scyld-install --config /tmp/cluster-conf

By default the DefaultImage contains a kernel and root filesystem (rootfs) built from the same base distribution that is installed on the head node. However, if the head node executes RHEL8, then no DefaultImage or DefaultBoot is created.

Alternatively, for more flexibility (especially with a RHEL8 head node), execute the installer with an additional option that identifies the base distribution to be used for the DefaultImage:

scyld-install --config /tmp/cluster-conf --os-iso <ISO-file>

where <ISO-file> is either a pathname to an ISO file or a URL of an ISO file of a specific base distribution release, e.g., --os-iso rhel-8.5-x86_64-dvd.iso. That ISO can match the head node's base distribution or can be any distribution supported by Penguin Computing (see Supported Distributions and Features).
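
After the installer finishes, you can optionally confirm that it created these objects by listing them. This sketch assumes the scyld-imgctl and scyld-bootctl tools accept an ls subcommand analogous to scyld-nodectl's; consult each tool's help output if they differ:

scyld-imgctl ls
scyld-bootctl ls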

Now you have a basic cluster with two compute nodes that should PXEboot. The installer has created a DefaultImage that contains basic compute node software and a DefaultBoot config file for booting that image, and it has initialized every node to PXEboot using the DefaultBoot. Validate the current setup by booting (or rebooting) both compute nodes, and check their status as they boot and connect to the head node:

scyld-nodectl status --refresh
# Use ctrl-c to exit this display

which initially shows:

Node status                                              [ date & time ]
------------------------------------------------------------------------
n[0-1] new

for the nodes n0 and n1, and automatically updates as each node's status changes from new to booting to up. The per-node transition from new to booting consumes a minute or more for hardware initialization, PXEboot provisioning, and early software init. The transition from booting to up consumes another minute or more. If the nodes do not boot, then see Failing PXE Network Boot.
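
Once both compute nodes have finished booting, the refreshed display should settle into something like the following (illustrative only; the timestamp will differ):

Node status                                              [ date & time ]
------------------------------------------------------------------------
n[0-1] up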

You can view information about the up nodes by executing:

scyld-nodectl ls -L
# which is shorthand for `scyld-nodectl list --long-long`

scyld-nodectl status -L

Now enhance the functionality of the compute node software by installing the Slurm job scheduler and an OpenMPI software stack into the image that PXEboots. Best practice is to retain the original DefaultImage and DefaultBoot as a pristine starting point for future additional software enhancements, so copy these Default objects and modify just the copies:

scyld-imgctl -i DefaultImage clone name=NewImage
scyld-bootctl -i DefaultBoot clone name=NewBoot
# The NewBoot clone is initially associated with the DefaultImage,
#  so change that:
scyld-bootctl -i NewBoot update image=NewImage
# Instruct all compute nodes to use "NewBoot" (instead of "DefaultBoot"):
scyld-nodectl --all set _boot_config=NewBoot

Add the head node to /etc/hosts so that the clusterware-dnsmasq service can serve that name to the compute nodes, and then restart the service. Suppose the head node's private cluster network IP address is "10.54.0.60":

echo "10.54.0.60 $(hostname)" | sudo tee -a /etc/hosts
sudo systemctl restart clusterware-dnsmasq
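
To confirm the entry is in place before continuing, you can resolve the head node's name locally; this is a simple sanity check, not a ClusterWare requirement:

getent hosts $(hostname)
# Expect a line such as:  10.54.0.60  <head node name>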

Now install and configure Slurm, which is one of the ClusterWare-supported job schedulers. The scyld-install installer has disabled the ClusterWare repo (for an explanation, see Additional Software), so we must explicitly enable the repo when installing:

sudo yum install slurm-scyld --enablerepo=scyld*

# Perform the Slurm initialization
slurm-scyld.setup init

# Add Slurm client software to the NewImage
slurm-scyld.setup update-image NewImage

# Reboot the nodes and view their status as they boot
scyld-nodectl --all reboot
scyld-nodectl status --refresh
# And ctrl-c when both rebooting nodes are again "up"

# Check the job scheduler status
slurm-scyld.setup status
# If the Slurm daemon and munge are not both executing, then:
slurm-scyld.setup cluster-restart
# And check status again
slurm-scyld.setup status
sinfo
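
If Slurm is healthy, sinfo should report both compute nodes in an idle state. The output below is only illustrative; the partition name shown here is a placeholder, and the exact columns depend on the generated Slurm configuration:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up   infinite      2   idle n[0-1]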

Configure the cluster to support OpenMPI communication between compute nodes using the ssh transport mechanism, which requires consistent user uid/gid values across nodes and passphrase-less key-based ssh access. For this Quickstart Guide we will continue to use admin1 as the user. Add admin1's authentication to the NewImage:

/opt/scyld/clusterware-tools/bin/sync-uids \
                -i NewImage --create-homes \
                --users admin1 --sync-key admin1=/home/admin1/.ssh/id_rsa.pub

Install OpenMPI 4.0 into NewImage using chroot:

scyld-modimg -i NewImage --chroot --no-discard --overwrite --upload
  # Inside the chroot you are executing as user root
  yum install openmpi4.0

  # Set up access to Slurm and OpenMPI for "admin1"
  echo "module load slurm"   >> /home/admin1/.bashrc
  echo "module load openmpi" >> /home/admin1/.bashrc

  # Build an example OpenMPI application
  cd /opt/scyld/openmpi/*/gnu/examples
  yum install make
  module load openmpi
  make hello_c
  # For simplicity, copy the executable to /home/admin1/hello_c
  cp hello_c /home/admin1/hello_c

  exit  # from the chroot

Reboot the nodes with the updated NewImage:

scyld-nodectl --all reboot
# Observe the node status changes
scyld-nodectl status --refresh
# And ctrl-c when both rebooting nodes are again "up"
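
Before launching MPI jobs, it can be useful to confirm that passphrase-less ssh from the head node to a compute node works for admin1, using the key pair generated earlier and pushed into the image by sync-uids. This assumes the compute node name n0 resolves from the head node; substitute the node's IP address if it does not:

# As user "admin1" on the head node
ssh n0 hostname
# This should print the node's hostname without prompting for a password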

From the head node, as user admin1, verify the full setup by using Slurm to execute programs on the compute nodes:

module load slurm

# Verify basic Slurm functionality by executing a simple command on each node
srun -N 2 hostname

# Use Slurm to execute one "Hello World" program on each of the two nodes
srun -N 2 ./hello_c
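
Each MPI rank prints a greeting, so you should see output resembling the following (the Open MPI version details and the ordering of lines will vary):

Hello, world, I am 0 of 2, (Open MPI v4.0.x, ...)
Hello, world, I am 1 of 2, (Open MPI v4.0.x, ...)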