Job Schedulers
The default Scyld ClusterWare installation for RHEL/CentOS 7 includes support for the optional job scheduler packages Slurm and PBS TORQUE, and for RHEL/CentOS 8 includes support for the optional packages Slurm and OpenPBS. These optional packages can coexist on a scheduler server, which may or may not be a ClusterWare head node. However, if job schedulers are installed on the same server, then only one at a time should be enabled and executing on that given server.
All nodes in the job scheduler cluster must be able to resolve hostnames
of all other nodes as well as the scheduler server hostname.
ClusterWare provides a DNS server in the clusterware-dnsmasq package,
as discussed in Node Name Resolution.
This dnsmasq will resolve all compute node hostnames, and the job scheduler's hostname should be added to /etc/hosts on the head node(s) in order to be resolved by dnsmasq.
Whenever /etc/hosts is edited, please restart the clusterware-dnsmasq service with:
sudo systemctl restart clusterware-dnsmasq
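As a quick sanity check after restarting dnsmasq, the resolution requirement above can be tested from any node. The following is a minimal sketch, not part of ClusterWare: the helper name unresolved_hosts is hypothetical, as are the hostnames in the usage comment. It relies only on the standard getent utility, which consults /etc/hosts and DNS according to nsswitch.conf.

```shell
# Hypothetical helper (not a ClusterWare tool): print each given hostname
# that the local resolver cannot resolve; print nothing if all resolve.
unresolved_hosts() {
    for host in "$@"; do
        # getent exits non-zero when the name is unknown.
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}

# Example (placeholder names; substitute your scheduler server and nodes):
#   unresolved_hosts sched-server n0 n1 n2 n3
```

Empty output means every name resolved. Running the same check on the scheduler server and on at least one compute node covers both directions.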
Installing and configuring a job scheduler requires making changes to the compute node software. When using image-based compute nodes, we suggest first cloning the DefaultImage or creating a new image, leaving the DefaultImage untouched as a known-functional pristine image.
For example, to set up nodes n0 through n3, you might first do:
scyld-imgctl -i DefaultImage clone name=jobschedImage
scyld-bootctl -i DefaultBoot clone name=jobschedBoot image=jobschedImage
scyld-nodectl -i n[0-3] set _boot_config=jobschedBoot
When these nodes reboot after all the setup steps are complete, they will use the jobschedBoot and jobschedImage.
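The three cloning steps above can be wrapped in a small script so that the same sequence can be reused for another name or node range. This is only a sketch: the function names run and clone_for_scheduler and the DRY_RUN variable are hypothetical, while the scyld-* commands and their arguments are exactly those shown above. By default it only prints the commands.

```shell
# Print each command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: clone the image and boot config under a common
# name prefix, then point the given nodes at the new boot config.
#   $1 = name prefix (e.g. jobsched)   $2 = node spec (e.g. 'n[0-3]')
clone_for_scheduler() {
    prefix=$1
    nodes=$2
    run scyld-imgctl -i DefaultImage clone "name=${prefix}Image"
    run scyld-bootctl -i DefaultBoot clone "name=${prefix}Boot" "image=${prefix}Image"
    run scyld-nodectl -i "$nodes" set "_boot_config=${prefix}Boot"
}

clone_for_scheduler jobsched 'n[0-3]'
```

Quoting the node spec keeps the shell from treating n[0-3] as a filename pattern.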
See https://slurm.schedmd.com/rosetta.pdf for a discussion of the differences between PBS TORQUE and Slurm. See https://slurm.schedmd.com/faq.html#torque for useful information about how to transition from OpenPBS or PBS TORQUE to Slurm.
The following sections describe the installation and configuration of each job scheduler type.
Slurm
See Job Schedulers for general job scheduler information and configuration guidelines. See https://slurm.schedmd.com for Slurm documentation.
Note
As of ClusterWare 12, the default slurm-scyld configuration is Configless; see https://slurm.schedmd.com/configless_slurm.html for more information. This reduces the admin effort needed when updating the list of compute nodes.
First install Slurm software on the job scheduler server:
sudo yum install slurm-scyld --enablerepo=scyld*
Important
For RHEL/CentOS 8, install Slurm with an additional argument:
sudo yum install slurm-scyld --enablerepo=scyld* --enablerepo=powertools
For RHEL/Rocky 9, install Slurm with an additional argument:
sudo yum install slurm-scyld --enablerepo=scyld* --enablerepo=crb
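The per-release variations above differ only in the extra repository to enable. As a hedged sketch, that choice could be scripted as follows. The function names extra_repo_for and slurm_install_cmd are hypothetical, and the mapping simply restates the commands above (no extra repository for release 7, powertools for 8, crb for 9). The script prints the install command rather than running it.

```shell
# Hypothetical helper: map an OS major release to the extra repository
# named in the instructions above (empty output means none is needed).
extra_repo_for() {
    case "$1" in
        8) echo powertools ;;
        9) echo crb ;;
        *) ;;
    esac
}

# Build the yum command line for a given major release.
slurm_install_cmd() {
    repo=$(extra_repo_for "$1")
    cmd="sudo yum install slurm-scyld --enablerepo=scyld*"
    if [ -n "$repo" ]; then
        cmd="$cmd --enablerepo=$repo"
    fi
    echo "$cmd"
}

# Detect the local major release from /etc/os-release when available.
if [ -r /etc/os-release ]; then
    major=$(. /etc/os-release; echo "${VERSION_ID:-}")
    slurm_install_cmd "${major%%.*}"
fi
```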
Now use a helper script slurm-scyld.setup to complete the initialization and set up the job scheduler and the compute node image(s).
Note
The slurm-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>].
See Attribute Groups and Dynamic Groups for details.
slurm-scyld.setup init # default to all 'up' nodes
init first generates /etc/slurm/slurm.conf by trying to install slurm-scyld-node and run slurmd -C on 'up' nodes. By default, configless Slurm is enabled by "SlurmctldParameters=enable_configless" in /etc/slurm/slurm.conf, and a DNS SRV record called slurmctld_primary is created. To see the details of the SRV record:
scyld-clusterctl hosts -i slurmctld_primary ls -l
Note
For clusters with a backup Slurm controller, create a slurmctld_backup DNS SRV record:
scyld-clusterctl --hidden hosts create name=slurmctld_backup port=6817 service=slurmctld \
domain=cluster.local target=backuphostname type=srvrec priority=20
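Slurm's configless documentation describes slurmd locating its controllers through _slurmctld._tcp SRV records, and by SRV convention the record with the lowest priority value is tried first, so a backup created with priority=20 is consulted after any record with a smaller value. A minimal sketch for inspecting that order follows. The function name srv_by_priority is hypothetical, and the sample records (including a primary priority of 10 and the host name headnode) are illustrative assumptions rather than ClusterWare defaults.

```shell
# Hypothetical helper: sort SRV answers by priority, lowest (preferred) first.
# Input format is that of `dig +short -t SRV <name>`:
#   "<priority> <weight> <port> <target>"
srv_by_priority() {
    sort -n -k1,1
}

# Illustrative sample data (assumed values, not read from a real cluster).
printf '%s\n' \
    '20 0 6817 backuphostname.' \
    '10 0 6817 headnode.' | srv_by_priority

# On a live cluster the input would come from DNS instead, for example
# (assuming the records are published under cluster.local as above):
#   dig +short -t SRV _slurmctld._tcp.cluster.local | srv_by_priority
```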
If there are no 'up' nodes, or if the slurm-scyld-node installation fails for some reason, then no node is configured in slurm.conf during init. Later you can use reconfigure to create a new slurm.conf, or update-nodes to update the nodes in an existing slurm.conf.
init also generates /etc/slurm/cgroup.conf and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and then restarts slurmctld. Finally, init tries to start slurmd on the nodes.
In the ideal case, if the script succeeds in installing slurm-scyld-node on the compute nodes, the following works after init:
srun -N 1 hostname
The slurmd installation and configuration on 'up' nodes do not survive a node reboot, except on diskful compute nodes. To make a persistent Slurm image:
slurm-scyld.setup update-image slurmImage # for permanence in the image
By default, update-image does not include Slurm config files in slurmImage if configless is enabled; otherwise it does include them. You can override this default behavior by appending the additional argument "--copy-configs" or "--remove-configs" after slurmImage in the above command.
Reboot the compute nodes to bring them into active management by Slurm. Check the Slurm status:
slurm-scyld.setup status
If any services on the controller (slurmctld, slurmdbd, and munge) or on the compute nodes (slurmd and munge) are not running, you can use systemctl to start an individual service, or use slurm-scyld.setup cluster-restart, slurm-scyld.setup restart, and slurm-scyld.setup start-nodes to restart Slurm cluster-wide, on the controller only, and on the nodes only, respectively.
Note
The above restart and start actions do not affect slurmImage. The update-image action is necessary for persistence across compute node reboots.
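To see which of the services named above are down before choosing between the restart actions, a small check can help. This is a sketch only: the function name inactive_units is hypothetical, and it relies on the standard systemctl is-active query rather than on any ClusterWare facility.

```shell
# Hypothetical helper: print each named systemd unit that is not active.
inactive_units() {
    for unit in "$@"; do
        systemctl is-active --quiet "$unit" 2>/dev/null || echo "$unit"
    done
}

# On the controller, check the services listed above:
inactive_units slurmctld slurmdbd munge

# On a compute node:
#   inactive_units slurmd munge
```

Empty output means all of the named services are active. Any unit printed is a candidate for systemctl start or for one of the slurm-scyld.setup restart actions.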
Generate new slurm-specific config files with:
slurm-scyld.setup reconfigure # default to all 'up' nodes
Add nodes by executing:
slurm-scyld.setup update-nodes # default to all 'up' nodes
or add or remove nodes by directly editing the /etc/slurm/slurm.conf config file.
Note
With Configless Slurm, the slurmImage does NOT need to be reconfigured after new nodes are added, because Slurm automatically forwards the new information to the slurmd daemons on the nodes.
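When editing /etc/slurm/slurm.conf by hand, it helps to see which node definitions are already present. A hedged sketch: the function name slurm_node_lines is hypothetical, and the sample file content is illustrative only (real NodeName lines are derived from slurmd -C output and will carry different values).

```shell
# Hypothetical helper: print the NodeName definitions found in a slurm.conf.
#   $1 = path to the config file (defaults to the controller's copy)
slurm_node_lines() {
    grep -E '^[[:space:]]*NodeName=' "${1:-/etc/slurm/slurm.conf}"
}

# Demonstration against a throwaway file with assumed, illustrative content.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ClusterName=example
# NodeName=old0 CPUs=2
NodeName=n[0-3] CPUs=4 State=UNKNOWN
PartitionName=defq Nodes=n[0-3] Default=YES State=UP
EOF
slurm_node_lines "$sample"
rm -f "$sample"
```

Commented-out definitions and PartitionName lines are skipped, so the output shows only the active node definitions.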
Inject users into the compute node image using the sync-uids script. The administrator can inject all users, a selected list of users, or a single user.
For example, inject the single user janedoe:
/opt/scyld/clusterware-tools/bin/sync-uids \
-i slurmImage --create-homes \
--users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub
See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.
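To inject several users with the single-user form shown above, the command can be looped. This is a sketch under assumptions: the function names run and inject_users, the DRY_RUN switch, and the user johndoe are hypothetical; each user's public key is assumed to live at /home/<user>/.ssh/id_rsa.pub as in the janedoe example; and only options that appear above are used. The sync-uids -h output may describe a more direct way to pass a list of users.

```shell
SYNC_UIDS=/opt/scyld/clusterware-tools/bin/sync-uids

# Print each command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: run the single-user sync-uids command once per user.
#   $1 = image name, remaining arguments = user names
inject_users() {
    image=$1
    shift
    for user in "$@"; do
        run "$SYNC_UIDS" -i "$image" --create-homes \
            --users "$user" --sync-key "$user=/home/$user/.ssh/id_rsa.pub"
    done
}

inject_users slurmImage janedoe johndoe
```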
To view the Slurm status on the server and compute nodes:
slurm-scyld.setup status
The Slurm service can also be started and stopped cluster-wide with:
slurm-scyld.setup cluster-stop
slurm-scyld.setup cluster-start
Slurm executable commands and libraries are installed in /opt/scyld/slurm/.
The Slurm controller configuration can be found in /etc/slurm/slurm.conf, and each node caches a copy of that slurm.conf file in /var/spool/slurmd/conf-cache/.
Each Slurm user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the Slurm commands. This is done automatically by the /etc/profile.d/scyld.slurm.sh script for users who log in while Slurm is running. Alternatively, each Slurm user can manually execute module load slurm or can add that command line to (for example) the user's ~/.bash_profile or ~/.bashrc.
For a traditional config-file-based Slurm deployment, the admin will have to push the new /etc/slurm/slurm.conf file out to the compute nodes and then restart slurmd. Alternatively, the admin can modify the boot image to include the new config file, and then reboot the nodes into that new image.
PBS TORQUE
PBS TORQUE is only available for RHEL/CentOS 7 clusters. See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.adaptivecomputing.com/support/documentation-index/torque-resource-manager-documentation for PBS TORQUE documentation.
First install PBS TORQUE software on the job scheduler server:
sudo yum install torque-scyld --enablerepo=scyld*
Now use a helper script torque-scyld.setup to complete the initialization and set up the job scheduler and config file in the compute node image(s).
Note
The torque-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>].
See Attribute Groups and Dynamic Groups for details.
torque-scyld.setup init # default to all 'up' nodes
torque-scyld.setup update-image torqueImage # for permanence in the image
Reboot the compute nodes to bring them into active management by TORQUE. Check the TORQUE status:
torque-scyld.setup status
# If the TORQUE daemon is not executing, then:
torque-scyld.setup cluster-restart
# And check the status again
This cluster-restart is a manual one-time setup that doesn't affect the torqueImage. The update-image is necessary for persistence across compute node reboots.
Generate new torque-specific config files with:
torque-scyld.setup reconfigure # default to all 'up' nodes
Add nodes by executing:
torque-scyld.setup update-nodes # default to all 'up' nodes
or add or remove nodes by directly editing the /var/spool/torque/server_priv/nodes config file.
Any such changes must be added to torqueImage by reexecuting:
torque-scyld.setup update-image torqueImage
and then either reboot all the compute nodes with that updated image, or additionally execute:
torque-scyld.setup cluster-restart
to manually push the changes to the up nodes without requiring a reboot.
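When editing /var/spool/torque/server_priv/nodes by hand as mentioned above, each entry begins with a node name, optionally followed by settings such as np=<count>. As a hedged sketch, the node names in such a file can be listed before and after an edit to confirm the change. The function name torque_node_names is hypothetical, the sample content is illustrative only, and lines starting with # are assumed to be comments.

```shell
# Hypothetical helper: print the first field (the node name) of every
# non-blank, non-comment line in a TORQUE nodes file.
#   $1 = path to the nodes file (defaults to the server's copy)
torque_node_names() {
    awk '!/^[ \t]*(#|$)/ { print $1 }' \
        "${1:-/var/spool/torque/server_priv/nodes}"
}

# Demonstration against a throwaway file with assumed, illustrative content.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# illustrative entries
n0 np=4
n1 np=4
EOF
torque_node_names "$sample"
rm -f "$sample"
```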
Inject users into the compute node image using the sync-uids script. The administrator can inject all users, a selected list of users, or a single user.
For example, inject the single user janedoe:
/opt/scyld/clusterware-tools/bin/sync-uids \
-i torqueImage --create-homes \
--users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub
See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.
To view the TORQUE status on the server and compute nodes:
torque-scyld.setup status
The TORQUE service can also be started and stopped cluster-wide with:
torque-scyld.setup cluster-stop
torque-scyld.setup cluster-start
TORQUE executable commands are installed in /usr/sbin/ and /usr/bin/, TORQUE libraries are installed in /usr/lib64/, and both are therefore accessible by the default search rules.
OpenPBS
OpenPBS is only available for RHEL/CentOS 8 clusters.
See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.openpbs.org for OpenPBS documentation.
First install OpenPBS software on the job scheduler server:
sudo yum install openpbs-scyld --enablerepo=scyld*
Now use a helper script openpbs-scyld.setup to complete the initialization and set up the job scheduler and config file in the compute node image(s).
Note
The openpbs-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>].
See Attribute Groups and Dynamic Groups for details.
openpbs-scyld.setup init # default to all 'up' nodes
openpbs-scyld.setup update-image openpbsImage # for permanence in the image
Reboot the compute nodes to bring them into active management by OpenPBS. Check the OpenPBS status:
openpbs-scyld.setup status
# If the OpenPBS daemon is not executing, then:
openpbs-scyld.setup cluster-restart
# And check the status again
This cluster-restart is a manual one-time setup that doesn't affect the openpbsImage. The update-image is necessary for persistence across compute node reboots.
Generate new openpbs-specific config files with:
openpbs-scyld.setup reconfigure # default to all 'up' nodes
Add nodes by executing:
openpbs-scyld.setup update-nodes # default to all 'up' nodes
or add or remove nodes by executing qmgr.
Any such changes must be added to openpbsImage by reexecuting:
openpbs-scyld.setup update-image openpbsImage
and then either reboot all the compute nodes with that updated image, or additionally execute:
openpbs-scyld.setup cluster-restart
to manually push the changes to the up nodes without requiring a reboot.
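The qmgr route mentioned above can be sketched as follows. The create node and delete node directives of qmgr and the pbsnodes -a listing are standard OpenPBS commands; the wrapper names run, pbs_add_node, and pbs_remove_node, the DRY_RUN switch, and the node name n4 are hypothetical. By default the sketch only prints what it would run.

```shell
# Print each command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
# Note: the printed form does not show the quoting around the qmgr directive.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrappers around the qmgr node directives.
pbs_add_node()    { run qmgr -c "create node $1"; }
pbs_remove_node() { run qmgr -c "delete node $1"; }

pbs_add_node n4     # n4 is a placeholder node name
run pbsnodes -a     # list the nodes known to the server
```

With DRY_RUN=0 the commands must be run on the OpenPBS server by a user with manager privileges.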
Inject users into the compute node image using the sync-uids script. The administrator can inject all users, a selected list of users, or a single user.
For example, inject the single user janedoe:
/opt/scyld/clusterware-tools/bin/sync-uids \
-i openpbsImage --create-homes \
--users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub
See Configure Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.
To view the OpenPBS status on the server and compute nodes:
openpbs-scyld.setup status
The OpenPBS service can also be started and stopped cluster-wide with:
openpbs-scyld.setup cluster-stop
openpbs-scyld.setup cluster-start
OpenPBS executable commands and libraries are installed in /opt/scyld/openpbs/.
Each OpenPBS user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the OpenPBS commands. This is done automatically by the /etc/profile.d/scyld.openpbs.sh script for users who log in while OpenPBS is running. Alternatively, each OpenPBS user can manually execute module load openpbs or can add that command line to (for example) the user's ~/.bash_profile or ~/.bashrc.