For Scyld ClusterWare, the default installation includes both the TORQUE resource manager and the Slurm workload manager, each providing users with an intuitive interface for remotely initiating and managing batch jobs on distributed compute nodes.
ClusterWare TORQUE is a customized redistribution of Open Source software from Adaptive Computing Enterprises, Inc. (https://www.adaptivecomputing.com/products/opensource/torque), based on the standard OpenPBS resource manager. ClusterWare Slurm is a redistribution of Open Source software that derives from https://slurm.schedmd.com, and the associated Munge authentication package derives from http://dun.github.io/munge/.
Both TORQUE and Slurm are installed by default, although only one job manager can be enabled at a time. See Enabling TORQUE or Slurm, below, for details. See the User’s Guide for general information about using TORQUE or Slurm, and see Managing Multiple Master Nodes for details about configuring TORQUE for high availability with multiple master nodes.
Scyld also redistributes the Scyld Maui job scheduler, likewise derived from Adaptive Computing software, which functions in conjunction with the TORQUE job manager. The alternative Moab job scheduler, available from Adaptive Computing under a separate license, gives customers additional job scheduling, reporting, and monitoring capabilities.
In addition, Scyld provides support for most popular Open Source and commercial schedulers and resource managers, including SGE, LSF, and PBSPro. For the latest information, see the Penguin Computing Support Portal at https://www.penguincomputing.com/support.
Enabling TORQUE or Slurm
To enable TORQUE: after all compute nodes are up and running, disable Slurm (if it is currently enabled), then enable and configure TORQUE, and finally reboot all the compute nodes:
    slurm-scyld.setup cluster-stop
    beochkconfig 98slurm off
    slurm-scyld.setup disable
    beochkconfig 98torque on
    torque-scyld.setup reconfigure   # when needed
    torque-scyld.setup enable
    torque-scyld.setup cluster-start
    torque-scyld.setup status
    bpctl -S all -R
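The sequence above can be wrapped in a small script for repeatability. The following is a minimal dry-run sketch, not part of ClusterWare itself: the `run` helper only echoes each command, so the script is safe to inspect; redefining `run` as shown in the comment would execute the switch for real (as root on the master node).

```shell
#!/bin/sh
# Dry-run sketch of switching the cluster from Slurm to TORQUE.
# "run" echoes each command instead of executing it; to perform the
# switch for real (as root), redefine it as:  run() { "$@"; }
run() { echo "+ $*"; }

run slurm-scyld.setup cluster-stop    # stop Slurm across the cluster
run beochkconfig 98slurm off          # do not start Slurm on node boot
run slurm-scyld.setup disable
run beochkconfig 98torque on          # start TORQUE on node boot
run torque-scyld.setup reconfigure    # only when the configuration changed
run torque-scyld.setup enable
run torque-scyld.setup cluster-start
run torque-scyld.setup status
run bpctl -S all -R                   # reboot all compute nodes
```

Reviewing the echoed command list before removing the dry-run wrapper helps avoid rebooting compute nodes with the wrong manager enabled.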
To enable Slurm: after all compute nodes are up and running, disable TORQUE (if it is currently enabled), then enable and configure Slurm, and finally reboot all the compute nodes:
    torque-scyld.setup cluster-stop
    beochkconfig 98torque off
    torque-scyld.setup disable
    beochkconfig 98slurm on
    slurm-scyld.setup reconfigure   # when needed
    slurm-scyld.setup enable
    slurm-scyld.setup cluster-start
    slurm-scyld.setup status
    bpctl -S all -R
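After either switch, it can be useful to confirm which job manager is actually responding before users submit jobs. A hedged sketch using each manager's standard query command (`sinfo` for Slurm, `qstat` for TORQUE); the `active_manager` helper is illustrative, not a ClusterWare command, and it reports `none` if neither manager answers:

```shell
#!/bin/sh
# Report which job manager is currently responding on this master node.
active_manager() {
    if sinfo >/dev/null 2>&1; then
        echo slurm      # the Slurm controller answered
    elif qstat -q >/dev/null 2>&1; then
        echo torque     # the TORQUE pbs_server answered
    else
        echo none       # neither manager is responding
    fi
}
active_manager
```

Because only one job manager can be enabled at a time, seeing both respond would indicate an incomplete switch (for example, a missed `disable` step).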
Note: slurmdbd uses mysql to create a database defined by /etc/slurm/slurmdbd.conf, and expects mysql to be configured with no
Each Slurm user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the Slurm commands. This is done automatically, via the /etc/profile.d/scyld.slurm.sh script, for users who log in while the slurm service is running and pbs_server is not running. Alternatively, each Slurm user can manually execute module load slurm, or can add that command line to (for example) the user’s