Initial Installation of Scyld ClusterWare

The Scyld ClusterWare scyld-install script installs the necessary packages from the ClusterWare yum repositories, and installs dependency packages as needed from the base distribution (e.g., Red Hat RHEL or CentOS) yum repositories.

Important

Do not install ClusterWare as an upgrade to an existing ClusterWare 6 or 7 installation. Instead, install Scyld ClusterWare on a non-ClusterWare system that ideally is a virtual machine. (See Required and Recommended Components.)

Important

The head node(s) must use a Red Hat RHEL- or CentOS-equivalent base distribution release 7.6 or later environment, due to dependencies on newer libvirt and selinux packages.

Note

Clusters commonly employ multiple head nodes. The instructions in this section describe installing ClusterWare on the first head node. To later install ClusterWare on additional head nodes, see Managing Multiple Head Nodes.

scyld-install is designed to be run by a non-root user, so ensure that your userid can execute sudo. Additionally, if sudo is used behind a proxy, then because sudo clears certain environment variables for security reasons, the cluster administrator should consider adding the following lines to /etc/sudoers:

Defaults    env_keep += "HTTP_PROXY http_proxy"
Defaults    env_keep += "HTTPS_PROXY https_proxy"
Defaults    env_keep += "NO_PROXY no_proxy"
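
As a quick check that the proxy variables survive sudo (a minimal sketch; which variables appear depends on your proxy configuration), run:

sudo env | grep -i proxy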

Important

Various commands that manipulate images execute as user root, which requires that the commands internally use sudo and that user root have access to the administrator's workspace containing the administrator's images. Typically the per-user workspace is ~/.scyldcw/workspace/. If that directory is not accessible to the command executing as root, then another accessible directory can be employed, and the administrator can identify that alternative pathname by adding a modimg.workspace setting to ~/.scyldcw/settings.ini.
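
For example, a hypothetical alternative workspace could be declared in ~/.scyldcw/settings.ini as follows; the /shared/cw-workspace path is illustrative, and the exact placement of the setting within the file may differ on your system:

modimg.workspace = /shared/cw-workspace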

Important

scyld-install uses the yum command to access Scyld ClusterWare and potentially various other repositories (e.g., Red Hat RHEL or CentOS) that by default normally reside on Internet websites. However, if the head node(s) do not have Internet access, then the required repositories must reside on local storage that is accessible by the head node(s). See Appendix: Creating Local Repositories without Internet.
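
As a rough sketch of that approach (the repo id and paths below are hypothetical; see the referenced appendix for the supported procedure), a local mirror can be described by a standard yum .repo file that points at local storage:

[clusterware-local]
name=Scyld ClusterWare (local mirror)
baseurl=file:///srv/repos/clusterware
enabled=1
gpgcheck=0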

Execute the ClusterWare install script

If /etc/yum.repos.d/clusterware.repo exists, then scyld-install's subsequent invocations of yum will employ that configuration file. If it does not exist, then scyld-install prompts the user for an appropriate authentication token and uses that token to build a /etc/yum.repos.d/clusterware.repo that is customized to your cluster.

scyld-install accepts an optional argument specifying a cluster configuration file that contains information necessary to set up the DHCP server. For example:

cat <<-EOF >/tmp/cluster-conf
interface enp0s9            # names the private cluster interface
nodes 4                     # max number of compute nodes
iprange 10.10.32.45         # starting IP address of node 0
node 08:00:27:f0:44:35      # node 0 MAC address
node 08:00:27:f0:44:45      # node 1 MAC address
node 08:00:27:f0:44:55      # node 2 MAC address
node 08:00:27:f0:44:65      # node 3 MAC address
EOF

where the syntax of this cluster configuration file is:

domain <DOMAIN_NAME>

Optional. Defaults to "cluster.local".

interface <INTERFACE_NAME>

Optional. Specifies the name of the head node's interface to the private cluster network, although that interface can be determined from the <FIRST_IP> specified in the iprange line.

nodes <MAX_COUNT>

Optional. Specifies the max number of compute nodes, although that can be determined from the iprange if both <FIRST_IP> and <LAST_IP> are present. The max will also adjust as needed if and when additional nodes are defined. For example, see Node Creation with Known MAC address(es).

iprange <FIRST_IP> [<LAST_IP>]

Specifies the IP address of the first node (which defaults to n0) and optionally the IP address of the last node. The <LAST_IP> can be deduced from the <FIRST_IP> and the nodes <MAX_COUNT>. The <FIRST_IP> can include an optional netmask via a suffix of /<BIT_COUNT> (e.g., /24) or a mask (e.g., /255.255.255.0).

<FIRST_INDEX> <FIRST_IP> [<LAST_IP>] [via <FROM_IP>] [gw <GATEWAY_IP>]

This is a more elaborate specification of a range of IP addresses, commonly used with DHCP relays or multiple subnets. <FIRST_INDEX> specifies that the first node in this range is node n<FIRST_INDEX> and is assigned IP address <FIRST_IP>. The optional via clause specifies that DHCP client requests from nodes in this range arrive on the interface that contains <FROM_IP>. The optional gw clause specifies that each DHCP'ing node be told to use <GATEWAY_IP> as its gateway, which otherwise defaults to the head node's IP address on the private cluster network.

For example: 128 10.10.24.30/24 10.10.24.100 via 192.168.65.2 gw 10.10.24.254 defines a DHCP range of 71 addresses starting with 10.10.24.30, assigns the first node in the range as n128, watches for DHCP requests arriving on the interface containing 192.168.65.2, and tells these nodes to use 10.10.24.254 as their gateway.

node [<INDEX>] <MAC> [<MAC>]

One compute node per line, commonly with multiple node lines, where each DHCP'ing node is recognized by its unique MAC address and is assigned an IP address using the configuration file specifications described above. Currently only the first <MAC> is used. An optional <INDEX> is the index number of the node; it overrides the default of sequentially increasing node indices and thereby creates a gap of unassigned indices. For example, a series of eight node lines without an <INDEX> followed by node 32 52:54:00:c4:f7:1e creates a gap of unassigned indices n8 through n31 and assigns this node as n32. A combined example appears below.
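
For instance, a hypothetical configuration that combines a relayed IP range with an explicitly indexed node line might look like the following, where all addresses and MAC addresses are illustrative:

domain cluster.local
iprange 10.10.32.45 10.10.32.52                   # n0 through n7 on the head node's subnet
128 10.10.24.30/24 10.10.24.100 via 192.168.65.2 gw 10.10.24.254
node 08:00:27:f0:44:35                            # n0
node 128 52:54:00:c4:f7:1e                        # first node in the relayed range, n128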

Note

ClusterWare yum repositories contain RPMs that duplicate various Red Hat EPEL RPMs, and these ClusterWare RPMs are installed or updated in preference to their EPEL equivalents, even if /etc/yum.repos.d/ contains an EPEL .repo file.

Note

ClusterWare employs userid/groupid 539 to simplify communication between the head node(s) and the backend shared storage where it stores node image files, kernels, and initramfs files. If the scyld-install script detects that this uid/gid is already in use by other software, then the script issues a warning and chooses an alternative new random uid/gid. The cluster administrator needs to set the appropriate permissions on that shared storage to allow all head nodes to read and write all files.
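
For example, to confirm whether uid 539 is already in use and to adjust ownership of a hypothetical shared storage mount point (/shared/clusterware is illustrative; use the uid/gid that scyld-install actually chose):

getent passwd 539
sudo chown -R 539:539 /shared/clusterware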

The ClusterWare database is stored as JSON content within a replicated document store distributed among the ClusterWare head nodes. This structure protects against the failure of any single head node. Although the system originally used the community edition of Couchbase as the distributed database, the internal API is implemented using pluggable modules and now supports etcd as the default distributed database and Couchbase as an alternative. The module API is intended to present a consistent experience regardless of the backend database but some details, such as failure modes, will differ.

During head node installation the cluster administrator can select the database using the DB_RPM environment variable. The current default is clusterware-etcd, although a value of clusterware-couchbase will install the appropriate package and configuration for the Couchbase backend.

For example, using the cluster-config created above and installing the default etcd database:

scyld-install --config /tmp/cluster-conf

Or, to choose the Couchbase database instead:

DB_RPM=clusterware-couchbase scyld-install --config /tmp/cluster-conf

The administrator can also switch between the available backend databases after the cluster is installed. See Choosing An Alternate Database for details.

By default scyld-install creates a DefaultImage that contains a kernel and rootfs software from the same base distribution installed on the head node, although if the head node runs RHEL8, then no DefaultImage or DefaultBoot is created.

Alternatively, for more flexibility (especially with a RHEL8 head node), execute the installer with an additional option that identifies the base distribution to be used for the DefaultImage:

scyld-install --config /tmp/cluster-conf --os-iso <ISO-file>

where <ISO-file> is either a pathname to an ISO file or a URL of an ISO file. That ISO can match the head node's distribution or can be any supported distribution.
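
For example, with a hypothetical locally downloaded ISO:

scyld-install --config /tmp/cluster-conf --os-iso /tmp/CentOS-7-x86_64-Everything-2009.iso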

scyld-install unpacks an embedded compressed payload and performs the following steps:

  • Checks for a possible newer version of the clusterware-installer RPM. If one is found, then the script will update the local RPM installation and execute the newer scyld-install script with the same arguments. An optional argument --skip-version-check bypasses this check.

  • An optional argument --yum-repo /tmp/clusterware.repo re-installs a yum repo file to /etc/yum.repos.d/clusterware.repo. This is unnecessary if /etc/yum.repos.d/clusterware.repo already exists and is adequate.

  • Checks whether the clusterware RPM is installed.

  • Confirms the system meets various minimum requirements.

  • Installs the clusterware RPM and its supporting RPMs.

  • Copies a customized Telegraf configuration file to /etc/telegraf/telegraf.conf.

  • Enables the tftpd service in xinetd for PXE booting.

  • Randomizes assorted security-related values in /opt/scyld/clusterware/conf/base.ini.

  • Sets the current user account as a ClusterWare administrator in /opt/scyld/clusterware/conf/base.ini. If this is intended to be a production cluster, then the system administrator should create additional ClusterWare administrator accounts and clear this variable. For details on this and other security related settings, including adding ssh keys to compute nodes, please see the Installation & Administrator Guide section Securing the Cluster.

  • Modifies /etc/yum.repos.d/clusterware.repo to change enabled=1 to enabled=0. Subsequent executions of scyld-install to update ClusterWare will temporarily (and silently) re-enable the ClusterWare repo for the duration of that command. This is done to avoid inadvertent updates of ClusterWare packages if and when the clusterware administrator executes a more general yum install or yum update intending to add or update the base distribution packages.

Then scyld-install uses systemd to enable and start firewalld, and opens ports for communication between head nodes as required by etcd (or Couchbase). See Services, Ports, Protocols for details.
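
You can inspect the resulting firewall configuration with the standard firewalld tooling, for example:

sudo firewall-cmd --list-all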

Once the ports are open, scyld-install initializes the ClusterWare database and enables and starts the following services:

  • httpd: The Apache HTTP daemon that runs the ClusterWare service and proxies Chronograf and Kapacitor.

  • xinetd: Provides network access to tftp for PXE booting.

  • Telegraf: Collects head node performance data and feeds it into InfluxDB.

  • InfluxDB: Stores node performance and status data for visualization in Chronograf.

  • Chronograf: Displays the head node and compute node status data through a web interface.

  • Kapacitor: The eventing software that works with Chronograf.
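
A quick way to confirm that these services are active is with systemctl; the unit names below are the conventional ones for these packages and may differ on your system:

systemctl status httpd xinetd telegraf influxdb chronograf kapacitor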

The script then:

  • Opens ports in firewalld for public access to HTTP, HTTPS, TFTP, iSCSI, and incoming Telegraf UDP messages.

  • Installs and configures the cluster administrator's clusterware-tools package (unless it was executed with the --no_tools option).

  • Configures the cluster administrator's ~/.scyldcw/settings.ini to access the newly installed ClusterWare service using the scyld-tool-config tool.

  • Creates an initial simple boot image DefaultImage, boot config DefaultBoot, and attributes DefaultAttribs using the scyld-add-boot-config tool.

  • Loads the cluster configuration specified on the command line using the scyld-cluster-conf load command.

  • Restarts the httpd service to apply the loaded cluster configuration.
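
Based on these last two steps, a modified cluster configuration file can later be re-applied by hand in a similar fashion; the commands below are a sketch assuming the file created earlier:

scyld-cluster-conf load /tmp/cluster-conf
sudo systemctl restart httpd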

Important

See Node Images and Boot Configurations for details about how to modify existing boot images, create new boot images, and associate specific boot images and attributes with specific compute nodes. We strongly recommend not modifying or removing the initial DefaultImage; instead, clone that basic image into a new image that gets modified further, or create new images from scratch.

Important

If you wish to ensure that the latest packages are installed in the image after scyld-install completes, then execute scyld-modimg -i DefaultImage --update --overwrite --upload.

Important

See Common Additional Configuration and Software for additional optional cluster configuration procedures, e.g., installing and configuring a job scheduler, or installing and configuring one of the MPI family software stacks.

Important

If this initial scyld-install does not complete successfully, or if you want to begin the installation anew, then when you re-run the script, cleanse the partial, potentially flawed installation by adding the --clear argument, e.g., scyld-install --clear --config /tmp/cluster-conf. If that still isn't sufficient, then scyld-install --clear-all --config /tmp/cluster-conf does a more complete clearing and then reinstalls all the ClusterWare packages.

Due to licensing restrictions, when running on a Red Hat RHEL system, the installer will still initially create a CentOS compute node image as the DefaultImage. If after this initial installation a cluster administrator wishes to instead create compute node images based on RHEL, then use the scyld-clusterctl repos tool as described in Appendix: Creating Arbitrary RHEL Images, and create a new image (e.g., DefaultRHELimage) to use as a new default.

Configure additional cluster administrators

The ClusterWare administrator's command-line tools are found in the clusterware-tools package, which is installed by default on the head node by scyld-install. It can additionally be installed on any system that has HTTP (or HTTPS, see Securing the Cluster) access to a ClusterWare head node in the cluster.

To install these tools on a machine other than the head node, log in to that system, copy /etc/yum.repos.d/clusterware.repo from a head node to the same location on that system, then execute:

sudo yum install clusterware-tools
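
The copy step can be done with scp, for example, where head1 is a hypothetical head node hostname:

scp head1:/etc/yum.repos.d/clusterware.repo /tmp/clusterware.repo
sudo cp /tmp/clusterware.repo /etc/yum.repos.d/clusterware.repo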

Once the tools are installed, each administrator must configure a connection to the ClusterWare service, which is controlled by variables in the user's ~/.scyldcw/settings.ini file. Running the scyld-tool-config script (provided by the clusterware-tools package) and answering the on-screen questions will generate a settings.ini file, although administrators of more advanced cluster configurations may need to manually add or edit additional variables. The contents of the settings.ini file are discussed in the Reference Guide.

Once the settings.ini is created, you can test your connection by running a simple node query:

scyld-nodectl ls

At this point the query may complain that no nodes exist or that no nodes are selected; such a complaint nonetheless verifies that the requesting node can properly communicate with a head node database. However, if you see an error resembling the one below, check your settings.ini contents and your network configuration:

Failed to connect to the ClusterWare service.  Please check that the
service is running and your base_url is set correctly in
/home/adminuser/.scyldcw/settings.ini or on the command line.

The connection URL and username can also be overridden for an individual program execution using the --base-url and --user options available for all scyld-* commands. The settings.ini file generated by scyld-install will contain a blank client.authpass variable. This is provided for convenience during installation, though for production clusters the system administrator will want to enforce authentication restrictions. See details in Securing the Cluster.
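
As a rough illustration only (the exact layout of settings.ini is documented in the Reference Guide, and the section name and URL below are assumptions), the connection-related entries resemble:

[client]
base_url = http://head1.cluster.local/api
authpass =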