Initial Installation of Scyld ClusterWare¶
The Scyld ClusterWare scyld-install script installs the necessary packages from the ClusterWare yum repositories, and installs dependency packages as needed from the base distribution (e.g., Red Hat RHEL or CentOS) yum repositories.
Important
Do not install ClusterWare as an upgrade to an existing ClusterWare 6 or 7 installation. Instead, install Scyld ClusterWare on a non-ClusterWare system that ideally is a virtual machine. (See Required and Recommended Components.)
Important
The head node(s) must use a Red Hat RHEL- or CentOS-equivalent base distribution release 7.6 or later environment, due to dependencies on newer libvirt and selinux packages.
Note
Clusters commonly employ multiple head nodes. The instructions in this section describe installing ClusterWare on the first head node. To later install ClusterWare on additional head nodes, see Managing Multiple Head Nodes.
scyld-install anticipates being executed by a non-root user, so ensure that your userid can execute sudo. Additionally, if using sudo behind a proxy, then because sudo clears certain environment variables for security purposes, the cluster administrator should consider adding several lines to /etc/sudoers:
Defaults env_keep += "HTTP_PROXY http_proxy"
Defaults env_keep += "HTTPS_PROXY https_proxy"
Defaults env_keep += "NO_PROXY no_proxy"
Important
Various commands that manipulate images execute as user root, which requires that the commands internally use sudo and that user root have access to the administrator's workspace containing the administrator's images. Typically the per-user workspace is ~/.scyldcw/workspace/. If that directory is not accessible to the command executing as root, then another accessible directory can be employed, and the administrator can identify that alternative pathname by adding a modimg.workspace setting to ~/.scyldcw/settings.ini.
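For example, a minimal sketch of that override in ~/.scyldcw/settings.ini, where /opt/scyld-workspace is a hypothetical root-accessible directory and the section layout shown is illustrative (consult the Reference Guide for the exact form):
[modimg]
workspace = /opt/scyld-workspace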
Important
scyld-install uses the yum command to access the Scyld ClusterWare repositories and potentially various other repositories (e.g., Red Hat RHEL or CentOS) that by default reside on Internet websites. However, if the head node(s) do not have Internet access, then the required repositories must reside on local storage that is accessible by the head node(s). See Appendix: Creating Local Repositories without Internet.
Execute the ClusterWare install script¶
If /etc/yum.repos.d/clusterware.repo exists, then scyld-install's subsequent invocations of yum will employ that configuration file. If /etc/yum.repos.d/clusterware.repo does not exist, then scyld-install prompts the user for an appropriate authentication token and uses that to build a /etc/yum.repos.d/clusterware.repo that is customized to your cluster.
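That generated file is an ordinary yum repository definition; a rough sketch of its shape (the repo id, URL, and token placement below are illustrative, not the literal generated content):
[scyld-clusterware]
name=Scyld ClusterWare
baseurl=https://<repo-server>/<path-with-auth-token>/
enabled=1
gpgcheck=1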
scyld-install accepts an optional argument specifying a cluster configuration file that contains information necessary to set up the DHCP server.
For example:
cat <<-EOF >/tmp/cluster-conf
interface enp0s9 # names the private cluster interface
nodes 4 # max number of compute nodes
iprange 10.10.32.45 # starting IP address of node 0
node 08:00:27:f0:44:35 # node 0 MAC address
node 08:00:27:f0:44:45 # node 1 MAC address
node 08:00:27:f0:44:55 # node 2 MAC address
node 08:00:27:f0:44:65 # node 3 MAC address
EOF
where the syntax of this cluster configuration file is:
domain <DOMAIN_NAME>
Optional. Defaults to "cluster.local".
interface <INTERFACE_NAME>
Optional. Specifies the name of head node's interface to the private cluster network, although that can be determined from the specification of the <FIRST_IP> in the iprange line.
nodes <MAX_COUNT>
Optional. Specifies the max number of compute nodes, although that can be determined from the iprange if both the <FIRST_IP> and <LAST_IP> are present. The max will also adjust as-needed if and when additional nodes are defined. For example, see Node Creation with Known MAC address(es).
iprange <FIRST_IP> [<LAST_IP>]
Specifies the IP address of the first node (which defaults to n0) and optionally the IP address of the last node. The <LAST_IP> can be deduced from the <FIRST_IP> and the nodes <MAX_COUNT>. The <FIRST_IP> can include an optional netmask via a suffix of /<BIT_COUNT> (e.g., /24) or a mask (e.g., /255.255.255.0).
<FIRST_INDEX> <FIRST_IP> [<LAST_IP>] [via <FROM_IP>] [gw <GATEWAY_IP>]
This is a more elaborate specification of a range of IP addresses, and it is common when using DHCP relays or multiple subnets. <FIRST_INDEX> specifies that the first node in this range is node n<FIRST_INDEX> and is assigned IP address <FIRST_IP>; optionally specifies that the range of nodes make DHCP client requests that arrive on the interface that contains <FROM_IP>; optionally specifies that each DHCP'ing node be told to use <GATEWAY_IP> as their gateway, which otherwise defaults to the IP address (on the private cluster network) of the head node.
For example:
128 10.10.24.30/24 10.10.24.100 via 192.168.65.2 gw 10.10.24.254
defines a DHCP range of 71 addresses, the first starting with 10.10.24.30, and assigns the first node in the range as n128; watches for DHCP requests arriving on the interface containing 192.168.65.2; and tells these nodes to use 10.10.24.254 as their gateway.
node [<INDEX>] <MAC> [<MAC>]
One compute node per line, and commonly consisting of multiple node lines, where each DHCP'ing node is recognized by its unique MAC address and is assigned an IP address using the configuration file specifications described above. Currently only the first <MAC> is used. An optional <INDEX> is the index number of the node that overrides the default of sequentially increasing node number indices and thereby creates a gap of unassigned indices. For example, a series of eight node lines without an <INDEX> that is followed by
node 32 52:54:00:c4:f7:1e
creates a gap of unassigned indices n8 to n31 and assigns this node as n32.
Note
ClusterWare yum repositories contain RPMs that duplicate various Red Hat EPEL RPMs, and these ClusterWare RPMs get installed or updated in preference to their EPEL equivalents, even if /etc/yum.repos.d/ contains an EPEL .repo file.
Note
ClusterWare employs userid/groupid 539 to simplify communication
between the head node(s) and the backend shared storage where it stores
node image files, kernels, and initramfs files.
If the scyld-install script detects that this uid/gid is already in use by other software, then the script issues a warning and chooses an alternative random uid/gid.
The cluster administrator needs to set the appropriate permissions on
that shared storage to allow all head nodes to read and write all files.
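As a rough illustration, assuming the shared storage is mounted at a hypothetical /mnt/clusterware, the administrator could check whether the id is free and set ownership and permissions as follows:
getent passwd 539; getent group 539        # is uid/gid 539 already taken?
sudo chown -R 539:539 /mnt/clusterware     # illustrative mount point
sudo chmod -R u+rwX,g+rwX /mnt/clusterware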
The ClusterWare database is stored as JSON content within a replicated document store distributed among the ClusterWare head nodes. This structure protects against the failure of any single head node. Although the system originally used the community edition of Couchbase as the distributed database, the internal API is implemented using pluggable modules and now supports etcd as the default distributed database and Couchbase as an alternative. The module API is intended to present a consistent experience regardless of the backend database but some details, such as failure modes, will differ.
During head node installation the cluster administrator can select the database using the DB_RPM environment variable. The current default is clusterware-etcd, although using a value of clusterware-couchbase will install the appropriate package and configuration for the Couchbase backend.
For example, using the cluster configuration file created above and installing the default etcd database:
scyld-install --config /tmp/cluster-conf
Or, choosing the alternative Couchbase database:
DB_RPM=clusterware-couchbase scyld-install --config /tmp/cluster-conf
The administrator can also switch between the available backend databases after the cluster is installed. See Choosing An Alternate Database for details.
By default scyld-install creates the DefaultImage that contains a kernel and rootfs software from the same base distribution installed on the head node, although if the head node runs RHEL8, then no DefaultImage or DefaultBoot is created.
Alternatively, for more flexibility (especially with a RHEL8 head node), execute the installer with an additional option that identifies the base distribution to be used for the DefaultImage:
scyld-install --config /tmp/cluster-conf --os-iso <ISO-file>
where <ISO-file> is either a pathname to an ISO file or a URL of an ISO file. That ISO can match the head node's distribution or can be any supported distribution.
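For instance, a hypothetical invocation pointing at a locally downloaded CentOS ISO (the path is illustrative):
scyld-install --config /tmp/cluster-conf --os-iso /tmp/CentOS-7-x86_64-Everything-2009.iso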
scyld-install unpacks an embedded compressed payload and performs the following steps:
- Checks for a possible newer version of the clusterware-installer RPM. If one is found, then the script will update the local RPM installation and execute the newer scyld-install script with the same arguments. An optional argument --skip-version-check bypasses this check.
- An optional argument --yum-repo /tmp/clusterware.repo re-installs a yum repo file to /etc/yum.repos.d/clusterware.repo. This is unnecessary if /etc/yum.repos.d/clusterware.repo already exists and is adequate.
- Checks whether the clusterware RPM is installed.
- Confirms the system meets various minimum requirements.
- Installs the clusterware RPM and its supporting RPMs.
- Copies a customized Telegraf configuration file to /etc/telegraf/telegraf.conf.
- Enables the tftpd service in xinetd for PXE booting.
- Randomizes assorted security-related values in /opt/scyld/clusterware/conf/base.ini.
- Sets the current user account as a ClusterWare administrator in /opt/scyld/clusterware/conf/base.ini. If this is intended to be a production cluster, then the system administrator should create additional ClusterWare administrator accounts and clear this variable. For details on this and other security-related settings, including adding ssh keys to compute nodes, please see the Installation & Administrator Guide section Securing the Cluster.
- Modifies /etc/yum.repos.d/clusterware.repo to change enabled=1 to enabled=0. Subsequent executions of scyld-install to update ClusterWare will temporarily (and silently) re-enable the ClusterWare repo for the duration of that command. This avoids inadvertent updates of ClusterWare packages if and when the cluster administrator executes a more general yum install or yum update intending to add or update base distribution packages. (See the example after this list.)
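For example, to confirm the repo state after installation, or to enable the repo for a single manual yum transaction (the repo id scyld-clusterware is illustrative; check the actual id inside clusterware.repo):
grep ^enabled /etc/yum.repos.d/clusterware.repo
sudo yum --enablerepo=scyld-clusterware install <clusterware-package>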
Then scyld-install uses systemd to enable and start firewalld, and opens ports for communication between head nodes as required by etcd (or Couchbase). See Services, Ports, Protocols for details.
Once the ports are open, scyld-install initializes the ClusterWare database and enables and starts the following services:
httpd: The Apache HTTP daemon that runs the ClusterWare service and proxies Chronograf and Kapacitor.
xinetd: Provides network access to tftp for PXE booting.
Telegraf: Collects head node performance data and feeds into InfluxDB.
InfluxDB: Stores node performance and status data for visualization in Chronograf.
Chronograf: Displays the head node and compute node status data through a web interface.
Kapacitor: The eventing software that works with Chronograf.
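After installation you can confirm these services are active with systemctl; the unit names below are the typical defaults and may vary slightly by distribution:
systemctl status httpd xinetd telegraf influxdb chronograf kapacitor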
The script then:
- Opens ports in firewalld for public access to HTTP, HTTPS, TFTP, iSCSI, and incoming Telegraf UDP messages.
- Installs and configures the cluster administrator's clusterware-tools package (unless it was executed with the --no_tools option).
- Configures the cluster administrator's ~/.scyldcw/settings.ini to access the newly installed ClusterWare service using the scyld-tool-config tool.
- Creates an initial simple boot image DefaultImage, boot config DefaultBoot, and attributes DefaultAttribs using the scyld-add-boot-config tool.
- Loads the cluster configuration specified on the command line using the scyld-cluster-conf load command (see the example after this list).
- Restarts the httpd service to apply the loaded cluster configuration.
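If the cluster configuration file is modified later, those last two steps can be repeated by hand; a minimal sketch, assuming the /tmp/cluster-conf file from the earlier example:
scyld-cluster-conf load /tmp/cluster-conf
sudo systemctl restart httpd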
Important
See Node Images and Boot Configurations for details about how to modify existing boot images, create new boot images, and associate specific boot images and attributes with specific compute nodes. We strongly recommend not modifying or removing the initial DefaultImage, but rather cloning that basic image into a new image that gets modified further, or just creating new images from scratch.
Important
If you wish to ensure that the latest packages are installed in the image after the scyld-install completes, then execute scyld-modimg -i DefaultImage --update --overwrite --upload.
Important
See Common Additional Configuration and Software for additional optional cluster configuration procedures, e.g., installing and configuring a job scheduler, installing and configuring one of the MPI family software stacks.
Important
If this initial scyld-install does not complete successfully, or if you want to begin the installation anew, then when you re-run the script you should cleanse the partial, potentially flawed installation by adding the --clear argument, e.g.:
scyld-install --clear --config /tmp/cluster-conf
If that still isn't sufficient, then
scyld-install --clear-all --config /tmp/cluster-conf
does a more complete clearing, then reinstalls all the ClusterWare packages.
Due to licensing restrictions, when running on a Red Hat RHEL system,
the installer will still initially create a CentOS compute
node image as the DefaultImage.
If after this initial installation a cluster administrator wishes to instead
create compute node images based on RHEL,
then use the scyld-clusterctl repos
tool as described in
Appendix: Creating Arbitrary RHEL Images,
and create a new image (e.g., DefaultRHELimage) to use as a new default.
Configure additional cluster administrators¶
The ClusterWare administrator's command-line tools are found in the clusterware-tools package, which is installed by default on the head node by scyld-install.
It can be additionally installed on any system that has HTTP (or HTTPS,
see Securing the Cluster) access to a
ClusterWare head node in the cluster.
To install these tools on a machine other than the head node, log in to that other system, copy /etc/yum.repos.d/clusterware.repo from a head node to the same location on this system, then execute:
sudo yum install clusterware-tools
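For example, assuming passwordless ssh access to a head node named head1 (the hostname is illustrative):
scp head1:/etc/yum.repos.d/clusterware.repo /tmp/clusterware.repo
sudo cp /tmp/clusterware.repo /etc/yum.repos.d/clusterware.repo
sudo yum install clusterware-tools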
Once the tools are installed, each administrator must configure a
connection to the ClusterWare service, which is controlled by
variables in the user's ~/.scyldcw/settings.ini
file. The
scyld-tool-config
tool script is provided by the
clusterware-tools package. The contents of the settings.ini
file are discussed in the Reference Guide. Running that tool and
answering the on-screen questions will generate a settings.ini
file, although administrators of more advanced cluster configurations may
need to manually add or edit additional variables.
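As a rough illustration only, a generated settings.ini records at least the head node URL and the (initially blank) authentication password; the hostname below is hypothetical and the exact sections and keys are documented in the Reference Guide:
[client]
base_url = http://head1
authpass =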
Once the settings.ini
is created, you can test your connection by
running a simple node query:
scyld-nodectl ls
This query may complain at this time that no nodes exist
or no nodes are selected,
although such a complaint does verify that the requesting node can
properly communicate with a head node database.
However, if you see an error resembling the one below,
check your settings.ini
contents and your network configuration:
Failed to connect to the ClusterWare service. Please check that the
service is running and your base_url is set correctly in
/home/adminuser/.scyldcw/settings.ini or on the command line.
The connection URL and username can also be overridden for an individual program execution using the --base-url and --user options available for all scyld-* commands.
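For example, a hypothetical one-off query directed at a different head node (the URL and username are illustrative):
scyld-nodectl --base-url http://head2 --user adminuser ls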
The settings.ini file generated by scyld-install will contain a blank client.authpass variable.
This is provided for convenience during installation,
though for production clusters the system administrator will want to
enforce authentication restrictions.
See details in Securing the Cluster.