Managing Multiple Head Nodes¶
ClusterWare supports optional active-active configurations
of multiple cooperating head nodes that share a single replicated database.
Such multi-headnode configurations allow any head node to
provide services for any compute node in the cluster. These services
include cluster configuration using scyld-*
tools, compute node
booting and power control, and compute node status collection.
The ClusterWare etcd database requires a minimum of three cooperating head nodes to support full High Availability ("HA") in the event of head node failures. With only two head nodes, etcd HA works in a limited manner. The command-line tools provided by ClusterWare for head node management are intended to cover the majority of common cases.
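The three-head-node minimum follows from etcd's Raft-based consensus, which keeps the database writable only while a strict majority of members survives. A minimal sketch of the arithmetic (the function name is illustrative, not part of ClusterWare):

```python
def tolerated_failures(n_heads: int) -> int:
    """Head node failures an etcd cluster of n_heads members can
    survive while retaining a write quorum (a strict majority)."""
    quorum = n_heads // 2 + 1
    return n_heads - quorum

# Two head nodes tolerate zero failures, which is why etcd HA is only
# "limited" with fewer than three heads; three heads tolerate one.
print(tolerated_failures(2), tolerated_failures(3), tolerated_failures(5))  # → 0 1 2
```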
The ClusterWare Couchbase HA (now deprecated) can operate fully with two head nodes, although it cannot sustain full functionality if both head nodes fail concurrently. Head nodes using Couchbase can similarly be managed through those same command-line tools, as well as through the Couchbase console, which can handle more complicated recovery scenarios.
Adding A Head Node¶
After installing the first head node as described in
Installation and Upgrade of Scyld ClusterWare,
additional head nodes can be installed and joined with the other cooperating
head nodes using the same scyld-install tool or using curl.
On an existing head node, view its database password:
sudo grep database.admin_pass /opt/scyld/clusterware/conf/base.ini
Join a non-ClusterWare server¶
A non-ClusterWare server can use scyld-install
to join another head
node (identified by its IP address IP_HEAD) that may itself already be
joined to other head nodes.
You can download scyld-install from the Penguin repo
https://updates.penguincomputing.com/clusterware/12/installer/scyld-install or
https://updates.penguincomputing.com/clusterware-el8/12/installer/scyld-install
without needing a cluster ID.
Alternatively, if you already have a /etc/yum.repos.d/clusterware.repo
installed, then you can download the clusterware-installer package,
which includes scyld-install.
Then:
scyld-install --database-passwd <DBPASS> --join <IP_HEAD>
where DBPASS is IP_HEAD's database password, as described above.
If no --database-passwd
is provided as an argument, then scyld-install
queries the administrator interactively for IP_HEAD's database password.
When performing a join, scyld-install installs ClusterWare using the same
clusterware.repo and database type used by the head node at IP_HEAD.
A cluster configuration file is not required when joining a server to a head node because those settings are obtained from the existing head node's cluster database after the join successfully completes.
Join a ClusterWare head node¶
A "solo" ClusterWare head node can use scyld-install
to join another
head node (identified by its IP address IP_HEAD) that may itself already
be joined to other head nodes.
Important
The join action discards the "solo" head node's current
images and boot configs, leaving that head node
with access to only the cooperating head nodes' images and boot configs.
If you want to save any images or configs, then first use
scyld-bootctl export
(see Copying boot configurations between head nodes)
or managedb save.
Important
The joining head node must be using the same database type as the existing head node(s). A head node using the Couchbase database must convert to using etcd. See Appendix: Switching Between Databases.
For example, to join a ClusterWare head node:
scyld-install --update --database-passwd <DBPASS> --join <IP_HEAD>
When a ClusterWare 12 head node joins another ClusterWare 12 head node,
scyld-install performs a mandatory update of the current ClusterWare using
IP_HEAD's clusterware.repo prior to joining that IP_HEAD.
This ensures that ClusterWare (and scyld-install) are running compatible
software versions.
However, the ClusterWare 11 version of scyld-install
will not automatically
perform this mandatory update of 11 to 12 and will just update the joining head
node to the newest version of ClusterWare 11.
Penguin Computing recommends first updating the ClusterWare 11 head node to 12
(following the guidance of Updating ClusterWare 11 to ClusterWare 12),
and then using the ClusterWare 12 scyld-install
to perform the join.
Just as when joining a non-ClusterWare server,
if no --database-passwd
is provided as an argument, then scyld-install
queries the administrator interactively for IP_HEAD's database password.
After a Join¶
Important
Every head node must know the hostname and IP address of every
other head node, either by having those hostnames in each head node's
/etc/hosts
or by having their common DNS server know all the hostnames.
Additionally, if using head nodes as default routes for the compute nodes,
as described in Configure IP Forwarding,
then ensure that all head nodes are configured to forward IP traffic,
preferably over the same routes.
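For example, each head node's /etc/hosts might list every cooperating head node; the hostnames and addresses below are hypothetical:

```
10.54.0.1   head1.cluster.local  head1
10.54.0.2   head2.cluster.local  head2
10.54.0.3   head3.cluster.local  head3
```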
Important
Every head node should use a common network time-sync protocol.
The RHEL default is chronyd
(provided by the chrony package),
although ntpd (provided by the ntp package) remains available.
After a join completes, restart the clusterware service on all joined head nodes:
sudo systemctl restart clusterware
Subsequent head node software updates are also accomplished by executing
scyld-install -u.
We recommend that all cooperating head nodes update to a common
ClusterWare release.
In rare circumstances a newer ClusterWare release on the head nodes
also requires a compatible newer clusterware-node package in each compute
node image.
Such a rare coordinated update will be documented in the Release Notes
and Changelog & Known Issues.
Removing a Joined Head Node¶
A list of connected head nodes can be seen with:
sudo /opt/scyld/clusterware/bin/managedb --heads
with more information visible via:
scyld-clusterctl heads ls -l
For a cluster with two or more head nodes using a Couchbase database, or a cluster with three or more head nodes using an etcd database, you can remove one of the head nodes by doing:
sudo /opt/scyld/clusterware/bin/managedb leave
Or, if that head node is shut down, then from another head node in the cluster execute:
sudo /opt/scyld/clusterware/bin/managedb eject <IP_HEAD_TO_REMOVE>
The now-detached head node will no longer have access to the shared database and
will be unable to execute any scyld-*
command, as those require a database.
Either re-join the previous cluster:
sudo /opt/scyld/clusterware/bin/managedb join <IP_HEAD>
or join another cluster after updating the local
/opt/scyld/clusterware/conf/base.ini
database.admin_pass to the other
cluster's database password:
sudo /opt/scyld/clusterware/bin/managedb join <IP_OTHER_HEADNODE>
or perform a fresh ClusterWare install by removing the current ClusterWare and continuing with a reinstall:
scyld-install --clear-all --config <CLUSTER_CONFIG>
However, for a cluster with only two head nodes using an etcd database,
you cannot use managedb eject or managedb leave,
and instead must execute:
sudo /opt/scyld/clusterware/bin/managedb recover
on both head nodes. This severs each head node from the common coordinated access to the database.
Important
Keep in mind that following the managedb recover
,
both head nodes have autonomous and unsynchronized access to the now-severed
database that manages the same set of compute nodes, which means that both
will compete for "ownership" of the same booting compute nodes.
To avoid both head nodes competing for the same compute nodes,
either execute sudo systemctl stop clusterware
on one of the head nodes,
or perform one of the steps described above to re-join this head node to the
other head node that previously shared the same database,
or join another head node,
or perform a fresh ClusterWare install.
Configuring Support for Database Failover¶
When planning a multi-head cluster for true High Availability, a cluster administrator should allocate three or more head nodes. In this configuration, if one head node fails, then the database service on the remaining head nodes can be configured to automatically eject the failed node and recover with at most a short interruption in service. After one head node has failed, the cluster administrator must reset the auto-failover mechanism to avoid a single failure causing cascading ejections.
Complicated Couchbase recovery scenarios are managed by the cluster
administrator interacting with the database console through a web browser:
localhost:8091/ui.
The console username is root and the password can be found in the
database.admin_pass variable in /opt/scyld/clusterware/conf/base.ini.
Extensive documentation for
this Couchbase console is available online on the Couchbase website:
https://docs.couchbase.com/home/index.html.
ClusterWare Couchbase is currently version 5.1.3.
To enable automatic failover of a head node in a multiple head node configuration, access the Couchbase console and click on Settings in the menu on the left side of the initial Dashboard window, then click on Auto-Failover in the horizontal list across the top of the Settings window. Then select Enable auto-failover and enter a preferred Timeout value, e.g., the default of 120 seconds. Finally, click the Save button.
In the discouraged dual-head configuration, a head node has no means to
distinguish between a network partition and the other node actually
failing. To avoid a split-brain situation, the remaining head node must
be explicitly told to take over for the failed node using the
managedb eject command. Services provided by the head node will be
interrupted until this ejection is triggered.
Booting With Multiple Head Nodes¶
Since all head nodes are connected to the same private cluster network,
any compute node's DHCP request will receive offers from all the head nodes.
All offers will contain the same IP address because all
head nodes share the same MAC-to-IP and node index information
in the replicated database.
The PXE client on the node accepts one of the DHCP offers,
which is usually the first received,
and proceeds to boot with the offering head node as its "parent head node".
This parent head node provides the kernel and initramfs
files during the PXE process, and provides the root file system for the
booting node, all of which should also be replicated in
/opt/scyld/clusterware/storage/
(or in the alternative non-default location
specified in /opt/scyld/clusterware/conf/base.ini).
On a given head node you can determine the compute nodes for which it is the
parent by examining the head node /var/log/clusterware/head_*
or
/var/log/clusterware/api_error_log*
files for lines containing
"Booting node".
On a given compute node you can determine its parent by examining the node's
/etc/hosts entry for parent-head-node.
Once a node boots, it asks its parent head node for a complete list of head nodes, and thereafter the node sends periodic status information to the head node at the top of that list. If at any point the parent head node does not respond to the compute node's status update, then the compute node chooses a new parent by rotating its list of available head nodes: the unresponsive parent moves to the bottom of the list, and the second node in the list moves up to the top as the new parent.
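The rotation described above can be sketched as follows (a minimal illustration; the function name and head node names are hypothetical, not ClusterWare code):

```python
def rotate_parent(head_nodes: list) -> list:
    """Demote the unresponsive parent (first entry) to the bottom of
    the list; the second head node moves up to become the new parent."""
    if len(head_nodes) < 2:
        return head_nodes  # no alternate head node to fail over to
    return head_nodes[1:] + head_nodes[:1]

# head1 stops responding, so head2 becomes the new parent:
print(rotate_parent(["head1", "head2", "head3"]))  # → ['head2', 'head3', 'head1']
```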
The administrator can force compute nodes to re-download the head node list
by executing scyld-nodectl script fetch_hosts
and specifying one or more
compute nodes.
The administrator can also refresh the SSH keys on the compute node using
scyld-nodectl script update_keys.
Clusters of 100 nodes or more benefit from each head node being a parent to roughly the same number of compute nodes. Each head node periodically computes the current mean number of nodes per head, and if a head node parents significantly more (e.g., >20%) nodes than the mean, then it triggers some of its nodes to use another head node. Care is taken to avoid unnecessary shuffling of compute nodes. Note that the _preferred_head attribute may create an imbalance that this rebalancing cannot remedy.
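The rebalancing check can be sketched as follows, using the >20% heuristic mentioned above (an illustration only; the function name and node counts are hypothetical):

```python
def overloaded_heads(node_counts: dict, threshold: float = 0.20) -> list:
    """Return the head nodes parenting more than (1 + threshold) times
    the mean number of compute nodes; these would shed some of their
    nodes to less-loaded peers."""
    mean = sum(node_counts.values()) / len(node_counts)
    return [head for head, count in node_counts.items()
            if count > mean * (1 + threshold)]

# head1 parents 60 of 120 nodes across three heads (mean 40, cutoff 48):
print(overloaded_heads({"head1": 60, "head2": 35, "head3": 25}))  # → ['head1']
```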
Copying boot configurations between head nodes¶
A multiple head node cluster contains cooperating head nodes that share a replicated database and transparent access to peer boot configurations, kernel images, and initramfs files. See Managing Multiple Head Nodes for details. There is no need to manually copy boot configs between these head nodes.
However, it may be useful to copy boot configurations from a head node that controls one cluster to another head node that controls a separate cluster, thereby allowing the same boot config to be employed by compute nodes in the target cluster. On the source head node the administrator "exports" a boot config to create a single all-inclusive self-contained file that can be copied to a target head node. On the target head node the administrator "imports" that file into the local cluster database, where it merges with the local head node's existing configs, images, and files.
Important
Prior to exporting/importing a boot configuration,
you should determine if the boot config and kernel image names on the source
cluster already exist on the target cluster.
For example, for a boot configuration named xyzBoot, execute
scyld-bootctl -i xyzBoot ls -l
on the source head node to view the
boot config name xyzBoot and note its image name, e.g., xyzImage.
Then on the target head node execute
scyld-bootctl ls -l | egrep "xyzBoot|xyzImage"
to determine if duplicates exist.
If any name conflict exists, then either (1) on the source head node create or clone a new uniquely named boot config associated with a uniquely named image, then export that new boot config, or (2) on the target head node import the boot config using optional arguments, as needed, to assign unique name or names.
To export the boot configuration xyzBoot:
scyld-bootctl -i xyzBoot export
which creates the file xyzBoot.export.
If there are no name conflicts with the target cluster,
then on the target head node import with:
scyld-bootctl import xyzBoot.export
If there is a name conflict with the image name, then perform the import with the additional argument to rename the imported image:
scyld-bootctl import xyzBoot.export --image uniqueImg
or import the boot config without importing its embedded image at all (and later associate a new image with this imported boot config):
scyld-bootctl import xyzBoot.export --no-recurse
If there is a name conflict with the boot config name itself, then add:
scyld-bootctl import xyzBoot.export --boot-config uniqueBoot
If desired, associate a new image with the imported boot config, then associate the boot config with the desired compute node(s):
scyld-nodectl -i <NODES> set _boot_config=xyzBoot