Managing Multiple Head Nodes

ClusterWare optionally supports active-active configurations of two or more cooperating head nodes that share a single replicated database. Such multi-headnode configurations allow any head node to provide services for any compute node in the cluster. These services include cluster configuration using the scyld-* tools, compute node booting and power control, and compute node status collection.

Currently the ClusterWare database requires a minimum of three cooperating head nodes to support full High Availability ("HA") in the event of head node failures. Couchbase HA can operate fully with two head nodes, although it cannot sustain full functionality if both head nodes fail concurrently. etcd HA works only in a limited manner with just two head nodes.

The command line tools provided by ClusterWare for head node management are intended to cover the majority of common cases, although head nodes using Couchbase can also be managed through the Couchbase console where more complicated recovery scenarios can be handled.

Adding a Head Node

After installing the first head node as described in Initial Installation of Scyld ClusterWare, additional head nodes can be installed and joined with the other cooperating head nodes using the same scyld-install tool or using curl.

On an existing head node view its database password:

sudo grep database.admin_pass /opt/scyld/clusterware/conf/base.ini
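
The command prints the matching line from base.ini; the password value is site-specific, so the output below is only a placeholder:

database.admin_pass = <DBPASS>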

Join a non-ClusterWare server using scyld-install

For a non-ClusterWare server, you can use scyld-install to join it to an existing head node that is identified by its IP address IP_HEAD:

scyld-install --yum-repo clusterware.repo --database-passwd <DBPASS> --join <IP_HEAD>

where DBPASS is IP_HEAD's database password. If --database-passwd is not provided as a scyld-install argument, then scyld-install interactively queries the administrator for that password.

All joined head nodes must use the same database type. If the existing head node(s) use Couchbase, prefix scyld-install with (for example) DB_RPM=clusterware-couchbase to override the default etcd database, as shown below.
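
For example, assuming the existing head nodes use Couchbase, the join command above becomes (the IP address and password placeholders are as before):

DB_RPM=clusterware-couchbase scyld-install --yum-repo clusterware.repo --database-passwd <DBPASS> --join <IP_HEAD>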

scyld-install needs access to a clusterware.repo file in order to install ClusterWare packages. This clusterware.repo file is created or found via one of the supported methods: a filename passed as the scyld-install --yum-repo argument, which is copied to /etc/yum.repos.d/clusterware.repo; an already existing /etc/yum.repos.d/clusterware.repo; or a file created by scyld-install from an authentication string that is passed as the --token argument or supplied when scyld-install queries for it interactively.
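
For reference, a clusterware.repo file passed via --yum-repo follows the usual yum repository format; the repository name and baseurl below are only placeholders for your site-specific values:

[clusterware]
name=Scyld ClusterWare
baseurl=https://<repo-server>/<clusterware-repo-path>/
enabled=1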

scyld-install installs ClusterWare, replacing the local base.ini password with IP_HEAD's DBPASS password, then interacts with IP_HEAD to join with it and any other head nodes already joined.

A cluster configuration file is not required when joining a server to a head node because those settings are obtained from the existing head node's cluster database.

Join a ClusterWare head node using scyld-install

Alternatively, a "solo" ClusterWare head node can join an existing group of joined nodes.

Important

The join action discards the "solo" head node's current images and boot configs, leaving the "solo" head node with access to only the cooperating head nodes' images and boot configs. If you want to save any images or configs, then first use scyld-bootctl export (Copying boot configurations between head nodes) or managedb save.

Important

The joining head node must be using the same database type as the existing head node(s).

For example, to join a ClusterWare head node:

scyld-install -u --database-passwd <DBPASS> --join <IP_HEAD>

Just as when joining a non-ClusterWare server, if no --database-passwd is provided as an argument, then scyld-install queries the administrator interactively for IP_HEAD's database password.

Join a non-ClusterWare server using curl

The cluster administrator can use curl to download a modified version of the installer script with necessary parameters already embedded, and then pipe that script to bash to execute. This installs ClusterWare as a head node on the joining server and joins the cluster:

curl http://<IP_HEAD>/api/v1/install/head?passwd=<DBPASS> | bash

where IP_HEAD and DBPASS are described above.

Join a ClusterWare head node using curl

Important

The join action discards the "solo" head node's current images and boot configs, leaving the "solo" head node with access to only the cooperating head nodes' images and boot configs. If you want to save any images or configs, then first use scyld-bootctl export (Copying boot configurations between head nodes) or managedb save.

Alternatively, a "solo" ClusterWare head node can join an existing group of joined nodes using curl, in the same manner as a non-ClusterWare server joins a cluster, albeit with the additional HEADNODE_JOIN=y environment variable passed to the downloaded installer script that bash executes:

curl http://<IP_HEAD>/api/v1/install/head?passwd=<DBPASS> | HEADNODE_JOIN=y bash

where IP_HEAD and DBPASS are described above.

After a Join

Important

Every head node must know the hostname and IP address of every other head node, either by having those hostnames in each head node's /etc/hosts or by having their common DNS server know all the hostnames. Additionally, if using head nodes as default routes for the compute nodes, as described in Configure IP Forwarding, then ensure that all head nodes are configured to forward IP traffic preferably over the same routes.
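
For example, in a hypothetical three-head-node cluster, every head node's /etc/hosts (or the common DNS server) would contain entries for all head node hostnames; the names and addresses below are placeholders:

10.54.0.1   head1
10.54.0.2   head2
10.54.0.3   head3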

Important

Every head node should use a common network time-sync protocol. The RHEL default is chronyd (found in the chrony package), although ntpd (found in the ntp package) continues to be available.
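
For example, to confirm that chronyd is enabled and synchronizing on a head node:

sudo systemctl enable --now chronyd
chronyc sources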

After a Join, you should restart the clusterware service on all joined head nodes.
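
For example, on each joined head node:

sudo systemctl restart clusterware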

Subsequent head node software updates are also accomplished by executing scyld-install -u. We recommend that all cooperating head nodes update to a common ClusterWare release. In rare circumstances a newer ClusterWare release on the head nodes also requires a compatible newer clusterware-node package in each compute node image. Such a rare coordinated update will be documented in the Release Notes and Changelog.

Removing a Joined Head Node

A list of connected head nodes can be seen with:

sudo /opt/scyld/clusterware/bin/managedb --heads

with more information visible by executing:

scyld-clusterctl heads ls -l

For a cluster with two or more head nodes using a Couchbase database, or a cluster with three or more head nodes using an etcd database, you can remove one of the head nodes by executing on that head node:

sudo /opt/scyld/clusterware/bin/managedb leave

Or, if that head node is shut down, then from another head node in the cluster execute:

sudo /opt/scyld/clusterware/bin/managedb eject <IP_HEAD_TO_REMOVE>

The now-detached head node will no longer have access to the shared database and will be unable to execute any scyld-* command, as those require a database. Either re-join the previous cluster:

sudo /opt/scyld/clusterware/bin/managedb join <IP_HEAD>

or join another cluster after updating the local /opt/scyld/clusterware/conf/base.ini database.admin_pass to the other cluster's database password:

sudo /opt/scyld/clusterware/bin/managedb join <IP_OTHER_HEADNODE>

or perform a fresh ClusterWare install by removing the current ClusterWare and continuing with a reinstall:

scyld-install --clear-all --config <CLUSTER_CONFIG>

However, for a cluster with only two head nodes using an etcd database, you cannot use managedb eject or managedb leave, and instead must execute:

sudo /opt/scyld/clusterware/bin/managedb recover

on both head nodes. This severs each head node from the common coordinated access to the database.

Important

Keep in mind that following the managedb recover, both head nodes have autonomous and unsynchronized access to the now-severed database that manages the same set of compute nodes, which means that both will compete for "ownership" of the same booting compute nodes.

To avoid having both head nodes compete for the same compute nodes, either execute sudo systemctl stop clusterware on one of the head nodes, or perform one of the steps described above: re-join this head node to the other head node that previously shared the same database, join a different cluster's head node, or perform a fresh ClusterWare install.

Configuring Support for Database Failover

When planning a multi-head cluster for true High Availability, a cluster administrator should allocate three or more head nodes. In this configuration, if one head node fails, then the database service on the remaining head nodes can be configured to automatically eject the failed node and recover with at most a short interruption in service. After a head node has failed and been ejected, the cluster administrator must reset the auto-failover mechanism before it will respond to another failure; this one-failover-at-a-time behavior prevents a single failure from causing cascading ejections.

Complicated Couchbase recovery scenarios are managed by the cluster administrator interacting with the database console through a web browser at http://localhost:8091/ui.

The console username is root and the password can be found in the database.admin_pass variable in /opt/scyld/clusterware/conf/base.ini. Extensive documentation for this Couchbase console is available online on the Couchbase website: https://docs.couchbase.com/home/index.html. ClusterWare Couchbase is currently version 5.1.3.

To enable automatic failover of a head node in a multiple head node configuration, access the Couchbase console and click Settings in the menu on the left side of the initial Dashboard window, then click Auto-Failover in the horizontal list across the top of the Settings window. Select Enable auto-failover and enter a preferred Timeout value (the default is 120 seconds). Finally, click the Save button.

In the discouraged dual-head configuration, a head node has no means to distinguish between a network partition and the other node actually failing. To avoid a split-brain situation, the remaining head node must be explicitly told to take over for the failed node using the managedb eject command. Services provided by the head nodes will be interrupted until this ejection is triggered.

Shared Storage and Peer Downloads

Multi-head clusters can be configured to use shared storage among the head nodes, but by default each head will use its own local storage to keep a copy of each uploaded or requested file. The storage location is defined in /opt/scyld/clusterware/conf/base.ini by the local_files.path variable, and it defaults to /opt/scyld/clusterware/storage/.

Whenever a ClusterWare head node is asked for a file such as a kernel, the expected file size and checksum are retrieved from the database. If the file exists in local storage and has the correct size and checksum, then that local file will be provided. However, if the file is missing or incorrect, then the head node attempts to retrieve the correct file from a peer.

Note that local files whose checksums do not match will be renamed with a .old.NN extension, where NN starts at 00 and increases up to 99 with each successive bad file. This ensures that in the unlikely event that the checksum in the database is somehow corrupted, the original file can be manually restored.

Peer downloading consists of the requesting head node retrieving the list of all head nodes from the database and contacting each of them in random order. The first peer that confirms it has a file with the correct size provides that file to the requesting head node. The checksum is computed during the transfer, and the transferred file is discarded if that checksum is incorrect. Contacted peers will not themselves attempt to download the file from other peers, so that a file missing from every head node does not trigger a cascade of requests.

After a successful peer download, the original requester receives the file contents after a delay due to the peer download process. If the file cannot be retrieved from any head node, then the original requester will receive a HTTP 404 error.

This peer download process can be bypassed by providing shared storage among head nodes. Such storage should either be mounted at the storage directory location prior to installation, or the /opt/scyld/clusterware/conf/base.ini should be updated with the non-default pathname immediately after installation of each head node. Remember to restart the clusterware service after modifying the base.ini file by executing sudo systemctl restart clusterware, and note that the systemd clusterware.service is currently an alias for the httpd.service.
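
For example, assuming base.ini uses the same dotted-key style as the database.admin_pass setting shown earlier, pointing a head node at a hypothetical shared mount would be a one-line change followed by the service restart:

local_files.path = /shared/clusterware/storage/

sudo systemctl restart clusterware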

When a boot configuration or image is deleted from the cluster, the deleting head node will remove the underlying file(s) from its local storage. That head node will also temporarily move the file's database entry into a deleted files list that other head nodes periodically check and delete matching files from their own local storage. If the clusterware service is not running on a head node when a file is marked as deleted, then that head node will not be able to delete the local copy. When the service is later restarted, it will see its local file is now no longer referenced by the database and will rename it with the .old.NN extension described earlier. This is done to inform the administrator that these files are not being used and can be removed, although cautious administrators may wish to keep these renamed files until they confirm all node images and boot configurations are working as expected.

Booting With Multiple Head Nodes

Since all head nodes are connected to the same private cluster network, any compute node's DHCP request will receive offers from all the head nodes. All offers will contain the same IP address because all head nodes share the same MAC-to-IP and node index information in the replicated database. The PXE client on the node accepts one of the DHCP offers, usually the first received, and proceeds to boot with the offering head node as its "parent head node". This parent head node provides the kernel and initramfs files during the PXE process, and provides the root file system for the booting node, all of which should also be replicated in /opt/scyld/clusterware/storage/ (or in the alternative non-default location specified in /opt/scyld/clusterware/conf/base.ini).

On a given head node you can determine the compute nodes for which it is the parent by examining the head node /var/log/clusterware/head_* or /var/log/clusterware/api_error_log* files for lines containing "Booting node". On a given compute node you can determine its parent by examining the node's /etc/hosts entry for parent-head-node.
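
For example, on a head node:

sudo grep "Booting node" /var/log/clusterware/head_* /var/log/clusterware/api_error_log*

and on a compute node:

grep parent-head-node /etc/hosts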

Once a node boots, it asks its parent head node for a complete list of head nodes, and thereafter the node sends periodic status information to the head node at the top of that list. If at any point the parent head node does not respond to the compute node's status update, then the compute node chooses a new parent by rotating its list of available head nodes: the unresponsive parent moves to the bottom of the list, and the second node in the list moves up to the top as the new parent.

The administrator can force compute nodes to re-download the head node list by executing scyld-nodectl script fetch_hosts and specifying one or more compute nodes. The administrator can also refresh the SSH keys on the compute node using scyld-nodectl script update_keys.
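
For example, to refresh the head node list and SSH keys on a hypothetical compute node n0 (the -i node selector shown here is an assumption; consult scyld-nodectl --help for the node selection syntax):

scyld-nodectl -i n0 script fetch_hosts
scyld-nodectl -i n0 script update_keys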

Switching To Alternative ClusterWare Database

To switch from Couchbase to etcd or vice versa, see Switching Between Databases.