Appendix: Using Docker for Head Nodes

A RHEL/CentOS-clone server can use the scyld-install tool to install Scyld ClusterWare and become a ClusterWare head node. This appendix describes an alternative approach that allows a server to use a ClusterWare Docker container that Penguin Computing has already built as a basic head node image, thereby avoiding the need for a cluster administrator to install and configure ClusterWare from scratch.

Note

ClusterWare also supports containers running on the compute nodes, allowing each node to act as a Docker host for running containers. See Appendix: Using Docker for Compute Nodes.

Install the foundational packages

The ClusterWare Docker container approach first requires installing the docker and clusterware-tools packages. For the latter package you need to set up /etc/yum.repos.d/clusterware.repo in order to access the Penguin Computing repo. Instructions for doing that can be found at the beginning of the same chapter (Installation and Upgrade of Scyld ClusterWare) that describes how to perform an initial install of a full ClusterWare installation.
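
For illustration only, a clusterware.repo stub might resemble the sketch below; the baseurl path and any authentication fields are assumptions here, and the Installation and Upgrade chapter gives the authoritative contents:

[clusterware]
name=Scyld ClusterWare
baseurl=https://updates.penguincomputing.com/clusterware/12/el8/
enabled=1
gpgcheck=1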

Once clusterware.repo is in place, then you can install the packages necessary for the Docker container approach:

sudo yum install docker clusterware-tools

The clusterware-tools package contains the various scyld- commands, including /usr/bin/scyld-containerctl, which is referenced below. Knowledgeable Docker administrators may instead prefer to use the standard Docker tools directly.

Note

The podman container management system can be used in place of docker if desired.

Download and load the ClusterWare Docker image

First download onto the server a copy of a pre-built Docker image that is appropriate for the head node you wish to create. For example, visit https://updates.penguincomputing.com/clusterware/12/el8/container/ (with appropriate authentication) and view the available containers that are compatible with the RHEL/CentOS 8 base distribution already installed on the Docker host server. Suppose you choose clusterware-12.1.0-g0000 to download. You can validate the downloaded file using the same general method used to validate a downloaded ISO (see Appendix: Validating ClusterWare ISOs).
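
For example, assuming the download is accompanied by a published SHA256 checksum (as with the ISOs), the validation might look like the following sketch; the companion .sha256 file name is an assumption:

sha256sum clusterware-12.1.0-g0000
# compare the result against the published checksum, or, if a
# companion .sha256 file was downloaded alongside the image:
sha256sum -c clusterware-12.1.0-g0000.sha256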

Load the downloaded image into the Docker image registry:

scyld-containerctl img-load clusterware-12.1.0-g0000

which will show several progress bars and finish with a line such as "Loaded image: localhost/clusterware:12.1.0-g0000". After loading, list just the ClusterWare image using:

scyld-containerctl img-ls

or list all Docker images using:

docker image list

Start the container

Start the ClusterWare head node container on the Docker host server:

scyld-containerctl start

which creates a new storage directory for persistent data (by default named cw_container_storage), then creates the container itself and starts it. You can verify that the container is running using:

scyld-containerctl status

which will show only ClusterWare containers. To see all Docker containers:

docker ps

Configure the container

The container needs at least one admin account. For an admin account already defined on the Docker host, you can reference that admin's ssh key file directly by prepending @ to the public key file name, e.g.:

scyld-containerctl add-admin admin1 @/home/admin1/.ssh/id_dsa.pub

For an admin not defined on the Docker host, you will need a copy of the contents of that admin's public key file (e.g., id_dsa.pub). Enclose that <ssh-key> string in quotes on the command line to ensure that spaces and other characters are passed intact. For example, for admin admin2:

scyld-containerctl add-admin admin2 'ssh-rsa AAA..1A2B3C='

Note that the ssh key string typically ends with an equals (=) sign, optionally followed by a comment such as an email address.
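
To avoid copy-and-paste errors, you can also substitute the key file contents directly on the command line; this sketch assumes admin2's public key has been copied to the Docker host as /tmp/admin2_id.pub:

scyld-containerctl add-admin admin2 "$(cat /tmp/admin2_id.pub)"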

It may be helpful to set the container's root password to a new, known value; this allows access to the web UI, for example. Use the root-pass action:

scyld-containerctl root-pass

The system will prompt for a new password, then ask for it a second time to confirm. The root-pass action will also print the database password, which is needed when configuring Grafana monitoring (see monitoring_grafana).

Now configure the clusterID in the container with the customer's cluster authentication token so that it has access to the ClusterWare repo:

scyld-containerctl cluster-id <AUTH_TOKEN>

Next, configure the ClusterWare tools using:

scyld-containerctl tool-config

which attempts to find a "good" IP address for this Docker host to use when communicating with the private cluster network, although the tool may choose incorrectly if the host has multiple network interfaces.

The tool writes results to stdout; for example:

ClusterWare tools will attempt to contact ssh-agent to get the
user's authentication key. It may be worthwhile for users to run:
    eval `ssh-agent` ; ssh-add

A potential .scyldcw/settings.ini file is below:

[ClusterWare]
client.base_url = https://10.54.0.123/api/v1
client.authuser = root
client.sslverify = quiet

Validate the proposed settings.ini lines, modify them if needed, and write them to ~/.scyldcw/settings.ini. This settings.ini file can be sent to each admin who has been added to the container; each can use it as their own ~/.scyldcw/settings.ini after changing the client.authuser = <username> line to their own username.
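
For example, an admin named admin2 (an illustrative name) could adapt a copied file with a one-line edit:

sed -i 's/^client.authuser = .*/client.authuser = admin2/' ~/.scyldcw/settings.ini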

Each user will need to execute ssh-agent on the Docker host server at login to allow ClusterWare to authenticate that user's access to the scyld-* tools:

eval `ssh-agent` ; ssh-add
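
One way to automate this at login, assuming a bash login shell, is a guarded snippet in that user's ~/.bash_profile:

# start an ssh-agent and load keys only if no agent is already available
if [ -z "$SSH_AUTH_SOCK" ]; then
    eval `ssh-agent` ; ssh-add
fi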

With ssh-agent running, an admin user can now execute ClusterWare commands. First try:

scyld-nodectl ls

Because the container initially has no nodes configured, a successful authentication will cause the above command to report ERROR: No nodes found, nothing was done, which verifies the admin's proper access.

Since the container initially has no images or boot configurations, create them as with any other ClusterWare installation by executing:

scyld-add-boot-config --make-defaults
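
Assuming the standard scyld-imgctl and scyld-bootctl tools from clusterware-tools, you can then confirm what --make-defaults created by listing the images and boot configurations:

scyld-imgctl ls
scyld-bootctl ls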

Similarly, the container initially has no networks or nodes defined, so those must also be created. For example, create a cluster config file called cluster.conf:

cat <<-EOF >cluster.conf
iprange 192.168.122.100/24
node 52:54:00:00:00:01
node 52:54:00:00:00:02
EOF

which defines a cluster network with an IP range starting at 192.168.122.100 in a /24 network, serving two compute nodes with the given MAC addresses. Now configure the head node with that config file:

scyld-cluster-conf load cluster.conf

You can confirm the configuration with scyld-nodectl ls -l, which should return node names n0 and n1 with IP addresses in the specified range.

Stopping and restarting the container

To stop the ClusterWare container:

scyld-containerctl stop

The output will give the name of the storage directory and image-version information. It will also give an example command to restart this container without loss of data, e.g., by executing:

scyld-containerctl start cw_container_storage clusterware:12.1.0-g0000

Note

The ClusterWare container may take more time to shut down than Docker usually expects and may show a time-out warning. This is just a warning: the container will in fact be stopped, which you can confirm with scyld-containerctl status or docker ps.

If you are using the default storage location cw_container_storage and image version name, then you can restart the head node without loss of data by using the shorter command:

scyld-containerctl start

From an admin account, tools like scyld-nodectl ls should now work, and any nodes that were previously defined will still be present.

The Container Storage Area

The container storage directory becomes populated with copies of several directories from inside the container. Most of this data is opaque and should not be modified. The logs/ directory, however, can be useful when debugging or triaging problems.
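
For example, a quick way to inspect those logs (the individual file names here are hypothetical and may vary):

ls cw_container_storage/logs/
tail -n 50 cw_container_storage/logs/*.log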

Known Issues

  • Depending on how the container manager is configured, the ClusterWare container may need extra networking privileges. In particular, user-created containers may not be allowed to access network ports below 1024. If syslog shows messages like:

    httpd: Permission denied: AH00072: make_sock: could not bind to address [::]:80

    then admins may need to configure the container-host machine to allow users to access lower-numbered ports. One can insert a new config file into /etc/sysctl.d to permanently lower the starting point for "unprivileged" ports. Since ClusterWare needs access to DNS (port 53), the following will create the necessary file:

    echo net.ipv4.ip_unprivileged_port_start = 53 | sudo tee /etc/sysctl.d/90-unprivileged_port_start.conf

    A reboot of the container-host will be needed to load the new sysctl configuration.
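
    On many systems the new value can also be loaded immediately and then verified, although the container may still need to be restarted to pick it up:

    sudo sysctl --system
    sysctl net.ipv4.ip_unprivileged_port_start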