Appendix: Using Docker for Head Nodes¶
A RHEL/CentOS-clone server can use the scyld-install tool to install Scyld ClusterWare and thereby become a ClusterWare head node.
This appendix describes an alternative approach that allows a server to use a
ClusterWare Docker container that Penguin Computing has already built as a
basic head node image,
thereby avoiding the need for a cluster administrator to install and configure
ClusterWare from scratch.
Note
ClusterWare also supports containers running on the compute nodes, allowing each node to act as a Docker host for running containers. See Appendix: Using Docker for Compute Nodes.
Install the foundational packages¶
The ClusterWare Docker container approach first requires installing the docker and clusterware-tools packages.
For the latter package you need to set up /etc/yum.repos.d/clusterware.repo in order to access the Penguin Computing repo.
Instructions for doing that can be found at the beginning of the same chapter (Installation and Upgrade of Scyld ClusterWare) that describes how to perform an initial install of a full ClusterWare head node.
Once clusterware.repo is in place, you can install the packages necessary for the Docker container approach:
sudo yum install docker clusterware-tools
The clusterware-tools package contains the various scyld-* commands, including /usr/bin/scyld-containerctl, which is referenced below.
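As a quick sanity check, you can confirm that both packages are installed and that the scyld-* tools are in place; a minimal sketch:
rpm -q docker clusterware-tools
ls /usr/bin/scyld-*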
Knowledgeable Docker administrators may wish to use the standard Docker tools.
Note
The podman container management system can be used in place of docker if desired.
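For example, a minimal sketch of the podman-based alternative, assuming the same clusterware.repo is already in place:
sudo yum install podman clusterware-tools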
Download and load the ClusterWare Docker image¶
First download onto the server a copy of a pre-built Docker image that is appropriate for the head node you wish to create.
For example,
visit https://updates.penguincomputing.com/clusterware/12/el8/container/
(with appropriate authentication) and view the available containers compatible
with a RHEL/CentOS 8 base distribution that is already installed
on the Docker host server.
Suppose you choose clusterware-12.1.0-g0000
to download.
You can validate the downloaded file using the same general method used to
validate a downloaded ISO (see Appendix: Validating ClusterWare ISOs).
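For example, a minimal sketch of that validation, assuming Penguin Computing publishes a SHA-256 checksum for the image; compare the computed hash against the published value:
sha256sum clusterware-12.1.0-g0000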
Load the downloaded image into the Docker image registry:
scyld-containerctl img-load clusterware-12.1.0-g0000
which will show several progress bars and finish with a message such as "Loaded image: localhost/clusterware:12.1.0-g0000". After loading, see just the ClusterWare image using:
scyld-containerctl img-ls
or see all Docker images using:
docker image list
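The output will resemble the following (the image ID, date, and size shown here are illustrative only):
REPOSITORY              TAG            IMAGE ID       CREATED       SIZE
localhost/clusterware   12.1.0-g0000   1a2b3c4d5e6f   2 weeks ago   1.2 GB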
Start the container¶
Start the ClusterWare head node container on the Docker host server:
scyld-containerctl start
which creates a new storage directory for persisting the data (by default named cw_container_storage), then creates the container itself and starts it executing.
You can verify that the container is executing using:
scyld-containerctl status
which will show only clusterware containers.
To see all Docker containers:
docker ps
Configure the container¶
The container must contain at least one admin account.
For an admin account already defined on the Docker host, you can directly reference that admin's ssh key by prepending @ to the admin's public key file name, e.g.:
scyld-containerctl add-admin admin1 @/home/admin1/.ssh/id_dsa.pub
For an admin not defined on the Docker host, you will need a copy of the admin's id_dsa.pub file contents.
Include that <ssh-key> string on the command line enclosed in quotes to ensure that spaces and other characters are passed appropriately.
For example, for admin admin2:
scyld-containerctl add-admin admin2 'ssh-rsa AAA..1A2B3C='
Note that the ssh key should end with an equals sign (=) and an optional email address.
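If the admin needs to retrieve that key string, it can be printed on the admin's own machine and then pasted into the quoted argument above, e.g.:
cat ~/.ssh/id_dsa.pub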
It may be helpful to set the root password of the container to a new, known value -- this would allow access to the web-UI, for example.
Use the root-pass action:
scyld-containerctl root-pass
The system will prompt for a new password, and ask for it a second time to confirm.
The root-pass action will also print out the database password, which is needed for configuring Grafana monitoring (see Grafana Login).
Now configure the clusterID in the container with the customer's cluster authentication token so that the container has access to the ClusterWare repo:
scyld-containerctl cluster-id <AUTH_TOKEN>
Now configure the use of ClusterWare tools using:
scyld-containerctl tool-config
which will attempt to find a "good" IP address for this Docker host to communicate with the private cluster network, although the tool may be confused if there are multiple network interfaces.
The tool writes results to stdout; for example:
ClusterWare tools will attempt to contact ssh-agent to get the
user's authentication key. It may be worthwhile for users to run:
eval `ssh-agent` ; ssh-add
A potential .scyldcw/settings.ini file is below:
[ClusterWare]
client.base_url = https://10.54.0.123/api/v1
client.authuser = root
client.sslverify = quiet
Validate the proposed settings.ini lines, modify them if needed, and write them to ~/.scyldcw/settings.ini.
This user's settings.ini file can be sent to each admin that has been added to the container, who can use that file as their own ~/.scyldcw/settings.ini after modifying the client.authuser = <username> line with their own username.
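For example, a minimal sketch of how admin admin2 might install a received copy of the file (the /tmp/settings.ini path is hypothetical):
mkdir -p ~/.scyldcw
cp /tmp/settings.ini ~/.scyldcw/settings.ini
sed -i 's/^client.authuser = .*/client.authuser = admin2/' ~/.scyldcw/settings.ini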
Each user will need to execute ssh-agent on the Docker host server at login to allow ClusterWare to authenticate that user's access to the scyld-* tools:
eval `ssh-agent` ; ssh-add
With ssh-agent running, an admin user can now execute ClusterWare commands.
First try:
scyld-nodectl ls
If that authentication was successful, then because initially there are no nodes configured for the container, the command should report ERROR: No nodes found, nothing was done, which verifies the admin's proper access.
Since the container initially has no images or boot configurations, create them as with any other ClusterWare installation by executing:
scyld-add-boot-config --make-defaults
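Afterward you can confirm that a default image and boot configuration now exist, e.g. using other commands from clusterware-tools (assuming the standard scyld-imgctl and scyld-bootctl tools are available):
scyld-imgctl ls
scyld-bootctl ls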
Similarly, the container initially has no networks or nodes defined, so those must also be configured.
For example, create a cluster config file called cluster.conf:
cat <<-EOF >cluster.conf
iprange 192.168.122.100/24
node 52:54:00:00:00:01
node 52:54:00:00:00:02
EOF
which defines a range of node IP addresses beginning at 192.168.122.100 on a /24 network, serving two compute nodes with the given MAC addresses. Now configure the head node with that config file:
scyld-cluster-conf load cluster.conf
You can confirm the configuration with scyld-nodectl ls -l, which should return node names n0 and n1 with IP addresses in the specified range.
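For instance, plain scyld-nodectl ls should now simply list the two node names (illustrative output):
n0
n1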
Stopping and restarting the container¶
To stop the ClusterWare container:
scyld-containerctl stop
The output will give the name of the storage directory and image-version information. It will also give an example command to restart this container without loss of data, e.g., by executing:
scyld-containerctl start cw_container_storage clusterware:12.1.0-g0000
Note
The ClusterWare container may take more time to shut down than Docker usually expects and may show a time-out warning.
This is just a warning: the container will in fact be stopped, which you can confirm with scyld-containerctl status or docker ps.
If you are using the default storage location cw_container_storage and image version name, then you can restart the head node without loss of data by using the shorter:
scyld-containerctl start
From an admin account, tools like scyld-nodectl ls should now work, and any nodes that were previously defined will still be present.
The Container Storage Area¶
The container storage directory will become populated with copies of several directories from inside the container.
Most of this data is opaque and should not be tampered with.
The logs/ directory, however, may be useful for debugging or triaging problems.
Known Issues¶
Depending on how the container manager is configured, the ClusterWare container may need extra networking privileges. In particular, user-created containers may not be allowed to access network ports below 1024. If syslog shows messages like:
httpd: Permission denied: AH00072: make_sock: could not bind to address [::]:80
then admins may need to configure the container-host machine to allow users to access lower-numbered ports. One can insert a new config file into /etc/sysctl.d to permanently lower the starting point for "unprivileged" ports. Since ClusterWare needs access to DNS/port 53, the following will create the necessary file:
echo net.ipv4.ip_unprivileged_port_start = 53 | sudo tee /etc/sysctl.d/90-unprivileged_port_start.conf
A reboot of the container-host will be needed to load the new sysctl configuration.
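Alternatively, on many systems the new setting can be loaded without a reboot by re-applying all sysctl configuration files and then verifying the value; a minimal sketch:
sudo sysctl --system
sysctl net.ipv4.ip_unprivileged_port_start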