Managing Non-Scyld Nodes

A ClusterWare cluster typically consists of a Scyld master node and one or more Scyld compute nodes, integrated and communicating across the private cluster network. However, ClusterWare also supports additional devices and non-Scyld nodes that may reside on that private network. This section describes how these Scyld and non-Scyld nodes are configured using entries in the /etc/beowulf/config file.

DHCP IP address assignment to devices

The private cluster network may have one or more attached devices that issue a DHCP request to obtain a dynamic IP address, rather than being configured with a static IP address. Typically, only the master node (or nodes - see Managing Multiple Master Nodes) owns a static IP address.

Caution

Care must be taken with static IP addresses to guarantee there are no address collisions.

Examples of such devices are managed switches and storage servers. The beoserv DHCP service for such devices is configured using the host directive, together with an associated hostrange directive. For example,

nodes 32
iprange 10.20.30.100 10.20.30.131   # IPaddr range of compute nodes
...
hostrange 10.20.30.4  10.20.30.9    # IPaddr range of devices for DHCP
hostrange 10.20.30.90 10.20.30.99   # IPaddr range of PDUs for DHCP
...
host 00:A0:D1:E9:87:CA 10.20.30.5 smartswitch
host 00:A0:D1:E3:FC:E2 10.20.30.90 pdu1
host 00:A0:D1:E3:FD:4A 10.20.30.91 pdu2

The host keyword affects both the beoserv DHCP server and how the ClusterWare NSS responds to hostname lookups. It associates a non-cluster entity, identified by its MAC address, with the IP address that beoserv should deliver to that entity if and when it makes a DHCP request to the master node, together with one or more optional hostnames to be associated with that IP address.

If a hostname is provided, then normal NSS functionality is available. Continuing the above example:

[user1@cluster ~] $ getent hosts smartswitch

returns:

10.20.30.5 smartswitch

and

[user1@cluster ~] $ getent ethers 00:A0:D1:E9:87:CA

returns:

00:a0:d1:e9:87:ca smartswitch

Each host IP address must fall within a defined hostrange range of IP addresses. Moreover, the potentially multiple hostrange ranges must not overlap each other, must not overlap the compute node range defined by the iprange directive, and must not collide with the IP address(es) of the master node(s) on the private network.
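
One simple way to check for such overlaps is to extract the relevant directives from the config file and compare the ranges by eye:

[root@cluster ~] # grep -E '^(nodes|iprange|hostrange|host) ' /etc/beowulf/config

This lists the compute node range, the device ranges, and the individual host entries side by side.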

Simple provisioning using PXE

A default node entry, such as:

node 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

or an explicitly numbered node entry, such as one for node15:

node 15 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

is assumed to describe a Scyld node, and a PXE request from one of these MAC addresses results in beoserv provisioning the node with the kernel image, initrd image, and kernel command-line arguments that are specified in /etc/beowulf/config file entries, e.g.:

kernelimage /boot/vmlinuz-2.6.18-164.2.1.el5.540g0000
initrdimage /var/beowulf/boot/computenode.initrd
kernelcommandline rw root=/dev/ram0 image=/var/beowulf/boot/computenode.rootfs

ClusterWare automatically maintains the config file's default kernelimage to specify the same kernel that currently executes on the master node. A Scyld node integrates into the BProc unified process space.
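
For example, given the default kernelimage shown above, the kernel running on the master node would report the matching version:

[root@cluster ~] # uname -r
2.6.18-164.2.1.el5.540g0000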

Enhanced syntax allows for custom booting of different kernel and initrd images. For example, specific nodes can boot a standalone RAM memory test in lieu of booting a full Linux kernel:

kernelimage 15 /var/beowulf/boot/memtest86+-4.00.bin
initrdimage 15 none
kernelcommandline 15 none

Thus when node15 makes a PXE request, it gets provisioned with the specified binary image that performs a memory test. In the above example, the initrdimage of none means that no initrd image is provisioned to the node because that particular memory test binary doesn't need an initrd. Moreover, the node number specifier of 15 can be a range of node numbers, each of which would be provisioned with the same memory test.
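
For instance, assuming the hyphenated N-M range form for the node number specifier (verify the exact range syntax against your config file's comments), nodes 15 through 18 could all be directed to the same memory test:

kernelimage 15-18 /var/beowulf/boot/memtest86+-4.00.bin
initrdimage 15-18 none
kernelcommandline 15-18 none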

Simple provisioning using the class directive

An optional config file class directive assigns a name to a set of kernel image, initrd image, and kernel command-line arguments. The previous example can alternatively be accomplished with:

class memtest kernelimage /var/beowulf/boot/memtest86+-4.00.bin
class memtest initrdimage none
class memtest kernelcommandline none
...
node 15 memtest 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

which results in the same memory test provisioning of node15 as seen earlier.

Similarly, the default Scyld node provisioning can be expressed as:

class scyld kernelimage /boot/vmlinuz-2.6.18-164.2.1.el5.540g0000
class scyld initrdimage /var/beowulf/boot/computenode.initrd
class scyld kernelcommandline rw root=/dev/ram0 image=/var/beowulf/boot/computenode.rootfs
...
node scyld pxe pxe 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

The first pxe is termed the boot-sequence, and the second pxe is termed the boot-stage. The boot-stage describes how beoserv should respond to a node's PXE request. In the example above, the boot-stage of pxe instructs beoserv to respond to the node's first PXE request with the kernel image, initrd image, and kernel command-line specified in the class scyld.

Booting a node from the local harddrive

The node entry's boot-sequence and boot-stage have more powerful capabilities. For example, suppose node15 is installed with a full distribution of CentOS 4.8 on a local harddrive, and suppose the master node's config file contains entries:

class genericboot kernelimage none
class genericboot initrdimage none
class genericboot kernelcommandline none
...
node 15 genericboot pxe+local local 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

When node15 boots, it first makes a DHCP request to join the private cluster network, then attempts to boot from each device in the sequence named in its BIOS. ClusterWare expects that the first boot device is PXE over Ethernet, and the second boot device is a local harddrive. When node15 initiates its PXE request to the master node, beoserv sees the boot-stage of local and thus directs node15 to "boot next", i.e., to boot from the local harddrive.

Provisioning a non-Scyld node

In the previous example, we assumed that node15 already had a functioning, bootable operating system installed on its local harddrive. Having a preexisting installation is not a requirement. Suppose the config file contains entries:

class centos5u4 kernelimage /var/beowulf/boot/vmlinuz-centos5u4_amd64
class centos5u4 initrdimage /var/beowulf/boot/initrd-centos5u4_amd64.img
class centos5u4 kernelcommandline initrd=initrd-centos5u4_amd64.img
                  ks=nfs:10.1.1.1:/home/os/kickstarts/n5-ks.cfg ksdevice=eth0
...
node 15 centos5u4 pxe+local pxe 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

(where the kernelcommandline has been broken into two lines for readability, although in reality it must be a single line in the config file). This time, when node15's PXE request arrives, the boot-stage of pxe directs beoserv to respond with the class centos5u4 kernel image, initrd image, and kernel command-line arguments. The latter's ks argument informs node15's kernel to initiate a kickstart installation, a Red Hat functionality that provisions the requester with rpms and other configuration settings as specified in the /home/os/kickstarts/n5-ks.cfg kickstart configuration file found on the master node. It is the responsibility of the cluster administrator to create this kickstart file. See Special Directories, Configuration Files, and Scripts for a sample configuration file.
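
The full kickstart file is beyond the scope of this section, but a minimal sketch might look like the following (all values are hypothetical - in particular the NFS distribution tree /home/os/centos5u4, the partitioning, and the package list must be adjusted for your site):

# Minimal illustrative kickstart for a non-Scyld CentOS 5 node
install
nfs --server=10.1.1.1 --dir=/home/os/centos5u4
lang en_US.UTF-8
keyboard us
rootpw changeme
bootloader --location=mbr
clearpart --all --initlabel
part / --fstype ext3 --size 1 --grow
reboot
%packages
@base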

After this initial PXE response (i.e., the pxe step of the pxe+local boot-sequence), beoserv rewrites the node entry, changing the boot-stage to local, the second step of the pxe+local boot-sequence. For example,

node 15 centos5u4 pxe+local pxe 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

gets automatically changed to:

node 15 centos5u4 pxe+local local 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

The net effect: the first PXE request is answered with a directive to boot a kernel on node15 that initiates the kickstart provisioning, and any subsequent PXE request from node15 (presumably now a fully provisioned node) results in a beoserv directive to node15 to "boot next", i.e., to boot from the local harddrive.
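
You can confirm that beoserv has rewritten the entry by inspecting the config file:

[root@cluster ~] # grep '^node 15 ' /etc/beowulf/config
node 15 centos5u4 pxe+local local 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F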

If the cluster administrator wishes to reprovision the node from scratch, simply change the boot-stage from local back to pxe, then execute service beowulf reload to instruct beoserv to re-read the config file and pick up the manual change.
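
For example, after editing the node 15 entry's boot-stage back to pxe:

[root@cluster ~] # service beowulf reload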

If you want the node to reprovision via kickstart on every boot (an unlikely scenario, presented here for completeness), configure this using:

node 15 centos5u4 pxe pxe 00:A0:D1:E5:C4:6E 00:A0:D1:E5:C4:6F

Integrating a non-Scyld node into the cluster

A non-Scyld node that locally boots a full distribution operating system environment may have an assigned IP address in the private cluster network iprange, but it is initially invisible to the master node's monitoring tools and job manager. The bpstat tool only knows about Scyld nodes, and the more general beostatus is ignorant of the non-Scyld node's presence in the cluster. The non-Scyld node is itself ignorant of the names and IP addresses of the other nodes in the cluster, whether Scyld or non-Scyld, unless the cluster administrator adds every node to the non-Scyld node's local /etc/hosts file.

This shortcoming can be remedied by installing two special ClusterWare packages onto the non-Scyld node: beostat-sendstats and beonss-kickbackclient. These packages contain the client-side pieces of beostat and beonss. They are available in the standard ClusterWare yum repository and are compatible with non-Scyld RHEL and CentOS distributions - and perhaps with other distributions. One way to judge compatibility is to determine what libraries the ClusterWare daemons need to find on the non-Scyld compute node. (The daemons are known to execute in recent RHEL and CentOS environments.) Examine the daemons that were installed on the master node when ClusterWare was installed:

ldd /usr/sbin/sendstats
ldd /usr/sbin/kickbackproxy

and then determine whether the libraries that these binaries employ are present on the target non-Scyld node. If they are, then the special ClusterWare packages can be downloaded and installed on that node.

First, download the packages from the ClusterWare yum repository. A useful downloader is the /usr/bin/yumdownloader utility, which can be installed from the CentOS extras yum repository if it is not already installed on your master node:

[root@cluster ~] # yum install yum-utils

Then use the utility to download the special Penguin ClusterWare rpms:

[root@cluster ~] # yumdownloader --destdir=<localdir> beostat-sendstats beonss-kickbackclient

This retrieves the rpms and stores them in the directory <localdir>, e.g., /var/www/html or /etc/beowulf/nonscyld.
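
The rpms can then be copied to the non-Scyld node and installed with rpm, e.g. (assuming the node is reachable by the hypothetical name n15 and the rpms have already been copied there):

[root@n15 ~] # rpm -ivh beostat-sendstats-*.rpm beonss-kickbackclient-*.rpm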

These packages can be installed manually, as just shown, or as part of the kickstart procedure (see Provisioning a non-Scyld node). Each package includes an /etc/init.d/ script that must be edited by the cluster administrator. Examine /etc/init.d/beostat-sendstats and /etc/init.d/beonss-kickbackclient, which contain comments that instruct the administrator how to configure each script. Additionally, the non-Scyld node's /etc/nsswitch.conf must be configured to invoke the kickback service for those databases for which the administrator wishes to involve beonss and the master node; see the master node's /etc/beowulf/nsswitch.conf for a guide to which databases are supported, e.g., hosts, passwd, shadow, and group. Finally, on the non-Scyld node, enable the scripts to start at node startup:

[root@cluster ~] # chkconfig beostat-sendstats on
[root@cluster ~] # chkconfig beonss-kickbackclient on
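
For reference, a minimal sketch of the relevant /etc/nsswitch.conf entries on the non-Scyld node, assuming the kickback service name used in the master node's /etc/beowulf/nsswitch.conf, might be:

hosts:   files kickback dns
passwd:  files kickback
shadow:  files kickback
group:   files kickback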