Special Directories, Configuration Files, and Scripts

Scyld ClusterWare adds some special files and directories on top of the standard Linux install that help control the behavior of the cluster. This appendix contains a summary of those files and directories, and what is in them.

What Resides on the Master Node

/etc/beowulf/ directory

All the config files for controlling how BProc and Beoboot behave are stored here.

/etc/beowulf/config

This file contains the settings that control the bpmaster daemon for BProc, and the beoserv daemon that is part of beoboot. It also contains part of the configuration for how to make beoboot boot images.

/etc/beowulf/fdisk/

The beofdisk tool stores files in this directory that describe the partitioning of the compute nodes' hard drives, and reads them back when it rewrites the partition tables on the compute nodes.

/etc/beowulf/fstab

Refer to Disk Partitioning for details on using node-specific fstab.N files.

/etc/beowulf/backups/ directory

Contains time-stamped backups of older versions of various configuration files, e.g., /etc/beowulf/config and /etc/beowulf/fstab, to assist in the recovery of a working configuration after an invalid edit.
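
If you want an extra manual copy before editing a configuration file by hand, a minimal sketch such as the following can be used. The backup_config function name and the "<name>.<YYYYmmdd-HHMMSS>" file naming are assumptions made for this illustration; they are not necessarily the naming used by the backups that ClusterWare itself creates.

```shell
# backup_config FILE [BACKUP_DIR]
# Copies FILE into BACKUP_DIR with a timestamp suffix and prints the new path.
# The naming scheme is hypothetical and only serves this sketch.
backup_config() {
    file=$1
    backup_dir=${2:-/etc/beowulf/backups}
    [ -f "$file" ] || { echo "not a regular file: $file" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    dest="$backup_dir/$(basename "$file").$(date +%Y%m%d-%H%M%S)"
    # -p preserves the mode and modification time of the original.
    cp -p "$file" "$dest" || return 1
    printf '%s\n' "$dest"
}
```

For example, backup_config /etc/beowulf/config prints the path of the copy it just made, which you can restore with cp if an edit turns out to be invalid.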

/etc/beowulf/init.d/ directory

Contains various scripts that are executed on the master node by the node_up script when booting a compute node.

/etc/beowulf/conf.d/ directory

Contains various configuration files that are needed when booting a compute node.

/usr/lib/beoboot directory

This directory contains files that are used by beoboot for booting compute nodes.

/usr/lib/beoboot/bin

This directory contains the node_up script and several smaller scripts that it calls.

/var/beowulf directory

This directory contains compute node boot files and static information, as well as the list of unknown MAC addresses. Its contents include the following entries.

/var/beowulf/boot

This is the default location for files essential to booting compute nodes. Once a system is up and running, you will typically find three files in this directory:

  • computenode — the boot sector used for bootstrapping the kernel on the compute node.
  • computenode.initrd — the kernel image and initial ramdisk used to boot the compute node.
  • computenode.rootfs — the root file system for the compute node.
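
A quick way to confirm that the three files listed above are present is sketched below. The check_boot_files function name is hypothetical, and the check only tests that each file exists and is non-empty; it does not validate the contents.

```shell
# check_boot_files [DIR]
# Reports, for each of the three expected boot files, whether it exists
# and is non-empty. Returns non-zero if any file is missing or empty.
check_boot_files() {
    dir=${1:-/var/beowulf/boot}
    missing=0
    for name in computenode computenode.initrd computenode.rootfs; do
        if [ -s "$dir/$name" ]; then
            echo "ok      $name"
        else
            echo "missing $name"
            missing=1
        fi
    done
    return "$missing"
}
```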

/var/beowulf/statistics

This directory contains a cached copy of static information from the compute nodes. At a minimum, it includes a copy of /proc/cpuinfo.

/var/beowulf/unknown_addresses

This file contains a list of Ethernet hardware (MAC) addresses for nodes considered unknown by the cluster. See Compute Node Categories for more information.
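
The exact layout of this file is not described here, so the following sketch makes only one assumption: that addresses appear in the usual colon-separated hexadecimal notation. It extracts every such address regardless of what else is on the line, and prints each one once in lower case. The list_unknown_macs function name is hypothetical.

```shell
# list_unknown_macs [FILE]
# Prints each distinct MAC address found in FILE, lower-cased and sorted.
# Assumes colon-separated notation such as 00:11:22:aa:bb:cc.
list_unknown_macs() {
    file=${1:-/var/beowulf/unknown_addresses}
    [ -r "$file" ] || return 1
    grep -o -i -E '([0-9a-f]{2}:){5}[0-9a-f]{2}' "$file" |
        tr 'A-F' 'a-f' |
        sort -u
}
```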

/var/log/beowulf directory

This directory contains the boot logs from compute nodes. These logs are the output of what happens when the node_up script runs. The files are named node.<number>, where <number> is the actual node number.
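
When many nodes boot at once, it can be useful to find out which per-node logs mention a particular string. The sketch below is one way to do that. The logs_mentioning function name is hypothetical, and the search string you pass (for example, "error") is your own guess at what a failing boot prints; the log format is not specified here.

```shell
# logs_mentioning PATTERN [LOGDIR]
# Prints the node numbers whose boot log contains PATTERN (case-insensitive),
# in numeric order.
logs_mentioning() {
    pattern=$1
    logdir=${2:-/var/log/beowulf}
    grep -l -i -e "$pattern" "$logdir"/node.* 2>/dev/null |
        sed 's/.*node\.//' |
        sort -n
}
```

For example, logs_mentioning error lists the node numbers whose log you may want to read first.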

What Gets Put on the Compute Nodes at Boot Time

  • Generally speaking, the /dev directory contains a subset of the devices present in the /dev directory on the master node. The /usr/lib/beoboot/bin/mknoderootfs script creates most of the /dev/ entries (e.g., zero, null, and random). /etc/beowulf/init.d/20ipmi creates ipmi0. /usr/lib/beoboot/bin/setup_fs creates shm and pts (as directed by /etc/beowulf/fstab). The hard drive devices (e.g., sda) are created at compute node bootup time, if local drives are discovered. If InfiniBand hardware is present on the compute node, /etc/beowulf/init.d/15openib creates various device entries in /dev/infiniband/.

  • The /etc directory contains the ld.so.cache, localtime, mtab, and nsswitch.conf files. The node_up script creates a simple hosts file.

  • The /home directory exists as a read-write NFS mount of the /home directory from the master node. Thus, all the home directories can be accessed by jobs running on the compute nodes.

  • Additionally, other read-only NFS mounts exist by default, to better assist out-of-the-box application and script execution: /bin, /usr/bin, /opt, /usr/lib64/python2.3, /usr/lib/perl5, and /usr/lib64/perl5.

  • The node_up script mounts pseudo-filesystems as directed by /etc/beowulf/fstab: /proc, /sys, and /bpfs.

  • mknoderootfs creates /var and several of its subdirectories.

  • The /tmp directory is world-writeable and can be used as temporary space for compute jobs.

  • /etc/beowulf/config names the library directories that are managed by the compute node's library cache. Run beoconfig libraries to see the current list of library directories. Caching shared libraries, which is done automatically as needed on a compute node, speeds up the transfer process when you run jobs, eliminates the need to NFS-mount the various common directories that contain libraries, and minimizes the space consumed by libraries in the compute node's RAM filesystem.

  • Typically, when the loader starts up an application, it opens the needed shared libraries. Each open() causes the compute node to pull the shared library from the master node and save it in the library cache, which typically resides in the node's RAM filesystem. However, some applications and scripts reference a shared library or other file in one of those library directories without using open() to access it, so the file is not automatically pulled into the library cache. For example, an application or script might first use stat() to determine whether a specific file exists, then use open() if the stat() succeeds, and otherwise continue on to stat() an alternative file. The stat() on the compute node will fail until an open() pulls the file from the master. The application or script thus fails to execute, and the name of the missing library or file is typically displayed in an error message.

    To remedy this type of failure, you should use a prestage directive in /etc/beowulf/config to explicitly name files that should be pulled to each compute node at node startup time. Run beoconfig prestage for the current list of prestaged files.
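
The probing pattern described above is common in shell scripts as well as in compiled programs. The sketch below shows its general shape; the first_existing function name is hypothetical. The -e test only checks for existence (a stat()-style probe) and never opens the file, so by the reasoning above a candidate that has not yet been pulled into the library cache would be skipped on a compute node even though it exists on the master.

```shell
# first_existing PATH...
# Prints the first PATH that exists and returns 0; returns 1 if none exist.
# The existence test does not open the file, which is why a file that is
# not yet in a compute node's library cache would be passed over.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

If a script of yours selects files this way from one of the library directories, the files it probes are natural candidates for prestaging.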

/usr/lib/locale/locale-archive Internationalization

Glibc applications silently open the file /usr/lib/locale/locale-archive, which means it gets downloaded by each compute node early in a node's startup sequence via the BProc filecache functionality. The default locale-archive is 95 MBytes in RHEL6 and over 100 MBytes in RHEL7. This download consumes significant network bandwidth and thus causes serialization delays if numerous compute nodes attempt to concurrently boot, and thereafter this large file consumes significant RAM filesystem space on each node. It is likely that a cluster's users and applications do not require all the international locale data that is present in the default file. With care, the cluster administrator may choose to rebuild locale-archive with a greatly reduced set of locales and thus create a significantly smaller file that is less impactful on cluster performance.

Rebuilding and replacing locale-archive should be done on a quiescent master node, because the file is typically mmap'ed by running processes (e.g., crond, bash). The appearance of a replacement version may perturb shells and other programs: for example, the shell that executes the rebuild may abort, or it may issue an immediate warning message about an undefined environment variable. If a problem does appear, reboot the master node. Otherwise, newly executed programs on the master node will use the updated locale-archive, and each compute node will use the new file only after that node reboots.
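
To get a rough idea of how quiescent the master node is, you can list the processes that currently have the file mapped. The following is a minimal sketch that relies on the Linux /proc/<pid>/maps files; the pids_mapping function name is hypothetical. Run it as root, since the maps files of other users' processes are otherwise unreadable and are silently skipped.

```shell
# pids_mapping [FILE]
# Prints the PID of every process whose /proc/<pid>/maps mentions FILE.
# Processes that exit during the scan, or whose maps file cannot be read,
# are skipped without an error message.
pids_mapping() {
    target=${1:-/usr/lib/locale/locale-archive}
    for maps in /proc/[0-9]*/maps; do
        if grep -q -F -- "$target" "$maps" 2>/dev/null; then
            pid=${maps#/proc/}
            printf '%s\n' "${pid%/maps}"
        fi
    done
}
```

An empty result does not guarantee that a replacement is harmless, but a long list is a hint that a reboot may be needed afterward.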

In a RHEL5 environment, the glibc-common RPM installs the /usr/lib/locale/ directory containing the full set of locale definition files and a full locale-archive binary file. The build-locale-archive command rebuilds the locale-archive with every individual locale data file that is found in that directory. Thus, to reduce the size of locale-archive, you must first reduce the number of locale data files in that directory - but only after saving the default locale data files in a safe place, so you can later rebuild the locale-archive with a different set of locale data files as the cluster's needs change. Beginning with the default /usr/lib/locale/ directory with its full set of locale data files:

[root@cluster ~] # cd /usr/lib
[root@cluster ~] # cp -a locale locale.default
[root@cluster ~] # (cd locale ; rm -fr *_*)

saves all the locale data files in a new directory and produces a stripped-down /usr/lib/locale/, leaving only the locale-archive file. Now reintroduce a smaller set of locale data files. For example, to include the U.S. English and British English locale files:

[root@cluster ~] # cp -a locale.default/en_US* locale
[root@cluster ~] # cp -a locale.default/en_GB* locale

When /usr/lib/locale/ contains the desired locale data files, perform the rebuild:

[root@cluster ~] # build-locale-archive

and reboot the master node and/or the compute nodes as needed.
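
Before running build-locale-archive, it can help to review which locale families are present in the directory. The sketch below summarizes the entries whose names contain an underscore (the same *_* pattern used in the rm command above) by stripping any codeset or modifier suffix. The locale_families function name is hypothetical.

```shell
# locale_families [DIR]
# Prints the distinct language_TERRITORY prefixes of the locale data
# entries in DIR, e.g. en_US for both en_US and en_US.utf8.
locale_families() {
    dir=${1:-/usr/lib/locale}
    for entry in "$dir"/*_*; do
        [ -e "$entry" ] || continue
        name=${entry##*/}
        # Remove everything from the first "." or "@" onward.
        printf '%s\n' "${name%%[.@]*}"
    done | sort -u
}
```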

In a RHEL6 environment, the glibc-common RPM installs just the full default locale-archive binary file. The default /usr/lib/locale/ directory contains no locale data files. Scyld ClusterWare has saved the default locale-archive as locale-archive.default and has created locale-archive.default.list as a text file containing a list of all the locales in that default file. To generate a smaller file, you start with the full default locale-archive, then eliminate locales from the full list using localedef --delete-from-archive, then execute build-locale-archive to finalize the new locale-archive file. To assist in this procedure, Scyld ClusterWare installs helper scripts and some sample locale lists. For example, to rebuild with just the U.S.-English locales:

[root@cluster ~] # cd /usr/lib/locale
[root@cluster ~] # ./rebuild-archive.sh locales.English_US

Or to include all the English language locales:

[root@cluster ~] # cd /usr/lib/locale
[root@cluster ~] # ./rebuild-archive.sh locales.English

When executed, the rebuild-archive.sh helper script prints details of what is being requested and asks for permission to proceed.

Several other sample locales.* files have been provided. The local cluster administrator can use one of these files or create a new custom file, as desired. Each such locales.* file should contain a list of one or more specific locales (e.g., en_US.utf8) or patterns that match one or more locales (e.g., en_US), one per line. For example, the locales.English file contains:

# All English language locales
en_

which is a pattern that matches every en_* locale.
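
To preview what a custom locales.* file would select, you can apply its patterns to a list of locale names yourself. The sketch below is an illustration only and is not how rebuild-archive.sh is implemented: the match_locales function name is hypothetical, lines beginning with # are treated as comments as in the example above, and each pattern is assumed to match at the start of a locale name.

```shell
# match_locales PATTERN_FILE
# Reads locale names on standard input and prints those that begin with
# any pattern listed in PATTERN_FILE. Comment lines and blank lines in
# PATTERN_FILE are ignored. Prefix matching is an assumption of this sketch.
match_locales() {
    awk -v pf="$1" '
        BEGIN {
            while ((getline line < pf) > 0)
                if (line !~ /^[ \t]*(#|$)/)
                    pat[++n] = line
        }
        {
            for (i = 1; i <= n; i++)
                if (index($0, pat[i]) == 1) { print; next }
        }'
}
```

For example, localedef --list-archive | match_locales locales.English shows which locales in the current archive the pattern file would keep.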

Additionally, Scyld ClusterWare provides reset-archive.sh, which is a script that returns locale-archive to its original default state.

Caution

Note that for both RHEL6 and RHEL7, we recommend always including the en_US* locales, just to be safe, as the default RHEL/CentOS distributions reference the LANG=en_US.utf8 locale in several /etc/ configuration files. Each Scyld ClusterWare 6-supplied locales.* file contains the suggested en_US locale pattern.

Site-Local Startup Scripts

Local, homegrown scripts to be executed at node boot time can be placed in /etc/beowulf/init.d/. The conventions for this are as follows:

  • Scripts should live in /etc/beowulf/init.d/
  • Scripts should be numbered in the order in which they are to be executed (e.g., 20raid, 30startsan, 45mycustom_hw)
  • Any scripts going into /etc/beowulf/init.d/ should be cluster-aware. That is, they should contain the appropriate bpsh and/or bpcp commands to make the script work on the compute node rather than on the master node. Examine the Scyld ClusterWare distributed scripts for examples.
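
To review the scripts in the directory together with their state, a sketch like the following can be used. It rests on two assumptions: that sorted file name order reflects the numbering convention described above, and that a script counts as enabled when its executable bit is set. The list_init_scripts function name is hypothetical.

```shell
# list_init_scripts [DIR]
# Prints one line per regular file in DIR, in sorted name order,
# prefixed with "enabled" if the file is executable, else "disabled".
list_init_scripts() {
    dir=${1:-/etc/beowulf/init.d}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if [ -x "$f" ]; then
            state=enabled
        else
            state=disabled
        fi
        printf '%s %s\n' "$state" "${f##*/}"
    done
}
```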

Any local modifications to Scyld ClusterWare distributed scripts in /etc/beowulf/init.d will be lost across subsequent Scyld ClusterWare updates. If a local sysadmin believes a local modification is necessary, we suggest:

  1. Copy the to-be-edited original script to a file with a unique name, e.g.:

    cd /etc/beowulf/init.d
    cp 37some_script 37some_script_local
    
  2. Remove the executable state of the original:

    beochkconfig 37some_script off
    
  3. Edit 37some_script_local as desired.

  4. Thereafter, subsequent ClusterWare updates may install a new 37some_script, but the update will not re-enable that script: it remains non-executable. The local 37some_script_local remains untouched. However, keep in mind that the newer ClusterWare version of 37some_script may contain fixes or other changes that need to be reflected in 37some_script_local, because that edited file was based upon an older ClusterWare version.
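
After an update, one way to see whether a local copy has fallen behind is to compare it with the distributed script it was based on. The sketch below assumes the _local suffix convention from the steps above; the review_local_copies function name is hypothetical. The diff output mixes your intentional edits with any upstream changes, so it still needs to be read by a person.

```shell
# review_local_copies [DIR]
# For every NAME_local file in DIR that has a matching NAME, prints a
# header line followed by a unified diff from NAME to NAME_local.
review_local_copies() {
    dir=${1:-/etc/beowulf/init.d}
    for copy in "$dir"/*_local; do
        [ -f "$copy" ] || continue
        orig=${copy%_local}
        [ -f "$orig" ] || continue
        echo "== ${orig##*/} vs ${copy##*/}"
        diff -u "$orig" "$copy"
    done
    # diff exits non-zero when files differ; that is not an error here.
    return 0
}
```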

Sample Kickstart Script

Non-Scyld nodes can be provisioned using the Red Hat kickstart utility. The following is a sample kickstart configuration script, which should be edited as appropriate for your local cluster:

# centos 5u3  (amd64) hybrid example kickstart

install
reboot
# point to the NFS server that exports a directory containing the ISO images of CentOS 5.3
nfs --server=192.168.5.30 --dir=/eng_local/nfs-install/centos5u3_amd64
lang en_US.UTF-8
keyboard us
xconfig --startxonboot
network --device eth0 --bootproto dhcp --onboot yes
#network --device eth1 --onboot no --bootproto dhcp
rootpw --iscrypted $1$DC2r9BD4$Y1QsTSuL6K9ESdVk18eJT0
firewall --disabled
selinux --disabled
authconfig --enableshadow --enablemd5
timezone --utc America/Los_Angeles
bootloader --location=mbr
key --skip

# The following is commented-out so nobody uses this by accident and
# overwrites their local harddisks on a compute node.
#
# To use this kickstart script to install an operating system on /dev/sda
# of your compute node, thereby erasing all prior content on that drive,
# remove the comment character in front of the next 4 lines:

# clearpart --linux --drives=sda
# part /boot --fstype ext3 --size=100 --ondisk=sda
# part swap --fstype swap --size=2040 --ondisk=sda
# part / --fstype ext3 --size=1024 --grow

#############################################################################
%packages
@ ruby
@ system-tools
@ MySQL Database
@ Editors
@ System Tools
@ Text-based Internet
@ Legacy Network Server
@ DNS Name Server
@ FTP Server
@ Network Servers
@ Web Server
@ Server Configuration Tools
@ Sound and Video
@ Administration Tools
@ Graphical Internet
@ Engineering and Scientific
@ Development Libraries
@ GNOME Software Development
@ X Software Development
@ Authoring and Publishing
@ Legacy Software Development
@ Emacs
@ Legacy Software Support
@ Ruby
@ KDE Software Development
#@ Horde
@ PostgreSQL Database
@ Development Tools
#@ Yum Utilities
#@ FreeNX and NX
kernel-devel
OpenIPMI-tools
openmpi-devel
sg3_utils

#############################################################################
%pre

# anything you want to happen before the install process starts

#############################################################################
%post
#!/bin/bash
# anything you want to happen after the install process finishes

masterip=10.56.10.1
wget http://$masterip/sendstats
chmod +x sendstats
mv sendstats /usr/local/sbin/
echo "/usr/local/sbin/sendstats" >> /etc/rc.local

# If you get the blinking cursor of death and no OS post, then uncomment this.
#grub-install --root-directory=/boot hd0
#grub-install --root-directory=/boot hd1
#grub-install --root-directory=/boot hd2

# Removes rhgb and quiet from grub.conf
sed -i /boot/grub/grub.conf -e 's/rhgb//g;s/quiet//g'

# Sets up the serial console in grub.conf
# TODO

# turns off cpuspeed
chkconfig cpuspeed --level 123456 off

# changes xorg.conf from mga to vesa
sed -i /etc/X11/xorg.conf -e 's/mga/vesa/'

# turns on ipmi
chkconfig ipmi on
chkconfig sshd on
wget http://$masterip/done
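
Because uncommenting the partitioning lines in the sample makes the kickstart erase the target drive, a quick check before publishing an edited file can help avoid accidents. The following sketch reports whether a kickstart file contains any active (uncommented) clearpart, part, or partition directive. The ks_is_destructive function name is hypothetical, and the check deliberately covers only those three directives.

```shell
# ks_is_destructive FILE
# Returns 0 if FILE contains an uncommented clearpart, part, or partition
# directive, and non-zero otherwise.
ks_is_destructive() {
    grep -q -E '^[[:space:]]*(clearpart|part|partition)[[:space:]]' "$1"
}
```

For example, ks_is_destructive my.ks && echo "WARNING: my.ks will repartition the node" prints the warning only when such a directive is active; my.ks stands for whatever file name you use locally.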