Special Directories, Configuration Files, and Scripts¶
Scyld ClusterWare adds some special files and directories on top of the standard Linux install that help control the behavior of the cluster. This appendix contains a summary of those files and directories, and what is in them.
What Resides on the Master Node¶
/etc/beowulf/ directory¶
All the configuration files that control how BProc and Beoboot behave are stored here.
/etc/beowulf/config¶
This file contains the settings that control the bpmaster daemon for BProc and the beoserv daemon that is part of beoboot. It also contains part of the configuration for how to make beoboot boot images.
/etc/beowulf/fdisk/¶
This directory is used by beofdisk to store files detailing the partitioning of the compute nodes’ hard drives, and is also read from when beofdisk rewrites the partition tables on the compute nodes. See ?
/etc/beowulf/fstab¶
Refer to Disk Partitioning for details on using node-specific fstab.N files.
/etc/beowulf/backups/ directory¶
Contains time-stamped backups of older versions of various configuration files, e.g., /etc/beowulf/config and /etc/beowulf/fstab, to assist in the recovery of a working configuration after an invalid edit.
/etc/beowulf/init.d/ directory¶
Contains various scripts that are executed on the master node by the node_up script when booting a compute node.
/etc/beowulf/conf.d/ directory¶
Contains various configuration files that are needed when booting a compute node.
/usr/lib/beoboot directory¶
This directory contains files that are used by beoboot for booting compute nodes.
/usr/lib/beoboot/bin¶
This directory contains the node_up script and several smaller scripts that it calls.
/var/beowulf directory¶
This directory contains compute node boot files and static information, as well as the list of unknown MAC addresses. It includes three subdirectories.
/var/beowulf/boot¶
This is the default location for files essential to booting compute nodes. Once a system is up and running, you will typically find three files in this directory:
computenode: the boot sector used for bootstrapping the kernel on the compute node.
computenode.initrd: the kernel image and initial ramdisk used to boot the compute node.
computenode.rootfs: the root file system for the compute node.
/var/beowulf/statistics¶
This directory contains a cached copy of static information from the compute nodes. At a minimum, it includes a copy of /proc/cpuinfo.
/var/beowulf/unknown_addresses¶
This file contains a list of Ethernet hardware (MAC) addresses for nodes considered unknown by the cluster. See Compute Node Categories for more information.
/var/log/beowulf directory¶
This directory contains the boot logs from compute nodes. These logs capture the output of the node_up script as it runs. The files are named node.<number>, where <number> is the actual node number.
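As a small illustration of the naming scheme, the following sketch builds the log path for a given node number and prints the end of that log. The helper names node_log_path and show_node_log and the LOG_DIR variable are hypothetical and are not part of ClusterWare; only the /var/log/beowulf/node.<number> layout comes from the text above.

```shell
#!/bin/sh
# Sketch only: helper names and LOG_DIR are hypothetical, not ClusterWare tools.
# LOG_DIR defaults to the boot-log directory described above.
LOG_DIR="${LOG_DIR:-/var/log/beowulf}"

node_log_path() {
    # $1 is the node number, e.g. 5 gives $LOG_DIR/node.5
    printf '%s/node.%s\n' "$LOG_DIR" "$1"
}

show_node_log() {
    # $1 is the node number, $2 is an optional line count (default 20).
    log=$(node_log_path "$1")
    if [ -r "$log" ]; then
        tail -n "${2:-20}" "$log"
    else
        echo "no boot log found at $log" >&2
        return 1
    fi
}
```

For example, show_node_log 5 40 would print the last 40 lines of the boot log for node 5, if that log exists.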
What Gets Put on the Compute Nodes at Boot Time¶
Generally speaking, the /dev directory contains a subset of the devices present in the /dev directory on the master node. The /usr/lib/beoboot/bin/mknoderootfs script creates most of the /dev/ entries (e.g., zero, null, and random). /etc/beowulf/init.d/20ipmi creates ipmi0. /usr/lib/beoboot/bin/setup_fs creates shm and pts (as directed by /etc/beowulf/fstab). The hard drive devices (e.g., sda) are created at compute node bootup time, if local drives are discovered. If Infiniband hardware is present on the compute node, /etc/beowulf/init.d/15openib creates various device entries in /dev/infiniband/.
The /etc directory contains the ld.so.cache, localtime, mtab, and nsswitch.conf files. The node_up script creates a simple hosts file.
The /home directory exists as a read-write NFS mount of the /home directory from the master node. Thus, all the home directories can be accessed by jobs running on the compute nodes.
Additionally, other read-only NFS mounts exist by default, to better assist out-of-the-box application and script execution: /bin, /usr/bin, /opt, /usr/lib64/python2.3, /usr/lib/perl5, and /usr/lib64/perl5.
The node_up script mounts pseudo-filesystems as directed by /etc/beowulf/fstab: /proc, /sys, and /bpfs.
mknoderootfs creates /var and several of its subdirectories.
The /tmp directory is world-writeable and can be used as temporary space for compute jobs.
/etc/beowulf/config names various libraries directories that are managed by the compute node’s library cache. Run beoconfig libraries to see the current list of library directories. Caching shared libraries, done automatically as needed on a compute node, speeds up the transfer process when you are trying to run jobs, eliminates the need to NFS-mount the various common directories that contain libraries, and minimizes the space consumed by libraries in the compute node’s RAM filesystem.
Typically, when the loader starts up an application, it opens the needed shared libraries. Each open() causes the compute node to pull the shared library from the master node and save it in the library cache, which typically resides in the node’s RAM filesystem. However, some applications and scripts reference a shared library or other file that resides in one of those libraries directories without first using open() to access it, and so the file does not get automatically pulled into the library cache. For example, an application or script might first use stat() to determine if a specific file exists, and then use open() if the stat() is successful, otherwise continue on to stat() an alternative file. The stat() on the compute node will fail until an open() pulls the file from the master. The application or script thus fails to execute, and the missing library or file name is typically displayed as an error.
To remedy this type of failure, you should use a prestage directive in /etc/beowulf/config to explicitly name files that should be pulled to each compute node at node startup time. Run beoconfig prestage for the current list of prestaged files.
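The stat()-before-open() pattern that defeats the library cache can be sketched in a few lines of shell. The function name first_existing is hypothetical and is only meant to show why such a probe fails on a compute node: the existence test never opens the file, so nothing triggers the transfer from the master.

```shell
#!/bin/sh
# Sketch of the stat()-before-open() pattern described above.
# first_existing is a hypothetical helper, not part of ClusterWare.
first_existing() {
    # Probe each candidate with an existence test (a stat() call under the
    # hood) and print the first one found. No candidate is opened while
    # probing, so an uncached file on a compute node looks missing.
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "none of the candidate files exist: $*" >&2
    return 1
}
```

A script that calls first_existing with paths inside a cached libraries directory would report every candidate as missing on a freshly booted node; files probed this way are the ones worth naming in a prestage directive.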
/usr/lib/locale/locale-archive Internationalization¶
Glibc applications silently open the file /usr/lib/locale/locale-archive, which means it gets downloaded by each compute node early in a node’s startup sequence via the BProc filecache functionality. The default locale-archive is 95 MBytes in RHEL6 and over 100 MBytes in RHEL7. This download consumes significant network bandwidth and thus causes serialization delays if numerous compute nodes attempt to concurrently boot, and thereafter this large file consumes significant RAM filesystem space on each node. It is likely that a cluster’s users and applications do not require all the international locale data that is present in the default file. With care, the cluster administrator may choose to rebuild locale-archive with a greatly reduced set of locales and thus create a significantly smaller file that is less impactful on cluster performance.
Rebuilding and replacing locale-archive should be done on a quiescent master node, as the file is typically mmap’ed by running processes (e.g., crond, bash), and the appearance of a replacement version may perturb shells and other programs. For example, the shell that executes the rebuild may abort, or may issue an immediate warning message about an undefined environment variable. In the event that a problem does appear, you should reboot the master node. Otherwise, newly executing programs on the master node will use the updated locale-archive, and each compute node will employ the new file only after that node reboots.
In a RHEL5 environment, the glibc-common RPM installs the /usr/lib/locale/ directory containing the full set of locale definition files and a full locale-archive binary file. The build-locale-archive command rebuilds the locale-archive with every individual locale data file that is found in that directory. Thus, to reduce the size of locale-archive, you must first reduce the number of locale data files in that directory - but only after saving the default locale data files in a safe place, so you can later rebuild the locale-archive with a different set of locale data files as the cluster’s needs change. Beginning with the default /usr/lib/locale/ directory with its full set of locale data files:
[root@cluster ~] # cd /usr/lib
[root@cluster ~] # cp -a locale locale.default
[root@cluster ~] # (cd locale ; rm -fr *_*)
saves all the locale data files in a new directory and produces a stripped-down /usr/lib/locale/, leaving only the locale-archive file. Now reintroduce a smaller set of locale data files. For example, to include the U.S. English and British English locale files:
[root@cluster ~] # cp -a locale.default/en_US* locale
[root@cluster ~] # cp -a locale.default/en_GB* locale
When /usr/lib/locale/ contains the desired locale data files, perform the rebuild:
[root@cluster ~] # build-locale-archive
and reboot the master node and/or the compute nodes as needed.
In a RHEL6 environment, the glibc-common RPM installs just the full default locale-archive binary file. The default /usr/lib/locale/ directory contains no locale data files. Scyld ClusterWare has saved the default locale-archive as locale-archive.default and has created locale-archive.default.list as a text file containing a list of all the locales in that default file. To generate a smaller file, you start with the full default locale-archive, then eliminate locales from the full list using localedef --delete-from-archive, then execute build-locale-archive to finalize the new locale-archive file. To assist in this procedure, Scyld ClusterWare installs helper scripts and some sample locale lists. For example, to rebuild with just the U.S.-English locales:
[root@cluster ~] # cd /usr/lib/locale
[root@cluster ~] # ./rebuild-archive.sh locales.English_US
Or to include all the English language locales:
[root@cluster ~] # cd /usr/lib/locale
[root@cluster ~] # ./rebuild-archive.sh locales.English
When executed, the rebuild-archive.sh helper script prints details of what is being requested and asks for permission to proceed.
Several other sample locales.* files have been provided. The local cluster administrator can use one of these files, or can create a new custom file, as desired. Each such locales.* file should contain, one per line, either specific locales (e.g., en_US.utf8) or patterns that match one or more locales (e.g., en_US). For example, the locales.English file contains:
# All English language locales
en_
which is a pattern that matches every en_* locale.
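To preview which locales a given locales.* file would keep, a filter along the following lines may help. This is a sketch under stated assumptions: select_locales is a hypothetical helper, and it treats each non-comment line as a literal prefix, which matches the en_ example above but may not reflect every matching rule that rebuild-archive.sh applies.

```shell
#!/bin/sh
# Sketch only: select_locales is a hypothetical helper, and prefix matching
# is an assumption about how patterns in a locales.* file are interpreted.
# $1:    a locales.* style file (one locale or pattern per line, '#' comments)
# stdin: candidate locale names, one per line
select_locales() {
    # Drop comment lines and blank lines from the pattern file.
    patterns=$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" || true)
    while IFS= read -r locale; do
        while IFS= read -r p; do
            [ -n "$p" ] || continue
            case "$locale" in
                # Print the locale once if any pattern is a prefix of it.
                "$p"*) printf '%s\n' "$locale"; break ;;
            esac
        done <<EOF
$patterns
EOF
    done
}
```

One possible use is localedef --list-archive | select_locales locales.English, which lists the locales in the current archive that the file would select.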
Additionally, Scyld ClusterWare provides reset-archive.sh, a script that returns locale-archive to its original default state.
Caution
Note that for both RHEL6 and RHEL7, we recommend always including en_US* locales, just to be safe, as the default RHEL/CentOS distributions reference the LANG=en_US.utf8 locale in several /etc/ configuration files. Each Scyld ClusterWare 6-supplied locales.* file contains the suggested en_US locale pattern.
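Before rebuilding with a custom locales.* file, it may be worth checking that the file still covers en_US. The sketch below uses a hypothetical helper, covers_en_us, and again assumes that each non-comment line acts as a prefix pattern, so that en_, en_US, and en_US.utf8 all count as covering en_US.utf8.

```shell
#!/bin/sh
# Sketch only: covers_en_us is a hypothetical helper. It assumes that each
# non-comment line of a locales.* file is matched as a prefix of a locale name.
covers_en_us() {
    # $1: path to a locales.* style file.
    # Succeeds if some pattern in the file is a prefix of en_US.utf8.
    while IFS= read -r p || [ -n "$p" ]; do
        case "$p" in
            ''|'#'*) continue ;;        # skip blank lines and comments
        esac
        case "en_US.utf8" in
            "$p"*) return 0 ;;
        esac
    done < "$1"
    return 1
}
```

For example, covers_en_us locales.custom || echo "warning: no en_US pattern" would flag a custom list (here the hypothetical locales.custom) that omits the recommended locales.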
Site-Local Startup Scripts¶
Local, homegrown scripts to be executed at node boot time can be placed in /etc/beowulf/init.d/. The conventions for this are as follows:
Scripts should live in /etc/beowulf/init.d/
Scripts should be numbered in the order in which they are to be executed (e.g., 20raid, 30startsan, 45mycustom_hw)
Any scripts going into /etc/beowulf/init.d/ should be cluster aware. That is, they should contain the appropriate bpsh and/or bpcp commands to make the script work on the compute node rather than on the master node. Examine the Scyld ClusterWare distributed scripts for examples.
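A cluster-aware script might be structured along the following lines. This is only a sketch: the way the node number reaches the script (here a NODE environment variable, with the first argument as a fallback) is an assumption, so check one of the distributed scripts for the actual convention. The names run_on_node, main, and the BPSH override are hypothetical; bpsh itself is invoked as bpsh <node> <command>.

```shell
#!/bin/sh
# Sketch of a site-local script such as /etc/beowulf/init.d/45mycustom_hw.
# Assumptions (verify against the distributed scripts): the node number is
# available as $NODE or as the first argument. BPSH can be overridden for a
# dry run, e.g. BPSH=echo.
BPSH="${BPSH:-bpsh}"

run_on_node() {
    # Usage: run_on_node <node-number> <command> [args...]
    # Runs the command on the compute node rather than on the master.
    node=$1
    shift
    "$BPSH" "$node" "$@"
}

main() {
    node="${NODE:-${1:-}}"
    if [ -z "$node" ]; then
        echo "usage: $0 <node-number> (or set NODE)" >&2
        return 2
    fi
    # Example action: report the kernel release seen on the compute node.
    run_on_node "$node" uname -r
}
```

A real script would end with a call such as main "$@"; it is omitted here so that the functions can be tried in isolation, for instance with BPSH=echo to print the command instead of running it.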
Any local modifications to Scyld ClusterWare distributed scripts in /etc/beowulf/init.d will be lost across subsequent Scyld ClusterWare updates. If a local sysadmin believes a local modification is necessary, we suggest:
Copy the to-be-edited original script to a file with a unique name, e.g.:
cd /etc/beowulf/init.d
cp 37some_script 37some_script_local
Remove the executable state of the original:
beochkconfig 37some_script off
Edit 37some_script_local as desired.
Thereafter, subsequent ClusterWare updates may install a new 37some_script, but the update will not re-enable the executable state of that script, so it remains disabled. The local 37some_script_local remains untouched. However, keep in mind that the newer ClusterWare version of 37some_script may contain fixes or other changes that need to be reflected in 37some_script_local because that edited file was based upon an older ClusterWare version.
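One way to notice that an update has changed the distributed script is to keep a pristine copy of it at the time the local copy is made, and compare after each update. The sketch below assumes a baseline directory outside /etc/beowulf/init.d; STATE_DIR, record_baseline, and changed_since_baseline are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: the baseline directory and helper names are hypothetical.
STATE_DIR="${STATE_DIR:-/root/init.d-baselines}"

record_baseline() {
    # $1: path to the distributed script, e.g. /etc/beowulf/init.d/37some_script
    # Saves a pristine copy to compare against after later updates.
    mkdir -p "$STATE_DIR" && cp -p "$1" "$STATE_DIR/$(basename "$1")"
}

changed_since_baseline() {
    # $1: path to the distributed script.
    # Succeeds if its contents differ from the saved baseline.
    ! cmp -s "$1" "$STATE_DIR/$(basename "$1")"
}
```

After an update, changed_since_baseline /etc/beowulf/init.d/37some_script succeeding would be a cue to diff the new script against the baseline and carry any relevant changes into 37some_script_local.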
Sample Kickstart Script¶
Non-Scyld nodes can be provisioned using the Red Hat kickstart utility. The following is a sample kickstart configuration script, which should be edited as appropriate for your local cluster:
# centos 5u3 (amd64) hybrid example kickstart
install
reboot
# point to NFS server that exports a directory containing the iso images of centOS 5.3
nfs --server=192.168.5.30 --dir=/eng_local/nfs-install/centos5u3_amd64
lang en_US.UTF-8
keyboard us
xconfig --startxonboot
network --device eth0 --bootproto dhcp --onboot yes
#network --device eth1 --onboot no --bootproto dhcp
rootpw --iscrypted $1$DC2r9BD4$Y1QsTSuL6K9ESdVk18eJT0
firewall --disabled
selinux --disabled
authconfig --enableshadow --enablemd5
timezone --utc America/Los_Angeles
bootloader --location=mbr
key --skip
# The following is commented-out so nobody uses this by accident and
# overwrites their local harddisks on a compute node.
#
# In order to enable using this kickstart script to install an operating system
# on /dev/sda of your compute node and thereby erasing all prior content,
# remove the comment character in front of the next 4 lines:
# clearpart --linux --drives=sda
# part /boot --fstype ext3 --size=100 --ondisk=sda
# part swap --fstype swap --size=2040 --ondisk=sda
# part / --fstype ext3 --size=1024 --grow
#############################################################################
%packages
@ ruby
@ system-tools
@ MySQL Database
@ Editors
@ System Tools
@ Text-based Internet
@ Legacy Network Server
@ DNS Name Server
@ FTP Server
@ Network Servers
@ Web Server
@ Server Configuration Tools
@ Sound and Video
@ Administration Tools
@ Graphical Internet
@ Engineering and Scientific
@ Development Libraries
@ GNOME Software Development
@ X Software Development
@ Authoring and Publishing
@ Legacy Software Development
@ Emacs
@ Legacy Software Support
@ Ruby
@ KDE Software Development
#@ Horde
@ PostgreSQL Database
@ Development Tools
#@ Yum Utilities
#@ FreeNX and NX
kernel-devel
OpenIPMI-tools
openmpi-devel
sg3_utils
#############################################################################
%pre
# anything you want to happen before the install process starts
#############################################################################
%post
#!/bin/bash
# anything you want to happen after the install process finishes
masterip=10.56.10.1
wget http://$masterip/sendstats
chmod +x sendstats
mv sendstats /usr/local/sbin/
echo "/usr/local/sbin/sendstats" >> /etc/rc.local
# If you get the blinking cursor of death and no OS post, then uncomment this.
#grub-install --root-directory=/boot hd0
#grub-install --root-directory=/boot hd1
#grub-install --root-directory=/boot hd2
# Removes rhgb and quiet from grub.conf
sed -i /boot/grub/grub.conf -e 's/rhgb//g;s/quiet//g'
# Sets up the serial console in grub.conf
# TODO
# turns off cpuspeed
chkconfig cpuspeed --level 123456 off
# changes xorg.conf from mga to vesa
sed -i /etc/X11/xorg.conf -e 's/mga/vesa/'
# turns on ipmi
chkconfig ipmi on
chkconfig sshd on
wget http://$masterip/done