Changelog
See Release Notes for summary information about the latest ClusterWare release. This section contains a more detailed ChangeLog history of all releases.
11.9.4-g0000 - February 9, 2024
Fix an old logic bug that caused uploads to take much longer than necessary.
Improve the head node leave and eject behavior.
Use a prebuilt version of the squashfs binaries for packing and unpacking images.
Sort scyld-modimg ls and scyld-nodectl selector output.
Improvements to scyld-nss.
Fix PREFER_MOD handling in scyld-mkramfs.
Cleanly shut down the telegraf relay daemon to avoid socket-in-use errors.
Replace incorrect use of .netloc() with calls to .hostname().
Additional bug fixes backported from ClusterWare 12.1 development.
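The netloc versus hostname distinction mentioned in this release can be sketched as follows. This is only an illustration and assumes Python's standard urllib.parse, where both are attributes rather than method calls; the changelog does not name the library involved, and the URL is hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "https://admin:secret@head1.example.com:8443/api/v1/nodes"
parts = urlsplit(url)

# netloc is the whole authority component: user info, host, and port.
print(parts.netloc)

# hostname is the host alone, which is what a host comparison usually needs.
print(parts.hostname)

# port is parsed separately as an integer (None when no port is given).
print(parts.port)
```

Code that compares netloc against a bare host name stops matching as soon as a port or credentials appear in the URL, which is the kind of mistake this fix suggests.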
11.9.3-g0000 - October 11, 2023
Improved proxy variable handling in scyld-install.
Include latest version of scyld-nss but do not install it by default.
Fix reboot to properly wait for shutdown -r to fail before trying a hard reset.
Remove ssh banner handling and change the default ssh LogLevel.
Due to licensing the clusterware-ansible package now includes ansible-core instead of ansible.
Include a default /etc/ansible/ansible.cfg in clusterware-ansible.
Fix scyld-nodectl exec --label behavior.
Improve backend performance by better batching of database requests.
Include scyld-install code to upgrade from version 11 to version 12.
Additional bug fixes backported from ClusterWare 12 development.
11.9.2-g0000 - January 9, 2023
Override a new 1GB upload limit in httpd.
Directly serve ISO contents for faster access.
Default to faster public gitrepo access via Git Smart HTTP.
Fix openSUSE image creation on el8.
Assorted other bug fixes.
11.9.1-g0000 - November 4, 2022
Assorted improvements to scyld-nss.
Fix two GUI crashes when viewing image details.
Remove dependency on libcgroup that caused image creation failures.
Fix bug where an 11.9.0 head node could not join a pre-11.9.0 head node.
Add an SELinux module in the clusterware-ansible package.
Fix scyld-modimg hang on systems with SELinux disabled.
Restrict ZTP-boot to Cumulus switches.
Assorted other bug fixes.
11.9.0-g0000 - September 30, 2022
Initial support for ZTP-boot for switches.
Implement a couple fixes for the install-time el8 STIG.
Include assorted tpm2_* tools into the initramfs for storing encryption keys in the compute node's TPM.
Include the nvme driver in the initramfs for disked booting on NVMe storage.
Improved scyld-modimg SELinux labelling with a parallel setfiles.
Fix a regression that broke the sshpass integration.
Fix isc-dhcpd.log parsing in el8 and el9.
Fix openSUSE image creation.
Initial scyld-nss implementation for compute node name resolution on head nodes without DNS.
Attempt to install nscd during image creation.
Implement attribute substitution in power_uris.
Initial support for a new _ansible_pull_now attribute.
Enable repo_gpgcheck in our software repositories for el8 and later.
Implement scyld-install --non-interactive for unattended installs.
Disable and remove additional services during scyld-install --clear-all.
Update to etcd 3.5.4.
Include a new GPU data collection script for TICK.
Assorted other SELinux updates, bug fixes, and scaling improvements.
11.8.2-g0000 - August 19, 2022
Fix a timeout during batch-create for nodes.
Capture more log files in scyld-sysinfo.
Greatly reduce calls to rpm in update-node-status.
Ensure tftp starts after reboot on el8 head nodes.
11.8.1-g0000 - August 2, 2022
Fix a regression that could cause a crash during image cloning.
Fix bulk node creation when using a @contents.json source.
Hide the BMC password in scyld-nodectl sol output.
Fix handling of --binary in scyld-nodectl exec.
Enable Git Smart HTTP for head-node hosted git repositories. Further Git improvements coming soon.
Assorted other bug fixes.
11.8.0-g0000 - June 17, 2022
Fix a scyld-nodectl ping crash where a down node is still ping-able.
Default to not compressing data during image capture, making the process significantly faster.
Better checking that the ClusterWare installation source matches the system where the installation is running.
Fix a bug that would render scyld-*ctl tools unusable when deleting an in-use naming pool.
Fix long-standing bug where changing a node MAC would not be pushed to the DHCP server without another network change or a service restart.
Use unsquashfs to unpack cwsquash file systems during rwram booting for a significant speedup.
Implement variable replacement in boot configuration command lines.
Add a node script to configure BMC settings based on a mixture of the power_uri field along with the _ips and _gateways attributes.
Rewrite handling for adding unknown nodes to the cluster when they are seen making DHCP requests. Pre-loading MACs is still preferred.
Fix scyld-clusterctl heads clean to clean unused files in more cases.
Include more variables (groups, power_uri, etc.) in the [Node] section of the attributes.ini on compute nodes.
Clean up basic.ks kickstart example file and separate ClusterWare related parts into an includable file.
Fix down head handling on locally installed systems.
Separate more IPv4 and IPv6 code and logging.
Assorted changes to support Python 3.9 and later.
Snap pip package versions forward.
Add scyld-install --iso support to allow administrators to install from a downloadable ClusterWare ISO.
Further refine support for installing RHCOS from an uploaded ISO using their ignition system.
Implement "then" support to allow multiple steps in a single command when using ClusterWare command line tools.
Fix bugs with manually modifying an existing naming pool.
Implement scyld-nodectl power setnext <bootdev> to set the next boot device for a node.
Define parent-head-node inside scyld-modimg --chroot.
Update /etc/hosts on compute nodes when they cycle from a down head to a working one.
Implement scyld-modimg --write-repos to rewrite an image's clusterware-node.repo file based on the current head node configuration.
Add a variety of new tests to better catch regressions.
Assorted other bug fixes and scaling improvements.
11.7.2-g0000 - April 6, 2022
Correct a pair of bash variable name collisions in update-node-status and the custom dracut module.
Auto-reconnect for scyld-nodectl sol.
Tweak the adjust-repos.sh script to work before find is installed.
Catch a very unlikely case where nodes have the same system UUID.
Assorted other bug fixes.
11.7.1-g0000 - March 18, 2022
Small changes so _boot_rw_layer=rwtab works.
Hide last_modified_on in the output from my tools.
Fix script URL if the iPXE boots from the second network.
Recognize uploaded RHEL CoreOS ISOs.
Improve the export and iso download progress output.
Administrators can use URLs as sources when uploading ISOs and other binaries.
Implement scyld-install --os-iso and the corresponding scyld-add-boot-config --iso.
Do not automatically use vault.centos.org as a DefaultImage installation source.
Assorted other bug fixes.
11.7.0-g0000 - February 23, 2022
Detect when a CentOS 8 image is being created and update the files in /etc/yum.repos.d to use the vault.centos.org server.
Change the basic.ks example so the node sets its own _boot_style=next before rebooting instead of powering off.
Catch a code path that could result in duplicate names for images.
Support name= syntax to clear attributes and fields.
Give administrators more control over image creation details via the --pkgmgr parameter.
Implement a document cache to improve database performance.
Support unconfiguring bootnet at the end of the initramfs.
Significantly expand the scyld-nodectl waitfor functionality.
Support per-node _gateways and _macs reserved attributes.
Accept additional MAC address formats.
Recognize nodes by DUID to support booting with IPv6 via dhcpv6.
Split dhcpd.conf.template into multiple parts for easier management.
Restore more specific success and error messages from node power control commands.
Implement _disk_wipe and add encryption support to _disk_cache reserved attributes.
Add a reboot-kexec command to the clusterware-node package to trigger kexec rebooting from inside the node.
Include mdadm in the initramfs if it is installed in the image.
Fix bug where scyld-modimg auto-deletes a cached item and then immediately tries to use it.
Generalize IP code to properly handle IPv6 addresses.
Properly handle interfaces with more than one address.
Include the ipxe.iso for booting virtual machines using IPv6.
Assorted Ubuntu compute node networking fixes.
Remove unused URL parameters passed during the iPXE script download.
Improve scyld-sysinfo data collection on Ubuntu compute nodes.
Include Scyld TICK packages in scyld-install --clear-all / --update.
Correctly terminate scripts with non-zero exit code on error.
Remove the unused ip= kernel argument added during compute node boot.
Better error handling during chain booting.
Automatically update compute node hostnames when the node name or _hostname attribute changes.
Allow for clearing selected database object fields by setting their value to an empty string, e.g. the node-level cmdline field.
Log more information about each request in the api_access_log.
Assorted other bug fixes and scaling improvements.
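The entry about accepting additional MAC address formats can be illustrated with a small normalizer. This is a sketch under assumptions: the changelog does not list which formats ClusterWare accepts, so the separators handled here (colon, dash, dot, or none) and the function name normalize_mac are hypothetical.

```python
import re


def normalize_mac(text: str) -> str:
    """Return a MAC address as lowercase, colon-separated octets.

    Hypothetical helper: strips common separators, then validates that
    exactly 12 hexadecimal digits remain.
    """
    digits = re.sub(r"[:\-.\s]", "", text)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a MAC address: {text!r}")
    digits = digits.lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


# The same address written four different ways.
for sample in ("00:1A:2B:3C:4D:5E", "00-1a-2b-3c-4d-5e",
               "001a.2b3c.4d5e", "001A2B3C4D5E"):
    print(normalize_mac(sample))
```

Normalizing to one canonical form before storing or comparing means two spellings of the same address cannot be treated as different nodes.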
11.6.0-g0000 - October 31, 2021
Fix a boot-time systemd service race condition by forcing NetworkManager.service to wait for cw-boot-prenet.service to complete.
Require scyld-adminctl keys to be unique so they can be used to identify the user in ssh sessions during git clone.
Update headctl to automatically open and close firewalld ports.
Empty ~/.scyldcw/ during scyld-install --clear-all but leave the logs directory intact.
Bump busybox version to 1.34.1 and remove unused build options.
Update the slurm-scyld packages to version 21.08.3.
Update the TICK packages: telegraf to version 1.20.2, influxdb to version 1.8.10, and chronograf to 1.9.1.
Fix a boot chaining issue where peers waited on each other, and enable boot chaining by default with chaining.enable now defaulting to True.
Make argument order optionally much more flexible.
Better repository related error messages from scyld-install.
Ensure node FQDNs are placed before the corresponding short names in our dnsmasq hosts file.
Further improvements to scyld-cluster-conf to handle more config file formats.
Rewrite naming pool pattern collision detection.
FIPS fixes for Lark grammar parser.
Add support for pushing IPs from secondary pools into node _ips attributes.
Include multiple (default up to 3) head nodes for DNS through dhcp.
Document head node support for AlmaLinux.
Add a new optional clusterware-ansible package including boot-time ansible-pull support.
Use symlinks when including git in ClusterWare packages.
Implement scyld-nodectl waitfor using --selector logic.
Expose and improve the scyld-nodectl --selector support.
Client-side recursive delete for boot configuration and distros.
Delete unmodified images from the scyld-modimg cache older than one hour.
Add an optional uname wrapper inside scyld-modimg --chroot.
Snap pip and npm package versions forward.
SELinux improvements for MLS and RHEL registration.
Replace systemctl restart with systemctl reload in the installer and headctl scripts.
Set the new id_method status field based on how the head node identified the compute node during the status update.
Add new scyld-clusterctl gitrepos and scyld-clusterware certs tools.
Implement --version in scyld-kube so administrators can deploy a specific version.
Assorted other improvements and bug fixes.
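The 11.6.0 change that places node FQDNs before short names in the dnsmasq hosts file follows the usual hosts-file layout, where the first name after the address is treated as the canonical one. A minimal sketch, with a hypothetical helper name and example values that are not taken from ClusterWare:

```python
def hosts_line(ip: str, short_name: str, domain: str) -> str:
    """Build one hosts-file line with the FQDN ahead of the short name.

    Hypothetical helper for illustration only.
    """
    fqdn = f"{short_name}.{domain}"
    return f"{ip} {fqdn} {short_name}"


print(hosts_line("10.10.24.101", "n1", "cluster.local"))
```

With this ordering, a reverse lookup of the address would typically return the fully qualified name rather than the short alias.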
11.5.1-g0001 - October 5, 2021
SELinux changes for manipulating MLS images.
Correct scyld-modimg discard behavior.
Correct scyld-nodectl ping and related failures.
Assorted other bug fixes.
11.5.1-g0000 - October 4, 2021
Significant changes to scyld-cluster-conf to improve handling of less common configurations and make future maintenance simpler.
Check for rpmsave / rpmnew files post upgrade and notify the cluster administrator that they should be addressed.
Fix a typo in the influxdb rsyslog configuration.
Add parent-head-node to /etc/hosts inside chroots to more closely match deployed image behavior.
Significant rearrangement of scyld-modimg to make future maintenance simpler.
Fix uid/gid handling in scyld-modimg --import. Tool now expects any tar being imported to contain correct numeric uid / gid instead of matching names against the host system.
Remove deprecated replace code in favor of update calls.
Use inst.ks= instead of deprecated ks=. Note that this does break kickstarting for el6.
Fix a bug in scyld-nodectl exec for nodes that changed IP addresses after a ClusterWare head node upgrade.
Remove most references to the cwtar format but ensure scyld-modimg can still import and repack the format.
Test Rocky Linux 8.4 head nodes.
Update busybox used in the custom initramfs.
Remove remaining traces of NFS Ganesha integration.
Small improvements to image capture functionality.
Customize the iPXE user-class (now CWiPXE) to more tightly control the iPXE boot process.
Force iPXE to use low numbered ports when fetching boot files.
Version bumps for many third-party packages.
Include a custom-built version of git for future features.
Remove long deprecated code.
Assorted other improvements and bug fixes.
11.5.0-g0001 - September 2, 2021
Fix a regression that broke kickstart and live booting.
11.5.0-g0000 - August 26, 2021
Default to etcd for new installs.
Add HA kubernetes support to the scyld-kube utility.
Implement automatic etcd compact and defrag on head nodes as well as auto-eject and auto-rejoin for the head node cluster.
Both general and etcd specific performance improvements.
Ensure _no_boot stops soft and hard power control commands unless --force is specified.
Improve scyld-mkramfs --update to accept a --kver <VERSION> option when updating a boot configuration after a kernel upgrade.
Implement scyld-nodectl reboot --kexec in preparation for image previewing.
Reduce resource usage in scyld-nodectl status --refresh.
Ongoing effort to encrypt more compute node to head node and compute node to compute node communications.
Reduced polling by implementing waiting on database content changes.
Much cleaner logging during service shutdown.
Code and endpoint removal and cleanups.
Assorted other improvements and bug fixes.
11.4.3-g0000 - July 8, 2021
Confirm image creation from an ISO is limited to packages on the ISO.
Improved FIPS support for earlier RHEL and CentOS releases.
Improved proxy handling during scyld-install --update.
Rewrite scyld-nodectl status --refresh to handle more corner cases and terminal resizing.
Include scyld package versions in scyld-nodectl ls -L output.
Add support for redirecting stdout and stderr into per-node local files when using scyld-nodectl exec.
Fix bugs around manually setting network database parameters.
Simplify URLs used during the early boot process.
Change kubeadm runtime to use containerd instead of docker.
Change dnsmasq SRV request handling to immediately return an invalid response to requests from Couchbase.
Update the slurm-scyld packages to version 20.11.8.
Update the TICK packages: telegraf is version 1.18.3, influxdb is version 1.8.6, chronograf is 1.8.10, and kapacitor is version 1.5.9.
Assorted other improvements and bug fixes.
11.4.2-g0000 - June 4, 2021
Upgrade OpenMPI removing the Slurm library version dependency.
Initial support for MLS on RHEL 7 and CentOS 7 head nodes.
Support scyld-nodectl ping to ping nodes on demand.
Support URL encoding of the password section of the power_uri to handle additional characters. Any power_uri currently containing % may need to be updated.
Use excludes.txt to exclude specific directories in the squashfs packer.
Exclude paths from the setfiles call when exiting the chroot.
Add additional security related HTTP headers.
Small updates to the ReactJS GUI including fixing checkbox behavior.
Fix a managedb save regression that defaulted to saving into a directory instead of a file.
Assorted other improvements and bug fixes.
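The 11.4.2 note on URL encoding the password section of a power_uri can be sketched with Python's standard urllib.parse. The URI layout, the ipmi scheme, and the helper name build_power_uri are assumptions made for illustration; consult the ClusterWare documentation for the actual power_uri syntax.

```python
from urllib.parse import quote, unquote, urlsplit


def build_power_uri(scheme: str, user: str, password: str, host: str) -> str:
    """Assemble a URI with percent-encoded credentials (hypothetical layout)."""
    # safe="" also escapes "/", so "@", ":", "/" and "%" in the password
    # cannot be confused with URI delimiters.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}"


uri = build_power_uri("ipmi", "admin", "p@ss:w/rd%1", "10.0.0.5")
print(uri)

# Decoding the password field recovers the original string.
print(unquote(urlsplit(uri).password))
```

This also shows why an existing power_uri containing a literal % may need updating: once the field is decoded, a bare % has to be written as %25 to survive the round trip.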
11.4.1-g0000 - April 30, 2021
Significant performance improvements from reducing database contention in high node count clusters.
Technology preview: Adding etcd as an alternative database backend. etcd should allow for further scale improvements in later releases.
Announcing support for Oracle Linux 7 and 8. These operating systems are now supported for both head nodes and compute nodes in all the configurations supported for RHEL and CentOS.
Expand the information collected about hardware and firmware versions at boot time. This information expands the possibilities for cluster administrators to detect and track cluster changes.
Improved FIPS and MLS support. ClusterWare now supports CentOS and RHEL 8 compute node images in MLS enforcing mode, and FIPS 140-2 is fully supported on head nodes and compute nodes across the cluster.
Removing NFS compute node root file system integration. In modern, scalable clusters the benefits of separating compute nodes from head nodes (e.g. simplicity, performance, security, and independence from head nodes post boot) significantly outweigh the costs of running the core operating system from RAM.
Support switching database backends on existing clusters using headctl.
Streamline the peer download process used to pass files between head nodes.
Improve initramfs LUKS support when booting ephemeral compute nodes with _disk_root and boot style disked.
Fix multiple problems with CentOS / RHEL 8 kickstart support. The example basic.ks kickstart file now works for versions 7 and 8.
Print any unexpected errors from the setfiles call when exiting the scyld-modimg chroot.
Correct IP calculations in more complicated cluster configurations.
Improve scyld-nodectl exec across large node counts.
Better banner filtering from scyld-nodectl exec and soft power control via scyld-nodectl <reboot|shutdown>.
Attempt to install rdma-core and fipscheck packages in newly created compute node images.
Improve backend caching and hinting for IP/MAC/name to UID translations.
Improved FIPS support on compute nodes. This change requires upgrading clusterware-node inside images and rebuilding the initramfs files.
Deprecate NFS Ganesha integration and obsolete the clusterware-ganesha package.
Correct CentOS 6 images to point at the CentOS vault during creation.
Improve scyld-install to stop earlier on error.
Include mlx5_core by default in initramfs files.
Implement stored selectors and dynamic groups.
Names of dynamic groups cannot collide with attribute group names.
Dyngroups can reference other dyngroups but can be slow to evaluate.
Properly report group join / leave failures.
Update the parser used on node specifications.
Upgrade and refreeze all pip packages to latest versions.
Remove unnecessary pip packages from the virtual environment.
Assorted other improvements and bug fixes.
11.4.0-g0000 - January 22, 2021
Initial kubeadm support. This adds a new clusterware-kubeadm package providing a scyld-kube command. See Kubernetes for details.
Support passing %group to scyld-nodectl in place of a node specification to affect all nodes in the named group.
When installing from a ClusterWare ISO, upload that ISO into the ClusterWare system as a repo and update any file:// URLs in clusterware.repo accordingly.
Properly copy gpgcheck values from /etc/yum.repos.d/clusterware.repo on the head node to the clusterware-node.repo during image creation.
Do not ask for a ClusterWare password when stdout is not a terminal.
Support CentOS Stream for head nodes and compute node images.
The Job Scheduler -scyld.setup scripts now support optionally naming specific nodes (vs. presuming all up nodes) for the actions init, reconfigure, and update-nodes. See Job Schedulers.
Remove some unused Couchbase-related Requires and BuildRequires from the spec file.
Default to using the current $USER in scyld-* commands when no client.authuser is defined in settings.ini.
Provide ISO contents over HTTP via a /repo/<name>/content/ URL.
Do not record virt-what output on non-virtual nodes.
Update libvirt-python to 6.10.0. Expect many other Python and NPM packages to be updated in following releases.
Enable HTTPS communication during head node installation.
Fix set-node-attribs command line parsing, and treat arguments without '=' as a request to delete the named attribute.
Do not install clusterware-ganesha during head node installation.
Fix kernel version detection when multiple /lib/modules/<kernel>/ directories exist.
Switch backend subexec from multiprocessing to threading, thereby making some deadlocks much less likely.
Assorted other improvements and bug fixes.
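The set-node-attribs rule in this release, where an argument without '=' is a request to delete the named attribute, can be sketched as a small parser. This is not the tool's actual code; the function name and return shape are hypothetical.

```python
def parse_attrib_args(args):
    """Split command-line words into attributes to set and names to delete.

    Hypothetical sketch: "name=value" sets an attribute, a bare "name"
    requests deletion.
    """
    to_set, to_delete = {}, []
    for arg in args:
        # partition splits on the first "=" only, so values may contain "=".
        name, sep, value = arg.partition("=")
        if sep:
            to_set[name] = value
        else:
            to_delete.append(name)
    return to_set, to_delete


print(parse_attrib_args(["_boot_style=rwram", "note=a=b", "old_attr"]))
```

Splitting on the first '=' only keeps values such as kernel command-line fragments intact.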
11.3.0-g0001 - December 2, 2020
Simplify initramfs rwram booting with SELinux by fully preserving rather than restoring SELinux contexts from the image.
Compute IPs at node creation time instead of waiting for the leases daemon to compute the same. Clearing the ip field via scyld-nodectl up ip= will trigger immediate recomputation.
Confirm incoming _boot_config and _boot_style strings are usable before accepting them.
Adapt initramfs scripts to boot Ubuntu and Debian images.
Improved support for customizing initramfs files through scyld-mkramfs.
Add scyld-mkramfs --update <bootconfig> to simplify the common case where a cluster administrator wants to update the initramfs in an existing boot configuration.
Initial implementation of scyld-chroot inside scyld-modimg chroots including copyin, copyout, and info.
Fully disable backend image repacking since we now only use a single image format.
Capture more information about compute node storage and infiniband hardware.
Expand the yum and dnf handler to also support zypper systems, i.e. openSUSE.
Try to install less, iperf3, and cryptsetup when creating images.
Initial implementation of scyld-bootctl import to match the existing export command.
Assorted other improvements and bug fixes.
11.2.2-g0000 - October 30, 2020
Add /opt/scyld/clusterware/bin/headctl script to enable / disable Apache features on the head node. Can enable / disable HTTPS and set compute nodes to prefer HTTPS communication. Will default to preferring HTTPS in a future release.
Compute nodes verify server identity provided by HTTPS when possible, but default to accepting unverified head nodes.
Further address a low probability file corruption bug when scyld-modimg unpacks images.
Fix IP collision bug introduced in 11.2.1 so that X.X.X.1 is not detected as matching X.X.X.1[0-9]+.
The scyld-tool-config tool will generate an HTTPS base_url field when connecting to any server other than localhost.
Assorted SELinux updates for basic MLS policy.
Increase default password lengths as they are rarely manually entered.
Rearrange Apache configuration files to simplify changes in /etc/httpd and add a CW-Proxy-Secret header to confirm when backend system can trust other forward-related headers.
Double Python thread count to 32.
Initial LUKS in the initramfs providing encryption-at-rest for ephemeral compute node boot style disked with _disk_root.
Initial implementation of compute node peer downloads for boot chaining. Controlled by chaining.enable base.ini variable that defaults to False.
Add arping to busybox for early dhcp client scripts.
Remove deprecated arguments and content from dhcpd.conf.template, scyld-clusterctl, mount_rootfs, etc.
Add scyld-nodectl sol [--enable|--steal] options.
Include node hostnames in dhcp offers and more aliases in dns.
Expanded support for _ips to create ifcfg-IFACE files.
Include public ssh host keys in compute node status.
Pass the head node's gateway to compute nodes on the same network.
Capture more hardware (IB, NVMe) details during node boot.
Assorted other improvements and bug fixes.
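The IP collision fix in this release (X.X.X.1 no longer matching X.X.X.1[0-9]+) describes a classic text-prefix pitfall. The changelog does not say how the original check was written, so the following is only one plausible illustration, with hypothetical function names, of the failure and of a comparison that avoids it.

```python
import ipaddress


def naive_match(a: str, b: str) -> bool:
    # Prefix comparison on text: "10.0.0.1" wrongly matches "10.0.0.12".
    return b.startswith(a)


def exact_match(a: str, b: str) -> bool:
    # Compare parsed addresses so only identical IPs count as a collision.
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)


print(naive_match("10.0.0.1", "10.0.0.12"))
print(exact_match("10.0.0.1", "10.0.0.12"))
print(exact_match("10.0.0.1", "10.0.0.1"))
```

Parsing before comparing also makes the check work unchanged for IPv6 addresses, which have several equivalent textual spellings.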
11.2.1-g0000 - September 24, 2020
Add mount/umount back into sudoers.d for Ganesha exports.
Fix Ganesha export permissions.
Disable backend repacking.
Disable zypper detection that triggered in odd circumstances.
Fix parsing of distribution major number.
Exclude tests folders from clusterware-tools.
Fix percent sign use in _boot_tmpfs_size.
11.2.0-g0000 - September 4, 2020
Support for CentOS / RHEL 8 head nodes.
Remove cwtar as a backend image format, leaving only cwsquash.
Fix scyld-modimg crash on bad --query.
Fix scyld-nodectl ls -l (and ls -L) ram_total and scyld-nodectl status -L ram_free output.
Fix permissions when creating files in sync-uids.
Fix scyld-modimg --create for CentOS 8.0 / 8.1.
Wait for rebalance to complete when joining head nodes.
Allow for zero-padding of node names.
Add more scyld-nodectl ls -l and ls -L output fields.
Rework scyld-add-boot-config to be more flexible.
Include example node.sh for locally installed compute nodes.
Only use local authentication when connecting to local server.
Improve locally installed compute node hostname handling.
Combine and improve calls to file to identify objects.
Remove remaining bits of bpstat and other legacy tools.
Install an example settings.ini during scyld-install.
Shorten paths in some output to make output more readable.
Trick mksquashfs into providing more detailed progress.
Clean up and standardize database failure cases, and resume daemons when database recovers.
Implement database purge and improve scyld-install --clear.
Improve package removal during scyld-install --clear-all.
Change the cwsquash format to use a GPT partition table.
Move ganesha SELinux rules into the clusterware-ganesha package.
Improve the take-snapshot tool, which performs database backups and manages retention of those backups, typically executing as a cronjob. See take-snapshot in the Reference Guide.
Improvements to scyld-sysinfo, including no longer requiring setup of user root authentication to capture state of compute nodes.
Assorted other improvements and bug fixes.
11.1.2-g0001 - July 8, 2020
Patch pyramid in the virtual environment to allow a non-security use of md5.
11.1.2-g0000 - July 1, 2020
Initial implementation of node naming pools.
scyld-install update now calls managedb update.
Head and compute node status includes their "now" timestamp.
Initial implementation of head nodes as a chrony pool, disabled by default.
Squashfs tools now use 50% of the available processors, although this is configurable.
Boot time set_hostname.sh script now uses hostname instead of hostnamectl on CentOS 6.
Fix an authentication race that triggered password prompts.
Initial support for CentOS 8.2 compute node images.
Add a short (0.03s) cache in the database layer.
Improved kickstart menu generation.
Use enabled=0/1 in /etc/yum.repos.d/clusterware.repo to avoid inadvertent yum updates.
Changes to scyld-install in preparation for CentOS 8.
Expanded variable substitution in kickstart files.
Improved SELinux permissions on enforcing compute nodes.
Fix file descriptor leak causing "too many open files" error.
Support X-Sendfile when downloading images and boot files.
scyld-modimg --query lists all installed packages.
Fixes to scyld-modimg discard and upload logic.
Assorted other improvements and bug fixes.
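The short (0.03s) database-layer cache listed in this release can be pictured as a time-limited lookup cache. The sketch below is a generic illustration, not ClusterWare's implementation; the class name, the loader callback, and the injectable clock are all assumptions.

```python
import time


class ShortCache:
    """Cache loader results for a very short time-to-live (hypothetical)."""

    def __init__(self, ttl=0.03, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so tests can control time
        self._entries = {}          # key -> (timestamp, value)

    def get(self, key, loader):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]           # still fresh: skip the backend call
        value = loader(key)         # expired or missing: reload and store
        self._entries[key] = (now, value)
        return value


cache = ShortCache()
print(cache.get("n0", lambda key: key.upper()))
```

Even a window this short can collapse bursts of identical reads issued while handling a single request, while keeping stale data exposure negligible.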
11.1.1-g0002 - May 27, 2020
Only updating clusterware-tools and these release notes.
Remove a log statement that caused a crash in scyld-nodectl exec when providing stdin.
Conditionally reinstate some initramfs code that is required to successfully boot a cwsquash image with style rwram.
11.1.1-g0001 - May 21, 2020
Use cgroups to identify and terminate child processes from a chroot.
Ignore /tmp and /var/tmp when correcting SELinux contexts in a chroot.
Use the head IP instead of the gateway IP in iscsi boot style.
Database cleaning code is now aware of uploaded ISO files.
Cleaning code will not attempt to connect to a down head node.
11.1.1-g0000 - May 19, 2020
Clearer errors from the client tools when the head node is unresponsive.
Handle a large upload timing out, fixing the "size does not match" error.
Add mechanism for starting a long running task and checking for results in separate calls with a custom HTTP header.
Rewrite remotely deleted files detection to reduce the chances of leaving .old.00 files.
More daemons now clean up their leftovers in the workspace/ directory.
Add storage cleaning support via the scyld-clusterctl heads clean command. See scyld-clusterctl for details.
The status of ClusterWare services on a head node or nodes can now be checked and changed via the scyld-clusterctl heads service command. See scyld-clusterctl for details.
Fix a case that failed to find the disk during iscsi booting.
Improvements to libvirt power control for VM compute nodes.
Improved logging in SSH and Couchbase failure cases.
Nodes can be reordered using scyld-cluster-conf load without losing configuration.
Fix a cloning failure that left file copies in /opt/scyld/clusterware/storage/.
Display "|deleted|" when a database link is broken in scyld-bootctl or scyld-attribctl.
More consistent error and success messages from power on/off/status.
Reduce database calls in common code paths.
Small fixes to /opt/scyld/clusterware-installer/make-iso for ISO image generation.
When exiting scyld-modimg, move the stdout of "fixing SELinux file labels" to after choosing to keep an image, not prior to that choice.
Document booting memtest86+ on compute nodes.
Better error handling in clusterware-node scripts and head initialization.
Assorted other improvements, code clean ups, and bug fixes.
11.1.0-g0001 - March 16, 2020
Default to rwram booting even when using the cwsquash format.
Improvements to the code that pulls images, ISOs, and boot files between heads.
More useful error messages from scyld-modimg package commands.
Better iSCSI device detection at boot time.
Default the authentication cookie lifetime to 20 minutes.
Initial support for capturing images from running nodes.
Support for the SELinux MLS policy on compute nodes.
Support tar input and output for managedb.
Expanded ISO upload and kickstart support.
Add _boot_style live and next for booting CentOS / RHEL ISOs.
Improved support for re-assigning compute node indices.
Compute nodes will re-fetch keys and head nodes if their head node is replaced with a new installation.
Simplify steps to switch head node SELinux status.
Include more tools in the initramfs busybox build.
scyld-install is more forgiving when creating the first user.
Add --grouped and --in-order support to scyld-nodectl exec.
Officially support scyld-modimg --mount / --unmount.
Capture any modified installed file in scyld-sysinfo.
Include rsyslog and network information from telegraf.
Include progress meters on all scyld-*ctl uploads or downloads.
Support uploading larger files such as full DVD ISO files.
Add initial support for creating ClusterWare installation ISO images.
Assorted other improvements, code clean ups, and bug fixes.
11.0.8-g0000 - November 8, 2019
dhcpd.conf.template improvements to simplify bootstrapping systems.
Initial implementation of take-snapshot for backing up the database and images.
Pass more power command errors up to the user.
Fix SELinux permissions for chronograf proxying.
Move port numbers into named services for firewalld.
FIPS fixes for ISC dhcpd to allow and default OMAPI to hmac-sha1.
Default to using -Ilanplus for ipmitool calls.
Support for filtering banners out of scyld-nodectl exec.
Add a _remote_user attribute so we no longer require root ssh to control compute nodes.
Improvements to the Slurm and TORQUE helper scripts.
Add the sync-uids script to inject user accounts.
Generate longer passwords for Couchbase.
Replace most periodic sudo calls with long-lived scripts to reduce logging to /etc/log/secure.
Default authentication to pam_authenticator+maplocal.
Assorted other improvements and bug fixes.
11.0.7-g0001 - October 2, 2019
Add SELinux rule for ClusterWare service to query service status.
Fix a small bug where scyld-sysinfo was not capturing modified ClusterWare files (rpms_clusterware_verify).
Add a missing line to the clusterware-installer REVISIONS file.
11.0.7-g0000 - October 1, 2019
scyld-sysinfo now optionally captures compute node state.
Add 20-second keep-alive when wrapping ssh commands.
scyld-nodectl ssh command is an alias for scyld-nodectl exec if a command is passed.
Expand the head node information stored in the database.
Various scyld-*ctl commands support field selection with new --field arguments.
Various scyld-*ctl commands support two new output formats: --csv and --table.
Include sanboot as a _boot_style to boot local disks or URLs that iPXE sanboot supports.
scyld-install doing an upgrade will not run steps that were performed when doing the initial ClusterWare install and which may have been subsequently altered by the local administrator.
scyld-install prints version information for each installed or upgraded package.
scyld-install passes http_proxy/https_proxy to underlying calls.
Assorted other improvements and bug fixes.
11.0.6-g0000 - September 6, 2019
Include version number in REVISIONS files.
Fix a scyld-modimg problem that rejected any attempt to create a new image with a name that was a subset of an existing image name.
Add scyld-clusterctl heads that treats head nodes as database objects that can be viewed or deleted. More features to come.
Support socket-based admin authentication for local user accounts.
Fix scyld-cluster-conf save.
Eliminate an innocuous "Failure" message "No power URI provided for node" seen when doing scyld-nodectl power cycle or power off.
Add nfs-utils to the base image.
Pass more ipmitool error messages back to the caller.
Catch some exceptions that would unnecessarily stop daemons, and instead handle more gracefully.
Initramfs dhclient should not survive the switch_root.
Add _hostname as a reserved attribute to override specific compute node hostnames. See Reserved Attributes.
Allow administrators to set a boot configuration image to "None" for new kickstart/preseed support, and add new appendices in the ClusterWare documentation that provide examples of how to use Red Hat kickstart for Ubuntu and CentOS (see Appendix: Using Red Hat Kickstart) and Debian preseed (see Appendix: Using Debian Preseed).
Assorted other fixes and improvements.
11.0.5-g0001 - August 6, 2019
Temporarily disable automatic renaming of unreferenced files.
11.0.5-g0000 - August 1, 2019
Fix the --soft then --hard behavior when rebooting or shutting down nodes.
Simplify and improve human readable tool output unless --no-pretty is passed.
Add a new ssh action to scyld-nodectl; details in documentation.
Include /etc/systemd/system/couchbase-server.service.d/override.conf to allow Couchbase to use MD5 even when FIPS mode is enabled.
Suppress FIPS mode messages from scyld-nodectl exec.
Support for locally installed compute nodes; details in documentation.
Fixes when passing binary data to stdin of scyld-nodectl exec.
Move the dhcpd.leases file from the default location to /opt/scyld/clusterware-iscdhcp/conf/dhcpd.leases.
Give other head nodes a better chance to delete local copies of deleted content.
Detect and rename files in storage that are not referenced in the database.
Update resolv.conf if the only nameserver was a head node that goes down.
Assorted other fixes and improvements.
The slurm-scyld packages are updated to version 19.05.1, and openmpi2.0, openmpi1.10, and openmpi1.8 packages are rebuilt as version g0004 for compatibility with the newer slurm-scyld library. The openmpi3.1 packages are updated to version 3.1.4; openmpi3.0 updated to version 3.0.4; openmpi2.1 updated to version 2.1.6; and openmpi4.0 version 4.0.1 has been added to the distribution, all also compatible with the new slurm-scyld library and rebuilt as version g0004.
11.0.4-g0001 - July 3, 2019
Support CentOS 6 images for compute nodes.
Fix problem of root authorized keys being overwritten on compute node at boot time.
Require node status updates to arrive on privileged ports.
Improved api_error_log capture of IP addresses.
Make --summary the default scyld-nodectl status output.
Various scyld-sysinfo improvements, including requesting a comment from the user that gets added to the output.
Pass remote IPs through ProxyPass to get them to the logs.
Link dracut statically to simplify supporting different compute node OSes.
Enable automatic --soft then --hard behavior for scyld-nodectl reboot and shutdown, and document the difference.
Convert more exceptions to errors due to bad command line arguments.
Wrap ipmitool sol activate in a new scyld-nodectl option.
Add an empty /etc/fstab during image creation.
Modify the prompt when inside a chroot.
Fix a scyld-bootctl clone bug: copy the release field.
Better error messages when a Couchbase member is unreachable.
Log the head's hostname when starting the service.
Add a syncer daemon that fetches remote files in the background.
Add managedb update to fix Couchbase after out-of-diskspace conditions.
Add scyld-nodectl power on/off/cycle/status and scyld-nodectl sol.
If a small file is passed as stdin to scyld-nodectl, then exec the contents instead of streaming it.
Cleanups to scyld-modimg around setting name, distro, and description.
Rename scyld-modimg --export to --copyout, and implement a new inverse action --copyin.
Assorted other fixes and improvements.
Various other packages have been released in coordination with Scyld ClusterWare 11.0.4-g0001 and should be updated, if installed: torque-scyld, slurm-scyld, singularity-scyld, openmpi3.1, openmpi3.0, openmpi2.1, openmpi2.0, openmpi1.10, and openmpi1.8.
The torque-scyld and slurm-scyld packages are now split into three packages for each job scheduler. For example, torque-scyld (which requires torque-scyld-libs) installs on the server, and torque-scyld-node (which requires torque-scyld-libs) gets installed into a node image by the sched-helper script. (See Job Schedulers.)
singularity-scyld updates to version 3.2.1, and it no longer installs files into /opt/scyld/, thus no longer requiring the user to module load singularity. The installed files are now accessible via the standard $PATH and $LD_LIBRARY_PATH.
11.0.3-g0020 - June 6, 2019
Fixes to peer download so that only one thread will download at a time.
11.0.3-g0014 - May 24, 2019
Stopping the clusterware service now also stops the clusterware-dhcpd and clusterware-dnsmasq services.
Include the pciutils package and an empty /etc/sysconfig/network file when creating the base image.
Fix various scyld-install --clear-all problems of overly aggressive deletions.
Add write_ifcfg.sh to the prenet startup on compute nodes.
Move the location of the scyld-helper script and add functionality to improve the configuration of Slurm or TORQUE. See Job Schedulers.
Minor fixes to managedb leave and eject.
Improve scyld-sysinfo error handling.
Expanded documentation around failover.
The sched-helper script can now push changes into compute node images.
Switch default gateway for compute nodes during head node failover.
Implement peer downloads for head node's missing files.
scyld-cluster-conf save now handles nodes on multiple networks.
11.0.3-g0000 - May 8, 2019
First General Availability release.
Mark dnsmasq.conf.template and dhcpd.conf.template as configuration files.
Support dhcp relays.
Reduce log messages in api_error_log.
Fix an early boot issue that was causing yum to fail on nodes booted using roram style.
Fix the squashfs packer to work on images up to 100GB.
Default to 16 threads in the Apache wsgi configuration.
Add --clear-all argument to the installer.
Python daemons will now attempt to automatically restart with an exponential backoff.
Implement the _preferred_head attribute.
Fix a bug where results were listed per node instead of collapsed.
Other assorted documentation and tool fixes.
Fixes for SELinux on head nodes:
dnsmasq properly starts and serves compute node addresses.
The repacker daemon disables itself due to required permissions.
scyld-cluster-conf load improvements:
Multiple PXE boot networks can be loaded from a single configuration file.
Nodes will be assigned to the most recently defined network during parsing.
Support 'gw', 'via', and 'as' when parsing remote network definitions.
scyld-nodectl improvements:
Parallelize power control commands.
Improved output streaming and parallelization.
Improved handling of stdin and --stdin.
Default the ssh_runner fanout value to 16 nodes at a time.
More documentation and examples.
11.0.1-b0209 - April 19, 2019
Third restricted release.
Includes the new clusterware-dnsmasq package, which supports resolving host names from /etc/hosts on the head node. See Node Name Resolution.
Support for establishing remote access between the head node(s) and compute nodes, or between compute nodes, by distributing SSH keys. See Compute Node Remote Access.
Support for booting "disked" compute nodes. See the Installation & Administrator Guide and Reference Guide for details.
Excludes /boot/initramfs-* files, and does not exclude /etc/ssh/ssh_host_* files, when packing images.
The Penguin serial number now appears in node hardware info, if it exists.
scyld-nodectl exec improvements:
Command now exits with the subcommand's exit code.
Command can now operate through the head node (default) or --direct.
Hide some ssh warning messages.
11.0.1-b0197 - April 5, 2019
Second restricted release.
Numerous bug fixes and enhancements.
11.0.1-b0183 - March 22, 2019
First restricted release.
ClusterWare TORQUE reverts some changes that were made to the original Adaptive Computing distribution for Legacy ClusterWare 6 and 7:
Includes the built-in pbs_sched job scheduler, and does not include the maui scheduler.
Includes "LimitCORE=infinity", which Legacy ClusterWare 6 and 7 had removed.
Reverts the name pbs_trqauthd back to the original trqauthd, and pbs_mom and trqauthd are now systemd daemons.
Known Issues And Workarounds
The following are known issues of significance with the latest version of ClusterWare and suggested workarounds.
The head node(s) must use a RHEL7 or CentOS7 base distribution release 7.6 or later environment, due to dependencies on newer libvirt and selinux packages.
Scyld OpenMPI versions 4.0 and 4.1 for RHEL/CentOS 8 require ucx version 1.9 or greater, which is available from CentOS 8 Stream and RHEL 8.4.
When using a TORQUE or Slurm job scheduler (see Job Schedulers), if a node reboots and its image was not created using /opt/scyld/clusterware-tools/bin/sched-helper, then the cluster administrator must manually restart the job scheduler. For example, for a single node n0: NODE=n0 torque-scyld-node or NODE=n0 slurm-scyld-node. To restart on all nodes: torque-scyld.setup cluster-restart or slurm-scyld.setup cluster-restart.
Ideally, compute node images are updated using torque-scyld.setup update-image or slurm-scyld.setup update-image, which installs the TORQUE/Slurm config file in the image and enables the appropriate service at node startup.
If administrators are using scyld-modimg to concurrently modify two different images, then one administrator will see a message of the form:
WARNING: Local cache contains inconsistencies. Use --clean-local to delete temporary files, untracked files, and remove missing files from the local manifest.
In that case, run scyld-modimg --clean-local. However, only execute --clean-local after all scyld-modimg image manipulations have completed.
The head node's GRUB_CMDLINE_LINUX in /etc/default/grub must not contain ipv6.disable=1; otherwise, the memcached daemon cannot start (seen in /opt/couchbase/var/lib/couchbase/memcached.log.* logs), which means that Couchbase cannot start, despite the fact that Couchbase does not actually use IPv6.
Ensure that /etc/sudoers does not contain the line Defaults requiretty; otherwise, DHCP misbehaves.
The NetworkManager-config-server package includes a NetworkManager.conf config file with an enabled "no-auto-default" setting. That is incompatible with ClusterWare compute node images and will cause nodes to lose network connectivity after their boot-time DHCP lease expires. Either disable that setting or remove the NetworkManager-config-server package from compute node images.
The scyld-clusterctl repos create command has a urls= argument that specifies where the new repo's contents can be found. The most common use is urls=http://<URL>. The alternative urls=file://<pathname> does not currently work. Instead, you must first manually create an http-accessible repo from that pathname. See Appendix: Creating Local Repositories without Internet.
When moving a head node from one etcd-based cluster to another using the managedb join command, please reboot the joining head once the join is complete.
If a new head node is failing to join an existing etcd-based cluster, check /var/log/clusterware/etcd.log and look for repeated lines of the form:
<DATE> <SERVER> etcd: added member <HEX> [<URL>:52380] to cluster <HEX>
If the log file contains several of these lines per join attempt, please try running managedb recover on an existing head node and joining all head nodes back into the cluster one at a time. Re-joining heads that were previously in the cluster may require a --purge argument, i.e. managedb join --purge <IP>
scyld-install performs its early check to determine if a newer clusterware-installer RPM is available by parsing the appropriate clusterware repo file (typically /etc/yum.repos.d/clusterware.repo) to find the first base_url= line. If there are multiple such lines, i.e., specifying multiple ClusterWare repos, then the cluster administrator should order the repos so that the repo containing the newest RPMs is the first repo in the file.
Any compute node booting from a head node upgraded to 11.7.0 but using a version of clusterware-node older than 11.2.2 may not successfully send status. Please upgrade the clusterware-node package inside the image to resolve this problem.
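For the scyld-install repo-ordering issue above, an administrator may want to confirm which URL appears first in the repo file. The sketch below is an illustration under stated assumptions, not the actual scyld-install logic: the function name first_repo_url and the sample repo contents are hypothetical, and because the text spells the key base_url= while stock yum repo files spell it baseurl=, the pattern accepts both.

```python
import re

# Accept both spellings of the key (assumption: the text says "base_url=",
# standard yum repo files use "baseurl=").
URL_LINE = re.compile(r"^\s*(?:base_url|baseurl)\s*=\s*(\S+)")


def first_repo_url(repo_text):
    """Return the first base URL found in the repo file text, or None."""
    for line in repo_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical repo file contents; section names and URLs are placeholders.
sample_repo = """\
[clusterware-older]
name=Older ClusterWare repo
baseurl=https://example.invalid/clusterware/older/

[clusterware-newer]
name=Newer ClusterWare repo
baseurl=https://example.invalid/clusterware/newer/
"""

print(first_repo_url(sample_repo))
```

In this sample the older repo is listed first, which is the ordering the note above advises against. On a real head node, pass the contents of the repo file (typically /etc/yum.repos.d/clusterware.repo) instead of sample_repo.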