Changelog & Known Issues

See Release Notes for summary information about the latest ClusterWare release. This section contains a more detailed ChangeLog history of all releases.

12.1.1-g0000 - January 23, 2024

  • Assorted fixes for initramfs ignition use when booting el9 nodes.

  • Rework how scyld-nodectl ssh gets node keys, allowing ssh into el9 nodes with FIPS enabled.

  • Print names in place of some UIDs returned by scyld-*ctl tools.

  • Note and handle that ram_total / ram_free are stored in KiB.

  • Check all uses of urlparse().netloc and replace several with urlparse().hostname.

  • Assorted test script and other bug fixes.
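
The netloc-to-hostname change listed above is easier to see with concrete URLs. A minimal sketch using Python's standard urllib.parse; the URLs are made-up examples, not ClusterWare endpoints:

```python
from urllib.parse import urlparse

# Hypothetical URLs for illustration only.
with_creds = urlparse("https://admin:secret@Head1.Example.com:8443/api/v1")
ipv6 = urlparse("http://[fe80::1]:8080/boot")

# netloc keeps credentials, port, and IPv6 brackets exactly as written.
print(with_creds.netloc)    # admin:secret@Head1.Example.com:8443
print(ipv6.netloc)          # [fe80::1]:8080

# hostname drops credentials, port, and brackets, and lowercases the result.
print(with_creds.hostname)  # head1.example.com
print(ipv6.hostname)        # fe80::1
print(with_creds.port)      # 8443
```

Code that only needs the host to connect to or compare against is generally safer with hostname, since netloc changes whenever a port or user name is present.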


12.1.0-g0000 - December 28, 2023

  • Head node hosted gitrepos can mirror upstream repositories.

  • Several bug fixes around the scyld-nodectl waitfor functionality.

  • Hide the exports section in scyld-imgctl output unless -L is used.

  • Fix a long standing bug during file upload where "Finishing up..." would still be displayed after the upload was complete.

  • Fix a long standing bug during file upload that caused an additional file checksum computation.

  • Deprecate the nodes.boot_timeout global in favor of a per-node _boot_timeout attribute.

  • Fix head node eject / leave functionality to make it less likely a removed head node will automatically rejoin or try to provide services to compute nodes.

  • Fix PREFER_KMOD handling in /opt/scyld/clusterware-tools/conf/mkramfs.conf.

  • Technology preview of a scheduler-watcher that can be used to feed scheduler status into the ClusterWare database. Attribute names and other details may change.

  • Enable the slider to show and hide scheduler status within the GUI if any node has status information.

  • Avoid address-in-use socket errors with multiple backend daemon threads.

  • Fix typos that broke sync-uids and take-snapshot in ClusterWare 12.

  • Make systems for node status, hardware, health, and monitoring use plugins for easier management.

  • Authenticate with a user's SSH agent if they have already uploaded their public keys into the system.

  • New support for partitioning during boot using ignition. See the documentation for the _ignition reserved attribute for details.

  • Support for installing the GRUB 2 bootloader during boot. See the documentation for the _bootloader reserved attribute for details.

  • Improved image capture capabilities with better error handling and using optional credentials and sudo.

  • Implement a local signing authority for node client certificates stored in node TPMs.

  • Support searching for a node by hostname even when it differs from the ClusterWare node name.

  • Allow matching of naming pools in node selection using the same syntax that already matched dynamic groups.

  • Add support for attaching an attribute group to a naming pool.

  • Add _domain to specify the domain without using _hostname.

  • Confirmed ClusterWare works on Rocky 9.3 and similar distros.

  • Add a mechanism (chroot.env_paths) to define specific environment variables during image creation.

  • Fix several bugs around node renaming that could have permitted multiple nodes with the same MAC or similar issues.

  • Assorted GUI improvements, bug fixes, and performance improvements.


12.0.1-g0000 - July 24, 2023

  • Reimplement and expose the scyld-nodectl scp functionality.

  • Push scyld-pack-node to systems when running scyld-modimg capture. This also allows us to remove the clusterware-common package.

  • Improve proxy handling during the installation process.

  • Improve the handling of the _hosts attribute.

  • Initial support for scripting scyld-modimg through --run.

  • Provide a mechanism for changing the default hash from sha1 to sha256 or sha512.

  • Deprecate scyld-install --clear in favor of --clear-all.

  • Fix output labelling in scyld-nodectl exec results.

  • Mark node status and the current head node in managed --heads output.

  • Expand image capture to use _remote_user / _remote_pass.

  • Improved Debian / Ubuntu image creation.

  • Use the latest squashfs tools for packing and unpacking images.

  • Assorted bug fixes and performance improvements.
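
The configurable default hash mentioned above can be pictured as choosing a hashlib algorithm by name. This is a sketch under assumptions: file_checksum and its algorithm parameter are hypothetical names, and the actual ClusterWare setting that controls the default is not shown in this changelog.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    Hypothetical helper: 'algorithm' stands in for a configurable
    default such as sha1, sha256, or sha512.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large images do not need to fit in RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum(path, "sha512") yields a 128-character digest, while "sha1" yields 40 characters, so any stored checksums need to record which algorithm produced them.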


12.0.0-g0000 - April 21, 2023

  • The first release of ClusterWare version 12. Please see Updating ClusterWare 11 to ClusterWare 12 for more details.

  • Support RHEL / Rocky 9 as a head node and compute node platform.

  • Upgrade to use Python 3.9 on all head node platforms.

  • Entirely rewritten GUI with much more functionality (see ClusterWare Graphical Interface (GUI)).

  • Switch to Telegraf, InfluxDB version 2, and Grafana instead of TICK. See Monitoring Graphical Interface for details about Grafana.

  • Initial support for GRUB 2 as an alternative for iPXE.

  • Configure chrony at install time for time sync within the cluster.

  • Update managedb save to default to saving ONLY the database.

  • Fix selection language matching for attributes[_boot_config].

  • Include a newer (4.6) version of squashfs tools for more recent SELinux-related features.

  • Allow command line clients to authenticate by signing messages with their SSH keys.

  • Remove banner.txt support and use SSH LogLevel to control banner display when executing remote commands.

  • Avoid a crash when two attributes only differ in capitalization.

  • Fix "accept unknown nodes" behavior.

  • Fix behavior of scyld-nodectl exec --label.

  • Implement a new JWT-based authentication system with refresh tokens.

  • New in-memory caching and indexing mechanism to improve document store lookup times.

  • Provide a mechanism to record additional DNS mappings in the ClusterWare database.

  • Default to installing config-less Slurm.

  • Provide a tool to create a scyld-kube.iso for installation on clusters without internet access.

  • Support booting nodes using UEFI in HTTP mode.

  • Implement a restricted status-updater for "busy" nodes in C code, and provide attribute _status_cpuset to restrict cw-status-updater service subprocesses to a specific set of CPU cores.

  • Remove all references to Couchbase and some remaining NFS references.

  • Enable scyld-nss by default on head nodes for name resolution.

  • Use the dracut version native to the image instead of a custom ClusterWare version.

  • Multi-head clusters now automatically rebalance nodes between heads.

  • Many other bug fixes and optimizations.


11.9.2-g0000 - January 9, 2023

  • Override a new 1GB upload limit in httpd.

  • Directly serve ISO contents for faster access.

  • Default to faster public gitrepo access via Git Smart HTTP.

  • Fix openSUSE image creation on el8.

  • Assorted other bug fixes.


11.9.1-g0000 - November 4, 2022

  • Assorted improvements to scyld-nss.

  • Fix two GUI crashes when viewing image details.

  • Remove dependency on libcgroup that caused image creation failures.

  • Fix a bug where an 11.9.0 head node could not join a pre-11.9.0 head node.

  • Add an SELinux module in the clusterware-ansible package.

  • Fix a scyld-modimg hang on systems with SELinux disabled.

  • Restrict ZTP-boot to Cumulus switches.

  • Assorted other bug fixes.


11.9.0-g0000 - September 30, 2022

  • Initial support for ZTP-boot for switches.

  • Implement a couple of fixes for the install-time el8 STIG.

  • Include assorted tpm2_* tools into the initramfs for storing encryption keys in the compute node's TPM.

  • Include the nvme driver in the initramfs for disked booting on NVMe storage.

  • Improved scyld-modimg SELinux labeling with a parallel setfiles.

  • Fix a regression that broke the sshpass integration.

  • Fix isc-dhcpd.log parsing in el8 and el9.

  • Fix openSUSE image creation.

  • Initial scyld-nss implementation for compute node name resolution on head nodes without DNS.

  • Attempt to install nscd during image creation.

  • Implement attribute substitution in power_uris.

  • Initial support for a new _ansible_pull_now attribute.

  • Enable repo_gpgcheck in our software repositories for el8 and later.

  • Implement scyld-install --non-interactive for unattended installs.

  • Disable and remove additional services during scyld-install --clear-all.

  • Update to etcd 3.5.4.

  • Include a new GPU data collection script for TICK.

  • Assorted other SELinux updates, bug fixes, and scaling improvements.


11.8.2-g0000 - August 19, 2022

  • Fix a timeout during batch-create for nodes.

  • Capture more log files in scyld-sysinfo.

  • Greatly reduce calls to rpm in update-node-status.

  • Ensure tftp starts after reboot on el8 head nodes.


11.8.1-g0000 - August 2, 2022

  • Fix a regression that could cause a crash during image cloning.

  • Fix bulk node creation when using a @contents.json source.

  • Hide the BMC password in scyld-nodectl sol output.

  • Fix handling of --binary in scyld-nodectl exec.

  • Enable Git Smart HTTP for head node hosted git repositories. Further Git improvements coming soon.

  • Assorted other bug fixes.


11.8.0-g0000 - June 17, 2022

  • Fix a scyld-nodectl ping crash where a down node is still ping-able.

  • Default to not compressing data during image capture, making the process significantly faster.

  • Better checking that the ClusterWare installation source matches the system where the installation is running.

  • Fix a bug that would render scyld-*ctl tools unusable when deleting an in-use naming pool.

  • Fix a long standing bug where changing a node MAC would not be pushed to the DHCP server without another network change or a service restart.

  • Use unsquashfs to unpack cwsquash file systems during rwram booting for a significant speedup.

  • Implement variable replacement in boot configuration command lines.

  • Add a node script to configure BMC settings based on a mixture of the power_uri field along with the _ips and _gateways attributes.

  • Rewrite handling for adding unknown nodes to the cluster when they are seen making DHCP requests. Pre-loading MACs is still preferred.

  • Fix scyld-clusterctl heads clean to clean unused files in more cases.

  • Include more variables (groups, power_uri, etc.) in the [Node] section of the attributes.ini on compute nodes.

  • Clean up basic.ks kickstart example file and separate ClusterWare related parts into an includable file.

  • Fix down head handling on locally installed systems.

  • Separate more IPv4 and IPv6 code and logging.

  • Assorted changes to support Python 3.9 and later.

  • Snap pip package versions forward.

  • Add scyld-install --iso support to allow administrators to install from a downloadable ClusterWare ISO.

  • Further refine support for installing RHCOS from an uploaded ISO using their ignition system.

  • Implement "then" support to allow multiple steps in a single command when using ClusterWare command line tools.

  • Fix bugs with manually modifying an existing naming pool.

  • Implement scyld-nodectl power setnext <bootdev> to set the next boot device for a node.

  • Define parent-head-node inside scyld-modimg --chroot.

  • Update /etc/hosts on compute nodes when they cycle from a down head to a working one.

  • Implement scyld-modimg --write-repos to rewrite an image's clusterware-node.repo file based on the current head node configuration.

  • Add a variety of new tests to better catch regressions.

  • Assorted other bug fixes and scaling improvements.


11.7.2-g0000 - April 6, 2022

  • Correct a pair of bash variable name collisions in update-node-status and the custom dracut module.

  • Auto-reconnect for scyld-nodectl sol.

  • Tweak the adjust-repos.sh script to work before find is installed.

  • Catch a very unlikely case where nodes have the same system UUID.

  • Assorted other bug fixes.


11.7.1-g0000 - March 18, 2022

  • Small changes so _boot_rw_layer=rwtab works.

  • Hide last_modified_on in the output from my tools.

  • Fix the script URL if iPXE boots from the second network.

  • Recognize uploaded RHEL CoreOS ISOs.

  • Improve the export and iso download progress output.

  • Administrators can use URLs as sources when uploading ISOs and other binaries.

  • Implement scyld-install --os-iso and the corresponding scyld-add-boot-config --iso.

  • Do not automatically use vault.centos.org as a DefaultImage installation source.

  • Assorted other bug fixes.


11.7.0-g0000 - February 23, 2022

  • Detect when a CentOS 8 image is being created and update the files in /etc/yum.repos.d to use the vault.centos.org server.

  • Change the basic.ks example so the node sets its own _boot_style=next before rebooting instead of powering off.

  • Catch a code path that could result in duplicate names for images.

  • Support name= syntax to clear attributes and fields.

  • Give administrators more control over image creation details via the --pkgmgr parameter.

  • Implement a document cache to improve database performance.

  • Support unconfiguring bootnet at the end of the initramfs.

  • Significantly expand the scyld-nodectl waitfor functionality.

  • Support per-node _gateways and _macs reserved attributes.

  • Accept additional MAC address formats.

  • Recognize nodes by DUID to support booting with IPv6 via dhcpv6.

  • Split dhcpd.conf.template into multiple parts for easier management.

  • Restore more specific success and error messages from node power control commands.

  • Implement _disk_wipe and add encryption support to _disk_cache reserved attributes.

  • Add a reboot-kexec command to the clusterware-node package to trigger kexec rebooting from inside the node.

  • Include mdadm in the initramfs if it is installed in the image.

  • Fix a bug where scyld-modimg auto-deleted a cached item and then immediately tried to use it.

  • Generalize IP code to properly handle IPv6 addresses.

  • Properly handle interfaces with more than one address.

  • Include the ipxe.iso for booting virtual machines using IPv6.

  • Assorted Ubuntu compute node networking fixes.

  • Remove unused URL parameters passed during the iPXE script download.

  • Improve scyld-sysinfo data collection on Ubuntu compute nodes.

  • Include Scyld TICK packages in scyld-install --clear-all / --update.

  • Correctly terminate scripts with non-zero exit code on error.

  • Remove the unused ip= kernel argument added during compute node boot.

  • Better error handling during chain booting.

  • Automatically update compute node hostnames when the node name or _hostname attribute changes.

  • Allow for clearing selected database object fields by setting their value to an empty string, e.g. the node-level cmdline field.

  • Log more information about each request in the api_access_log.

  • Assorted other bug fixes and scaling improvements.
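
Accepting several MAC address formats, as noted above, usually comes down to normalizing each input to one canonical form before comparing or storing it. A minimal sketch; normalize_mac is a hypothetical helper, and the set of formats ClusterWare actually accepts is not listed in this changelog:

```python
import re

def normalize_mac(text):
    """Normalize a MAC address to lowercase colon-separated form.

    Hypothetical helper. Accepts colon, dash, or dot separators, or none.
    """
    digits = re.sub(r"[.:\-]", "", text.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {text!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

for raw in ("00:1A:2B:3C:4D:5E", "00-1a-2b-3c-4d-5e",
            "001a.2b3c.4d5e", "001A2B3C4D5E"):
    print(normalize_mac(raw))  # each prints 00:1a:2b:3c:4d:5e
```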


11.6.0-g0000 - October 31, 2021

  • Fix a boot-time systemd service race condition by forcing NetworkManager.service to wait for cw-boot-prenet.service to complete.

  • Require scyld-adminctl keys to be unique so they can be used to identify the user in ssh sessions during git clone.

  • Update headctl to automatically open and close firewalld ports.

  • Empty ~/.scyldcw/ during scyld-install --clear-all but leave the logs directory intact.

  • Bump busybox version to 1.34.1 and remove unused build options.

  • Update the slurm-scyld packages to version 21.08.3.

  • Update the TICK packages: telegraf to version 1.20.2, influxdb to version 1.8.10, and chronograf to 1.9.1.

  • Fix a boot chaining issue where peers waited on each other, and enable boot chaining by default with chaining.enable now defaulting to True.

  • Make argument order optionally much more flexible.

  • Better repository related error messages from scyld-install.

  • Ensure node FQDNs are placed before the corresponding short names in our dnsmasq hosts file.

  • Further improvements to scyld-cluster-conf to handle more config file formats.

  • Rewrite naming pool pattern collision detection.

  • FIPS fixes for Lark grammar parser.

  • Add support for pushing IPs from secondary pools into node _ips attributes.

  • Include multiple (default up to 3) head nodes for DNS through dhcp.

  • Document head node support for AlmaLinux.

  • Add a new optional clusterware-ansible package including boot-time ansible-pull support.

  • Use symlinks when including git in ClusterWare packages.

  • Implement scyld-nodectl waitfor using --selector logic.

  • Expose and improve the scyld-nodectl --selector support.

  • Client-side recursive delete for boot configuration and distros.

  • Delete unmodified images from the scyld-modimg cache older than one hour.

  • Add an optional uname wrapper inside scyld-modimg --chroot.

  • Snap pip and npm package versions forward.

  • SELinux improvements for MLS and RHEL registration.

  • Replace systemctl restart with systemctl reload in the installer and headctl scripts.

  • Set the new id_method status field based on how the head node identified the compute node during the status update.

  • Add new scyld-clusterctl gitrepos and scyld-clusterctl certs tools.

  • Implement --version in scyld-kube so administrators can deploy a specific version.

  • Assorted other improvements and bug fixes.


11.5.1-g0001 - October 5, 2021

  • SELinux changes for manipulating MLS images.

  • Correct scyld-modimg discard behavior.

  • Correct scyld-nodectl ping and related failures.

  • Assorted other bug fixes.


11.5.1-g0000 - October 4, 2021

    • Significant changes to scyld-cluster-conf to improve handling of less common configurations and make future maintenance simpler.

    • Check for rpmsave / rpmnew files post upgrade and notify the cluster administrator that they should be addressed.

    • Fix a typo in the influxdb rsyslog configuration.

    • Add parent-head-node to /etc/hosts inside chroots to more closely match deployed image behavior.

    • Significant rearrangement of scyld-modimg to make future maintenance simpler.

    • Fix uid/gid handling in scyld-modimg --import. The tool now expects any tar file being imported to contain correct numeric uid / gid values instead of matching names against the host system.

    • Remove deprecated replace code in favor of update calls.

    • Use inst.ks= instead of deprecated ks=. Note that this does break kickstarting for el6.

    • Fix a bug in scyld-nodectl exec for nodes that changed IP addresses after a ClusterWare head node upgrade.

    • Remove most references to the cwtar format but ensure scyld-modimg can still import and repack the format.

    • Test Rocky Linux 8.4 head nodes.

    • Update busybox used in the custom initramfs.

    • Remove remaining traces of NFS Ganesha integration.

    • Small improvements to image capture functionality.

    • Customize the iPXE user-class (now CWiPXE) to more tightly control the iPXE boot process.

    • Force iPXE to use low numbered ports when fetching boot files.

    • Version bumps for many third-party packages.

    • Include a custom-built version of git for future features.

    • Remove long deprecated code.

    • Assorted other improvements and bug fixes.


11.5.0-g0001 - September 2, 2021

    • Fix a regression that broke kickstart and live booting.


11.5.0-g0000 - August 26, 2021

    • Default to etcd for new installs.

    • Add HA kubernetes support to the scyld-kube utility.

    • Implement automatic etcd compact and defrag on head nodes as well as auto-eject and auto-rejoin for the head node cluster.

    • Both general and etcd specific performance improvements.

    • Ensure _no_boot stops soft and hard power control commands unless --force is specified.

    • Improve scyld-mkramfs --update to accept a --kver <VERSION> option when updating a boot configuration after a kernel upgrade.

    • Implement scyld-nodectl reboot --kexec in preparation for image previewing.

    • Reduce resource usage in scyld-nodectl status --refresh.

    • Ongoing effort to encrypt more compute node to head node and compute node to compute node communications.

    • Reduced polling by implementing waiting on database content changes.

    • Much cleaner logging during service shutdown.

    • Code and endpoint removal and cleanups.

    • Assorted other improvements and bug fixes.


11.4.3-g0000 - July 8, 2021

    • Confirm image creation from an ISO is limited to packages on the ISO.

    • Improved FIPS support for earlier RHEL and CentOS releases.

    • Improved proxy handling during scyld-install --update.

    • Rewrite scyld-nodectl status --refresh to handle more corner cases and terminal resizing.

    • Include scyld package versions in scyld-nodectl ls -L output.

    • Add support for redirecting stdout and stderr into per-node local files when using scyld-nodectl exec.

    • Fix bugs around manually setting network database parameters.

    • Simplify URLs used during the early boot process.

    • Change kubeadm runtime to use containerd instead of docker.

    • Change dnsmasq SRV request handling to immediately return an invalid response to requests from Couchbase.

    • Update the slurm-scyld packages to version 20.11.8.

    • Update the TICK packages: telegraf is version 1.18.3, influxdb is version 1.8.6, chronograf is 1.8.10, and kapacitor is version 1.5.9.

    • Assorted other improvements and bug fixes.


11.4.2-g0000 - June 4, 2021

    • Upgrade OpenMPI, removing the Slurm library version dependency.

    • Initial support for MLS on RHEL 7 and CentOS 7 head nodes.

    • Support scyld-nodectl ping to ping nodes on demand.

    • Support URL encoding of the password section of the power_uri to handle additional characters. Any power_uri currently containing % may need to be updated.

    • Use excludes.txt to exclude specific directories in the squashfs packer.

    • Exclude paths from the setfiles call when exiting the chroot.

    • Add additional security related HTTP headers.

    • Small updates to the ReactJS GUI including fixing checkbox behavior.

    • Fix a managedb save regression that defaulted to saving into a directory instead of a file.

    • Assorted other improvements and bug fixes.
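
Percent-encoding a password for use inside a URI, as the power_uri item above describes, can be done with Python's urllib.parse. The URI layout, user name, and address below are illustrative assumptions rather than a documented power_uri format:

```python
from urllib.parse import quote, unquote

password = "p@ss:w/rd%1"            # made-up password with URI-special characters
encoded = quote(password, safe="")  # encode every reserved character
print(encoded)                      # p%40ss%3Aw%2Frd%251

# Hypothetical URI layout for illustration only.
power_uri = f"ipmi://admin:{encoded}@10.0.0.50"
print(power_uri)                    # ipmi://admin:p%40ss%3Aw%2Frd%251@10.0.0.50

# Decoding recovers the original password.
print(unquote(encoded) == password)  # True
```

Note that a literal % becomes %25 once encoded, which is consistent with the warning that existing power_uri values containing % may need to be updated.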


11.4.1-g0000 - April 30, 2021

    • Significant performance improvements from reducing database contention in high node count clusters.

    • Technology preview: Adding etcd as an alternative database backend. etcd should allow for further scale improvements in later releases.

    • Announcing support for Oracle Linux 7 and 8. These operating systems are now supported for both head nodes and compute nodes in all the configurations supported for RHEL and CentOS.

    • Expand the information collected about hardware and firmware versions at boot time. This information expands the possibilities for cluster administrators to detect and track cluster changes.

    • Improved FIPS and MLS support. ClusterWare now supports CentOS and RHEL 8 compute node images in MLS enforcing mode, and FIPS 140-2 is fully supported on head nodes and compute nodes across the cluster.

    • Removing NFS compute node root file system integration. In modern, scalable clusters the benefits of separating compute nodes from head nodes (e.g. simplicity, performance, security, and independence from head nodes post boot) significantly outweigh the costs of running the core operating system from RAM.

    • Support switching database backends on existing clusters using headctl.

    • Streamline the peer download process used to pass files between head nodes.

    • Improve initramfs LUKS support when booting ephemeral compute nodes with _disk_root and boot style disked.

    • Fix multiple problems with CentOS / RHEL 8 kickstart support. The example basic.ks kickstart file now works for versions 7 and 8.

    • Print any unexpected errors from the setfiles call when exiting the scyld-modimg chroot.

    • Correct IP calculations in more complicated cluster configurations.

    • Improve scyld-nodectl exec across large node counts.

    • Better banner filtering from scyld-nodectl exec and soft power control via scyld-nodectl <reboot|shutdown>.

    • Attempt to install rdma-core and fipscheck packages in newly created compute node images.

    • Improve backend caching and hinting for IP/MAC/name to UID translations.

    • Improved FIPS support on compute nodes. This change requires upgrading clusterware-node inside images and rebuilding the initramfs files.

    • Deprecate NFS Ganesha integration and obsolete the clusterware-ganesha package.

    • Correct CentOS 6 images to point at the CentOS vault during creation.

    • Improve scyld-install to stop earlier on error.

    • Include mlx5_core by default in initramfs files.

    • Implement stored selectors and dynamic groups.

    • Names of dynamic groups cannot collide with attribute group names.

    • Dyngroups can reference other dyngroups but can be slow to evaluate.

    • Properly report group join / leave failures.

    • Update the parser used on node specifications.

    • Upgrade and refreeze all pip packages to the latest versions.

    • Remove unnecessary pip packages from the virtual environment.

    • Assorted other improvements and bug fixes.


11.4.0-g0000 - January 22, 2021

    • Initial kubeadm support. This adds a new clusterware-kubeadm package providing a scyld-kube command. See Kubernetes for details.

    • Support passing %group to scyld-nodectl in place of a node specification to affect all nodes in the named group.

    • When installing from a ClusterWare ISO, upload that ISO into the ClusterWare system as a repo and update any file:// URLs in clusterware.repo accordingly.

    • Properly copy gpgcheck values from /etc/yum.repos.d/clusterware.repo on the head node to the clusterware-node.repo during image creation.

    • Do not ask for a ClusterWare password when stdout is not a terminal.

    • Support CentOS Stream for head nodes and compute node images.

    • The Job Scheduler -scyld.setup scripts now support optionally naming specific nodes (vs. presuming all up nodes) for the actions init, reconfigure, and update-nodes. See Job Schedulers.

    • Remove some unused Couchbase-related Requires and BuildRequires from the spec file.

    • Default to using the current $USER in scyld-* commands when no client.authuser is defined in settings.ini.

    • Provide ISO contents over HTTP via a /repo/<name>/content/ URL.

    • Do not record virt-what output on non-virtual nodes.

    • Update libvirt-python to 6.10.0. Expect many other Python and NPM packages to be updated in following releases.

    • Enable HTTPS communication during head node installation.

    • Fix set-node-attribs command line parsing, and treat arguments without '=' as a request to delete the named attribute.

    • Do not install clusterware-ganesha during head node installation.

    • Fix kernel version detection when multiple /lib/modules/<kernel>/ directories exist.

    • Switch backend subexec from multiprocessing to threading, thereby making some deadlocks much less likely.

    • Assorted other improvements and bug fixes.
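
The set-node-attribs parsing rule described above (arguments without '=' are delete requests) can be sketched as follows. parse_attrib_args is a hypothetical helper written for illustration, not the tool's actual code:

```python
def parse_attrib_args(args):
    """Split command-line words into attributes to set and names to delete.

    Hypothetical helper: 'name=value' sets an attribute, while a bare
    'name' is treated as a request to delete that attribute.
    """
    to_set, to_delete = {}, []
    for arg in args:
        if "=" in arg:
            # Split on the first '=' only, so values may contain '='.
            name, value = arg.split("=", 1)
            to_set[name] = value
        else:
            to_delete.append(arg)
    return to_set, to_delete

print(parse_attrib_args(["_boot_style=disked", "opts=a=b", "scratch"]))
# ({'_boot_style': 'disked', 'opts': 'a=b'}, ['scratch'])
```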


11.3.0-g0001 - December 2, 2020

    • Simplify initramfs rwram booting with SELinux by fully preserving rather than restoring SELinux contexts from the image.

    • Compute IPs at node creation time instead of waiting for the leases daemon to compute the same. Clearing the ip field via scyld-nodectl up ip= will trigger immediate recomputation.

    • Confirm incoming _boot_config and _boot_style strings are usable before accepting them.

    • Adapt initramfs scripts to boot Ubuntu and Debian images.

    • Improved support for customizing initramfs files through scyld-mkramfs.

    • Add scyld-mkramfs --update <bootconfig> to simplify the common case where a cluster administrator wants to update the initramfs in an existing boot configuration.

    • Initial implementation of scyld-chroot inside scyld-modimg chroots including copyin, copyout, and info.

    • Fully disable backend image repacking since we now only use a single image format.

    • Capture more information about compute node storage and infiniband hardware.

    • Expand the yum and dnf handler to also support zypper systems, e.g. openSUSE.

    • Try to install less, iperf3, and cryptsetup when creating images.

    • Initial implementation of scyld-bootctl import to match the existing export command.

    • Assorted other improvements and bug fixes.


11.2.2-g0000 - October 30, 2020

    • Add the /opt/scyld/clusterware/bin/headctl script to enable / disable Apache features on the head node. It can enable / disable HTTPS and set compute nodes to prefer HTTPS communication. A future release will default to preferring HTTPS.

    • Compute nodes verify server identity provided by HTTPS when possible, but default to accepting unverified head nodes.

    • Further address a low probability file corruption bug when scyld-modimg unpacks images.

    • Fix IP collision bug introduced in 11.2.1 so that X.X.X.1 is not detected as matching X.X.X.1[0-9]+.

    • The scyld-tool-config tool will generate an HTTPS base_url field when connecting to any server other than localhost.

    • Assorted SELinux updates for basic MLS policy.

    • Increase default password lengths as they are rarely manually entered.

    • Rearrange Apache configuration files to simplify changes in /etc/httpd and add a CW-Proxy-Secret header to confirm when the backend system can trust other forward-related headers.

    • Double Python thread count to 32.

    • Initial LUKS support in the initramfs, providing encryption-at-rest for ephemeral compute nodes using boot style disked with _disk_root.

    • Initial implementation of compute node peer downloads for boot chaining. Controlled by chaining.enable base.ini variable that defaults to False.

    • Add arping to busybox for early dhcp client scripts.

    • Remove deprecated arguments and content from dhcpd.conf.template, scyld-clusterctl, mount_rootfs, etc.

    • Add scyld-nodectl sol [--enable|--steal] options.

    • Include node hostnames in dhcp offers and more aliases in dns.

    • Expanded support for _ips to create ifcfg-IFACE files.

    • Include public ssh host keys in compute node status.

    • Pass the head node's gateway to compute nodes on the same network.

    • Capture more hardware (IB, NVMe) details during node boot.

    • Assorted other improvements and bug fixes.
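
The IP collision bug listed above is the classic result of comparing addresses as text. A sketch of the failure mode and one way to avoid it with the standard ipaddress module; this is not the ClusterWare implementation, and the addresses are examples:

```python
import ipaddress

used = ["10.0.0.10", "10.0.0.12"]
candidate = "10.0.0.1"

# Prefix matching on strings: "10.0.0.1" is a prefix of "10.0.0.10".
naive = any(addr.startswith(candidate) for addr in used)
print(naive)   # True, a false collision

# Comparing parsed addresses checks the full value of every octet.
used_addrs = {ipaddress.ip_address(addr) for addr in used}
exact = ipaddress.ip_address(candidate) in used_addrs
print(exact)   # False
```

Parsing first also means the same comparison works unchanged for IPv6 addresses.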


11.2.1-g0000 - September 24, 2020

    • Add mount/umount back into sudoers.d for Ganesha exports.

    • Fix Ganesha export permissions.

    • Disable backend repacking.

    • Disable zypper detection that triggered in odd circumstances.

    • Fix parsing of distribution major number.

    • Exclude tests folders from clusterware-tools.

    • Fix percent sign use in _boot_tmpfs_size.


11.2.0-g0000 - September 4, 2020

    • Support for CentOS / RHEL 8 head nodes.

    • Remove cwtar as a backend image format, leaving only cwsquash.

    • Fix scyld-modimg crash on bad --query.

    • Fix scyld-nodectl ls -l (and ls -L) ram_total and scyld-nodectl status -L ram_free output.

    • Fix permissions when creating files in sync-uids.

    • Fix scyld-modimg --create for CentOS 8.0 / 8.1.

    • Wait for rebalance to complete when joining head nodes.

    • Allow for zero-padding of node names.

    • Add more scyld-nodectl ls -l and ls -L output fields.

    • Rework scyld-add-boot-config to be more flexible.

    • Include example node.sh for locally installed compute nodes.

    • Only use local authentication when connecting to local server.

    • Improve locally installed compute node hostname handling.

    • Combine and improve calls to file to identify objects.

    • Remove remaining bits of bpstat and other legacy tools.

    • Install an example settings.ini during scyld-install.

    • Shorten paths in some output to make output more readable.

    • Trick mksquashfs into providing more detailed progress.

    • Clean up and standardize database failure cases, and resume daemons when database recovers.

    • Implement database purge and improve scyld-install --clear.

    • Improve package removal during scyld-install --clear-all.

    • Change the cwsquash format to use a GPT partition table.

    • Move ganesha SELinux rules into the clusterware-ganesha package.

    • Improve the take-snapshot tool, which performs database backups and manages retention of those backups, typically executing as a cronjob. See take-snapshot in the Reference Guide.

    • Improvements to scyld-sysinfo, including no longer requiring setup of user root authentication to capture state of compute nodes.

    • Assorted other improvements and bug fixes.


  • 11.1.2-g0001 - July 8, 2020

    • Patch pyramid in the virtual environment to allow a non-security use of md5.


  • 11.1.2-g0000 - July 1, 2020

    • Initial implementation of node naming pools.

    • scyld-install update now calls managedb update.

    • Head and compute node status includes their "now" timestamp.

    • Initial implementation of head nodes as a chrony pool, disabled by default.

    • Squashfs tools now use 50% of the available processors, although this is configurable.

    • Boot time set_hostname.sh script now uses hostname instead of hostnamectl on CentOS 6.

    • Fix an authentication race that triggered password prompts.

    • Initial support for CentOS 8.2 compute node images.

    • Add a short (0.03s) cache in the database layer.

    • Improved kickstart menu generation.

    • Use enabled=0/1 in /etc/yum.repos.d/clusterware.repo to avoid inadvertent yum updates.

    • Changes to scyld-install in preparation for CentOS 8.

    • Expanded variable substitution in kickstart files.

    • Improved SELinux permissions on enforcing compute nodes.

    • Fix file descriptor leak causing "too many open files" error.

    • Support X-Sendfile when downloading images and boot files.

    • scyld-modimg --query lists all installed packages.

    • Fixes to scyld-modimg discard and upload logic.

    • Assorted other improvements and bug fixes.


  • 11.1.1-g0002 - May 27, 2020

    • Only updating clusterware-tools and these release notes.

    • Remove a log statement that caused a crash in scyld-nodectl exec when providing stdin.

    • Conditionally reinstate some initramfs code that is required to successfully boot a cwsquash image with style rwram.


  • 11.1.1-g0001 - May 21, 2020

    • Use cgroups to identify and terminate child processes from a chroot.

    • Ignore /tmp and /var/tmp when correcting SELinux contexts in a chroot.

    • Use the head IP instead of the gateway IP in iscsi boot style.

    • Database cleaning code is now aware of uploaded ISO files.

    • Cleaning code will not attempt to connect to a down head node.


  • 11.1.1-g0000 - May 19, 2020

    • Clearer errors from the client tools when the head node is unresponsive.

    • Handle large upload timeouts, fixing the "size does not match" error.

    • Add mechanism for starting a long running task and checking for results in separate calls with a custom HTTP header.

    • Rewrite remotely deleted files detection to reduce the chances of leaving .old.00 files.

    • More daemons now clean up their leftovers in the workspace/ directory.

    • Add storage cleaning support via the scyld-clusterctl heads clean command. See scyld-clusterctl for details.

    • The status of ClusterWare services on a head node or nodes can now be checked and changed via the scyld-clusterctl heads service command. See scyld-clusterctl for details.

    • Fix a case that failed to find the disk during iscsi booting.

    • Improvements to libvirt power control for VM compute nodes.

    • Improved logging in SSH and Couchbase failure cases.

    • Nodes can be reordered using scyld-cluster-conf load without losing configuration.

    • Fix a cloning failure that left file copies in /opt/scyld/clusterware/storage/.

    • Display "|deleted|" when a database link is broken in scyld-bootctl or scyld-attribctl.

    • More consistent error and success messages from power on/off/status.

    • Reduce database calls in common code paths.

    • Small fixes to /opt/scyld/clusterware-installer/make-iso for ISO image generation.

    • When exiting scyld-modimg, move the stdout of "fixing SELinux file labels" to after choosing to keep an image, not prior to that choice.

    • Document booting memtest86+ on compute nodes.

    • Better error handling in clusterware-node scripts and head initialization.

    • Assorted other improvements, code clean ups, and bug fixes.


  • 11.1.0-g0001 - March 16, 2020

    • Default to rwram booting even when using the cwsquash format.

    • Improvements to the code that pulls images, ISOs, and boot files between heads.

    • More useful error messages from scyld-modimg package commands.

    • Better iSCSI device detection at boot time.

    • Default the authentication cookie lifetime to 20 minutes.

    • Initial support for capturing images from running nodes.

    • Support for the SELinux MLS policy on compute nodes.

    • Support tar input and output for managedb.

    • Expanded ISO upload and kickstart support.

    • Add _boot_style live and next for booting CentOS / RHEL ISOs.

    • Improved support for re-assigning compute node indices.

    • Compute nodes will re-fetch keys and head nodes if their head node is replaced with a new installation.

    • Simplify steps to switch head node SELinux status.

    • Include more tools in the initramfs busybox build.

    • scyld-install is more forgiving when creating the first user.

    • Add --grouped and --in-order support to scyld-nodectl exec.

    • Officially support scyld-modimg --mount / --unmount.

    • Capture any modified installed file in scyld-sysinfo.

    • Include rsyslog and network information from telegraf.

    • Include progress meters on all scyld-*ctl uploads or downloads.

    • Support uploading larger files such as full DVD ISO files.

    • Add initial support for creating ClusterWare installation ISO images.

    • Assorted other improvements, code clean ups, and bug fixes.


  • 11.0.8-g0000 - November 8, 2019

    • dhcpd.conf.template improvements to simplify bootstrapping systems.

    • Initial implementation of take-snapshot for backing up the database and images.

    • Pass more power command errors up to the user.

    • Fix SELinux permissions for chronograf proxying.

    • Move port numbers into named services for firewalld.

    • FIPS fixes for ISC dhcpd to allow and default OMAPI to hmac-sha1.

    • Default to using -Ilanplus for ipmitool calls.

    • Support for filtering banners out of scyld-nodectl exec.

    • Add a _remote_user attribute so we no longer require root ssh to control compute nodes.

    • Improvements to the Slurm and TORQUE helper scripts.

    • Add the sync-uids script to inject user accounts.

    • Generate longer passwords for Couchbase.

    • Replace most periodic sudo calls with long-lived scripts to reduce logging to /var/log/secure.

    • Default authentication to pam_authenticator + maplocal.

    • Assorted other improvements and bug fixes.


  • 11.0.7-g0001 - October 2, 2019

    • Add SELinux rule for ClusterWare service to query service status.

    • Fix a small bug where scyld-sysinfo was not capturing modified ClusterWare files (rpms_clusterware_verify).

    • Add a missing line to the clusterware-installer REVISIONS file.


  • 11.0.7-g0000 - October 1, 2019

    • scyld-sysinfo now optionally captures compute node state.

    • Add 20-second keep-alive when wrapping ssh commands.

    • scyld-nodectl ssh command is an alias for scyld-nodectl exec if a command is passed.

    • Expand the head node information stored in the database.

    • Various scyld-*ctl commands support field selection with new --field arguments.

    • Various scyld-*ctl commands support two new output formats: --csv and --table.

    • Include sanboot as a _boot_style to boot local disks or URLs that iPXE sanboot supports.

    • During an upgrade, scyld-install will not rerun steps from the initial ClusterWare install whose results the local administrator may have since altered.

    • scyld-install prints version information for each installed or upgraded package.

    • scyld-install passes http_proxy/https_proxy to underlying calls.

    • Assorted other improvements and bug fixes.


  • 11.0.6-g0000 - September 6, 2019

    • Include version number in REVISIONS files.

    • Fix a scyld-modimg problem that rejected any attempt to create a new image with a name that was a subset of an existing image name.

    • Add scyld-clusterctl heads that treats head nodes as database objects that can be viewed or deleted. More features to come.

    • Support socket-based admin authentication for local user accounts.

    • Fix scyld-cluster-conf save.

    • Eliminate an innocuous "Failure" message "No power URI provided for node" seen when doing scyld-nodectl power cycle or power off.

    • Add nfs-utils to the base image.

    • Pass more ipmitool error messages back to the caller.

    • Catch some exceptions that would unnecessarily stop daemons, and instead handle more gracefully.

    • Initramfs dhclient should not survive the switch_root.

    • Add _hostname as a reserved attribute to override specific compute node hostnames. See Reserved Attributes.

    • Allow administrators to set a boot configuration image to "None" for new kickstart/preseed support, and add new appendices in the ClusterWare documentation that provide examples of how to use Red Hat kickstart for Ubuntu and CentOS (see Appendix: Using Red Hat Kickstart) and Debian preseed (see Appendix: Using Debian Preseed).

    • Assorted other fixes and improvements.


  • 11.0.5-g0001 - August 6, 2019

    • Temporarily disable automatic renaming of unreferenced files.


  • 11.0.5-g0000 - August 1, 2019

    • Fix the --soft then --hard behavior when rebooting or shutting down nodes.

    • Simplify and improve human readable tool output unless --no-pretty is passed.

    • Add a new ssh action to scyld-nodectl; details in documentation.

    • Include /etc/systemd/system/couchbase-server.service.d/override.conf to allow Couchbase to use MD5 even when FIPS mode is enabled.

    • Suppress FIPS mode messages from scyld-nodectl exec.

    • Support for locally installed compute nodes; details in documentation.

    • Fixes when passing binary data to stdin of scyld-nodectl exec.

    • Move the dhcpd.leases file from the default location to /opt/scyld/clusterware-iscdhcp/conf/dhcpd.leases.

    • Give other head nodes a better chance to delete local copies of deleted content.

    • Detect and rename files in storage that are not referenced in the database.

    • Update resolv.conf if the only nameserver was a head node that goes down.

    • Assorted other fixes and improvements.

    • The slurm-scyld packages are updated to version 19.05.1, and openmpi2.0, openmpi1.10, and openmpi1.8 packages are rebuilt as version g0004 for compatibility with the newer slurm-scyld library. The openmpi3.1 packages are updated to version 3.1.4; openmpi3.0 updated to version 3.0.4; openmpi2.1 updated to version 2.1.6; and openmpi4.0 version 4.0.1 has been added to the distribution, all also compatible with the new slurm-scyld library and rebuilt as version g0004.


  • 11.0.4-g0001 - July 3, 2019

    • Support CentOS 6 images for compute nodes.

    • Fix problem of root authorized keys being overwritten on compute node at boot time.

    • Require node status updates to arrive on privileged ports.

    • Improved api_error_log capture of IP addresses.

    • Make --summary the default scyld-nodectl status output.

    • Various scyld-sysinfo improvements, including requesting a comment from the user that gets added to the output.

    • Pass remote IPs through ProxyPass to get them to the logs.

    • Link dracut statically to simplify supporting different compute node OSes.

    • Enable automatic --soft then --hard behavior for scyld-nodectl reboot and shutdown, and document the difference.

    • Convert more exceptions to errors due to bad command line arguments.

    • Wrap ipmitool sol activate in a new scyld-nodectl option.

    • Add an empty /etc/fstab during image creation.

    • Modify the prompt when inside a chroot.

    • Fix a scyld-bootctl clone bug: copy the release field.

    • Better error messages when a Couchbase member is unreachable.

    • Log the head's hostname when starting the service.

    • Add a syncer daemon that fetches remote files in the background.

    • Add managedb update to fix Couchbase after out-of-diskspace conditions.

    • Add scyld-nodectl power on/off/cycle/status and scyld-nodectl sol.

    • If a small file is passed as stdin to scyld-nodectl, then exec the contents instead of streaming it.

    • Cleanups to scyld-modimg around setting name, distro, and description.

    • Rename scyld-modimg --export to --copyout, and implement a new inverse action --copyin.

    • Assorted other fixes and improvements.

    • Various other packages have been released in coordination with Scyld ClusterWare 11.0.4-g0001 and should be updated, if installed: torque-scyld, slurm-scyld, singularity-scyld, openmpi3.1, openmpi3.0, openmpi2.1, openmpi2.0, openmpi1.10, and openmpi1.8.

      The torque-scyld and slurm-scyld packages are now split into three packages for each job scheduler. For example, torque-scyld (which requires torque-scyld-libs) installs on the server, and torque-scyld-node (which requires torque-scyld-libs) gets installed into a node image by the sched-helper script. (See Job Schedulers.)

      singularity-scyld updates to version 3.2.1, and it no longer installs files into /opt/scyld/, thus no longer requiring the user to module load singularity. The installed files are now accessible via the standard $PATH and $LD_LIBRARY_PATH.


  • 11.0.3-g0020 - June 6, 2019

    • Fixes to peer download so that only one thread will download at a time.


  • 11.0.3-g0014 - May 24, 2019

    • Stopping the clusterware service now also stops the clusterware-dhcpd and clusterware-dnsmasq services.

    • Include the pciutils package and an empty /etc/sysconfig/network file when creating the base image.

    • Fix various scyld-install --clear-all problems of overly aggressive deletions.

    • Add write_ifcfg.sh to the prenet startup on compute nodes.

    • Move the location of the scyld-helper script and add functionality to improve the configuration of Slurm or TORQUE. See Job Schedulers.

    • Minor fixes to managedb leave and eject.

    • Improve scyld-sysinfo error handling.

    • Expanded documentation around failover.

    • The sched-helper script can now push changes into compute node images.

    • Switch default gateway for compute nodes during head node failover.

    • Implement peer downloads for head node's missing files.

    • scyld-cluster-conf save now handles nodes on multiple networks.


  • 11.0.3-g0000 - May 8, 2019

    • First General Availability release.

    • Mark dnsmasq.conf.template and dhcpd.conf.template as configuration files.

    • Support dhcp relays.

    • Reduce log messages in api_error_log.

    • Fix an early boot issue that was causing yum to fail on nodes booted using roram style.

    • Fix the squashfs packer to work on images up to 100GB.

    • Default to 16 threads in the Apache wsgi configuration.

    • Add --clear-all argument to the installer.

    • Python daemons will now attempt to automatically restart with an exponential backoff.

    • Implement the _preferred_head attribute.

    • Fix a bug where results were listed per node instead of collapsed.

    • Other assorted documentation and tool fixes.

    • Fixes for SELinux on head nodes:

      • dnsmasq properly starts and serves compute node addresses.

      • The repacker daemon disables itself due to required permissions.

    • scyld-cluster-conf load improvements:

      • Multiple PXE boot networks can be loaded from a single configuration file.

      • Nodes will be assigned to the most recently defined network during parsing.

      • Support 'gw', 'via', and 'as' when parsing remote network definitions.

    • scyld-nodectl improvements:

      • Parallelize power control commands.

      • Improved output streaming and parallelization.

      • Improved handling of stdin and --stdin.

      • Default the ssh_runner fanout value to 16 nodes at a time.

      • More documentation and examples.


  • 11.0.1-b0209 - April 19, 2019

    • Third restricted release.

    • Includes the new clusterware-dnsmasq package, which supports resolving host names from /etc/hosts on the head node. See Node Name Resolution.

    • Support for establishing remote access between the head node(s) and compute nodes, or between compute nodes, by distributing SSH keys. See Compute Node Remote Access.

    • Support for booting "disked" compute nodes. See the Installation & Administrator Guide and Reference Guide for details.

    • Excludes /boot/initramfs-* files, and does not exclude /etc/ssh/ssh_host_* files, when packing images.

    • The Penguin serial number now appears in node hardware info, if it exists.

    • scyld-nodectl exec improvements:

      • Command now exits with the subcommand's exit code.

      • Command can now operate through the head node (default) or --direct.

      • Hide some ssh warning messages.


  • 11.0.1-b0197 - April 5, 2019

    • Second restricted release.

    • Numerous bug fixes and enhancements.


  • 11.0.1-b0183 - March 22, 2019

    • First restricted release.

    • ClusterWare TORQUE reverts some changes that were made to the original Adaptive Computing distribution for Legacy ClusterWare 6 and 7:

      • Includes the built-in pbs_sched job scheduler, and does not include the maui scheduler.

      • Includes "LimitCORE=infinity" that Legacy ClusterWare 6 and 7 had removed.

      • Reverts the name pbs_trqauthd back to the original trqauthd, and pbs_mom and trqauthd are now systemd daemons.

Known Issues And Workarounds

The following are known issues of significance with the latest version of ClusterWare and suggested workarounds.

  • The head node(s) must use a RHEL7 or CentOS7 base distribution, release 7.6 or later, due to dependencies on newer libvirt and selinux packages.

  • Scyld OpenMPI versions 4.0 and 4.1 for RHEL/CentOS 8 require ucx version 1.9 or greater, which is available from CentOS 8 Stream and RHEL 8.4.

  • When using a TORQUE or Slurm job scheduler (see Job Schedulers), if a node reboots and its image was not created using /opt/scyld/clusterware-tools/bin/sched-helper, then the cluster administrator must manually restart the job scheduler. For example, to restart on a single node n0: NODE=n0 torque-scyld-node or NODE=n0 slurm-scyld-node. To restart on all nodes: torque-scyld.setup cluster-restart or slurm-scyld.setup cluster-restart.

    Ideally, compute node images are updated using torque-scyld.setup update-image or slurm-scyld.setup update-image, which installs the TORQUE/Slurm config file in the image and enables the appropriate service at node startup.

  • If administrators are using scyld-modimg to concurrently modify two different images, then one administrator will see a message of the form:

    WARNING: Local cache contains inconsistencies.
    Use --clean-local to delete temporary files, untracked files,
    and remove missing files from the local manifest.
    

    If this occurs, run scyld-modimg --clean-local, but only after all scyld-modimg image manipulations have completed.

  • Ensure that /etc/sudoers does not contain the line Defaults requiretty; otherwise, DHCP misbehaves.

  • The NetworkManager-config-server package includes a NetworkManager.conf config file with an enabled "no-auto-default" setting. That is incompatible with ClusterWare compute node images and will cause nodes to lose network connectivity after their boot-time DHCP lease expires. Either disable that setting or remove the NetworkManager-config-server package from compute node images.

  • The scyld-clusterctl repos create command has a urls= argument that specifies where the new repo's contents can be found. The most common use is urls=http://<URL>. The alternative urls=file://<pathname> does not currently work. Instead, you must first manually create an http-accessible repo from that pathname. See Appendix: Creating Local Repositories without Internet.

  • When moving a head node from one etcd-based cluster to another using the managedb join command, please reboot the joining head once the join is complete.

  • If a new head node is failing to join an existing etcd-based cluster, check /var/log/clusterware/etcd.log and look for repeated lines of the form:

    <DATE> <SERVER> etcd: added member <HEX> [<URL>:52380] to cluster <HEX>
    

    If the log file contains more than one of these lines per join attempt, then please try running managedb recover on an existing head node and joining all head nodes back into the cluster one at a time. Re-joining heads that were previously in the cluster may require a --purge argument, i.e. managedb join --purge <IP>.

  • scyld-install performs its early check to determine if a newer clusterware-installer RPM is available by parsing the appropriate clusterware repo file (typically /etc/yum.repos.d/clusterware.repo) to find the first baseurl= line. If there are multiple such lines, i.e., specifying multiple ClusterWare repos, then the cluster administrator should order the repos so that the repo containing the newest RPMs is the first repo in the file.

  • A compute node using a version of clusterware-node older than 11.2.2 and booting from a head node that has upgraded to 11.7.0 or newer may not successfully send its status to the head node. Please upgrade the clusterware-node package inside the image to resolve this problem.

  • Joining a ClusterWare 11 head node to a ClusterWare 12 head node will perform the join, but will not update the joining head node to ClusterWare 12. We recommend updating the ClusterWare 11 node to 12 prior to performing the join. See Updating ClusterWare 11 to ClusterWare 12 for guidance about performing this update.
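
The etcd join issue above asks the administrator to look for repeated "added member" lines in /var/log/clusterware/etcd.log. The following is a minimal sketch of that check, not a ClusterWare tool. The function names and the regular expression are assumptions derived only from the sample line shown in that issue, and real log lines may differ in detail. The sketch counts across whatever lines it is given rather than per join attempt, so treat any result as a hint to inspect the log by hand.

```python
import re
from collections import Counter

# Assumed pattern, based on the sample line in the known issue:
#   <DATE> <SERVER> etcd: added member <HEX> [<URL>:52380] to cluster <HEX>
ADDED_MEMBER = re.compile(r"etcd: added member (\w+) \[(\S+)\] to cluster (\w+)")


def count_added_members(lines):
    """Count 'added member' lines per (member ID, peer URL) pair."""
    counts = Counter()
    for line in lines:
        match = ADDED_MEMBER.search(line)
        if match:
            member, url, _cluster = match.groups()
            counts[(member, url)] += 1
    return counts


def repeated_members(lines):
    """Return only the members that appear in more than one 'added' line."""
    return {key: n for key, n in count_added_members(lines).items() if n > 1}
```

On a head node, the functions could be fed an open file object for the log path named above (reading it may require root). An empty result from repeated_members means no member was logged as added more than once.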
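
The clusterware-node status issue above comes down to comparing two version numbers. The following is a minimal sketch of that comparison, assuming versions are written as in this changelog (for example 11.2.2-g0000) and that only the dotted numeric part matters. parse_version and status_may_fail are hypothetical helper names, not part of ClusterWare.

```python
def parse_version(text):
    """Turn '11.2.2-g0000' into (11, 2, 2), ignoring the build suffix."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def status_may_fail(node_version, head_version):
    """Return True when the versions match the known issue.

    The issue applies to a clusterware-node older than 11.2.2 booting
    from a head node at 11.7.0 or newer.
    """
    return (parse_version(node_version) < (11, 2, 2)
            and parse_version(head_version) >= (11, 7, 0))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "11.10" sorting before "11.2".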