ClusterWare Plugin System

The ClusterWare Plugin System lets admins quickly change what status and monitoring data is collected, either across the entire cluster or on a subset of nodes.

There are 4 types of plugins:

  • Status Plugins

    • These update every 10 sec by default, so they are generally sensors or readings that change fairly frequently, e.g. the free RAM on a node or the current CPU load;

  • Hardware Plugins

    • Called less often than “regular” status plugins, usually every 300 sec, these are sensors and readings that change less frequently and/or are tied to the hardware itself; e.g. the total RAM on a node, or the CPU architecture;

  • Health-Check Plugins

    • Called less often than “regular” status plugins, usually every 300 sec, these are sensors and readings that answer the question “is this compute node operating correctly?”. The value is usually “healthy” or “unhealthy”, but may instead be a time-stamp (indicating that the plugin is still calculating the value) or a longer message such as “unhealthy; some text on why the node is unhealthy”;

  • Telegraf Plugins

    • These are smaller, more granular Telegraf config files that ClusterWare can enable or disable individually.
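A health-check value can take one of three shapes: “healthy”, “unhealthy” (optionally followed by a reason), or a time-stamp while the check is still running. The hypothetical bash function below, which is not part of ClusterWare, sketches how a script consuming these values might tell them apart. It assumes the value begins with the literal word “healthy” or “unhealthy” and that a reason follows a “;” separator; the time-stamp format is also an assumption, so anything that matches neither keyword is simply treated as pending.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ClusterWare): classify a health-check
# plugin value. Assumes values start with "healthy" or "unhealthy"; anything
# else (e.g. a time-stamp) is treated as a check that has not finished yet.
health_state() {
  local value="$1" reason
  case "$value" in
    unhealthy*)
      # Strip the keyword and an optional "; " separator to get the reason.
      reason="${value#unhealthy}"
      reason="${reason#;}"
      reason="${reason# }"
      if [[ -n "$reason" ]]; then
        echo "unhealthy ($reason)"
      else
        echo "unhealthy"
      fi
      ;;
    healthy*)
      echo "healthy"
      ;;
    *)
      # Assumed to be a time-stamp: the plugin is still calculating.
      echo "pending"
      ;;
  esac
}

health_state "healthy"                        # prints: healthy
health_state "unhealthy; disk /scratch full"  # prints: unhealthy (disk /scratch full)
health_state "2024-05-01T12:00:00"            # prints: pending
```

The “unhealthy” pattern is tested first only for readability; because bash `case` patterns are anchored at the start of the string, “healthy*” never matches a value beginning with “unhealthy”.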

If admins know that they want some plugins permanently enabled, they can build those plugins into the disk images that the node boots from. These “built-in” plugins are always enabled and they cannot be disabled later except by changing the disk image.

For information that may only be needed some of the time, admins can add arbitrary plugins to a node using the _status_plugins node attribute (with similar attributes for hardware, health, and Telegraf plugins). These on-the-fly plugins can be turned on and off at any time simply by setting, overwriting, or clearing that attribute. For example:

scyld-nodectl --all set _status_plugins=chrony,ipmi

would enable the chrony and ipmi status plugins, whereas:

scyld-nodectl --all set _status_plugins=chrony

would keep chrony enabled but disable ipmi (since it is not listed anymore).
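Because setting _status_plugins replaces the whole list rather than appending to it, it can help to preview which plugins a new value would actually switch on or off. The hypothetical bash helper below (not a ClusterWare command) sketches that comparison. It assumes, based on the description above, that the plugins a node runs are the built-in plugins from its image plus whatever the attribute lists, so dropping a built-in plugin from the attribute has no effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ClusterWare). Arguments are comma-separated
# lists: plugins built into the image, the old _status_plugins value, and the
# new value. Prints the plugins whose state would actually change.

# in_list NAME LIST: succeed if NAME appears in the comma-separated LIST.
in_list() {
  [[ ",$2," == *",$1,"* ]]
}

plugin_diff() {
  local builtin="$1" old="$2" new="$3" p
  # Plugins named in the new value that were not already running.
  for p in ${new//,/ }; do
    in_list "$p" "$old" || in_list "$p" "$builtin" || echo "enable $p"
  done
  # Plugins dropped from the value, unless the image keeps them enabled.
  for p in ${old//,/ }; do
    in_list "$p" "$new" || in_list "$p" "$builtin" || echo "disable $p"
  done
}

plugin_diff "" "chrony,ipmi" "chrony"      # prints: disable ipmi
plugin_diff "ipmi" "chrony,ipmi" "chrony"  # prints nothing: ipmi is built in
```

The first call mirrors the two commands above; the second shows why built-in plugins cannot be turned off through the attribute.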

Best Practices

While the ability to enable/disable plugins in an ad-hoc fashion can be powerful, basic best practices still hold:

  • It may be helpful to think of ClusterWare as “the management tool” and Telegraf as “the monitoring tool”:

    • Information you may want to act on should be included in the status, hardware, or health updates, and should be permanently enabled in the scripts-enabled directory so that it is available when that action is needed later on;

    • Information that might be useful for long-term analyses and trends should be stored in Telegraf.

  • Frequent changes to plugins may make the underlying data less useful: if a parameter exists at some times and not at others, it will be difficult to base future decisions on changes seen in that parameter.

  • Any data that will be viewed through Grafana should be built into the image (in the telegraf-enabled directory); otherwise the data may not exist and Grafana dashboards may show empty charts.

  • For more security-conscious admins, any security-relevant plugins should be enabled in the scripts-enabled or telegraf-enabled directories, making them permanently enabled and more resistant to tampering.

For more information: