Telegraf Plugins

Whereas status plugins are small scripts run during the periodic status-update cycle, Telegraf plugins are small configuration files that the HPC admin can enable or disable. A Telegraf plugin usually targets one particular kind of data, e.g., CPU usage or memory usage.

The clusterware-telegraf package can be installed on either a compute node or a head node, but the on-the-fly plugin system currently works only on compute nodes.

For larger-scale management and control, an admin can set the _telegraf_plugins attribute in an attribute group and then join nodes to that group.

Building Telegraf Plugins into an Image

As with status plugins, there is a separate directory:

/opt/scyld/clusterware-telegraf/telegraf-available

which contains the Penguin-provided configuration files. To build a plugin into a disk image, symlink its file into ./telegraf-enabled inside the image.
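The symlinking step can be sketched as a small shell function. This is a minimal sketch, assuming the image's root filesystem is unpacked at a directory you pass in (for example, a chroot you are editing), that telegraf-enabled sits next to telegraf-available under /opt/scyld/clusterware-telegraf, and that plugin files have names like cpu.conf. The function name enable_telegraf_plugin and the file name cpu.conf are hypothetical; list telegraf-available to see the real file names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable one Penguin-provided Telegraf plugin inside an
# image whose root filesystem is unpacked at $1.
# Assumption: telegraf-enabled lives next to telegraf-available.
enable_telegraf_plugin() {
    local image_root=$1 plugin=$2
    local base=/opt/scyld/clusterware-telegraf
    local target="$base/telegraf-available/$plugin"

    # Refuse names that are not present in the image's telegraf-available.
    if [ ! -e "$image_root$target" ]; then
        echo "no such plugin: $plugin" >&2
        return 1
    fi

    mkdir -p "$image_root$base/telegraf-enabled"
    # Point the link at the in-image absolute path, so it resolves once the
    # image is booted rather than only from the build host.
    ln -sfn "$target" "$image_root$base/telegraf-enabled/$plugin"
}
```

For example, `enable_telegraf_plugin /path/to/image-root cpu.conf`. Seen from the build host the link dangles, which is expected: it is meant to be resolved inside the booted image.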

On-The-Fly Telegraf Plugins

On compute nodes, an admin can enable or disable “on-the-fly” plugins by setting or clearing that node's _telegraf_plugins attribute.

Note

Every change to _telegraf_plugins forces a full restart of the Telegraf daemon, so frequent changes may degrade performance.

Head Node Functionality

The clusterware-telegraf plugin system has reduced functionality on head nodes. Head nodes do not currently have attributes, so Telegraf plugins cannot be changed on the fly; adding entries to telegraf-enabled is the only way to add plugins.

Additionally, whereas compute nodes detect changes and start or restart Telegraf automatically, changes on head nodes must be applied manually.

Once the telegraf-enabled directory is ready on the head node, run reconfig-telegraf.sh to push the enabled plugins into production. The script also restarts Telegraf:

/opt/scyld/clusterware-telegraf/bin/reconfig-telegraf.sh
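On a head node the two manual steps, linking plugins into telegraf-enabled and then running reconfig-telegraf.sh, can be wrapped together. This is a minimal sketch, not a ClusterWare tool. The function name apply_head_node_plugins and the CW_TELEGRAF_BASE override are hypothetical (the override only makes the function easy to try against a scratch directory). It also assumes that telegraf-enabled sits next to telegraf-available under /opt/scyld/clusterware-telegraf and that you run it with enough privileges to write there, most likely as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a head node: enable the named plugins, then run
# reconfig-telegraf.sh once so Telegraf restarts with the new set.
# CW_TELEGRAF_BASE is a made-up override; it defaults to the install path.
apply_head_node_plugins() {
    local base=${CW_TELEGRAF_BASE:-/opt/scyld/clusterware-telegraf}
    local plugin

    # Validate every name first, so a typo leaves telegraf-enabled untouched.
    for plugin in "$@"; do
        if [ ! -e "$base/telegraf-available/$plugin" ]; then
            echo "no such plugin: $plugin" >&2
            return 1
        fi
    done

    mkdir -p "$base/telegraf-enabled"
    for plugin in "$@"; do
        ln -sfn "$base/telegraf-available/$plugin" "$base/telegraf-enabled/$plugin"
    done

    # Head nodes do not pick up changes automatically, so push them now.
    "$base/bin/reconfig-telegraf.sh"
}
```

For example, `apply_head_node_plugins cpu.conf mem.conf` (hypothetical file names). Running the reconfig script once at the end, rather than once per plugin, keeps the number of Telegraf restarts to one.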

Note

ClusterWare processes changes to _telegraf_plugins on the next status-update cycle, which usually runs every 10 seconds unless the admin has changed it. Telegraf may then take several seconds to restart, after which it must complete its own “data refresh” cycle (again usually every 10 seconds, unless changed by the admin). As a result, there can be a noticeable delay of roughly 30 to 40 seconds before a new plugin's data appears on a dashboard.
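The delay described in this note can be read as a sum of stages. This is a rough worst-case estimate that assumes the stages run back to back and that the change arrives just after a status-update cycle has begun. Here T_status and T_refresh are the two configurable intervals and t_restart is the Telegraf restart time; these symbol names are introduced only for illustration.

```latex
t_{\text{visible}} \approx T_{\text{status}} + t_{\text{restart}} + T_{\text{refresh}}
                   = 10\,\text{s} + t_{\text{restart}} + 10\,\text{s}
                   = 20\,\text{s} + t_{\text{restart}}
```

Lengthening either interval increases the delay by the same amount, which matters if the defaults have been changed. Any refresh interval configured on the dashboard itself would add further delay on top of this.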