Telegraf Plugins
Whereas status plugins are small scripts that run during the periodic status-update cycle, Telegraf plugins are small configuration files that the HPC admin can enable or disable. A Telegraf plugin usually targets one particular kind of data, e.g. CPU usage or memory usage.
The clusterware-telegraf package can be installed on either a compute node or a head node, but the on-the-fly plugin system currently only works on compute nodes.
For larger-scale management and control, one can set the _telegraf_plugins attribute inside an attribute group and then join nodes to that group.
Building Telegraf Plugins into an Image
Similar to status plugins, there is another directory, /opt/scyld/clusterware-telegraf/telegraf-available, that contains Penguin-provided config files. Those can be symlinked into ./telegraf-enabled inside a disk image.
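The symlink step can be sketched as a small shell helper. This is a minimal sketch, not a ClusterWare tool: the function name enable_plugin is hypothetical, it assumes telegraf-enabled sits next to telegraf-available under the same parent directory, and the config file name in the usage note is a placeholder for whatever files telegraf-available actually contains.

```shell
#!/bin/sh
# Hypothetical helper (not part of ClusterWare): enable one Telegraf plugin
# by symlinking its config from telegraf-available into telegraf-enabled.
# Assumption: both directories live under the same parent directory.
# Usage: enable_plugin <clusterware-telegraf dir> <config file name>
enable_plugin() {
    base_dir=$1
    plugin=$2

    # Refuse to create a link to a config file that does not exist.
    if [ ! -f "$base_dir/telegraf-available/$plugin" ]; then
        echo "no such plugin config: $plugin" >&2
        return 1
    fi

    mkdir -p "$base_dir/telegraf-enabled"

    # A relative target keeps the link valid wherever the image root
    # happens to be unpacked or mounted.
    ln -sf "../telegraf-available/$plugin" "$base_dir/telegraf-enabled/$plugin"
}
```

For an image unpacked under a placeholder path such as /path/to/image-root, a call might look like `enable_plugin /path/to/image-root/opt/scyld/clusterware-telegraf cpu.conf`, where cpu.conf is an assumed file name.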
On-The-Fly Telegraf Plugins
On compute nodes, an admin can enable or disable “on-the-fly” plugins by setting or clearing that node's _telegraf_plugins attribute.
Note
Changes to _telegraf_plugins will force a full restart of the Telegraf daemon, so frequent changes could cause performance degradation.
Head Node Functionality
The ClusterWare-telegraf plugin system has reduced functionality on head nodes. Since head nodes do not currently have attributes, there is no way to make on-the-fly changes to the Telegraf plugins, so adding entries to telegraf-enabled is the only way to add plugins to the system.
Additionally, while compute nodes automatically detect changes and start or restart Telegraf, changes on head nodes must be handled manually.
Once the telegraf-enabled directory is ready on the head node, admins should run reconfig-telegraf.sh to push the enabled plugins into production (this will also restart Telegraf):
/opt/scyld/clusterware-telegraf/bin/reconfig-telegraf.sh
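Because head-node changes are applied by hand, it may help to check the telegraf-enabled directory before restarting Telegraf. The following is a minimal sketch under stated assumptions: the function apply_head_node_plugins and the CW_TELEGRAF_DIR variable are hypothetical names introduced here, telegraf-enabled is assumed to live under /opt/scyld/clusterware-telegraf, and the dangling-link check is an added precaution rather than something reconfig-telegraf.sh is known to require.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of ClusterWare): check telegraf-enabled
# for dangling symlinks, then run the reconfig script named in the docs.
# CW_TELEGRAF_DIR is an assumed override variable for the install prefix.
CW_TELEGRAF_DIR=${CW_TELEGRAF_DIR:-/opt/scyld/clusterware-telegraf}

apply_head_node_plugins() {
    enabled_dir="$CW_TELEGRAF_DIR/telegraf-enabled"
    reconfig="$CW_TELEGRAF_DIR/bin/reconfig-telegraf.sh"

    for link in "$enabled_dir"/*; do
        # An unmatched glob is left as a literal pattern, which is not a
        # symlink, so it is skipped. A dangling symlink passes -L but
        # fails -e, and stops the run before Telegraf is restarted.
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "dangling plugin link: $link" >&2
            return 1
        fi
    done

    # Push the enabled plugins into production; this also restarts Telegraf.
    "$reconfig"
}
```

Running `apply_head_node_plugins` on the head node after editing telegraf-enabled would then either report the first broken link or hand off to reconfig-telegraf.sh.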
Note
Changes to _telegraf_plugins will be processed by ClusterWare on the next status-update cycle, usually every 10 seconds unless changed by the admin. However, it may take several seconds before Telegraf actually restarts, and it then has to go through its own “data refresh” cycle (again, usually every 10 seconds unless changed by the admin). So there could be a non-trivial delay (30-40 seconds) before a new plugin's data is actually visible on a dashboard.