InfluxDB Notes

InfluxDB is a database optimized for time-series data and analytics. Telegraf, while technically a separate tool, is a data collection agent created by the same company and provides "plugins" which can push a variety of metrics into InfluxDB. Both tools are open source, but commercial support is available through InfluxData Inc. (https://www.influxdata.com/). There is a vibrant community of users and plugin developers, and the official community forum at https://community.influxdata.com/ often has answers directly from the InfluxData team.

The InfluxDB database can currently be queried through the Flux language, a powerful data scripting and processing language. Documentation for Flux can be found at https://docs.influxdata.com/flux/v0/. As a simple example, a query to find the system CPU usage data for all nodes in the cluster is shown below:

from(bucket: "telegraf")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_system")

This example illustrates a typical approach to querying and filtering data in InfluxDB/Flux. First, a stream of records is pulled from a "bucket" (roughly comparable to a database in relational systems, with each measurement playing the role of a table). That stream may include multiple metrics stored in multiple measurements and fields. A given Telegraf data collection cycle, for example, may include CPU and memory measurements, and each of those may have multiple fields of information: the CPU measurement may have user, system, and idle usage values for each CPU core, and the memory measurement may have total, used, and free memory values. Each of those measurement-field combinations will be a separate record in the stream.
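To see which measurements and fields a bucket actually holds, the schema package that ships with Flux can be queried directly. The following is a minimal sketch, assuming the bucket is named "telegraf" as in the example above and that the Telegraf memory plugin is enabled so that a "mem" measurement exists; the result names passed to yield() are arbitrary labels chosen for illustration.

```flux
import "influxdata/influxdb/schema"

// List every measurement stored in the bucket.
schema.measurements(bucket: "telegraf")
    |> yield(name: "measurements")

// List the field keys recorded under the "mem" measurement
// (assumes the Telegraf memory plugin is collecting data).
schema.measurementFieldKeys(bucket: "telegraf", measurement: "mem")
    |> yield(name: "mem_fields")
```

Each yield() gives its stream a distinct name, which is needed when a single script returns more than one result.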

It is often useful to reduce the time range early in the query; otherwise, all records from all time will be processed by the later stages of the query. The earlier the stream's data is reduced, the greater the speed-up in query processing. In this example, the range is reduced to whatever time range has been selected in the GUI, with the "v." notation being used to reference one of the dashboard-shared "variables". Since a measurement can include multiple fields, it is useful to filter on the measurement next, as this will often provide another large reduction in the amount of stream data. Additional filtering on specific fields can then be used to further refine the data.
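The same ordering can be carried further by filtering on a tag and then downsampling. The sketch below is one possible extension of the query above, not a required form. It assumes Telegraf's CPU plugin is reporting the aggregate "cpu-total" series (its default behavior) and that the query runs from a Grafana or InfluxDB dashboard, where the v.windowPeriod variable is supplied automatically.

```flux
from(bucket: "telegraf")
    // Narrow the time range first so later stages see fewer records.
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    // Then narrow by measurement and field.
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_system")
    // Keep only the all-cores aggregate rather than per-core series
    // (assumes the "cpu" tag carries the value "cpu-total").
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    // Average the remaining points into windows sized for the panel.
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
```

Placing aggregateWindow() last means the averaging only runs over records that survived every filter.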

The official documentation from InfluxData is very good, and includes examples and videos.

It should be noted that in August 2023, InfluxData announced plans to de-prioritize their future investments in Flux and to put the project into "maintenance mode". They explicitly state that Flux is not going End-Of-Life, and that they do anticipate supporting customers for some time to come. InfluxData also offers an SQL-like language, InfluxQL, and has plans for full SQL in version 3 of the InfluxDB database, so there will be a path forward regardless.

ClusterWare is fully committed to providing a robust monitoring and alerting platform. As InfluxData's roadmap becomes more firm, ClusterWare will update its roadmap accordingly.

Grafana Notes

Grafana is a powerful and flexible dashboard and visualization tool from Grafana Labs (https://grafana.com/). It is available as open-source software, and Grafana Labs also offers commercial support. Besides drawing charts and graphs, it can provide alerting capabilities and, through the Loki plugin, may be used for log aggregation and analysis.

At a high level, a Grafana dashboard is a set of "panels" where each panel is a query against a "data source" along with a visualization for the resulting data. At installation time, ClusterWare will configure the InfluxDB data source, which connects to the InfluxDB/Telegraf data; it also provides a cluster-wide dashboard and a single-node dashboard. Either of those can be copied and then customized to meet any local needs.

When a panel is added to a dashboard, it will show up as a blank panel of some default size. The corners of the panel can easily be clicked and dragged to resize it, but changing the query or visualization type requires editing the panel through the menu on the top bar of the panel (look for the small triangle symbol). Keep in mind that since the query will, by default, be empty, there may be no data for the panel to display. As such, it may be helpful to start with the "table" visualization while working on the query, since this will give more direct feedback even if the query itself is in error or is producing a different kind of data than expected. Once the query is refined, a proper visualization type can be selected for that data.
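While developing a query with the "table" visualization, it can help to cap the number of rows returned so that the raw columns are easy to inspect. The following is a sketch of such a starting query, assuming the Telegraf memory plugin is enabled and reports a "used_percent" field under the "mem" measurement; the row count of 10 is an arbitrary choice.

```flux
from(bucket: "telegraf")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "mem")
    |> filter(fn: (r) => r["_field"] == "used_percent")
    // limit() applies to each table in the stream, so this returns
    // up to 10 rows per series (for example, per host).
    |> limit(n: 10)
```

Once the table shows the expected columns and values, the limit() stage can be removed and the visualization switched to a chart type.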

Note that since the ClusterWare installation will configure the InfluxDB data-source to use the Flux language, all queries in Grafana must also use the Flux language.

Extensive Grafana Labs documentation is available online: