InfluxDB Notes
InfluxDB is a database optimized for time-series data and analytics. While technically a separate tool, the Telegraf data collection tool is created by the same company and provides optimized "plugins" which can push a variety of metrics into InfluxDB. Both tools are open source, and commercial support is available through InfluxData Inc. (https://www.influxdata.com/). There is a vibrant community of users and plugin developers, and the official community forum at https://community.influxdata.com/ often has answers directly from the InfluxData team.
The InfluxDB database can currently be queried through the Flux language, a powerful data scripting and processing language. Documentation for Flux can be found at https://docs.influxdata.com/flux/v0/. As a simple example, a query to find the CPU usage data for all nodes in the cluster is shown below:
from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_system")
This example illustrates a typical approach to querying and filtering data in InfluxDB/Flux. First, a stream of records is pulled from a "bucket" (roughly analogous to a database in relational systems, with each measurement playing the role of a table). That stream may include multiple metrics stored in multiple measurements and fields. A given Telegraf data collection cycle, for example, may include CPU and memory measurements, and each of those may have multiple fields of information: the CPU measurement may have system, user, and idle usage fields (with a tag identifying the CPU core), while the memory measurement may have total, used, and free memory values. Each of those measurement-field combinations will be a separate record in the stream.
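To see which measurements and fields a bucket actually contains, the schema package that ships with Flux can be queried directly. The following is a minimal sketch, assuming the bucket is named "telegraf" as in the example above and that a "cpu" measurement exists; the yield names are arbitrary labels chosen for illustration.

```flux
import "influxdata/influxdb/schema"

// List every measurement stored in the bucket (for example "cpu" or "mem").
schema.measurements(bucket: "telegraf")
    |> yield(name: "measurements")

// List the field keys recorded under a single measurement.
// The measurement name "cpu" is an assumption based on the earlier example.
schema.measurementFieldKeys(bucket: "telegraf", measurement: "cpu")
    |> yield(name: "cpu_fields")
```

Both functions only look back over a default time window, so measurements or fields that have not been written recently may not appear in the results.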
It is often useful to restrict the time range early in the query; otherwise, all records from all time would be processed by the later stages of the query. The earlier the stream is reduced, the faster the query will run. In this example, the range is restricted to whatever has been selected in the GUI, with the "v." notation referencing one of the dashboard-shared "variables". Since a measurement can include multiple fields, it is useful to filter on measurement next, as this often provides another large reduction in the amount of stream data. Additional filtering on specific fields can then further refine the data.
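The ordering described above can be sketched as a query that narrows the stream step by step before doing any heavier processing. This is an illustrative example only: the fixed one-hour range, the "mem" measurement name, and the "used" and "free" field names are assumptions based on Telegraf's usual memory metrics, and the five-minute averaging window is an arbitrary choice.

```flux
from(bucket: "telegraf")
    // 1. Bound the time range first (here, a fixed window of the last hour).
    |> range(start: -1h)
    // 2. Narrow the stream to a single measurement.
    |> filter(fn: (r) => r["_measurement"] == "mem")
    // 3. Narrow further to the specific fields of interest.
    |> filter(fn: (r) => r["_field"] == "used" or r["_field"] == "free")
    // 4. Only then run the more expensive per-window aggregation.
    |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
```

Placing the aggregation last means it only ever sees the records that survived the range and filter stages.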
The official documentation from InfluxData is thorough and includes examples and videos:
Getting started: https://docs.influxdata.com/flux/v0/get-started/
InfluxDB University includes training materials and videos: https://www.influxdata.com/university/ and https://university.influxdata.com/
The "Data Querying" section includes Basic, Beginner, and Intermediate courses in the Flux language
In August 2023, InfluxData announced plans to de-prioritize future investment in Flux and to put the project into "maintenance mode". They explicitly state that Flux is not going End-Of-Life and that they anticipate supporting customers for some time to come. InfluxData also offers an SQL-like language, InfluxQL, and plans full SQL support in version 3 of the InfluxDB database, so there will be a path forward regardless.
Future of Flux: https://docs.influxdata.com/flux/v0/future-of-flux/
SQL-like alternatives: https://docs.influxdata.com/influxdb/v2/query-data/influxql/ and https://docs.influxdata.com/influxdb/clustered/query-data/
ClusterWare is fully committed to providing a robust monitoring and alerting platform. As InfluxData's roadmap becomes more firm, ClusterWare will update its roadmap accordingly.
Grafana Notes
Grafana is a powerful and flexible dashboard and visualization tool from Grafana Labs (https://grafana.com/). It is available as open-source software, and Grafana Labs also offers commercial support. Besides drawing charts and graphs, it provides alerting capabilities and, through the Loki data source, may be used for log aggregation and analysis.
At a high level, a Grafana dashboard is a set of "panels" where each panel is a query against a "data source" along with a visualization for the resulting data. At installation time, ClusterWare will configure the InfluxDB data source, which connects to the InfluxDB/Telegraf data; it also provides cluster-wide and single-node dashboards. Either of those can be copied and then customized to meet any local needs.
When a panel is added to a dashboard, it will show up as a blank panel of some default size. The corners of the panel can be clicked and dragged to resize it, but changing the query or visualization type requires editing the panel through the menu on the panel's top bar (look for the small triangle symbol). Keep in mind that since the query is empty by default, there may be no data for the panel to display. As such, it may be helpful to start with the "table" visualization while working on the query, since this gives more direct feedback even if the query itself is in error or is producing a different kind of data than expected. Once the query is refined, a proper visualization type can be selected for that data.
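While a query is still being developed with the "table" visualization, it can help to trim the output to a handful of readable columns and rows. The following is a sketch of that idea, reusing the CPU query from the InfluxDB notes; the "host" and "cpu" column names are assumptions based on the tags Telegraf normally attaches, and the row limit is arbitrary.

```flux
from(bucket: "telegraf")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_system")
    // Keep only the columns worth reading in a table view.
    // "host" and "cpu" are assumed tag names; adjust to the local schema.
    |> keep(columns: ["_time", "host", "cpu", "_field", "_value"])
    // Cap the number of rows returned per table so the panel stays readable.
    |> limit(n: 10)
```

Once the table shows the expected records, the keep and limit stages can be removed and the visualization type changed.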
Note that since the ClusterWare installation will configure the InfluxDB data-source to use the Flux language, all queries in Grafana must also use the Flux language.
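As an illustration of what a Flux panel query might look like, the sketch below averages system CPU usage over windows whose size Grafana chooses to match the panel. The v.timeRangeStart, v.timeRangeStop, and v.windowPeriod values are filled in by Grafana at query time; the "cpu-total" tag value is an assumption based on Telegraf's usual CPU metrics and may differ on a given cluster.

```flux
from(bucket: "telegraf")
    // Use the time range currently selected in the dashboard.
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_system")
    // Assumed tag value: the all-cores aggregate rather than individual cores.
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    // Downsample to a window size suited to the panel width and time range.
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
    |> yield(name: "mean")
```

Downsampling with the Grafana-supplied window keeps the number of points per series roughly constant as the dashboard time range changes.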
Extensive Grafana Labs documentation is available online:
General overview of dashboards: https://grafana.com/docs/grafana/latest/dashboards/
Panels and visualizations: https://grafana.com/docs/grafana/latest/panels-visualizations/
Grafana includes scatter, line, bar, and pie charts, tables, gauges, and more.
All of the visualization types can be customized in terms of color, font, etc., and many of them have thresholding features to highlight only data above (or below) some threshold.
Panels and dashboards can be made interactive through the use of "variables" that are shared across a dashboard, and through "data links" which enable hyperlinks from data values to other dashboards or panels.
Dashboard variables: https://grafana.com/docs/grafana/latest/dashboards/variables/
Panel data-links: https://grafana.com/docs/grafana/latest/panels-visualizations/configure-data-links/
For example, clicking on a node's name in the ClusterWare Cluster Overview dashboard will link to the node-specific dashboard.
Alerting: https://grafana.com/docs/grafana/latest/alerting/
An alert is essentially a query (in the Flux language) that is run periodically; if any records emitted by that query are above a threshold, an alert is generated. The notification can be as simple as an email sent to one or more recipients, or it can be routed to one or more external "contact points" such as PagerDuty, Sensu, Slack, etc.
The main tutorials, including videos and quick-start guides: https://grafana.com/tutorials/
Grafana Labs has a set of "Grafana for Beginners" videos on YouTube: https://www.youtube.com/playlist?list=PLDGkOdUX1Ujo27m6qiTPPCpFHVfyKq9jT
The whole series is a broad overview of observability and monitoring; the dashboard and visualization information in Episodes 8 and 9 may be of particular interest.
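Returning to the alerting description above, an alert query generally needs to reduce the data to a small number of values that can be compared against a threshold. The following is a minimal sketch, assuming Telegraf's usual "host" and "cpu" tags and an arbitrary five-minute look-back; the threshold itself would be configured in the Grafana alert rule rather than in the query.

```flux
from(bucket: "telegraf")
    // Alert rules have no dashboard time picker, so use a fixed look-back.
    |> range(start: -5m)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> filter(fn: (r) => r["_field"] == "usage_system")
    // Assumed tag value: the all-cores aggregate rather than individual cores.
    |> filter(fn: (r) => r["cpu"] == "cpu-total")
    // Regroup so that each node ("host" tag, assumed) forms its own table.
    |> group(columns: ["host"])
    // Collapse each node's records to a single average value.
    |> mean()
```

Each resulting row holds one node's average system CPU usage over the look-back window, which an alert rule can then compare against its configured threshold.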