Exporting reference

Welcome to the exporting engine reference guide. This guide contains comprehensive information about enabling, configuring, and monitoring Netdata's exporting engine, which allows you to send metrics to external time-series databases.

For a quick introduction to the exporting engine's features, read our doc on exporting metrics to time-series databases, or jump in to enabling a connector.

The exporting engine has a modular structure and supports metric exporting via multiple exporting connector instances at the same time. You can have different update intervals and filters configured for every exporting connector instance.

When you enable the exporting engine and a connector, the Netdata Agent exports metrics beginning from the time you restart its process, not the entire database of long-term metrics.

Since Netdata collects thousands of metrics per server per second, which would easily congest any database server when several Netdata servers are sending data to it, Netdata allows sending metrics at a lower frequency, by resampling them.

So, although Netdata collects metrics every second, it can send to the external database servers averages or sums every X seconds (though, it can send them per second if you need it to).
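As a rough illustration of the resampling described above, the sketch below collapses per-second values into one value per window, either as an average or as a sum. The function name `resample` and its parameters are hypothetical and are not part of Netdata's code; Netdata's real implementation works on its own database and interpolation logic.

```python
def resample(values, every, mode="average"):
    """Collapse per-second values into one value per `every`-second window.

    Hypothetical helper for illustration only. `mode` is "average" or "sum".
    """
    if every < 1:
        raise ValueError("every must be at least 1 second")
    if mode not in ("average", "sum"):
        raise ValueError("mode must be 'average' or 'sum'")

    resampled = []
    for start in range(0, len(values), every):
        window = values[start:start + every]
        total = sum(window)
        # The last window may be shorter than `every`, so divide by its real length.
        resampled.append(total if mode == "sum" else total / len(window))
    return resampled


per_second = [1, 2, 3, 4, 5, 6]
print(resample(per_second, 3))          # one average per 3-second window
print(resample(per_second, 3, "sum"))   # one sum per 3-second window
```

With `every = 1` the function returns the per-second values unchanged, which corresponds to sending metrics at full resolution.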

Features

Integration

The exporting engine uses a number of connectors to send Netdata metrics to external time-series databases. See our list of supported databases for information on which connector to enable and configure for your database of choice.

Chart filtering

Netdata can filter metrics to send only a subset of the collected metrics. You can use the configuration file

[prometheus:exporter]
    send charts matching = system.*

or the URL parameter filter in the allmetrics API call.

http://localhost:19999/api/v1/allmetrics?format=shell&filter=system.*
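A query like the one above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `allmetrics_url` and its defaults are assumptions made for illustration, while the endpoint path and the `format` and `filter` parameters come from the example URL.

```python
from urllib.parse import urlencode


def allmetrics_url(host="localhost", port=19999, fmt="shell", chart_filter=None):
    """Build an allmetrics URL, optionally restricted by a chart filter.

    Hypothetical helper; only the path and parameter names are taken
    from the example in this document.
    """
    params = {"format": fmt}
    if chart_filter:
        params["filter"] = chart_filter
    # Keep '*' unescaped so the wildcard stays readable in the query string.
    return f"http://{host}:{port}/api/v1/allmetrics?" + urlencode(params, safe="*")


print(allmetrics_url(chart_filter="system.*"))
```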

Operation modes

Netdata supports three modes of operation for all exporting connectors:

  • as-collected sends to external databases the metrics as they are collected, in the units they are collected. So, counters are sent as counters and gauges are sent as gauges, much like all data collectors do. For example, to calculate CPU utilization in this format, you need to know how to convert kernel ticks to percentage.

  • average sends to external databases normalized metrics from the Netdata database. In this mode, all metrics are sent as gauges, in the units Netdata uses. This abstracts data collection and simplifies visualization, but you will not be able to copy and paste queries from other sources to convert units. For example, CPU utilization percentage is calculated by Netdata, so Netdata will convert ticks to percentage and send the average percentage to the external database.

  • sum or volume: the sum of the interpolated values shown on the Netdata graphs is sent to the external database. So, if Netdata is configured to send data to the database every 10 seconds, the sum of the 10 values shown on the Netdata charts will be used.

Time-series databases recommend collecting the raw values (as-collected). If you plan to invest in building your monitoring around a time-series database and you already know (or you will invest in learning) how to convert units and normalize the metrics in Grafana or other visualization tools, we suggest using as-collected.

If, on the other hand, you just need long-term archiving of Netdata metrics and you plan to mainly work with Netdata, we suggest using average. It decouples visualization from data collection, so it will generally be a lot simpler. Furthermore, if you use average, the charts shown in the external service will match exactly what you see in Netdata, which is not necessarily true for the other modes of operation.
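To make the trade-off concrete, here is a sketch of the kind of conversion you would have to do yourself when working with as-collected counters: turning two samples of cumulative CPU tick counters into a utilization percentage. The state names and the function `cpu_busy_percent` are assumptions for illustration; in average mode Netdata sends the already-computed percentage instead.

```python
def cpu_busy_percent(prev, curr):
    """Convert two samples of cumulative tick counters into busy CPU percent.

    `prev` and `curr` map a CPU state name to its cumulative tick count.
    Hypothetical helper; the state names are illustrative.
    """
    deltas = {state: curr[state] - prev[state] for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no ticks elapsed between samples (or a counter reset)
    busy = total - deltas.get("idle", 0)
    return 100.0 * busy / total


earlier = {"user": 100, "system": 50, "idle": 850}
later = {"user": 130, "system": 60, "idle": 910}
print(cpu_busy_percent(earlier, later))
```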

Independent operation

The exporting engine is designed not to slow down Netdata, regardless of the speed of the external database server.

You should keep in mind though that many exporting connector instances can consume a lot of CPU resources if they run their batches at the same time. You can set different update intervals for every exporting connector instance, but even in that case they can occasionally synchronize their batches for a moment.

Configuration

Here are the configuration blocks for every supported connector. Your current exporting.conf file may look a little different.

You can configure each connector individually using the available options. The [graphite:my_graphite_instance] block contains examples of some of these additional options in action.

[exporting:global]
    enabled = yes
    send configured labels = no
    send automatic labels = no
    update every = 10

[prometheus:exporter]
    send names instead of ids = yes
    send configured labels = yes
    send automatic labels = no
    send charts matching = *
    send hosts matching = localhost *
    prefix = netdata

[graphite:my_graphite_instance]
    enabled = yes
    destination = localhost:2003
    data source = average
    prefix = Netdata
    hostname = my-name
    update every = 10
    buffer on failures = 10
    timeout ms = 20000
    send charts matching = *
    send hosts matching = localhost *
    send names instead of ids = yes
    send configured labels = yes
    send automatic labels = yes

[prometheus_remote_write:my_prometheus_remote_write_instance]
    enabled = yes
    destination = localhost
    remote write URL path = /receive

[kinesis:my_kinesis_instance]
    enabled = yes
    destination = us-east-1
    stream name = netdata
    aws_access_key_id = my_access_key_id
    aws_secret_access_key = my_aws_secret_access_key

[pubsub:my_pubsub_instance]
    enabled = yes
    destination = pubsub.googleapis.com
    credentials file = /etc/netdata/pubsub_credentials.json
    project id = my_project
    topic id = my_topic

[mongodb:my_mongodb_instance]
    enabled = yes
    destination = localhost
    database = my_database
    collection = my_collection

[json:my_json_instance]
    enabled = yes
    destination = localhost:5448

[opentsdb:my_opentsdb_plaintext_instance]
    enabled = yes
    destination = localhost:4242

[opentsdb:http:my_opentsdb_http_instance]
    enabled = yes
    destination = localhost:4242
    username = my_username
    password = my_password

[opentsdb:https:my_opentsdb_https_instance]
    enabled = yes
    destination = localhost:8082

Sections

  • [exporting:global] is a section where you can set your defaults for all exporting connectors.
  • [prometheus:exporter] defines settings for Prometheus exporter API queries (e.g.: http://NODE:19999/api/v1/allmetrics?format=prometheus&help=yes&source=as-collected).
  • [<type>:<name>] keeps settings for a particular exporting connector instance, where:
      • type selects the exporting connector type: graphite | opentsdb:telnet | opentsdb:http | prometheus_remote_write | json | kinesis | pubsub | mongodb. For graphite, opentsdb, json, and prometheus_remote_write connectors you can also use :http or :https modifiers (e.g.: opentsdb:https).
      • name can be any instance name you choose.
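The section layout can be explored with a short script. The sketch below reads a small exporting.conf-style fragment with Python's `configparser`, splits each instance section name into type, optional modifier, and name, and layers the instance settings on top of `[exporting:global]`. Treating every global option as a default for every instance is an assumption made for illustration, and the helper names are hypothetical; this is not how Netdata's own C parser is written.

```python
import configparser

SAMPLE = """
[exporting:global]
    enabled = yes
    update every = 10

[graphite:my_graphite_instance]
    enabled = yes
    destination = localhost:2003
    update every = 5

[opentsdb:https:my_opentsdb_https_instance]
    enabled = yes
    destination = localhost:8082
"""

# Sections that are not connector instances.
SPECIAL_SECTIONS = ("exporting:global", "prometheus:exporter")


def split_section(section):
    """Split '<type>[:<modifier>]:<name>' into (type, modifier, name)."""
    parts = section.split(":")
    if len(parts) == 2:
        return parts[0], None, parts[1]
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"unexpected section name: {section!r}")


def load_instances(text):
    """Return {instance name: {'type', 'modifier', 'settings'}}."""
    # Only '=' separates keys from values, so 'localhost:2003' stays intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)

    defaults = {}
    if parser.has_section("exporting:global"):
        defaults = dict(parser["exporting:global"])

    instances = {}
    for section in parser.sections():
        if section in SPECIAL_SECTIONS:
            continue
        ctype, modifier, name = split_section(section)
        # Instance options override the global defaults (assumed behavior).
        settings = {**defaults, **dict(parser[section])}
        instances[name] = {"type": ctype, "modifier": modifier, "settings": settings}
    return instances


for name, info in load_instances(SAMPLE).items():
    print(name, info["type"], info["modifier"], info["settings"]["update every"])
```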

Options

Configure individual connectors and override any global settings with the following options.

  • enabled = yes | no, enables or disables an exporting connector instance

  • destination = host1 host2 host3 ..., accepts a space separated list of hostnames, IPs (IPv4 and IPv6) and ports to connect to. Netdata will use the first available to send the metrics.

    The format of each item in this list is: [PROTOCOL:]IP[:PORT].

    PROTOCOL can be udp or tcp. tcp is the default and the only protocol supported by the current exporting engine.

    IP can be XX.XX.XX.XX (IPv4), or [XX:XX...XX:XX] (IPv6). For IPv6 you need to enclose the IP in [] to separate it from the port.

    PORT can be a number or a service name. If omitted, the default port for the exporting connector will be used (graphite = 2003, opentsdb = 4242).

    Example IPv4:

   destination = 10.11.14.2:4242 10.11.14.3:4242 10.11.14.4:4242

Example IPv6 and IPv4 together:

   destination = [ffff:...:0001]:2003 10.11.12.1:2003

When multiple servers are defined, Netdata will try the next one when the previous one fails.

Netdata also ships nc-exporting.sh, a script that can be used as a fallback exporting connector to save the metrics to disk and push them to the time-series database when it becomes available again. It can also be used to monitor / trace / debug the metrics Netdata generates.

For the Kinesis exporting connector destination should be set to an AWS region (for example, us-east-1).

For the MongoDB exporting connector destination should be set to a MongoDB URI.

For the Pub/Sub exporting connector destination can be set to a specific service endpoint.
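The [PROTOCOL:]IP[:PORT] item format used by the network connectors can be sketched as a small parser. This is an illustrative approximation, not Netdata's parsing code: the function names are hypothetical, the port is kept as a string because it may be a service name, and the IPv6 address in the usage lines is a made-up example.

```python
def parse_destination_item(item, default_port=None):
    """Parse one '[PROTOCOL:]IP[:PORT]' item into (protocol, host, port)."""
    protocol = "tcp"  # documented default
    for candidate in ("tcp", "udp"):
        if item.startswith(candidate + ":"):
            protocol = candidate
            item = item[len(candidate) + 1:]
            break

    if item.startswith("["):
        # IPv6 addresses are enclosed in [] to separate them from the port.
        end = item.find("]")
        if end == -1:
            raise ValueError(f"missing ']' in {item!r}")
        host = item[1:end]
        rest = item[end + 1:]
        port = rest[1:] if rest.startswith(":") else ""
    else:
        host, _, port = item.partition(":")

    return protocol, host, port or default_port


def parse_destination(value, default_port=None):
    """Parse a space separated destination list, preserving its order."""
    return [parse_destination_item(item, default_port) for item in value.split()]


print(parse_destination("[::1]:2003 10.11.12.1:2003"))
print(parse_destination("udp:10.0.0.1", default_port="2003"))
```

Because the list order is preserved, a caller can walk it from the start and move on to the next entry when a connection attempt fails, which mirrors the failover behavior described above.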

  • data source = as collected, or data source = average, or data source = sum, selects the kind of data that will be sent to the external database.

  • hostname = my-name, is the hostname to be used for sending data to the external database server. By default this is [global].hostname.

  • prefix = Netdata, is the prefix to add to all metrics.

  • update every = 10, is the number of seconds between sending data to the external database. Netdata will add some randomness to this number, to prevent stressing the external server when many Netdata servers send data to the same database. This randomness does not affect the quality of the data, only the time they are sent.

  • buffer on failures = 10, is the number of iterations (each iteration is update every seconds) to buffer data, when the external database server is not available. If the server fails to receive the data after that many failures, data loss on the connector instance is expected (Netdata will also log it).

  • timeout ms = 20000, is the timeout in milliseconds to wait for the external database server to process the data. By default this is 2 * update_every * 1000.

  • send hosts matching = localhost * includes one or more space separated patterns, using * as wildcard (any number of times within each pattern). The patterns are checked against the hostname (the localhost is always checked as localhost), allowing us to filter which hosts will be sent to the external database when this Netdata is a central Netdata aggregating multiple hosts. A pattern starting with ! gives a negative match. So to match all hosts named *db* except hosts containing *child*, use !*child* *db* (so, the order is important: the first pattern matching the hostname will be used - positive or negative).

  • send charts matching = * includes one or more space separated patterns, using * as wildcard (any number of times within each pattern). The patterns are checked against both chart id and chart name. A pattern starting with ! gives a negative match. So to match all charts named apps.* except charts ending in *reads, use !*reads apps.* (so, the order is important: the first pattern matching the chart id or the chart name will be used - positive or negative). There is also a URL parameter filter that can be used while querying allmetrics. The URL parameter has a higher priority than the configuration option.

  • send names instead of ids = yes | no controls the metric names Netdata should send to the external database. Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and names are human friendly labels (also unique). Most charts and metrics have the same ID and name, but in several cases they are different: disks with device-mapper, interrupts, QoS classes, statsd synthetic charts, etc.

  • send configured labels = yes | no controls if labels defined in the [host labels] section in netdata.conf should be sent to the external database

  • send automatic labels = yes | no controls if automatically created labels, like _os_name or _architecture should be sent to the external database
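The matching rules for send hosts matching and send charts matching (space separated patterns, * as a wildcard, a leading ! for a negative match, first matching pattern wins) can be sketched as follows. This is a simplified model written for illustration, not Netdata's pattern implementation; the function name is hypothetical, and returning False when no pattern matches is an assumption.

```python
import re


def simple_pattern_matches(patterns, *candidates):
    """Return True if the first pattern matching any candidate is positive.

    `patterns` is a space separated list; '*' matches any run of characters
    and a leading '!' makes the pattern negative. For charts, pass both the
    chart id and the chart name as candidates.
    """
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Escape everything, then turn the escaped '*' back into a wildcard.
        regex = re.escape(pattern).replace(r"\*", ".*")
        if any(re.fullmatch(regex, candidate) for candidate in candidates):
            return not negative  # the first matching pattern decides
    return False  # assumed: nothing matched, so do not send


print(simple_pattern_matches("!*child* *db*", "mydb1"))
print(simple_pattern_matches("!*child* *db*", "child-db"))
print(simple_pattern_matches("!*reads apps.*", "apps.cpu", "apps.cpu"))
```

Swapping the order to `*db* !*child*` would let `child-db` through, because the positive pattern would match first.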

HTTPS

Netdata can send metrics to external databases using the TLS/SSL protocol. Unfortunately, some of them do not support encrypted connections, so you will have to configure a reverse proxy to enable HTTPS communication between Netdata and an external database. You can set up a reverse proxy with Nginx.

Exporting engine monitoring

Netdata creates the following charts in the dashboard, under the Netdata Monitoring section, to help you monitor the health and performance of the exporting engine itself:

  1. Buffered metrics, the number of metrics Netdata added to the buffer for dispatching them to the external database server.

  2. Exporting data size, the amount of data (in KB) Netdata added to the buffer.

  3. Exporting operations, the number of operations performed by Netdata.

  4. Exporting thread CPU usage, the CPU resources consumed by the Netdata thread that is responsible for sending the metrics to the external database server.


Exporting engine alarms

Netdata adds 3 alarms:

  1. exporting_last_buffering, number of seconds since the last successful buffering of exported data
  2. exporting_metrics_sent, percentage of metrics sent to the external database server
  3. exporting_metrics_lost, number of metrics lost due to repeating failures to contact the external database server
