
Using Netdata with Prometheus

Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Netdata has native support for Prometheus. I'm going to quickly show you how to install both Netdata and Prometheus on the same server. We can then use Grafana, pointed at Prometheus, to obtain the long-term metrics Netdata offers. I'm assuming we are starting from a fresh Ubuntu shell (whether you'd like to follow along in a VM or a cloud instance is up to you).

Installing Netdata and Prometheus

Installing Netdata

There are a number of ways to install Netdata, as described in the Installation documentation. The suggested method installs the latest Netdata release and keeps it updated automatically.

To install Netdata, run the following as your normal user:
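
# A minimal sketch of the one-line installer; verify the kickstart URL and options against the current Netdata installation docs
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh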

Or, if you have cURL but not wget (such as on macOS):
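
# The same sketch using cURL instead of wget
curl https://my-netdata.io/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh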

At this point we should have Netdata listening on port 19999. Point your browser here:

http://your.netdata.ip:19999

(replace your.netdata.ip with the IP or hostname of the server running Netdata)

Installing Prometheus

In order to install Prometheus we are going to introduce our own systemd startup script along with an example prometheus.yml configuration. Prometheus needs to be pointed at your server's target URL so it can scrape Netdata's API. Prometheus uses a pull model, meaning Netdata is the passive client within this architecture; Prometheus always initiates the connection to Netdata.

Download Prometheus

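# Query the GitHub API for the latest release and download the linux-amd64 tarball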
cd /tmp && curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \
| grep "browser_download_url.*linux-amd64.tar.gz" \
| cut -d '"' -f 4 \
| wget -qi -

Create prometheus system user

sudo useradd -r prometheus

Create prometheus directory

sudo mkdir /opt/prometheus
sudo chown prometheus:prometheus /opt/prometheus

Untar prometheus directory

sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1

Install prometheus.yml

We will use the following prometheus.yml file. Save it at /opt/prometheus/prometheus.yml.

Make sure to replace your.netdata.ip with the IP or hostname of the host running Netdata.

# my global config
global:
  scrape_interval:     5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
  evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['0.0.0.0:9090']

  - job_name: 'netdata-scrape'

    metrics_path: '/api/v1/allmetrics'
    params:
      # format: prometheus | prometheus_all_hosts
      # You can use `prometheus_all_hosts` if you want Prometheus to set the `instance` to your hostname instead of IP 
      format: [prometheus]
      #
      # sources: as-collected | raw | average | sum | volume
      # default is: average
      #source: [as-collected]
      #
      # server name for this prometheus - the default is the client IP
      # for Netdata to uniquely identify it
      #server: ['prometheus1']
    honor_labels: true

    static_configs:
      - targets: ['{your.netdata.ip}:19999']

Install nodes.yml

The following is completely optional; it will enable Prometheus to generate alerts from some Netdata sources. Tweak the values to your own needs. We will use the nodes.yml file shown below. Save it at /opt/prometheus/nodes.yml, and add a - "nodes.yml" entry under the rule_files: section in the example prometheus.yml file above.

groups:
  - name: nodes

    rules:
      - alert: node_high_cpu_usage_70
        expr: sum(sum_over_time(netdata_system_cpu_percentage_average{dimension=~"(user|system|softirq|irq|guest)"}[10m])) by (job) / sum(count_over_time(netdata_system_cpu_percentage_average{dimension="idle"}[10m])) by (job) > 70
        for: 1m
        annotations:
          description: '{{ $labels.job }} on ''{{ $labels.job }}'' CPU usage is at {{ humanize $value }}%.'
          summary: CPU alert for container node '{{ $labels.job }}'

      - alert: node_high_memory_usage_70
        expr: 100 / sum(netdata_system_ram_MB_average) by (job)
          * sum(netdata_system_ram_MB_average{dimension=~"free|cached"}) by (job) < 30
        for: 1m
        annotations:
          description: '{{ $labels.job }} memory usage is {{ humanize $value}}%.'
          summary: Memory alert for container node '{{ $labels.job }}'

      - alert: node_low_root_filesystem_space_20
        expr: 100 / sum(netdata_disk_space_GB_average{family="/"}) by (job)
          * sum(netdata_disk_space_GB_average{family="/",dimension=~"avail|cached"}) by (job) < 20
        for: 1m
        annotations:
          description: '{{ $labels.job }} root filesystem space is {{ humanize $value}}%.'
          summary: Root filesystem alert for container node '{{ $labels.job }}'

      - alert: node_root_filesystem_fill_rate_6h
        expr: predict_linear(netdata_disk_space_GB_average{family="/",dimension=~"avail|cached"}[1h], 6 * 3600) < 0
        for: 1h
        labels:
          severity: critical
        annotations:
          description: Container node {{ $labels.job }} root filesystem is going to fill up in 6h.
          summary: Disk fill alert for Swarm node '{{ $labels.job }}'

Install prometheus.service

Save this service file as /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus Server
AssertPathExists=/opt/prometheus

[Service]
Type=simple
WorkingDirectory=/opt/prometheus
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --log.level=info
ExecReload=/bin/kill -SIGHUP $MAINPID
ExecStop=/bin/kill -SIGINT $MAINPID

[Install]
WantedBy=multi-user.target

Start Prometheus

sudo systemctl start prometheus
sudo systemctl enable prometheus
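
To confirm the service came up, a quick check (this assumes Prometheus 2.x, which exposes a /-/healthy endpoint):

sudo systemctl status prometheus
curl -s http://localhost:9090/-/healthy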

Prometheus should now start and listen on port 9090. Point your browser there.

If everything is working correctly, fetching http://your.prometheus.ip:9090 will show a 'Status' tab. Click it, then click 'Targets'. You should see the Netdata host as a scraped target.


Netdata support for Prometheus

Before explaining the changes, we have to understand the key differences between Netdata and Prometheus.

understanding Netdata metrics

charts

Each chart in Netdata has several properties (common to all its metrics):

  • chart_id - uniquely identifies a chart.

  • chart_name - a more human friendly name for chart_id, also unique.

  • context - this is the template of the chart. All disk I/O charts have the same context, all mysql requests charts have the same context, etc. This is used for alarm templates to match all the charts they should be attached to.

  • family groups a set of charts together. It is used as the submenu of the dashboard.

  • units is the units for all the metrics attached to the chart.

dimensions

Then each Netdata chart contains metrics called dimensions. All the dimensions of a chart have the same units of measurement and are contextually in the same category (i.e. the metrics for disk bandwidth are read and write, and they are both on the same chart).

Netdata data source

Netdata can send metrics to Prometheus from 3 data sources:

  • as collected or raw - this data source sends the metrics to Prometheus as they are collected. No conversion is done by Netdata. The latest value for each metric is simply given to Prometheus. This is the method Prometheus prefers, but it is also the hardest to work with. To use this data source, you will need to understand how to derive meaningful values from the raw counters (see the PromQL sketch after this list).

    The format of the metrics is: CONTEXT{chart="CHART",family="FAMILY",dimension="DIMENSION"}.

    If the metric is a counter (incremental in Netdata lingo), _total is appended to the context.

    Unlike Prometheus, Netdata allows each dimension of a chart to have a different algorithm and conversion constants (multiplier and divisor). When the dimensions of a chart are heterogeneous, Netdata will use this format: CONTEXT_DIMENSION{chart="CHART",family="FAMILY"}

  • average - this data source uses the Netdata database to send the metrics to Prometheus as they are presented on the Netdata dashboard. So, all the metrics are sent as gauges, in the units used on the Netdata dashboard charts. This is the easiest to work with.

    The format of the metrics is: CONTEXT_UNITS_average{chart="CHART",family="FAMILY",dimension="DIMENSION"}.

    When this source is used, Netdata keeps track of the last access time of each Prometheus server fetching the metrics. This last access time is used on subsequent queries from the same Prometheus server to identify the time frame over which the average is calculated.

    So, no matter how frequently Prometheus scrapes Netdata, it will get all the database data. To identify each Prometheus server, Netdata uses by default the IP of the client fetching the metrics.

    If there are multiple Prometheus servers fetching data from the same Netdata, using the same IP, each Prometheus server can append server=NAME to the URL. Netdata will use this NAME to uniquely identify the Prometheus server.

  • sum or volume - this data source is like average, but instead of averaging the values, it sums them.

    The format of the metrics is: CONTEXT_UNITS_sum{chart="CHART",family="FAMILY",dimension="DIMENSION"}. All the other operations are the same as with average.

    To change the data source to sum or as-collected, you need to provide the source parameter in the request URL, e.g.: http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&help=yes&source=as-collected

    Keep in mind that early versions of Netdata were sending the metrics as: CHART_DIMENSION{}.
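
As a minimal sketch of working with the as collected source, a PromQL rate() turns these raw counters into per-second values you can compare across dimensions (this assumes the default netdata_ prefix and the system.cpu chart shown below):

rate(netdata_system_cpu_total{chart="system.cpu",dimension="user"}[1m])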

Querying Metrics

Fetch this URL with your web browser:

http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&help=yes

(replace your.netdata.ip with the ip or hostname of your Netdata server)

Netdata will respond with all the metrics it sends to Prometheus.
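
Equivalently, from a shell:

curl 'http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&help=yes'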

If you search that page for "system.cpu" you will find all the metrics Netdata is exporting to Prometheus for this chart. system.cpu is the chart name on the Netdata dashboard (on the Netdata dashboard all charts have a text heading, such as: Total CPU utilization (system.cpu); what we are interested in here is the chart name: system.cpu).

Searching for "system.cpu" reveals:

# COMMENT homogeneous chart "system.cpu", context "system.cpu", family "cpu", units "percentage"
# COMMENT netdata_system_cpu_percentage_average: dimension "guest_nice", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="guest_nice"} 0.0000000 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "guest", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="guest"} 1.7837326 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "steal", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="steal"} 0.0000000 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "softirq", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="softirq"} 0.5275442 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "irq", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="irq"} 0.2260836 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "user", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="user"} 2.3362762 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "system", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 1.7961062 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "nice", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="nice"} 0.0000000 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "iowait", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="iowait"} 0.9671802 1500066662000
# COMMENT netdata_system_cpu_percentage_average: dimension "idle", value is percentage, gauge, dt 1500066653 to 1500066662 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="idle"} 92.3630770 1500066662000

(Netdata response for system.cpu with source=average)

In the average and sum data sources, all values are normalized and reported to Prometheus as gauges. Now, use the 'expression' text form in Prometheus and begin typing the metric we are looking for: netdata_system_cpu. You should see the text form auto-fill, as Prometheus knows about this metric.
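
For example, a quick expression for total non-idle CPU from the average source (a sketch, assuming the default netdata_ prefix):

sum(netdata_system_cpu_percentage_average{chart="system.cpu",dimension!="idle"})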

If the data source was as collected, the response would be:

# COMMENT homogeneous chart "system.cpu", context "system.cpu", family "cpu", units "percentage"
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "guest_nice", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="guest_nice"} 0 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "guest", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="guest"} 63945 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "steal", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="steal"} 0 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "softirq", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="softirq"} 8295 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "irq", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="irq"} 4079 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "user", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="user"} 116488 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "system", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="system"} 35084 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "nice", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="nice"} 505 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "iowait", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="iowait"} 23314 1500066716438
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "idle", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="idle"} 918470 1500066716438

(Netdata response for system.cpu with source=as-collected)

For more information, check the Prometheus documentation.

Streaming data from upstream hosts

The format=prometheus parameter only exports the host's own Netdata metrics. If you are using the parent-child functionality of Netdata, this ignores any upstream hosts, so you should consider using the following in your prometheus.yml:

    metrics_path: '/api/v1/allmetrics'
    params:
      format: [prometheus_all_hosts]
    honor_labels: true

This will report all upstream host data, and honor_labels will make Prometheus take note of the instance names provided.
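
A complete job for this, merged into the scrape_configs section of the earlier prometheus.yml, might look like the following sketch (replace your.netdata.ip as before):

  - job_name: 'netdata-parent'
    metrics_path: '/api/v1/allmetrics'
    params:
      format: [prometheus_all_hosts]
    honor_labels: true
    static_configs:
      - targets: ['your.netdata.ip:19999']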

Timestamps

To pass the metrics through Prometheus pushgateway, Netdata supports the option &timestamps=no to send the metrics without timestamps.
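
For example: http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&timestamps=no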

Netdata host variables

Netdata collects various system configuration metrics, like the max number of TCP sockets supported, the max number of files allowed system-wide, various IPC sizes, etc. These metrics are not exposed to Prometheus by default.

To expose them, append variables=yes to the Netdata URL.

TYPE and HELP

To save bandwidth, and because Prometheus does not use them anyway, # TYPE and # HELP lines are suppressed. If wanted, they can be re-enabled via types=yes and help=yes, e.g. /api/v1/allmetrics?format=prometheus&types=yes&help=yes

Note that if enabled, the # TYPE and # HELP lines are repeated for every occurrence of a metric, which goes against the Prometheus documentation's specification for these lines.

Names and IDs

Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers, as read by the system, and names are human-friendly labels (also unique).

Most charts and metrics have the same ID and name, but in several cases they are different: disks with device-mapper, interrupts, QoS classes, statsd synthetic charts, etc.

The default is controlled in exporting.conf:

[prometheus:exporter]
	send names instead of ids = yes | no

You can overwrite it from Prometheus, by appending to the URL:

  • &names=no to get IDs (the old behaviour)
  • &names=yes to get names

Filtering metrics sent to Prometheus

Netdata can filter the metrics it sends to Prometheus with this setting:

[prometheus:exporter]
	send charts matching = *

This setting accepts a space-separated list of simple patterns to match the charts to be sent to Prometheus. Each pattern can use * as a wildcard, any number of times (e.g. *a*b*c* is valid). Patterns starting with ! give a negative match (e.g. !*.bad users.* groups.* will send all the users and groups except the bad user and bad group). The order is important: the first match (positive or negative), left to right, is used.
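
For example, to export only the system and apps charts (a sketch of the documented setting):

[prometheus:exporter]
	send charts matching = system.* apps.*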

Changing the prefix of Netdata metrics

Netdata sends all metrics prefixed with netdata_. You can change this in exporting.conf, like this:

[prometheus:exporter]
	prefix = netdata

It can also be changed from the URL, by appending &prefix=netdata.

Metric Units

The default source average adds the unit of measurement to the name of each metric (e.g. _KiB_persec). To hide the units and get the same metric names as with the other sources, append to the URL &hideunits=yes.

The units were standardized in v1.12, with the effect of changing the metric names. To get the metric names as they were before v1.12, append &oldunits=yes to the URL.

Accuracy of average and sum data sources

When the data source is set to average or sum, Netdata remembers the last access of each client accessing the Prometheus metrics, and uses this last access time to respond with the average or sum of all the entries in the database since then. This means that Prometheus servers do not lose data when they access Netdata with data source = average or sum.

To uniquely identify each Prometheus server, Netdata uses the IP of the client accessing the metrics. If, however, the IP is not sufficient to identify a single Prometheus server (e.g. when Prometheus servers are accessing Netdata through a web proxy, or when multiple Prometheus servers are NATed to a single IP), each Prometheus server may append &server=NAME to the URL. This NAME is used by Netdata to uniquely identify each Prometheus server and keep track of its last access time.
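
For example, a Prometheus server behind a proxy could scrape: http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus&server=prometheus1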