Mirror of https://github.com/netdata/netdata.git (synced 2025-05-02 08:20:10 +00:00)

src dir docs pass (#18670)

Commit 0213967d71 (parent 64d33e6eda)
28 changed files with 732 additions and 864 deletions
Changed paths:

- src
  - aclk
  - claim
  - cli
  - collectors
    - REFERENCE.md
    - apps.plugin
    - cgroups.plugin
    - charts.d.plugin
    - ebpf.plugin
    - freebsd.plugin
    - log2journal
    - proc.plugin
    - profile.plugin
    - python.d.plugin
    - systemd-journal.plugin
  - daemon
  - exporting
  - go/plugin/go.d
  - health
  - libnetdata
  - registry
@@ -10,7 +10,7 @@ The Cloud App lives at app.netdata.cloud which currently resolves to the followi
- 44.207.131.212
- 44.196.50.41

> ### Caution
> **Caution**
>
>This list of IPs can change without notice, we strongly advise you to whitelist following domains `app.netdata.cloud`, `mqtt.netdata.cloud`, if this is not an option in your case always verify the current domain resolution (e.g via the `host` command).
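For example, a quick way to check the current resolution from a shell (any DNS lookup tool works; `host` is the one the note mentions):

```bash
# Verify which IPs the Netdata Cloud endpoints currently resolve to
host app.netdata.cloud
host mqtt.netdata.cloud
```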
@@ -34,7 +34,8 @@ If your Agent needs to use a proxy to access the internet, you must [set up a pr
connecting to cloud](/src/claim/README.md).

You can configure following keys in the `netdata.conf` section `[cloud]`:

```
```text
[cloud]
statistics = yes
query thread count = 2

@@ -102,7 +102,8 @@ cd /var/lib/netdata # Replace with your Netdata library directory, if not /var
sudo rm -rf cloud.d/
```

> IMPORTANT:<br/>
> **IMPORTANT**
>
> Keep in mind that the Agent will be **re-claimed automatically** if the environment variables or `claim.conf` exist when the agent is restarted.

This node no longer has access to the credentials it was used when connecting to Netdata Cloud via the ACLK. You will
@@ -18,9 +18,7 @@ Available commands:
| `ping` | Checks the Agent's status. If the Agent is alive, it exits with status code 0 and prints 'pong' to standard output. Exits with status code 255 otherwise. |
| `aclk-state [json]` | Return the current state of ACLK and Cloud connection. Optionally in JSON. |
| `dumpconfig` | Display the current netdata.conf configuration. |
| `remove-stale-node <node_id \| machine_guid \| hostname \| ALL_NODES>` | Unregisters a stale child Node, removing it from the parent Node's UI and Netdata Cloud. This is useful for ephemeral Nodes that may stop streaming and remain visible as stale. |
| `remove-stale-node <node_id \| machine_guid \| hostname \| ALL_NODES>` | Un-registers a stale child Node, removing it from the parent Node's UI and Netdata Cloud. This is useful for ephemeral Nodes that may stop streaming and remain visible as stale. |
| `version` | Display the Netdata Agent version. |

See also the Netdata daemon [command line options](/src/daemon/README.md#command-line-options).
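As an illustration, the commands in the table above are normally issued through the bundled CLI binary (assumed here to be `netdatacli`, installed alongside the Agent):

```bash
# Is the Agent alive?
netdatacli ping

# Show the ACLK / Cloud connection state as JSON
netdatacli aclk-state json

# Drop stale ephemeral children from the parent's UI and Netdata Cloud
netdatacli remove-stale-node ALL_NODES
```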
@@ -1,12 +1,3 @@
<!--
title: "Collectors configuration reference"
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/REFERENCE.md"
sidebar_label: "Collectors configuration"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Configuration"
-->

# Collectors configuration reference

The list of supported collectors can be found in [the documentation](/src/collectors/COLLECTORS.md),

@@ -79,8 +70,7 @@ Within this file, you can either disable the orchestrator entirely (`enabled: ye
enable/disable it with `yes` and `no` settings. Uncomment any line you change to ensure the Netdata daemon reads it on
start.
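As a sketch of the shape such an orchestrator file takes (using `go.d.conf` as the example; the module names below are only illustrative):

```yaml
# disable the whole orchestrator by setting this to no
enabled: yes

modules:
  # apache: yes
  # nginx: no
```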
After you make your changes, restart the Agent with `sudo systemctl restart netdata`, or the [appropriate
method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system.
After you make your changes, restart the Agent with the [appropriate method](/docs/netdata-agent/start-stop-restart.md) for your system.

## Configure a collector

@@ -117,8 +107,7 @@ according to your needs. In addition, every collector's documentation shows the
configure that collector. Uncomment any line you change to ensure the collector's orchestrator or the Netdata daemon
read it on start.

After you make your changes, restart the Agent with `sudo systemctl restart netdata`, or the [appropriate
method](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) for your system.
After you make your changes, restart the Agent with the [appropriate method](/docs/netdata-agent/start-stop-restart.md) for your system.

## Troubleshoot a collector
@@ -1,12 +1,3 @@
<!--
title: "Application monitoring (apps.plugin)"
sidebar_label: "Application monitoring "
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/apps.plugin/README.md"
learn_status: "Published"
learn_topic_type: "References"
learn_rel_path: "Integrations/Monitor/System metrics"
-->

# Applications monitoring (apps.plugin)

`apps.plugin` monitors the resources utilization of all processes running.

@@ -16,15 +7,15 @@ learn_rel_path: "Integrations/Monitor/System metrics"
`apps.plugin` aggregates processes in three distinct ways to provide a more insightful
breakdown of resource utilization:

- **Tree** or **Category**: Grouped by their position in the process tree.
This is customizable and allows aggregation by process managers and individual
processes of interest. Allows also renaming the processes for presentation purposes.
- **User**: Grouped by the effective user (UID) under which the processes run.
- **Group**: Grouped by the effective group (GID) under which the processes run.

## Short-Lived Process Handling

`apps.plugin` accounts for resource utilization of both running and exited processes,
capturing the impact of processes that spawn short-lived subprocesses, such as shell

@@ -40,7 +31,7 @@ Each type of aggregation is presented as a different section on the dashboard.
### Custom Process Groups (Apps)

In this section, apps.plugin summarizes the resources consumed by all processes, grouped based
on the groups provided in `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](docs/netdata-agent/configuration/README.md) script.
on the groups provided in `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.

For this section, `apps.plugin` builds a process tree (much like `ps fax` does in Linux), and groups
processes together (evaluating both child and parent processes) so that the result is always a list with

@@ -119,7 +110,7 @@ In such cases, you many need to lower its data collection frequency.
To do this, edit `/etc/netdata/netdata.conf` and find this section:

```
```txt
[plugin:apps]
# update every = 1
# command options =

@@ -130,7 +121,7 @@ its CPU resources will be cut in half, and data collection will be once every 2
## Configuration

The configuration file is `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](docs/netdata-agent/configuration/README.md) script.
The configuration file is `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
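For instance, a minimal session to adjust the groups file (assuming the default `/etc/netdata` config directory):

```bash
cd /etc/netdata
sudo ./edit-config apps_groups.conf
```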
### Configuring process managers

@@ -140,7 +131,7 @@ consider all their sub-processes important to monitor.
Process managers are configured in `apps_groups.conf` with the prefix `managers:`, like this:

```
```txt
managers: process1 process2 process3
```

@@ -194,7 +185,7 @@ There are a few command line options you can pass to `apps.plugin`. The list of
options can be acquired with the `--help` flag. The options can be set in the `netdata.conf` using the [`edit-config` script](/docs/netdata-agent/configuration/README.md).
For example, to disable user and user group charts you would set:

```
```txt
[plugin:apps]
command options = without-users without-groups
```

@@ -246,7 +237,7 @@ but it will not be able to collect all the information.
You can create badges that you can embed anywhere you like, with URLs like this:

```
```txt
https://your.netdata.ip:19999/api/v1/badge.svg?chart=apps.processes&dimensions=myapp&value_color=green%3E0%7Cred
```

@@ -275,7 +266,7 @@ Examples below for process group `sql`:
- Open Pipes
- Open Sockets

For more information about badges check [Generating Badges](/src/web/api/v2/api_v3_badge/README.md)
<!-- For more information about badges check [Generating Badges](/src/web/api/v2/api_v3_badge/README.md) -->

## Comparison with console tools
@@ -302,7 +293,7 @@ If you check the total system CPU utilization, it says there is no idle CPU at a
fails to provide a breakdown of the CPU consumption in the system. The sum of the CPU utilization
of all processes reported by `top`, is 15.6%.

```
```txt
top - 18:46:28 up 3 days, 20:14, 2 users, load average: 0.22, 0.05, 0.02
Tasks: 76 total, 2 running, 74 sleeping, 0 stopped, 0 zombie
%Cpu(s): 32.8 us, 65.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 1.3 hi, 0.3 si, 0.0 st

@@ -322,7 +313,7 @@ KiB Swap: 0 total, 0 free, 0 used. 753712 avail Mem
Exactly like `top`, `htop` is providing an incomplete breakdown of the system CPU utilization.

```
```bash
CPU[||||||||||||||||||||||||100.0%] Tasks: 27, 11 thr; 2 running
Mem[||||||||||||||||||||85.4M/993M] Load average: 1.16 0.88 0.90
Swp[ 0K/0K] Uptime: 3 days, 21:37:03

@@ -340,7 +331,7 @@ Exactly like `top`, `htop` is providing an incomplete breakdown of the system CP
`atop` also fails to break down CPU usage.

```
```bash
ATOP - localhost 2016/12/10 20:11:27 ----------- 10s elapsed
PRC | sys 1.13s | user 0.43s | #proc 75 | #zombie 0 | #exit 5383 |
CPU | sys 67% | user 31% | irq 2% | idle 0% | wait 0% |

@@ -366,7 +357,7 @@ per process utilization.
Note also, that being a `python` program, `glances` uses 1.6% CPU while it runs.

```
```bash
localhost Uptime: 3 days, 21:42:00

CPU [100.0%] CPU 100.0% MEM 23.7% SWAP 0.0% LOAD 1-core

@@ -388,8 +379,8 @@ FILE SYS Used Total 0.3 2.1 7009 netdata 0 S /usr/sbin/netdata
### why does this happen?

All the console tools report usage based on the processes found running *at the moment they
examine the process tree*. So, they see just one `ls` command, which is actually very quick
All the console tools report usage based on the processes found running _at the moment they
examine the process tree_. So, they see just one `ls` command, which is actually very quick
with minor CPU utilization. But the shell, is spawning hundreds of them, one after another
(much like shell scripts do).

@@ -398,12 +389,12 @@ with minor CPU utilization. But the shell, is spawning hundreds of them, one aft
The total CPU utilization of the system:

<br/>***Figure 1**: The system overview section at Netdata, just a few seconds after the command was run*
<br/>_**Figure 1**: The system overview section at Netdata, just a few seconds after the command was run_

And at the applications `apps.plugin` breaks down CPU usage per application:

<br/>***Figure 2**: The Applications section at Netdata, just a few seconds after the command was run*
<br/>_**Figure 2**: The Applications section at Netdata, just a few seconds after the command was run_

So, the `ssh` session is using 95% CPU time.
@@ -1,12 +1,3 @@
<!--
title: "Monitor Cgroups (cgroups.plugin)"
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/cgroups.plugin/README.md"
sidebar_label: "Monitor Cgroups"
learn_status: "Published"
learn_topic_type: "References"
learn_rel_path: "Integrations/Monitor/Virtualized environments/Containers"
-->

# Monitor Cgroups (cgroups.plugin)

You can monitor containers and virtual machines using **cgroups**.
@@ -9,7 +9,6 @@
To better understand the guidelines and the API behind our External plugins, please have a look at the [Introduction to External plugins](/src/plugins.d/README.md) prior to reading this page.

`charts.d.plugin` has been designed so that the actual script that will do data collection will be permanently in
memory, collecting data with as little overheads as possible
(i.e. initialize once, repeatedly collect values with minimal overhead).

@@ -121,7 +120,7 @@ Using the above, if the command `mysql` is not available in the system, the `mys
`fixid()` will get a string and return a properly formatted id for a chart or dimension.

This is an expensive function that should not be used in `X_update()`.
You can keep the generated id in a BASH associative array to have the values availables in `X_update()`, like this:
You can keep the generated id in a BASH associative array to have the values available in `X_update()`, like this:

```sh
declare -A X_ids=()
@@ -1,13 +1,3 @@
<!--
title: "Kernel traces/metrics (eBPF) monitoring with Netdata"
description: "Use Netdata's extended Berkeley Packet Filter (eBPF) collector to monitor kernel-level metrics about your complex applications with per-second granularity."
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/ebpf.plugin/README.md"
sidebar_label: "Kernel traces/metrics (eBPF)"
learn_status: "Published"
learn_topic_type: "References"
learn_rel_path: "Integrations/Monitor/System metrics"
-->

# Kernel traces/metrics (eBPF) collector

The Netdata Agent provides many [eBPF](https://ebpf.io/what-is-ebpf/) programs to help you troubleshoot and debug how applications interact with the Linux kernel. The `ebpf.plugin` uses [tracepoints, trampoline, and kprobes](#how-netdata-collects-data-using-probes-and-tracepoints) to collect a wide array of high value data about the host that would otherwise be impossible to capture.

@@ -45,6 +35,7 @@ If your Agent is v1.22 or older, you may to enable the collector yourself.
To enable or disable the entire eBPF collector:

1. Navigate to the [Netdata config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata
```

@@ -65,14 +56,16 @@ To enable or disable the entire eBPF collector:
### Configure the eBPF collector

You can configure the eBPF collector's behavior to fine-tune which metrics you receive and [optimize performance]\(#performance opimization).
You can configure the eBPF collector's behavior to fine-tune which metrics you receive and [optimize performance](#performance-opimization).

To edit the `ebpf.d.conf`:

1. Navigate to the [Netdata config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata
```

2. Use the [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script to edit [`ebpf.d.conf`](https://github.com/netdata/netdata/blob/master/src/collectors/ebpf.plugin/ebpf.d.conf).

```bash

@@ -133,10 +126,7 @@ If you do not need to monitor specific metrics for your `cgroups`, you can enabl
#### Maps per Core

When netdata is running on kernels newer than `4.6` users are allowed to modify how the `ebpf.plugin` creates maps (hash or
array). When `maps per core` is defined as `yes`, plugin will create a map per core on host, on the other hand,
when the value is set as `no` only one hash table will be created, this option will use less memory, but it also can
increase overhead for processes.
When netdata is running on kernels newer than `4.6` users are allowed to modify how the `ebpf.plugin` creates maps (hash or array). When `maps per core` is defined as `yes`, plugin will create a map per core on host, on the other hand, when the value is set as `no` only one hash table will be created, this option will use less memory, but it also can increase overhead for processes.
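A sketch of how this toggle might look in `ebpf.d.conf` (the `[global]` section name is assumed; check the shipped file for the exact layout):

```txt
[global]
    # yes: one map per core (faster, more memory); no: a single shared hash table
    maps per core = no
```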
#### Collect PID

@@ -273,9 +263,11 @@ You can configure each thread of the eBPF data collector. This allows you to ove
To configure an eBPF thread:

1. Navigate to the [Netdata config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata
```

2. Use the [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script to edit a thread configuration file. The following configuration files are available:

- `network.conf`: Configuration for the [`network` thread](#network-configuration). This config file overwrites the global options and also

@@ -408,7 +400,6 @@ You can run our helper script to determine whether your system can support eBPF
curl -sSL https://raw.githubusercontent.com/netdata/kernel-collector/master/tools/check-kernel-config.sh | sudo bash
```

If you see a warning about a missing kernel
configuration (`KPROBES KPROBES_ON_FTRACE HAVE_KPROBES BPF BPF_SYSCALL BPF_JIT`), you will need to recompile your kernel
to support this configuration. The process of recompiling Linux kernels varies based on your distribution and version.

@@ -899,8 +890,7 @@ node is experiencing high memory usage and there is no obvious culprit to be fou
If with these changes you still suspect eBPF using too much memory, and there is no obvious culprit to be found
in the `apps.mem` chart, consider testing for high kernel memory usage by [disabling eBPF monitoring](#configuring-ebpfplugin).
Next, [restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation) with
`sudo systemctl restart netdata` to see if system memory usage (see the `system.ram` chart) has dropped significantly.
Next, [restart Netdata](/docs/netdata-agent/start-stop-restart.md) to see if system memory usage (see the `system.ram` chart) has dropped significantly.

Beginning with `v1.31`, kernel memory usage is configurable via the [`pid table size` setting](#pid-table-size)
in `ebpf.conf`.

@@ -981,7 +971,7 @@ a feature called "lockdown," which may affect `ebpf.plugin` depending how the ke
shows how the lockdown module impacts `ebpf.plugin` based on the selected options:

| Enforcing kernel lockdown | Enable lockdown LSM early in init | Default lockdown mode | Can `ebpf.plugin` run with this? |
| :------------------------ | :-------------------------------- | :-------------------- | :------------------------------- |
|:--------------------------|:----------------------------------|:----------------------|:---------------------------------|
| YES | NO | NO | YES |
| YES | Yes | None | YES |
| YES | Yes | Integrity | YES |
@@ -1,16 +1,5 @@
<!--
title: "FreeBSD system metrics (freebsd.plugin)"
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/freebsd.plugin/README.md"
sidebar_label: "FreeBSD system metrics (freebsd.plugin)"
learn_status: "Published"
learn_topic_type: "References"
learn_rel_path: "Integrations/Monitor/System metrics"
-->

# FreeBSD system metrics (freebsd.plugin)

Collects resource usage and performance data on FreeBSD systems

By default, Netdata will enable monitoring metrics for disks, memory, and network only when they are not zero. If they are constantly zero they are ignored. Metrics that will start having values, after Netdata is started, will be detected and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear though). Use `yes` instead of `auto` in plugin configuration sections to enable these charts permanently. You can also set the `enable zero metrics` option to `yes` in the `[global]` section which enables charts with zero metrics for all internal Netdata plugins.
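As a sketch, that toggle would land in the `[global]` section (presumably of `netdata.conf`; the option name is taken verbatim from the paragraph above, so verify it against your own file):

```txt
[global]
    enable zero metrics = yes
```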
@@ -1,4 +1,3 @@
# log2journal

`log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries.

@@ -11,7 +10,6 @@ The result is like this: nginx logs into systemd-journal:

The overall process looks like this:

```bash

@@ -23,7 +21,8 @@ tail -F /var/log/nginx/*.log |\ # outputs log lines
These are the steps:

1. `tail -F /var/log/nginx/*.log`<br/>this command will tail all `*.log` files in `/var/log/nginx/`. We use `-F` instead of `-f` to ensure that files will still be tailed after log rotation.
2. `log2joural` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern it accepts. It can also apply some basic operations on the fields, like injecting new fields or duplicating existing ones or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:
2. `log2journal` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern it accepts. It can also apply some basic operations on the fields, like injecting new fields or duplicating existing ones or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:

```bash
KEY1=VALUE1 # << start of the first log line
KEY2=VALUE2

@@ -31,8 +30,8 @@ These are the steps:
KEY1=VALUE1 # << start of the second log line
KEY2=VALUE2
```
3. `systemd-cat-native` is a Netdata program. It can send the logs to a local `systemd-journald` (journal namespaces supported), or to a remote `systemd-journal-remote`.

## Processing pipeline

@@ -81,7 +80,7 @@ We have an nginx server logging in this standard combined log format:
First, let's find the right pattern for `log2journal`. We ask ChatGPT:

```
```txt
My nginx log uses this log format:

log_format access '$remote_addr - $remote_user [$time_local] '

@@ -122,11 +121,11 @@ ChatGPT replies with this:
Let's see what the above says:

1. `(?x)`: enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it.
1. `^`: match the beginning of the line
2. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
3. `\s`: match a space
4. `-`: match a hyphen
5. and so on...
2. `^`: match the beginning of the line
3. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
4. `\s`: match a space
5. `-`: match a hyphen
6. and so on...

We edit `nginx.yaml` and add it, like this:
@@ -427,7 +426,6 @@ Rewrite rules are powerful. You can have named groups in them, like in the main
Now the message is ready to be sent to a systemd-journal. For this we use `systemd-cat-native`. This command can send such messages to a journal running on the localhost, a local journal namespace, or a `systemd-journal-remote` running on another server. By just appending `| systemd-cat-native` to the command, the message will be sent to the local journal.

```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native
# no output

@@ -486,7 +484,7 @@ tail -F /var/log/nginx/access.log |\
Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path):

```
```txt
[Unit]
Description=NGINX Log to Systemd Journal
After=network.target

@@ -524,7 +522,6 @@ Netdata will automatically pick the new namespace and present it at the list of
You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](/src/libnetdata/log/systemd-cat-native.md).

## Performance

`log2journal` and `systemd-cat-native` have been designed to process hundreds of thousands of log lines per second. They both utilize high performance indexing hashtables to speed up lookups, and queues that dynamically adapt to the number of log lines offered, offering a smooth and fast experience under all conditions.

@@ -537,13 +534,13 @@ The key characteristic that can influence the performance of a logs processing p
Especially the pattern `.*` seems to have the biggest impact on CPU consumption, especially when multiple `.*` are on the same pattern.

Usually we use `.*` to indicate that we need to match everything up to a character, e.g. `.* ` to match up to a space. By replacing it with `[^ ]+` (meaning: match at least a character up to a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
Usually we use `.*` to indicate that we need to match everything up to a character, e.g. `.*` to match up to a space. By replacing it with `[^ ]+` (meaning: match at least a character up to a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
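As an illustration of that substitution (the field name here is only an example):

```txt
# greedy, forces backtracking:
(?<request>.*)\s
# anchored to "anything but a space", much cheaper:
(?<request>[^ ]+)\s
```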
### Performance of systemd journals

The ingestion pipeline of logs, from `tail` to `systemd-journald` or `systemd-journal-remote` is very efficient in all aspects. CPU utilization is better than any other system we tested and RAM usage is independent of the number of fields indexed, making systemd-journal one of the most efficient log management engines for ingesting high volumes of structured logs.

High fields cardinality does not have a noticable impact on systemd-journal. The amount of fields indexed and the amount of unique values per field, have a linear and predictable result in the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other logs management solutions, like Loki, that their RAM requirements grow exponentially as the cardinality increases, making it impractical for them to index the amount of information systemd journals can index.
High fields cardinality does not have a noticeable impact on systemd-journal. The amount of fields indexed and the amount of unique values per field, have a linear and predictable result in the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other logs management solutions, like Loki, that their RAM requirements grow exponentially as the cardinality increases, making it impractical for them to index the amount of information systemd journals can index.

However, the number of fields added to journals influences the overall disk footprint. Less fields means more log entries per journal file, smaller overall disk footprint and faster queries.

@@ -578,7 +575,7 @@ If on other hand your organization prefers to maintain the full logs and control
## `log2journal` options

```
```txt

Netdata log2journal v1.43.0-341-gdac4df856
@@ -118,7 +118,7 @@ mv netdata.conf.new netdata.conf
Then edit `netdata.conf` and find the following section. This is the basic plugin configuration.

```
```txt
[plugin:proc:/proc/diskstats]
# enable new disks detected at runtime = yes
# performance metrics for physical disks = auto

@@ -152,7 +152,7 @@ Then edit `netdata.conf` and find the following section. This is the basic plugi
For each virtual disk, physical disk and partition you will have a section like this:

```
```txt
[plugin:proc:/proc/diskstats:sda]
# enable = yes
# enable performance metrics = auto

@@ -180,14 +180,14 @@ After saving `/etc/netdata/netdata.conf`, restart your Netdata to apply them.
You can easily disable performance metrics for an individual device, for example:

```
```txt
[plugin:proc:/proc/diskstats:sda]
enable performance metrics = no
```

But sometimes you need to disable performance metrics for all devices of the same type. To do that, you need to figure out the device type from `/proc/diskstats`, for example:

```
```txt
7 0 loop0 1651 0 3452 168 0 0 0 0 0 8 168
7 1 loop1 4955 0 11924 880 0 0 0 0 0 64 880
7 2 loop2 36 0 216 4 0 0 0 0 0 4 4

@@ -200,7 +200,7 @@ But sometimes you need disable performance metrics for all devices with the same
All zram devices start with major number `251` and all loop devices start with `7`.
So, to disable performance metrics for all loop devices you could add `performance metrics for disks with major 7 = no` to the `[plugin:proc:/proc/diskstats]` section.

```
```txt
[plugin:proc:/proc/diskstats]
performance metrics for disks with major 7 = no
```
@@ -213,30 +213,30 @@ So, to disable performance metrics for all loop devices you could add `performan
2. **Disks stats**

- total (number of devices array ideally would have)
- inuse (number of devices currently are in use)

3. **Mismatch count**

- unsynchronized blocks

4. **Current status**

- resync in percent
- recovery in percent
- reshape in percent
- check in percent

5. **Operation status** (if resync/recovery/reshape/check is active)

- finish in minutes
- speed in megabytes/s

6. **Nonredundant array availability**
6. **Non-redundant array availability**

#### configuration

```
```txt
[plugin:proc:/proc/mdstat]
# faulty devices = yes
# nonredundant arrays availability = yes

@@ -402,13 +402,13 @@ You can set the following values for each configuration option:
There are several alerts defined in `health.d/net.conf`.

The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alerts can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a child or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alert with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](/src/health/REFERENCE.md#alert-line-families) line in the alert configuration. For example, if you want to disable the `inbound packets dropped` alert for `eth0`, set `families: !eth0 *` in the alert definition for `template: inbound_packets_dropped`.
The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alerts can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a child or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alert with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the families line in the alert configuration. For example, if you want to disable the `inbound packets dropped` alert for `eth0`, set `families: !eth0 *` in the alert definition for `template: inbound_packets_dropped`.
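Sketched in health configuration syntax, the override described above touches only the `families` line; the rest of the alert stays as shipped in `health.d/net.conf`:

```txt
 template: inbound_packets_dropped
 families: !eth0 *
```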
#### configuration

Module configuration:

```
```txt
[plugin:proc:/proc/net/dev]
# filename to monitor = /proc/net/dev
# path to get virtual interfaces = /sys/devices/virtual/net/%s

@@ -427,7 +427,7 @@ Module configuration:
Per interface configuration:

```
```txt
[plugin:proc:/proc/net/dev:enp0s3]
# enabled = yes
# virtual = no

@@ -444,8 +444,6 @@ Per interface configuration:

---

SYNPROXY is a TCP SYN packets proxy. It can be used to protect any TCP server (like a web server) from SYN floods and similar DDos attacks.

SYNPROXY is a netfilter module, in the Linux kernel (since version 3.12). It is optimized to handle millions of packets per second utilizing all CPUs available without any concurrency locking between the connections.

@@ -487,7 +485,7 @@ and metrics:
- capacity_now

2. Charge: The charge for the power supply, expressed as amphours.
2. Charge: The charge for the power supply, expressed as amp-hours.

- charge_full_design
- charge_full

@@ -511,9 +509,9 @@ and metrics:
- voltage_min
- voltage_min_design

#### configuration
### configuration

```
```txt
[plugin:proc:/sys/class/power_supply]
# battery capacity = yes
# battery charge = no

@@ -524,7 +522,7 @@ and metrics:
# directory to monitor = /sys/class/power_supply
```
#### notes
### notes

- Most drivers provide at least the first chart. Battery powered ACPI
compliant systems (like most laptops) provide all but the third, but do

@@ -568,7 +566,7 @@ If your vendor is supported, you'll also get HW-Counters statistics. These being
Default configuration will monitor only enabled infiniband ports, and refresh newly activated or created ports every 30 seconds

```
```txt
[plugin:proc:/sys/class/infiniband]
# dirname to monitor = /sys/class/infiniband
# bandwidth counters = yes

@@ -604,12 +602,13 @@ The following charts will be provided:
The `drm` path can be configured if it differs from the default:

```
```txt
[plugin:proc:/sys/class/drm]
# directory to monitor = /sys/class/drm
```

> [!NOTE]
> **Note**
>
> Temperature, fan speed, voltage and power metrics for AMD GPUs can be monitored using the [Sensors](/src/go/plugin/go.d/modules/sensors/README.md) plugin.

## IPC

@@ -627,7 +626,7 @@ As far as the message queue charts are dynamic, sane limits are applied for the
### configuration

```
```txt
[plugin:proc:ipc]
# message queues = yes
# semaphore totals = yes

@@ -636,5 +635,3 @@ As far as the message queue charts are dynamic, sane limits are applied for the
# shm filename to monitor = /proc/sysvipc/shm
# max dimensions in memory allowed = 50
```
@@ -4,11 +4,11 @@ This plugin allows someone to backfill an agent with random data.
A user can specify:

- The number of charts they want,
- the number of dimensions per chart,
- the desired `update every` collection frequency,
- the number of seconds to backfill.
- the number of collection threads.

## Configuration

@@ -16,7 +16,7 @@ Edit the `netdata.conf` configuration file using [`edit-config`](/docs/netdata-a
Scroll down to the `[plugin:profile]` section to find the available options:

```
```txt
[plugin:profile]
update every = 5
number of charts = 200
@@ -1,12 +1,3 @@
<!--
title: "python.d.plugin"
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/python.d.plugin/README.md"
sidebar_label: "python.d.plugin"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/External plugins/python.d.plugin"
-->

# python.d.plugin

`python.d.plugin` is a Netdata external plugin. It is an **orchestrator** for data collection modules written in `python`.

@@ -55,14 +46,14 @@ other_job:
## How to debug a python module

```
```bash
# become user netdata
sudo su -s /bin/bash netdata
```

Depending on where Netdata was installed, execute one of the following commands to trace the execution of a python module:

```
```bash
# execute the plugin in debug mode, for a specific module
/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin <module> debug trace
/usr/libexec/netdata/plugins.d/python.d.plugin <module> debug trace
@@ -1,4 +1,3 @@
# `systemd` journal plugin

[KEY FEATURES](#key-features) | [JOURNAL SOURCES](#journal-sources) | [JOURNAL FIELDS](#journal-fields) |

@@ -47,7 +47,7 @@ sudo systemctl enable --now systemd-journal-gatewayd.socket
To use it, open your web browser and navigate to:

```
```txt
http://server.ip:19531/browse
```
@@ -5,12 +5,14 @@ Given that attackers often try to hide their actions by modifying or deleting lo
FSS provides administrators with a mechanism to identify any such unauthorized alterations.

## Importance

Logs are a crucial component of system monitoring and auditing. Ensuring their integrity means administrators can trust
the data, detect potential breaches, and trace actions back to their origins. Traditional methods to maintain this
integrity involve writing logs to external systems or printing them out. While these methods are effective, they are
not foolproof. FSS offers a more streamlined approach, allowing for log verification directly on the local system.

## How FSS Works

FSS operates by "sealing" binary logs at regular intervals. This seal is a cryptographic operation, ensuring that any
tampering with the logs prior to the sealing can be detected. If an attacker modifies logs before they are sealed,
these changes become a permanent part of the sealed record, highlighting any malicious activity.

@@ -29,6 +31,7 @@ administrators to verify older seals. If logs are tampered with, verification wi
breach.

## Enabling FSS

To enable FSS, use the following command:

```bash

@@ -43,6 +46,7 @@ journalctl --setup-keys --interval=10s
```

## Verifying Journals

After enabling FSS, you can verify the integrity of your logs using the verification key:

```bash

@@ -52,6 +56,7 @@ journalctl --verify
If any discrepancies are found, you'll be alerted, indicating potential tampering.

## Disabling FSS

Should you wish to disable FSS:

**Delete the Sealing Key**: This stops new log entries from being sealed.

@@ -66,7 +71,6 @@ journalctl --rotate
journalctl --vacuum-time=1s
```

**Adjust Systemd Configuration (Optional)**: If you've made changes to facilitate FSS in `/etc/systemd/journald.conf`,
consider reverting or adjusting those. Restart the systemd-journald service afterward:

@@ -75,6 +79,7 @@ systemctl restart systemd-journald
```

## Conclusion

FSS is a significant advancement in maintaining log integrity. While not a replacement for all traditional integrity
methods, it offers a valuable tool in the battle against unauthorized log tampering. By integrating FSS into your log
management strategy, you ensure a more transparent, reliable, and tamper-evident logging system.
@@ -46,9 +46,9 @@ sudo ./systemd-journal-self-signed-certs.sh "server1" "DNS:hostname1" "IP:10.0.0
Where:

- `server1` is the canonical name of the server. On newer systemd version, this name will be used by `systemd-journal-remote` and Netdata when you view the logs on the dashboard.
- `DNS:hostname1` is a DNS name that the server is reachable at. Add `"DNS:xyz"` multiple times to define multiple DNS names for the server.
- `IP:10.0.0.1` is an IP that the server is reachable at. Add `"IP:xyz"` multiple times to define multiple IPs for the server.

Repeat this process to create the certificates for all your servers. You can add servers as required, at any time in the future.

@@ -198,7 +198,6 @@ Here it is in action, in Netdata:

## Verify it works

To verify the central server is receiving logs, run this on the central server:
@@ -2,7 +2,7 @@
The Netdata daemon is practically a synonym for the Netdata Agent, as it controls its
entire operation. We support various methods to
[start, stop, or restart the daemon](/packaging/installer/README.md#maintaining-a-netdata-agent-installation).
[start, stop, or restart the daemon](/docs/netdata-agent/start-stop-restart.md).

This document provides some basic information on the command line options, log files, and how to debug and troubleshoot

@@ -134,7 +134,7 @@ The `error.log` is the `stderr` of the `netdata` daemon .
For most Netdata programs (including standard external plugins shipped by netdata), the following lines may appear:

| tag | description |
|:-:|:----------|
|:-------:|:----------------------------------------------------------------------------------------------------------|
| `INFO` | Something important the user should know. |
| `ERROR` | Something that might disable a part of netdata.<br/>The log line includes `errno` (if it is not zero). |
| `FATAL` | Something prevented a program from running.<br/>The log line includes `errno` (if it is not zero) and the program exited. |

@@ -199,7 +199,7 @@ You can set Netdata scheduling policy in `netdata.conf`, like this:
You can use the following:

| policy | description |
| :-----------------------: | :---------- |
|:-------------------------:|:----------------------------------------------------------------------------------------------------------|
| `idle` | use CPU only when there is spare - this is lower than nice 19 - it is the default for Netdata and it is so low that Netdata will run in "slow motion" under extreme system load, resulting in short (1-2 seconds) gaps at the charts. |
| `other`<br/>or<br/>`nice` | this is the default policy for all processes under Linux. It provides dynamic priorities based on the `nice` level of each process. Check below for setting this `nice` level for netdata. |
| `batch` | This policy is similar to `other` in that it schedules the thread according to its dynamic priority (based on the `nice` value). The difference is that this policy will cause the scheduler to always assume that the thread is CPU-intensive. Consequently, the scheduler will apply a small scheduling penalty with respect to wake-up behavior, so that this thread is mildly disfavored in scheduling decisions. |

@@ -278,11 +278,7 @@ all programs), edit `netdata.conf` and set:
process nice level = -1
```

then execute this to [restart Netdata](/packaging/installer/README.md#maintaining-a-netdata-agent-installation):

```sh
sudo systemctl restart netdata
```
then [restart Netdata](/docs/netdata-agent/start-stop-restart.md):

#### Example 2: Netdata with nice -1 on systemd systems

@@ -332,7 +328,7 @@ will roughly get the number of threads running.
The system does this for speed. Having a separate memory arena for each thread, allows the threads to run in parallel in
multi-core systems, without any locks between them.

This behaviour is system specific. For example, the chart above when running
This behavior is system specific. For example, the chart above when running
Netdata on Alpine Linux (that uses **musl** instead of **glibc**) is this:
@@ -1,13 +1,3 @@
<!--
title: "Exporting reference"
description: "With the exporting engine, you can archive your Netdata metrics to multiple external databases for long-term storage or further analysis."
sidebar_label: "Export"
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/exporting/README.md"
learn_status: "Published"
learn_rel_path: "Integrations/Export"
learn_doc_purpose: "Explain the exporting engine options and all of our the exporting connectors options"
-->

# Exporting reference

Welcome to the exporting engine reference guide. This guide contains comprehensive information about enabling,

@@ -196,11 +186,11 @@ You can configure each connector individually using the available [options](#opt
- `[prometheus:exporter]` defines settings for Prometheus exporter API queries (e.g.:
`http://NODE:19999/api/v1/allmetrics?format=prometheus&help=yes&source=as-collected`).
- `[<type>:<name>]` keeps settings for a particular exporting connector instance, where:
- `type` selects the exporting connector type: graphite | opentsdb:telnet | opentsdb:http |
prometheus_remote_write | json | kinesis | pubsub | mongodb. For graphite, opentsdb,
json, and prometheus_remote_write connectors you can also use `:http` or `:https` modifiers
(e.g.: `opentsdb:https`).
- `name` can be arbitrary instance name you chose.
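Putting the two parts together, a connector instance section in `exporting.conf` might look like this sketch (the instance name and destination are placeholders; see the options below for what each key accepts):

```txt
[graphite:my_graphite_instance]
    enabled = yes
    destination = localhost:2003
```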
### Options

@@ -323,5 +313,3 @@ Netdata adds 3 alerts:
3. `exporting_metrics_lost`, number of metrics lost due to repeating failures to contact the external database server
@@ -1,12 +1,3 @@
<!--
title: "Writing metrics to TimescaleDB"
description: "Send Netdata metrics to TimescaleDB for long-term archiving and further analysis."
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/exporting/TIMESCALE.md"
sidebar_label: "Writing metrics to TimescaleDB"
learn_status: "Published"
learn_rel_path: "Integrations/Export"
-->

# Writing metrics to TimescaleDB

Thanks to Netdata's community of developers and system administrators, and Mahlon Smith

@@ -23,14 +14,18 @@ What's TimescaleDB? Here's how their team defines the project on their [GitHub p
To get started archiving metrics to TimescaleDB right away, check out Mahlon's [`netdata-timescale-relay`
repository](https://github.com/mahlonsmith/netdata-timescale-relay) on GitHub. Please be aware that backends subsystem
was removed and Netdata configuration should be moved to the new `exporting.conf` configuration file. Use

```conf
[json:my_instance]
```

in `exporting.conf` instead of

```conf
[backend]
type = json
```

in `netdata.conf`.

This small program takes JSON streams from a Netdata client and writes them to a PostgreSQL (aka TimescaleDB) table.

@@ -67,5 +62,3 @@ blog](https://blog.timescale.com/blog/writing-it-metrics-from-netdata-to-timesca
Thank you to Mahlon, Rune, TimescaleDB, and the members of the Netdata community that requested and then built this
exporting connection between Netdata and TimescaleDB!
@@ -256,5 +256,3 @@ deployments automatically register Netdata services into Consul and Prometheus a
achieved you do not have to think about the monitoring system until Prometheus cannot keep up with your scale. Once this
happens there are options presented in the Prometheus documentation for solving this. Hope this was helpful, happy
monitoring.
@@ -1,14 +1,3 @@
<!--
title: go.d.plugin
description: "go.d.plugin is an external plugin for Netdata, responsible for running individual data collectors written in Go."
custom_edit_url: "/src/go/plugin/go.d/README.md"
sidebar_label: "go.d.plugin"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/External plugins/go.d.plugin"
sidebar_position: 1
-->

# go.d.plugin

`go.d.plugin` is a [Netdata](https://github.com/netdata/netdata) external plugin. It is an **orchestrator** for data
@@ -1,14 +1,3 @@
<!--
title: "How to write a Netdata collector in Go"
description: "This guide will walk you through the technical implementation of writing a new Netdata collector in Golang, with tips on interfaces, structure, configuration files, and more."
custom_edit_url: "/src/go/plugin/go.d/docs/how-to-write-a-module.md"
sidebar_label: "How to write a Netdata collector in Go"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/External plugins/go.d.plugin"
sidebar_position: 20
-->

# How to write a Netdata collector in Go

## Prerequisites

@@ -22,7 +11,7 @@ sidebar_position: 20
## Write and test a simple collector

> :exclamation: You can skip most of these steps if you first experiment directy with the existing
> :exclamation: You can skip most of these steps if you first experiment directly with the existing
> [example module](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/example), which
> will
> give you an idea of how things work.

@@ -58,7 +47,7 @@ The steps are:
Every module should implement the following interface:

```
```go
type Module interface {
    Init() bool
    Check() bool

@@ -75,7 +64,7 @@ type Module interface {
We propose to use the following template:

```
```go
// example.go

func (e *Example) Init() bool {

@@ -97,7 +86,7 @@ func (e *Example) Init() bool {
}
```

Move specific initialization methods into the `init.go` file. See [suggested module layout](#module-Layout).
Move specific initialization methods into the `init.go` file. See [suggested module layout](#module-layout).

### Check method

@@ -108,7 +97,7 @@ Move specific initialization methods into the `init.go` file. See [suggested mod
The simplest way to implement `Check` is to see if we are getting any metrics from `Collect`. A lot of modules use this
approach.

```
```go
// example.go

func (e *Example) Check() bool {

@@ -134,7 +123,7 @@ it contains charts and dimensions structs.
Usually charts are initialized in `Init` and the `Charts` method just returns the charts instance:

```
```go
// example.go

func (e *Example) Charts() *Charts {
@ -151,7 +140,7 @@ func (e *Example) Charts() *Charts {
|
|||
|
||||
We propose to use the following template:
|
||||
|
||||
```
|
||||
```go
|
||||
// example.go
|
||||
|
||||
func (e *Example) Collect() map[string]int64 {
|
||||
|
@ -167,7 +156,7 @@ func (e *Example) Collect() map[string]int64 {
|
|||
}
|
||||
```
|
||||
|
||||
Move metrics collection logic into the `collect.go` file. See [suggested module layout](#module-Layout).
|
||||
Move metrics collection logic into the `collect.go` file. See [suggested module layout](#module-layout).
|
||||
|
||||
### Cleanup method

@ -176,7 +165,7 @@ Move metrics collection logic into the `collect.go` file. See [suggested module

If you have nothing to clean up:

```
```go
// example.go

func (Example) Cleanup() {}

@ -229,7 +218,7 @@ All the module initialization details should go in this file.

- make a function for each value that needs to be initialized.
- a function should return a value(s), not implicitly set/change any values in the main struct.

```
```go
// init.go

// Prefer this approach.

@ -257,7 +246,7 @@ Feel free to split it into several files if you think it makes the code more rea

Use `collect_` prefix for the filenames: `collect_this.go`, `collect_that.go`, etc.

```
```go
// collect.go

func (e *Example) collect() (map[string]int64, error) {

@ -273,10 +262,10 @@ func (e *Example) collect() (map[string]int64, error) {

> :exclamation: See the
> example: [`example_test.go`](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/example/example_test.go).
>
> if you have no experience in testing we recommend starting
> with [testing package documentation](https://golang.org/pkg/testing/).
>
> we use `assert` and `require` packages from [github.com/stretchr/testify](https://github.com/stretchr/testify)
> library,
> check [their documentation](https://pkg.go.dev/github.com/stretchr/testify).
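For a flavor of the test style, here is a minimal sketch reusing the `New` constructor from the skeleton sketch earlier; the test names and assertions are illustrative, not copies of the real `example_test.go`.

```go
// example_test.go (sketch)
package example

import (
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestExample_Init(t *testing.T) {
    e := New()

    require.NotNil(t, e)
    assert.True(t, e.Init())
}

func TestExample_Collect(t *testing.T) {
    e := New()
    require.True(t, e.Init())

    // Collect should return at least one metric once Init succeeded.
    assert.NotEmpty(t, e.Collect())
}
```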
@ -299,4 +288,3 @@ be [`testdata`](https://golang.org/cmd/go/#hdr-Package_lists_and_patterns).

There are [some helper packages](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/pkg) for
writing a module.
@ -4,7 +4,9 @@ Netdata offers two ways to receive alert notifications on external integrations.

Both methods use a node's health alerts to generate the content of a notification.

Read our documentation on [configuring alerts](/src/health/REFERENCE.md) to change the preconfigured thresholds or to create tailored alerts for your infrastructure.
Read our documentation on [configuring alerts](/src/health/REFERENCE.md) to change the pre-configured thresholds or to create tailored alerts for your infrastructure.

<!-- virtual links below, should not lead anywhere outside of the rendered Learn doc -->

- Netdata Cloud provides centralized alert notifications, utilizing the health status data already sent to Netdata Cloud from connected nodes to send alerts to configured integrations. [Supported integrations](/docs/alerts-&-notifications/notifications/centralized-cloud-notifications) include Amazon SNS, Discord, Slack, Splunk, and others.
@ -640,7 +640,7 @@ See our [simple patterns docs](/src/libnetdata/simple_pattern/README.md) for mor

Similar to host labels, the `chart labels` key can be used to filter if an alert will load or not for a specific chart, based on
whether these chart labels match or not.

The list of chart labels present on each chart can be obtained from http://localhost:19999/api/v1/charts?all
The list of chart labels present on each chart can be obtained from <http://localhost:19999/api/v1/charts?all>

For example, each `disk_space` chart defines a chart label called `mount_point` with each instance of this chart having
a value there of which mount point it monitors.
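To make this concrete, here is a sketch of how such a selector can look in an alert definition; it is patterned loosely after the stock disk space alert, and the exact label pattern is illustrative:

```text
    template: disk_space_usage
          on: disk.space
chart labels: mount_point=!/dev !/dev/* !/run !/run/* *
```

With a selector like this, the template only attaches to `disk.space` instances whose `mount_point` label matches the pattern.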
@ -808,14 +808,14 @@ You can find all the variables that can be used for a given chart, using

Agent dashboard. For example, [variables for the `system.cpu` chart of the
registry](https://registry.my-netdata.io/api/v1/alarm_variables?chart=system.cpu).

> If you don't know how to find the CHART_NAME, you can read about it [here](/src/web/README.md#charts).
<!-- > If you don't know how to find the CHART_NAME, you can read about it [here](/src/web/README.md#charts). -->

Netdata supports 3 internal indexes for variables that will be used in health monitoring.

<details><summary>The variables below can be used in both chart alerts and context templates.</summary>

Although the `alarm_variables` link shows you variables for a particular chart, the same variables can also be used in
templates for charts belonging to a given [context](/src/web/README.md#contexts). The reason is that all charts of a given
templates for charts belonging to a given context. The reason is that all charts of a given
context are essentially identical, with the only difference being the family that identifies a particular hardware or software instance.

</details>
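As a quick illustration of chart variables used inside a template (the alert name, dimension names and threshold below are assumptions for this sketch, not taken from the stock configuration):

```text
 template: swap_used_percentage
       on: mem.swap
     calc: $used * 100 / ($used + $free)
    units: %
    every: 1m
     warn: $this > 80
```

Here `$used` and `$free` resolve against the dimensions of each `mem.swap` instance the template attaches to, and `$this` is the result of the `calc` line.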
@ -1064,7 +1064,7 @@ template: ml_5min_cpu_chart
info: rolling 5min anomaly rate for system.cpu chart
```

The `lookup` line will calculate the average anomaly rate across all `system.cpu` dimensions over the last 5 minues. In this case
The `lookup` line will calculate the average anomaly rate across all `system.cpu` dimensions over the last 5 minutes. In this case
Netdata will create one alert for the chart.

### Example 7 - [Anomaly rate](/src/ml/README.md#anomaly-rate) based node level alert

@ -1083,7 +1083,7 @@ template: ml_5min_node
info: rolling 5min anomaly rate for all ML enabled dims
```

The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](/src/ml/README.md#node-anomaly-rate) over the last 5 minutes.
The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](/src/ml/README.md#anomaly-rate) over the last 5 minutes.

## Troubleshooting
@ -1,14 +1,3 @@
<!--
title: "libnetdata"
custom_edit_url: https://github.com/netdata/netdata/edit/master/src/libnetdata/README.md
sidebar_label: "libnetdata"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->

# libnetdata

`libnetdata` is a collection of library code that is used by all Netdata `C` programs.
@ -1,12 +1,3 @@
<!--
title: "Registry"
description: "Netdata utilizes a central registry of machines/person GUIDs, URLs, and opt-in account information to provide unified cross-server dashboards."
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/registry/README.md"
sidebar_label: "Registry"
learn_status: "Published"
learn_rel_path: "Configuration"
-->

# Registry

Netdata provides distributed monitoring.
@ -61,8 +52,7 @@ The registry keeps track of 4 entities:

For each URL, the registry keeps the URL and nothing more. Each URL is linked to _persons_ and _machines_. The only
way to find a URL is to know its **machine_guid** or have a **person_guid** it is linked to it.

4. **accounts**: i.e. the information used to sign-in via one of the available sign-in methods. Depending on the
   method, this may include an email, or an email and a profile picture or avatar.
4. **accounts**: i.e. the information used to sign-in via one of the available sign-in methods. Depending on the method, this may include an email, or an email and a profile picture or avatar.

For _persons_/_accounts_ and _machines_, the registry keeps links to _URLs_, each link with 2 timestamps (first time
seen, last time seen) and a counter (number of times it has been seen). *machines_, _persons_ and timestamps are stored
@ -158,6 +148,7 @@ pattern matching can be controlled with the following setting:
```

The settings are:

- `yes` allows the pattern to match DNS names.
- `no` disables DNS matching for the patterns (they only match IP addresses).
- `heuristic` will estimate if the patterns should match FQDNs by the presence or absence of `:`s or alpha-characters.
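For orientation, a sketch of how this part of `netdata.conf` can look, assuming the stock `allow from` and `allow by dns` keys (the values shown are illustrative):

```text
[registry]
    allow from = *
    allow by dns = heuristic
```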
@ -174,8 +165,7 @@ There can be up to 2 files:

- `registry.db`, the database

  every `[registry].registry save db every new entries` entries in `registry-log.db`, Netdata will save its database
  to `registry.db` and empty `registry-log.db`.
  every `[registry].registry save db every new entries` entries in `registry-log.db`, Netdata will save its database to `registry.db` and empty `registry-log.db`.

Both files are machine readable text files.
@ -213,5 +203,3 @@ ERROR 409: Cannot ACCESS netdata registry: https://registry.my-netdata.io respon
```

This error is printed on your web browser console (press F12 on your browser to see it).