Registry
Netdata provides distributed monitoring.
Traditional monitoring solutions centralize all the data to provide unified dashboards across all servers. Before Netdata, this was the standard practice. However, it has a few issues:
- due to the resources required, the number of metrics collected is limited.
- for the same reason, the data collection frequency is low: at best once every 10 or 15 seconds, at worst once every 5 or 10 minutes.
- the central monitoring solution needs dedicated resources, thus becoming "another bottleneck" in the whole ecosystem. It also requires maintenance, administration, etc.
- most centralized monitoring solutions are only good for presenting statistics of past performance (i.e. they cannot be used for real-time performance troubleshooting).
Netdata follows a different approach:
- data collection happens per second
- thousands of metrics per server are collected
- data do not leave the server where they are collected
- Netdata servers do not talk to each other
- your browser connects all the Netdata servers
Using Netdata, your monitoring infrastructure is embedded on each server, significantly limiting the need for additional resources. Netdata is fast, resource efficient, and uses the spare resources that already exist on each server. This allows scaling out the monitoring infrastructure.
However, the Netdata approach introduces a few new issues that need to be addressed. One is keeping track of the Netdata installations we have, i.e. the URLs our Netdata servers are listening on.
To solve this, Netdata utilizes a central registry. This registry, together with certain browser features, allows Netdata to provide unified cross-server dashboards. For example, when you jump from server to server using the node menu, several session settings (like the currently viewed charts, the current zoom and pan operations on the charts, etc.) are propagated to the new server, so that the new dashboard will come with exactly the same view.
What data does the registry store?
The registry keeps track of 4 entities:
- machines: i.e. the Netdata installations (a random GUID generated by each Netdata the first time it starts; we call this machine_guid)
  For each Netdata installation (each machine_guid) the registry keeps track of the different URLs it has accessed.
- persons: i.e. the web browsers accessing the Netdata installations (a random GUID generated by the registry the first time it sees a new web browser; we call this person_guid)
For each person, the registry keeps track of the Netdata installations it has accessed and their URLs.
- URLs of Netdata installations (as seen by the web browsers)
  For each URL, the registry keeps the URL and nothing more. Each URL is linked to persons and machines. The only way to find a URL is to know its machine_guid or a person_guid it is linked to.
- accounts: i.e. the information used to sign-in via one of the available sign-in methods. Depending on the method, this may include an email, or an email and a profile picture or avatar.
For persons/accounts and machines, the registry keeps links to URLs, each link with 2 timestamps (first time seen, last time seen) and a counter (number of times it has been seen). Machines, persons and timestamps are stored in the Netdata registry regardless of whether you sign in or not.
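The link bookkeeping described above (two timestamps and a counter per URL) can be illustrated with a minimal sketch. The class and method names here (`UrlLink`, `Entity`, `record_access`) are hypothetical and are not taken from the Netdata source; the sketch only mirrors the fields named in the text.

```python
from dataclasses import dataclass, field
import time
import uuid


@dataclass
class UrlLink:
    """One link from a person or machine to a URL (hypothetical structure)."""
    url: str
    first_seen: float
    last_seen: float
    times_seen: int = 1


@dataclass
class Entity:
    """A machine or a person, identified by a random GUID (hypothetical structure)."""
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    links: dict = field(default_factory=dict)  # maps url -> UrlLink

    def record_access(self, url, now=None):
        """Create the link on first sight, otherwise update last_seen and the counter."""
        now = time.time() if now is None else now
        link = self.links.get(url)
        if link is None:
            link = self.links[url] = UrlLink(url, first_seen=now, last_seen=now)
        else:
            link.last_seen = now
            link.times_seen += 1
        return link
```

Calling `record_access()` twice with the same URL keeps a single link, leaves `first_seen` untouched, and advances `last_seen` and `times_seen`.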
Who talks to the registry?
Your web browser only! If sending this information is against your policies, you can run your own registry.
Your Netdata servers do not talk to the registry. This is a UML diagram of its operation:
Which is the default registry?
https://registry.my-netdata.io, which is currently served by https://london.my-netdata.io. This registry listens to both HTTP and HTTPS requests but the default is HTTPS.
Can this registry handle the global load of Netdata installations?
Yes. The registry can handle 50,000 to 100,000 requests per second per core (depending on the type of CPU, the computer's memory bandwidth, etc.). The 50,000 figure was measured on a J1900 (Celeron, 2GHz).
We believe this is enough for the global load.
Run your own registry
Every Netdata can be a registry. Just pick one and configure it.
To turn any Netdata into a registry, edit /etc/netdata/netdata.conf and set:
[registry]
enabled = yes
registry to announce = http://your.registry:19999
Restart your Netdata to activate it.
Then, you need to tell all your other Netdata servers to advertise your registry, instead of the default. To do this, on each of your Netdata servers, edit /etc/netdata/netdata.conf and set:
[registry]
enabled = no
registry to announce = http://your.registry:19999
Note that we have not enabled the registry on the other servers. Only one Netdata (the registry) needs [registry].enabled = yes.
This is it. You have your registry now.
You may also want to give your servers different names under the node menu (e.g. to have them sorted or grouped). You can change the registry name of each Netdata server by setting on it:
[registry]
registry hostname = Group1 - Master DB
So this server will appear in the node menu as Group1 - Master DB. The max name length is 50 characters.
Limiting access to the registry
Netdata v1.9+ supports limiting access to the registry from given IPs, like this:
[registry]
allow from = *
allow from settings are Netdata simple patterns: string matches that use * as wildcard (any number of times) and a ! prefix for a negative match. So: allow from = !10.1.2.3 10.* will allow all IPs in 10.* except 10.1.2.3. The order is important: left to right, the first positive or negative match is used.
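The left-to-right, first-match-wins rule can be sketched as follows. This is an illustrative reimplementation, not Netdata's own matcher: the function name `simple_pattern_matches` is hypothetical, only the `*` wildcard and the `!` prefix described above are handled, and returning "no match" when no pattern applies is an assumption.

```python
import re


def simple_pattern_matches(patterns: str, value: str) -> bool:
    """Evaluate space-separated patterns left to right; the first hit decides."""
    for token in patterns.split():
        negative = token.startswith("!")
        glob = token[1:] if negative else token
        # Translate the glob into a regex: '*' matches any run of characters,
        # everything else is matched literally.
        regex = ".*".join(re.escape(part) for part in glob.split("*"))
        if re.fullmatch(regex, value):
            return not negative
    # Assumption: a value that matches no pattern is rejected.
    return False
```

With `!10.1.2.3 10.*`, the address `10.1.2.3` is rejected by the first (negative) pattern before the second one is consulted. Reversing the order to `10.* !10.1.2.3` would accept it, which is why the order matters.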
Keep in mind that connections to Netdata API ports are filtered by [web].allow connections from. So, IPs allowed by [registry].allow from should also be allowed by [web].allow connections from.
The patterns can be matched against the IP address or the FQDN of the host. In order to check the FQDN of the connection without opening the Netdata agent to DNS-spoofing, a reverse-dns record must be set up for the connecting host. At connection time the reverse-dns of the peer IP address is resolved, and a forward DNS resolution is made to validate the IP address against the name-pattern.
Please note that this process can be expensive on a machine that is serving many connections. The behaviour of the pattern matching can be controlled with the following setting:
[registry]
allow by dns = heuristic
The settings are:
- yes allows the pattern to match DNS names.
- no disables DNS matching for the patterns (they only match IP addresses).
- heuristic will estimate if the patterns should match FQDNs by the presence or absence of :s or alpha-characters.
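The reverse-then-forward DNS validation described above can be sketched like this. It is a minimal illustration using Python's standard `socket` module, not the code Netdata runs; the function names are hypothetical, and the resolvers are passed as parameters only so the logic can be exercised without real DNS.

```python
import socket


def _reverse_lookup(ip):
    """Return the reverse-dns name of ip, or None if it has no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward_lookup(name):
    """Return the set of addresses that name resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, None)}
    except OSError:
        return set()


def verified_hostname(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Return the peer's name only if resolving it forward leads back to ip."""
    name = reverse(ip)
    if name is None:
        return None
    # A reverse record alone can be forged by whoever controls the reverse
    # zone, so the name is trusted only when the forward lookup confirms it.
    return name if ip in forward(name) else None
```

Only a name returned by `verified_hostname()` would then be compared against the name patterns. Each new connection can cost two DNS lookups, which is the expense the text warns about.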
Where is the registry database stored?
/var/lib/netdata/registry/*.db
There can be up to 2 files:
- registry-log.db, the transaction log
  All incoming requests that affect the registry are saved in this file in real-time.
- registry.db, the database
  Every [registry].registry save db every new entries entries in registry-log.db, Netdata will save its database to registry.db and empty registry-log.db.
Both files are machine readable text files.
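The interplay between the two files can be sketched as an append-only log that is folded into a full snapshot every N entries. This is a simplified illustration under stated assumptions: the class `LoggedStore`, its `save_every` parameter (standing in for `registry save db every new entries`) and the one-entry-per-line format are hypothetical, and the real on-disk format is not reproduced here.

```python
import os
import tempfile


class LoggedStore:
    """Append every change to a log; snapshot and empty the log every N entries."""

    def __init__(self, directory, save_every):
        self.db_path = os.path.join(directory, "registry.db")
        self.log_path = os.path.join(directory, "registry-log.db")
        self.save_every = save_every
        self.entries = []   # in-memory state (hypothetical: one text line per entry)
        self.pending = 0    # entries logged since the last full save

    def add(self, line):
        # Log first, so the change is on disk as soon as it is accepted.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(line + "\n")
        self.entries.append(line)
        self.pending += 1
        if self.pending >= self.save_every:
            self.save()

    def save(self):
        # Write the snapshot to a temporary file and rename it into place,
        # so a crash never leaves a half-written database behind.
        tmp_path = self.db_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as db:
            db.writelines(entry + "\n" for entry in self.entries)
        os.replace(tmp_path, self.db_path)
        open(self.log_path, "w", encoding="utf-8").close()  # empty the log
        self.pending = 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        store = LoggedStore(directory, save_every=3)
        for entry in ("a", "b", "c"):
            store.add(entry)
        print(sorted(os.listdir(directory)))
```

The design trades a little write amplification for durability: the log captures every change immediately, while the more expensive full save happens only once per batch.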
How can I disable the SameSite and Secure cookies?
Beginning with v1.30.0, when the Netdata Agent's web server processes a request, it delivers the SameSite=none and Secure cookies. If you have problems accessing the local Agent dashboard or Netdata Cloud, disable these cookies by editing netdata.conf:
[registry]
enable cookies SameSite and Secure = no
The future
The registry opens a whole world of new possibilities for Netdata. Check here what we think: https://github.com/netdata/netdata/issues/416
Troubleshooting the registry
The registry URL should be set to the URL of a Netdata dashboard. This server has to have [registry].enabled = yes.
So, accessing the registry URL directly with your web browser should present the dashboard of the Netdata operating the registry.
To use the registry, your web browser needs to support third party cookies, since the cookies are set by the registry while you are browsing the dashboard of another Netdata server. The first time the registry sees a new web browser, it tries to figure out whether the browser has cookies enabled. It does this by setting a cookie and redirecting the browser back to itself, expecting to receive the cookie with the next request. If it does not receive the cookie, the registry will keep redirecting your web browser back to itself, which after a few redirects will fail with an error like this:
ERROR 409: Cannot ACCESS netdata registry: https://registry.my-netdata.io responded with: {"status":"redirect","registry":"https://registry.my-netdata.io"}
This error is printed in your web browser's console (press F12 in your browser to see it).