
* added parser for durations
* preliminary work for timeframes
* Update CMakeLists.txt
* updated parsing and generation for durations
* renames
* report parser errors; added compatibility to existing config_parse_duration()
* duration parsing is used on most netdata.conf and stream.conf entries
* more uses of duration parsing; simplification of stream.conf
* code cleanup
* more duration changes
* added html playground
* improved js code
* duration parsing applied to dbengine retention
* fixed doc
* simplified logic; added size parser
* added parsing for sizes
* renames and documentation updates
* hide appconfig internals from the rest of netdata
* fix crash on cleanup of streaming receivers
* fix buffer overflow in gorilla compression
* config return values are const
* ksm set to auto
* support reformatting migrated values
* removed obsolete metrics correlations settings
* split appconfig to multiple files
* durations documentation
* sizes documentation
* added backward compatibility in retention configuration
* provide description on migrations and reformattings
* config options are now a double linked list
* config sections are now a double linked list; config uses spinlocks; code cleanup and renames
* added data type to all config options
* update data types
* split appconfig api to multiple files
* code cleanup and renames
* removed size units above PiB
* Revert "fix buffer overflow in gorilla compression"
This reverts commit 3d5c48e84b.
* appconfig internal api changes
Clustering and High Availability of Netdata Parents
Netdata supports building Parent clusters of two or more nodes. Clustering and high availability work like this:
- All Netdata Children are configured to stream to all Netdata Parents. Each Netdata Child uses the first Parent it finds working, and automatically switches to one of the others if and when this connection is interrupted.
- The Netdata Parents are configured to stream to all other Netdata Parents. Each Netdata Parent uses the first sibling it finds working, and automatically switches to one of the others if and when this connection is interrupted.
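The streaming setup described above could be sketched in `stream.conf` roughly as follows. This is a minimal sketch under stated assumptions: the host names and the API key are hypothetical placeholders, and the option names (`enabled`, `destination`, `api key`) should be checked against the `stream.conf` shipped with the Netdata version in use.

```ini
# --- stream.conf on each Netdata Child (hosts and key are placeholders) ---
[stream]
    enabled = yes
    # All Parents are listed; the first one found working is used,
    # the others are used if this connection is interrupted.
    destination = parent-a.example.com:19999 parent-b.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555

# --- stream.conf on Parent A (Parent B mirrors this, pointing at Parent A) ---
[stream]
    enabled = yes
    # Stream to all the other Parents of the cluster.
    destination = parent-b.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555

# Accept incoming streams that present this API key.
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```

A single shared key keeps the sketch short; using separate keys for Children and Parents makes it possible to disable one group without affecting the other.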
All the Netdata Parents in such a cluster will receive all the metrics of all Netdata Children connected to any of them. They will also receive the metrics all the other Netdata Parents have.
If any of the Netdata Parents fails, the Netdata Children connected to it will automatically fail over to another available Netdata Parent, which will then attempt to re-stream all the metrics it receives to the other available Netdata Parents.
Netdata Cloud will receive registrations for all Netdata Children from all the Netdata Parents. As long as at least one of the Netdata Parents is connected to Netdata Cloud, all the Netdata Children will be available on Netdata Cloud.
Netdata Children need to maintain retention only for the time required to switch Netdata Parents. When Netdata Children connect to a Netdata Parent, they negotiate the available retention, and any data missing on the Netdata Parent is replicated from the Netdata Children.
Restoring a Netdata Parent after maintenance
Given the replication limitations, special care is needed when restoring a Netdata Parent after some long maintenance work on it.
If the Netdata Children do not have enough retention to replicate the missing data on this Netdata Parent, it is preferable to block access to this Netdata Parent from the Netdata Children, until it replicates the missing data from the other Netdata Parents.
To block access from Netdata Children, and still allow access from other Netdata Parent siblings:
- Use `iptables` to block access to port 19999 from Netdata Children to the restored Netdata Parent, or
- Use separate streaming API keys (in `stream.conf`) for Netdata Children and Netdata Parents, and disable the API key used by Netdata Children, until the restored Netdata Parent has been synchronized.
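The separate-API-key approach could look roughly like this in the `stream.conf` of the restored Netdata Parent. This is a sketch only: both keys are hypothetical placeholders, and it assumes the Children and the sibling Parents were already configured to stream with different keys.

```ini
# --- stream.conf on the restored Netdata Parent (keys are placeholders) ---

# API key used by Netdata Children: disabled until this Parent
# has replicated the missing data from its sibling Parents.
[aaaaaaaa-1111-2222-3333-444444444444]
    enabled = no

# API key used by sibling Netdata Parents: stays enabled so that
# the missing data can be replicated from them.
[bbbbbbbb-1111-2222-3333-444444444444]
    enabled = yes
```

Once the restored Netdata Parent has been synchronized, setting the Children key back to `enabled = yes` lets the Netdata Children connect again.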
Duplicating a Parent
The easiest way is to `rsync` the directory `/var/cache/netdata` from the existing Netdata Parent to the new Netdata Parent.
Important: Starting the new Netdata Parent with default settings may delete the newly copied files in `/var/cache/netdata` in order to apply the default disk size constraints. It is therefore important to set the right retention settings in the new Netdata Parent before starting it up with the copied files.
To configure retention at the new Netdata Parent, set in `netdata.conf` the following to at least the values the old Netdata Parent has:
- `[db].dbengine tier 0 retention size`, this is the max disk size for `tier0`. The default is 1GiB.
- `[db].dbengine tier 1 retention size`, this is the max disk space for `tier1`. The default is 1GiB.
- `[db].dbengine tier 2 retention size`, this is the max disk space for `tier2`. The default is 1GiB.
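As an illustration, the retention settings listed above might appear in the new Netdata Parent's `netdata.conf` as shown here. The sizes are hypothetical examples, not recommendations; the real values should be at least those configured on the old Netdata Parent.

```ini
# --- netdata.conf on the new Netdata Parent (sizes are example values) ---
[db]
    # Set each tier to at least the value used on the old Parent,
    # before starting Netdata with the copied files.
    dbengine tier 0 retention size = 10GiB
    dbengine tier 1 retention size = 5GiB
    dbengine tier 2 retention size = 2GiB
```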