* rrdset - in progress
* rrdset optimal constructor; rrdset conflict
* rrdset final touches
* re-organization of rrdset object members
* prevent use-after-free
* dictionary dfe supports also counting of iterations
* rrddim managed by dictionary
* rrd.h cleanup
* DICTIONARY_ITEM now is referencing actual dictionary items in the code
* removed rrdset linked list
* Revert "removed rrdset linked list"
This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.
* removed rrdset linked list
* added comments
* Switch chart uuid to static allocation in rrdset
Remove unused functions
* rrdset_archive() and friends...
* always create rrdfamily
* enable ml_free_dimension
* rrddim_foreach done with dfe
* most custom rrddim loops replaced with rrddim_foreach
* removed accesses to rrddim->dimensions
* removed locks that are no longer needed
* rrdsetvar is now managed by the dictionary
* set rrdset in rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853
* conflict callback of rrdsetvar now properly checks if it has to reset the variable
* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM
* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe
* dictionary walkthrough callbacks get dictionary acquired items
* dictionary reference counters that can be dupped from zero
* added advanced functions for get and del
* rrdvar managed by dictionaries
* thread safety for rrdsetvar
* faster rrdvar initialization
* rrdvar string lengths should match in all add, del, get functions
* rrdvar internals hidden from the rest of the world
* rrdvar is now acquired throughout netdata
* hide the internal structures of rrdsetvar
* rrdsetvar is now acquired throughout netdata
* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata
* better error handling
* dont create variables if not initialized for health
* dont create variables if not initialized for health again
* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items
* type checking on acquired objects
* rrdcalc renaming of functions
* type checking for rrdfamily_acquired
* rrdcalc managed by dictionaries
* rrdcalc double free fix
* host rrdvars is always needed
* attempt to fix deadlock 1
* attempt to fix deadlock 2
* Remove unused variable
* attempt to fix deadlock 3
* snprintfz
* rrdcalc index in rrdset fix
* Stop storing active charts and computing chart hashes
* Remove store active chart function
* Remove compute chart hash function
* Remove sql_store_chart_hash function
* Remove store_active_dimension function
* dictionary delayed destruction
* formatting and cleanup
* zero dictionary base on rrdsetvar
* added internal error to log delayed destructions of dictionaries
* typo in rrddimvar
* added debugging info to dictionary
* debug info
* fix for rrdcalc keys being empty
* remove forgotten unlock
* remove deadlock
* Switch to metadata version 5 and drop chart_hash, chart_hash_map, chart_active, dimension_active, v_chart_hash
* SQL cosmetic changes
* do not busy wait while destroying a referenced dictionary
* remove deadlock
* code cleanup; re-organization;
* fast cleanup and flushing of dictionaries
* number formatting fixes
* do not delete configured alerts when archiving a chart
* rrddim obsolete linked list management outside dictionaries
* removed duplicate contexts call
* fix crash when rrdfamily is not initialized
* dont keep rrddimvar referenced
* properly cleanup rrdvar
* removed some locks
* Do not attempt to cleanup chart_hash / chart_hash_map
* rrdcalctemplate managed by dictionary
* register callbacks on the right dictionary
* removed some more locks
* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread
* when looking up for an alarm look using both chart id and chart name
* host initialization a bit more modular
* init rrdlabels on host update
* preparation for dictionary views
* improved comment
* unused variables without internal checks
* service threads isolation and worker info
* more worker info in service thread
* thread cancelability debugging with internal checks
* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647
* dictionary modularization
* Remove unused SQL statement definition
* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock, in which case all calls automatically become their unsafe versions; all direct calls to the unsafe versions are eliminated
* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops
* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/proc.plugin/proc_net_dev.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* fix memory leak in rrdset cache_dir
* minor dictionary changes
* dont use index locks in single threaded
* obsolete dict option
* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;
* fix jump on uninitialized value in dictionary; remove double free of cache_dir
* addressed codacy findings
* removed debugging code
* use the private refcount on dictionaries
* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;
* more dictionary statistics
* global statistics about dictionary operations, memory, items, callbacks
* dictionary support for views - missing the public API
* removed warning about unused parameter
* chart and context name for cloud
* chart and context name for cloud, again
* dictionary statistics fixed; first implementation of dictionary views - not currently used
* only the master can globally delete an item
* context needs netdata prefix
* fix context and chart id of spins
* fix for host variables when health is not enabled
* run garbage collector on item insert too
* Fix info message; remove extra "using"
* update dict unittest for new placement of garbage collector
* we need RRDHOST->rrdvars for maintaining custom host variables
* Health initialization needs the host->host_uuid
* split STRING to its own files; no code changes other than that
* initialize health unconditionally
* unit tests do not pollute the global scope with their variables
* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* move chart_labels to rrdset
* rename chart_labels to rrdlabels
* renamed hash_id to uuid
* turned is_ar_chart into an rrdset flag
* removed rrdset state
* removed unused senders_connected member of rrdhost
* removed unused host flag RRDHOST_FLAG_MULTIHOST
* renamed rrdhost host_labels to rrdlabels
* Update exporting unit tests
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* rrdfamily
* rrddim
* rrdset plugin and module names
* rrdset units
* rrdset type
* rrdset family
* rrdset title
* rrdset title more
* rrdset context
* rrdcalctemplate context and removal of context hash from rrdset
* strings statistics
* rrdset name
* rearranged members of rrdset
* eliminate rrdset name hash; rrdcalc chart converted to STRING
* rrdset id, eliminated rrdset hash
* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate
* rrdcalctemplate
* rrdvar
* eval_variable
* rrddimvar and rrdsetvar
* rrdhost hostname, os and tags
* fix master commits
* added thread cache; implemented string_dup without locks
* faster thread cache
* rrdset and rrddim now use dictionaries for indexing
* rrdhost now uses dictionary
* rrdfamily now uses DICTIONARY
* rrdvar using dictionary instead of AVL
* allocate the right size to rrdvar flag members
* rrdhost remaining char * members to STRING *
* better error handling on indexing
* strings now use a read/write lock to allow parallel searches to the index
* removed AVL support from dictionaries; implemented STRING with native Judy calls
* string releases should be negative
* only 31 bits are allowed for enum flags
* proper locking on strings
* string threading unittest and fixes
* fix lgtm finding
* fixed naming
* stream chart/dimension definitions at the beginning of a streaming session
* thread stack variable is undefined on thread cancel
* rrdcontext garbage collect per host on startup
* worker control in garbage collection
* relaxed deletion of rrdmetrics
* type checking on dictfe
* netdata chart to monitor rrdcontext triggers
* Group chart label updates
* rrdcontext better handling of collected rrdsets
* rrdpush incremental transmission of definitions should use as much buffer as possible
* require 1MB per chart
* empty the sender buffer before enabling metrics streaming
* fill up to 50% of buffer
* reset signaling metrics sending
* use the shared variable for status
* use separate host flag for enabling streaming of metrics
* make sure the flag is clear
* add logging for streaming
* add logging for streaming on buffer overflow
* circular_buffer proper sizing
* removed obsolete logs
* do not execute worker jobs if not necessary
* better messages about compression disabling
* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped
* monitor stream sender used buffer ratio
* Update exporting unit tests
* no need to compare label value with strcmp
* streaming send workers now monitor bandwidth
* workers now use strings
* streaming receiver monitors incoming bandwidth
* parser shift of worker ids
* minor fixes
* Group chart label updates
* Populate context with dimensions that have data
* Fix chart id
* better shift of parser worker ids
* fix for streaming compression
* properly count received bytes
* ensure LZ4 compression ring buffer does not wrap prematurely
* do not stream empty charts; do not process empty instances in rrdcontext
* need_to_send_chart_definition() does not need an rrdset lock any more
* rrdcontext objects are collected, after data have been written to the db
* better logging of RRDCONTEXT transitions
* always set all variables needed by the worker utilization charts
* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes
* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock
* STRING code re-organization for clarity
* thread_cache improvements; double numbers precision on worker threads
* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions; STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS
* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts
* Add index to speed up database context searches
* Removed last_updated optimization (was also buggy after latest merge with master)
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* type checking on dictionary return values
* first STRING implementation, used by DICTIONARY and RRDLABEL
* enable AVL compilation of STRING
* Initial functions to store context info
* Call simple test functions
* Add host_id when getting charts
* Allow host to be null and in this case it will process the localhost
* Simplify init
Do not use strdupz - link directly to sqlite result set
* Init the database during startup
* make it compile - no functionality yet
* intermediate commit
* intermediate
* first interface to sql
* loading instances
* check if we need to update cloud
* comparison of rrdcontext on conflict
* merge context titles
* rrdcontext public interface; statistics on STRING; scratchpad on DICTIONARY
* dictionaries maintain version numbers; rrdcontext api
* cascading changes
* first operational cleanup
* string unittest
* proper cleanup of referenced dictionaries
* added rrdmetrics
* rrdmetric starting retention
* Add fields to context
Adjust context creation and delete
* Memory cleanup
* Fix get context list
Fix memory double free in tests
Store context with two hosts
* calculated retention
* rrdcontext retention with collection
* Persist database and shutdown
* loading all from sql
* Get chart list and dimension list changes
* fully working attempt 1
* fully working attempt 2
* missing archived flag from log
* fixed archived / collected
* operational
* proper cleanup
* cleanup - implemented all interface functions - dictionary react callback triggers after the dictionary is unlocked
* track all reasons for changes
* proper tracking of reasons of changes
* fully working thread
* better versioning of contexts
* fix string indexing with AVL
* running version per context vs hub version; ifdef dbengine
* added option to disable rrdmetrics
* release old context when a chart changes context
* cleanup properly
* renamed config
* cleanup contexts; general cleanup;
* deletion inline with dequeue; lots of cleanup; child connected/disconnected
* ml should start after rrdcontext
* added missing NULL to ri->rrdset; rrdcontext flags are now only changed under a mutex lock
* fix buggy STRING under AVL
* Rework database initialization
Add migration logic to the context database
* fix data race conditions during context deletion
* added version hash algorithm
* fix string over AVL
* update aclk-schemas
* compile new ctx related protos
* add ctx stream message utils
* add context messages
* add dummy rx message handlers
* add the new topics
* add ctx capability
* add helper functions to send the new messages
* update cmake build to not fail
* update topic names
* handle rrdcontext_enabled
* add more functions
* fatal on OOM cases instead of return NULL
* silence unknown query type error
* fully working attempt 1
* fully working attempt 2
* allow compiling without ACLK
* added family to the context
* removed excess character in UUID
* smarter merging of titles and families
* Database migration code to add family
Add family to SQL_CHART_DATA and VERSIONED_CONTEXT_DATA
* add family to context message
* enable ctx in communication
* hardcoded enabled contexts
* Add hard code for CTX
* add update node collectors to json
* add context message log
* fix log about last_time_t
* fix collected flags for queued items
* prevent crash on charts cleanup
* fix bug in AVL indexing of dictionaries; make sure react callback of dictionaries has a reference counter, which is acquired while the dictionary is locked
* fixed dictionary unittest
* strict policy to cleanup and garbage collector
* fix db rotation and garbage collection timings
* remove deadlock
* proper garbage collection - a lot faster retention recalculation
* Added not NULL in database columns
Remove migration code for context -- we will ship with version 1 of the table schema
Added define for query in tests to detect localhost
* Use UUID_STR_LEN instead of GUID_LEN + 1
Use realistic timestamps when adding test data in the database
* Add NULL checks for passed parameters
* Log deleted context when compiled with NETDATA_INTERNAL_CHECKS
* Error checking for null host id
* add missing ContextsCheckpoint log convertor
* Fix spelling in VACCUM
* Hold additional information for host -- prepare to load archived hosts on startup
* Make sure claim id is valid
* is_get_claimed is actually get the current claim id
* Simplify ctx get chart list query
* remove env negotiation
* fix string unittest when there are some strings already in the index
* propagate live-retention flag upstream; cleanup all update reasons; updated instances logging; automated attaching started/stopped collecting flags;
* first implementation of /api/v1/contexts
* full contexts API; updated swagger
* disabled debugging; rrdcontext enabled by default
* final cleanup and renaming of global variables
* return current time on currently collected contexts, charts and dimensions
* added option "deepscan" to the API to have the server refresh the retention and recalculate the contexts on the fly
* fixed indentation of yaml
* Add constraints to the host table
* host->node_id may not be available
* new capabilities
* lock the context while rendering json
* update aclk-schemas
* added permanent labels to all charts about plugin, module and family; added labels to all proc plugin modules
* always add the labels
* allow merging of families down to [x]
* dont show uuids by default, added option to enable them; response is now accepting after,before to show only data for a specific timeframe; deleted items are only shown when "deleted" is requested; hub version is now shown when "queue" is requested
* Use the localhost claim id
* Fix to handle host constraints better
* cgroups: add "k8s." prefix to chart context in k8s
* Improve sqlite metadata version migration check
* empty values set to "[none]"; fix labels unit test to reflect that
* Check if we reached the version we want first (address CODACY report re: Array index 'i' is used before limits check)
* Rewrite condition to address CODACY report (Redundant condition: t->filter_callback. '!A || (A && B)' is equivalent to '!A || B')
* Properly unlock context
* fixed memory leak on rrdcontexts - it was not freeing all dictionaries in rrdhost; added wait of up to 100ms on dictionary_destroy() to give time to dictionaries to release their items before destroying them
* fixed memory leak on rrdlabels not freed on rrdinstances
* fixed leak when dimensions and charts are redefined
* Mark entries for charts and dimensions as submitted to the cloud 3600 seconds after their creation
Mark entries for charts and dimensions as updated (confirmed by the cloud) 1800 seconds after their submission
* renamed struct string
* update cgroups alarms
* fixed codacy suggestions
* update dashboard info
* fix k8s_cgroup_10s_received_packets_storm alarm
* added filtering options to /api/v1/contexts and /api/v1/context
* fix eslint
* fix eslint
* Fix pointer binding for host / chart uuids
* Fix cgroups unit tests
* fixed non-retention updates not propagated upstream
* removed non-fatal fatals
* Remove context from 2 way string merge.
* Move string_2way_merge to dictionary.c
* Add 2-way string merge tests.
* split long lines
* fix indentation in netdata-swagger.yaml
* update netdata-swagger.json
* yamllint please
* remove the deleted flag when a context is collected
* fix yaml warning in swagger
* removed non-fatal fatals
* charts should now be able to switch contexts
* allow deletion of unused metrics, instances and contexts
* keep the queued flag
* cleanup old rrdinstance labels
* dont hide objects when there is no filter; mark objects as deleted when there are no sub-objects
* delete old instances once they changed context
* delete all instances and contexts that do not have sub-objects
* more precise transitions
* Load archived hosts on startup (part 1)
* update the queued time every time
* disable by default; dedup deleted dimensions after snapshot
* Load archived hosts on startup (part 2)
* delayed processing of events until charts are being collected
* remove dont-trigger flag when object is collected
* polish all triggers given the new dont_process flag
* Remove always true condition
Enums for readability / create_host_callback only if ACLK is enabled (for now)
* Skip retention message if context streaming is enabled
Add messages in the access log if context streaming is enabled
* Check for node id being a UUID that can be parsed
Improve error check / reporting when loading archived hosts and creating ACLK sync threads
* collected, archived, deleted are now mutually exclusive
* Enable the "orphan" handling for now
Remove dead code
Fix memory leak on free host
* Queue charts and dimensions will be no-op if host is set to stream contexts
* removed unused parameter and made sure flags are set on rrdcontext insert
* make the rrdcontext thread abort mid-work when exiting
* Skip chart hash computation and storage if contexts streaming is enabled
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Timo <timotej@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
Co-authored-by: Vasilis Kalintiris <vasilis@netdata.cloud>
* Use atomics ops with host->rrdpush_sender_connected.
* Use different storage units for rrddim's updated and exposed fields.
The bitfields would end up in the same byte, thus requiring
explicit protection with mutexes.
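A minimal sketch of the layout difference described above, assuming hypothetical struct names (the real rrddim definition is not reproduced here). In the C11 memory model adjacent bit-fields form a single memory location, so unsynchronized writers to neighbouring bits race with each other; plain members are distinct memory locations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "before" layout: both flags are adjacent bit-fields, so they
 * share one storage unit and writing one rewrites the bits of the other. */
struct dim_flags_packed {
    unsigned int updated : 1;
    unsigned int exposed : 1;
};

/* Hypothetical "after" layout: each flag has its own storage unit, so two
 * threads can write different flags without touching the same bytes. */
struct dim_flags_split {
    bool updated;
    bool exposed;
};

/* Returns true when the two split flags live at different byte offsets. */
static bool split_flags_are_separate(void) {
    return offsetof(struct dim_flags_split, updated) !=
           offsetof(struct dim_flags_split, exposed);
}
```

Separate storage units only remove the accidental sharing; a flag that is itself written by several threads would still need atomics or a lock.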
* netdata doubles
* fix cmocka test
* fix cmocka test again
* fix left-overs of long double to NETDATA_DOUBLE
* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf
* update the memory before saving
* rrdset is now detached from file structures too
* on memory mode map, update the memory mapped structures on every iteration
* allow RRD_ID_LENGTH_MAX to be changed
* granularity secs, back to update every
* fix formatting
* more formatting
* squashed and rebased to master
* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h
* added unittest for UTF-8 multibyte sanitization
* Fix unit test compilation
* Fix CMake build
* remove double sanitizer for opentsdb; cleanup sanitize_json_string()
* rename error_description to error_message to avoid conflict with json-c
* revert last and undef error_description from json-c
* more unittests; attempt to fix protobuf map issue
* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer
* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted
* better sorted dictionary checking
* proper unittesting for sorted dictionaries
* call dictionary deletion callback when destroying the dictionary
* remove obsolete variable
* Fix exporting unit tests
* Fix k8s label parsing test
* workaround for cmocka and strdupz()
* Bypass cmocka memory allocation check
* Revert "Bypass cmocka memory allocation check"
This reverts commit 4c49923839.
* Revert "workaround for cmocka and strdupz()"
This reverts commit 7bebee0480.
* Bypass cmocka memory allocation checks
* respect json formatting for chart labels
* cloud sends colons
* print the value only once
* allow parentheses and spaces in values; make stream sender send quotes for values
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* replace connect_to_one_of with connect_to_one_of_destinations
* move functions from socket.c
* use sizeof
* move current destination pointer to host
* formatting
* use snprintfz
* get entries in same order
* handle single destination as before (or when it is the last of the list), instead of skipping it every other loop
* try other destinations on ssl problem
After https://github.com/netdata/netdata/pull/12209 per-chart configuration
was used for (a) enabling/disabling a chart, and (b) renaming dimensions.
Regarding the first use case: We already have component-specific
configuration options|flags to finely control how a chart should behave.
Eg. "send charts matching" in streaming, "charts to skip from training"
in ML, etc. If we really need the concept of a disabled chart, we can
add a host-level simple pattern to match these charts.
Regarding the second use case: It's not obvious why we'd need to provide
support for remapping dimension names through a chart-specific configuration
from the core agent. If the need arises, we could add such support at
the right place, i.e. an exporter/streaming config section.
This will allow each flag to act independently of the others and
avoid managing flag-state manually at various places, eg:
```
if(unlikely(!rrdset_flag_check(st, RRDSET_FLAG_ENABLED))) {
    rrdset_flag_clear(st, RRDSET_FLAG_UPSTREAM_SEND);
    rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
} ...
```
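As a rough illustration of flags that act independently, the sketch below uses a hypothetical `chart_t` type with made-up flag values; it is not the agent's actual rrdset API. Each consumer checks only the bits it owns, so setting or clearing an unrelated flag never forces a manual fix-up of the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
typedef enum {
    CHART_FLAG_UPSTREAM_SEND   = 1u << 0,
    CHART_FLAG_UPSTREAM_IGNORE = 1u << 1,
    CHART_FLAG_OBSOLETE        = 1u << 2
} chart_flag_t;

/* Hypothetical chart object holding only a flag word. */
typedef struct { uint32_t flags; } chart_t;

static void chart_flag_set(chart_t *c, chart_flag_t f)   { c->flags |= (uint32_t)f; }
static void chart_flag_clear(chart_t *c, chart_flag_t f) { c->flags &= ~(uint32_t)f; }
static bool chart_flag_check(const chart_t *c, chart_flag_t f) {
    return (c->flags & (uint32_t)f) != 0;
}

/* Streaming looks only at the streaming flags; no other flag is consulted. */
static bool chart_should_stream(const chart_t *c) {
    return chart_flag_check(c, CHART_FLAG_UPSTREAM_SEND) &&
           !chart_flag_check(c, CHART_FLAG_UPSTREAM_IGNORE);
}
```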
* Move CPU usage stats under netdata charts
Use the hostname in each chart's name, and the machine GUID in each
chart's id.
* Move anomaly_detection.* charts to child host instance.
* Add option to enable/disable streaming of ML-related charts.
* Update priority of prediction/training charts.
* Track anomaly rates with DBEngine.
This commit adds support for tracking anomaly rates with DBEngine. We
do so by creating a single chart with id "anomaly_detection.anomaly_rates" for
each trainable/predictable host, which is responsible for tracking the anomaly
rate of each dimension that we train/predict for that host.
The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly
rates charts. We use this flag to:
- Disable exposing the anomaly rates charts through the functionality
in backends/, exporting/ and streaming/.
- Skip generation of configuration options for the name, algorithm,
multiplier, divisor of each dimension in an anomaly rates chart.
- Skip the creation of health variables for anomaly rates dimensions.
- Skip the chart/dim queue of ACLK.
- Post-process the RRDR result of an anomaly rates chart, so that we can
return a sorted, trimmed number of anomalous dimensions.
In a child/parent configuration where both the child and the parent run
ML for the child, we want to be able to stream the rest of the ML-related
charts to the parent. To be able to do this without any chart name collisions,
the charts are now created on localhost and their IDs and titles have the node's
machine_guid and hostname as a suffix, respectively.
* Fix exporting_engine tests.
* Restore default ML configuration.
The reverted changes were meant for local testing only. This commit
restores the default values that we want to have when someone runs
anomaly detection on their node.
* Set context for anomaly_detection.* charts.
* Check for anomaly rates chart only with a valid pointer.
* Remove duplicate code.
* Use a more descriptive name for id/title pair variable
* Add code for LZ4 streaming data compression
* Fix LGTM alert
* Add lz4 library for link when compression enabled
* Add LZ4_resetStream_fast presence detection
* Disable compression for older LZ4 libraries
* Correct LZ4 API check
* [Testing Stream Compression] Debug msgs and report.md
* Add LZ4 library version using LZ4_initStream
* Fixed bug in SSL mode
* [Testing compression] - Add compression info messages
* Set compression enabled by default, update doc
* Update streaming/README.md
Co-authored-by: DShreve2 <david@netdata.cloud>
* [Agent Negotiation] Compression as separate capability
* [Agent Negotiation] Compression as separate capability - default compression variable always active
* Add code to negotiate compression
* [Agent Negotiation] Based on stream version
* [Agent Negotiation] Version based - fix compilation error
* [Agent Negotiation] Fix global variable default_compression_enabled=0 affecting all connections - Handle compression - stream version based
* [Agent Negotiation - Compression] - Add control flag in 1. sender/receiver state & 2. stream.conf per child
* [Agent Negotiation - Compression] Fix stream.conf key, mguid control
* [Agent Negotiate Compression] Fine control on stream.conf per key,mguid for each child
* [Agent Negotiation Compression] Stop destroying compressor for runtime configuration + Update Readme.md
* [Agent Negotiation Compression] Use stream_version 4 if compression is disabled
* Correct child's compression check
* [Agent Negotiation Compression] Create streaming compression section in docs.
* [Agent Negotiation Compression] Remove redundant debug msgs
* [Stream Compression] - integrate compression build info & config info in api/v1/info endpoint.
* [Agent Negotiation] Finalize README.md
* [Agent Stream Compression] Fix buildinfo json, Finalize readme.md
* [Agent Stream Compression] Negotiate compression based on stream version
* [Agent Stream Compression] Stream compression control per child in stream.conf | per AP_KEY, MACHINE_GUID
* [Agent Stream Compression] Avoid destroying compressor enabling runtime configuration + Update Readme.md
* [Agent Stream Compression] - Provide stream compression build info & config info in api/v1/info endpoint + Update Readme.md
* [Agent Stream Compression] Fix rebase conflicts
* [Agent Stream Compression] Fix more rebase conflicts
* [Agent Stream Compression] 1. Stream version based negotiation 2. per child stream.conf control 3. finalize docs 4. stream compression build info in web api
* [Agent Stream Compression] 1. Stream version based negotiation 2. per child stream.conf control 3. finalize docs 4. stream compression build info in web api
* [Agent Stream Compression] Change unsuccessful buffer check to error
* [Agent Stream Compression] Readme.md proof-read corrections, downgrade to stream_version_clabels, add shields for supported versions, EOF lint
* [Agent Stream Compression] Fix missed lz4 library on Alpine Linux
* Phrasal review
Co-authored-by: odynik <odynik.ee@gmail.com>
Co-authored-by: DShreve2 <david@netdata.cloud>
Co-authored-by: Tina Lüdtke <tina@kickoke.com>
* Send ML feature information with UpdateNodeInfo.
We achieve this by adding the `ml_{capable,enabled}` fields in
`system_info`. When streaming, these fields allow a parent to understand if
the child has ML and if it runs ML for itself.
The UpdateNodeInfo includes this information about a child, plus a
boolean that is set to true when the parent runs ML for the child.
* Fix unit test and building with --disable-ml.
* Refactoring to use the new MachineLearningInfo message
* Update aclk-schemas repository to include latest ML info message.
* stream chart labels
* update stream protocol to 4
* only send CLABEL_COMMIT when there are labels
* mark host as UNUSED
* log error for stray CLABEL_COMMIT
* remove commented define
Currently, we add the repository's top-level dir in the compiler's
header search path. This means that code in every top-level directory
within the repo can include headers from sibling top-level directories.
This patch makes header inclusion consistent when it comes to files
that are included from sibling top-level directories within the repo.
* Fix race condition between orphan host cleanup and new streaming connections.
* Remove health enabling from log replay, it will be handled at streaming connection time.
* Hard code a node for non-legacy multidb test
Skip dbengine initialization for new incoming children
Add code to switch to multidb ctx when accessing the dbengine
* When a non-legacy streaming connection is detected, use the multidb metadata log context
* Clear the superblock memory to avoid random data written in the metadata log
* Activate the host detection during compaction
Activate the host detection during metadata log chart updates
Keep the host in the user object during replay of the HOST command
* Add defaults for health / rrdpush on HOST metadata replay
Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()
Use defaults from the stream.conf
* Count hosts only if not archived
When host switches from archived to active update rrd_hosts_available
Remove archived hosts from charts and info
* Change parameter from "multidb disk space" to "dbengine multihost disk space"
Remove unused variables
Fix compilation error when dbengine is disabled
Fix condition for machine_guid directory creation under cache_dir
* Enable multidb disk space file creation.
* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.
* Fix old bug in the code that confused obsolete hosts with orphan hosts.
* Do not delete multi-host DB host files.
* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.
* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.
* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.
* Ignore archived hosts in streaming.
* Add documentation before merging to master.
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
The streaming component detects when a receiver stream has closed, and stops an attached sender on the same host. This is to support proxy configurations where the stream is passed through. During the shutdown sequence, once netdata_exit has been set no thread should touch any RRDHOST structure as the non-static threads are not joined before the database shuts down.
The destruction of the thread state has been separated from the cleanup and can be called from two points. If the thread can detach itself from the host (i.e. it is not during the shutdown sequence) then it does so and destroys the state. During shutdown the thread leaves the state intact so that it can be destroyed during the host destruction, and the host destruction now cancels the thread to ensure a consistent sequence of events.
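A minimal sketch of the two destruction points described above. All names here (`struct host`, `struct sender_state`, `shutting_down`, and the three functions) are hypothetical stand-ins rather than Netdata's actual structures, and the real thread cancellation is only indicated in a comment.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not Netdata's real structures. */
struct sender_state { int socket_fd; };
struct host { struct sender_state *sender; };

/* Plays the role of netdata_exit in this sketch. */
static bool shutting_down = false;

/* Destruction is separate from cleanup so it can run from two call sites. */
static void sender_state_destroy(struct host *h) {
    free(h->sender);
    h->sender = NULL;
}

/* Call site 1: the sender thread's own cleanup path. */
static void sender_thread_cleanup(struct host *h) {
    if (shutting_down)
        return;              /* leave the state intact for host_destroy() */
    sender_state_destroy(h); /* normal case: detach from the host and free */
}

/* Call site 2: host destruction during shutdown. */
static void host_destroy(struct host *h) {
    /* A real implementation would cancel and join the sender thread here,
     * so that nothing else can touch the state while it is freed. */
    if (h->sender)
        sender_state_destroy(h);
}
```

Outside of shutdown the thread frees its own state; during shutdown the state survives until `host_destroy()` runs, which gives a single, predictable order of events.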
This PR adds (inactive) support that we will use to fill the gaps in charts when a receiving agent goes offline and the sender reconnects. The streaming component has been reworked to make the connection bi-directional and fix several outstanding bugs in the area.
* Fixed an incorrect case of version negotiation. Removed fatal() on exhaustion of fds.
* Fixed cases that fell through to polling the socket after closing.
* Fixed locking of data related to sender and receiver in the host structure.
* Added fine-grained locks to reduce contention.
* Added circular buffer to sender to prevent starvation in high-latency conditions.
* Fixed case where agent is a proxy and negotiated different streaming versions with sender and receiver.
* Changed interface to new parser to put the buffering code in streaming.
* Fixed the bug that stopped senders from reconnecting after their socket times out - this was part of the scaling fixes that provide an early shortcut path for rejecting connections without lock contention.
* Uses fine-grained locking and a different approach to thread shutdown instead.
* Added liveness detection to connections to allow selection of the best connection.
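As an illustration of the sender-side circular buffer mentioned in the list above, here is a minimal fixed-capacity byte ring. It is a sketch under assumptions: the names `struct cbuf`, `cbuf_add`, `cbuf_take` and the tiny `CBUF_CAPACITY` are hypothetical, and the actual buffer in the streaming code may differ (for example by growing on demand or by taking a lock).

```c
#include <stddef.h>
#include <string.h>

#define CBUF_CAPACITY 8 /* deliberately tiny so wrap-around is easy to see */

struct cbuf {
    char   data[CBUF_CAPACITY];
    size_t read; /* index of the oldest unread byte */
    size_t used; /* number of bytes currently stored */
};

/* Append len bytes. Returns 0 on success, or -1 (writing nothing)
 * when there is not enough free space. */
static int cbuf_add(struct cbuf *b, const char *src, size_t len) {
    if (len > CBUF_CAPACITY - b->used)
        return -1;
    size_t write = (b->read + b->used) % CBUF_CAPACITY;
    size_t first = CBUF_CAPACITY - write; /* room before the end of the array */
    if (first > len)
        first = len;
    memcpy(b->data + write, src, first);
    memcpy(b->data, src + first, len - first); /* wrapped remainder, may be 0 */
    b->used += len;
    return 0;
}

/* Remove up to len bytes into dst. Returns the number of bytes copied. */
static size_t cbuf_take(struct cbuf *b, char *dst, size_t len) {
    if (len > b->used)
        len = b->used;
    size_t first = CBUF_CAPACITY - b->read;
    if (first > len)
        first = len;
    memcpy(dst, b->data + b->read, first);
    memcpy(dst + first, b->data, len - first);
    b->read = (b->read + len) % CBUF_CAPACITY;
    b->used -= len;
    return len;
}
```

The collector side can keep calling `cbuf_add()` while the socket is slow, and the sending side drains with `cbuf_take()` whenever the socket becomes writable, so a high-latency link does not stall the producer until the buffer is actually full.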
* tls13: This commit brings TLS 1.3 to Netdata
* tls13: Update variables on slave side
* tls13: Fix compilation error for old libraries
* tls13: Fix compilation error for old libraries 2
* tls13 remove ciphers
* tls13: TLS versions
This commit adds the missing TLS versions accepted by Netdata
and also updates the documentation for these versions
* tls13: Remove duplication
This commit removes wrongly duplicated code
* tls13: Documentation
This commit brings fix for the documentation
* tls13: Remove magic number
This commit removes the magic number to allow the code to be readable
* tls13: TLS version
Small adjust with TLS version
* tls13: Security Init
This commit removes the array from the function and replaces the magic number
with a string
* tls13: Remove new variable name from stream
* tls13: OpenSSL versions and old key name
This commit removes the new key names and also updates the names
used to define the OpenSSL version
* Disallow multiple streaming connections to the same master agent
* Reject multiple streaming connections quickly without blocking
* Increase timeout for systemd service shutdown to give time to flush the db.
* Optimize page correlation ID to use atomic counter instead of locks
* Reduce contention in global configuration mutex
* Optimize complexity of inserting configuration sections from O(N) to O(1)
* Reduce overhead of clock_gettime() by utilizing CLOCK_MONOTONIC_COARSE when applicable.
* Fix unit test compile errors
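Two of the optimizations listed above can be sketched in a few lines of C11. This is an illustration under assumptions, not the code from the commit: `next_correlation_id` and `now_monotonic_coarse_usec` are hypothetical names, and the fallback to CLOCK_MONOTONIC is an assumption for platforms that lack the Linux-specific coarse clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Statically allocated, so it starts at zero. */
static atomic_uint_fast64_t correlation_id_counter;

/* Lock-free replacement for "lock, increment, unlock":
 * atomic_fetch_add returns the previous value, so add 1 to get the new ID. */
static uint64_t next_correlation_id(void) {
    return (uint64_t)atomic_fetch_add(&correlation_id_counter, 1) + 1;
}

/* Cheaper, lower-resolution monotonic time in microseconds.
 * Returns 0 if the clock cannot be read. */
static uint64_t now_monotonic_coarse_usec(void) {
#ifdef CLOCK_MONOTONIC_COARSE
    clockid_t clk = CLOCK_MONOTONIC_COARSE; /* Linux-specific */
#else
    clockid_t clk = CLOCK_MONOTONIC;        /* portable fallback */
#endif
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}
```

The coarse clock trades resolution (typically on the order of a scheduler tick) for a cheaper read, so it only fits callers that do not need microsecond precision.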
* stream_encode: Encode slave
This commit encodes the messages before sending them from master to slave
* stream_encode: Remove comma
This commit changes the comma to a semicolon to keep the code consistent
* stream_container:
Bring the missing container variables to stream
* stream_container: Missing variables
This commit brings 4 new variables that were missing from the stream
* stream_forward: Fix protocol
This commit brings the necessary fixes to the protocol
* stream_forward: Fix old slave support
This commit fixes the communication with old versions of Netdata
* stream_forward: Remove declaration
There was a wrong declaration inside a block, so I am removing it
* stream_forward: Use version
This commit brings the use of a version instead of flags to the stream
* stream_forward: Remove variable
This commit removes a useless variable from the handshake
* stream_forward: Change message
Change the message setting the protocol version on it
* stream_forward: Fix version number
* stream_forward: readable definition
The definition and the variables were using the same data type, but with different declarations;
this commit fixes this.
* stream_forward: Set master version inside message
This commit updates the message that reports a successful connection with master
* stream_forward: Fix wrong version
This commit fixes the stream version being set multiple times
* stream_forward: Reorganize code
This commit reorganizes code to speed up the processing
* stream_forward: Adjust code
This commit removes an unnecessary else
* stream_forward: Brings old structure
This commit restores a previous structure that the code needs
* stream_forward: fix error report
This commit fixes the error report that was happening when the stream version does not match
* stream_forward: Fixes msg and remove unnecessary call
Improve the metadata detection for containers. The system_info structure has been updated to hold separate copies of OS_NAME, OS_ID, OS_ID_LIKE, OS_VERSION, OS_VERSION_ID and OS_DETECTION for both the container environment and the host. This new information is communicated through the /api/v1/info endpoint. For the streaming interface a partial copy of the info is carried until the stream protocol is upgraded. The anonymous_statistics script has been updated to carry the new data to Google Analytics. Some minor improvements have been made to OS-X / FreeBSD detection, and the detection of virtualization. The docs have been updated to explain how to pass the host environment to the docker container running Netdata.
Initial work on host labels from the dedicated branch. Includes work for issues #7096, #7400, #7411, #7369, #7410, #7458, #7459, #7412 and #7408 by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
When a slave had SSL activated for both streaming and local access, the addresses were overwritten;
this PR fixes that problem, which prevented streaming from working fully
* sslcertificate: Trust certificate
Netdata could not accept an invalid certificate or a certificate with an invalid chain;
this commit fixes this!
* sslcertificate: Changing name
We are bringing in the same names used by the OpenSSL library to make the parameters easier to understand
* sslcertificate: Name changes and explicit directory
This commit fixes the problem with streams and correctly renames the files in the option; it also uses stat to check whether a file exists
* sslcertificate: Documentation
Fix grammar for the newest section in the documentation
* sslcertificate: Rename variables
The old variables did not represent well what they were doing, so they were renamed
* sslstream: ACL parser
It was noticed in issue 6457 that some ACLs were not parsed
correctly when they appeared alongside the SSL ACL; this commit fixes this
* sslstream: remove comments
This commit removes the comments that were present while I was testing the code
* sslstream: Tests
This commit adds ACL tests to check the Netdata response to them
* sslstream: Tests
Fix the extension to upload the files
* sslstream: more tests
In this commit I am bringing more tests, including the SSL tests
* sslstream: leading space
Remove leading space from variable that was creating problem with shellcheck
* sslstream: glob
Remove special character from script
* sslstream: Makefile
The Makefile directives were pointing to the wrong files
* sslstream: Missing stream encrypt
This commit solves the problem of the stream not being encrypted, but
it is not the final solution, because the parser is still incomplete.
* sslstream: Finish encrypt channel
This commit brings the step that I was missing: complete encryption
of the communication between master and slave
* sslstream: Fix argument in script
After the latest tests, it was verified that two arguments given to a function
inside the script were not correct, with this PR I am fixing this!
* sslstream: Fix argument in info
Instead of calling a function that returns an integer, I was passing a size_t value.
Only cmake showed this, but not my CLion! :/
* sslstream: Fix redirect
When we had different SSL configurations, the system was not applying
the option to all of them
* sslstream: Update documentation
Our documentation was not clear about the rules implemented in our code,
so I am updating the text to explain them to users
* sslstream: Adjust script
With this last commit, I am adjusting the tests to avoid false positives
* sslstream: Missing elif
The previous commit had a missing elif in the shell script
* sslstream: Split ports
Before this commit Netdata had SSL as a global option; now it is a real ACL.
* sslstream: reduce context
The stream variable is not affected on the master side; it is only necessary
on the slave side, so I am reducing its scope
* sslstream: Force SSL
When the user has a certificate and does not set any SSL flag, it is necessary
to append the SSL=force flag
* sslstream: Default flag
It is necessary to have a default flag when the SSL flags are not set
* sslstream: remove comments
Remove comments from the script
* sslstream: moving flag
It is better for the flag to be set inside the socket instead of every time there is a new connection
* sslstream: documentation
Fix a sentence in the web/server/README.md