* rrdset - in progress
* rrdset optimal constructor; rrdset conflict
* rrdset final touches
* re-organization of rrdset object members
* prevent use-after-free
* dictionary dfe supports also counting of iterations
* rrddim managed by dictionary
* rrd.h cleanup
* DICTIONARY_ITEM now is referencing actual dictionary items in the code
* removed rrdset linked list
* Revert "removed rrdset linked list"
This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.
* removed rrdset linked list
* added comments
* Switch chart uuid to static allocation in rrdset
Remove unused functions
* rrdset_archive() and friends...
* always create rrdfamily
* enable ml_free_dimension
* rrddim_foreach done with dfe
* most custom rrddim loops replaced with rrddim_foreach
* removed accesses to rrddim->dimensions
* removed locks that are no longer needed
* rrdsetvar is now managed by the dictionary
* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853
* conflict callback of rrdsetvar now properly checks if it has to reset the variable
* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM
* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe
* dictionary walkthrough callbacks get dictionary acquired items
* dictionary reference counters that can be dupped from zero
* added advanced functions for get and del
* rrdvar managed by dictionaries
* thread safety for rrdsetvar
* faster rrdvar initialization
* rrdvar string lengths should match in all add, del, get functions
* rrdvar internals hidden from the rest of the world
* rrdvar is now acquired throughout netdata
* hide the internal structures of rrdsetvar
* rrdsetvar is now acquired through out netdata
* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata
* better error handling
* dont create variables if not initialized for health
* dont create variables if not initialized for health again
* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items
* type checking on acquired objects
* rrdcalc renaming of functions
* type checking for rrdfamily_acquired
* rrdcalc managed by dictionaries
* rrdcalc double free fix
* host rrdvars is always needed
* attempt to fix deadlock 1
* attempt to fix deadlock 2
* Remove unused variable
* attempt to fix deadlock 3
* snprintfz
* rrdcalc index in rrdset fix
* Stop storing active charts and computing chart hashes
* Remove store active chart function
* Remove compute chart hash function
* Remove sql_store_chart_hash function
* Remove store_active_dimension function
* dictionary delayed destruction
* formatting and cleanup
* zero dictionary base on rrdsetvar
* added internal error to log delayed destructions of dictionaries
* typo in rrddimvar
* added debugging info to dictionary
* debug info
* fix for rrdcalc keys being empty
* remove forgotten unlock
* remove deadlock
* Switch to metadata version 5 and drop
chart_hash
chart_hash_map
chart_active
dimension_active
v_chart_hash
* SQL cosmetic changes
* do not busy wait while destroying a referenced dictionary
* remove deadlock
* code cleanup; re-organization;
* fast cleanup and flushing of dictionaries
* number formatting fixes
* do not delete configured alerts when archiving a chart
* rrddim obsolete linked list management outside dictionaries
* removed duplicate contexts call
* fix crash when rrdfamily is not initialized
* dont keep rrddimvar referenced
* properly cleanup rrdvar
* removed some locks
* Do not attempt to cleanup chart_hash / chart_hash_map
* rrdcalctemplate managed by dictionary
* register callbacks on the right dictionary
* removed some more locks
* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread
* when looking up for an alarm look using both chart id and chart name
* host initialization a bit more modular
* init rrdlabels on host update
* preparation for dictionary views
* improved comment
* unused variables without internal checks
* service threads isolation and worker info
* more worker info in service thread
* thread cancelability debugging with internal checks
* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647
* dictionary modularization
* Remove unused SQL statement definition
* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries now can detect if the caller is holds a write lock and automatically all the calls become their unsafe versions; all direct calls to unsafe version is eliminated
* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops
* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/proc.plugin/proc_net_dev.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* fix memory leak in rrdset cache_dir
* minor dictionary changes
* dont use index locks in single threaded
* obsolete dict option
* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;
* fix jump on uninitialized value in dictionary; remove double free of cache_dir
* addressed codacy findings
* removed debugging code
* use the private refcount on dictionaries
* make dictionary item desctructors work on dictionary destruction; strictier control on dictionary API; proper cleanup sequence on rrddim;
* more dictionary statistics
* global statistics about dictionary operations, memory, items, callbacks
* dictionary support for views - missing the public API
* removed warning about unused parameter
* chart and context name for cloud
* chart and context name for cloud, again
* dictionary statistics fixed; first implementation of dictionary views - not currently used
* only the master can globally delete an item
* context needs netdata prefix
* fix context and chart it of spins
* fix for host variables when health is not enabled
* run garbage collector on item insert too
* Fix info message; remove extra "using"
* update dict unittest for new placement of garbage collector
* we need RRDHOST->rrdvars for maintaining custom host variables
* Health initialization needs the host->host_uuid
* split STRING to its own files; no code changes other than that
* initialize health unconditionally
* unit tests do not pollute the global scope with their variables
* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* rrdfamily
* rrddim
* rrdset plugin and module names
* rrdset units
* rrdset type
* rrdset family
* rrdset title
* rrdset title more
* rrdset context
* rrdcalctemplate context and removal of context hash from rrdset
* strings statistics
* rrdset name
* rearranged members of rrdset
* eliminate rrdset name hash; rrdcalc chart converted to STRING
* rrdset id, eliminated rrdset hash
* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate
* rrdcalctemplate
* rrdvar
* eval_variable
* rrddimvar and rrdsetvar
* rrdhost hostname, os and tags
* fix master commits
* added thread cache; implemented string_dup without locks
* faster thread cache
* rrdset and rrddim now use dictionaries for indexing
* rrdhost now uses dictionary
* rrdfamily now uses DICTIONARY
* rrdvar using dictionary instead of AVL
* allocate the right size to rrdvar flag members
* rrdhost remaining char * members to STRING *
* better error handling on indexing
* strings now use a read/write lock to allow parallel searches to the index
* removed AVL support from dictionaries; implemented STRING with native Judy calls
* string releases should be negative
* only 31 bits are allowed for enum flags
* proper locking on strings
* string threading unittest and fixes
* fix lgtm finding
* fixed naming
* stream chart/dimension definitions at the beginning of a streaming session
* thread stack variable is undefined on thread cancel
* rrdcontext garbage collect per host on startup
* worker control in garbage collection
* relaxed deletion of rrdmetrics
* type checking on dictfe
* netdata chart to monitor rrdcontext triggers
* Group chart label updates
* rrdcontext better handling of collected rrdsets
* rrdpush incremental transmition of definitions should use as much buffer as possible
* require 1MB per chart
* empty the sender buffer before enabling metrics streaming
* fill up to 50% of buffer
* reset signaling metrics sending
* use the shared variable for status
* use separate host flag for enabling streaming of metrics
* make sure the flag is clear
* add logging for streaming
* add logging for streaming on buffer overflow
* circular_buffer proper sizing
* removed obsolete logs
* do not execute worker jobs if not necessary
* better messages about compression disabling
* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped
* monitor stream sender used buffer ratio
* Update exporting unit tests
* no need to compare label value with strcmp
* streaming send workers now monitor bandwidth
* workers now use strings
* streaming receiver monitors incoming bandwidth
* parser shift of worker ids
* minor fixes
* Group chart label updates
* Populate context with dimensions that have data
* Fix chart id
* better shift of parser worker ids
* fix for streaming compression
* properly count received bytes
* ensure LZ4 compression ring buffer does not wrap prematurely
* do not stream empty charts; do not process empty instances in rrdcontext
* need_to_send_chart_definition() does not need an rrdset lock any more
* rrdcontext objects are collected, after data have been written to the db
* better logging of RRDCONTEXT transitions
* always set all variables needed by the worker utilization charts
* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes
* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock
* STRING code re-organization for clarity
* thread_cache improvements; double numbers precision on worker threads
* STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS
* rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts
* Add index to speed up database context searches
* Removed last_updated optimization (was also buggy after latest merge with master)
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* netdata doubles
* fix cmocka test
* fix cmocka test again
* fix left-overs of long double to NETDATA_DOUBLE
* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf
* update the memory before saving
* rrdset is now detached from file structures too
* on memory mode map, update the memory mapped structures on every iteration
* allow RRD_ID_LENGTH_MAX to be changed
* granularity secs, back to update every
* fix formatting
* more formatting
* squashed and rebased to master
* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h
* added unittest for UTF-8 multibyte sanitization
* Fix unit test compilation
* Fix CMake build
* remove double sanitizer for opentsdb; cleanup sanitize_json_string()
* rename error_description to error_message to avoid conflict with json-c
* revert last and undef error_description from json-c
* more unittests; attempt to fix protobuf map issue
* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer
* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted
* better sorted dictionary checking
* proper unittesting for sorted dictionaries
* call dictionary deletion callback when destroying the dictionary
* remove obsolete variable
* Fix exporting unit tests
* Fix k8s label parsing test
* workaround for cmocka and strdupz()
* Bypass cmocka memory allocation check
* Revert "Bypass cmocka memory allocation check"
This reverts commit 4c49923839.
* Revert "workaround for cmocka and strdupz()"
This reverts commit 7bebee0480.
* Bypass cmocka memory allocation checks
* respect json formatting for chart labels
* cloud sends colons
* print the value only once
* allow parenthesis in values and spaces; make stream sender send quotes for values
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Rebased
* use sql health log if it exists
* store alert config in sqlite
* move unlock before loop
* fix warnings
* remove hash message
* check return from counting health log
* remove check of hostname when reading log
* try to create the health log table to catch accidental removals of it
* fix warnings, cast values, report config_hash_id
* use snprintfz, add info logging
* remove unnecessary strdup and free
* check if stored config hash is null
* return if prepare statement fails
* replace with static variables
* remove replace info, free edit_command
* remove setting cfg entries to NULL
* change uuid_copy
* check return of uuid_parse, and exit if its not valid
* also free cfg
* use address
* removed health_alarm_entry_sql2json and sql_health_alarm_log_select_all
* remove check for is_valid_alarm_id
* replace lengths with GUID_LEN
* use uuid_unparse_lower_fix
* removed web api endopoint to get alert config
* check for non null values for name, chart and family
* include a date_updated field in alert_hash
* for config hash, digest NULL string if value to digest is null
* Use empty string instead of null
* read and store new attributes (class, component, type) from health conf files. Replace family variable in info strings
* provide the attributes to jsons
* remove extra semicolon
* populate conf files with new attributes
* added newline
* remove extra defines from health.h
* remove empty line
* remove realloc
* use helper variables for find_and_replace. Adjust position for next strstr
* remove comments
* Add type to mysql.conf and vcsa.conf
* fix formatting
* add parenthesis
* remove extra assignment
* changes to mysql_galera_cluster_state from master
* add type Errors to unbound_request_list_overwritten
* fix identation for info strings spawning more than one line
* check for null, replace with empty string if true
* add class, component, type to systemdunits.conf
Before:
```
struct foobar {
avl avl;
...
}
```
After:
```
struct foobar {
avl_t avl;
...
};
```
Which makes figuring out the type from field name easier.
Initial work on host labels from the dedicated branch. Includes work for issues #7096, #7400, #7411, #7369, #7410, #7458, #7459, #7412 and #7408 by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
* health_connection: Comments inside Health Config
To try to understand better what is necessary to change and where it is necessary
to change anything inside the health, I commented the functions inside this file"
"
* health_connection: Comments about Health in other files
This commit brings the rest of the comments that were missed for health"
* health_connection: Comments on health_log
I had to append more comments on health_log
* health_connection: Create a new variable
New variable is created to work with foreach
* health_connection: Fix new option and doc
The first implementation of the 'foreach' had a problem, this fixes the error.
This commit also brings the updates for the documentation
* health_connection: Understanding health
This commit is to save the place that I am working, it has the map to understand all the alam process
* health_connection: Update map
I changed the position of the error message to identify the correct place to add new alarms
* health_connection: End of simple alarm
This commit finishes what is necessary to bring the same lookup for different dimensions in one unique line
* health_connection: Documentation and template steps
This commit brings the documentation missed for template and comments to help in the next
step of apply a template to create an alarm.
* health_connection: Restoring
After some tests, it was detected that the alarms were not working as expected
* health_connection: Fix bug and bring dimension to template
This commit brings a fix for an old Netdata bug, before this the Netdata always tried to create
a new entry in an index with the same id raising an error.
It also brings the possibility to use 'foreach' in template
* health_connection: Fix cmake compilation
There was a problem with cmake compilation fixed by this commit
* health_connection: shell script
Finilize the shell script to test the PR
* health_connection: Remove debug message
During the development, I used some messages to understand the code
this commit removes the last message
* health_connection: Fix bugs
This commits fix bugs reported by tests
* health_connection: Alarm working
This commit brings the necessary change for the alarms work, but it is missing the unlink from the newest list
* health_connection: Template code written
This commit finishes the creation of alarm from template, but it was not tested yet.
* health_connection: Remove comments
I am removing the comments from this PR to bring back late
* health_connection: Remove lines
Another commit to restore the files before they to be commented
* health_connection: New alarm and remove messages
I am bringing a new alarm to test template with SP and removing comments used during the development
* health_connection: Functional test review
After to review the functional test script, it was necessary to small adjust to
test all the features available with the new version
* health_connection: Free structure
I am moving the free list for the correct place, the previous place was not safe
* health_connection: ShellCheck
This commit fixes the problems with shellcheck
* health_connection: FIx hash
This commit fix the hash calculation that was using wrong input
* health_connection: Fix message error
The system was showing a wronge message, because when we have foreach
the alarm created with templated is added in a second stage to the index
* health_connection: Fix documentation
In this commit I am fixing the grammar of the previous doc and bringing
two examples
* health_connection: Fix examples
This commit fix the last two examples that was brought in this PR
* health_connection: Fix example doc
When I brought the correct grammar in the last commit, I lost a mark
* health_connection: Grammar fix
Fixing grammar of the documentation
* health_connection: Memory leak
This commit fixes the memory leak that was present in the PR
* health_connection: Reload
This commit fix the problem that the alarms were not linked after
to receive a SIGUSR2
* health_connection: False Positive from codacy
Codacy was given a false positive, I changed the function to avoid it.
* health_connection: dead code
Remove dead code from the code.
* health_connection: Memory Leak
Remove memory leak when clean simple pattern
* health_connection: Script format
With this commit I am formatting the last message to return
for the default color on terminal
* health_connection: Script format 2
With this commit I am formatting the last message to return
for the default color on terminal
* health_connection: Script format 3
With this commit I am formatting the error message to return
for the default color on terminal
* alarm_clear: Mapping
In this PR I mapped all the necessary steps to discover the solution for the ISSUE 6581
* alarm_clear: Documentation and fixes
This commit fixes the problem that were present in Netdata and it also updates
the documentation of the functions and Netdata.
* alarm_clear: shell script
The original implementation did not have a shell script, here I begin to fix this
* alarm_clear: shell script
It is necessay to verify why make is not producing the same binary than cmake and finish the changes in the script
* alarm_clear: adjust in health.c
I rewrote the health.c to be more readable, but I discovered the problem I had in the last few hours
were due kernel update
* alarm_clear: script changes
In this commit I am bringing the final version of the script that
test the alarm repetition
* alarm_clear: script fix and remove comments
IN this commit I am fixing the shellcheck errors and removing some debug messages
that were present in the code while I was developing
* alarm_clear: Format
The health.c had wrong tabulation, this PR brings back the pattern of space as tab for this file
* alarm_clear: Script
The script was using killlall that is not more present in all Linux distribution
this commit removes this and bring the new way to stop Netdata
* alarm_clear: return to previous tabulation
I am bringing back the old tabulation here and I will create a new PR
exclusively for this
* alarm_clear: Remove comments
I am removing comments from this PR to keep the focus in the major problem
* alarm_clear: Remove comments 2
I forgot one comment
* alarm_clear: New variable
I am appending a new variable in the check before the rebase, because the health.c changed in other file
has a direct relationship with what I did here until now
* alarm_clear: Fix clear repetition
With this last commit, I am bringing a new way to raise the clear alarm, but it is not repeating more
with this fix, it displayed one time when it is cleaned and it will display the message again, if and only if,
the alarm was raised.
* Alarm_repeat mergin the original!
* Alarm_repeat binary tree!
* Alarm_repeat binary tree finished!
* Alarm_repeat move function and format string
* Alarms bringing a new Binary tree
* Alarms fixing the last two
* Alarm_repeat useless var!
* Alarm fix format and repeat alarm!
* Alarm_backend steps!
* Alarm_repeat stopping to test cloud!
* Alarm_repeat stopping to test cloud 2!
* Alarm_repeat fixing when restart!
##### Summary
fixes#2673fixes#2149fixes#5017fixes#3830fixes#3187fixes#5154
Implements a command API for health which will accept commands via a socket to selectively suppress health checks.
Allows different ports to accept different request types (streaming, dashboard, api, registry, netdata.conf, badges, management)
Removes support for multi-threaded and single-threaded web servers.
##### Component Name
health, daemon
* modularized exporters
* modularized API data queries
* optimized queries
* modularized API data reduction methods
* modularized api queries
* added new directories in makefiles
* added median db query
* moved all RRDR_GROUPING related to query.h
* added stddev query
* operational median and stddev
* working simple exponential smoothing
* too complex to do it right
* fixed ses
* fixed ses
* rewrote query engine
* fix double-exponential-smoothing
* cleanup
* fixed bug identified by @vlvkobal at rrdset_first_slot()
* enable freeipmi on systems with libipmimonitoring; #4440