* cleanup of logging - wip
* first working iteration
* add errno annotator
* replace old logging functions with netdata_logger()
* cleanup
* update error_limit
* fix remanining error_limit references
* work on fatal()
* started working on structured logs
* full cleanup
* default logging to files; fix all plugins initialization
* fix formatting of numbers
* cleanup and reorg
* fix coverity issues
* cleanup obsolete code
* fix formatting of numbers
* fix log rotation
* fix for older systems
* add detection of systemd journal via stderr
* finished on access.log
* remove left-over transport
* do not add empty fields to the logs
* journal get compact uuids; X-Transaction-ID header is added in web responses
* allow compiling on systems without memfd sealing
* added libnetdata/uuid directory
* move datetime formatters to libnetdata
* add missing files
* link the makefiles in libnetdata
* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now read X-Transaction-ID and uses it for functions and web responses
* added stream receiver, sender, proc plugin and pluginsd log stack
* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup
* add message ids to streaming inbound and outbound connections
* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled
* send SIGABRT to external plugins only if we are not shutting down
* fix cross cleanup in pluginsd parser
* fatal when there is a stack error in logs
* compile netdata with -fexceptions
* do not kill external plugins with SIGABRT
* metasync info logs to debug level
* added severity to logs
* added json output; added options per log output; added documentation; fixed issues mentioned
* allow memfd only on linux
* moved journal low level functions to journal.c/h
* move health logs to daemon.log with proper priorities
* fixed a couple of bugs; health log in journal
* updated docs
* systemd-cat-native command to push structured logs to journal from the command line
* fix makefiles
* restored NETDATA_LOG_SEVERITY_LEVEL
* fix makefiles
* systemd-cat-native can also work as the logger of Netdata scripts
* do not require a socket to systemd-journal to log-as-netdata
* alarm notify logs in native format
* properly compare log ids
* fatals log alerts; alarm-notify.sh working
* fix overflow warning
* alarm-notify.sh now logs the request (command line)
* anotate external plugins logs with the function cmd they run
* added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash
* reformatted alarm-notify logs
* unify cgroup-network-helper.sh
* added quotes around params
* charts.d.plugin switched logging to journal native
* quotes for logfmt
* unify the status codes of streaming receivers and senders
* alarm-notify: dont log anything, if there is nothing to do
* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notifications menthod are needed but are not available
* migrate cgroup-name.sh to new logging
* systemd-cat-native now supports messages with newlines
* socket.c logs use priority
* cleanup log field types
* inherit the systemd set INVOCATION_ID if found
* allow systemd-cat-native to send messages to a systemd-journal-remote URL
* log2journal command that can convert structured logs to journal export format
* various fixes and documentation of log2journal
* updated log2journal docs
* updated log2journal docs
* updated documentation of fields
* allow compiling without libcurl
* do not use socket as format string
* added version information to newly added tools
* updated documentation and help messages
* fix the namespace socket path
* print errno with error
* do not timeout
* updated docs
* updated docs
* updated docs
* log2journal updated docs and params
* when talking to a remote journal, systemd-cat-native batches the messages
* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote
* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"
This reverts commit b079d53c11.
* note about uncompressed traffic
* log2journal: code reorg and cleanup to make modular
* finished rewriting log2journal
* more comments
* rewriting rules support
* increased limits
* updated docs
* updated docs
* fix old log call
* use journal only when stderr is connected to journal
* update netdata.spec for libcurl, libpcre2 and log2journal
* pcre2-devel
* do not require pcre2 in centos < 8, amazonlinux < 2023, open suse
* log2journal only on systems pcre2 is available
* ignore log2journal in .gitignore
* avoid log2journal on centos 7, amazonlinux 2 and opensuse
* add pcre2-8 to static build
* undo last commit
* Bundle to static
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Add build deps for deb packages
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Add dependencies; build from source
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Test build for amazon linux and centos expect to fail for suse
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* fix minor oversight
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Reorg code
* Add the install from source (deps) as a TODO
* Not enable the build on suse ecosystem
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
---------
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
* Send node update info only if the host has finished replication
* Log number of hosts replicating / pending to load context
* Remove prefix (thread name is enough)
* Storage engine.
* Host indexes to rrdb
* Move globals to rrdb
* Move storage_tiers_backfill to rrdb
* default_rrd_update_every to rrdb
* default_rrd_history_entries to rrdb
* gap_when_lost_iterations_above to rrdb
* rrdset_free_obsolete_time_s to rrdb
* libuv_worker_threads to rrdb
* ieee754_doubles to rrdb
* rrdhost_free_orphan_time_s to rrdb
* rrd_rwlock to rrdb
* localhost to rrdb
* rm extern from func decls
* mv rrd macro under rrd.h
* default_rrdeng_page_cache_mb to rrdb
* default_rrdeng_extent_cache_mb to rrdb
* db_engine_journal_check to rrdb
* default_rrdeng_disk_quota_mb to rrdb
* default_multidb_disk_quota_mb to rrdb
* multidb_ctx to rrdb
* page_type_size to rrdb
* tier_page_size to rrdb
* No storage_engine_id in rrdim functions
* storage_engine_id is provided by st
* Update to fix merge conflict.
* Update field name
* Remove unnecessary macros from rrd.h
* Rm unused type decls
* Rm duplicate func decls
* make internal function static
* Make the rest of public dbengine funcs accept a storage_instance.
* No more rrdengine_instance :)
* rm rrdset_debug from rrd.h
* Use rrdb to access globals in ML and ACLK
Missed due to not having the submodules in the
worktree.
* rm total_number
* rm RRDVAR_TYPE_TOTAL
* rm unused inline
* Rm names from typedef'd enums
* rm unused header include
* Move include
* Rm unused header include
* s/rrdhost_find_or_create/rrdhost_get_or_create/g
* s/find_host_by_node_id/rrdhost_find_by_node_id/
Also, remove duplicate definition in rrdcontext.c
* rm macro used only once
* rm macro used only once
* Reduce rrd.h api by moving funcs into a collector specific utils header
* Remove unused func
* Move parser specific function out of rrd.h
* return storage_number instead of void pointer
* move code related to rrd initialization out of rrdhost.c
* Remove tier_grouping from rrdim_tier
Saves 8 * storage_tiers bytes per dimension.
* Fix rebase
* s/rrd_update_every/update_every/
* Mark functions as static and constify args
* Add license notes and file to build systems.
* Remove remaining non-log/config mentions of memory mode
* Move rrdlabels api to separate file.
Also, move localhost functions that loads
labels outside of database/ and into daemon/
* Remove function decl in rrd.h
* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir
* Do not expose internal function from rrd.h
* Rm NETDATA_RRD_INTERNALS
Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.
* Add license note
* Include libnetdata.h instead of aral.h
* Use rrdb to access localhost
* Fix builds without dbengine
* Add header to build system files
* Add rrdlabels.h to build systems
* Move func def from rrd.h to rrdhost.c
* Fix macos build
* Rm non-existing function
* Rebase master
* Define buffer length macro in ad_charts.
* Fix FreeBSD builds.
* Mark functions static
* Rm func decls without definitions
* Rebase master
* Rebase master
* Properly initialize value of storage tiers.
* Fix build after rebase.
* configure extent cache size
* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads
* fix dispatched and executing counters
* boost to the max
* increase libuv worker threads
* query prep always get more prio than extent reads; stop processing in batch when dbengine is queue is critical
* fix accounting of query prep
* inlining of time-grouping functions, to speed up queries with billions of points
* make switching based on a local const variable
* print one pending contexts loading message per iteration
* inlined store engine query API
* inlined storage engine data collection api
* inlined all storage engine query ops
* eliminate and inline data collection ops
* simplified query group-by
* more error handling
* optimized partial trimming of group-by queries
* preparative work to support multiple passes of group-by
* more preparative work to support multiple passes of group-by (accepts multiple group-by params)
* unified query timings
* unified query timings - weights endpoint
* query target is no longer a static thread variable - there is a list of cached query targets, each of which of freed every 1000 queries
* fix query memory accounting
* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name
* limit max ACLK WEB response size to 30MB
* the response type should be text/plain
* more preparative work for multiple group-by passes
* create functions for generating group by keys, ids and names
* multiple group-by passes are now supported
* parse group-by options array also with an index
* implemented percentage-of-instance group by function
* family is now merged in multi-node contexts
* prevent uninitialized use
* Schedule node info to the cloud after child connection
* Remove debug code
* Schedule localhost node info within 5 seconds of startup. If no children are detected Or a child connects (switch to immediate localhost node info update)
* Remove aclk sync threads
* Disable functions if compiled with --disable-cloud
* Allocate and reuse buffer when scanning hosts
Tune transactions when writing metadata
Error checking when executing db_execute (it is already within a loop with retries)
* Schedule host context load in parallel
Child connection will be delayed if context load is not complete
Event loop cleanup
* Delay retention check if context is not loaded
Remove context load check from regular metadata host scan
* Improve checks to check finished threads
* Cleanup warnings when compiling with --disable-cloud
* Clean chart labels that were created before our current maximum retention
* Fix sql statement
* Remove structures members that of no use
Remove buffer allocations when not needed
* Fix compilation error
* Don't check for service running when not from a worker
* Code cleanup if agent is compiled with --disable-cloud
Setup ACLK tables in the database if needed
Submit node status update messages to the cloud
* Fix compilation warning when --disable-cloud is specified
* Address codacy issues
* Remove empty file -- has already been moved under contexts
* Use enum instead of numbers
* Use UUID_STR_LEN
* Add newline at the end of file
* Release node_id to prevent memory leak under certain cases
* Add queries in defines
* Ignore rc from transaction start -- if there is an active transaction, we will use it (same with commit) should further improve in a future PR
* Remove commented out code
* If host is null (it should not be) do not allocate config (coverity reports Resource leak)
* Do garbage collection when contexts is initialized
* Handle the case when config is not yet available for a host
* Schedule direct metadata update on host creation
Virtual hosts do not have a receiver but they are not orphan
Schedule node info update on host activation
New function to store host info and host_system_info
If the host is just created, create tables and sync thread
If the host exists during startup it is not live but reschedule node update if it is reactivated
* New opcode to send current node state
* Remove debug messages
* Fix system host info
* first work on standardizing json formatting
* renamed old grouping to time_grouping and added group_by
* add dummy functions to enable compilation
* buffer json api work
* jsonwrap opening with buffer_json_X() functions
* cleanup
* storage for quotes
* optimize buffer printing for both numbers and strings
* removed ; from define
* contexts json generation using the new json functions
* fix buffer overflow at unit test
* weights endpoint using new json api
* fixes to weights endpoint
* check buffer overflow on all buffer functions
* do synchronous queries for weights
* buffer_flush() now resets json state too
* content type typedef
* print double values that are above the max 64-bit value
* str2ndd() can now parse values above UINT64_MAX
* faster number parsing by avoiding double calculations as much as possible
* faster number parsing
* faster hex parsing
* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers
* full printing and parsing without using library functions - and related unit tests
* added IEEE754 streaming capability to enable streaming of double values in hex
* streaming and replication to transfer all values in hex
* use our own str2ndd for set2
* remove subnormal check from ieee
* base64 encoding for numbers, instead of hex
* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision
* str2ndd_encoded() parses all encoding formats, including integers
* prevent uninitialized use
* /api/v1/info using the new json API
* Fix error when compiling with --disable-ml
* Remove redundant 'buffer_unittest' declaration
* Fix formatting
* Fix formatting
* Fix formatting
* fix buffer unit test
* apps.plugin using the new JSON API
* make sure the metrics registry does not accept negative timestamps
* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache
* Fix more formatting
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* function renames and code cleanup in popen.c; no actual code changes
* netdata popen() now opens both child process stdin and stdout and returns FILE * for both
* pass both input and output to parser structures
* updated rrdset to call custom functions
* RRDSET FUNCTION leading calls for both sync and async operation
* put RRDSET functions to a separate file
* added format and timeout at function definition
* support for synchronous (internal plugins) and asynchronous (external plugins and children) functions
* /api/v1/function endpoint
* functions are now attached to the host and there is a dictionary view per chart
* functions implemented at plugins.d
* remove the defer until keyword hook from plugins.d when it is done
* stream sender implementation of functions
* sanitization of all functions so that certain characters are only allowed
* strictier sanitization
* common max size
* 1st working plugins.d example
* always init inflight dictionary
* properly destroy dictionaries to avoid parallel insertion of items
* add more debugging on disconnection reasons
* add more debugging on disconnection reasons again
* streaming receiver respects newlines
* dont use the same fp for both streaming receive and send
* dont free dbengine memory with internal checks
* make sender proceed in the buffer
* added timing info and garbage collection at plugins.d
* added info about routing nodes
* added info about routing nodes with delay
* added more info about delays
* added more info about delays again
* signal sending thread to wake up
* streaming version labeling and commented code to support capabilities
* added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info
* redirect top output to stdout
* address coverity findings
* fix resource leaks of popen
* log attempts to connect to individual destinations
* better messages
* properly parse destinations
* try to find a function from the most matching to the least matching
* log added streaming destinations
* rotate destinations bypassing a node in the middle that does not accept our connection
* break the loops properly
* use typedef to define callbacks
* capabilities negotiation during streaming
* functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities
* restore functionality to lookup functions
* better logging of capabilities
* remove old versions from capabilities when a newer version is there
* fix formatting
* optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time
* delayed health initialization for rrddim and rrdset
* cleanup health initialization
* fix for popen() not returning the right value
* add health worker jobs for initializing rrdset and rrddim
* added content type support for functions; apps.plugin permanent function to display all the processes
* fixes for functions parameters parsing in apps.plugin
* fix for process matching in apps.plugiin
* first working function for apps.plugin
* Dashboard ACL is disabled for functions; Function errors are all in JSON format
* apps.plugin function processes returns json table
* use json_escape_string() to escape message
* fix formatting
* apps.plugin exposes all its metrics to function processes
* fix json formatting when filtering out some rows
* reopen the internal pipe of rrdpush in case of errors
* misplaced statement
* do not use buffer->len
* support for GLOBAL functions (functions that are not linked to a chart
* added /api/v1/functions endpoint; removed format from the FUNCTIONS api;
* swagger documentation about the new api end points
* added plugins.d documentation about functions
* never re-close a file
* remove uncessesary ifdef
* fixed issues identified by codacy
* fix for null label value
* make edit-config copy-and-paste friendly
* Revert "make edit-config copy-and-paste friendly"
This reverts commit 54500c0e0a.
* reworked sender handshake to fix coverity findings
* timeout is zero, for both send_timeout() and recv_timeout()
* properly detect that parent closed the socket
* support caching of function responses; limit function response to 10MB; added protection from malformed function responses
* disabled excessive logging
* added units to apps.plugin function processes and normalized all values to be human readable
* shorter field names
* fixed issues reported
* fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses
* use double linked list macros for double linked list management
* faster apps.plugin function printing by minimizing file operations
* added memory percentage
* fix compatibility issues with older compilers and FreeBSD
* rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables;
* fix letftover variable in ifdef
* apps.plugin: do not call detach from the thread; exit immediately when input is broken
* exclude AR charts from health
* flush cleaner; prefer sender output
* clarity
* do not fill the cbuffer if not connected
* fix
* dont enabled host->sender if streaming is not enabled; send host label updates to parent;
* functions are only available through ACLK
* Prepared statement reports only in dev mode
* fix AR chart detection
* fix for streaming not being enabling itself
* more cleanup of sender and receiver structures
* moved read-only flags and configuration options to rrdhost->options
* fixed merge with master
* fix for incomplete rename
* prevent service thread from working on charts that are being collected
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* rrdset - in progress
* rrdset optimal constructor; rrdset conflict
* rrdset final touches
* re-organization of rrdset object members
* prevent use-after-free
* dictionary dfe supports also counting of iterations
* rrddim managed by dictionary
* rrd.h cleanup
* DICTIONARY_ITEM now is referencing actual dictionary items in the code
* removed rrdset linked list
* Revert "removed rrdset linked list"
This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.
* removed rrdset linked list
* added comments
* Switch chart uuid to static allocation in rrdset
Remove unused functions
* rrdset_archive() and friends...
* always create rrdfamily
* enable ml_free_dimension
* rrddim_foreach done with dfe
* most custom rrddim loops replaced with rrddim_foreach
* removed accesses to rrddim->dimensions
* removed locks that are no longer needed
* rrdsetvar is now managed by the dictionary
* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853
* conflict callback of rrdsetvar now properly checks if it has to reset the variable
* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM
* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe
* dictionary walkthrough callbacks get dictionary acquired items
* dictionary reference counters that can be dupped from zero
* added advanced functions for get and del
* rrdvar managed by dictionaries
* thread safety for rrdsetvar
* faster rrdvar initialization
* rrdvar string lengths should match in all add, del, get functions
* rrdvar internals hidden from the rest of the world
* rrdvar is now acquired throughout netdata
* hide the internal structures of rrdsetvar
* rrdsetvar is now acquired through out netdata
* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata
* better error handling
* dont create variables if not initialized for health
* dont create variables if not initialized for health again
* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items
* type checking on acquired objects
* rrdcalc renaming of functions
* type checking for rrdfamily_acquired
* rrdcalc managed by dictionaries
* rrdcalc double free fix
* host rrdvars is always needed
* attempt to fix deadlock 1
* attempt to fix deadlock 2
* Remove unused variable
* attempt to fix deadlock 3
* snprintfz
* rrdcalc index in rrdset fix
* Stop storing active charts and computing chart hashes
* Remove store active chart function
* Remove compute chart hash function
* Remove sql_store_chart_hash function
* Remove store_active_dimension function
* dictionary delayed destruction
* formatting and cleanup
* zero dictionary base on rrdsetvar
* added internal error to log delayed destructions of dictionaries
* typo in rrddimvar
* added debugging info to dictionary
* debug info
* fix for rrdcalc keys being empty
* remove forgotten unlock
* remove deadlock
* Switch to metadata version 5 and drop
chart_hash
chart_hash_map
chart_active
dimension_active
v_chart_hash
* SQL cosmetic changes
* do not busy wait while destroying a referenced dictionary
* remove deadlock
* code cleanup; re-organization;
* fast cleanup and flushing of dictionaries
* number formatting fixes
* do not delete configured alerts when archiving a chart
* rrddim obsolete linked list management outside dictionaries
* removed duplicate contexts call
* fix crash when rrdfamily is not initialized
* dont keep rrddimvar referenced
* properly cleanup rrdvar
* removed some locks
* Do not attempt to cleanup chart_hash / chart_hash_map
* rrdcalctemplate managed by dictionary
* register callbacks on the right dictionary
* removed some more locks
* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread
* when looking up for an alarm look using both chart id and chart name
* host initialization a bit more modular
* init rrdlabels on host update
* preparation for dictionary views
* improved comment
* unused variables without internal checks
* service threads isolation and worker info
* more worker info in service thread
* thread cancelability debugging with internal checks
* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647
* dictionary modularization
* Remove unused SQL statement definition
* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries now can detect if the caller is holds a write lock and automatically all the calls become their unsafe versions; all direct calls to unsafe version is eliminated
* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops
* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/proc.plugin/proc_net_dev.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* fix memory leak in rrdset cache_dir
* minor dictionary changes
* dont use index locks in single threaded
* obsolete dict option
* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;
* fix jump on uninitialized value in dictionary; remove double free of cache_dir
* addressed codacy findings
* removed debugging code
* use the private refcount on dictionaries
* make dictionary item desctructors work on dictionary destruction; strictier control on dictionary API; proper cleanup sequence on rrddim;
* more dictionary statistics
* global statistics about dictionary operations, memory, items, callbacks
* dictionary support for views - missing the public API
* removed warning about unused parameter
* chart and context name for cloud
* chart and context name for cloud, again
* dictionary statistics fixed; first implementation of dictionary views - not currently used
* only the master can globally delete an item
* context needs netdata prefix
* fix context and chart it of spins
* fix for host variables when health is not enabled
* run garbage collector on item insert too
* Fix info message; remove extra "using"
* update dict unittest for new placement of garbage collector
* we need RRDHOST->rrdvars for maintaining custom host variables
* Health initialization needs the host->host_uuid
* split STRING to its own files; no code changes other than that
* initialize health unconditionally
* unit tests do not pollute the global scope with their variables
* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* move chart_labels to rrdset
* rename chart_labels to rrdlabels
* renamed hash_id to uuid
* turned is_ar_chart into an rrdset flag
* removed rrdset state
* removed unused senders_connected member of rrdhost
* removed unused host flag RRDHOST_FLAG_MULTIHOST
* renamed rrdhost host_labels to rrdlabels
* Update exporting unit tests
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* rrdfamily
* rrddim
* rrdset plugin and module names
* rrdset units
* rrdset type
* rrdset family
* rrdset title
* rrdset title more
* rrdset context
* rrdcalctemplate context and removal of context hash from rrdset
* strings statistics
* rrdset name
* rearranged members of rrdset
* eliminate rrdset name hash; rrdcalc chart converted to STRING
* rrdset id, eliminated rrdset hash
* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate
* rrdcalctemplate
* rrdvar
* eval_variable
* rrddimvar and rrdsetvar
* rrdhost hostname, os and tags
* fix master commits
* added thread cache; implemented string_dup without locks
* faster thread cache
* rrdset and rrddim now use dictionaries for indexing
* rrdhost now uses dictionary
* rrdfamily now uses DICTIONARY
* rrdvar using dictionary instead of AVL
* allocate the right size to rrdvar flag members
* rrdhost remaining char * members to STRING *
* better error handling on indexing
* strings now use a read/write lock to allow parallel searches to the index
* removed AVL support from dictionaries; implemented STRING with native Judy calls
* string releases should be negative
* only 31 bits are allowed for enum flags
* proper locking on strings
* string threading unittest and fixes
* fix lgtm finding
* fixed naming
* stream chart/dimension definitions at the beginning of a streaming session
* thread stack variable is undefined on thread cancel
* rrdcontext garbage collect per host on startup
* worker control in garbage collection
* relaxed deletion of rrdmetrics
* type checking on dictfe
* netdata chart to monitor rrdcontext triggers
* Group chart label updates
* rrdcontext better handling of collected rrdsets
* rrdpush incremental transmition of definitions should use as much buffer as possible
* require 1MB per chart
* empty the sender buffer before enabling metrics streaming
* fill up to 50% of buffer
* reset signaling metrics sending
* use the shared variable for status
* use separate host flag for enabling streaming of metrics
* make sure the flag is clear
* add logging for streaming
* add logging for streaming on buffer overflow
* circular_buffer proper sizing
* removed obsolete logs
* do not execute worker jobs if not necessary
* better messages about compression disabling
* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped
* monitor stream sender used buffer ratio
* Update exporting unit tests
* no need to compare label value with strcmp
* streaming send workers now monitor bandwidth
* workers now use strings
* streaming receiver monitors incoming bandwidth
* parser shift of worker ids
* minor fixes
* Group chart label updates
* Populate context with dimensions that have data
* Fix chart id
* better shift of parser worker ids
* fix for streaming compression
* properly count received bytes
* ensure LZ4 compression ring buffer does not wrap prematurely
* do not stream empty charts; do not process empty instances in rrdcontext
* need_to_send_chart_definition() does not need an rrdset lock any more
* rrdcontext objects are collected, after data have been written to the db
* better logging of RRDCONTEXT transitions
* always set all variables needed by the worker utilization charts
* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes
* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock
* STRING code re-organization for clarity
* thread_cache improvements; double numbers precision on worker threads
* STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS
* rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts
* Add index to speed up database context searches
* Removed last_updated optimization (was also buggy after latest merge with master)
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* type checking on dictionary return values
* first STRING implementation, used by DICTIONARY and RRDLABEL
* enable AVL compilation of STRING
* Initial functions to store context info
* Call simple test functions
* Add host_id when getting charts
* Allow host to be null and in this case it will process the localhost
* Simplify init
Do not use strdupz - link directly to sqlite result set
* Init the database during startup
* make it compile - no functionality yet
* intermediate commit
* intermidiate
* first interface to sql
* loading instances
* check if we need to update cloud
* comparison of rrdcontext on conflict
* merge context titles
* rrdcontext public interface; statistics on STRING; scratchpad on DICTIONARY
* dictionaries maintain version numbers; rrdcontext api
* cascading changes
* first operational cleanup
* string unittest
* proper cleanup of referenced dictionaries
* added rrdmetrics
* rrdmetric starting retention
* Add fields to context
Adjuct context creation and delete
* Memory cleanup
* Fix get context list
Fix memory double free in tests
Store context with two hosts
* calculated retention
* rrdcontext retention with collection
* Persist database and shutdown
* loading all from sql
* Get chart list and dimension list changes
* fully working attempt 1
* fully working attempt 2
* missing archived flag from log
* fixed archived / collected
* operational
* proper cleanup
* cleanup - implemented all interface functions - dictionary react callback triggers after the dictionary is unlocked
* track all reasons for changes
* proper tracking of reasons of changes
* fully working thread
* better versioning of contexts
* fix string indexing with AVL
* running version per context vs hub version; ifdef dbengine
* added option to disable rrdmetrics
* release old context when a chart changes context
* cleanup properly
* renamed config
* cleanup contexts; general cleanup;
* deletion inline with dequeue; lots of cleanup; child connected/disconnected
* ml should start after rrdcontext
* added missing NULL to ri->rrdset; rrdcontext flags are now only changed under a mutex lock
* fix buggy STRING under AVL
* Rework database initialization
Add migration logic to the context database
* fix data race conditions during context deletion
* added version hash algorithm
* fix string over AVL
* update aclk-schemas
* compile new ctx related protos
* add ctx stream message utils
* add context messages
* add dummy rx message handlers
* add the new topics
* add ctx capability
* add helper functions to send the new messages
* update cmake build to not fail
* update topic names
* handle rrdcontext_enabled
* add more functions
* fatal on OOM cases instead of return NULL
* silence unknown query type error
* fully working attempt 1
* fully working attempt 2
* allow compiling without ACLK
* added family to the context
* removed excess character in UUID
* smarter merging of titles and families
* Database migration code to add family
Add family to SQL_CHART_DATA and VERSIONED_CONTEXT_DATA
* add family to context message
* enable ctx in communication
* hardcoded enabled contexts
* Add hard code for CTX
* add update node collectors to json
* add context message log
* fix log about last_time_t
* fix collected flags for queued items
* prevent crash on charts cleanup
* fix bug in AVL indexing of dictionaries; make sure react callback of dictionaries has a reference counter, which is acquired while the dictionary is locked
* fixed dictionary unittest
* strict policy to cleanup and garbage collector
* fix db rotation and garbage collection timings
* remove deadlock
* proper garbage collection - a lot faster retention recalculation
* Added not NULL in database columns
Remove migration code for context -- we will ship with version 1 of the table schema
Added define for query in tests to detect localhost
* Use UUID_STR_LEN instead of GUID_LEN + 1
Use realistic timestamps when adding test data in the database
* Add NULL checks for passed parameters
* Log deleted context when compiled with NETDATA_INTERNAL_CHECKS
* Error checking for null host id
* add missing ContextsCheckpoint log convertor
* Fix spelling in VACCUM
* Hold additional information for host -- prepare to load archived hosts on startup
* Make sure claim id is valid
* is_get_claimed is actually get the current claim id
* Simplify ctx get chart list query
* remove env negotiation
* fix string unittest when there are some strings already in the index
* propagate live-retention flag upstream; cleanup all update reasons; updated instances logging; automated attaching started/stopped collecting flags;
* first implementation of /api/v1/contexts
* full contexts API; updated swagger
* disabled debugging; rrdcontext enabled by default
* final cleanup and renaming of global variables
* return current time on currently collected contexts, charts and dimensions
* added option "deepscan" to the API to have the server refresh the retention and recalculate the contexts on the fly
* fixed identation of yaml
* Add constrains to the host table
* host->node_id may not be available
* new capabilities
* lock the context while rendering json
* update aclk-schemas
* added permanent labels to all charts about plugin, module and family; added labels to all proc plugin modules
* always add the labels
* allow merging of families down to [x]
* dont show uuids by default, added option to enable them; response is now accepting after,before to show only data for a specific timeframe; deleted items are only shown when "deleted" is requested; hub version is now shown when "queue" is requested
* Use the localhost claim id
* Fix to handle host constrains better
* cgroups: add "k8s." prefix to chart context in k8s
* Improve sqlite metadata version migration check
* empty values set to "[none]"; fix labels unit test to reflect that
* Check if we reached the version we want first (address CODACY report re: Array index 'i' is used before limits check)
* Rewrite condition to address CODACY report (Redundant condition: t->filter_callback. '!A || (A && B)' is equivalent to '!A || B')
* Properly unlock context
* fixed memory leak on rrdcontexts - it was not freeing all dictionaries in rrdhost; added wait of up to 100ms on dictionary_destroy() to give time to dictionaries to release their items before destroying them
* fixed memory leak on rrdlabels not freed on rrdinstances
* fixed leak when dimensions and charts are redefined
* Mark entries for charts and dimensions as submitted to the cloud 3600 seconds after their creation
Mark entries for charts and dimensions as updated (confirmed by the cloud) 1800 seconds after their submission
* renamed struct string
* update cgroups alarms
* fixed codacy suggestions
* update dashboard info
* fix k8s_cgroup_10s_received_packets_storm alarm
* added filtering options to /api/v1/contexts and /api/v1/context
* fix eslint
* fix eslint
* Fix pointer binding for host / chart uuids
* Fix cgroups unit tests
* fixed non-retention updates not propagated upstream
* removed non-fatal fatals
* Remove context from 2 way string merge.
* Move string_2way_merge to dictionary.c
* Add 2-way string merge tests.
* split long lines
* fix indentation in netdata-swagger.yaml
* update netdata-swagger.json
* yamllint please
* remove the deleted flag when a context is collected
* fix yaml warning in swagger
* removed non-fatal fatals
* charts should now be able to switch contexts
* allow deletion of unused metrics, instances and contexts
* keep the queued flag
* cleanup old rrdinstance labels
* dont hide objects when there is no filter; mark objects as deleted when there are no sub-objects
* delete old instances once they changed context
* delete all instances and contexts that do not have sub-objects
* more precise transitions
* Load archived hosts on startup (part 1)
* update the queued time every time
* disable by default; dedup deleted dimensions after snapshot
* Load archived hosts on startup (part 2)
* delayed processing of events until charts are being collected
* remove dont-trigger flag when object is collected
* polish all triggers given the new dont_process flag
* Remove always true condition
Enums for readbility / create_host_callback only if ACLK is enabled (for now)
* Skip retention message if context streaming is enabled
Add messages in the access log if context streaming is enabled
* Check for node id being a UUID that can be parsed
Improve error check / reporting when loading archived hosts and creating ACLK sync threads
* collected, archived, deleted are now mutually exclusive
* Enable the "orphan" handling for now
Remove dead code
Fix memory leak on free host
* Queue charts and dimensions will be no-op if host is set to stream contexts
* removed unused parameter and made sure flags are set on rrdcontext insert
* make the rrdcontext thread abort mid-work when exiting
* Skip chart hash computation and storage if contexts streaming is enabled
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Timo <timotej@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
Co-authored-by: Vasilis Kalintiris <vasilis@netdata.cloud>
* squashed and rebased to master
* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h
* added unittest for UTF-8 multibyte sanitization
* Fix unit test compilation
* Fix CMake build
* remove double sanitizer for opentsdb; cleanup sanitize_json_string()
* rename error_description to error_message to avoid conflict with json-c
* revert last and undef error_description from json-c
* more unittests; attempt to fix protobuf map issue
* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer
* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted
* better sorted dictionary checking
* proper unittesting for sorted dictionaries
* call dictionary deletion callback when destroying the dictionary
* remove obsolete variable
* Fix exporting unit tests
* Fix k8s label parsing test
* workaround for cmocka and strdupz()
* Bypass cmocka memory allocation check
* Revert "Bypass cmocka memory allocation check"
This reverts commit 4c49923839.
* Revert "workaround for cmocka and strdupz()"
This reverts commit 7bebee0480.
* Bypass cmocka memory allocation checks
* respect json formatting for chart labels
* cloud sends colons
* print the value only once
* allow parenthesis in values and spaces; make stream sender send quotes for values
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Switch messages to ACLK RES, ACLK REQ, ACLK STA instead of OG, IN and just AC
* Lookup hostname by node id
* Record hostname when receiving an ACK for a chart sequence
* Additional log_access info
* Adjust log message when receing health log request
* Remove redundant ACK log message
* Remove duplicate log message
* Remove duplicate sql statements
* Rearrange variable definition for clarity
* Make sure node is a valid UUID (check return code)
* Find the correct host netdata version from streaming info if not localhost
* Handle old netdata versions that do not supply information during the streaming connection
* Send unknown agent version if child is not connected
* Track anomaly rates with DBEngine.
This commit adds support for tracking anomaly rates with DBEngine. We
do so by creating a single chart with id "anomaly_detection.anomaly_rates" for
each trainable/predictable host, which is responsible for tracking the anomaly
rate of each dimension that we train/predict for that host.
The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly
rates charts. We use this flag to:
- Disable exposing the anomaly rates charts through the functionality
in backends/, exporting/ and streaming/.
- Skip generation of configuration options for the name, algorithm,
multiplier, divisor of each dimension in an anomaly rates chart.
- Skip the creation of health variables for anomaly rates dimensions.
- Skip the chart/dim queue of ACLK.
- Post-process the RRDR result of an anomaly rates chart, so that we can
return a sorted, trimmed number of anomalous dimensions.
In a child/parent configuration where both the child and the parent run
ML for the child, we want to be able to stream the rest of the ML-related
charts to the parent. To be able to do this without any chart name collisions,
the charts are now created on localhost and their IDs and titles have the node's
machine_guid and hostname as a suffix, respectively.
* Fix exporting_engine tests.
* Restore default ML configuration.
The reverted changes where meant for local testing only. This commit
restores the default values that we want to have when someone runs
anomaly detection on their node.
* Set context for anomaly_detection.* charts.
* Check for anomaly rates chart only with a valid pointer.
* Remove duplicate code.
* Use a more descriptive name for id/title pair variable
* Send ML feature information with UpdateNodeInfo.
We achieve this by adding the `ml_{capable,enabled}` fields in
`system_info`. When streaming, these fields allow a parent to understand if
the child has ML and if it runs ML for itself.
The UpdateNodeInfo includes this information about a child, plus a
boolean that is set to true when the parent runs ML for the child.
* Fix unit test and building with --disable-ml.
* Refactoring to use the new MachineLearningInfo message
* Update aclk-schemas repository to include latest ML info message.
* add some logging for ng arch to access.log
* change arrows to IN, OG, AC
* log also the params for aclk requests
* check for wc->host before using wc->host->hostname
* turn two messages to info
* reduce alert event logs
* used thread local variables
* Move retention code to the charts
* Log information about node registration and updates
* Prevent deadlock if aclk_database_enq_cmd locks for a node
* Improve message (indicate that it comes from alerts). This will be improved in a followup PR
* Disable parts that can't be used if the new cloud env is not available
* Set dimension FLAG if message has been queued
* Queue messages using the correct protocol enabled
* Cleanup unused functions
Rename functions that queue charts and dimensions
Improve the generic chart payload add function
Add a counter for pending charts/dimension payloads to avoid polling the db
Delay the retention update message until we are done with the updates
Fix full resync command to handle sequence_id = 0 correctly
Disable functions not needed when the new cloud env functionality is not compiled
* Add chart_payload count and retry count
Output information or error message if we fail to queue chart/dimension PUSH commands
Only try to queue commands if we have chart_payload_count>0
Remove the event loop shutdown opcode handle
* Improve detection of shutdown (check netdata_exit)
* Adjusting info messages
ACLK-NG supports both new and old cloud protocol. Protobuf and C++ compiler are required only for new cloud protocol.
There is no reason to skip building whole ACLK-NG when protobuf is missing.