* cleanup alerts
* fix references
* fix references
* fix references
* load alerts once and apply them to each node
* simplify health_create_alarm_entry()
* Compile without warnings with compiler flags:
-Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self
* code re-organization and cleanup
* generate patterns when applying prototypes; give unique dyncfg names to all alerts
* eval expressions keep the source and the parsed_as as STRING pointers
* renamed host to node in dyncfg ids
* renamed host to node in dyncfg ids
* add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level
* working functionality
* code re-organization: moved health event-loop to a new file, moved health globals to health.c
* rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instanciated only when they are linked to RRDSETs
* dyncfg alert prototypes initialization for alerts
* health dyncfg split to separate file
* cleanup not-needed code
* normalize matches between parsing and json
* also detect !* for disabled alerts
* dyncfg capability disabled
* Store alert config part1
* Add rrdlabels_common_count
* wip health variables lookup without indexes
* Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter
* working variables with runtime lookup
* working variables with runtime lookup
* delete rrddimvar and rrdfamily index
* remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts
* added /api/v1/variable that resolves a variable the same way alerts do
* remove rrdcalc from eval
* remove debug code
* remove duplicate assignment
* Fix memory leak
* all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts
* hide all internal structures of EVAL
* Enable -Wformat flag
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Adjust binding for calculation, warning, critical
* Remove unused macro
* Update config hash id
* use the right info and summary in alerts log
* use synchronous queries for alerts
* Handle cases when config_hash_id is missing from health_log
* remove deadlock from health worker
* parsing to json payload for health alert prototypes
* cleaner parsing and avoiding memory leaks in case of duplicate members in json
* fix left-over rename of function
* Keep original lookup field to send to the cloud
Cleanup / rename function to store config
Remove unused DEFINEs, functions
* Use ac->lookup
* link jobs to the host when the template is registered; do not accept running a function without a host
* full dyncfg support for health alerts, except action TEST
* working dyncfg additions, updates, removals
* fixed missing source, wrong status updates
* add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint
* fix dyncfg unittest
* rename functions
* generalize the json-c parser macros and move them to libnetdata
* report progress when enabling and disabling dyncfg templates
* moved rrdcalc and rrdvar to health
* update alarms
* added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts
* enable parsed json alerts; allow sending back accepted but disabled
* added format_version for alerts payload; enables/disables status now is also inheritted by the status of the rules; fixed variable names in json output
* remove the RRDHOST pointer from DYNCFG
* Fix command field submitted to the cloud
* do not send updates to creation requests, for DYNCFG jobs
---------
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
* Consistent naming of STORAGE_INSTANCE instances.
Replace usages of `db_instance` and `instance` with
`si`.
* Rename array `storage_metrics_groups[tier]` to `smg[tier]`
* Rename db_metric_handle to smh
* Rename instances of `storage_engine_query_handle` to `seqh`.
* Rename instances of STORAGE_ENGINE_BACKEND to `seb`.
* Rename instances of STORAGE_COLLECT_HANDLE to `sch`.
* cache ctx in collection handle
* cache rd together with rda
* do not repeatedy call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension()
* fix unit tests
* do the absolutely minimum while updating timestamps, ensure validity during reading them
* when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only.
* remove the spinlock from mrg
* remove the metric flags that are not used any more
* mrg writers can be different threads
* update first time when latest clean is also updated
* cleanup
* set hot page with a simple atomic operation
* sender sets chart slot for every chart
* work on senders without SLOT
* enable SLOT capability
* send slot at BEGIN when SLOT is enabled
* fix slot generation and parsing
* send slot while re-streaming
* use the sender capabilities, not the receiver
* cleanup
* add slots support to all chart and dimension related plugin commands
* fix condition
* fix calculation
* check sender capabilties
* assign slots in constructors
* we need the dimension slot at the DIMENSION keyword
* more debug info in case of dimension mismatch
* ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely
* fix renumbering on child restart
* reset rda caching when receiving a chart definition
* optimize pluginsd_end_v2()
* do not do zero sized allocations
* trust the chart slot id of the child
* cleanup charts on pluginsd thread exit
* better cleanup
* find the chart and put it in the slot, if it not already there
* move slots array to host
* initialize pluginsd slots properly
* add slots to replay begin; do not cleanup slots that dont belong to a chart
* cleanup on obsolete
* cleanup slots on obsoletions
* cleanup and renames about obsoletion
* rewrite obsolation service code to remove race conditions
* better service obsoletion log
* added debugging
* more debug
* exposed flag now compares versions
* removed debugging messages
* respolve conflicts
* fix replication check for unsent dimensions
* Storage engine.
* Host indexes to rrdb
* Move globals to rrdb
* Move storage_tiers_backfill to rrdb
* default_rrd_update_every to rrdb
* default_rrd_history_entries to rrdb
* gap_when_lost_iterations_above to rrdb
* rrdset_free_obsolete_time_s to rrdb
* libuv_worker_threads to rrdb
* ieee754_doubles to rrdb
* rrdhost_free_orphan_time_s to rrdb
* rrd_rwlock to rrdb
* localhost to rrdb
* rm extern from func decls
* mv rrd macro under rrd.h
* default_rrdeng_page_cache_mb to rrdb
* default_rrdeng_extent_cache_mb to rrdb
* db_engine_journal_check to rrdb
* default_rrdeng_disk_quota_mb to rrdb
* default_multidb_disk_quota_mb to rrdb
* multidb_ctx to rrdb
* page_type_size to rrdb
* tier_page_size to rrdb
* No storage_engine_id in rrdim functions
* storage_engine_id is provided by st
* Update to fix merge conflict.
* Update field name
* Remove unnecessary macros from rrd.h
* Rm unused type decls
* Rm duplicate func decls
* make internal function static
* Make the rest of public dbengine funcs accept a storage_instance.
* No more rrdengine_instance :)
* rm rrdset_debug from rrd.h
* Use rrdb to access globals in ML and ACLK
Missed due to not having the submodules in the
worktree.
* rm total_number
* rm RRDVAR_TYPE_TOTAL
* rm unused inline
* Rm names from typedef'd enums
* rm unused header include
* Move include
* Rm unused header include
* s/rrdhost_find_or_create/rrdhost_get_or_create/g
* s/find_host_by_node_id/rrdhost_find_by_node_id/
Also, remove duplicate definition in rrdcontext.c
* rm macro used only once
* rm macro used only once
* Reduce rrd.h api by moving funcs into a collector specific utils header
* Remove unused func
* Move parser specific function out of rrd.h
* return storage_number instead of void pointer
* move code related to rrd initialization out of rrdhost.c
* Remove tier_grouping from rrdim_tier
Saves 8 * storage_tiers bytes per dimension.
* Fix rebase
* s/rrd_update_every/update_every/
* Mark functions as static and constify args
* Add license notes and file to build systems.
* Remove remaining non-log/config mentions of memory mode
* Move rrdlabels api to separate file.
Also, move localhost functions that loads
labels outside of database/ and into daemon/
* Remove function decl in rrd.h
* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir
* Do not expose internal function from rrd.h
* Rm NETDATA_RRD_INTERNALS
Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.
* Add license note
* Include libnetdata.h instead of aral.h
* Use rrdb to access localhost
* Fix builds without dbengine
* Add header to build system files
* Add rrdlabels.h to build systems
* Move func def from rrd.h to rrdhost.c
* Fix macos build
* Rm non-existing function
* Rebase master
* Define buffer length macro in ad_charts.
* Fix FreeBSD builds.
* Mark functions static
* Rm func decls without definitions
* Rebase master
* Rebase master
* Properly initialize value of storage tiers.
* Fix build after rebase.
* remove rd->update_every
* reduce amount of memory for RRDDIM
* reorgnize rrddim->db entries
* optimize rrdset and statsd
* optimize dictionaries
* RW_SPINLOCK for dictionaries
* fix codeql warning
* rw_spinlock improvements
* remove obsolete assertion
* fix crash on health_alarm_log_process()
* use RW_SPINLOCK for AVL trees
* add RW_SPINLOCK read/write trylock
* pgc and mrg now use rw_spinlocks; cache line optimizations for mrg
* thread tag of dbegnine init
* append created datafile, lockless
* make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use
* thread cancelability in spinlocks; optimize thread cancelability management
* introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files
* use the last timestamp of each journal file for indexing
* when the previous cannot be found, start from the beginning
* add more stats to PDC to trace routing easier
* rename spinlock functions
* fix for spinlock renames
* revert statsd socket statistics to size_t
* turn fatal into internal_fatal()
* show candidates always
* show connected status and connection attempts
* configure extent cache size
* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads
* fix dispatched and executing counters
* boost to the max
* increase libuv worker threads
* query prep always get more prio than extent reads; stop processing in batch when dbengine is queue is critical
* fix accounting of query prep
* inlining of time-grouping functions, to speed up queries with billions of points
* make switching based on a local const variable
* print one pending contexts loading message per iteration
* inlined store engine query API
* inlined storage engine data collection api
* inlined all storage engine query ops
* eliminate and inline data collection ops
* simplified query group-by
* more error handling
* optimized partial trimming of group-by queries
* preparative work to support multiple passes of group-by
* more preparative work to support multiple passes of group-by (accepts multiple group-by params)
* unified query timings
* unified query timings - weights endpoint
* query target is no longer a static thread variable - there is a list of cached query targets, each of which of freed every 1000 queries
* fix query memory accounting
* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name
* limit max ACLK WEB response size to 30MB
* the response type should be text/plain
* more preparative work for multiple group-by passes
* create functions for generating group by keys, ids and names
* multiple group-by passes are now supported
* parse group-by options array also with an index
* implemented percentage-of-instance group by function
* family is now merged in multi-node contexts
* prevent uninitialized use
* /api/v2/weights, points key renamed to result
* /api/v2/weights, add node ids in response
* /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics
* /api/v2/data expose view.dimensions.sts{}
* /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses
* /api/v2 nodes output now includes the duration of time spent per node
* jsonwrap view object renames and cleanup
* rework of the statistics returned by the query engine
* swagger work
* swagger work
* more swagger work
* updated swagger json
* added the remaining of the /api/v2 endpoints to swagger
* point.ar has been renamed point.arp
* updated weights endpoint
* fix compilation warnings
* query timestamps are now pre-determined and alignment on timestamps is guarranteed
* turn internal_fatal() to internal_error() to investigate the issue
* handle query when no data exist in the db
* check for non NULL dict when running dictionary garbage collect
* support API v2 requests via ACLK
* add nodes detailed information to /api/v2/nodes
* fixed keys and added dummy nodes for completeness
* added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node
* make sure replication does not double count charts that are already being replicated
* expose min and max in sts structures
* added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters
* respect trimming in percentage calculation; updated swagger
* api/v2/weights preparative work to support multi-node queries - still single node though
* multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data
* when passing the raw option, the query exposes the hidden dimensions
* fix compilation issues on older systems
* the query engine now calculates per dimension min, max, sum, count, anomaly count
* use the macro to calculate storage point anomaly rate
* weights endpoint exposing version hashes
* weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate
* query: expose RESET flag; do not add the same point multiple times to the aggregated point
* weights: more compact output
* weights requests can be interrupted
* all /api/v2 requests can be interrupted and timeout
* allow relative timestamps in weights
* fix macos compilation warnings
* Revert "fix macos compilation warnings"
This reverts commit 8a1d24e41e.
* /api/v2/data group-by now works on dimension names, not ids
* /api/v2/weights does not query metrics without retention and new output format
* /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
* max web request size to 64KB
* fix the request too big message
* increase max request reading tries to 100
* support for bigger web requests
* add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too
* better implementation for partial data trimming
* added group_by=selected to return only one dimension for all selected metrics
* fix acceptance of group_by=selected
* passing option "raw" disables partial data trimming
* remove obsolete option "plan"; use "debug"
* fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing
* per row annotations
* added time column to point annotations
* disable caching for /api/v2/contexts responses
* added api format json2 that returns an array for each points, having all the point values and annotations in them
* work on swagger about /api/v2
* prevent infinite loop
* cleanup and swagger work
* allow negative simple pattern expressions to work as expected
* do not lookup in the dictionary empty names
* garbage collect dictionaries
* make query_target allocate less aggressively; queries fill the remaining points with nulls
* reusable query ops to save memory on huge queries
* move parts of query plans into query ops to save query target memory
* remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed
* preparation for /api/v2/contexts
* working /api/v2/contexts
* add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages
* minor fix
* query targegt now accurately counts hosts, contexts, instances, dimensions, metrics
* cleanup /api/v2/contexts
* full text search with /api/v2/contexts
* simple patterns now support the option to search ignoring case
* full text search API with /api/v2/q
* simple pattern execution optimization
* do not show q when not given
* full text search accounting
* separated /api/v2/nodes from /api/v2/contexts
* fix ssv queries for group_by
* count query instances queried and failed per context and host
* split rrdcontext.c to multiple files
* add query totals
* fix anomaly rate calculation; provide "ni" for indexing hosts
* do not generate zero valued members
* faster calculation of anomaly rate; by just summing integers for each db points and doing math once for every generated point
* fix typo when printing dimensions totals
* added option minify to remove spaces and newlines fron JSON output
* send instance ids and names when they differ
* do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe
* fix for the previous + renames and code cleanup
* when a dimension is filtered, include in the response all the other dimensions that are selectable
* do not add nodes that do not have retention in the current window
* move selection of dimensions to query_dimension_add(), instead of query_metric_add()
* increase the pre-processing capacity of queries
* generate instance fqdn ids and names only when they are needed
* provide detailed statistics about tiers retention, queries, points, update_every
* late allocation of query dimensions
* cleanup
* more cleanup
* support for annotations per displayed point, RESET and PARTIAL
* new type annotations
* if a chart is not linked to contexts and it is collected, link it when it is collected
* make ML run reentrant
* make ML rrdr query synchronous
* optimize replication memory allocation of replication_sort_entry
* change units to percentage, when requesting a coefficinet of variation, or a percentage query
* initialize replication before starting main threads
* properly decrement no room requests counter
* propagate the non-zero flag to group-by
* the same by avoiding the extra loop
* respect non-zero in all dimension arrays
* remove dictionary garbage collection from dictionary_entries() and dictionary_version()
* be more verbose when jv2 indexing is postponed
* prevent infinite loop
* use hidden dimensions even when dimensions pattern is unset
* traverse hosts using dictionaries
* fix dictionary unittests