netdata_netdata

mirror of https://github.com/netdata/netdata.git synced 2025-04-16 18:37:50 +00:00

Author	SHA1	Message	Date
Costa Tsaousis	f466b8aef5	DYNCFG: dynamically configured alerts (#16779 ) * cleanup alerts * fix references * fix references * fix references * load alerts once and apply them to each node * simplify health_create_alarm_entry() * Compile without warnings with compiler flags: -Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self * code re-organization and cleanup * generate patterns when applying prototypes; give unique dyncfg names to all alerts * eval expressions keep the source and the parsed_as as STRING pointers * renamed host to node in dyncfg ids * renamed host to node in dyncfg ids * add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level * working functionality * code re-organization: moved health event-loop to a new file, moved health globals to health.c * rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instanciated only when they are linked to RRDSETs * dyncfg alert prototypes initialization for alerts * health dyncfg split to separate file * cleanup not-needed code * normalize matches between parsing and json * also detect !* for disabled alerts * dyncfg capability disabled * Store alert config part1 * Add rrdlabels_common_count * wip health variables lookup without indexes * Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter * working variables with runtime lookup * working variables with runtime lookup * delete rrddimvar and rrdfamily index * remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts * added /api/v1/variable that resolves a variable the same way alerts do * remove rrdcalc from eval * remove debug code * remove duplicate assignment * Fix memory leak * all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts * hide all internal structures of EVAL * Enable -Wformat flag Signed-off-by: Tasos Katsoulas 
<tasos@netdata.cloud> * Adjust binding for calculation, warning, critical * Remove unused macro * Update config hash id * use the right info and summary in alerts log * use synchronous queries for alerts * Handle cases when config_hash_id is missing from health_log * remove deadlock from health worker * parsing to json payload for health alert prototypes * cleaner parsing and avoiding memory leaks in case of duplicate members in json * fix left-over rename of function * Keep original lookup field to send to the cloud Cleanup / rename function to store config Remove unused DEFINEs, functions * Use ac->lookup * link jobs to the host when the template is registered; do not accept running a function without a host * full dyncfg support for health alerts, except action TEST * working dyncfg additions, updates, removals * fixed missing source, wrong status updates * add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint * fix dyncfg unittest * rename functions * generalize the json-c parser macros and move them to libnetdata * report progress when enabling and disabling dyncfg templates * moved rrdcalc and rrdvar to health * update alarms * added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts * enable parsed json alerts; allow sending back accepted but disabled * added format_version for alerts payload; enables/disables status now is also inheritted by the status of the rules; fixed variable names in json output * remove the RRDHOST pointer from DYNCFG * Fix command field submitted to the cloud * do not send updates to creation requests, for DYNCFG jobs --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: ilyam8 <ilya@netdata.cloud>	2024-01-23 20:20:41 +02:00
Stelios Fragkakis	1973e70b62	Use original summary for alert transition (#16793) * Use original summary for alert * Fetch transaction and global id for transitions safely	2024-01-15 20:31:23 +02:00
Costa Tsaousis	f2b250a1f5	dyncfg v2 (#16702 ) * split rrdfunctions streaming and progress * simplified internal inline functions API * split rrdfunctions inflight management * split rrd functions exporters * renames * base dyncfg structure * config pluginsd * intercept dyncfg function calls * loading and saving of dyncfg metadata and data * save metadata and payload to a single file; added code to update the plugins with jobs and saved configs * basic working unit test * added payload to functions execution * removed old dyncfg code that is not needed any more * more cleanup * cleanup sender for functions with payload * dyncfg functions are not exposed as functions * remaining work to avoid indexing the \0 terminating character in dictionary keys * added back old dyncfg plugins.d commands as noop, to allow plugins continue working * working api; working streaming; * updated plugins.d documentation * aclk and http api requests share the same header parsing logic * added source type internal * fixed crashes * added god mode for tests * fixes * fixed messages * save host machine guids to configs * cleaner manipulation of supported commands * the functions event loop for external plugins can now process dyncfg requests * unified internal and external plugins dyncfg API * Netdata serves schema requests from /etc/netdata/schema.d and /var/lib/netdata/conf.d/schema.d * cleanup and various fixes; fixed bug in previous dyncfg implementation on streaming that was sending the paylod in a way that allowed other streaming commands to be multiplexed * internals go to a separate header file * fix duplicate ACLK requests sent by aclk queue mechanism * use fstat instead of stat * working api * plugin actions renamed to create and delete; dyncfg files are removed only from user actions * prevent deadlock by using the react callback * fix for string_strndupz() * better dyncfg unittests * more tests at the unittests * properly detect dyncfg functions * hide config functions from the 
UI * tree response improvements * send the initial update with payload * determine tty using stdout, not stderr * changes to statuses, cleanup and the code to bring all business logic into interception * do not crash when the status is empty * functions now propagate the source of the requests to plugins * avoid warning about unused functions * in the count at items for attention, do not count the orphan entries * save source into dyncfg * make the list null terminated * fixed invalid comparison * prevent memory leak on duplicated headers; log x-forwarded-for * more unit tests * added dyncfg unittests into the default unittests * more unit tests and fixes * more unit tests and fixes * fix dictionary unittests * config functions require admin access	2024-01-11 16:56:45 +02:00
vkalintiris	bead543ea5	Name storage engine variables consistently. (#16753) * Consistent naming of STORAGE_INSTANCE instances. Replace usages of `db_instance` and `instance` with `si`. * Rename array `storage_metrics_groups[tier]` to `smg[tier]` * Rename db_metric_handle to smh * Rename instances of `storage_engine_query_handle` to `seqh`. * Rename instances of STORAGE_ENGINE_BACKEND to `seb`. * Rename instances of STORAGE_COLLECT_HANDLE to `sch`.	2024-01-11 14:17:02 +02:00
Costa Tsaousis	da32dd8be8	Queries Progress (#16574 ) * track the progress of queries * add query_progress in libnetdata Makefile.am * add acl, response size and response code to the tracking * define the required functions * fix the last commit * added /api/v2/progress?transaction=ID to report the progress of queries * added function to report netdata-queries * track hashtable additions * when resusing a transaction, maintain the counter * keep track of linked and indexing * added X-Forwarded-Host and X-Forwarded-For to logs. X-Forwarded-For is also added in progress tracking * report compact uuids to match logs; register the actual duration of the transaction * added rowOptions to function; now web_client keeps track if it tracks progress or not * add http request method to progress * add tags per function; /api/vX/functions is now not protected * compact the sanitization array * split pluginsd_parser into multiple files * cleanup keyword definitions * code cleanup * extracted rrd_collector to separate files * added http access level to functions * renamed access "all" to "any" * implemented optional protection on functions * add priority to functions, to allow the UI select the best function (lower priority) when the user has not selected a function * added progress report from the plugins to netdata and from children to parents - untested * added progress reporting in systemd-journal * query timeout is now handled by evloop for external plugins * propagate progress reports to children and plugins * fix codeql warning * adapt to cmake * minor changes * extend function timeout when progress is received; added streaming capability to propagate progress reports to parents and send progress requests to children * revert change in dictionary.h * add log when access level is invalid * update access level of functions * added logs when processing progress updates * log when the deferred response is too big * comment out sender progress to find the issue * added missing 
newline in streaming progress reports * propagate progress reports to functions * fix logs	2023-12-15 18:15:43 +02:00
Stelios Fragkakis	096d1b1b2b	Code cleanup (#16448) * Code cleanup * More cleanup * More cleanup * Use FILENAME_MAX * query fix	2023-12-01 15:45:59 +02:00
Costa Tsaousis	2175104d41	Faster parents (#16127 ) * cache ctx in collection handle * cache rd together with rda * do not repeatedy call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension() * fix unit tests * do the absolutely minimum while updating timestamps, ensure validity during reading them * when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only. * remove the spinlock from mrg * remove the metric flags that are not used any more * mrg writers can be different threads * update first time when latest clean is also updated * cleanup * set hot page with a simple atomic operation * sender sets chart slot for every chart * work on senders without SLOT * enable SLOT capability * send slot at BEGIN when SLOT is enabled * fix slot generation and parsing * send slot while re-streaming * use the sender capabilities, not the receiver * cleanup * add slots support to all chart and dimension related plugin commands * fix condition * fix calculation * check sender capabilties * assign slots in constructors * we need the dimension slot at the DIMENSION keyword * more debug info in case of dimension mismatch * ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely * fix renumbering on child restart * reset rda caching when receiving a chart definition * optimize pluginsd_end_v2() * do not do zero sized allocations * trust the chart slot id of the child * cleanup charts on pluginsd thread exit * better cleanup * find the chart and put it in the slot, if it not already there * move slots array to host * initialize pluginsd slots properly * add slots to replay begin; do not cleanup slots that dont belong to a chart * cleanup on obsolete * cleanup slots on obsoletions * cleanup and renames about obsoletion * rewrite obsolation service code to remove race conditions * better service obsoletion log * 
added debugging * more debug * exposed flag now compares versions * removed debugging messages * resolve conflicts * fix replication check for unsent dimensions	2023-10-27 22:42:29 +03:00
Emmanuel Vasilakis	bdf83311c3	Add summary to /alerts (#16213)	2023-10-16 17:09:41 +03:00
Costa Tsaousis	9fd9823e07	journal: fix the 1 second latency in play mode (#16123) provide a relative_to_absolute function that does not touch the current realtime time	2023-10-04 20:54:16 +03:00
Emmanuel Vasilakis	8c9492a476	Send alerts summary field to cloud (#16056) * new aclk schema * transmit summary to cloud and expose in v2/alerts * missing assign	2023-10-02 09:56:01 +03:00
Stelios Fragkakis	e0b36f2865	Switch to uint64_t to avoid overflow in 32bit systems (#16048)	2023-09-26 18:45:52 +03:00
Stelios Fragkakis	24006ed5c1	Reduce label memory (#15255)	2023-09-01 15:37:55 +03:00
Costa Tsaousis	41bd902426	Facets histograms (#15846)	2023-08-21 11:20:18 +03:00
Costa Tsaousis	ce75313de0	systemd-journal plugin (#15363)	2023-08-03 15:42:11 +03:00
Stelios Fragkakis	a833f6674f	Fix memory corruption (#15724) Delay free	2023-08-03 15:38:14 +03:00
vkalintiris	0e230a260e	Revert "Refactor RRD code. (#15423)" (#15723) This reverts commit `440bd51e08`. dbengine was still being used for non-zero tiers even on non-dbengine modes.	2023-08-03 13:13:36 +03:00
Costa Tsaousis	72549b3a22	prefer titles, families, units and priorities from collected charts (#15614)	2023-08-03 09:38:36 +03:00
vkalintiris	440bd51e08	Refactor RRD code. (#15423 ) * Storage engine. * Host indexes to rrdb * Move globals to rrdb * Move storage_tiers_backfill to rrdb * default_rrd_update_every to rrdb * default_rrd_history_entries to rrdb * gap_when_lost_iterations_above to rrdb * rrdset_free_obsolete_time_s to rrdb * libuv_worker_threads to rrdb * ieee754_doubles to rrdb * rrdhost_free_orphan_time_s to rrdb * rrd_rwlock to rrdb * localhost to rrdb * rm extern from func decls * mv rrd macro under rrd.h * default_rrdeng_page_cache_mb to rrdb * default_rrdeng_extent_cache_mb to rrdb * db_engine_journal_check to rrdb * default_rrdeng_disk_quota_mb to rrdb * default_multidb_disk_quota_mb to rrdb * multidb_ctx to rrdb * page_type_size to rrdb * tier_page_size to rrdb * No storage_engine_id in rrdim functions * storage_engine_id is provided by st * Update to fix merge conflict. * Update field name * Remove unnecessary macros from rrd.h * Rm unused type decls * Rm duplicate func decls * make internal function static * Make the rest of public dbengine funcs accept a storage_instance. * No more rrdengine_instance :) * rm rrdset_debug from rrd.h * Use rrdb to access globals in ML and ACLK Missed due to not having the submodules in the worktree. * rm total_number * rm RRDVAR_TYPE_TOTAL * rm unused inline * Rm names from typedef'd enums * rm unused header include * Move include * Rm unused header include * s/rrdhost_find_or_create/rrdhost_get_or_create/g * s/find_host_by_node_id/rrdhost_find_by_node_id/ Also, remove duplicate definition in rrdcontext.c * rm macro used only once * rm macro used only once * Reduce rrd.h api by moving funcs into a collector specific utils header * Remove unused func * Move parser specific function out of rrd.h * return storage_number instead of void pointer * move code related to rrd initialization out of rrdhost.c * Remove tier_grouping from rrdim_tier Saves 8 * storage_tiers bytes per dimension. 
* Fix rebase * s/rrd_update_every/update_every/ * Mark functions as static and constify args * Add license notes and file to build systems. * Remove remaining non-log/config mentions of memory mode * Move rrdlabels api to separate file. Also, move localhost functions that loads labels outside of database/ and into daemon/ * Remove function decl in rrd.h * merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir * Do not expose internal function from rrd.h * Rm NETDATA_RRD_INTERNALS Only one function decl is covered. We have more database internal functions that we currently expose for no good reason. These will be placed in a separate internal header in follow up PRs. * Add license note * Include libnetdata.h instead of aral.h * Use rrdb to access localhost * Fix builds without dbengine * Add header to build system files * Add rrdlabels.h to build systems * Move func def from rrd.h to rrdhost.c * Fix macos build * Rm non-existing function * Rebase master * Define buffer length macro in ad_charts. * Fix FreeBSD builds. * Mark functions static * Rm func decls without definitions * Rebase master * Rebase master * Properly initialize value of storage tiers. * Fix build after rebase.	2023-07-26 15:30:49 +03:00
Costa Tsaousis	7c2acb3b24	added missing fields to alerts instances (#15442)	2023-07-18 18:41:37 +03:00
Costa Tsaousis	3cdedeed40	add chart id and name to alert instances and transitions (#15430)	2023-07-18 15:37:07 +03:00
Costa Tsaousis	77076d8764	bearer improvements (#15342)	2023-07-11 02:28:06 +03:00
Stelios Fragkakis	b12edb1208	Use spinlock in host and chart (#15328) * Switch alarm log lock to spinlock * Switch the alerts lock in the chart structure to spinlock * Proper lock usage	2023-07-10 14:13:50 +03:00
Costa Tsaousis	880c9fbc61	alerts_transitions outputs hostnames and items statistics (#15329) * alerts_transitions outputs hostnames and items statistics * return details about the items in the database * added comments to items list and made the whole of statsd available under debug	2023-07-09 17:02:25 +03:00
Costa Tsaousis	3f78777839	avoid memory allocations for alert transitions facets processing (#15318)	2023-07-06 17:19:32 +03:00
Costa Tsaousis	4f5228e654	add summary linking to alert instances (ati) when options=summary,values is requested (#15317)	2023-07-06 17:18:32 +03:00
Costa Tsaousis	6526a34f86	fix alerts transitions sorting (#15315)	2023-07-06 14:43:01 +03:00
Costa Tsaousis	c74bf56ee2	Code reorg and cleanup - enrichment of /api/v2 (#15294 ) * claim script now accepts the same params as the kickstart * rewrote buildinfo to unify all methods * added cloud unavailable in cloud status * added all exporters * renamed httpd to h2o * rename ENABLE_COMPRESSION to ENABLE_LZ4 * rename global variable * rename ENABLE_HTTPS to ENABLE_OPENSSL * fix coverity-scan for openssl * add lz4 to coverity-scan * added all plugins and most of the features * added all plugins and most of the features * generalize bitmap code so that we can have any size of bitmaps * cleanup * fix compilation without protobuf * fix compilation with others allocators * fix bitmap * comprehensive bitmaps unit test * bitmap as macros * added developer mode * added system info to build info * cloud available/unavailable * added /api/v2/info * added units and ni to transitions * when showing instances and transitions, show only the instances that have transitions * cleanup * add missing quotes * add anchor to transitions * added more to build info * calculate retention per tier and expose it to /api/v2/info * added currently collected metrics * do not show space and retention when no numbers are available * fix impossible overflow * Add function for transitions and execute callback * In case of error, reset and try next dictionary entry * Fix error message * simpler logic to maintain retention per tier * /api/v2/alert_transitions * Handle case of recipient null Convert after and before to usec * Add classification, type and component * working /api/v2/alert_transitions * Fix query to properly handle context and alert name * cleanup * Add search with transition * accept transition in /api/v2/alert_transitions * totaly dynamic facets * fixed debug info * restructured facets * cleanup; removal of options=transitions * updated alert entries flags * method to exec * Return also exec run timestamp Temp table cleanup only when we don't execute with a transition * cleanup 
obsolete anchor parameter * Add sql_get_alert_configuration function * added options=config to alert_transitions * added /api/v2/alert_config * preliminary work for /api/v2/claim * initialize variables; do not expose expected retention if no disk space info is available; do not report aclk as initializing when not claimed * fix claim session key filename * put a newline into the session key file * more progress on claiming * final /api/v2/claim endpoint * after claiming, refresh our state at the output * Fix query to fetch config * Remove debug log * add configuration objects * add configuration objects - fixed * respect the NETDATA_DISABLE_CLOUD env variable * NETDATA_DISABLE_CLOUD env variable sets the default, but the config sets the final value * use a new claimed_id on every claiming * regenerate random key on claiming and wait for online status * ignore write() return value when writing a newline * dont show cloud status disabled when claimed_id is missing * added ctx to alert instances * cleanup config and transitions from /api/v2/alerts * fix unused variable * in /api/v2/alert_config show 1 config without an array * show alert values conditionally, by appending options=values * When storing host info if the key value is empty, store unknown * added options=summary to control when the alerts summary is shown * increased http_api_v2 to version 5 * claming random key file is now not world readable * added local-listeners binary that detects all the listening ports, their IPs and their command lines --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>	2023-07-06 01:49:32 +03:00
Costa Tsaousis	5be9be7485	rewrite /api/v2/alerts (#15257) * rewrite /api/v2/alerts * implement searching for transition * Find transition id and issue callback * Fix parameters * call and transition filter * Search with transition as well * renames and cleanup * render flags * what if scenario for moving transitions at the top level * If transition is given, limit the query appropriately * Add alert transitions * Optimize find transition to use prepared query Drop temp table properly * enabled alert instances again * Order by when key * Order by global_id * Return last X transitions * updated field names * add ati to configurations and show all keys in debug mode * Code cleanup and optimizations * Drop temp table in case of error * Finalize temp table population statement to prevent memory leak * final changes --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>	2023-06-28 23:14:10 +03:00
Costa Tsaousis	0d61c11b5f	use gperf for the pluginsd/streaming parser hashtable (#15251) * use gperf for the pluginsd parser * simplify pluginsd_parser by removing void pointers to user * pluginsd_split_words() with inlined pluginsd_space() * quoted_string_splitter() now uses a map instead of a function for determining spaces * add stress test for pluginsd parser * optimized BITMAP256 * optimized rrdpush receiver reception * optimized rrdpush sender compression * renames and cleanup * remove wrong negation * unify handshake and disconnection reasons * use parser_find_keyword * register job names only for the current repertoire	2023-06-26 14:00:59 +03:00
Stelios Fragkakis	f3efdba1a0	New alerts endpoint (#15232) * alerts / alerts_log v2 * Add global_id to ae Populate entries with global id * Remove transition id from template Change history to instances * Link ae to rc in all cases Code cleanup	2023-06-22 01:16:57 +03:00
Costa Tsaousis	c980f48dda	/api/v2 improvements (#15227) * readers should be able to recursively acquire the lock, even when there is a writer waiting * added health section into nodes * uniformity of nodes * nodes instances should not return node info; http_api_v2 capability should be version 4 everywhere * added /api/v2/versions * added /api/v2/functions * /api/v2/version should be neat	2023-06-21 22:31:58 +03:00
Costa Tsaousis	a8da697819	Fix /api/v2/contexts,nodes,nodes_instances,q before match (#15223) * readers should be able to recursively acquire the lock, even when there is a writer waiting * in /api/v2/contexts/nodes/nodes_instances/q calls, when the context is collected, before should be matched against now, not the latest cached retention	2023-06-20 21:55:08 +03:00
Costa Tsaousis	43c749b07d	Obvious memory reductions (#15204) * remove rd->update_every * reduce amount of memory for RRDDIM * reorganize rrddim->db entries * optimize rrdset and statsd * optimize dictionaries * RW_SPINLOCK for dictionaries * fix codeql warning * rw_spinlock improvements * remove obsolete assertion * fix crash on health_alarm_log_process() * use RW_SPINLOCK for AVL trees * add RW_SPINLOCK read/write trylock * pgc and mrg now use rw_spinlocks; cache line optimizations for mrg * thread tag of dbengine init * append created datafile, lockless * make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use * thread cancelability in spinlocks; optimize thread cancelability management * introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files * use the last timestamp of each journal file for indexing * when the previous cannot be found, start from the beginning * add more stats to PDC to trace routing easier * rename spinlock functions * fix for spinlock renames * revert statsd socket statistics to size_t * turn fatal into internal_fatal() * show candidates always * show connected status and connection attempts	2023-06-19 23:19:36 +03:00
Costa Tsaousis	0b4f820e9d	/api/v2/nodes and streaming function (#15168 ) * dummy streaming function * expose global functions upstream * separate function for pushing global functions * add missing conditions * allow streaming function to run async * started internal API for functions * cache host retention and expose it to /api/v2/nodes * internal API for function table fields; more progress on streaming status * abstracted and unified rrdhost status * port old coverity warning fix - although it is not needed * add ML information to rrdhost status * add ML capability to streaming to signal the transmission of ML information; added ML information to host status * protect host->receiver * count metrics and instances per host * exposed all inbound and outbound streaming * fix for ML status and dependency of DATA_WITH_ML to INTERPOLATED, not IEEE754 * update ML dummy * added all fields * added streaming group by and cleaned up accepted values by cloud * removed type * Revert "removed type" This reverts commit `faae4177e6`. 
* added context to db summary * new /api/v2/nodes schema * added ML type * change default function charts * log to trace new capa * add more debug * removed debugging code * retry on receive interrupted read; respect sender reconnect delay in all cases * set disconnected host flag and manipulate localhost child count atomically, inside set/clear receiver * fix infinite loop * send_to_plugin() now has a spinlock to ensure that only 1 thread is writing to the plugin/child at the same time * global cloud_status() call * cloud should be a section, since it will contain error information * put cloud capabilities into cloud * aclk status in /api/v2 agents sections * keep aclk_connection_counter * updates on /api/v2/nodes * final /api/v2/nodes and addition of /api/v2/nodes_instances * parametrize all /api/v2/xxx output to control which info is outputed per endpoint * always accept nodes selector * st needs to be per instance, not per node * fix merging of contexts; fix cups plugin priorities * add after and before parameters to /api/v2/contexts/nodes/nodes_instances/q * give each libuv worker a unique id * aclk http_api_v2 version 4	2023-06-19 20:52:35 +03:00
Costa Tsaousis	80d83b7bd1	api v2 nodes for streaming statuses (#15162) * api v2 nodes for streaming statuses * remove test * move parts of the output * in api/v2/data return 5 values per point when aggregation=percentage and raw option is given; return final values when aggregation=percentage is not the final grouping	2023-06-08 16:33:22 +03:00
Costa Tsaousis	66c8546019	Re-write of SSL support in Netdata; restoration of SIGCHLD; detection of stale plugins; streaming improvements (#15113 ) * add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect * make each streaming destination respect its SSL settings * to not send SSL traffic over non-SSL connection * keep track of outgoing streaming connection attempts * retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ * Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ" This reverts commit `14c858677c`. * cleanup SSL connections properly * initialize SSL in rpt before takeover * sender should free SSL when talking to a non-SSL destination * do not shutdown SSL when receiver exits * restore operation of SIGCHLD when the reaper is not enabled * create an fgets function that checks for data and times out * work on error handling of plugins exiting * remove newlines from logs * global call to waitid(), caching the result for netdata_pclose() to process * receiver tid * parser timeouts in 2 minutes instead of 10 * fix crash when UUID is NULL in SQLite * abstract sqlite3 parsing for uuid and text * write proper ssl errors on read and write * fix for SSL_ERROR_WANT_RETRY_VERIFY * SSL WANT per function * unified SSL error logging * fix compilation warning * additional logging about parser cleanup * streaming parser should call the pluginsd parser cleanup * SSL error handling work * SSL initialization unification * check for pending data when receiving SSL response with timeout * macro to check if an SSL connection has been established * remove SSL_pending() * check for SSL macros * use SSL_peek() to find if there is a response * SSL renames * more SSL renames & cleanup * rrdpush ssl connection function * abstract all SSL functions into security.c * keep track of SSL connections and always attempt to use SSL read/write when on SSL connection * signal openssl to skip certificate 
validation when configured to do so * better SSL error handling and logging * SSL code cleanup * SSL retry on SSL_connect and SSL_accept * SSL provide default return value for old compilers * SSL read/write functions emulate system read/write functions * fix receive/send timeout and switch from SSL_peek() to SSL_pending() * remove SSL_pending() * removed sender auto-retry and debug info for initial receiver response * ssl skip certificate verification config for web server * ssl errors log ip and port of the peer * keep ssl with web_client for its whole lifetime * thread safe socket peers to text * use error_limit() for common ssl errors * cleanup * more cleanup * coverity fixes * ssl error logs include both local and remote ip/port info * remove obsolete code	2023-06-07 21:10:27 +03:00
Costa Tsaousis	d395333d44	On data and weight queries now instances filter matches also instance_id@node_id (#15021) instances filter now matches also instance_id@node_id	2023-05-05 22:10:06 +03:00
Costa Tsaousis	204dd9ae27	Boost dbengine (#14832) * configure extent cache size * workers can now execute up to 10 jobs in a run, boosting query prep and extent reads * fix dispatched and executing counters * boost to the max * increase libuv worker threads * query prep always get more prio than extent reads; stop processing in batch when dbengine queue is critical * fix accounting of query prep * inlining of time-grouping functions, to speed up queries with billions of points * make switching based on a local const variable * print one pending contexts loading message per iteration * inlined store engine query API * inlined storage engine data collection api * inlined all storage engine query ops * eliminate and inline data collection ops * simplified query group-by * more error handling * optimized partial trimming of group-by queries * preparative work to support multiple passes of group-by * more preparative work to support multiple passes of group-by (accepts multiple group-by params) * unified query timings * unified query timings - weights endpoint * query target is no longer a static thread variable - there is a list of cached query targets, each of which is freed every 1000 queries * fix query memory accounting * added summary.dimension[].pri and sorted summary.dimensions based on priority and then name * limit max ACLK WEB response size to 30MB * the response type should be text/plain * more preparative work for multiple group-by passes * create functions for generating group by keys, ids and names * multiple group-by passes are now supported * parse group-by options array also with an index * implemented percentage-of-instance group by function * family is now merged in multi-node contexts * prevent uninitialized use	2023-04-07 21:25:01 +03:00
Timotej S	c09dbb224a	minor - add capability signifying this agent can speak apiv2 (#14817 ) * capa apiv2 * build new cancellation proto * commited on wrong branch :D Revert "build new cancellation proto" This reverts commit `8290422de4`. * use common source of truth for capas in apiv2 * fix for possible races	2023-03-31 01:11:53 +03:00
Costa Tsaousis	8a036f0b24	/api/v2/X part 7 (#14797 ) * /api/v2/weights, points key renamed to result * /api/v2/weights, add node ids in response * /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics * /api/v2/data expose view.dimensions.sts{} * /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses * /api/v2 nodes output now includes the duration of time spent per node * jsonwrap view object renames and cleanup * rework of the statistics returned by the query engine * swagger work * swagger work * more swagger work * updated swagger json * added the remaining of the /api/v2 endpoints to swagger * point.ar has been renamed point.arp * updated weights endpoint * fix compilation warnings	2023-03-28 15:23:03 +03:00
Costa Tsaousis	5eed0545d4	/api/v2/X part 5 (#14718 ) * query timestamps are now pre-determined and alignment on timestamps is guarranteed * turn internal_fatal() to internal_error() to investigate the issue * handle query when no data exist in the db * check for non NULL dict when running dictionary garbage collect * support API v2 requests via ACLK * add nodes detailed information to /api/v2/nodes * fixed keys and added dummy nodes for completeness * added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node * make sure replication does not double count charts that are already being replicated * expose min and max in sts structures * added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters * respect trimming in percentage calculation; updated swagger * api/v2/weights preparative work to support multi-node queries - still single node though * multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data * when passing the raw option, the query exposes the hidden dimensions * fix compilation issues on older systems * the query engine now calculates per dimension min, max, sum, count, anomaly count * use the macro to calculate storage point anomaly rate * weights endpoint exposing version hashes * weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate * query: expose RESET flag; do not add the same point multiple times to the aggregated point * weights: more compact output * weights requests can be interrupted * all /api/v2 requests can be interrupted and timeout * allow relative timestamps in weights * fix macos compilation warnings * Revert "fix macos compilation warnings" This reverts commit `8a1d24e41e`. * /api/v2/data group-by now works on dimension names, not ids * /api/v2/weights does not query metrics without retention and new output format * /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms	2023-03-21 21:53:47 +02:00
Costa Tsaousis	021e252fc5	/api/v2/contexts (#14592 ) * preparation for /api/v2/contexts * working /api/v2/contexts * add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages * minor fix * query targegt now accurately counts hosts, contexts, instances, dimensions, metrics * cleanup /api/v2/contexts * full text search with /api/v2/contexts * simple patterns now support the option to search ignoring case * full text search API with /api/v2/q * simple pattern execution optimization * do not show q when not given * full text search accounting * separated /api/v2/nodes from /api/v2/contexts * fix ssv queries for group_by * count query instances queried and failed per context and host * split rrdcontext.c to multiple files * add query totals * fix anomaly rate calculation; provide "ni" for indexing hosts * do not generate zero valued members * faster calculation of anomaly rate; by just summing integers for each db points and doing math once for every generated point * fix typo when printing dimensions totals * added option minify to remove spaces and newlines fron JSON output * send instance ids and names when they differ * do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe * fix for the previous + renames and code cleanup * when a dimension is filtered, include in the response all the other dimensions that are selectable * do not add nodes that do not have retention in the current window * move selection of dimensions to query_dimension_add(), instead of query_metric_add() * increase the pre-processing capacity of queries * generate instance fqdn ids and names only when they are needed * provide detailed statistics about tiers retention, queries, points, update_every * late allocation of query dimensions * cleanup * more cleanup * support for annotations per displayed point, RESET and PARTIAL * new type annotations * if a chart is not linked to contexts and it is collected, link it when it is collected * make ML run reentrant * make ML rrdr query synchronous * optimize replication memory allocation of replication_sort_entry * change units to percentage, when requesting a coefficinet of variation, or a percentage query * initialize replication before starting main threads * properly decrement no room requests counter * propagate the non-zero flag to group-by * the same by avoiding the extra loop * respect non-zero in all dimension arrays * remove dictionary garbage collection from dictionary_entries() and dictionary_version() * be more verbose when jv2 indexing is postponed * prevent infinite loop * use hidden dimensions even when dimensions pattern is unset * traverse hosts using dictionaries * fix dictionary unittests	2023-03-02 22:50:48 +02:00

42 commits