netdata_netdata

mirror of https://github.com/netdata/netdata.git synced 2025-04-26 22:04:46 +00:00

Author	SHA1	Message	Date
vkalintiris	115d074a6c	Create a top-level directory to contain source code. (#16896 ) * Move ML under src * Move spwan under src * Move cli/ under src/ * move registry/ under src/ * move streaming/ under src/ * Move claim under src. Update docs * Move database/ under src/ * Move libnetdata/ under src/ * Update references to libnetdata * Fix logsmanagement includes * Update generated script path.	2024-02-01 13:41:44 +02:00
Stelios Fragkakis	5e4055624e	Fix coverity issue (#16831 ) CID 414122: Resource leaks (RESOURCE_LEAK)	2024-01-24 10:01:49 +02:00
Stelios Fragkakis	952631b952	Change query label matching logic (#16827 ) * Match multi labels * Rework, add support for weights * Fix function return value * Cleanup function	2024-01-23 20:32:08 +02:00
Costa Tsaousis	f466b8aef5	DYNCFG: dynamically configured alerts (#16779 ) * cleanup alerts * fix references * fix references * fix references * load alerts once and apply them to each node * simplify health_create_alarm_entry() * Compile without warnings with compiler flags: -Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self * code re-organization and cleanup * generate patterns when applying prototypes; give unique dyncfg names to all alerts * eval expressions keep the source and the parsed_as as STRING pointers * renamed host to node in dyncfg ids * renamed host to node in dyncfg ids * add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level * working functionality * code re-organization: moved health event-loop to a new file, moved health globals to health.c * rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instanciated only when they are linked to RRDSETs * dyncfg alert prototypes initialization for alerts * health dyncfg split to separate file * cleanup not-needed code * normalize matches between parsing and json * also detect !* for disabled alerts * dyncfg capability disabled * Store alert config part1 * Add rrdlabels_common_count * wip health variables lookup without indexes * Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter * working variables with runtime lookup * working variables with runtime lookup * delete rrddimvar and rrdfamily index * remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts * added /api/v1/variable that resolves a variable the same way alerts do * remove rrdcalc from eval * remove debug code * remove duplicate assignment * Fix memory leak * all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts * hide all internal structures of EVAL * Enable -Wformat flag Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Adjust binding for calculation, warning, critical * Remove unused macro * Update config hash id * use the right info and summary in alerts log * use synchronous queries for alerts * Handle cases when config_hash_id is missing from health_log * remove deadlock from health worker * parsing to json payload for health alert prototypes * cleaner parsing and avoiding memory leaks in case of duplicate members in json * fix left-over rename of function * Keep original lookup field to send to the cloud Cleanup / rename function to store config Remove unused DEFINEs, functions * Use ac->lookup * link jobs to the host when the template is registered; do not accept running a function without a host * full dyncfg support for health alerts, except action TEST * working dyncfg additions, updates, removals * fixed missing source, wrong status updates * add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint * fix dyncfg unittest * rename functions * generalize the json-c parser macros and move them to libnetdata * report progress when enabling and disabling dyncfg templates * moved rrdcalc and rrdvar to health * update alarms * added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts * enable parsed json alerts; allow sending back accepted but disabled * added format_version for alerts payload; enables/disables status now is also inheritted by the status of the rules; fixed variable names in json output * remove the RRDHOST pointer from DYNCFG * Fix command field submitted to the cloud * do not send updates to creation requests, for DYNCFG jobs --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: ilyam8 <ilya@netdata.cloud>	2024-01-23 20:20:41 +02:00
vkalintiris	bead543ea5	Name storage engine variables consistently. (#16753 ) * Consistent naming of STORAGE_INSTANCE instances. Replace usages of `db_instance` and `instance` with `si`. * Rename array `storage_metrics_groups[tier]` to `smg[tier]` * Rename db_metric_handle to smh * Rename instances of `storage_engine_query_handle` to `seqh`. * Rename instances of STORAGE_ENGINE_BACKEND to `seb`. * Rename instances of STORAGE_COLLECT_HANDLE to `sch`.	2024-01-11 14:17:02 +02:00
Costa Tsaousis	dc7ca8644f	code cleanup (#16542 ) fixed minor code cleanup warnings	2023-12-12 18:12:43 +02:00
Costa Tsaousis	2175104d41	Faster parents (#16127 ) * cache ctx in collection handle * cache rd together with rda * do not repeatedy call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension() * fix unit tests * do the absolutely minimum while updating timestamps, ensure validity during reading them * when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only. * remove the spinlock from mrg * remove the metric flags that are not used any more * mrg writers can be different threads * update first time when latest clean is also updated * cleanup * set hot page with a simple atomic operation * sender sets chart slot for every chart * work on senders without SLOT * enable SLOT capability * send slot at BEGIN when SLOT is enabled * fix slot generation and parsing * send slot while re-streaming * use the sender capabilities, not the receiver * cleanup * add slots support to all chart and dimension related plugin commands * fix condition * fix calculation * check sender capabilties * assign slots in constructors * we need the dimension slot at the DIMENSION keyword * more debug info in case of dimension mismatch * ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely * fix renumbering on child restart * reset rda caching when receiving a chart definition * optimize pluginsd_end_v2() * do not do zero sized allocations * trust the chart slot id of the child * cleanup charts on pluginsd thread exit * better cleanup * find the chart and put it in the slot, if it not already there * move slots array to host * initialize pluginsd slots properly * add slots to replay begin; do not cleanup slots that dont belong to a chart * cleanup on obsolete * cleanup slots on obsoletions * cleanup and renames about obsoletion * rewrite obsolation service code to remove race conditions * better service obsoletion log * added debugging * more debug * exposed flag now compares versions * removed debugging messages * respolve conflicts * fix replication check for unsent dimensions	2023-10-27 22:42:29 +03:00
Costa Tsaousis	9fd9823e07	journal: fix the 1 second latency in play mode (#16123 ) provide a relative_to_absolute function that does not touch the current realtime time	2023-10-04 20:54:16 +03:00
Costa Tsaousis	ce75313de0	systemd-journal plugin (#15363 )	2023-08-03 15:42:11 +03:00
vkalintiris	0e230a260e	Revert "Refactor RRD code. (#15423 )" (#15723 ) This reverts commit `440bd51e08`. dbengine was still being used for non-zero tiers even on non-dbengine modes.	2023-08-03 13:13:36 +03:00
Costa Tsaousis	6daedef25b	fix expiration dates for API responses (#15546 )	2023-07-26 16:10:31 +03:00
vkalintiris	440bd51e08	Refactor RRD code. (#15423 ) * Storage engine. * Host indexes to rrdb * Move globals to rrdb * Move storage_tiers_backfill to rrdb * default_rrd_update_every to rrdb * default_rrd_history_entries to rrdb * gap_when_lost_iterations_above to rrdb * rrdset_free_obsolete_time_s to rrdb * libuv_worker_threads to rrdb * ieee754_doubles to rrdb * rrdhost_free_orphan_time_s to rrdb * rrd_rwlock to rrdb * localhost to rrdb * rm extern from func decls * mv rrd macro under rrd.h * default_rrdeng_page_cache_mb to rrdb * default_rrdeng_extent_cache_mb to rrdb * db_engine_journal_check to rrdb * default_rrdeng_disk_quota_mb to rrdb * default_multidb_disk_quota_mb to rrdb * multidb_ctx to rrdb * page_type_size to rrdb * tier_page_size to rrdb * No storage_engine_id in rrdim functions * storage_engine_id is provided by st * Update to fix merge conflict. * Update field name * Remove unnecessary macros from rrd.h * Rm unused type decls * Rm duplicate func decls * make internal function static * Make the rest of public dbengine funcs accept a storage_instance. * No more rrdengine_instance :) * rm rrdset_debug from rrd.h * Use rrdb to access globals in ML and ACLK Missed due to not having the submodules in the worktree. * rm total_number * rm RRDVAR_TYPE_TOTAL * rm unused inline * Rm names from typedef'd enums * rm unused header include * Move include * Rm unused header include * s/rrdhost_find_or_create/rrdhost_get_or_create/g * s/find_host_by_node_id/rrdhost_find_by_node_id/ Also, remove duplicate definition in rrdcontext.c * rm macro used only once * rm macro used only once * Reduce rrd.h api by moving funcs into a collector specific utils header * Remove unused func * Move parser specific function out of rrd.h * return storage_number instead of void pointer * move code related to rrd initialization out of rrdhost.c * Remove tier_grouping from rrdim_tier Saves 8 * storage_tiers bytes per dimension. * Fix rebase * s/rrd_update_every/update_every/ * Mark functions as static and constify args * Add license notes and file to build systems. * Remove remaining non-log/config mentions of memory mode * Move rrdlabels api to separate file. Also, move localhost functions that loads labels outside of database/ and into daemon/ * Remove function decl in rrd.h * merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir * Do not expose internal function from rrd.h * Rm NETDATA_RRD_INTERNALS Only one function decl is covered. We have more database internal functions that we currently expose for no good reason. These will be placed in a separate internal header in follow up PRs. * Add license note * Include libnetdata.h instead of aral.h * Use rrdb to access localhost * Fix builds without dbengine * Add header to build system files * Add rrdlabels.h to build systems * Move func def from rrd.h to rrdhost.c * Fix macos build * Rm non-existing function * Rebase master * Define buffer length macro in ad_charts. * Fix FreeBSD builds. * Mark functions static * Rm func decls without definitions * Rebase master * Rebase master * Properly initialize value of storage tiers. * Fix build after rebase.	2023-07-26 15:30:49 +03:00
Stelios Fragkakis	b12edb1208	Use spinlock in host and chart (#15328 ) * Switch alarm log lock to spinlock * Switch the alerts lock in the chart structure to spinlock * Proper lock usage	2023-07-10 14:13:50 +03:00
thiagoftsm	e0f388c43f	Rename generic `error` function (#15296 )	2023-07-06 15:46:48 +00:00
Costa Tsaousis	43c749b07d	Obvious memory reductions (#15204 ) * remove rd->update_every * reduce amount of memory for RRDDIM * reorgnize rrddim->db entries * optimize rrdset and statsd * optimize dictionaries * RW_SPINLOCK for dictionaries * fix codeql warning * rw_spinlock improvements * remove obsolete assertion * fix crash on health_alarm_log_process() * use RW_SPINLOCK for AVL trees * add RW_SPINLOCK read/write trylock * pgc and mrg now use rw_spinlocks; cache line optimizations for mrg * thread tag of dbegnine init * append created datafile, lockless * make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use * thread cancelability in spinlocks; optimize thread cancelability management * introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files * use the last timestamp of each journal file for indexing * when the previous cannot be found, start from the beginning * add more stats to PDC to trace routing easier * rename spinlock functions * fix for spinlock renames * revert statsd socket statistics to size_t * turn fatal into internal_fatal() * show candidates always * show connected status and connection attempts	2023-06-19 23:19:36 +03:00
Costa Tsaousis	8bf58525b1	fix the units when returning percentage of a group (#15105 )	2023-05-26 20:33:53 +03:00
Costa Tsaousis	d395333d44	On data and weight queries now instances filter matches also instance_id@node_id (#15021 ) instances filter now matches also instance_id@node_id	2023-05-05 22:10:06 +03:00
Costa Tsaousis	204dd9ae27	Boost dbengine (#14832 ) * configure extent cache size * workers can now execute up to 10 jobs in a run, boosting query prep and extent reads * fix dispatched and executing counters * boost to the max * increase libuv worker threads * query prep always get more prio than extent reads; stop processing in batch when dbengine is queue is critical * fix accounting of query prep * inlining of time-grouping functions, to speed up queries with billions of points * make switching based on a local const variable * print one pending contexts loading message per iteration * inlined store engine query API * inlined storage engine data collection api * inlined all storage engine query ops * eliminate and inline data collection ops * simplified query group-by * more error handling * optimized partial trimming of group-by queries * preparative work to support multiple passes of group-by * more preparative work to support multiple passes of group-by (accepts multiple group-by params) * unified query timings * unified query timings - weights endpoint * query target is no longer a static thread variable - there is a list of cached query targets, each of which of freed every 1000 queries * fix query memory accounting * added summary.dimension[].pri and sorted summary.dimensions based on priority and then name * limit max ACLK WEB response size to 30MB * the response type should be text/plain * more preparative work for multiple group-by passes * create functions for generating group by keys, ids and names * multiple group-by passes are now supported * parse group-by options array also with an index * implemented percentage-of-instance group by function * family is now merged in multi-node contexts * prevent uninitialized use	2023-04-07 21:25:01 +03:00
Costa Tsaousis	8a036f0b24	/api/v2/X part 7 (#14797 ) * /api/v2/weights, points key renamed to result * /api/v2/weights, add node ids in response * /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics * /api/v2/data expose view.dimensions.sts{} * /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses * /api/v2 nodes output now includes the duration of time spent per node * jsonwrap view object renames and cleanup * rework of the statistics returned by the query engine * swagger work * swagger work * more swagger work * updated swagger json * added the remaining of the /api/v2 endpoints to swagger * point.ar has been renamed point.arp * updated weights endpoint * fix compilation warnings	2023-03-28 15:23:03 +03:00
Costa Tsaousis	5eed0545d4	/api/v2/X part 5 (#14718 ) * query timestamps are now pre-determined and alignment on timestamps is guarranteed * turn internal_fatal() to internal_error() to investigate the issue * handle query when no data exist in the db * check for non NULL dict when running dictionary garbage collect * support API v2 requests via ACLK * add nodes detailed information to /api/v2/nodes * fixed keys and added dummy nodes for completeness * added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node * make sure replication does not double count charts that are already being replicated * expose min and max in sts structures * added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters * respect trimming in percentage calculation; updated swagger * api/v2/weights preparative work to support multi-node queries - still single node though * multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data * when passing the raw option, the query exposes the hidden dimensions * fix compilation issues on older systems * the query engine now calculates per dimension min, max, sum, count, anomaly count * use the macro to calculate storage point anomaly rate * weights endpoint exposing version hashes * weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate * query: expose RESET flag; do not add the same point multiple times to the aggregated point * weights: more compact output * weights requests can be interrupted * all /api/v2 requests can be interrupted and timeout * allow relative timestamps in weights * fix macos compilation warnings * Revert "fix macos compilation warnings" This reverts commit `8a1d24e41e`. * /api/v2/data group-by now works on dimension names, not ids * /api/v2/weights does not query metrics without retention and new output format * /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms	2023-03-21 21:53:47 +02:00
Costa Tsaousis	cf85c3b0e9	/api/v2/X improvements part 3 (#14665 ) * max web request size to 64KB * fix the request too big message * increase max request reading tries to 100 * support for bigger web requests * add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too * better implementation for partial data trimming * added group_by=selected to return only one dimension for all selected metrics * fix acceptance of group_by=selected * passing option "raw" disables partial data trimming * remove obsolete option "plan"; use "debug" * fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing * per row annotations * added time column to point annotations * disable caching for /api/v2/contexts responses * added api format json2 that returns an array for each points, having all the point values and annotations in them * work on swagger about /api/v2 * prevent infinite loop * cleanup and swagger work * allow negative simple pattern expressions to work as expected * do not lookup in the dictionary empty names * garbage collect dictionaries * make query_target allocate less aggressively; queries fill the remaining points with nulls * reusable query ops to save memory on huge queries * move parts of query plans into query ops to save query target memory * remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed	2023-03-10 12:41:14 +02:00
Costa Tsaousis	021e252fc5	/api/v2/contexts (#14592 ) * preparation for /api/v2/contexts * working /api/v2/contexts * add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages * minor fix * query targegt now accurately counts hosts, contexts, instances, dimensions, metrics * cleanup /api/v2/contexts * full text search with /api/v2/contexts * simple patterns now support the option to search ignoring case * full text search API with /api/v2/q * simple pattern execution optimization * do not show q when not given * full text search accounting * separated /api/v2/nodes from /api/v2/contexts * fix ssv queries for group_by * count query instances queried and failed per context and host * split rrdcontext.c to multiple files * add query totals * fix anomaly rate calculation; provide "ni" for indexing hosts * do not generate zero valued members * faster calculation of anomaly rate; by just summing integers for each db points and doing math once for every generated point * fix typo when printing dimensions totals * added option minify to remove spaces and newlines fron JSON output * send instance ids and names when they differ * do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe * fix for the previous + renames and code cleanup * when a dimension is filtered, include in the response all the other dimensions that are selectable * do not add nodes that do not have retention in the current window * move selection of dimensions to query_dimension_add(), instead of query_metric_add() * increase the pre-processing capacity of queries * generate instance fqdn ids and names only when they are needed * provide detailed statistics about tiers retention, queries, points, update_every * late allocation of query dimensions * cleanup * more cleanup * support for annotations per displayed point, RESET and PARTIAL * new type annotations * if a chart is not linked to contexts and it is collected, link it when it is collected * make ML run reentrant * make ML rrdr query synchronous * optimize replication memory allocation of replication_sort_entry * change units to percentage, when requesting a coefficinet of variation, or a percentage query * initialize replication before starting main threads * properly decrement no room requests counter * propagate the non-zero flag to group-by * the same by avoiding the extra loop * respect non-zero in all dimension arrays * remove dictionary garbage collection from dictionary_entries() and dictionary_version() * be more verbose when jv2 indexing is postponed * prevent infinite loop * use hidden dimensions even when dimensions pattern is unset * traverse hosts using dictionaries * fix dictionary unittests	2023-03-02 22:50:48 +02:00

22 commits