mirror of https://github.com/netdata/netdata.git synced 2025-04-26 22:04:46 +00:00
Commit graph

22 commits

Author SHA1 Message Date
vkalintiris
115d074a6c
Create a top-level directory to contain source code. ()
* Move ML under src

* Move spawn under src

* Move cli/ under src/

* move registry/ under src/

* move streaming/ under src/

* Move claim under src. Update docs

* Move database/ under src/

* Move libnetdata/ under src/

* Update references to libnetdata

* Fix logsmanagement includes

* Update generated script path.
2024-02-01 13:41:44 +02:00
Stelios Fragkakis
5e4055624e
Fix coverity issue ()
CID 414122:  Resource leaks  (RESOURCE_LEAK)
2024-01-24 10:01:49 +02:00
Stelios Fragkakis
952631b952
Change query label matching logic ()
* Match multi labels

* Rework, add support for weights

* Fix function return value

* Cleanup function
2024-01-23 20:32:08 +02:00
Costa Tsaousis
f466b8aef5
DYNCFG: dynamically configured alerts ()
* cleanup alerts

* fix references

* fix references

* fix references

* load alerts once and apply them to each node

* simplify health_create_alarm_entry()

* Compile without warnings with compiler flags:

   -Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self

* code re-organization and cleanup

* generate patterns when applying prototypes; give unique dyncfg names to all alerts

* eval expressions keep the source and the parsed_as as STRING pointers

* renamed host to node in dyncfg ids

* renamed host to node in dyncfg ids

* add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level

* working functionality

* code re-organization: moved health event-loop to a new file, moved health globals to health.c

* rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instantiated only when they are linked to RRDSETs

* dyncfg alert prototypes initialization for alerts

* health dyncfg split to separate file

* cleanup not-needed code

* normalize matches between parsing and json

* also detect !* for disabled alerts

* dyncfg capability disabled

* Store alert config part1

* Add rrdlabels_common_count

* wip health variables lookup without indexes

* Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter

* working variables with runtime lookup

* working variables with runtime lookup

* delete rrddimvar and rrdfamily index

* remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts

* added /api/v1/variable that resolves a variable the same way alerts do

* remove rrdcalc from eval

* remove debug code

* remove duplicate assignment

* Fix memory leak

* all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts

* hide all internal structures of EVAL

* Enable -Wformat flag

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Adjust binding for calculation, warning, critical

* Remove unused macro

* Update config hash id

* use the right info and summary in alerts log

* use synchronous queries for alerts

* Handle cases when config_hash_id is missing from health_log

* remove deadlock from health worker

* parsing to json payload for health alert prototypes

* cleaner parsing and avoiding memory leaks in case of duplicate members in json

* fix left-over rename of function

* Keep original lookup field to send to the cloud
Cleanup / rename function to store config
Remove unused DEFINEs, functions

* Use ac->lookup

* link jobs to the host when the template is registered; do not accept running a function without a host

* full dyncfg support for health alerts, except action TEST

* working dyncfg additions, updates, removals

* fixed missing source, wrong status updates

* add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint

* fix dyncfg unittest

* rename functions

* generalize the json-c parser macros and move them to libnetdata

* report progress when enabling and disabling dyncfg templates

* moved rrdcalc and rrdvar to health

* update alarms

* added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts

* enable parsed json alerts; allow sending back accepted but disabled

* added format_version for alerts payload; enabled/disabled status is now also inherited by the status of the rules; fixed variable names in json output

* remove the RRDHOST pointer from DYNCFG

* Fix command field submitted to the cloud

* do not send updates to creation requests, for DYNCFG jobs

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
2024-01-23 20:20:41 +02:00
vkalintiris
bead543ea5
Name storage engine variables consistently. ()
* Consistent naming of STORAGE_INSTANCE instances.

Replace usages of `db_instance` and `instance` with
`si`.

* Rename array `storage_metrics_groups[tier]` to `smg[tier]`

* Rename db_metric_handle to smh

* Rename instances of `storage_engine_query_handle` to `seqh`.

* Rename instances of STORAGE_ENGINE_BACKEND to `seb`.

* Rename instances of STORAGE_COLLECT_HANDLE to `sch`.
2024-01-11 14:17:02 +02:00
Costa Tsaousis
dc7ca8644f
code cleanup ()
fixed minor code cleanup warnings
2023-12-12 18:12:43 +02:00
Costa Tsaousis
2175104d41
Faster parents ()
* cache ctx in collection handle

* cache rd together with rda

* do not repeatedly call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension()

* fix unit tests

* do the absolutely minimum while updating timestamps, ensure validity during reading them

* when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only.

* remove the spinlock from mrg

* remove the metric flags that are not used any more

* mrg writers can be different threads

* update first time when latest clean is also updated

* cleanup

* set hot page with a simple atomic operation

* sender sets chart slot for every chart

* work on senders without SLOT

* enable SLOT capability

* send slot at BEGIN when SLOT is enabled

* fix slot generation and parsing

* send slot while re-streaming

* use the sender capabilities, not the receiver

* cleanup

* add slots support to all chart and dimension related plugin commands

* fix condition

* fix calculation

* check sender capabilities

* assign slots in constructors

* we need the dimension slot at the DIMENSION keyword

* more debug info in case of dimension mismatch

* ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely

* fix renumbering on child restart

* reset rda caching when receiving a chart definition

* optimize pluginsd_end_v2()

* do not do zero sized allocations

* trust the chart slot id of the child

* cleanup charts on pluginsd thread exit

* better cleanup

* find the chart and put it in the slot, if it is not already there

* move slots array to host

* initialize pluginsd slots properly

* add slots to replay begin; do not clean up slots that don't belong to a chart

* cleanup on obsolete

* cleanup slots on obsoletions

* cleanup and renames about obsoletion

* rewrite obsoletion service code to remove race conditions

* better service obsoletion log

* added debugging

* more debug

* exposed flag now compares versions

* removed debugging messages

* resolve conflicts

* fix replication check for unsent dimensions
2023-10-27 22:42:29 +03:00
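Several bullets in the "Faster parents" commit describe per-chart slots: the sender assigns a slot id to every chart, the receiver trusts the child's id, and the slots array lives on the host. Below is a minimal sketch of such a slot cache, assuming a growable array of chart pointers indexed by slot id, with a fallback lookup by chart id when the slot is empty or stale. `CHART`, `HOST` and the `host_slot_*` functions are hypothetical and are not netdata's structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chart and host types, for illustration only. */
typedef struct chart { const char *id; } CHART;

typedef struct host {
    CHART **slots;  /* slots[i] caches the chart the child assigned to slot i */
    size_t size;    /* number of allocated entries */
} HOST;

/* Store `ch` at `slot`, growing the array on demand (never a zero-sized allocation). */
static int host_slot_set(HOST *h, size_t slot, CHART *ch) {
    assert(ch != NULL);
    if (slot >= h->size) {
        size_t new_size = h->size ? h->size : 16;
        while (new_size <= slot) new_size *= 2;
        CHART **p = realloc(h->slots, new_size * sizeof(*p));
        if (!p) return -1;
        /* zero only the newly added entries */
        memset(p + h->size, 0, (new_size - h->size) * sizeof(*p));
        h->slots = p;
        h->size = new_size;
    }
    h->slots[slot] = ch;
    return 0;
}

/* O(1) lookup. Returns NULL when the slot is unknown or holds a different chart;
 * the caller then falls back to a lookup by id and calls host_slot_set(). */
static CHART *host_slot_get(const HOST *h, size_t slot, const char *id) {
    if (slot >= h->size || !h->slots[slot]) return NULL;
    CHART *ch = h->slots[slot];
    return strcmp(ch->id, id) == 0 ? ch : NULL;
}

/* Forget a slot, e.g. when its chart becomes obsolete. */
static void host_slot_clear(HOST *h, size_t slot) {
    if (slot < h->size) h->slots[slot] = NULL;
}
```

Keeping the id check in `host_slot_get()` means that a child restart that renumbers slots degrades to a slower lookup instead of returning the wrong chart.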
Costa Tsaousis
9fd9823e07
journal: fix the 1 second latency in play mode ()
provide a relative_to_absolute function that does not touch the current realtime time
2023-10-04 20:54:16 +03:00
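The journal fix above introduces a `relative_to_absolute` function that does not read the current realtime clock itself. Below is a minimal sketch of that idea. The function name comes from the commit message, but the signature is an assumption, and so is the convention that values whose magnitude is at most about three years are offsets from `now` (`RELATIVE_TIME_MAX_S` is a hypothetical constant).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical threshold: magnitudes up to ~3 years are treated as relative. */
#define RELATIVE_TIME_MAX_S (3 * 365 * 86400)

/* Convert a possibly relative timestamp to an absolute one.
 * `now` is supplied by the caller, so this function never reads the clock
 * and repeated calls within one request agree on the same reference time. */
static time_t relative_to_absolute(time_t t, time_t now) {
    assert(now > RELATIVE_TIME_MAX_S); /* sanity: `now` must be a real epoch time */
    if (t >= -RELATIVE_TIME_MAX_S && t <= RELATIVE_TIME_MAX_S)
        return now + t; /* e.g. -600 means "10 minutes before now" */
    return t;           /* already absolute */
}
```

Computing `now` once per request (for example with `time(NULL)`) and passing it to every conversion keeps `after` and `before` consistent with each other. This consistency is what disappears when each conversion samples the clock on its own.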
Costa Tsaousis
ce75313de0
systemd-journal plugin () 2023-08-03 15:42:11 +03:00
vkalintiris
0e230a260e
Revert "Refactor RRD code. ()" ()
This reverts commit 440bd51e08.

dbengine was still being used for non-zero tiers
even on non-dbengine modes.
2023-08-03 13:13:36 +03:00
Costa Tsaousis
6daedef25b
fix expiration dates for API responses () 2023-07-26 16:10:31 +03:00
vkalintiris
440bd51e08
Refactor RRD code. ()
* Storage engine.

* Host indexes to rrdb

* Move globals to rrdb

* Move storage_tiers_backfill to rrdb

* default_rrd_update_every to rrdb

* default_rrd_history_entries to rrdb

* gap_when_lost_iterations_above to rrdb

* rrdset_free_obsolete_time_s to rrdb

* libuv_worker_threads to rrdb

* ieee754_doubles to rrdb

* rrdhost_free_orphan_time_s to rrdb

* rrd_rwlock to rrdb

* localhost to rrdb

* rm extern from func decls

* mv rrd macro under rrd.h

* default_rrdeng_page_cache_mb to rrdb

* default_rrdeng_extent_cache_mb to rrdb

* db_engine_journal_check to rrdb

* default_rrdeng_disk_quota_mb to rrdb

* default_multidb_disk_quota_mb to rrdb

* multidb_ctx to rrdb

* page_type_size to rrdb

* tier_page_size to rrdb

* No storage_engine_id in rrdim functions

* storage_engine_id is provided by st

* Update to fix merge conflict.

* Update field name

* Remove unnecessary macros from rrd.h

* Rm unused type decls

* Rm duplicate func decls

* make internal function static

* Make the rest of public dbengine funcs accept a storage_instance.

* No more rrdengine_instance :)

* rm rrdset_debug from rrd.h

* Use rrdb to access globals in ML and ACLK

Missed due to not having the submodules in the
worktree.

* rm total_number

* rm RRDVAR_TYPE_TOTAL

* rm unused inline

* Rm names from typedef'd enums

* rm unused header include

* Move include

* Rm unused header include

* s/rrdhost_find_or_create/rrdhost_get_or_create/g

* s/find_host_by_node_id/rrdhost_find_by_node_id/

Also, remove duplicate definition in rrdcontext.c

* rm macro used only once

* rm macro used only once

* Reduce rrd.h api by moving funcs into a collector specific utils header

* Remove unused func

* Move parser specific function out of rrd.h

* return storage_number instead of void pointer

* move code related to rrd initialization out of rrdhost.c

* Remove tier_grouping from rrdim_tier

Saves 8 * storage_tiers bytes per dimension.

* Fix rebase

* s/rrd_update_every/update_every/

* Mark functions as static and constify args

* Add license notes and file to build systems.

* Remove remaining non-log/config mentions of memory mode

* Move rrdlabels api to separate file.

Also, move localhost functions that load
labels outside of database/ and into daemon/

* Remove function decl in rrd.h

* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir

* Do not expose internal function from rrd.h

* Rm NETDATA_RRD_INTERNALS

Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.

* Add license note

* Include libnetdata.h instead of aral.h

* Use rrdb to access localhost

* Fix builds without dbengine

* Add header to build system files

* Add rrdlabels.h to build systems

* Move func def from rrd.h to rrdhost.c

* Fix macos build

* Rm non-existing function

* Rebase master

* Define buffer length macro in ad_charts.

* Fix FreeBSD builds.

* Mark functions static

* Rm func decls without definitions

* Rebase master

* Rebase master

* Properly initialize value of storage tiers.

* Fix build after rebase.
2023-07-26 15:30:49 +03:00
Stelios Fragkakis
b12edb1208
Use spinlock in host and chart ()
* Switch alarm log lock to spinlock

* Switch the alerts lock in the chart structure to spinlock

* Proper lock usage
2023-07-10 14:13:50 +03:00
thiagoftsm
e0f388c43f
Rename generic error function () 2023-07-06 15:46:48 +00:00
Costa Tsaousis
43c749b07d
Obvious memory reductions ()
* remove rd->update_every

* reduce amount of memory for RRDDIM

* reorganize rrddim->db entries

* optimize rrdset and statsd

* optimize dictionaries

* RW_SPINLOCK for dictionaries

* fix codeql warning

* rw_spinlock improvements

* remove obsolete assertion

* fix crash on health_alarm_log_process()

* use RW_SPINLOCK for AVL trees

* add RW_SPINLOCK read/write trylock

* pgc and mrg now use rw_spinlocks; cache line optimizations for mrg

* thread tag of dbengine init

* append created datafile, lockless

* make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use

* thread cancelability in spinlocks; optimize thread cancelability management

* introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files

* use the last timestamp of each journal file for indexing

* when the previous cannot be found, start from the beginning

* add more stats to PDC to make routing easier to trace

* rename spinlock functions

* fix for spinlock renames

* revert statsd socket statistics to size_t

* turn fatal into internal_fatal()

* show candidates always

* show connected status and connection attempts
2023-06-19 23:19:36 +03:00
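The commit above indexes datafiles by the last timestamp of each journal file, using a JudyL array, so that a query can jump straight to the first relevant file and start from the beginning when no earlier file is found. The sketch below swaps JudyL for a sorted C array with binary search. This gives the same "first file whose last timestamp is >= query start" lookup, but it is not netdata's code, and `first_relevant_datafile()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Return the index of the first datafile whose last timestamp is >= `after`,
 * given `last_ts` sorted ascending (one entry per datafile, oldest first).
 * Returns 0 when `after` precedes every file (start from the beginning)
 * and `n` when every file ends before `after` (nothing to read). */
static size_t first_relevant_datafile(const time_t *last_ts, size_t n, time_t after) {
    assert(last_ts != NULL || n == 0);
    size_t lo = 0, hi = n;          /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (last_ts[mid] < after)
            lo = mid + 1;           /* file `mid` ends too early: skip it */
        else
            hi = mid;               /* file `mid` may hold data at `after` */
    }
    return lo;
}
```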
Costa Tsaousis
8bf58525b1
fix the units when returning percentage of a group () 2023-05-26 20:33:53 +03:00
Costa Tsaousis
d395333d44
On data and weight queries now instances filter matches also instance_id@node_id ()
instances filter now matches also instance_id@node_id
2023-05-05 22:10:06 +03:00
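With the change above, the instances filter accepts either a bare instance id or the composite `instance_id@node_id`. Below is a minimal sketch of that rule, assuming exact string comparison. The real filters use simple patterns with wildcards, and `instance_filter_matches()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true when `filter` equals either `instance_id` or "instance_id@node_id".
 * Exact matching only; wildcard support is out of scope for this sketch. */
static bool instance_filter_matches(const char *filter, const char *instance_id, const char *node_id) {
    assert(filter && instance_id && node_id);
    if (strcmp(filter, instance_id) == 0)
        return true;

    size_t id_len = strlen(instance_id);
    /* Compare without building the composite string: prefix, then '@', then node id.
     * If the prefix matched, `filter` has at least `id_len` characters, so
     * filter[id_len] is a valid read. */
    return strncmp(filter, instance_id, id_len) == 0
        && filter[id_len] == '@'
        && strcmp(filter + id_len + 1, node_id) == 0;
}
```

Qualifying an instance with its node id is useful on parents, where several nodes can expose instances with the same id.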
Costa Tsaousis
204dd9ae27
Boost dbengine ()
* configure extent cache size

* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads

* fix dispatched and executing counters

* boost to the max

* increase libuv worker threads

* query prep always gets more priority than extent reads; stop processing in batches when the dbengine queue is critical

* fix accounting of query prep

* inlining of time-grouping functions, to speed up queries with billions of points

* make switching based on a local const variable

* print one pending contexts loading message per iteration

* inlined storage engine query API

* inlined storage engine data collection api

* inlined all storage engine query ops

* eliminate and inline data collection ops

* simplified query group-by

* more error handling

* optimized partial trimming of group-by queries

* preparative work to support multiple passes of group-by

* more preparative work to support multiple passes of group-by (accepts multiple group-by params)

* unified query timings

* unified query timings - weights endpoint

* query target is no longer a static thread variable - there is a list of cached query targets, each of which is freed every 1000 queries

* fix query memory accounting

* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name

* limit max ACLK WEB response size to 30MB

* the response type should be text/plain

* more preparative work for multiple group-by passes

* create functions for generating group by keys, ids and names

* multiple group-by passes are now supported

* parse group-by options array also with an index

* implemented percentage-of-instance group by function

* family is now merged in multi-node contexts

* prevent uninitialized use
2023-04-07 21:25:01 +03:00
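Two bullets in "Boost dbengine" describe a policy: a worker may execute up to 10 jobs in one run, and query preparation always takes priority over extent reads. Below is a hypothetical sketch of how one run could split its budget between the two job classes. The `PENDING` counters stand in for real queues and are an assumption, as is `plan_worker_run()`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JOBS_PER_RUN 10  /* from the commit message: up to 10 jobs in a run */

/* Hypothetical pending-work counters for the two job classes. */
typedef struct {
    size_t query_prep;    /* higher priority */
    size_t extent_reads;  /* lower priority */
} PENDING;

/* Decide how many jobs of each class one worker run executes:
 * query preparation first, then extent reads, never more than MAX_JOBS_PER_RUN. */
static void plan_worker_run(PENDING *p, size_t *prep_out, size_t *reads_out) {
    assert(p && prep_out && reads_out);
    size_t budget = MAX_JOBS_PER_RUN;

    size_t prep = p->query_prep < budget ? p->query_prep : budget;
    budget -= prep;
    size_t reads = p->extent_reads < budget ? p->extent_reads : budget;

    p->query_prep -= prep;
    p->extent_reads -= reads;
    *prep_out = prep;
    *reads_out = reads;
}
```

With 8 query-prep jobs and 5 extent reads pending, the first run takes all 8 preparations plus 2 reads, and the next run takes the remaining 3 reads. Batching lets one worker wake-up cover several jobs, and the ordering keeps cheap preparation work from waiting behind disk reads.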
Costa Tsaousis
8a036f0b24
/api/v2/X part 7 ()
* /api/v2/weights, points key renamed to result

* /api/v2/weights, add node ids in response

* /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics

* /api/v2/data expose view.dimensions.sts{}

* /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses

* /api/v2 nodes output now includes the duration of time spent per node

* jsonwrap view object renames and cleanup

* rework of the statistics returned by the query engine

* swagger work

* swagger work

* more swagger work

* updated swagger json

* added the remaining /api/v2 endpoints to swagger

* point.ar has been renamed point.arp

* updated weights endpoint

* fix compilation warnings
2023-03-28 15:23:03 +03:00
Costa Tsaousis
5eed0545d4
/api/v2/X part 5 ()
* query timestamps are now pre-determined and alignment on timestamps is guaranteed

* turn internal_fatal() to internal_error() to investigate the issue

* handle query when no data exist in the db

* check for non NULL dict when running dictionary garbage collect

* support API v2 requests via ACLK

* add nodes detailed information to /api/v2/nodes

* fixed keys and added dummy nodes for completeness

* added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node

* make sure replication does not double count charts that are already being replicated

* expose min and max in sts structures

* added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters

* respect trimming in percentage calculation; updated swagger

* api/v2/weights preparative work to support multi-node queries - still single node though

* multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data

* when passing the raw option, the query exposes the hidden dimensions

* fix compilation issues on older systems

* the query engine now calculates per dimension min, max, sum, count, anomaly count

* use the macro to calculate storage point anomaly rate

* weights endpoint exposing version hashes

* weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate

* query: expose RESET flag; do not add the same point multiple times to the aggregated point

* weights: more compact output

* weights requests can be interrupted

* all /api/v2 requests can be interrupted and timeout

* allow relative timestamps in weights

* fix macos compilation warnings

* Revert "fix macos compilation warnings"

This reverts commit 8a1d24e41e.

* /api/v2/data group-by now works on dimension names, not ids

* /api/v2/weights does not query metrics without retention and new output format

* /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
2023-03-21 21:53:47 +02:00
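The commit above makes weights requests, and then all /api/v2 requests, interruptible and subject to a timeout. Below is a generic sketch of how such checks are commonly placed in a query loop, not netdata's code. The loop polls an atomic interrupt flag and a monotonic deadline every `check_every` points so the checks stay cheap. `sum_points()`, `QUERY_STATUS` and the check interval are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() and CLOCK_MONOTONIC */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef enum { QUERY_OK, QUERY_INTERRUPTED, QUERY_TIMEOUT } QUERY_STATUS;

/* Milliseconds from a monotonic clock, unaffected by wall-clock changes. */
static long long monotonic_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Sum `n` points, checking for interruption and timeout every `check_every` points.
 * `timeout_ms` <= 0 disables the timeout. On success the sum goes to *sum_out. */
static QUERY_STATUS sum_points(const double *points, size_t n, atomic_bool *interrupted,
                               long long timeout_ms, size_t check_every, double *sum_out) {
    assert(points || n == 0);
    assert(check_every > 0);
    long long deadline = timeout_ms > 0 ? monotonic_ms() + timeout_ms : 0;
    double sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % check_every == 0) {
            if (atomic_load(interrupted)) return QUERY_INTERRUPTED;  /* e.g. client went away */
            if (deadline && monotonic_ms() > deadline) return QUERY_TIMEOUT;
        }
        sum += points[i];
    }
    *sum_out = sum;
    return QUERY_OK;
}
```

Another thread, for example the one that notices the client disconnected, sets the flag with `atomic_store()`, and the query stops at its next check.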
Costa Tsaousis
cf85c3b0e9
/api/v2/X improvements part 3 ()
* max web request size to 64KB

* fix the request too big message

* increase max request reading tries to 100

* support for bigger web requests

* add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too

* better implementation for partial data trimming

* added group_by=selected to return only one dimension for all selected metrics

* fix acceptance of group_by=selected

* passing option "raw" disables partial data trimming

* remove obsolete option "plan"; use "debug"

* fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing

* per row annotations

* added time column to point annotations

* disable caching for /api/v2/contexts responses

* added api format json2 that returns an array for each point, having all the point values and annotations in them

* work on swagger about /api/v2

* prevent infinite loop

* cleanup and swagger work

* allow negative simple pattern expressions to work as expected

* do not lookup in the dictionary empty names

* garbage collect dictionaries

* make query_target allocate less aggressively; queries fill the remaining points with nulls

* reusable query ops to save memory on huge queries

* move parts of query plans into query ops to save query target memory

* remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed
2023-03-10 12:41:14 +02:00
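One bullet above makes negative simple pattern expressions work as expected. Below is a minimal sketch of how a matcher can evaluate negation, assuming the simple-pattern semantics described in netdata's documentation: space-separated words, `*` as a wildcard, a leading `!` for a negative match, the first matching word decides, and no match means false. `glob_match()` and `simple_pattern_matches()` are hypothetical helpers, not netdata's API, and options such as case-insensitive search are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Match `s` against the first `plen` bytes of `pat`; only '*' is special
 * (any run of characters, including none). Greedy with single backtrack point. */
static bool glob_match(const char *pat, size_t plen, const char *s) {
    size_t p = 0, i = 0, star = (size_t)-1, mark = 0, slen = strlen(s);
    while (i < slen) {
        if (p < plen && pat[p] == '*') { star = p++; mark = i; }   /* remember the star */
        else if (p < plen && pat[p] == s[i]) { p++; i++; }          /* literal match */
        else if (star != (size_t)-1) { p = star + 1; i = ++mark; }  /* let '*' eat one more */
        else return false;
    }
    while (p < plen && pat[p] == '*') p++;  /* trailing stars match the empty string */
    return p == plen;
}

/* Evaluate a space-separated list of words against `s`.
 * The first word that matches decides: positive -> true, negative ('!') -> false. */
static bool simple_pattern_matches(const char *list, const char *s) {
    assert(list && s);
    const char *w = list;
    while (*w) {
        while (*w == ' ') w++;  /* skip separators */
        if (!*w) break;
        const char *end = strchr(w, ' ');
        size_t len = end ? (size_t)(end - w) : strlen(w);
        bool negative = (*w == '!');
        if (glob_match(negative ? w + 1 : w, negative ? len - 1 : len, s))
            return !negative;
        w += len;
    }
    return false;  /* nothing matched */
}
```

Because the first match wins, negative words must come before broader positive ones: `!*.tmp *` accepts everything except names ending in `.tmp`, and `!*` on its own matches nothing.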
Costa Tsaousis
021e252fc5
/api/v2/contexts ()
* preparation for /api/v2/contexts

* working /api/v2/contexts

* add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages

* minor fix

* query target now accurately counts hosts, contexts, instances, dimensions, metrics

* cleanup /api/v2/contexts

* full text search with /api/v2/contexts

* simple patterns now support the option to search ignoring case

* full text search API with /api/v2/q

* simple pattern execution optimization

* do not show q when not given

* full text search accounting

* separated /api/v2/nodes from /api/v2/contexts

* fix ssv queries for group_by

* count query instances queried and failed per context and host

* split rrdcontext.c to multiple files

* add query totals

* fix anomaly rate calculation; provide "ni" for indexing hosts

* do not generate zero valued members

* faster calculation of anomaly rate, by just summing integers for each db point and doing the math once for every generated point

* fix typo when printing dimensions totals

* added option minify to remove spaces and newlines from JSON output

* send instance ids and names when they differ

* do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe

* fix for the previous + renames and code cleanup

* when a dimension is filtered, include in the response all the other dimensions that are selectable

* do not add nodes that do not have retention in the current window

* move selection of dimensions to query_dimension_add(), instead of query_metric_add()

* increase the pre-processing capacity of queries

* generate instance fqdn ids and names only when they are needed

* provide detailed statistics about tiers retention, queries, points, update_every

* late allocation of query dimensions

* cleanup

* more cleanup

* support for annotations per displayed point, RESET and PARTIAL

* new type annotations

* if a chart is not linked to contexts and it is collected, link it when it is collected

* make ML run reentrant

* make ML rrdr query synchronous

* optimize replication memory allocation of replication_sort_entry

* change units to percentage, when requesting a coefficient of variation, or a percentage query

* initialize replication before starting main threads

* properly decrement no room requests counter

* propagate the non-zero flag to group-by

* the same by avoiding the extra loop

* respect non-zero in all dimension arrays

* remove dictionary garbage collection from dictionary_entries() and dictionary_version()

* be more verbose when jv2 indexing is postponed

* prevent infinite loop

* use hidden dimensions even when dimensions pattern is unset

* traverse hosts using dictionaries

* fix dictionary unittests
2023-03-02 22:50:48 +02:00
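One bullet in the /api/v2/contexts commit speeds up the anomaly rate by summing integers for each db point and doing the floating-point math only once per generated point. Below is a minimal sketch of that accumulation pattern. `AR_ACC`, `ar_add()` and `ar_finish()` are hypothetical names, and the anomaly bit of each db point is assumed to be available as a 0/1 flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-output-point accumulator: integer counts while aggregating,
 * floating-point math only when the output point is generated. */
typedef struct {
    uint32_t count;          /* db points merged into this output point */
    uint32_t anomaly_count;  /* how many of them were flagged anomalous */
} AR_ACC;

/* Called once per db point: two integer additions, no division. */
static inline void ar_add(AR_ACC *a, int is_anomalous) {
    a->count++;
    a->anomaly_count += is_anomalous ? 1 : 0;
}

/* Called once per generated point: anomaly rate as a percentage. */
static double ar_finish(const AR_ACC *a) {
    assert(a->anomaly_count <= a->count);
    return a->count ? (double)a->anomaly_count * 100.0 / (double)a->count : 0.0;
}
```

For example, 8 db points with 3 anomalous ones give an anomaly rate of 37.5% with a single division, instead of one division per db point.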