mirror of https://github.com/netdata/netdata.git synced 2025-04-16 18:37:50 +00:00
Commit graph

42 commits

Author SHA1 Message Date
Costa Tsaousis
f466b8aef5
DYNCFG: dynamically configured alerts ()
* cleanup alerts

* fix references

* fix references

* fix references

* load alerts once and apply them to each node

* simplify health_create_alarm_entry()

* Compile without warnings with compiler flags:

   -Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self

* code re-organization and cleanup

* generate patterns when applying prototypes; give unique dyncfg names to all alerts

* eval expressions keep the source and the parsed_as as STRING pointers

* renamed host to node in dyncfg ids

* renamed host to node in dyncfg ids

* add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level

* working functionality

* code re-organization: moved health event-loop to a new file, moved health globals to health.c

* rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instantiated only when they are linked to RRDSETs

* dyncfg alert prototypes initialization for alerts

* health dyncfg split to separate file

* cleanup not-needed code

* normalize matches between parsing and json

* also detect !* for disabled alerts

* dyncfg capability disabled

* Store alert config part1

* Add rrdlabels_common_count

* wip health variables lookup without indexes

* Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter

* working variables with runtime lookup

* working variables with runtime lookup

* delete rrddimvar and rrdfamily index

* remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts

* added /api/v1/variable that resolves a variable the same way alerts do

* remove rrdcalc from eval

* remove debug code

* remove duplicate assignment

* Fix memory leak

* all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts

* hide all internal structures of EVAL

* Enable -Wformat flag

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Adjust binding for calculation, warning, critical

* Remove unused macro

* Update config hash id

* use the right info and summary in alerts log

* use synchronous queries for alerts

* Handle cases when config_hash_id is missing from health_log

* remove deadlock from health worker

* parsing to json payload for health alert prototypes

* cleaner parsing and avoiding memory leaks in case of duplicate members in json

* fix left-over rename of function

* Keep original lookup field to send to the cloud
Cleanup / rename function to store config
Remove unused DEFINEs, functions

* Use ac->lookup

* link jobs to the host when the template is registered; do not accept running a function without a host

* full dyncfg support for health alerts, except action TEST

* working dyncfg additions, updates, removals

* fixed missing source, wrong status updates

* add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint

* fix dyncfg unittest

* rename functions

* generalize the json-c parser macros and move them to libnetdata

* report progress when enabling and disabling dyncfg templates

* moved rrdcalc and rrdvar to health

* update alarms

* added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts

* enable parsed json alerts; allow sending back accepted but disabled

* added format_version for alerts payload; enables/disables status now is also inherited by the status of the rules; fixed variable names in json output

* remove the RRDHOST pointer from DYNCFG

* Fix command field submitted to the cloud

* do not send updates to creation requests, for DYNCFG jobs

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
2024-01-23 20:20:41 +02:00
Stelios Fragkakis
1973e70b62
Use original summary for alert transition ()
Use original summary for alert
Fetch transaction and global id for transitions safely
2024-01-15 20:31:23 +02:00
Costa Tsaousis
f2b250a1f5
dyncfg v2 ()
* split rrdfunctions streaming and progress

* simplified internal inline functions API

* split rrdfunctions inflight management

* split rrd functions exporters

* renames

* base dyncfg structure

* config pluginsd

* intercept dyncfg function calls

* loading and saving of dyncfg metadata and data

* save metadata and payload to a single file; added code to update the plugins with jobs and saved configs

* basic working unit test

* added payload to functions execution

* removed old dyncfg code that is not needed any more

* more cleanup

* cleanup sender for functions with payload

* dyncfg functions are not exposed as functions

* remaining work to avoid indexing the \0 terminating character in dictionary keys

* added back old dyncfg plugins.d commands as noop, to allow plugins to continue working

* working api; working streaming;

* updated plugins.d documentation

* aclk and http api requests share the same header parsing logic

* added source type internal

* fixed crashes

* added god mode for tests

* fixes

* fixed messages

* save host machine guids to configs

* cleaner manipulation of supported commands

* the functions event loop for external plugins can now process dyncfg requests

* unified internal and external plugins dyncfg API

* Netdata serves schema requests from /etc/netdata/schema.d and /var/lib/netdata/conf.d/schema.d

* cleanup and various fixes; fixed bug in previous dyncfg implementation on streaming that was sending the payload in a way that allowed other streaming commands to be multiplexed

* internals go to a separate header file

* fix duplicate ACLK requests sent by aclk queue mechanism

* use fstat instead of stat

* working api

* plugin actions renamed to create and delete; dyncfg files are removed only from user actions

* prevent deadlock by using the react callback

* fix for string_strndupz()

* better dyncfg unittests

* more tests at the unittests

* properly detect dyncfg functions

* hide config functions from the UI

* tree response improvements

* send the initial update with payload

* determine tty using stdout, not stderr

* changes to statuses, cleanup and the code to bring all business logic into interception

* do not crash when the status is empty

* functions now propagate the source of the requests to plugins

* avoid warning about unused functions

* in the count at items for attention, do not count the orphan entries

* save source into dyncfg

* make the list null terminated

* fixed invalid comparison

* prevent memory leak on duplicated headers; log x-forwarded-for

* more unit tests

* added dyncfg unittests into the default unittests

* more unit tests and fixes

* more unit tests and fixes

* fix dictionary unittests

* config functions require admin access
2024-01-11 16:56:45 +02:00
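The log does not say why the "use fstat instead of stat" item above was needed. A common reason is that `stat()` looks the path up again, so the file it describes may differ from the one already opened, while `fstat()` inspects the open descriptor itself. A minimal sketch under that assumption follows; `open_file_size()` is a hypothetical helper for illustration, not a Netdata function.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not Netdata code): report the size of the file
 * behind an already-open descriptor, or -1 if fstat() fails.
 * Because the descriptor is queried directly, the answer always refers
 * to the file that was opened, even if the path was renamed or replaced. */
static off_t open_file_size(int fd) {
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    return st.st_size;
}
```

A caller would open the file once, pass the descriptor to the helper, and then read from the same descriptor.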
vkalintiris
bead543ea5
Name storage engine variables consistently. ()
* Consistent naming of STORAGE_INSTANCE instances.

Replace usages of `db_instance` and `instance` with
`si`.

* Rename array `storage_metrics_groups[tier]` to `smg[tier]`

* Rename db_metric_handle to smh

* Rename instances of `storage_engine_query_handle` to `seqh`.

* Rename instances of STORAGE_ENGINE_BACKEND to `seb`.

* Rename instances of STORAGE_COLLECT_HANDLE to `sch`.
2024-01-11 14:17:02 +02:00
Costa Tsaousis
da32dd8be8
Queries Progress ()
* track the progress of queries

* add query_progress in libnetdata Makefile.am

* add acl, response size and response code to the tracking

* define the required functions

* fix the last commit

* added /api/v2/progress?transaction=ID to report the progress of queries

* added function to report netdata-queries

* track hashtable additions

* when reusing a transaction, maintain the counter

* keep track of linked and indexing

* added X-Forwarded-Host and X-Forwarded-For to logs. X-Forwarded-For is also added in progress tracking

* report compact uuids to match logs; register the actual duration of the transaction

* added rowOptions to function; now web_client keeps track if it tracks progress or not

* add http request method to progress

* add tags per function; /api/vX/functions is now not protected

* compact the sanitization array

* split pluginsd_parser into multiple files

* cleanup keyword definitions

* code cleanup

* extracted rrd_collector to separate files

* added http access level to functions

* renamed access "all" to "any"

* implemented optional protection on functions

* add priority to functions, to allow the UI to select the best function (lower priority) when the user has not selected a function

* added progress report from the plugins to netdata and from children to parents - untested

* added progress reporting in systemd-journal

* query timeout is now handled by evloop for external plugins

* propagate progress reports to children and plugins

* fix codeql warning

* adapt to cmake

* minor changes

* extend function timeout when progress is received; added streaming capability to propagate progress reports to parents and send progress requests to children

* revert change in dictionary.h

* add log when access level is invalid

* update access level of functions

* added logs when processing progress updates

* log when the deferred response is too big

* comment out sender progress to find the issue

* added missing newline in streaming progress reports

* propagate progress reports to functions

* fix logs
2023-12-15 18:15:43 +02:00
Stelios Fragkakis
096d1b1b2b
Code cleanup ()
* Code cleanup

* More cleanup

* More cleanup

* Use FILENAME_MAX

* query fix
2023-12-01 15:45:59 +02:00
Costa Tsaousis
2175104d41
Faster parents ()
* cache ctx in collection handle

* cache rd together with rda

* do not repeatedly call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension()

* fix unit tests

* do the absolutely minimum while updating timestamps, ensure validity during reading them

* when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only.

* remove the spinlock from mrg

* remove the metric flags that are not used any more

* mrg writers can be different threads

* update first time when latest clean is also updated

* cleanup

* set hot page with a simple atomic operation

* sender sets chart slot for every chart

* work on senders without SLOT

* enable SLOT capability

* send slot at BEGIN when SLOT is enabled

* fix slot generation and parsing

* send slot while re-streaming

* use the sender capabilities, not the receiver

* cleanup

* add slots support to all chart and dimension related plugin commands

* fix condition

* fix calculation

* check sender capabilities

* assign slots in constructors

* we need the dimension slot at the DIMENSION keyword

* more debug info in case of dimension mismatch

* ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely

* fix renumbering on child restart

* reset rda caching when receiving a chart definition

* optimize pluginsd_end_v2()

* do not do zero sized allocations

* trust the chart slot id of the child

* cleanup charts on pluginsd thread exit

* better cleanup

* find the chart and put it in the slot, if it is not already there

* move slots array to host

* initialize pluginsd slots properly

* add slots to replay begin; do not cleanup slots that don't belong to a chart

* cleanup on obsolete

* cleanup slots on obsoletions

* cleanup and renames about obsoletion

* rewrite obsoletion service code to remove race conditions

* better service obsoletion log

* added debugging

* more debug

* exposed flag now compares versions

* removed debugging messages

* resolve conflicts

* fix replication check for unsent dimensions
2023-10-27 22:42:29 +03:00
Emmanuel Vasilakis
bdf83311c3
Add summary to /alerts () 2023-10-16 17:09:41 +03:00
Costa Tsaousis
9fd9823e07
journal: fix the 1 second latency in play mode ()
provide a relative_to_absolute function that does not touch the current realtime time
2023-10-04 20:54:16 +03:00
Emmanuel Vasilakis
8c9492a476
Send alerts summary field to cloud ()
* new aclk schema

* transmit summary to cloud and expose in v2/alerts

* missing assign
2023-10-02 09:56:01 +03:00
Stelios Fragkakis
e0b36f2865
Switch to uint64_t to avoid overflow in 32bit systems () 2023-09-26 18:45:52 +03:00
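The log does not say which value overflowed. As a minimal sketch, assuming it was something like a seconds-to-microseconds conversion (both helpers below are hypothetical, not Netdata functions), arithmetic done in 32 bits wraps around, while widening to `uint64_t` before multiplying keeps the full result:

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000u

/* Hypothetical helper: the product is computed in 32 bits,
 * so any result above 4294967295 wraps modulo 2^32. */
static uint32_t seconds_to_usec_32(uint32_t seconds) {
    return (uint32_t)(seconds * USEC_PER_SEC);
}

/* Hypothetical helper: widen the operand first, so the multiplication
 * itself happens in 64 bits and the full product is preserved. */
static uint64_t seconds_to_usec_64(uint32_t seconds) {
    return (uint64_t)seconds * USEC_PER_SEC;
}
```

For 5000 seconds the true product is 5,000,000,000, which does not fit in 32 bits; the first helper returns the wrapped value 705,032,704, while the second returns the full product.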
Stelios Fragkakis
24006ed5c1
Reduce label memory () 2023-09-01 15:37:55 +03:00
Costa Tsaousis
41bd902426
Facets histograms () 2023-08-21 11:20:18 +03:00
Costa Tsaousis
ce75313de0
systemd-journal plugin () 2023-08-03 15:42:11 +03:00
Stelios Fragkakis
a833f6674f
Fix memory corruption ()
Delay free
2023-08-03 15:38:14 +03:00
vkalintiris
0e230a260e
Revert "Refactor RRD code. ()" ()
This reverts commit 440bd51e08.

dbengine was still being used for non-zero tiers
even on non-dbengine modes.
2023-08-03 13:13:36 +03:00
Costa Tsaousis
72549b3a22
prefer titles, families, units and priorities from collected charts () 2023-08-03 09:38:36 +03:00
vkalintiris
440bd51e08
Refactor RRD code. ()
* Storage engine.

* Host indexes to rrdb

* Move globals to rrdb

* Move storage_tiers_backfill to rrdb

* default_rrd_update_every to rrdb

* default_rrd_history_entries to rrdb

* gap_when_lost_iterations_above to rrdb

* rrdset_free_obsolete_time_s to rrdb

* libuv_worker_threads to rrdb

* ieee754_doubles to rrdb

* rrdhost_free_orphan_time_s to rrdb

* rrd_rwlock to rrdb

* localhost to rrdb

* rm extern from func decls

* mv rrd macro under rrd.h

* default_rrdeng_page_cache_mb to rrdb

* default_rrdeng_extent_cache_mb to rrdb

* db_engine_journal_check to rrdb

* default_rrdeng_disk_quota_mb to rrdb

* default_multidb_disk_quota_mb to rrdb

* multidb_ctx to rrdb

* page_type_size to rrdb

* tier_page_size to rrdb

* No storage_engine_id in rrdim functions

* storage_engine_id is provided by st

* Update to fix merge conflict.

* Update field name

* Remove unnecessary macros from rrd.h

* Rm unused type decls

* Rm duplicate func decls

* make internal function static

* Make the rest of public dbengine funcs accept a storage_instance.

* No more rrdengine_instance :)

* rm rrdset_debug from rrd.h

* Use rrdb to access globals in ML and ACLK

Missed due to not having the submodules in the
worktree.

* rm total_number

* rm RRDVAR_TYPE_TOTAL

* rm unused inline

* Rm names from typedef'd enums

* rm unused header include

* Move include

* Rm unused header include

* s/rrdhost_find_or_create/rrdhost_get_or_create/g

* s/find_host_by_node_id/rrdhost_find_by_node_id/

Also, remove duplicate definition in rrdcontext.c

* rm macro used only once

* rm macro used only once

* Reduce rrd.h api by moving funcs into a collector specific utils header

* Remove unused func

* Move parser specific function out of rrd.h

* return storage_number instead of void pointer

* move code related to rrd initialization out of rrdhost.c

* Remove tier_grouping from rrdim_tier

Saves 8 * storage_tiers bytes per dimension.

* Fix rebase

* s/rrd_update_every/update_every/

* Mark functions as static and constify args

* Add license notes and file to build systems.

* Remove remaining non-log/config mentions of memory mode

* Move rrdlabels api to separate file.

Also, move localhost functions that loads
labels outside of database/ and into daemon/

* Remove function decl in rrd.h

* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir

* Do not expose internal function from rrd.h

* Rm NETDATA_RRD_INTERNALS

Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.

* Add license note

* Include libnetdata.h instead of aral.h

* Use rrdb to access localhost

* Fix builds without dbengine

* Add header to build system files

* Add rrdlabels.h to build systems

* Move func def from rrd.h to rrdhost.c

* Fix macos build

* Rm non-existing function

* Rebase master

* Define buffer length macro in ad_charts.

* Fix FreeBSD builds.

* Mark functions static

* Rm func decls without definitions

* Rebase master

* Rebase master

* Properly initialize value of storage tiers.

* Fix build after rebase.
2023-07-26 15:30:49 +03:00
Costa Tsaousis
7c2acb3b24
added missing fields to alerts instances () 2023-07-18 18:41:37 +03:00
Costa Tsaousis
3cdedeed40
add chart id and name to alert instances and transitions () 2023-07-18 15:37:07 +03:00
Costa Tsaousis
77076d8764
bearer improvements () 2023-07-11 02:28:06 +03:00
Stelios Fragkakis
b12edb1208
Use spinlock in host and chart ()
* Switch alarm log lock to spinlock

* Switch the alerts lock in the chart structure to spinlock

* Proper lock usage
2023-07-10 14:13:50 +03:00
Costa Tsaousis
880c9fbc61
alerts_transitions outputs hostnames and items statistics ()
* alerts_transitions outputs hostnames and items statistics

* return details about the items in the database

* added comments to items list and made the whole of statsd available under debug
2023-07-09 17:02:25 +03:00
Costa Tsaousis
3f78777839
avoid memory allocations for alert transitions facets processing () 2023-07-06 17:19:32 +03:00
Costa Tsaousis
4f5228e654
add summary linking to alert instances (ati) when options=summary,values is requested () 2023-07-06 17:18:32 +03:00
Costa Tsaousis
6526a34f86
fix alerts transitions sorting () 2023-07-06 14:43:01 +03:00
Costa Tsaousis
c74bf56ee2
Code reorg and cleanup - enrichment of /api/v2 ()
* claim script now accepts the same params as the kickstart

* rewrote buildinfo to unify all methods

* added cloud unavailable in cloud status

* added all exporters

* renamed httpd to h2o

* rename ENABLE_COMPRESSION to ENABLE_LZ4

* rename global variable

* rename ENABLE_HTTPS to ENABLE_OPENSSL

* fix coverity-scan for openssl

* add lz4 to coverity-scan

* added all plugins and most of the features

* added all plugins and most of the features

* generalize bitmap code so that we can have any size of bitmaps

* cleanup

* fix compilation without protobuf

* fix compilation with other allocators

* fix bitmap

* comprehensive bitmaps unit test

* bitmap as macros

* added developer mode

* added system info to build info

* cloud available/unavailable

* added /api/v2/info

* added units and ni to transitions

* when showing instances and transitions, show only the instances that have transitions

* cleanup

* add missing quotes

* add anchor to transitions

* added more to build info

* calculate retention per tier and expose it to /api/v2/info

* added currently collected metrics

* do not show space and retention when no numbers are available

* fix impossible overflow

* Add function for transitions and execute callback

* In case of error, reset and try next dictionary entry

* Fix error message

* simpler logic to maintain retention per tier

* /api/v2/alert_transitions

* Handle case of recipient null
Convert after and before to usec

* Add classification, type and component

* working /api/v2/alert_transitions

* Fix query to properly handle context and alert name

* cleanup

* Add search with transition

* accept transition in /api/v2/alert_transitions

* totally dynamic facets

* fixed debug info

* restructured facets

* cleanup; removal of options=transitions

* updated alert entries flags

* method to exec

* Return also exec run timestamp
Temp table cleanup only when we don't execute with a transition

* cleanup obsolete anchor parameter

* Add sql_get_alert_configuration function

* added options=config to alert_transitions

* added /api/v2/alert_config

* preliminary work for /api/v2/claim

* initialize variables; do not expose expected retention if no disk space info is available; do not report aclk as initializing when not claimed

* fix claim session key filename

* put a newline into the session key file

* more progress on claiming

* final /api/v2/claim endpoint

* after claiming, refresh our state at the output

* Fix query to fetch config

* Remove debug log

* add configuration objects

* add configuration objects - fixed

* respect the NETDATA_DISABLE_CLOUD env variable

* NETDATA_DISABLE_CLOUD env variable sets the default, but the config sets the final value

* use a new claimed_id on every claiming

* regenerate random key on claiming and wait for online status

* ignore write() return value when writing a newline

* don't show cloud status disabled when claimed_id is missing

* added ctx to alert instances

* cleanup config and transitions from /api/v2/alerts

* fix unused variable

* in /api/v2/alert_config show 1 config without an array

* show alert values conditionally, by appending options=values

* When storing host info if the key value is empty, store unknown

* added options=summary to control when the alerts summary is shown

* increased http_api_v2 to version 5

* claiming random key file is now not world readable

* added local-listeners binary that detects all the listening ports, their IPs and their command lines

---------

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-07-06 01:49:32 +03:00
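The "generalize bitmap code so that we can have any size of bitmaps" item above can be pictured with a small sketch. This is an illustration under assumptions, not Netdata's implementation: the `bitmap_sketch` type and its functions are hypothetical names, and the commit's own version was later reworked into macros according to the same log.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type for illustration; not Netdata's bitmap structure. */
typedef struct {
    size_t bits;            /* number of addressable bits */
    unsigned long words[];  /* storage, sized at allocation time */
} bitmap_sketch;

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate a zeroed bitmap able to hold 'bits' flags; returns NULL on failure. */
static bitmap_sketch *bitmap_sketch_create(size_t bits) {
    size_t words = (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
    bitmap_sketch *bm = calloc(1, sizeof(*bm) + words * sizeof(unsigned long));
    if (bm)
        bm->bits = bits;
    return bm;
}

/* Set or clear one bit; out-of-range positions are ignored. */
static void bitmap_sketch_set(bitmap_sketch *bm, size_t bit, bool value) {
    if (bit >= bm->bits)
        return;
    unsigned long mask = 1UL << (bit % BITS_PER_WORD);
    if (value)
        bm->words[bit / BITS_PER_WORD] |= mask;
    else
        bm->words[bit / BITS_PER_WORD] &= ~mask;
}

/* Read one bit; out-of-range positions read as false. */
static bool bitmap_sketch_get(const bitmap_sketch *bm, size_t bit) {
    if (bit >= bm->bits)
        return false;
    return (bm->words[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

The size is chosen at allocation time rather than fixed by the type, which is the property the commit item describes; the caller releases the bitmap with `free()`.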
Costa Tsaousis
5be9be7485
rewrite /api/v2/alerts ()
* rewrite /api/v2/alerts

* implement searching for transition

* Find transition id and issue callback

* Fix parameters

* call and transition filter

* Search with transition as well

* renames and cleanup

* render flags

* what if scenario for moving transitions at the top level

* If transition is given, limit the query appropriately

* Add alert transitions

* Optimize find transition to use prepared query
Drop temp table properly

* enabled alert instances again

* Order by when key

* Order by global_id

* Return last X transitions

* updated field names

* add ati to configurations and show all keys in debug mode

* Code cleanup and optimizations

* Drop temp table in case of error

* Finalize temp table population statement to prevent memory leak

* final changes

---------

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-28 23:14:10 +03:00
Costa Tsaousis
0d61c11b5f
use gperf for the pluginsd/streaming parser hashtable ()
* use gperf for the pluginsd parser

* simplify pluginsd_parser by removing void pointers to user

* pluginsd_split_words() with inlined pluginsd_space()

* quoted_string_splitter() now uses a map instead of a function for determining spaces

* add stress test for pluginsd parser

* optimized BITMAP256

* optimized rrdpush receiver reception

* optimized rrdpush sender compression

* renames and cleanup

* remove wrong negation

* unify handshake and disconnection reasons

* use parser_find_keyword

* register job names only for the current repertoire
2023-06-26 14:00:59 +03:00
Stelios Fragkakis
f3efdba1a0
New alerts endpoint ()
* alerts / alerts_log v2

* Add global_id to ae
Populate entries with global id

* Remove transition id from template
Change history to instances

* Link ae to rc in all cases
Code cleanup
2023-06-22 01:16:57 +03:00
Costa Tsaousis
c980f48dda
/api/v2 improvements ()
* readers should be able to recursively acquire the lock, even when there is a writer waiting

* added health section into nodes

* uniformity of nodes

* nodes instances should not return node info; http_api_v2 capability should be version 4 everywhere

* added /api/v2/versions

* added /api/v2/functions

* /api/v2/version should be neat
2023-06-21 22:31:58 +03:00
Costa Tsaousis
a8da697819
Fix /api/v2/contexts,nodes,nodes_instances,q before match ()
* readers should be able to recursively acquire the lock, even when there is a writer waiting

* in /api/v2/contexts/nodes/nodes_instances/q calls, when the context is collected, before should be matched against now, not the latest cached retention
2023-06-20 21:55:08 +03:00
Costa Tsaousis
43c749b07d
Obvious memory reductions ()
* remove rd->update_every

* reduce amount of memory for RRDDIM

* reorganize rrddim->db entries

* optimize rrdset and statsd

* optimize dictionaries

* RW_SPINLOCK for dictionaries

* fix codeql warning

* rw_spinlock improvements

* remove obsolete assertion

* fix crash on health_alarm_log_process()

* use RW_SPINLOCK for AVL trees

* add RW_SPINLOCK read/write trylock

* pgc and mrg now use rw_spinlocks; cache line optimizations for mrg

* thread tag of dbengine init

* append created datafile, lockless

* make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use

* thread cancelability in spinlocks; optimize thread cancelability management

* introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files

* use the last timestamp of each journal file for indexing

* when the previous cannot be found, start from the beginning

* add more stats to PDC to trace routing easier

* rename spinlock functions

* fix for spinlock renames

* revert statsd socket statistics to size_t

* turn fatal into internal_fatal()

* show candidates always

* show connected status and connection attempts
2023-06-19 23:19:36 +03:00
Costa Tsaousis
0b4f820e9d
/api/v2/nodes and streaming function ()
* dummy streaming function

* expose global functions upstream

* separate function for pushing global functions

* add missing conditions

* allow streaming function to run async

* started internal API for functions

* cache host retention and expose it to /api/v2/nodes

* internal API for function table fields; more progress on streaming status

* abstracted and unified rrdhost status

* port old coverity warning fix - although it is not needed

* add ML information to rrdhost status

* add ML capability to streaming to signal the transmission of ML information; added ML information to host status

* protect host->receiver

* count metrics and instances per host

* exposed all inbound and outbound streaming

* fix for ML status and dependency of DATA_WITH_ML to INTERPOLATED, not IEEE754

* update ML dummy

* added all fields

* added streaming group by and cleaned up accepted values by cloud

* removed type

* Revert "removed type"

This reverts commit faae4177e6.

* added context to db summary

* new /api/v2/nodes schema

* added ML type

* change default function charts

* log to trace new capa

* add more debug

* removed debugging code

* retry on receive interrupted read; respect sender reconnect delay in all cases

* set disconnected host flag and manipulate localhost child count atomically, inside set/clear receiver

* fix infinite loop

* send_to_plugin() now has a spinlock to ensure that only 1 thread is writing to the plugin/child at the same time

* global cloud_status() call

* cloud should be a section, since it will contain error information

* put cloud capabilities into cloud

* aclk status in /api/v2 agents sections

* keep aclk_connection_counter

* updates on /api/v2/nodes

* final /api/v2/nodes and addition of /api/v2/nodes_instances

* parametrize all /api/v2/xxx output to control which info is output per endpoint

* always accept nodes selector

* st needs to be per instance, not per node

* fix merging of contexts; fix cups plugin priorities

* add after and before parameters to /api/v2/contexts/nodes/nodes_instances/q

* give each libuv worker a unique id

* aclk http_api_v2 version 4
2023-06-19 20:52:35 +03:00
Costa Tsaousis
80d83b7bd1
api v2 nodes for streaming statuses ()
* api v2 nodes for streaming statuses

* remove test

* move parts of the output

* in api/v2/data return 5 values per point when aggregation=percentage and raw option is given; return final values when aggregation=percentage is not the final grouping
2023-06-08 16:33:22 +03:00
Costa Tsaousis
66c8546019
Re-write of SSL support in Netdata; restoration of SIGCHLD; detection of stale plugins; streaming improvements ()
* add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect

* make each streaming destination respect its SSL settings

* do not send SSL traffic over non-SSL connection

* keep track of outgoing streaming connection attempts

* retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ

* Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ"

This reverts commit 14c858677c.

* cleanup SSL connections properly

* initialize SSL in rpt before takeover

* sender should free SSL when talking to a non-SSL destination

* do not shutdown SSL when receiver exits

* restore operation of SIGCHLD when the reaper is not enabled

* create an fgets function that checks for data and times out

* work on error handling of plugins exiting

* remove newlines from logs

* global call to waitid(), caching the result for netdata_pclose() to process

* receiver tid

* parser timeouts in 2 minutes instead of 10

* fix crash when UUID is NULL in SQLite

* abstract sqlite3 parsing for uuid and text

* write proper ssl errors on read and write

* fix for SSL_ERROR_WANT_RETRY_VERIFY

* SSL WANT per function

* unified SSL error logging

* fix compilation warning

* additional logging about parser cleanup

* streaming parser should call the pluginsd parser cleanup

* SSL error handling work

* SSL initialization unification

* check for pending data when receiving SSL response with timeout

* macro to check if an SSL connection has been established

* remove SSL_pending()

* check for SSL macros

* use SSL_peek() to find if there is a response

* SSL renames

* more SSL renames & cleanup

* rrdpush ssl connection function

* abstract all SSL functions into security.c

* keep track of SSL connections and always attempt to use SSL read/write when on SSL connection

* signal openssl to skip certificate validation when configured to do so

* better SSL error handling and logging

* SSL code cleanup

* SSL retry on SSL_connect and SSL_accept

* SSL provide default return value for old compilers

* SSL read/write functions emulate system read/write functions

* fix receive/send timeout and switch from SSL_peek() to SSL_pending()

* remove SSL_pending()

* removed sender auto-retry and debug info for initial receiver response

* ssl skip certificate verification config for web server

* ssl errors log ip and port of the peer

* keep ssl with web_client for its whole lifetime

* thread safe socket peers to text

* use error_limit() for common ssl errors

* cleanup

* more cleanup

* coverity fixes

* ssl error logs include both local and remote ip/port info

* remove obsolete code
2023-06-07 21:10:27 +03:00
Costa Tsaousis
d395333d44
On data and weight queries now instances filter matches also instance_id@node_id ()
instances filter now matches also instance_id@node_id
2023-05-05 22:10:06 +03:00
Costa Tsaousis
204dd9ae27
Boost dbengine ()
* configure extent cache size

* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads

* fix dispatched and executing counters

* boost to the max

* increase libuv worker threads

* query prep always gets more prio than extent reads; stop processing in batch when dbengine queue is critical

* fix accounting of query prep

* inlining of time-grouping functions, to speed up queries with billions of points

* make switching based on a local const variable

* print one pending contexts loading message per iteration

* inlined store engine query API

* inlined storage engine data collection api

* inlined all storage engine query ops

* eliminate and inline data collection ops

* simplified query group-by

* more error handling

* optimized partial trimming of group-by queries

* preparative work to support multiple passes of group-by

* more preparative work to support multiple passes of group-by (accepts multiple group-by params)

* unified query timings

* unified query timings - weights endpoint

* query target is no longer a static thread variable - there is a list of cached query targets, each of which is freed every 1000 queries

* fix query memory accounting

* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name

* limit max ACLK WEB response size to 30MB

* the response type should be text/plain

* more preparative work for multiple group-by passes

* create functions for generating group by keys, ids and names

* multiple group-by passes are now supported

* parse group-by options array also with an index

* implemented percentage-of-instance group by function

* family is now merged in multi-node contexts

* prevent uninitialized use
2023-04-07 21:25:01 +03:00
Timotej S
c09dbb224a
minor - add capability signifying this agent can speak apiv2 ()
* capa apiv2

* build new cancellation proto

* committed on wrong branch :D Revert "build new cancellation proto"

This reverts commit 8290422de4.

* use common source of truth for capas in apiv2

* fix for possible races
2023-03-31 01:11:53 +03:00
Costa Tsaousis
8a036f0b24
/api/v2/X part 7 ()
* /api/v2/weights, points key renamed to result

* /api/v2/weights, add node ids in response

* /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics

* /api/v2/data expose view.dimensions.sts{}

* /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses

* /api/v2 nodes output now includes the duration of time spent per node

* jsonwrap view object renames and cleanup

* rework of the statistics returned by the query engine

* swagger work

* swagger work

* more swagger work

* updated swagger json

* added the remaining /api/v2 endpoints to swagger

* point.ar has been renamed point.arp

* updated weights endpoint

* fix compilation warnings
2023-03-28 15:23:03 +03:00
Costa Tsaousis
5eed0545d4
/api/v2/X part 5 ()
* query timestamps are now pre-determined and alignment on timestamps is guaranteed

* turn internal_fatal() to internal_error() to investigate the issue

* handle query when no data exist in the db

* check for non NULL dict when running dictionary garbage collect

* support API v2 requests via ACLK

* add nodes detailed information to /api/v2/nodes

* fixed keys and added dummy nodes for completeness

* added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node

* make sure replication does not double count charts that are already being replicated

* expose min and max in sts structures

* added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters

* respect trimming in percentage calculation; updated swagger

* api/v2/weights preparative work to support multi-node queries - still single node though

* multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data

* when passing the raw option, the query exposes the hidden dimensions

* fix compilation issues on older systems

* the query engine now calculates per dimension min, max, sum, count, anomaly count

* use the macro to calculate storage point anomaly rate

* weights endpoint exposing version hashes

* weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate

* query: expose RESET flag; do not add the same point multiple times to the aggregated point

* weights: more compact output

* weights requests can be interrupted

* all /api/v2 requests can be interrupted and timeout

* allow relative timestamps in weights

* fix macos compilation warnings

* Revert "fix macos compilation warnings"

This reverts commit 8a1d24e41e.

* /api/v2/data group-by now works on dimension names, not ids

* /api/v2/weights does not query metrics without retention and new output format

* /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
2023-03-21 21:53:47 +02:00
Costa Tsaousis
021e252fc5
/api/v2/contexts ()
* preparation for /api/v2/contexts

* working /api/v2/contexts

* add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages

* minor fix

* query target now accurately counts hosts, contexts, instances, dimensions, metrics

* cleanup /api/v2/contexts

* full text search with /api/v2/contexts

* simple patterns now support the option to search ignoring case

* full text search API with /api/v2/q

* simple pattern execution optimization

* do not show q when not given

* full text search accounting

* separated /api/v2/nodes from /api/v2/contexts

* fix ssv queries for group_by

* count query instances queried and failed per context and host

* split rrdcontext.c to multiple files

* add query totals

* fix anomaly rate calculation; provide "ni" for indexing hosts

* do not generate zero valued members

* faster calculation of anomaly rate, by just summing integers for each db point and doing math once for every generated point

* fix typo when printing dimensions totals

* added option minify to remove spaces and newlines from JSON output

* send instance ids and names when they differ

* do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe

* fix for the previous + renames and code cleanup

* when a dimension is filtered, include in the response all the other dimensions that are selectable

* do not add nodes that do not have retention in the current window

* move selection of dimensions to query_dimension_add(), instead of query_metric_add()

* increase the pre-processing capacity of queries

* generate instance fqdn ids and names only when they are needed

* provide detailed statistics about tiers retention, queries, points, update_every

* late allocation of query dimensions

* cleanup

* more cleanup

* support for annotations per displayed point, RESET and PARTIAL

* new type annotations

* if a chart is not linked to contexts and it is collected, link it when it is collected

* make ML run reentrant

* make ML rrdr query synchronous

* optimize replication memory allocation of replication_sort_entry

* change units to percentage, when requesting a coefficient of variation, or a percentage query

* initialize replication before starting main threads

* properly decrement no room requests counter

* propagate the non-zero flag to group-by

* the same by avoiding the extra loop

* respect non-zero in all dimension arrays

* remove dictionary garbage collection from dictionary_entries() and dictionary_version()

* be more verbose when jv2 indexing is postponed

* prevent infinite loop

* use hidden dimensions even when dimensions pattern is unset

* traverse hosts using dictionaries

* fix dictionary unittests
2023-03-02 22:50:48 +02:00