mirror of https://github.com/netdata/netdata.git synced 2025-04-23 13:00:23 +00:00
65 commits

Author SHA1 Message Date
Costa Tsaousis
cb7af25c09
RRD structures managed by dictionaries ()
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired throughout netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* dont create variables if not initialized for health

* dont create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* dont keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock and automatically switch all calls to their unsafe versions; all direct calls to the unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* dont use index locks in single threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart it of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
Costa Tsaousis
3f6a75250d
Obsolete RRDSET state ()
* move chart_labels to rrdset

* rename chart_labels to rrdlabels

* renamed hash_id to uuid

* turned is_ar_chart into an rrdset flag

* removed rrdset state

* removed unused senders_connected member of rrdhost

* removed unused host flag RRDHOST_FLAG_MULTIHOST

* renamed rrdhost host_labels to rrdlabels

* Update exporting unit tests

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-07 15:28:30 +03:00
Costa Tsaousis
5e1b95cf92
Deduplicate all netdata strings ()
* rrdfamily

* rrddim

* rrdset plugin and module names

* rrdset units

* rrdset type

* rrdset family

* rrdset title

* rrdset title more

* rrdset context

* rrdcalctemplate context and removal of context hash from rrdset

* strings statistics

* rrdset name

* rearranged members of rrdset

* eliminate rrdset name hash; rrdcalc chart converted to STRING

* rrdset id, eliminated rrdset hash

* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate

* rrdcalctemplate

* rrdvar

* eval_variable

* rrddimvar and rrdsetvar

* rrdhost hostname, os and tags

* fix master commits

* added thread cache; implemented string_dup without locks

* faster thread cache

* rrdset and rrddim now use dictionaries for indexing

* rrdhost now uses dictionary

* rrdfamily now uses DICTIONARY

* rrdvar using dictionary instead of AVL

* allocate the right size to rrdvar flag members

* rrdhost remaining char * members to STRING *

* better error handling on indexing

* strings now use a read/write lock to allow parallel searches to the index

* removed AVL support from dictionaries; implemented STRING with native Judy calls

* string releases should be negative

* only 31 bits are allowed for enum flags

* proper locking on strings

* string threading unittest and fixes

* fix lgtm finding

* fixed naming

* stream chart/dimension definitions at the beginning of a streaming session

* thread stack variable is undefined on thread cancel

* rrdcontext garbage collect per host on startup

* worker control in garbage collection

* relaxed deletion of rrdmetrics

* type checking on dictfe

* netdata chart to monitor rrdcontext triggers

* Group chart label updates

* rrdcontext better handling of collected rrdsets

* rrdpush incremental transmission of definitions should use as much buffer as possible

* require 1MB per chart

* empty the sender buffer before enabling metrics streaming

* fill up to 50% of buffer

* reset signaling metrics sending

* use the shared variable for status

* use separate host flag for enabling streaming of metrics

* make sure the flag is clear

* add logging for streaming

* add logging for streaming on buffer overflow

* circular_buffer proper sizing

* removed obsolete logs

* do not execute worker jobs if not necessary

* better messages about compression disabling

* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped

* monitor stream sender used buffer ratio

* Update exporting unit tests

* no need to compare label value with strcmp

* streaming send workers now monitor bandwidth

* workers now use strings

* streaming receiver monitors incoming bandwidth

* parser shift of worker ids

* minor fixes

* Group chart label updates

* Populate context with dimensions that have data

* Fix chart id

* better shift of parser worker ids

* fix for streaming compression

* properly count received bytes

* ensure LZ4 compression ring buffer does not wrap prematurely

* do not stream empty charts; do not process empty instances in rrdcontext

* need_to_send_chart_definition() does not need an rrdset lock any more

* rrdcontext objects are collected, after data have been written to the db

* better logging of RRDCONTEXT transitions

* always set all variables needed by the worker utilization charts

* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes

* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock

* STRING code re-organization for clarity

* thread_cache improvements; double numbers precision on worker threads

* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions; STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS

* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts

* Add index to speed up database context searches

* Removed last_updated optimization (was also buggy after latest merge with master)

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05 19:31:06 +03:00
Costa Tsaousis
ccf0f6b6f4
/api/v1/weights endpoint ()
* /api/v1/weights endpoints

* high resolution anomaly rate in parallel with queries; points and options in /api/v1/weights reflect the truth

* context printing

* merged metric_correlations with weights API; added parameter tier to select the tier to run the query; weight api now returns points per tier; added swagger info about weights api

* moved metric_correlations files to web/api/queries as weights

* added contexts filtering; renamed correlated_dimensions; weights API is always enabled; code cleanup

* allow returning zero results
2022-08-01 21:47:14 +03:00
Emmanuel Vasilakis
fc82c47b32
Get last_entry_t only when st changes ()
get last_entry_t when st changes
2022-07-28 17:26:19 +03:00
Stelios Fragkakis
49234f23de
Multi-Tier database backend for long term metrics storage ()
* Tier part 1

* Tier part 2

* Tier part 3

* Tier part 4

* Tier part 5

* Fix some ML compilation errors

* fix more conflicts

* pass proper tier

* move metric_uuid from state to RRDDIM

* move aclk_live_status from state to RRDDIM

* move ml_dimension from state to RRDDIM

* abstracted the data collection interface

* support flushing for mem db too

* abstracted the query api

* abstracted latest/oldest time per metric

* cleanup

* store_metric for tier1

* fix for store_metric

* allow multiple tiers, more than 2

* state to tier

* Change storage type in db. Query param to request min, max, sum or average

* Store tier data correctly

* Fix skipping tier page type

* Add tier grouping in the tier

* Fix to handle archived charts (part 1)

* Temp fix for query granularity when requesting tier1 data

* Fix parameters in the correct order and calculate the anomaly based on the anomaly count

* Proper tiering grouping

* Anomaly calculation based on anomaly count

* force type checking on storage handles

* update cmocka tests

* fully dynamic number of storage tiers

* fix static allocation

* configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode

* use default page dt using the tiering info

* automatic selection of tier

* fix for automatic selection of tier

* working prototype of dynamic tier selection

* automatic selection of tier done right (I hope)

* ask for the proper tier value, based on the grouping function

* fixes for unittests and load_metric_next()

* fixes for lgtm findings

* minor renames

* add dbengine to page cache size setting

* add dbengine to page cache with malloc

* query engine optimized to loop as little as required based on the view_update_every

* query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA

* report db points per tier in jsonwrap

* query planner that switches database tiers on the fly to satisfy the query for the entire timeframe

* dbengine statistics and documentation (in progress)

* calculate average point duration in db

* handle single point pages the best we can

* handle single point pages even better

* Keep page type in the rrdeng_page_descr

* updated doc

* handle future backwards compatibility - improved statistics

* support &tier=X in queries

* enforce increasing iterations on tiers

* tier 1 is always 1 iteration

* backfilling higher tiers on first data collection

* reversed anomaly bit

* set up to 5 tiers

* natural points should only be offered on tier 0, unless a specific tier is selected

* do not allow more than 65535 points of tier0 to be aggregated on any tier

* Work only on actually activated tiers

* fix query interpolation

* fix query interpolation again

* fix lgtm finding

* Activate one tier for now

* backfilling of higher tiers using raw metrics from lower tiers

* fix for crash on start when storage tiers is increased from the default

* more statistics on exit

* fix bug that prevented higher tiers to get any values; added backfilling options

* fixed the statistics log line

* removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle

* fixed division by zero on zero points_wanted

* removed dead code

* Decide on the descr->type for the type of metric

* dont store metrics on unknown page types

* free db_metric_handle on sql based context queries

* Disable STORAGE_POINT value check in the exporting engine unit tests

* fix for db modes other than dbengine

* fix for aclk archived chart queries destroying db_metric_handles of valid rrddims

* fix left-over freez() instead of OWA freez on median queries

Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-07-06 14:01:53 +03:00
thiagoftsm
df784d47da
Fix alignment in charts endpoint () 2022-06-30 14:24:42 +00:00
Costa Tsaousis
c3dfbe52a6
netdata doubles ()
* netdata doubles

* fix cmocka test

* fix cmocka test again

* fix left-overs of long double to NETDATA_DOUBLE

* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf

* update the memory before saving

* rrdset is now detached from file structures too

* on memory mode map, update the memory mapped structures on every iteration

* allow RRD_ID_LENGTH_MAX to be changed

* granularity secs, back to update every

* fix formatting

* more formatting
2022-06-28 17:04:37 +03:00
Costa Tsaousis
b32ca44319
Query Engine multi-granularity support (and MC improvements) ()
* set grouping functions

* storage engine should check the validity of timestamps, not the query engine

* calculate and store in RRDR anomaly rates for every query

* anomaly rate used by volume metric correlations

* mc volume should use absolute data, to avoid cancelling effect

* return anomaly-rates in jsonwrap with jw-anomaly-rates option to data queries

* dont return null on anomaly rates

* allow passing group query options from the URL

* added countif to the query engine and used it in metric correlations

* fix configure

* fix countif and anomaly rate percentages

* added group_options to metric correlations; updated swagger

* added newline at the end of yaml file

* always check the time the highlighted window was above/below the highlighted window

* properly track time in memory queries

* error for internal checks only

* moved pack_storage_number() into the storage engines

* moved unpack_storage_number() inside the storage engines

* remove old comment

* pass unit tests

* properly detect zero or subnormal values in pack_storage_number()

* fill nulls before the value, not after

* make sure math.h is included

* workaround for isfinite()

* fix for isfinite()

* faster isfinite() alternative

* fix for faster isfinite() alternative

* next_metric() now returns end_time too

* variable step implemented in a generic way

* remove left-over variables

* ensure we always complete the wanted number of points

* fixes

* ensure no infinite loop

* mc-volume-improvements: Add information about invalid condition

* points should have a duration in the past

* removed unneeded info() line

* Fix unit tests for exporting engine

* new_point should only be checked when it is fetched from the db; better comment about the premature breaking of the main query loop

Co-authored-by: Thiago Marques <thiagoftsm@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-22 11:19:08 +03:00
Costa Tsaousis
72184b533c
fixed coverity 379136 379135 379134 379133 () 2022-06-14 15:54:27 +03:00
Costa Tsaousis
986a1abf68
73x times faster metrics correlations at the agent ()
* faster correlations

* 4x times faster correlations

* a little bit more help

* 10x times faster metrics correlations

* 6 digits precision; better comments

* enabled metrics correlations by default

* abstracted DIFFS_NUMBER to allow easily changing it

* reworked the entire logic to have more accuracy and support a baseline that is power of two multiple of highlight

* properly calculate shifts

* even more improved version

* added support for timeout; fixed another memory leak; skipped hidden dimensions

* default timeout 1min

* reduce memory even further

* use dictionary for the list of charts and optimize locks

* return 403 forbidden, when mc is not enabled

* added query options

* dont process zero dimensions

* added volume method as an option to metric correlations ; now metric correlations can support multiple implementations

* make sure we will never crash

* spread results evenly for both kstwo and volume

* fixed bug in query engine that was missing misaligned queries when a single point was requested from the db; improved comments; improved query flags

* updated swagger and added sane defaults; query options are now supported, including anomaly-bit

* added "raw" option to allow cross node correlations; added "group" option to allow different time aggregations; allowed calling metric correlations without any parameters; allowed calling metric correlations with relative timestamps; added timeout to volume method; properly handled timeout on ks2 method; json output now sends all parameters back - same for json_wrap; modified query engine to use present time for relative timestamps; modified "allow_past" to mean both past backwards and forwards

* emulate the old behaviour about zero points

* 100% accuracy against python ks_2samp(); now the default is volume and the default points are 500

* added config option to change default metric correlations method

* removed work-arounds now that rrdlabels are merged
2022-06-13 21:31:52 +03:00
Costa Tsaousis
1b0f6c6b22
Labels with dictionary ()
* squashed and rebased to master

* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h

* added unittest for UTF-8 multibyte sanitization

* Fix unit test compilation

* Fix CMake build

* remove double sanitizer for opentsdb; cleanup sanitize_json_string()

* rename error_description to error_message to avoid conflict with json-c

* revert last and undef error_description from json-c

* more unittests; attempt to fix protobuf map issue

* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer

* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted

* better sorted dictionary checking

* proper unittesting for sorted dictionaries

* call dictionary deletion callback when destroying the dictionary

* remove obsolete variable

* Fix exporting unit tests

* Fix k8s label parsing test

* workaround for cmocka and strdupz()

* Bypass cmocka memory allocation check

* Revert "Bypass cmocka memory allocation check"

This reverts commit 4c49923839.

* Revert "workaround for cmocka and strdupz()"

This reverts commit 7bebee0480.

* Bypass cmocka memory allocation checks

* respect json formatting for chart labels

* cloud sends colons

* print the value only once

* allow parenthesis in values and spaces; make stream sender send quotes for values

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-13 20:35:45 +03:00
Costa Tsaousis
7784a16cc7
Dictionary with JudyHS and double linked list ()
* dictionary internals isolation

* more dictionary cleanups

* added unit test

* we should use DICT internally

* disable cups in cmake

* implement DICTIONARY with Judy arrays

* operational JUDY implementation

* JUDY cleanup

* JUDY summary added

* JudyHS implementation with double linked list

* test negative searches too

* optimize destruction

* optimize set to insert first without lookup

* updated stats

* code cleanup; better organization; updated info

* more code cleanup and commenting

* more cleanup, renames and comments

* fix rename

* more cleanups

* use Judy.h from system paths

* added foreach traversal; added flag to add item in front; isolated locks to their own functions; destruction returns the number of bytes freed

* more comments; flags are now 16-bit

* completed unittesting

* addressed comments and added reference counters maintenance

* added unittest in main; tested removal of items in front, back and middle

* added read/write walkthrough and foreach; allowed walkthrough and foreach in write mode to delete the current element (used by cups.plugin); referenced counters removed from the API

* DICTFE.name should be const too

* added API calls for exposing all statistics

* dictionary flags as enum and reference counters as atomic operations

* more comments; improved error handling at unit tests

* added functions to allow unsafe access while traversing the dictionary with locks in place

* check for libcups in cmake

* added delete callback; implemented statsd with this dictionary

* added missing dfe_done()

* added alternative implementation with AVL

* added documentation

* added comments and warning about AVL

* dictionary walkthrough on new code

* simplified foreach; updated docs

* updated docs

* AVL is much faster without hashes

* AVL should follow DBENGINE
2022-06-01 20:01:52 +03:00
Stelios Fragkakis
3071aa055c
Add additional metadata to the data response ()
* Consolidate query params

* Add new option to show full dimensions in the json header (this will include dimensions, charts and chart labels)

* Group and pass parameters with query_params
2022-05-31 21:46:44 +03:00
Stelios Fragkakis
92d48b1778
Return stable or nightly based on version if the file check fails () 2022-05-13 12:48:53 +03:00
vkalintiris
ebdd819d6e
Remove per chart configuration. ()
After https://github.com/netdata/netdata/pull/12209 per-chart configuration
was used for (a) enabling/disabling a chart, and (b) renaming dimensions.

Regarding the first use case: We already have component-specific
configuration options|flags to finely control how a chart should behave.
Eg. "send charts matching" in streaming, "charts to skip from training"
in ML, etc. If we really need the concept of a disabled chart, we can
add a host-level simple pattern to match these charts.

Regarding the second use case: It's not obvious why we'd need to provide
support for remapping dimension names through a chart-specific configuration
from the core agent. If the need arises, we could add such support at
the right place, i.e. an exporter/streaming config section.

This will allow each flag to act independently of the others and
avoid managing flag-state manually at various places, eg:

```
    if(unlikely(!rrdset_flag_check(st, RRDSET_FLAG_ENABLED))) {
        rrdset_flag_clear(st, RRDSET_FLAG_UPSTREAM_SEND);
        rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
    } ...
```
2022-05-03 19:02:36 +03:00
Costa Tsaousis
87c0cc2d60
One way allocator to double the speed of parallel context queries ()
* one way allocator to speed up context queries

* fixed a bug while expanding memory pages

* reworked for clarity and finally fixed the bug of allocating memory beyond the page size

* further optimize allocation step to minimize the number of allocations made

* implement strdup with memcpy instead of strcpy

* added documentation

* prevent an uninitialized use of owa

* added callocz() interface

* integrate onewayalloc everywhere - apart from sql queries

* one way allocator is now used in context queries using archived charts in sql

* align on the size of pointers

* forgotten freez()

* removed not needed memcpys

* give unique names to global variables to avoid conflicts with system definitions
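
The general idea behind a one-way allocator can be sketched as below. This is an illustrative bump allocator written for this note, not the onewayalloc code from the commit: every `sketch_owa_*` name, the page layout, and the default page size are assumptions. It only mirrors points named in the list above: allocations are aligned on the size of a pointer, strdup is done with memcpy, there is a calloc-style call, and nothing is freed individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical page header; the payload follows it in the same malloc() block. */
typedef struct sketch_owa_page {
    struct sketch_owa_page *next;
    size_t size;              /* usable bytes in data[] */
    size_t used;              /* bytes already handed out */
    unsigned char data[];
} sketch_owa_page;

typedef struct {
    sketch_owa_page *head;    /* page currently being filled */
    size_t page_size;         /* default payload size of a new page */
} sketch_owa;

/* Round n up to a multiple of the pointer size (assumed to be a power of two). */
static size_t sketch_owa_align(size_t n) {
    size_t a = sizeof(void *);
    return (n + a - 1) & ~(a - 1);
}

static sketch_owa_page *sketch_owa_page_new(size_t payload) {
    sketch_owa_page *p = malloc(sizeof(*p) + payload);
    if (!p) return NULL;
    p->next = NULL;
    p->size = payload;
    p->used = 0;
    return p;
}

sketch_owa *sketch_owa_create(size_t page_size) {
    sketch_owa *owa = malloc(sizeof(*owa));
    if (!owa) return NULL;
    owa->page_size = sketch_owa_align(page_size ? page_size : 4096);
    owa->head = sketch_owa_page_new(owa->page_size);
    if (!owa->head) { free(owa); return NULL; }
    return owa;
}

/* Bump the offset of the current page; open a new page when the request does not fit. */
void *sketch_owa_malloc(sketch_owa *owa, size_t n) {
    assert(owa != NULL);
    n = sketch_owa_align(n ? n : 1);
    sketch_owa_page *p = owa->head;
    if (p->size - p->used < n) {
        /* Oversized requests get a page of their own size. */
        p = sketch_owa_page_new(n > owa->page_size ? n : owa->page_size);
        if (!p) return NULL;
        p->next = owa->head;
        owa->head = p;
    }
    void *ptr = p->data + p->used;
    p->used += n;
    return ptr;
}

void *sketch_owa_calloc(sketch_owa *owa, size_t nmemb, size_t size) {
    if (size && nmemb > SIZE_MAX / size) return NULL;   /* multiplication overflow */
    void *ptr = sketch_owa_malloc(owa, nmemb * size);
    if (ptr) memset(ptr, 0, nmemb * size);
    return ptr;
}

/* The length is known after strlen(), so memcpy() copies it in one pass. */
char *sketch_owa_strdup(sketch_owa *owa, const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = sketch_owa_malloc(owa, len);
    if (copy) memcpy(copy, s, len);
    return copy;
}

/* The only way to release memory: drop every page at once. */
void sketch_owa_destroy(sketch_owa *owa) {
    sketch_owa_page *p = owa->head;
    while (p) {
        sketch_owa_page *next = p->next;
        free(p);
        p = next;
    }
    free(owa);
}
```

In a design like this, a query would create one allocator, take all of its temporary memory from it, and destroy it when the response has been generated, so the per-object free calls disappear from the hot path.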
2022-05-03 00:31:19 +03:00
Costa Tsaousis
47fa3d7089
Speed up BUFFER increases (minimize reallocs) ()
* speedup BUFFER increases by forward looking reallocs

* implemented buffer_vsprintf() and optimized buffer_sprintf() to minimize calls to vsnprintfz()

* optimize json generation for well known strings
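
As a rough illustration of what "forward looking reallocs" can mean, the sketch below grows a text buffer geometrically, so that many small appends cause only a handful of realloc() calls. It is not netdata's BUFFER implementation: the `sketch_buffer` type, its functions, the initial size of 64 bytes and the doubling factor are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; zero-initialize it before first use. */
typedef struct {
    char *data;
    size_t len;        /* bytes in use, excluding the terminating NUL */
    size_t size;       /* bytes allocated */
    size_t reallocs;   /* how many times realloc() was called */
} sketch_buffer;

/* Make room for `extra` more bytes plus the NUL, doubling the size as needed. */
static int sketch_buffer_need(sketch_buffer *b, size_t extra) {
    size_t wanted = b->len + extra + 1;
    if (wanted <= b->size) return 0;

    size_t new_size = b->size ? b->size : 64;
    while (new_size < wanted) new_size *= 2;   /* look ahead instead of growing exactly */

    char *p = realloc(b->data, new_size);
    if (!p) return -1;
    b->data = p;
    b->size = new_size;
    b->reallocs++;
    return 0;
}

int sketch_buffer_strcat(sketch_buffer *b, const char *s) {
    size_t n = strlen(s);
    if (sketch_buffer_need(b, n) != 0) return -1;
    memcpy(b->data + b->len, s, n + 1);        /* copies the NUL as well */
    b->len += n;
    return 0;
}

void sketch_buffer_free(sketch_buffer *b) {
    free(b->data);
    b->data = NULL;
    b->len = b->size = 0;
}
```

With this policy the number of reallocations grows with the logarithm of the final size rather than with the number of appends; the `reallocs` counter is there only to make that visible.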
2022-05-03 00:30:21 +03:00
Stelios Fragkakis
81b3d4b71e
Add a timeout parameter to data queries ()
* Add timeout parameter in queries and in calling functions

* Add CANCEL flag in RRDR and code to cancel a query

* Update swagger

* Format swagger file properly
2022-04-11 22:34:04 +03:00
Stelios Fragkakis
e376fb8ca4
Add a check to make sure internal chart state is initialized 2022-02-25 14:49:53 +02:00
vkalintiris
69ea17d6ec
Track anomaly rates with DBEngine. ()
* Track anomaly rates with DBEngine.

This commit adds support for tracking anomaly rates with DBEngine. We
do so by creating a single chart with id "anomaly_detection.anomaly_rates" for
each trainable/predictable host, which is responsible for tracking the anomaly
rate of each dimension that we train/predict for that host.

The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly
rates charts. We use this flag to:

    - Disable exposing the anomaly rates charts through the functionality
      in backends/, exporting/ and streaming/.
    - Skip generation of configuration options for the name, algorithm,
      multiplier, divisor of each dimension in an anomaly rates chart.
    - Skip the creation of health variables for anomaly rates dimensions.
    - Skip the chart/dim queue of ACLK.
    - Post-process the RRDR result of an anomaly rates chart, so that we can
      return a sorted, trimmed number of anomalous dimensions.

In a child/parent configuration where both the child and the parent run
ML for the child, we want to be able to stream the rest of the ML-related
charts to the parent. To be able to do this without any chart name collisions,
the charts are now created on localhost and their IDs and titles have the node's
machine_guid and hostname as a suffix, respectively.

* Fix exporting_engine tests.

* Restore default ML configuration.

The reverted changes were meant for local testing only. This commit
restores the default values that we want to have when someone runs
anomaly detection on their node.

* Set context for anomaly_detection.* charts.

* Check for anomaly rates chart only with a valid pointer.

* Remove duplicate code.

* Use a more descriptive name for id/title pair variable
2022-02-24 10:57:30 +02:00
Tina Luedtke
c7f2647a62
Docs: Removed Google Analytics tags () 2022-02-17 10:37:46 +00:00
Stelios Fragkakis
305708523e
Fix the format=array output in context queries ()
* Add a new parameter (list of dimensions for the context query) to rrdr2ssv & rrdr2value
Add the parameter to the function calls

* Use the temporary dimension list (if available) for the calculations
2022-02-17 09:13:56 +02:00
Vladimir Kobal
4919103c4b
Fix time_t format () 2022-01-11 13:12:09 +02:00
Stelios Fragkakis
881a1b9e13
Use the chart id instead of chart name in response to incoming cloud context queries () 2021-12-13 16:03:38 +02:00
Stelios Fragkakis
37bee1d197
Store uuid_t metric_uuid in the dimension state structure instead of uuid_t * () 2021-06-01 14:26:22 +03:00
Stelios Fragkakis
ee64ef04f0
Fix memory corruption issue when executing context queries in RAM/SAVE memory mode () 2021-04-07 19:25:55 +03:00
Stelios Fragkakis
7ebb0a4da2
Fix memory leak when archived data is requested () 2021-03-23 18:09:34 +02:00
Stelios Fragkakis
0f78020f04
Add lock check to avoid shutdown when compiled with internal and locking checks () 2021-03-23 14:41:53 +02:00
Stelios Fragkakis
65bc43d9cb
Add data query support for archived charts () 2021-03-22 09:47:22 +02:00
Stelios Fragkakis
b76a297de1
Fix the context filtering on the data query endpoint () 2021-02-17 21:13:38 +02:00
Stelios Fragkakis
cd443de780
Support multiple chart label keys in data queries () 2021-01-14 18:50:33 +02:00
Joel Hans
46a8075c8f
Docs housekeeping for SEO and syntax, part 1 ()
* First pass to get the script working right

* Finish adding analytics tags
2021-01-07 11:44:43 -07:00
Ilya Mashchenko
0f8175dd30
Kubernetes labels ()
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2020-12-14 17:27:55 +03:00
Markos Fountoulakis
5ffba490e3
Fix race condition in rrdset_first_entry_t() and rrdset_last_entry_t() () 2020-11-28 15:53:12 +02:00
Stelios Fragkakis
e9d59e37d9
Migrate metadata log to SQLite () 2020-11-24 20:00:02 +02:00
Stelios Fragkakis
ab0ffcebf8
Fixed compile error in CentOS 6 () 2020-10-21 15:41:28 +03:00
Stelios Fragkakis
ed0546b32e
Fixed locking order to address CID_362348 () 2020-09-25 16:19:41 +03:00
Stelios Fragkakis
8067642361
Improved the data query when using the context parameter () 2020-09-24 13:05:15 +03:00
Stelios Fragkakis
8194ac83f0
Fixed chart's last accessed time during context queries () 2020-09-18 23:45:23 +03:00
Stelios Fragkakis
8f6f1baf9a
Added context parameter to the data endpoint ()
Added functionality to support composite charts
2020-09-15 19:41:39 +03:00
Ilya Mashchenko
01c220b90e
installer: update go.d.plugin version to v0.20.0 ()
Bumps the go.d plugin version to 0.20.0, includes the Prometheus Generic Collector. Fixes a bug in dimension name / id escaping that could break the json output of the web API.

Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
2020-08-04 18:31:44 +02:00
Stelios Fragkakis
eda12f579f
Implemented multihost database ()
* Hard code a node for non-legacy multidb test
Skip dbengine initialization for new incoming children
Add code to switch to multidb ctx when accessing the dbengine

* When a non-legacy streaming connection is detected, use the multidb metadata log context

* Clear the superblock memory to avoid random data written in the metadata log

* Activate the host detection during compaction
Activate the host detection during metadata log chart updates
Keep the host in the user object during replay of the HOST command

* Add defaults for health / rrdpush on HOST metadata replay
Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()

Use defaults from the stream.conf

* Count hosts only if not archived
When host switches from archived to active update rrd_hosts_available
Remove archived hosts from charts and info

* Change parameter from "multidb disk space" to "dbengine multihost disk space"
Remove unused variables
Fix compilation error when dbengine is disabled
Fix condition for machine_guid directory creation under cache_dir

* Enable multidb disk space file creation.

* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.

* Fix old bug in the code that confused obsolete hosts with orphan hosts.

* Do not delete multi-host DB host files.

* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.

* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.

* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.

* Ignore archived hosts in streaming.

* Add documentation before merging to master.

Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-07-28 15:04:39 +03:00
Stelios Fragkakis
1bd8a25544
Add support for persistent metadata ()
* Implemented collector metadata logging
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update

Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-06-12 10:35:17 +03:00
Joel Hans
e99692f145
Docs: Standardize links between documentation ()
* Trying out some absolute-ish links

* Try one out on installer

* Testing logic

* Trying out some more links

* Fixing links

* Fix links in python collectors

* Changed a bunch more links

* Fix build errors

* Another push of links

* Fix build error and add more links

* Complete first pass

* Fix final broken links

* Fix links to files

* Fix for Netlify

* Two more fixes
2020-04-14 10:26:13 -07:00
Joel Hans
9342704a41
Bulk add frontmatter to all documentation ()
* Bulk add frontmatter

* A few extra edge cases
2020-03-10 14:29:51 -07:00
Andrew Moss
c6d945200f
Merging the feature branch for the ACLK in the previous sprint. ()
* ACLK connection and protocol improvements ()
* Adding ACLK retry on connection failure ()
* Fixed reconnect issues on the ACLK. ()
* Cleaning up ACLK - part 1 ()

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2020-02-24 12:10:10 +01:00
cosmix
a48ab2124f
Fix CSV -> SSV in docs () 2020-01-17 19:15:12 +02:00
Vladimir Kobal
8cf5889194
Clean up host labels in API responses ()
* Remove host labels from the Swagger specification

* Remove host labels from the api responses
2020-01-06 17:34:49 +02:00
Andrew Moss
c8c72f18a6
Labels issues ()
Initial work on host labels from the dedicated branch. Includes work for issues , , , , , , ,  and  by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
2019-12-16 15:12:00 +01:00