mirror of https://github.com/netdata/netdata.git synced 2025-04-16 10:31:07 +00:00

63 commits

Author SHA1 Message Date
Costa Tsaousis
cb7af25c09
RRD structures managed by dictionaries ()
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired through out netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* don't create variables if not initialized for health

* don't create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* don't keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock, and then all calls automatically become their unsafe versions; all direct calls to unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* don't use index locks when single-threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart id of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
Costa Tsaousis
3f6a75250d
Obsolete RRDSET state ()
* move chart_labels to rrdset

* rename chart_labels to rrdlabels

* renamed hash_id to uuid

* turned is_ar_chart into an rrdset flag

* removed rrdset state

* removed unused senders_connected member of rrdhost

* removed unused host flag RRDHOST_FLAG_MULTIHOST

* renamed rrdhost host_labels to rrdlabels

* Update exporting unit tests

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-07 15:28:30 +03:00
Costa Tsaousis
58c79fd329
Faster rrdcontext ()
* moved rrdcontexts processing to worker thread

* added loggings

* check for aclk deeper in the code

* removed unnecessary logs

* code re-organization; cleanup; more comments; better error handling; rrdcontext locks optimization; more clarity

* updated 2 comments

* make instances walkthrough reentrant; move context lock to the place where it is really needed

* created macro for reentrant dictionary walkthrough

* incremental updates on instances and metrics

* renamed family of rrdcontext workers

* prevent crash in case RRDINSTANCE or RRDMETRIC is freed during shutdown

* prevent crash during rrddim save, on out of memory fatal()

* always post-process contexts

* added tracing for tracking the caller that triggers updates

* more details on tracing info

* fix for charts that are collected without metrics
2022-09-06 19:02:39 +03:00
Costa Tsaousis
5e1b95cf92
Deduplicate all netdata strings ()
* rrdfamily

* rrddim

* rrdset plugin and module names

* rrdset units

* rrdset type

* rrdset family

* rrdset title

* rrdset title more

* rrdset context

* rrdcalctemplate context and removal of context hash from rrdset

* strings statistics

* rrdset name

* rearranged members of rrdset

* eliminate rrdset name hash; rrdcalc chart converted to STRING

* rrdset id, eliminated rrdset hash

* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate

* rrdcalctemplate

* rrdvar

* eval_variable

* rrddimvar and rrdsetvar

* rrdhost hostname, os and tags

* fix master commits

* added thread cache; implemented string_dup without locks

* faster thread cache

* rrdset and rrddim now use dictionaries for indexing

* rrdhost now uses dictionary

* rrdfamily now uses DICTIONARY

* rrdvar using dictionary instead of AVL

* allocate the right size to rrdvar flag members

* rrdhost remaining char * members to STRING *

* better error handling on indexing

* strings now use a read/write lock to allow parallel searches to the index

* removed AVL support from dictionaries; implemented STRING with native Judy calls

* string releases should be negative

* only 31 bits are allowed for enum flags

* proper locking on strings

* string threading unittest and fixes

* fix lgtm finding

* fixed naming

* stream chart/dimension definitions at the beginning of a streaming session

* thread stack variable is undefined on thread cancel

* rrdcontext garbage collect per host on startup

* worker control in garbage collection

* relaxed deletion of rrdmetrics

* type checking on dictfe

* netdata chart to monitor rrdcontext triggers

* Group chart label updates

* rrdcontext better handling of collected rrdsets

* rrdpush incremental transmission of definitions should use as much buffer as possible

* require 1MB per chart

* empty the sender buffer before enabling metrics streaming

* fill up to 50% of buffer

* reset signaling metrics sending

* use the shared variable for status

* use separate host flag for enabling streaming of metrics

* make sure the flag is clear

* add logging for streaming

* add logging for streaming on buffer overflow

* circular_buffer proper sizing

* removed obsolete logs

* do not execute worker jobs if not necessary

* better messages about compression disabling

* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped

* monitor stream sender used buffer ratio

* Update exporting unit tests

* no need to compare label value with strcmp

* streaming send workers now monitor bandwidth

* workers now use strings

* streaming receiver monitors incoming bandwidth

* parser shift of worker ids

* minor fixes

* Group chart label updates

* Populate context with dimensions that have data

* Fix chart id

* better shift of parser worker ids

* fix for streaming compression

* properly count received bytes

* ensure LZ4 compression ring buffer does not wrap prematurely

* do not stream empty charts; do not process empty instances in rrdcontext

* need_to_send_chart_definition() does not need an rrdset lock any more

* rrdcontext objects are collected, after data have been written to the db

* better logging of RRDCONTEXT transitions

* always set all variables needed by the worker utilization charts

* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes

* lockless strings design - string_dup() and string_freez() are totally lockless when they don't need to touch Judy - only Judy is protected with a read/write lock

* STRING code re-organization for clarity

* thread_cache improvements; double numbers precision on worker threads

* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions; STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS

* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts

* Add index to speed up database context searches

* Removed last_updated optimization (was also buggy after latest merge with master)
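As a minimal illustration of the string deduplication described above, the sketch below interns identical strings into one shared, reference-counted entry. It is a simplification under stated assumptions: the index is a plain linked list instead of Judy, lookup and release share a single mutex, and only string_dup() is lockless. STRING, string_dup() and string_freez() are named in the commit message; string_strdupz(), string2str() and the struct layout are hypothetical here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical layout: one shared, reference-counted copy per distinct string.
typedef struct string_entry {
    struct string_entry *next; // next entry in the (simplistic) index
    atomic_int refcount;       // how many holders share this entry
    char str[];                // the deduplicated characters
} STRING;

static STRING *string_index = NULL; // stand-in for the Judy index
static pthread_mutex_t string_index_lock = PTHREAD_MUTEX_INITIALIZER;

// Return the shared entry for s, creating it if needed (takes the lock).
STRING *string_strdupz(const char *s) {
    pthread_mutex_lock(&string_index_lock);
    for (STRING *e = string_index; e; e = e->next) {
        if (strcmp(e->str, s) == 0) {
            atomic_fetch_add(&e->refcount, 1);
            pthread_mutex_unlock(&string_index_lock);
            return e;
        }
    }
    size_t len = strlen(s);
    STRING *e = malloc(sizeof(*e) + len + 1);
    if (e) {
        memcpy(e->str, s, len + 1);
        atomic_init(&e->refcount, 1);
        e->next = string_index;
        string_index = e;
    }
    pthread_mutex_unlock(&string_index_lock);
    return e;
}

// Lockless: the caller already owns a reference, so the count cannot be zero
// and no concurrent string_freez() can free the entry underneath us.
STRING *string_dup(STRING *e) {
    if (e) {
        assert(atomic_load(&e->refcount) > 0);
        atomic_fetch_add(&e->refcount, 1);
    }
    return e;
}

// Drop one reference; the last holder unlinks and frees the entry.
void string_freez(STRING *e) {
    if (!e) return;
    pthread_mutex_lock(&string_index_lock);
    if (atomic_fetch_sub(&e->refcount, 1) == 1) {
        STRING **pp = &string_index;
        while (*pp && *pp != e) pp = &(*pp)->next;
        if (*pp) *pp = e->next;
        free(e);
    }
    pthread_mutex_unlock(&string_index_lock);
}

const char *string2str(const STRING *e) { return e ? e->str : ""; }
```

Interning the same text twice yields the same pointer, so comparing two STRING pointers is enough to compare their contents, which is what makes deduplication pay off for ids, names and labels.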

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05 19:31:06 +03:00
Costa Tsaousis
291b978282
Rrdcontext ()
* type checking on dictionary return values

* first STRING implementation, used by DICTIONARY and RRDLABEL

* enable AVL compilation of STRING

* Initial functions to store context info

* Call simple test functions

* Add host_id when getting charts

* Allow host to be null and in this case it will process the localhost

* Simplify init
Do not use strdupz - link directly to sqlite result set

* Init the database during startup

* make it compile - no functionality yet

* intermediate commit

* intermediate

* first interface to sql

* loading instances

* check if we need to update cloud

* comparison of rrdcontext on conflict

* merge context titles

* rrdcontext public interface; statistics on STRING; scratchpad on DICTIONARY

* dictionaries maintain version numbers; rrdcontext api

* cascading changes

* first operational cleanup

* string unittest

* proper cleanup of referenced dictionaries

* added rrdmetrics

* rrdmetric starting retention

* Add fields to context
Adjust context creation and deletion

* Memory cleanup

* Fix get context list
Fix memory double free in tests
Store context with two hosts

* calculated retention

* rrdcontext retention with collection

* Persist database and shutdown

* loading all from sql

* Get chart list and dimension list changes

* fully working attempt 1

* fully working attempt 2

* missing archived flag from log

* fixed archived / collected

* operational

* proper cleanup

* cleanup - implemented all interface functions - dictionary react callback triggers after the dictionary is unlocked

* track all reasons for changes

* proper tracking of reasons of changes

* fully working thread

* better versioning of contexts

* fix string indexing with AVL

* running version per context vs hub version; ifdef dbengine

* added option to disable rrdmetrics

* release old context when a chart changes context

* cleanup properly

* renamed config

* cleanup contexts; general cleanup;

* deletion inline with dequeue; lots of cleanup; child connected/disconnected

* ml should start after rrdcontext

* added missing NULL to ri->rrdset; rrdcontext flags are now only changed under a mutex lock

* fix buggy STRING under AVL

* Rework database initialization
Add migration logic to the context database

* fix data race conditions during context deletion

* added version hash algorithm

* fix string over AVL

* update aclk-schemas

* compile new ctx related protos

* add ctx stream message utils

* add context messages

* add dummy rx message handlers

* add the new topics

* add ctx capability

* add helper functions to send the new messages

* update cmake build to not fail

* update topic names

* handle rrdcontext_enabled

* add more functions

* fatal on OOM cases instead of return NULL

* silence unknown query type error

* fully working attempt 1

* fully working attempt 2

* allow compiling without ACLK

* added family to the context

* removed excess character in UUID

* smarter merging of titles and families

* Database migration code to add family
Add family to SQL_CHART_DATA and VERSIONED_CONTEXT_DATA

* add family to context message

* enable ctx in communication

* hardcoded enabled contexts

* Add hard code for CTX

* add update node collectors to json

* add context message log

* fix log about last_time_t

* fix collected flags for queued items

* prevent crash on charts cleanup

* fix bug in AVL indexing of dictionaries; make sure react callback of dictionaries has a reference counter, which is acquired while the dictionary is locked

* fixed dictionary unittest

* strict policy to cleanup and garbage collector

* fix db rotation and garbage collection timings

* remove deadlock

* proper garbage collection - a lot faster retention recalculation

* Added not NULL in database columns
Remove migration code for context -- we will ship with version 1 of the table schema
Added define for query in tests to detect localhost

* Use UUID_STR_LEN instead of GUID_LEN + 1
Use realistic timestamps when adding test data in the database

* Add NULL checks for passed parameters

* Log deleted context when compiled with NETDATA_INTERNAL_CHECKS

* Error checking for null host id

* add missing ContextsCheckpoint log convertor

* Fix spelling in VACCUM

* Hold additional information for host -- prepare to load archived hosts on startup

* Make sure claim id is valid

* is_get_claimed is actually get the current claim id

* Simplify ctx get chart list query

* remove env negotiation

* fix string unittest when there are some strings already in the index

* propagate live-retention flag upstream; cleanup all update reasons; updated instances logging; automated attaching started/stopped collecting flags;

* first implementation of /api/v1/contexts

* full contexts API; updated swagger

* disabled debugging; rrdcontext enabled by default

* final cleanup and renaming of global variables

* return current time on currently collected contexts, charts and dimensions

* added option "deepscan" to the API to have the server refresh the retention and recalculate the contexts on the fly

* fixed indentation of yaml

* Add constraints to the host table

* host->node_id may not be available

* new capabilities

* lock the context while rendering json

* update aclk-schemas

* added permanent labels to all charts about plugin, module and family; added labels to all proc plugin modules

* always add the labels

* allow merging of families down to [x]

* don't show UUIDs by default; added option to enable them; the response now accepts after, before to show only data for a specific timeframe; deleted items are only shown when "deleted" is requested; hub version is now shown when "queue" is requested

* Use the localhost claim id

* Fix to handle host constraints better

* cgroups: add "k8s." prefix to chart context in k8s

* Improve sqlite metadata version migration check

* empty values set to "[none]"; fix labels unit test to reflect that

* Check if we reached the version we want first (address CODACY report re: Array index 'i' is used before limits check)

* Rewrite condition to address CODACY report (Redundant condition: t->filter_callback. '!A || (A && B)' is equivalent to '!A || B')

* Properly unlock context

* fixed memory leak on rrdcontexts - it was not freeing all dictionaries in rrdhost; added wait of up to 100ms on dictionary_destroy() to give time to dictionaries to release their items before destroying them

* fixed memory leak on rrdlabels not freed on rrdinstances

* fixed leak when dimensions and charts are redefined

* Mark entries for charts and dimensions as submitted to the cloud 3600 seconds after their creation
Mark entries for charts and dimensions as updated (confirmed by the cloud) 1800 seconds after their submission

* renamed struct string

* update cgroups alarms

* fixed codacy suggestions

* update dashboard info

* fix k8s_cgroup_10s_received_packets_storm alarm

* added filtering options to /api/v1/contexts and /api/v1/context

* fix eslint

* fix eslint

* Fix pointer binding for host / chart uuids

* Fix cgroups unit tests

* fixed non-retention updates not propagated upstream

* removed non-fatal fatals

* Remove context from 2 way string merge.

* Move string_2way_merge to dictionary.c

* Add 2-way string merge tests.

* split long lines

* fix indentation in netdata-swagger.yaml

* update netdata-swagger.json

* yamllint please

* remove the deleted flag when a context is collected

* fix yaml warning in swagger

* removed non-fatal fatals

* charts should now be able to switch contexts

* allow deletion of unused metrics, instances and contexts

* keep the queued flag

* cleanup old rrdinstance labels

* don't hide objects when there is no filter; mark objects as deleted when there are no sub-objects

* delete old instances once they changed context

* delete all instances and contexts that do not have sub-objects

* more precise transitions

* Load archived hosts on startup (part 1)

* update the queued time every time

* disable by default; dedup deleted dimensions after snapshot

* Load archived hosts on startup (part 2)

* delayed processing of events until charts are being collected

* remove dont-trigger flag when object is collected

* polish all triggers given the new dont_process flag

* Remove always true condition
Enums for readability / create_host_callback only if ACLK is enabled (for now)

* Skip retention message if context streaming is enabled
Add messages in the access log if context streaming is enabled

* Check for node id being a UUID that can be parsed
Improve error check / reporting when loading archived hosts and creating ACLK sync threads

* collected, archived, deleted are now mutually exclusive

* Enable the "orphan" handling for now
Remove dead code
Fix memory leak on free host

* Queue charts and dimensions will be no-op if host is set to stream contexts

* removed unused parameter and made sure flags are set on rrdcontext insert

* make the rrdcontext thread abort mid-work when exiting

* Skip chart hash computation and storage if contexts streaming is enabled

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Timo <timotej@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
Co-authored-by: Vasilis Kalintiris <vasilis@netdata.cloud>
2022-07-24 22:33:09 +03:00
Stelios Fragkakis
49234f23de
Multi-Tier database backend for long term metrics storage ()
* Tier part 1

* Tier part 2

* Tier part 3

* Tier part 4

* Tier part 5

* Fix some ML compilation errors

* fix more conflicts

* pass proper tier

* move metric_uuid from state to RRDDIM

* move aclk_live_status from state to RRDDIM

* move ml_dimension from state to RRDDIM

* abstracted the data collection interface

* support flushing for mem db too

* abstracted the query api

* abstracted latest/oldest time per metric

* cleanup

* store_metric for tier1

* fix for store_metric

* allow multiple tiers, more than 2

* state to tier

* Change storage type in db. Query param to request min, max, sum or average

* Store tier data correctly

* Fix skipping tier page type

* Add tier grouping in the tier

* Fix to handle archived charts (part 1)

* Temp fix for query granularity when requesting tier1 data

* Fix parameters in the correct order and calculate the anomaly based on the anomaly count

* Proper tiering grouping

* Anomaly calculation based on anomaly count

* force type checking on storage handles

* update cmocka tests

* fully dynamic number of storage tiers

* fix static allocation

* configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode

* use default page dt using the tiering info

* automatic selection of tier

* fix for automatic selection of tier

* working prototype of dynamic tier selection

* automatic selection of tier done right (I hope)

* ask for the proper tier value, based on the grouping function

* fixes for unittests and load_metric_next()

* fixes for lgtm findings

* minor renames

* add dbengine to page cache size setting

* add dbengine to page cache with malloc

* query engine optimized to loop as little as required, based on the view_update_every

* query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA

* report db points per tier in jsonwrap

* query planner that switches database tiers on the fly to satisfy the query for the entire timeframe

* dbengine statistics and documentation (in progress)

* calculate average point duration in db

* handle single point pages the best we can

* handle single point pages even better

* Keep page type in the rrdeng_page_descr

* updated doc

* handle future backwards compatibility - improved statistics

* support &tier=X in queries

* enforce increasing iterations on tiers

* tier 1 is always 1 iteration

* backfilling higher tiers on first data collection

* reversed anomaly bit

* set up to 5 tiers

* natural points should only be offered on tier 0, unless a specific tier is selected

* do not allow more than 65535 points of tier0 to be aggregated on any tier

* Work only on actually activated tiers

* fix query interpolation

* fix query interpolation again

* fix lgtm finding

* Activate one tier for now

* backfilling of higher tiers using raw metrics from lower tiers

* fix for crash on start when storage tiers is increased from the default

* more statistics on exit

* fix bug that prevented higher tiers from getting any values; added backfilling options

* fixed the statistics log line

* removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle

* fixed division by zero on zero points_wanted

* removed dead code

* Decide on the descr->type for the type of metric

* don't store metrics on unknown page types

* free db_metric_handle on sql based context queries

* Disable STORAGE_POINT value check in the exporting engine unit tests

* fix for db modes other than dbengine

* fix for aclk archived chart queries destroying db_metric_handles of valid rrddims

* fix left-over freez() instead of OWA freez on median queries
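The tier grouping above keeps, for each higher-tier point, enough information to answer min, max, sum or average queries and to derive the anomaly rate from an anomaly count. A minimal sketch of such an aggregate follows; the struct and function names are hypothetical, and only the 65535-point cap is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical higher-tier point: a summary of several tier-0 points.
// Start from a zero-initialized value, e.g. tier_point tp = {0};
typedef struct {
    double min, max, sum;
    uint32_t count;         // tier-0 points aggregated so far
    uint32_t anomaly_count; // how many of them were flagged anomalous
} tier_point;

// Cap on how many tier-0 points one higher-tier point may aggregate.
#define TIER_POINT_MAX_AGGREGATED 65535u

static void tier_point_add(tier_point *tp, double value, bool anomalous) {
    assert(tp->count < TIER_POINT_MAX_AGGREGATED);
    if (tp->count == 0) {
        tp->min = tp->max = value;
        tp->sum = 0.0;
    } else {
        if (value < tp->min) tp->min = value;
        if (value > tp->max) tp->max = value;
    }
    tp->sum += value;
    tp->count++;
    if (anomalous) tp->anomaly_count++;
}

// Average is derived at query time from sum and count.
static double tier_point_average(const tier_point *tp) {
    return tp->count ? tp->sum / tp->count : 0.0;
}

// Anomaly rate as a percentage of the aggregated points.
static double tier_point_anomaly_rate(const tier_point *tp) {
    return tp->count ? 100.0 * tp->anomaly_count / tp->count : 0.0;
}
```

Storing sum and count instead of a precomputed average lets points of a higher tier be merged again without losing precision.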

Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-07-06 14:01:53 +03:00
Stelios Fragkakis
dd01f896b4
Null terminate string if file read was not successful () 2022-07-05 11:42:38 +03:00
Costa Tsaousis
c3dfbe52a6
netdata doubles ()
* netdata doubles

* fix cmocka test

* fix cmocka test again

* fix left-overs of long double to NETDATA_DOUBLE

* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf

* update the memory before saving

* rrdset is now detached from file structures too

* on memory mode map, update the memory mapped structures on every iteration

* allow RRD_ID_LENGTH_MAX to be changed

* granularity secs, back to update every

* fix formatting

* more formatting
2022-06-28 17:04:37 +03:00
Timotej S
cb13f0787d
Removes Legacy JSON Cloud Protocol Support In Agent ()
* removes old protocol support (cloud removed support already)
2022-06-27 16:03:20 +02:00
Stelios Fragkakis
77b30d25d8
Optimize the dimensions option store to the metadata database ()
* Add a flag to "cache" the latest hidden status written in the database

* rrddim hide and unhide will check "cached" state, update the database if needed and set the cache flag accordingly

* Check the dimension option and only do the database update if the cached state is different
2022-05-18 20:10:26 +03:00
Stelios Fragkakis
3b8d4c21e5
Adjust the dimension liveness status check ()
* Mark a chart to be exposed only if dimension is created or metadata changes

* Calculate the liveness of the dimension when it goes from collected to non-collected (live -> stale) and vice versa

* queue_dimension_to_aclk will take the rrdset and either 0 or the last collected time.
  If 0, the dimension will be marked as live; otherwise it will be marked as stale and the last collected time will be sent to the cloud

* Add an extra parameter to indicate if the payload check should be done in the database or it has been done already

* Queue dimension sets dimension liveness and queues the exact payload to store in the database

* Fix compilation error when --disable-cloud is specified
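A minimal sketch of the 0-or-last-collected-time convention used by queue_dimension_to_aclk in this commit: a single timestamp argument encodes both the liveness state and, for stale dimensions, when collection stopped. The dimension_liveness struct and helper are hypothetical illustrations, not the agent's payload format.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

// Hypothetical summary of what is reported to the cloud for a dimension.
typedef struct {
    bool live;             // true while the dimension is being collected
    time_t last_collected; // non-zero only for stale dimensions
} dimension_liveness;

// 0 means live; any other value is the last collected time of a stale dimension.
static dimension_liveness liveness_from_last_collected(time_t last_collected) {
    assert(last_collected >= 0);
    dimension_liveness l = {
        .live = (last_collected == 0),
        .last_collected = last_collected,
    };
    return l;
}
```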
2022-05-17 16:58:49 +03:00
Vladimir Kobal
a68ae03b9c
Fix compilation warnings () 2022-05-12 14:08:38 +02:00
Adrien Béraud
9adf4dd782
Configurable storage engine for Netdata agents: step 2 () 2022-05-11 16:17:40 +03:00
Stelios Fragkakis
0b3ee50c76
Resolve coverity issues ()
- Variable "hostname" going out of scope leaks the storage it points to.
- Null-checking "rd->name" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
2022-05-09 10:47:58 +03:00
Costa Tsaousis
79444d3645
fix memory leaks and mismatches of the use of the z functions for allocations ()
* fix mismatches of the use of the z functions for allocations

* when there was no memory, the original name of the dimension was freed, and with a mismatching deallocator.

* fixed memory leak at rrdeng_load_metric_*() functions

* fixed memory leak on exit of plugins.d parser

* fixed memory leak on plugins and streaming receiver threads exit

* fixed compiler warnings
2022-05-07 23:00:44 +03:00
vkalintiris
6acc6a3e9c
Optimize linking of foreach alarms to dimensions. ()
* Optimize linking of foreach alarms to dimensions.

Keep the write-lock on host but use read-lock for charts because it's
easy to verify that they aren't modified by the linking of foreach
alarms to dimensions.

* Protect alarm log modifications with write-lock.
2022-05-04 22:00:22 +03:00
Stelios Fragkakis
154cf74d6a
Improve agent cloud chart synchronization ()
* Try to queue dimension always when:
 Trying to clean obsolete charts
 If chart has been sent and liveness apparently changed

* delay rotation and skip chart check if not sent to cloud

* No need to CLEAR flag during database rotation
Do not clear chart ACLK status for dimension requests

* Change payload_sent to return timestamp of submitted message

* Clear the dimension ACLK flag if we are processing all the charts again

* Check if dimension is already queued to ACLK and ignore it
If queue fails then reset it to retry
Already try to queue the dimension

* Improve dimension cleanup during the retention message calculation

* Change queue_dimension_to_aclk to return void

* If no time range for this dimension then assume it is deleted

* Start streaming for inactive nodes

* Remove dead code

* Correctly report hostname in the access log

* Schedule a dimension deletion without trying to submit a message immediately

* Enable dimension cleanup -- also delete dimension if not found in the dbengine files
Free hostname
2022-05-03 21:38:12 +03:00
vkalintiris
ebdd819d6e
Remove per chart configuration. ()
After https://github.com/netdata/netdata/pull/12209 per-chart configuration
was used for (a) enabling/disabling a chart, and (b) renaming dimensions.

Regarding the first use case: We already have component-specific
configuration options|flags to finely control how a chart should behave.
E.g. "send charts matching" in streaming, "charts to skip from training"
in ML, etc. If we really need the concept of a disabled chart, we can
add a host-level simple pattern to match these charts.

Regarding the second use case: It's not obvious why we'd need to provide
support for remapping dimension names through a chart-specific configuration
from the core agent. If the need arises, we could add such support at
the right place, i.e. an exporter/streaming config section.

This will allow each flag to act independently of the others and
avoid managing flag state manually in various places, e.g.:

```
    if(unlikely(!rrdset_flag_check(st, RRDSET_FLAG_ENABLED))) {
        rrdset_flag_clear(st, RRDSET_FLAG_UPSTREAM_SEND);
        rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
    } ...
```
2022-05-03 19:02:36 +03:00
Stelios Fragkakis
20303b34ba
Skip dimension deletion on free (temp fix) () 2022-05-03 14:03:13 +03:00
Adrien Béraud
d92890b5f1
Configurable storage engine for Netdata agents: step 1 ()
* rrd: move API structures out of rrddim_volatile

In C, unlike C++, it's not possible to reference a nested structure
from outside this structure.

Since we later want to use rrddim_query_ops and rrddim_collect_ops
separately from rrddim_volatile, move these nested structures out.
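A minimal sketch of the resulting layout, assuming simplified, hypothetical member lists (the real structures carry more fields and different signatures): once the ops structures are declared at top level, other code can name them and pass them around without going through rrddim_volatile.

```c
#include <stddef.h>

struct rrddim; /* opaque here; the real definition lives in rrd.h */

/* Query operations, now a top-level type that other code can name.
 * Member list is hypothetical and simplified. */
struct rrddim_query_ops {
    void (*init)(struct rrddim *rd, long start_time, long end_time);
    int  (*is_finished)(struct rrddim *rd);
};

/* Collection operations, likewise moved out of rrddim_volatile. */
struct rrddim_collect_ops {
    void (*init)(struct rrddim *rd);
    void (*store_metric)(struct rrddim *rd, long point_in_time, double value);
};

/* rrddim_volatile now embeds the ops by value instead of declaring them inline. */
struct rrddim_volatile {
    struct rrddim_query_ops   query_ops;
    struct rrddim_collect_ops collect_ops;
};

/* Code elsewhere can now take just the query ops, independently of rrddim_volatile. */
static int ops_is_complete(const struct rrddim_query_ops *ops) {
    return ops->init != NULL && ops->is_finished != NULL;
}
```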

* rrd: use opaque handle types for different memory modes
2022-05-03 11:34:15 +03:00
Ilya Mashchenko
091540d59a
feat(dbengine): make dbengine page cache undumpable and dedupable ()
* make netdata more awesome

* reworked on-madvise and mmap to provide clarity
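The commit body does not show the code. A minimal sketch, assuming Linux and an anonymous mapping, of how a page-cache region can be made undumpable (MADV_DONTDUMP) and dedupable through KSM (MADV_MERGEABLE); the function name alloc_page_cache_region is hypothetical:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate an anonymous region for cached pages and ask the kernel to
 * (a) leave it out of core dumps and (b) let KSM merge identical pages.
 * Returns NULL if mmap fails; advice failures are treated as non-fatal. */
static void *alloc_page_cache_region(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

#ifdef MADV_DONTDUMP
    (void)madvise(p, size, MADV_DONTDUMP);  /* exclude from core dumps */
#endif
#ifdef MADV_MERGEABLE
    (void)madvise(p, size, MADV_MERGEABLE); /* KSM candidate; fails with EINVAL if KSM is not built in */
#endif
    return p;
}
```

Ignoring the madvise() return values keeps the cache usable on kernels that lack either feature.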
2022-04-28 11:19:15 +03:00
Vladimir Kobal
7bfc543172
Fix memory leaks on Netdata exit ()
* Fix memory leaks in dimensions and charts

* Initialize superblock memory regions

* Clean up static threads

* Fix memory leaks in compression

* Fix memory leaks in rrdcaltemplate

* Fix memory leaks in health config

* Fix ACLK memory leaks
2022-04-01 15:22:49 +02:00
vkalintiris
af72ed83f9
Initialize foreach alarms of dimensions in health thread. ()
The previous approach required us to try to wr-lock the host
after locking a chart and sleeping on failure. Lock contention
would lead to alarms not being created and the agent to become
unresponsive.
2022-03-31 17:49:44 +03:00
vkalintiris
ca6a2292ed
Skip foreach alarms for dimensions of anomaly rate chart. ()
Health is not enabled for the anomaly rates chart. This was missed in
the original PR that added support for tracking anomaly rates with
dbengine. The side-effect was that the agent would block when opening
the dashboard before its initialization was done.
2022-03-28 12:23:48 +03:00
Stelios Fragkakis
6872df9e6a
Adjust cloud dimension update frequency ()
* Queue a chart immediately to the cloud

* Do not inform the cloud immediately if a dimension stopped collecting; use MAX(obsoletion time, 1.5 * update_every)

* Notify cloud immediately on dimension deletion

* Add debug messages

* Do not schedule an update if we are shutting down
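A sketch of the delay rule from the second bullet, assuming both inputs are in seconds; the helper name dimension_stale_notify_delay is hypothetical. Integer math computes 1.5 * update_every as update_every * 3 / 2, which truncates for odd values:

```c
#include <time.h>

/* Hypothetical helper: seconds to wait after a dimension stops collecting
 * before telling the cloud, i.e. MAX(obsoletion time, 1.5 * update_every). */
static time_t dimension_stale_notify_delay(time_t obsoletion_time, int update_every) {
    time_t grace = (time_t)update_every * 3 / 2;  /* 1.5 * update_every */
    return obsoletion_time > grace ? obsoletion_time : grace;
}
```

For example, with update_every = 10 and an obsoletion time of 5 seconds, the delay is 15 seconds.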
2022-03-08 20:06:30 +02:00
vkalintiris
69ea17d6ec
Track anomaly rates with DBEngine. ()
* Track anomaly rates with DBEngine.

This commit adds support for tracking anomaly rates with DBEngine. We
do so by creating a single chart with id "anomaly_detection.anomaly_rates" for
each trainable/predictable host, which is responsible for tracking the anomaly
rate of each dimension that we train/predict for that host.

The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly
rates charts. We use this flag to:

    - Disable exposing the anomaly rates charts through the functionality
      in backends/, exporting/ and streaming/.
    - Skip generation of configuration options for the name, algorithm,
      multiplier, divisor of each dimension in an anomaly rates chart.
    - Skip the creation of health variables for anomaly rates dimensions.
    - Skip the chart/dim queue of ACLK.
    - Post-process the RRDR result of an anomaly rates chart, so that we can
      return a sorted, trimmed number of anomalous dimensions.

In a child/parent configuration where both the child and the parent run
ML for the child, we want to be able to stream the rest of the ML-related
charts to the parent. To be able to do this without any chart name collisions,
the charts are now created on localhost and their IDs and titles have the node's
machine_guid and hostname as a suffix, respectively.

* Fix exporting_engine tests.

* Restore default ML configuration.

The reverted changes were meant for local testing only. This commit
restores the default values that we want to have when someone runs
anomaly detection on their node.

* Set context for anomaly_detection.* charts.

* Check for anomaly rates chart only with a valid pointer.

* Remove duplicate code.

* Use a more descriptive name for id/title pair variable
2022-02-24 10:57:30 +02:00
Stelios Fragkakis
a763d4111c
Store dimension hidden option in the metadata db ()
* Add a function to update dimension options in the metadata database

* Update the option for dimension to be hidden/unhidden when rrdim_hide/rrdim_unhide is called

* Store the hidden option for dimensions to the database
2022-02-23 18:31:37 +02:00
Stelios Fragkakis
a330a27a62
Remove chart specific configuration from netdata.conf except enabled () 2022-02-22 18:23:08 +02:00
vkalintiris
9ed4cea590
Anomaly Detection MVP ()
* Add support for feature extraction and K-Means clustering.

This patch adds support for performing feature extraction and running the
K-Means clustering algorithm on the extracted features.

We use the open-source dlib library to compute the K-Means clustering
centers, which has been added as a new git submodule.

The build system has been updated to recognize two new options:

    1) --enable-ml: build an agent with ml functionality, and
    2) --enable-ml-tests: support running tests with the `-W mltest`
       option in netdata.

The second flag is meant only for internal use. To build tests successfully,
you need to install the GoogleTest framework on your machine.

* Boilerplate code to track hosts/dims and init ML config options.

A new opaque pointer field is added to the database's host and dimension
data structures. The fields point to C++ wrapper classes that will be used
to store ML-related information in follow-up patches.

The ML functionality needs to iterate all tracked dimensions twice per
second. To avoid locking the entire DB multiple times, we use a
separate dictionary to add/remove dimensions as they are created/deleted
by the database.

A global configuration object is initialized during the startup of the
agent. It will allow our users to specify ML-related configuration
options, e.g. hosts/charts to skip from training, etc.

* Add support for training and prediction of dimensions.

Every new host spawns a training thread which is used to train the model
of each dimension.

Training of dimensions is done in a non-batching mode in order to avoid
impacting the generated ML model by the CPU, RAM and disk utilization of
the training code itself.

For performance reasons, prediction is done at the time a new value
is pushed in the database. The alternative option, i.e. maintaining a
separate thread for prediction, would be ~3-4x slower and would
increase locking contention considerably.

For similar reasons, we use a custom function to unpack storage_numbers
into doubles, instead of long doubles.

* Add data structures required by the anomaly detector.

This patch adds two data structures that will be used by the anomaly
detector in follow-up patches.

The first data structure is a circular bit buffer which is used to
count the number of set bits over time.

The second data structure represents an expandable, rolling window that
tracks set/unset bits. It is explicitly modeled as a finite-state
machine in order to make the anomaly detector's behaviour easier to test
and reason about.
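The first data structure could look roughly like the following. This is a C sketch with a fixed limit of 64 slots and hypothetical names (bit_buffer, bit_buffer_push), not the C++ class the patch adds: it keeps the last N bits and a running count of the set ones, so the count is O(1) to read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Circular bit buffer: stores the last `capacity` bits (1..64 in this
 * sketch) and keeps a running count of how many of them are set. */
typedef struct {
    unsigned long long bits; /* bit i holds slot i */
    size_t capacity;         /* window length, 1..64 */
    size_t next;             /* slot that the next push overwrites */
    size_t filled;           /* slots written so far (<= capacity) */
    size_t set_count;        /* set bits currently in the window */
} bit_buffer;

static void bit_buffer_init(bit_buffer *b, size_t capacity) {
    b->bits = 0;
    b->capacity = capacity;
    b->next = 0;
    b->filled = 0;
    b->set_count = 0;
}

/* Push one bit; once the buffer is full, the oldest bit is evicted. */
static void bit_buffer_push(bit_buffer *b, bool bit) {
    unsigned long long mask = 1ULL << b->next;

    if (b->filled == b->capacity && (b->bits & mask))
        b->set_count--; /* evicting a set bit */

    if (bit) {
        b->bits |= mask;
        b->set_count++;
    } else {
        b->bits &= ~mask;
    }

    if (b->filled < b->capacity)
        b->filled++;
    b->next = (b->next + 1) % b->capacity;
}
```

Updating the count on every push, rather than re-counting the window, keeps the per-second anomaly-rate check cheap.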

* Add anomaly detection thread.

This patch creates a new anomaly detection thread per host. Each thread
maintains a BitRateWindow which is updated every second based on the
anomaly status of the corresponding host.

Based on the updated status of the anomaly window, we can identify the
existence/absence of an anomaly event, its start/end time and the
dimensions that participate in it.

* Create/insert/query anomaly events from Sqlite DB.

* Create anomaly event endpoints.

This patch adds two endpoints to expose information about anomaly
events. The first endpoint returns the list of anomalous events within a
specified time range. The second endpoint provides detailed information
about a single anomaly event, i.e. the list of anomalous dimensions in
that event along with their anomaly rate.

The `anomaly-bit` option has been added to the `/data` endpoint in order
to allow users to get the anomaly status of individual dimensions per
second.

* Fix build failures on Ubuntu 16.04 & CentOS 7.

These distros do not have toolchains with C++11 enabled by default.
Replacing nullptr with NULL should fix the build problems on these
platforms when the ML feature is not enabled.

* Fix `make dist` to include ML makefiles and dlib sources.

Currently, we add ml/kmeans/dlib to EXTRA_DIST. We might want to
generate an explicit list of source files in the future, in order to
bring down the generated archive's file size.

* Small changes to make the LGTM & Codacy bots happy.

- Cast unused result of function calls to void.
- Pass a const-ref string to Database's constructor.
- Reduce the scope of a local variable in the anomaly detector.

* Add user configuration option to enable/disable anomaly detection.

* Do not log dimension-specific operations.

Training and prediction operations happen every second for each
dimension. To make it easier to run anomaly detection with this PR on
many charts & dimensions, I've removed logs that would cause log
flooding.

* Reset dimensions' bit counter when not above anomaly rate threshold.

* Update the default config options with real values.

With this patch the default configuration options will match the ones
we want our users to use by default.

* Update conditions for creating new ML dimensions.

1. Skip dimensions with update_every != 1,
2. Skip dimensions that come from the ML charts.

With this filtering in place, any configuration value for the
relevant simple_pattern expressions will work correctly.

* Teach buildinfo{,json} about the ML feature.

* Set --enable-ml by default in the configuration options.

This patch is only meant for testing the building of the ML functionality
on Github. It will be reverted once tests pass successfully.

* Minor build system fixes.

- Add path to json header
- Enable C++ linker when ML functionality is enabled
- Rename ml/ml-dummy.cc to ml/ml-dummy.c

* Revert "Set --enable-ml by default in the configuration options."

This reverts commit 28206952a59a577675c86194f2590ec63b60506c.

We pass all Github checks when building the ML functionality, except for
those that run on CentOS 7 due to not having a C++11 toolchain.

* Check for missing dlib and nlohmann files.

We simply check the single-source files upon which our build system
depends. If they are missing, an error message notifies the user
about missing git submodules which are required for the ML
functionality.

* Allow users to specify the maximum number of KMeans iterations.

* Use dlib v19.10

v19.22 broke compatibility with CentOS 7's g++. Development of the
anomaly detection used v19.10, which is the version used by most Debian and
Ubuntu distribution versions that are not past EOL.

No observable performance improvements/regressions specific to the K-Means
algorithm occur between the two versions.

* Detect and use the -std=c++11 flag when building anomaly detection.

This patch automatically adds the -std=c++11 when building netdata
with the ML functionality, if it's supported by the user's toolchain.

With this change we are able to build the agent correctly on CentOS 7.

* Restructure configuration options.

- update default values,
- clamp values to min/max defaults,
- validate and identify conflicting values.

* Add update_every configuration option.

Considering that the MVP does not support per-host configuration
options, the update_every option will be used to filter hosts to train.

With this change anomaly detection will be supported on:

    - Single nodes with update_every != 1, and
    - Children nodes with a common update_every value that might differ from
      the value of the parent node.

* Reorganize anomaly detection charts.

This follows Andrew's suggestion to have four charts to show the number
of anomalous/normal dimensions, the anomaly rate, the detector's window
length, and the events that occur in the prediction step.

Context and family values, along with the necessary information in the
dashboard_info.js file, will be updated in a follow-up commit.

* Do not dump anomaly event info in logs.

* Automatically handle low "train every secs" configuration values.

If a user specifies a very low value for the "train every secs", then
it is possible that the time it takes to train a dimension is higher
than its allotted time.

In that case, we want the training thread to:

    - Reduce its CPU usage per second, and
    - Allow the prediction thread to proceed.

We achieve this by limiting the training time of a single dimension to
be equal to half the time allotted to it. This means that the training
thread will never consume more than 50% of a single core.
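A sketch of the budget arithmetic, assuming the allotted time is the "train every secs" period split evenly across the host's N trained dimensions (the text does not spell out how the allotment is computed); the helper name max_train_usec is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: per-dimension training budget in microseconds.
 * Each of `num_dims` dimensions gets an equal slice of the
 * `train_every_usec` period, and one training run is capped at half of
 * that slice, so the training thread is busy at most ~50% of the time. */
static unsigned long long max_train_usec(unsigned long long train_every_usec,
                                         size_t num_dims) {
    if (num_dims == 0)
        return 0;
    unsigned long long allotted = train_every_usec / num_dims;
    return allotted / 2;
}
```

For example, a 3600 s period shared by 1800 dimensions allots 2 s to each dimension, so a single training run is capped at 1 s.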

* Automatically detect if ML functionality should be enabled.

With these changes, we enable ML if:

    - The user has not explicitly specified --disable-ml, and
    - Git submodules have been checked out properly, and
    - The toolchain supports C++11.

If the user has explicitly specified --enable-ml, the build fails if
git submodules are missing, or the toolchain does not support C++11.
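The real check lives in the autoconf build system. The decision table it implements can be modeled as a small C function; the enum and function names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical model of the configure-time options and outcomes. */
typedef enum { ML_AUTO, ML_ENABLE, ML_DISABLE } ml_option;          /* none / --enable-ml / --disable-ml */
typedef enum { ML_BUILD_OFF, ML_BUILD_ON, ML_BUILD_ERROR } ml_decision;

static ml_decision decide_ml(ml_option opt, bool have_submodules, bool have_cxx11) {
    bool prerequisites = have_submodules && have_cxx11;

    switch (opt) {
    case ML_DISABLE:
        return ML_BUILD_OFF;                                  /* explicit opt-out always wins */
    case ML_ENABLE:
        return prerequisites ? ML_BUILD_ON : ML_BUILD_ERROR;  /* explicit opt-in fails loudly */
    case ML_AUTO:
    default:
        return prerequisites ? ML_BUILD_ON : ML_BUILD_OFF;    /* auto-detect degrades silently */
    }
}
```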

* Disable anomaly detection by default.

* Do not update charts in locked region.

* Cleanup code reading configuration options.

* Enable C++ linker when building ML.

* Disable ML functionality for CMake builds.

* Skip LGTM for dlib and nlohmann libraries.

* Do not build ML if libuuid is missing.

* Fix dlib path in LGTM's yaml config file.

* Add chart to track duration of prediction step.

* Add chart to track duration of training step.

* Limit the number of dimensions in an anomaly event.

This will ensure our JSON results won't grow without any limit. With
the default ML configuration options, we train approximately 1700 dimensions
in a newly-installed Netdata agent. The hard limit is set to 2000
dimensions, which:

    - Is well above the default number of dimensions we train,
    - If it is ever reached, means that the user accidentally set a
      very low anomaly rate threshold, and
    - Considering that we sort the result by anomaly score, the cutoff
      dimensions will be the least anomalous, i.e. the least important to
      investigate.
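A sketch of the cutoff, assuming each dimension in an event carries a single anomaly score (the struct and function names are hypothetical): sort in descending order with qsort and keep at most MAX_EVENT_DIMS entries, so the dropped tail is always the least anomalous.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_EVENT_DIMS 2000 /* hard limit from the text above */

typedef struct {
    const char *dim_id;   /* hypothetical dimension identifier */
    double anomaly_score;
} anomalous_dim;

/* qsort comparator: higher anomaly score sorts first. */
static int by_score_desc(const void *a, const void *b) {
    double sa = ((const anomalous_dim *)a)->anomaly_score;
    double sb = ((const anomalous_dim *)b)->anomaly_score;
    return (sa < sb) - (sa > sb);
}

/* Sort in place and return how many leading entries to report. */
static size_t limit_event_dims(anomalous_dim *dims, size_t n, size_t limit) {
    qsort(dims, n, sizeof(*dims), by_score_desc);
    return n < limit ? n : limit;
}
```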

* Add information about the ML charts.

* Update family value in ML charts.

This fix will allow us to show the individual charts in the RHS Anomaly
Detection submenu.

* Rename chart type

s/anomalydetection/anomaly_detection/g

* Expose ML feat in /info endpoint.

* Export ML config through /info endpoint.

* Fix CentOS 7 build.

* Reduce the critical region of a host's lock.

Before this change, each host had a single, dedicated lock to protect
its map of dimensions from adding/deleting new dimensions while training
and detecting anomalies. This was problematic because training of a
single dimension can take several seconds in nodes that are under heavy
load.

After this change, the host's lock protects only the insertion/deletion
of new dimensions, and the prediction step. For the training of dimensions
we use a dedicated lock per dimension, which is responsible for protecting
the dimension from deletion while training.

Prediction is fast enough, even on slow machines or under heavy load,
which allows us to use the host's main lock and avoid increasing the
complexity of our implementation in the anomaly detector.
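A simplified pthreads model of the split described above; all names are hypothetical and prediction (which would run under the host lock) is omitted. Training takes the dimension's lock before dropping the host lock, and deletion waits on that same lock, so a dimension cannot be freed while it is being trained:

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t train_lock; /* held while training; deletion waits on it */
    double model;               /* stand-in for the trained model */
} ml_dimension;

typedef struct {
    pthread_mutex_t lock;       /* guards the dimension map (and prediction) */
    ml_dimension *dims[16];
    size_t num_dims;
} ml_host;

/* Look the dimension up under the host lock, then train holding only its own lock. */
static void train_dimension(ml_host *host, size_t idx) {
    pthread_mutex_lock(&host->lock);
    ml_dimension *dim = idx < host->num_dims ? host->dims[idx] : NULL;
    if (dim)
        pthread_mutex_lock(&dim->train_lock); /* acquire before releasing the host lock */
    pthread_mutex_unlock(&host->lock);
    if (!dim)
        return;

    dim->model += 1.0; /* placeholder for the (possibly slow) K-Means training */

    pthread_mutex_unlock(&dim->train_lock);
}

/* Remove from the map under the host lock, then wait out any training in progress.
 * Precondition: idx < host->num_dims. */
static ml_dimension *remove_dimension(ml_host *host, size_t idx) {
    pthread_mutex_lock(&host->lock);
    ml_dimension *dim = host->dims[idx];
    host->dims[idx] = host->dims[--host->num_dims];
    pthread_mutex_unlock(&host->lock);

    pthread_mutex_lock(&dim->train_lock);   /* blocks until training finishes */
    pthread_mutex_unlock(&dim->train_lock);
    return dim;                             /* now safe for the caller to free */
}
```

The host lock is only held for the lookup, so a multi-second training run never blocks insertion or deletion of other dimensions.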

* Improve the way we are tracking anomaly detector's performance.

This change allows us to:

    - track the total training time per update_every period,
    - track the maximum training time of a single dimension per
      update_every period, and
    - export the current number of total, anomalous, normal dimensions
      to the /info endpoint.

Also, now that we use dedicated locks per dimension, we can train under
heavy load continuously without having to sleep in order to yield the
training thread and allow the prediction thread to progress.

* Use samples instead of seconds in ML configuration.

This commit changes the way we are handling input ML configuration
options from the user. Instead of treating values as seconds, we
interpret all inputs as number of update_every periods. This allows
us to enable anomaly detection on hosts that have update_every != 1
second, and still produce a model for training/prediction & detection
that behaves in an expected way.

Tested by running anomaly detection on an agent with update_every = [1,
2, 4] seconds.
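A sketch of the unit conversion, with a hypothetical helper name: a value read from the ML configuration counts update_every periods, so the same setting covers the same number of points on every host while spanning a different wall-clock duration.

```c
/* Hypothetical helper: ML options are given in samples (update_every
 * periods); convert to wall-clock seconds for a particular host. */
static long samples_to_seconds(long samples, int update_every) {
    return samples * (long)update_every;
}
```

With update_every = 2, a setting of 3600 samples spans 7200 seconds, while the model still sees 3600 points.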

* Remove unnecessary log message in detection thread

* Move ML configuration to global section.

* Update web/gui/dashboard_info.js

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>

* Fix typo

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>

* Rebase.

* Use negative logic for anomaly bit.

* Add info for prediction_stats and training_stats charts.

* Disable ML on PPC64EL.

The CI test fails with -std=c++11 and requires -std=gnu++11 instead.
However, it's not easy to quickly append the required flag to CXXFLAGS.
For the time being, simply disable ML on PPC64EL and if any users
require this functionality we can fix it in the future.

* Add comment on why we disable ML on PPC64EL.

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
2021-10-27 09:26:21 +03:00
vkalintiris
81e2f8da65
Reuse the SN_EXISTS bit to track anomaly status. ()
* Replace all usages of SN_EXISTS with SN_DEFAULT_FLAGS.

* Remove references to SN_NOT_EXISTS in comments.

* Replace raw zero constant with SN_EMPTY_SLOT.

* Use get_storage_number_flags only in storage_number.{c,h}

* Compare against SN_EMPTY_SLOT to check if a storage_number exists.

This is safe because:

  1. rrdset_done_interpolate() is the only place where we call store_metric(),
  2. All store_metric() calls, except for one, store an SN_EMPTY_SLOT value.
  3. When we are not storing an SN_EMPTY_SLOT value, the flags that we pass to
     pack_storage_number() can be either SN_EXISTS *or* SN_EXISTS_RESET.

* Compare only the SN_EXISTS_RESET bit to find reset values.

* Remove get_storage_number_flags from storage_number.h

* Do not set storage_number flags outside of rrdset_done_interpolate().

This is an NFC (no functional change) intended to limit the scope of storage_number flags
processing to just one function.

* Set reset bit without overwriting the rest of the flags.

* Rename SN_EXISTS to SN_ANOMALY_BIT.

* Use GOTOs in pack_storage_number to return from a single place.

* Teach pack_storage_number how to handle anomalous zero values.

Up until now, a storage_number had always either the SN_EXISTS or
SN_EXISTS_RESET bit set. This meant that it was not possible for any
packed storage_number to compare equal to the SN_EMPTY_SLOT.

However, the SN_ANOMALY_BIT can be set to zero. This is fine for every
value other than the anomalous 0 value, because it would compare equal to
SN_EMPTY_SLOT. We address this issue by mapping the anomalous zero value
to SN_EXISTS_100 (a number which was not possible to generate with the
previous versions of the agent, i.e. it won't exist in older dbengine files).

This change was tested manually by intentionally flipping the anomaly
bit for odd/even iterations in rrdset_done_interpolate. Prior to this
change, charts whose dimensions had 0 values were showing up in the
dashboard as gaps (SN_EMPTY_SLOT), whereas with this commit the values
are displayed correctly.
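A sketch of the remapping, using a hypothetical bit layout (the real masks and values are defined in storage_number.h and differ from these): the anomaly bit uses negative logic, so an anomalous zero would pack to all-zero bits and collide with SN_EMPTY_SLOT, and the packer substitutes a pattern older agents never wrote.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t storage_number;

/* Hypothetical layout, for illustration only. */
#define SN_EMPTY_SLOT  ((storage_number)0)
#define SN_VALUE_MASK  ((storage_number)0x00FFFFFF)
#define SN_ANOMALY_BIT ((storage_number)1 << 24) /* 1 = normal, 0 = anomalous */
#define SN_EXISTS_100  ((storage_number)1 << 26) /* stand-in for a never-written pattern */

static storage_number pack_sketch(uint32_t raw_value, bool anomalous) {
    storage_number r = raw_value & SN_VALUE_MASK;
    if (!anomalous)
        r |= SN_ANOMALY_BIT;
    /* An anomalous zero would compare equal to SN_EMPTY_SLOT; remap it. */
    if (r == SN_EMPTY_SLOT)
        r = SN_EXISTS_100;
    return r;
}

/* A slot holds a value iff it differs from SN_EMPTY_SLOT. */
static bool sn_exists(storage_number n) {
    return n != SN_EMPTY_SLOT;
}

/* Only meaningful when sn_exists(n) is true. */
static bool sn_is_anomalous(storage_number n) {
    return n == SN_EXISTS_100 || !(n & SN_ANOMALY_BIT);
}
```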
2021-10-22 17:35:48 +03:00
Stelios Fragkakis
12f16063f5
Enable additional functionality for the new cloud architecture () 2021-10-06 20:55:31 +03:00
Stelios Fragkakis
37bee1d197
Store uuid_t metric_uuid in the dimension state structure instead of uuid_t * () 2021-06-01 14:26:22 +03:00
Josh Soref
2a4650607a
Spelling database () 2021-04-14 12:34:27 +03:00
Tomáš Kopal
757e418090
Rename abs to ABS to avoid clash with standard definitions. Fixes . () 2021-03-17 12:18:33 +02:00
Stelios Fragkakis
87002ef11f
Enable metadata persistence in all memory modes () 2021-03-15 22:28:44 +02:00
vkalintiris
adec24dffa
Rename struct avl to avl_element and the typedef to avl_t ()
Before:

```
struct foobar {
    avl avl;
    ...
};
```

After:

```
struct foobar {
    avl_t avl;
    ...
};
```

This makes it easier to figure out the type from the field name.
2021-03-10 10:37:47 +02:00
Stelios Fragkakis
d600ae20c0
Fix issue with chart metadata sent multiple times over ACLK ()
* Add a flag RRDSET_FLAG_ACLK to mark that a chart needs to go to the cloud

* Change calls to aclk_update_chart to set the RRDSET_FLAG_ACLK instead
Make the call to aclk_update_chart only in rrdset_done (and in case the chart is deleted)

* Fix compilation error when cloud is disabled

* Skip netdata_cloud_setting check when setting the flag / calling aclk_update_chart (checked in there)
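A sketch of the mark-then-flush pattern behind RRDSET_FLAG_ACLK, assuming C11 atomics and a hypothetical bit value and chart struct: any number of metadata changes within one collection cycle set the flag, and the rrdset_done-style step clears it atomically and sends at most one update.

```c
#include <stdatomic.h>

#define RRDSET_FLAG_ACLK (1u << 0) /* hypothetical bit value */

typedef struct {
    atomic_uint flags;
    int updates_sent; /* stand-in for messages sent over ACLK */
} chart;

/* Called from any path that changes chart metadata: only mark the chart. */
static void chart_mark_for_cloud(chart *st) {
    atomic_fetch_or(&st->flags, RRDSET_FLAG_ACLK);
}

/* Called once per collection cycle (like rrdset_done): clear and send once. */
static void chart_done(chart *st) {
    unsigned prev = atomic_fetch_and(&st->flags, ~RRDSET_FLAG_ACLK);
    if (prev & RRDSET_FLAG_ACLK)
        st->updates_sent++; /* the real code would call aclk_update_chart() here */
}
```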
2020-12-14 17:32:11 +02:00
Markos Fountoulakis
5ffba490e3
Fix race condition in rrdset_first_entry_t() and rrdset_last_entry_t() () 2020-11-28 15:53:12 +02:00
Stelios Fragkakis
e9d59e37d9
Migrate metadata log to SQLite () 2020-11-24 20:00:02 +02:00
Markos Fountoulakis
88a6572382
Fix memory mode none not dropping stale dimension data in general when streaming to a parent. () 2020-09-11 19:57:22 +03:00
Markos Fountoulakis
f9f813cd46
Fix memory mode none not marking dimensions as obsolete. ()
* Fix memory mode none not marking dimensions as obsolete.
2020-09-11 10:16:29 +03:00
Stelios Fragkakis
f5a85f7747
Added code to release memory used by the global GUID map ()
Fixed memory leak issues associated with the global GUID map during agent shutdown
2020-08-20 18:50:13 +03:00
Stelios Fragkakis
53d7d3c3fa
Fixed issue with missing alarms ()
Fixed the alarm configuration when dimensions switch from archived to active
2020-08-11 18:16:10 +03:00
Stelios Fragkakis
eda12f579f
Implemented multihost database ()
* Hard code a node for non-legacy multidb test
Skip dbengine initialization for new incoming children
Add code to switch to multidb ctx when accessing the dbengine

* When a non-legacy streaming connection is detected, use the multidb metadata log context

* Clear the superblock memory to avoid random data written in the metadata log

* Activate the host detection during compaction
Activate the host detection during metadata log chart updates
Keep the host in the user object during replay of the HOST command

* Add defaults for health / rrdpush on HOST metadata replay
Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()

Use defaults from the stream.conf

* Count hosts only if not archived
When host switches from archived to active update rrd_hosts_available
Remove archived hosts from charts and info

* Change parameter from "multidb disk space" to "dbengine multihost disk space"
Remove unused variables
Fix compilation error when dbengine is disabled
Fix condition for machine_guid directory creation under cache_dir

* Enable multidb disk space file creation.

* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.

* Fix old bug in the code that confused obsolete hosts with orphan hosts.

* Do not delete multi-host DB host files.

* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.

* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.

* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.

* Ignore archived hosts in streaming.

* Add documentation before merging to master.

Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-07-28 15:04:39 +03:00
Markos Fountoulakis
822880265e
Remove health from archived metrics ()
* Disassociate health variables and alarms from archived charts and dimensions.

* Ignore archived charts during health reload.
2020-07-11 17:27:37 +03:00
Markos Fountoulakis
2f5e6ab14f
Disallow dimensions or charts being obsoleted and archived simultaneously. () 2020-06-29 18:05:27 +03:00
Stelios Fragkakis
11a19ccf27
Fixed compiler warnings ()
Fixed compiler warnings
2020-06-15 14:38:42 +03:00
Stelios Fragkakis
5ce7b4afe4
Fixed invalid pointer access rrddim_free_custom ()
Fixed invalid memory access in host creation and dimension deletion
2020-06-12 15:40:24 +03:00
Stelios Fragkakis
1bd8a25544
Add support for persistent metadata ()
* Implemented collector metadata logging 
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update

Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-06-12 10:35:17 +03:00
Andrew Moss
8fe7485a60
Switching over to soft feature flag ()
Preparing for the cloud release. This changes how we handle the feature flag so that it no longer requires installer switches and can be set from the config file. This still requires internal access to use and is not ready for public access yet.
2020-03-31 21:19:34 +02:00