mirror of https://github.com/netdata/netdata.git synced 2025-04-09 07:37:54 +00:00
Commit graph

21 commits

Author SHA1 Message Date
Costa Tsaousis
cb7af25c09
RRD structures managed by dictionaries ()
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired throughout netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* dont create variables if not initialized for health

* dont create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* dont keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock and automatically switch all calls to their unsafe versions; all direct calls to the unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* dont use index locks in single threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart it of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
Costa Tsaousis
5e1b95cf92
Deduplicate all netdata strings ()
* rrdfamily

* rrddim

* rrdset plugin and module names

* rrdset units

* rrdset type

* rrdset family

* rrdset title

* rrdset title more

* rrdset context

* rrdcalctemplate context and removal of context hash from rrdset

* strings statistics

* rrdset name

* rearranged members of rrdset

* eliminate rrdset name hash; rrdcalc chart converted to STRING

* rrdset id, eliminated rrdset hash

* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate

* rrdcalctemplate

* rrdvar

* eval_variable

* rrddimvar and rrdsetvar

* rrdhost hostname, os and tags

* fix master commits

* added thread cache; implemented string_dup without locks

* faster thread cache

* rrdset and rrddim now use dictionaries for indexing

* rrdhost now uses dictionary

* rrdfamily now uses DICTIONARY

* rrdvar using dictionary instead of AVL

* allocate the right size to rrdvar flag members

* rrdhost remaining char * members to STRING *

* better error handling on indexing

* strings now use a read/write lock to allow parallel searches to the index

* removed AVL support from dictionaries; implemented STRING with native Judy calls

* string releases should be negative

* only 31 bits are allowed for enum flags

* proper locking on strings

* string threading unittest and fixes

* fix lgtm finding

* fixed naming

* stream chart/dimension definitions at the beginning of a streaming session

* thread stack variable is undefined on thread cancel

* rrdcontext garbage collect per host on startup

* worker control in garbage collection

* relaxed deletion of rrdmetrics

* type checking on dictfe

* netdata chart to monitor rrdcontext triggers

* Group chart label updates

* rrdcontext better handling of collected rrdsets

* rrdpush incremental transmission of definitions should use as much buffer as possible

* require 1MB per chart

* empty the sender buffer before enabling metrics streaming

* fill up to 50% of buffer

* reset signaling metrics sending

* use the shared variable for status

* use separate host flag for enabling streaming of metrics

* make sure the flag is clear

* add logging for streaming

* add logging for streaming on buffer overflow

* circular_buffer proper sizing

* removed obsolete logs

* do not execute worker jobs if not necessary

* better messages about compression disabling

* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped

* monitor stream sender used buffer ratio

* Update exporting unit tests

* no need to compare label value with strcmp

* streaming send workers now monitor bandwidth

* workers now use strings

* streaming receiver monitors incoming bandwidth

* parser shift of worker ids

* minor fixes

* Group chart label updates

* Populate context with dimensions that have data

* Fix chart id

* better shift of parser worker ids

* fix for streaming compression

* properly count received bytes

* ensure LZ4 compression ring buffer does not wrap prematurely

* do not stream empty charts; do not process empty instances in rrdcontext

* need_to_send_chart_definition() does not need an rrdset lock any more

* rrdcontext objects are collected, after data have been written to the db

* better logging of RRDCONTEXT transitions

* always set all variables needed by the worker utilization charts

* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes

* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock

* STRING code re-organization for clarity

* thread_cache improvements; double numbers precision on worker threads

* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions; STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS

* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts

* Add index to speed up database context searches

* Removed last_updated optimization (was also buggy after latest merge with master)

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05 19:31:06 +03:00
boxjan
56a1808d2e
Exporting/send variables () 2022-07-11 13:00:09 +00:00
Stelios Fragkakis
49234f23de
Multi-Tier database backend for long term metrics storage ()
* Tier part 1

* Tier part 2

* Tier part 3

* Tier part 4

* Tier part 5

* Fix some ML compilation errors

* fix more conflicts

* pass proper tier

* move metric_uuid from state to RRDDIM

* move aclk_live_status from state to RRDDIM

* move ml_dimension from state to RRDDIM

* abstracted the data collection interface

* support flushing for mem db too

* abstracted the query api

* abstracted latest/oldest time per metric

* cleanup

* store_metric for tier1

* fix for store_metric

* allow multiple tiers, more than 2

* state to tier

* Change storage type in db. Query param to request min, max, sum or average

* Store tier data correctly

* Fix skipping tier page type

* Add tier grouping in the tier

* Fix to handle archived charts (part 1)

* Temp fix for query granularity when requesting tier1 data

* Fix parameters in the correct order and calculate the anomaly based on the anomaly count

* Proper tiering grouping

* Anomaly calculation based on anomaly count

* force type checking on storage handles

* update cmocka tests

* fully dynamic number of storage tiers

* fix static allocation

* configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode

* use default page dt using the tiering info

* automatic selection of tier

* fix for automatic selection of tier

* working prototype of dynamic tier selection

* automatic selection of tier done right (I hope)

* ask for the proper tier value, based on the grouping function

* fixes for unittests and load_metric_next()

* fixes for lgtm findings

* minor renames

* add dbengine to page cache size setting

* add dbengine to page cache with malloc

* query engine optimized to loop as little as required based on the view_update_every

* query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA

* report db points per tier in jsonwrap

* query planner that switches database tiers on the fly to satisfy the query for the entire timeframe

* dbengine statistics and documentation (in progress)

* calculate average point duration in db

* handle single point pages the best we can

* handle single point pages even better

* Keep page type in the rrdeng_page_descr

* updated doc

* handle future backwards compatibility - improved statistics

* support &tier=X in queries

* enforce increasing iterations on tiers

* tier 1 is always 1 iteration

* backfilling higher tiers on first data collection

* reversed anomaly bit

* set up to 5 tiers

* natural points should only be offered on tier 0, unless a specific tier is selected

* do not allow more than 65535 points of tier0 to be aggregated on any tier

* Work only on actually activated tiers

* fix query interpolation

* fix query interpolation again

* fix lgtm finding

* Activate one tier for now

* backfilling of higher tiers using raw metrics from lower tiers

* fix for crash on start when storage tiers is increased from the default

* more statistics on exit

* fix bug that prevented higher tiers to get any values; added backfilling options

* fixed the statistics log line

* removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle

* fixed division by zero on zero points_wanted

* removed dead code

* Decide on the descr->type for the type of metric

* dont store metrics on unknown page types

* free db_metric_handle on sql based context queries

* Disable STORAGE_POINT value check in the exporting engine unit tests

* fix for db modes other than dbengine

* fix for aclk archived chart queries destroying db_metric_handles of valid rrddims

* fix left-over freez() instead of OWA freez on median queries

Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-07-06 14:01:53 +03:00
Costa Tsaousis
c3dfbe52a6
netdata doubles ()
* netdata doubles

* fix cmocka test

* fix cmocka test again

* fix left-overs of long double to NETDATA_DOUBLE

* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf

* update the memory before saving

* rrdset is now detached from file structures too

* on memory mode map, update the memory mapped structures on every iteration

* allow RRD_ID_LENGTH_MAX to be changed

* granularity secs, back to update every

* fix formatting

* more formatting
2022-06-28 17:04:37 +03:00
Costa Tsaousis
b32ca44319
Query Engine multi-granularity support (and MC improvements) ()
* set grouping functions

* storage engine should check the validity of timestamps, not the query engine

* calculate and store in RRDR anomaly rates for every query

* anomaly rate used by volume metric correlations

* mc volume should use absolute data, to avoid cancelling effect

* return anomaly-rates in jsonwrap with jw-anomaly-rates option to data queries

* dont return null on anomaly rates

* allow passing group query options from the URL

* added countif to the query engine and used it in metric correlations

* fix configure

* fix countif and anomaly rate percentages

* added group_options to metric correlations; updated swagger

* added newline at the end of yaml file

* always check the time the highlighted window was above/below the highlighted window

* properly track time in memory queries

* error for internal checks only

* moved pack_storage_number() into the storage engines

* moved unpack_storage_number() inside the storage engines

* remove old comment

* pass unit tests

* properly detect zero or subnormal values in pack_storage_number()

* fill nulls before the value, not after

* make sure math.h is included

* workaround for isfinite()

* fix for isfinite()

* faster isfinite() alternative

* fix for faster isfinite() alternative

* next_metric() now returns end_time too

* variable step implemented in a generic way

* remove left-over variables

* ensure we always complete the wanted number of points

* fixes

* ensure no infinite loop

* mc-volume-improvements: Add information about invalid condition

* points should have a duration in the past

* removed unneeded info() line

* Fix unit tests for exporting engine

* new_point should only be checked when it is fetched from the db; better comment about the premature breaking of the main query loop

Co-authored-by: Thiago Marques <thiagoftsm@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-22 11:19:08 +03:00
Costa Tsaousis
1b0f6c6b22
Labels with dictionary ()
* squashed and rebased to master

* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h

* added unittest for UTF-8 multibyte sanitization

* Fix unit test compilation

* Fix CMake build

* remove double sanitizer for opentsdb; cleanup sanitize_json_string()

* rename error_description to error_message to avoid conflict with json-c

* revert last and undef error_description from json-c

* more unittests; attempt to fix protobuf map issue

* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer

* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted

* better sorted dictionary checking

* proper unittesting for sorted dictionaries

* call dictionary deletion callback when destroying the dictionary

* remove obsolete variable

* Fix exporting unit tests

* Fix k8s label parsing test

* workaround for cmocka and strdupz()

* Bypass cmocka memory allocation check

* Revert "Bypass cmocka memory allocation check"

This reverts commit 4c49923839.

* Revert "workaround for cmocka and strdupz()"

This reverts commit 7bebee0480.

* Bypass cmocka memory allocation checks

* respect json formatting for chart labels

* cloud sends colons

* print the value only once

* allow parenthesis in values and spaces; make stream sender send quotes for values

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-13 20:35:45 +03:00
Vladimir Kobal
52456f5baf
Remove backends subsystem () 2022-03-15 11:50:24 +01:00
Stelios Fragkakis
454387fcf4
Cleanup compilation warnings ()
* Fix compilation warnings (variables used when debugging is enabled using NETDATA_INTERNAL_CHECKS)
* Fix compilation warning (casting)
2021-11-19 22:12:29 +02:00
Vladimir Kobal
943ee2482b
Add HTTP and HTTPS support to the simple exporting connector () 2020-11-05 19:08:17 +02:00
Vladimir Kobal
43c4d1edaa
Add check for spurious wakeups () 2020-08-17 10:31:25 +03:00
Vladimir Kobal
3136ef1373
Fix exporting update point () 2020-08-17 10:25:01 +03:00
Vladimir Kobal
a606a27f16
Fix error handling in exporting connector () 2020-05-14 11:42:40 +03:00
Austin S. Hemmelgarn
983a26d1a2
Revert "Revert changes since v1.21 in preparation for hotfix release."
This reverts commit e2874320fc.
2020-04-13 10:32:33 -04:00
Austin S. Hemmelgarn
e2874320fc
Revert changes since v1.21 in preparation for hotfix release. 2020-04-13 08:42:22 -04:00
Vladimir Kobal
231d19351d
Show internal stats for the exporting engine ()
* Add a print function for internal exporting statistics

* Send statistics for simple connectors

* Flush sending buffers on failures

* Send statistics for the Kinesis connector

* Send statistics for the MongoDB connector

* Add unit tests
2020-04-10 12:26:36 +03:00
Vladimir Kobal
ebbce7c777
Prometheus web api connector ()
* Fix the Prometheus web API code in the exporting engine

* Rename connector types

* Remove the conditional compilation of the exporting engine

* Use labels instead of tags

* Fix the exporter configuration

* Document functions

* Add unit tests
2020-04-06 09:26:25 +03:00
Vladimir Kobal
36c2e1dbf3
Add a MongoDB connector to the exporting engine ()
* Copy files from the MongoDB backend

* Update the documentation

* Rename functions in the MongoDB backend

* Add the connector to the Netdata build

* Add an initializer and a worker

* Add specific configuration options

* Initialize the connector

* Add a ring buffer for inserting data to a MongoDB database

* Add unit tests
2020-03-30 09:54:39 +03:00
Vladimir Kobal
d79bbbf943
Add an AWS Kinesis connector to the exporting engine ()
* Prepare files for the AWS Kinesis exporting connector

* Update the documentation

* Rename functions in backends

* Include the connector in the Netdata build

* Add initializers and a worker

* Add Kinesis specific configuration options

* Add a compile time configuration check

* Remove the connector data structure

* Restore unit tests

* Fix the compile-time configuration check

* Initialize AWS SDK only once

* Don't create an instance for an unknown exporting connector

* Separate client and request outcome data for every instance

* Fix memory cleanup, document functions

* Add unit tests

* Update the documentation
2020-02-25 21:08:41 +02:00
Vladimir Kobal
0fba85e2c2
Send host labels via exporting connectors ()
* Add labels to the JSON exporting connector

* Add labels to the Graphite exporting connector

* Add labels to the OpenTSDB telnet exporting connector

* Add labels to the OpenTSDB HTTP exporting connector

* Replace control characters in JSON strings

* Add unit tests
2020-01-09 12:51:41 +02:00
Vladimir Kobal
6f27081912
Implement the main flow for the Exporting Engine ()
* Add top level tests

* Add a skeleton for preparing buffers

* Initialize graphite instance

* Prepare buffers for all instances

* Add Graphite collected value formatter

* Add support for exporting.conf read and parsing

* - Use new exporting_config instead of netdata_config

* Implement Graphite worker

* Disable exporting engine compilation if libuv is not available

* Add mutex locks

- Configure connectors as connector_<type> in sections of exporting.conf

- Change exporting_select_type to check for connector_ fields

* - Override exporting_config structure if there is no exporting.conf so that
  lookups don't fail and we maintain backwards compatibility

* Separate fixtures in unit tests

* Test exporting_discard_responce

* Test response receiving

* Test buffer sending

* Test simple connector worker

- Instance section has the format connector:instance_name,
  e.g. graphite:my_graphite_instance

- Connectors with : in their name, e.g. graphite:plaintext, are reserved.
  So graphite:plaintext is not accepted because it would activate an
  instance with name "plaintext".
  It should be graphite:plaintext:instance_name

* - Enable the add_connector_instance to cleanup the internal structure
  by passing NULL,not NULL arguments

* Implement configurable update interval

- Add additional check to verify instance uniqueness across connectors

* Add host and chart filters

* Add the value calculation over a database series

* Add the calculated over stored data graphite connector

* Add tests for graphite connector

* Add JSON connector

* Add tests for JSON formatting functions

* Add OpenTSDB connector

* Add tests for the OpenTSDB connector

* Add temporary notes to the documentation
2019-12-12 21:41:11 +02:00