mirror of https://github.com/netdata/netdata.git synced 2025-05-15 22:10:42 +00:00
Commit graph

38 commits

Author SHA1 Message Date
Costa Tsaousis
c3d70ffcb4
WEBRTC for communication between agents and browsers ()
* initial webrtc setup

* missing files

* rewrite of webrtc integration

* initialization and cleanup of webrtc connections

* make it compile without libdatachannel

* add missing webrtc_initialize() function when webrtc is not enabled

* make c++17 optional

* add build/m4/ax_compiler_vendor.m4

* add ax_cxx_compile_stdcxx.m4

* added new m4 files to makefile.am

* id all webrtc connections

* show warning when webrtc is disabled

* fixed message

* moved all webrtc error checking inside webrtc.cpp

* working webrtc connection establishment and cleanup

* remove obsolete code

* rewrote webrtc code in C to remove dependency on c++17

* fixed left-over reference

* detect binary and text messages

* minor fix

* naming of webrtc threads

* added webrtc configuration

* fix for thread_get_name_np()

* smaller web_client memory footprint

* universal web clients cache

* free web clients every 100 uses

* webrtc is now enabled by default only when compiled with internal checks

* webrtc responses to /api/ requests, including LZ4 compression

* fix for binary and text messages

* web_client_cache is now global

* unification of the internal web server API, for web requests, aclk requests, webrtc requests

* more cleanup and unification of web client timings

* fixed compiler warnings

* update sent and received bytes

* eliminated almost all big buffers in web client

* registry now uses the new json generation

* cookies are now an array; fixed redirects

* fix redirects, again

* write cookies directly to the header buffer, eliminating the need for cookie structures in web client

* reset the has_cookies flag

* gathered all web client cleanup to one function

* fixes redirects

* added summary.globals in /api/v2/data response

* ars to arc in /api/v2/data

* properly handle host impersonation

* set the context of mem.numa_nodes
2023-04-20 20:49:06 +03:00
Costa Tsaousis
4c82a15651
/api/v2 part 8 ()
* configure extent cache size

* /api/v2/data responds with the id of dimensions in result.labels

* provide the original units and sts to db.dimensions

* updated swagger for the changes
2023-04-10 16:24:45 +03:00
Costa Tsaousis
204dd9ae27
Boost dbengine ()
* configure extent cache size

* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads

* fix dispatched and executing counters

* boost to the max

* increase libuv worker threads

* query prep always gets more prio than extent reads; stop processing in batch when dbengine queue is critical

* fix accounting of query prep

* inlining of time-grouping functions, to speed up queries with billions of points

* make switching based on a local const variable

* print one pending contexts loading message per iteration

* inlined store engine query API

* inlined storage engine data collection api

* inlined all storage engine query ops

* eliminate and inline data collection ops

* simplified query group-by

* more error handling

* optimized partial trimming of group-by queries

* preparative work to support multiple passes of group-by

* more preparative work to support multiple passes of group-by (accepts multiple group-by params)

* unified query timings

* unified query timings - weights endpoint

* query target is no longer a static thread variable - there is a list of cached query targets, each of which is freed every 1000 queries

* fix query memory accounting

* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name

* limit max ACLK WEB response size to 30MB

* the response type should be text/plain

* more preparative work for multiple group-by passes

* create functions for generating group by keys, ids and names

* multiple group-by passes are now supported

* parse group-by options array also with an index

* implemented percentage-of-instance group by function

* family is now merged in multi-node contexts

* prevent uninitialized use
2023-04-07 21:25:01 +03:00
Costa Tsaousis
8a036f0b24
/api/v2/X part 7 ()
* /api/v2/weights, points key renamed to result

* /api/v2/weights, add node ids in response

* /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics

* /api/v2/data expose view.dimensions.sts{}

* /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses

* /api/v2 nodes output now includes the duration of time spent per node

* jsonwrap view object renames and cleanup

* rework of the statistics returned by the query engine

* swagger work

* swagger work

* more swagger work

* updated swagger json

* added the remaining /api/v2 endpoints to swagger

* point.ar has been renamed point.arp

* updated weights endpoint

* fix compilation warnings
2023-03-28 15:23:03 +03:00
Costa Tsaousis
5eed0545d4
/api/v2/X part 5 ()
* query timestamps are now pre-determined and alignment on timestamps is guaranteed

* turn internal_fatal() to internal_error() to investigate the issue

* handle query when no data exist in the db

* check for non NULL dict when running dictionary garbage collect

* support API v2 requests via ACLK

* add nodes detailed information to /api/v2/nodes

* fixed keys and added dummy nodes for completeness

* added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node

* make sure replication does not double count charts that are already being replicated

* expose min and max in sts structures

* added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters

* respect trimming in percentage calculation; updated swagger

* api/v2/weights preparative work to support multi-node queries - still single node though

* multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data

* when passing the raw option, the query exposes the hidden dimensions

* fix compilation issues on older systems

* the query engine now calculates per dimension min, max, sum, count, anomaly count

* use the macro to calculate storage point anomaly rate

* weights endpoint exposing version hashes

* weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate

* query: expose RESET flag; do not add the same point multiple times to the aggregated point

* weights: more compact output

* weights requests can be interrupted

* all /api/v2 requests can be interrupted and timeout

* allow relative timestamps in weights

* fix macos compilation warnings

* Revert "fix macos compilation warnings"

This reverts commit 8a1d24e41e.

* /api/v2/data group-by now works on dimension names, not ids

* /api/v2/weights does not query metrics without retention and new output format

* /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
2023-03-21 21:53:47 +02:00
Costa Tsaousis
cd50bf4236
/api/v2 part 4 ()
* expose the order of group by

* key renames in json wrapper v2

* added group by context and group by units

* added view_average_values

* fix for view_average_values when percentage is specified

* option group-by-labels is enabling the exposure of all the labels that are used for each of the final grouped dimensions

* when executing group by queries, allocate one dimension data at a time - not all of them

* respect hidden dimensions

* cancel running data query on socket error

* use poll to detect socket errors

* use POLLRDHUP to detect half closed connections

* make sure POLLRDHUP is available

* do not destroy aral-by-size arals

* completed documentation of /api/v2/data.

* moved min, max back to view; updated swagger yaml and json

* default format for /api/v2/data is json2
2023-03-13 23:39:06 +02:00
Costa Tsaousis
cf85c3b0e9
/api/v2/X improvements part 3 ()
* max web request size to 64KB

* fix the request too big message

* increase max request reading tries to 100

* support for bigger web requests

* add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too

* better implementation for partial data trimming

* added group_by=selected to return only one dimension for all selected metrics

* fix acceptance of group_by=selected

* passing option "raw" disables partial data trimming

* remove obsolete option "plan"; use "debug"

* fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing

* per row annotations

* added time column to point annotations

* disable caching for /api/v2/contexts responses

* added api format json2 that returns an array for each point, having all the point values and annotations in it

* work on swagger about /api/v2

* prevent infinite loop

* cleanup and swagger work

* allow negative simple pattern expressions to work as expected

* do not lookup in the dictionary empty names

* garbage collect dictionaries

* make query_target allocate less aggressively; queries fill the remaining points with nulls

* reusable query ops to save memory on huge queries

* move parts of query plans into query ops to save query target memory

* remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed
2023-03-10 12:41:14 +02:00
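The bullet above about negative simple pattern expressions can be illustrated with a small sketch. It assumes the commonly documented simple-pattern semantics: a space-separated list of patterns, `*` as the only wildcard, `!` marking a negative pattern, evaluated left to right with the first match (positive or negative) deciding the result. The function name `simple_pattern_matches` and the `default` parameter are hypothetical and are not taken from the Netdata source.

```python
import re


def simple_pattern_matches(pattern_list: str, name: str, default: bool = False) -> bool:
    """Evaluate a space-separated pattern list left to right; the first match wins.

    Hypothetical helper for illustration only, not Netdata's implementation.
    """
    for token in pattern_list.split():
        negative = token.startswith("!")
        glob = token[1:] if negative else token
        # Translate '*' wildcards into a regular expression; everything else is literal.
        regex = ".*".join(re.escape(part) for part in glob.split("*"))
        if re.fullmatch(regex, name):
            # A negative pattern that matches rejects the name immediately.
            return not negative
    # No pattern matched at all.
    return default
```

With the list `!eth0 eth*`, the name `eth0` is rejected by the first (negative) pattern before `eth*` is ever consulted, while `eth1` falls through to `eth*` and is accepted. This ordering is why negative patterns are normally placed first.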
Costa Tsaousis
021e252fc5
/api/v2/contexts ()
* preparation for /api/v2/contexts

* working /api/v2/contexts

* add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages

* minor fix

* query target now accurately counts hosts, contexts, instances, dimensions, metrics

* cleanup /api/v2/contexts

* full text search with /api/v2/contexts

* simple patterns now support the option to search ignoring case

* full text search API with /api/v2/q

* simple pattern execution optimization

* do not show q when not given

* full text search accounting

* separated /api/v2/nodes from /api/v2/contexts

* fix ssv queries for group_by

* count query instances queried and failed per context and host

* split rrdcontext.c to multiple files

* add query totals

* fix anomaly rate calculation; provide "ni" for indexing hosts

* do not generate zero valued members

* faster calculation of anomaly rate, by just summing integers for each db point and doing the math once for every generated point

* fix typo when printing dimensions totals

* added option minify to remove spaces and newlines from JSON output

* send instance ids and names when they differ

* do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe

* fix for the previous + renames and code cleanup

* when a dimension is filtered, include in the response all the other dimensions that are selectable

* do not add nodes that do not have retention in the current window

* move selection of dimensions to query_dimension_add(), instead of query_metric_add()

* increase the pre-processing capacity of queries

* generate instance fqdn ids and names only when they are needed

* provide detailed statistics about tiers retention, queries, points, update_every

* late allocation of query dimensions

* cleanup

* more cleanup

* support for annotations per displayed point, RESET and PARTIAL

* new type annotations

* if a chart is not linked to contexts and it is collected, link it when it is collected

* make ML run reentrant

* make ML rrdr query synchronous

* optimize replication memory allocation of replication_sort_entry

* change units to percentage, when requesting a coefficient of variation, or a percentage query

* initialize replication before starting main threads

* properly decrement no room requests counter

* propagate the non-zero flag to group-by

* the same by avoiding the extra loop

* respect non-zero in all dimension arrays

* remove dictionary garbage collection from dictionary_entries() and dictionary_version()

* be more verbose when jv2 indexing is postponed

* prevent infinite loop

* use hidden dimensions even when dimensions pattern is unset

* traverse hosts using dictionaries

* fix dictionary unittests
2023-03-02 22:50:48 +02:00
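The anomaly-rate optimization mentioned in the commit above (summing integers per database point and doing the math once per generated point) can be sketched as follows. This is only an illustration of the idea; the class and field names are hypothetical, and representing each database point as a single boolean anomaly flag is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class PointAccumulator:
    """Hypothetical accumulator for one generated (output) point."""

    count: int = 0      # database points merged into this output point
    anomalous: int = 0  # how many of them were flagged as anomalous

    def add(self, is_anomalous: bool) -> None:
        # Integer-only work per database point: no floating-point math here.
        self.count += 1
        self.anomalous += int(is_anomalous)

    def anomaly_rate(self) -> float:
        # One floating-point division per generated point, as a percentage.
        return 100.0 * self.anomalous / self.count if self.count else 0.0
```

For example, feeding four database points of which two are anomalous yields an anomaly rate of 50.0 for the generated point, with the division performed exactly once.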
Costa Tsaousis
1cfad181a8
/api/v2/data - multi-host/context/instance/dimension/label queries ()
* fundamentals for having /api/v2/ working

* use an atomic to prevent writing to internal pipe too much

* first attempt of multi-node, multi-context, multi-chart, multi-dimension queries

* v2 jsonwrap

* first attempt for group by

* cleaned up RRDR and fixed group by

* improvements to /api/v2/api

* query instance may be realloced, so pointers to it get invalid; solved memory leaks

* count of queried metrics in summary information

* provide detailed information about selected, excluded, queried and failed metrics for each entity

* select instances by fqdn too

* add timing information to json output

* link charts to rrdcontexts, if a query comes in and it is found unlinked

* calculate min, max, sum, average, volume, count per metric

* api v2 parameters naming

* renders alerts and units

* render machine_guid and node_id in all sections it is relevant

* unified keys

* group by now takes into account units and when there are multiple units involved, it creates a dimension per unit

* request and detailed are hidden behind an option

* summary includes only a flattened list of alerts

* alert counts per host and instance

* count of grouped metrics per dimension

* added contexts to summary

* added chart title

* added dimension priorities and chart type

* support for multiple group by at the same time

* minor fixes

* labels are now a tree

* keys uniformity

* filtering by alerts, both having a specific alert and having a specific alert in a specific status

* added scope of hosts and contexts

* count of instances on contexts and hosts

* make the api return valid responses even when the response contains no data

* calculate average and contribution % for every item in the summary

* fix compilation warnings

* fix compilation warnings - again
2023-02-22 22:30:40 +02:00
Costa Tsaousis
d2daa19bf5
JSON internal API, IEEE754 base64/hex streaming, weights endpoint optimization ()
* first work on standardizing json formatting

* renamed old grouping to time_grouping and added group_by

* add dummy functions to enable compilation

* buffer json api work

* jsonwrap opening with buffer_json_X() functions

* cleanup

* storage for quotes

* optimize buffer printing for both numbers and strings

* removed ; from define

* contexts json generation using the new json functions

* fix buffer overflow at unit test

* weights endpoint using new json api

* fixes to weights endpoint

* check buffer overflow on all buffer functions

* do synchronous queries for weights

* buffer_flush() now resets json state too

* content type typedef

* print double values that are above the max 64-bit value

* str2ndd() can now parse values above UINT64_MAX

* faster number parsing by avoiding double calculations as much as possible

* faster number parsing

* faster hex parsing

* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers

* full printing and parsing without using library functions - and related unit tests

* added IEEE754 streaming capability to enable streaming of double values in hex

* streaming and replication to transfer all values in hex

* use our own str2ndd for set2

* remove subnormal check from ieee

* base64 encoding for numbers, instead of hex

* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision

* str2ndd_encoded() parses all encoding formats, including integers

* prevent uninitialized use

* /api/v1/info using the new json API

* Fix error when compiling with --disable-ml

* Remove redundant 'buffer_unittest' declaration

* Fix formatting

* Fix formatting

* Fix formatting

* fix buffer unit test

* apps.plugin using the new JSON API

* make sure the metrics registry does not accept negative timestamps

* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache

* Fix more formatting

---------

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-02-15 21:16:29 +02:00
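The commit above moves streaming of double values from decimal text to an encoding of the IEEE 754 bit pattern, first in hex and then in base64. The sketch below shows only the general idea, that the 8 raw bytes of a binary64 value round-trip exactly where a decimal rendering may not. The function names, the big-endian byte order, and the standard base64 alphabet are assumptions made for this example; the actual wire format used by Netdata is not reproduced here.

```python
import base64
import struct


def encode_double(value: float, fmt: str = "hex") -> str:
    """Encode the IEEE 754 binary64 bit pattern of value as text (illustrative only)."""
    raw = struct.pack(">d", value)  # 8 bytes, big-endian (an assumption of this sketch)
    if fmt == "hex":
        return raw.hex()
    if fmt == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unsupported format: {fmt}")


def decode_double(text: str, fmt: str = "hex") -> float:
    """Inverse of encode_double(): recover the exact double from its text form."""
    if fmt == "hex":
        raw = bytes.fromhex(text)
    elif fmt == "base64":
        raw = base64.b64decode(text)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return struct.unpack(">d", raw)[0]
```

In this sketch a double takes 16 characters in hex and 12 characters in padded base64, which gives one plausible reason to prefer base64 on the wire; whether that was the motivation in the commit is not stated in the log.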
Costa Tsaousis
82150596e7
do not report dimensions that failed to be queried ()
* do not report dimensions that failed to be queried

* renamed SELECTED to QUERIED to have clarity on what it means

* fix wrong placement of continue
2023-02-07 11:25:41 +02:00
Costa Tsaousis
c83abcfb9d
DBENGINE v2 - improvements part 1 ()
* allow running multiple evictors and flushers

* flipped aggressive and critical evictions

* dont run more than 1 evictor

* switch to batch evictions when the size of the cache is critical

* remove batching of evictions

* dedup extent load pending requests

* accounting for merged extents

* always use double linked list

* add extent merging to the overall cache hit ratio

* support requeuing merged extents to higher priorities

* fix function name

* query planner now prefers higher tiers even when they miss some data at the end, which it fills from lower tiers; adding the option "plan" to jsonwrap now renders the query plan

* update statistics after every dimension completes

* use the retention of all tiers to calculate coverage per tier

* use the original window of the query for the planner

* give 2.5% benefit for each higher tier

* update cmd->priority so that it be requeued multiple times

* merged extent pages is a cache hit

* fixed dbengine cache hit stats
2023-01-12 23:10:12 +02:00
Costa Tsaousis
368a26cfee
DBENGINE v2 ()
* count open cache pages referring to datafile

* eliminate waste flush attempts

* remove eliminated variable

* journal v2 scanning split functions

* avoid locking open cache for a long time while migrating to journal v2

* dont acquire datafile for the loop; disable thread cancelability while a query is running

* work on datafile acquiring

* work on datafile deletion

* work on datafile deletion again

* logs of dbengine should start with DBENGINE

* thread specific key for queries to check if a query finishes without a finalize

* page_uuid is not used anymore

* Cleanup judy traversal when building new v2
Remove not needed calls to metric registry

* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent

* disable checks for invalid time-ranges

* Remove type from page details

* report scanning time

* remove infinite loop from datafile acquire for deletion

* remove infinite loop from datafile acquire for deletion again

* trace query handles

* properly allocate array of dimensions in replication

* metrics cleanup

* metrics registry uses arrayalloc

* arrayalloc free should be protected by lock

* use array alloc in page cache

* journal v2 scanning fix

* datafile reference leak hunting

* do not load metrics of future timestamps

* initialize reasons

* fix datafile reference leak

* do not load pages that are entirely overlapped by others

* expand metric retention atomically

* split replication logic in initialization and execution

* replication prepare ahead queries

* replication prepare ahead queries fixed

* fix replication workers accounting

* add router active queries chart

* restore accounting of pages metadata sources; cleanup replication

* dont count skipped pages as unroutable

* notes on services shutdown

* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file

* do not add pages we dont need to pdc

* time in range re-work to provide info about past and future matches

* finer control on the pages selected for processing; accounting of page related issues

* fix invalid reference to handle->page

* eliminate data collection handle of pg_lookup_next

* accounting for queries with gaps

* query preprocessing the same way the processing is done; cache now supports all operations on Judy

* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)

* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query

* finer gaps calculation; accounting of overlapping pages in queries

* fix gaps accounting

* move datafile deletion to worker thread

* tune libuv workers and thread stack size

* stop netdata threads gradually

* run indexing together with cache flush/evict

* more work on clean shutdown

* limit the number of pages to evict per run

* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction

* economies on flags for smaller page footprint; cleanup and renames

* eviction moves referenced pages to the end of the queue

* use murmur hash for indexing partition

* murmur should be static

* use more indexing partitions

* revert number of partitions to number of cpus

* cancel threads first, then stop services

* revert default thread stack size

* dont execute replication requests of disconnected senders

* wait more time for services that are exiting gradually

* fixed last commit

* finer control on page selection algorithm

* default stacksize of 1MB

* fix formatting

* fix worker utilization going crazy when the number is rotating

* avoid buffer full due to replication preprocessing of requests

* support query priorities

* add count of spins in spinlock when compiled with netdata internal checks

* remove prioritization from dbengine queries; cache now uses mutexes for the queues

* hot pages are now in sections judy arrays, like dirty

* align replication queries to optimal page size

* during flushing add to clean and evict in batches

* Revert "during flushing add to clean and evict in batches"

This reverts commit 8fb2b69d06.

* dont lock clean while evicting pages during flushing

* Revert "dont lock clean while evicting pages during flushing"

This reverts commit d6c82b5f40.

* Revert "Revert "during flushing add to clean and evict in batches""

This reverts commit ca7a187537.

* dont cross locks during flushing, for the fastest flushes possible

* low-priority queries load pages synchronously

* Revert "low-priority queries load pages synchronously"

This reverts commit 1ef2662ddc.

* cache uses spinlock again

* during flushing, dont lock the clean queue at all; each item is added atomically

* do smaller eviction runs

* evict one page at a time to minimize lock contention on the clean queue

* fix eviction statistics

* fix last commit

* plain should be main cache

* event loop cleanup; evictions and flushes can now happen concurrently

* run flush and evictions from tier0 only

* remove not needed variables

* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time

* added worker jobs in timer to find the slow part of it

* support fast eviction of pages when all_of_them is set

* revert default thread stack size

* bypass event loop for dispatching read extent commands to workers - send them directly

* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"

This reverts commit 2c08bc5bab.

* cache work requests

* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors

* publish flushed pages to open cache in the thread pool

* prevent eventloop requests from getting stacked in the event loop

* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c

* more rrdengine.c cleanup

* enable db rotation

* do not log when there is a filter

* do not run multiple migration to journal v2

* load all extents async

* fix wrong paste

* report opcodes waiting, works dispatched, works executing

* cleanup event loop memory every 10 minutes

* dont dispatch more work requests than the number of threads available

* use the dispatched counter instead of the executing counter to check if the worker thread pool is full

* remove UV_RUN_NOWAIT

* replication to fill the queues

* caching of extent buffers; code cleanup

* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation

* single transaction wal

* synchronous flushing

* first cancel the threads, then signal them to exit

* caching of rrdeng query handles; added priority to query target; health is now low prio

* add priority to the missing points; do not allow critical priority in queries

* offload query preparation and routing to libuv thread pool

* updated timing charts for the offloaded query preparation

* caching of WALs

* accounting for struct caches (buffers); do not load extents with invalid sizes

* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset

* also check if the expanded before is not the chart later updated time

* also check if the expanded before is not after the wall clock time of when the query started

* Remove unused variable

* replication to queue less queries; cleanup of internal fatals

* Mark dimension to be updated async

* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)

* disable pgc stress test, under an ifdef

* disable mrg stress test under an ifdef

* Mark chart and host labels, host info for async check and store in the database

* dictionary items use arrayalloc

* cache section pages structure is allocated with arrayalloc

* Add function to wakeup the aclk query threads and check for exit
Register function to be called during shutdown after signaling the service to exit

* parallel preparation of all dimensions of queries

* be more sensitive to enable streaming after replication

* atomically finish chart replication

* fix last commit

* fix last commit again

* fix last commit again again

* fix last commit again again again

* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication

* do not cancel start streaming; use high priority queries when we have locked chart data collection

* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered

* opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue

* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore

* Fix bad memory allocation / cleanup

* Cleanup ACLK sync initialization (part 1)

* Don't update metric registry during shutdown (part 1)

* Prevent crash when dashboard is refreshed and host goes away

* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down

* make ML work

* Fix compile without NETDATA_INTERNAL_CHECKS

* shutdown each ctx independently

* fix completion of quiesce

* do not update shared ML charts

* Create ML charts on child hosts.

When a parent runs ML for a child, the relevant ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.

The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.

* check new ml code

* first save the database, then free all memory

* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db

* Cleanup metadata thread (part 2)

* increase refcount before dispatching prep command

* Do not try to stop anomaly detection threads twice.

A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.

* Remove allocations when smoothing samples buffer

The number of dims per sample is always 1, ie. we are training and
predicting only individual dimensions.

* set the orphan flag when loading archived hosts

* track worker dispatch callbacks and threadpool worker init

* make ML threads joinable; mark ctx having flushing in progress as early as possible

* fix allocation counter

* Cleanup metadata thread (part 3)

* Cleanup metadata thread (part 4)

* Skip metadata host scan when running unittest

* unittest support during init

* dont use all the libuv threads for queries

* break an infinite loop when sleep_usec() is interrupted

* ml prediction is a collector for several charts

* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit

* worker_unregister() in netdata threads cleanup

* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h

* fixed ML issues

* removed engine2 directory

* added dbengine2 files in CMakeLists.txt

* move query plan data to query target, so that they can be exposed in jsonwrap

* uniform definition of query plan according to the other query target members

* event_loop should be in daemon, not libnetdata

* metric_retention_by_uuid() is now part of the storage engine abstraction

* unify time_t variables to have the suffix _s (meaning: seconds)

* old dbengine statistics become "dbengine io"

* do not enable ML resource usage charts by default

* unify ml chart families, plugins and modules

* cleanup query plans from query target

* cleanup all extent buffers

* added debug info for rrddim slot to time

* rrddim now does proper gap management

* full rewrite of the mem modes

* use library functions for madvise

* use CHECKSUM_SZ for the checksum size

* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query

* fix dbengine shutdown

* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently

* fine tune cache evictions

* dont initialize health if the health service is not running - prevent crash on shutdown while children get connected

* rename AS threads to ACLK[hostname]

* prevent re-use of uninitialized memory in queries

* use JulyL instead of JudyL for PDC operations - to test it first

* add also JulyL files

* fix July memory accounting

* disable July for PDC (use Judy)

* use the function to remove datafiles from linked list

* fix july and event_loop

* add july to libnetdata subdirs

* rename time_t variables that end in _t to end in _s

* replicate when there is a gap at the beginning of the replication period

* reset postponing of sender connections when a receiver is connected

* Adjust update every properly

* fix replication infinite loop due to last change

* packed enums in rrd.h and cleanup of obsolete rrd structure members

* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()

* void unused variable

* void unused variables

* fix indentation

* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps

* macros to calculate page entries by time and size

* prevent statsd cleanup crash on exit

* cleanup health thread related variables

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
2023-01-10 19:59:21 +02:00
Emmanuel Vasilakis
bf1cb6048b
Use print macros ()
* use print macros

* cast instead
2022-10-25 17:24:07 +03:00
Costa Tsaousis
00712b351b
QUERY_TARGET: new query engine for Netdata Agent ()
* initial implementation of QUERY_TARGET

* rrd2rrdr() interface

* rrddim_find_best_tier_for_timeframe() ported

* added dimension filtering

* added db object in query target

* rrd2rrdr() ported

* working on formatters

* working on jsonwrapper

* finally, it compiles...

* 1st run without crashes

* query planner working

* cleanup old code

* review changes

* fix also changing data collection frequency

* fix signedness

* fix rrdlabels and dimension ordering

* fixes

* remove unused variable

* ml should accept NULL response from rrd2rrdr()

* number formatting fixes

* more number formatting fixes

* more number formatting fixes

* support mc parallel queries

* formatting and cleanup

* added rrd2rrdr_legacy() as a simplified interface to run a query

* make sure rrdset_find_natural_update_every_for_timeframe() returns a value

* make signed comparisons

* weights endpoint using rrdcontexts

* fix for legacy db modes and cleanup

* fix for chart_ids and remove AR chart from weights endpoint

* Ignore command if not initialized yet

* remove unused members

* properly initialize window

* code cleanup - rrddim linked list is gone; rrdset rwlock is gone too

* reviewed RRDR.internal members

* eliminate unnecessary members of QUERY_TARGET

* more complete query ids; more detailed information on aborted queries

* properly terminate option strings

* query id contains group_options which is controlled by users, so escaping is necessary

* tense in query id

* tense in query id - again

* added the remaining query options to the query id

* Expose hidden option to the dimension

* use the hidden flag when loading context dimensions

* Specify table alias for option

* dont update chart last access time, unless at least a dimension of the chart will be queried

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-10-23 23:46:43 +03:00
Costa Tsaousis
8fc3b351a2
Allow netdata plugins to expose functions for querying more information about specific charts ()
* function renames and code cleanup in popen.c; no actual code changes

* netdata popen() now opens both child process stdin and stdout and returns FILE * for both

* pass both input and output to parser structures

* updated rrdset to call custom functions

* RRDSET FUNCTION leading calls for both sync and async operation

* put RRDSET functions to a separate file

* added format and timeout at function definition

* support for synchronous (internal plugins) and asynchronous (external plugins and children) functions

* /api/v1/function endpoint

* functions are now attached to the host and there is a dictionary view per chart

* functions implemented at plugins.d

* remove the defer until keyword hook from plugins.d when it is done

* stream sender implementation of functions

* sanitization of all functions so that certain characters are only allowed

* stricter sanitization

* common max size

* 1st working plugins.d example

* always init inflight dictionary

* properly destroy dictionaries to avoid parallel insertion of items

* add more debugging on disconnection reasons

* add more debugging on disconnection reasons again

* streaming receiver respects newlines

* dont use the same fp for both streaming receive and send

* dont free dbengine memory with internal checks

* make sender proceed in the buffer

* added timing info and garbage collection at plugins.d

* added info about routing nodes

* added info about routing nodes with delay

* added more info about delays

* added more info about delays again

* signal sending thread to wake up

* streaming version labeling and commented code to support capabilities

* added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info

* redirect top output to stdout

* address coverity findings

* fix resource leaks of popen

* log attempts to connect to individual destinations

* better messages

* properly parse destinations

* try to find a function from the most matching to the least matching

* log added streaming destinations

* rotate destinations bypassing a node in the middle that does not accept our connection

* break the loops properly

* use typedef to define callbacks

* capabilities negotiation during streaming

* functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities

* restore functionality to lookup functions

* better logging of capabilities

* remove old versions from capabilities when a newer version is there

* fix formatting

* optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time

* delayed health initialization for rrddim and rrdset

* cleanup health initialization

* fix for popen() not returning the right value

* add health worker jobs for initializing rrdset and rrddim

* added content type support for functions; apps.plugin permanent function to display all the processes

* fixes for functions parameters parsing in apps.plugin

* fix for process matching in apps.plugin

* first working function for apps.plugin

* Dashboard ACL is disabled for functions; Function errors are all in JSON format

* apps.plugin function processes returns json table

* use json_escape_string() to escape message

* fix formatting

* apps.plugin exposes all its metrics to function processes

* fix json formatting when filtering out some rows

* reopen the internal pipe of rrdpush in case of errors

* misplaced statement

* do not use buffer->len

* support for GLOBAL functions (functions that are not linked to a chart)

* added /api/v1/functions endpoint; removed format from the FUNCTIONS api;

* swagger documentation about the new api end points

* added plugins.d documentation about functions

* never re-close a file

* remove unnecessary ifdef

* fixed issues identified by codacy

* fix for null label value

* make edit-config copy-and-paste friendly

* Revert "make edit-config copy-and-paste friendly"

This reverts commit 54500c0e0a.

* reworked sender handshake to fix coverity findings

* timeout is zero, for both send_timeout() and recv_timeout()

* properly detect that parent closed the socket

* support caching of function responses; limit function response to 10MB; added protection from malformed function responses

* disabled excessive logging

* added units to apps.plugin function processes and normalized all values to be human readable

* shorter field names

* fixed issues reported

* fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses

* use double linked list macros for double linked list management

* faster apps.plugin function printing by minimizing file operations

* added memory percentage

* fix compatibility issues with older compilers and FreeBSD

* rrdpush sender code cleanup; rrdhost structure cleanup from sender flags and variables;

* fix leftover variable in ifdef

* apps.plugin: do not call detach from the thread; exit immediately when input is broken

* exclude AR charts from health

* flush cleaner; prefer sender output

* clarity

* do not fill the cbuffer if not connected

* fix

* dont enable host->sender if streaming is not enabled; send host label updates to parent;

* functions are only available through ACLK

* Prepared statement reports only in dev mode

* fix AR chart detection

* fix for streaming not enabling itself

* more cleanup of sender and receiver structures

* moved read-only flags and configuration options to rrdhost->options

* fixed merge with master

* fix for incomplete rename

* prevent service thread from working on charts that are being collected

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-10-05 14:13:46 +03:00
Costa Tsaousis
cb7af25c09
RRD structures managed by dictionaries ()
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired throughout netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* dont create variables if not initialized for health

* dont create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* dont keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock and automatically switch all calls to their unsafe versions; all direct calls to unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* dont use index locks in single threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart it of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
Costa Tsaousis
3f6a75250d
Obsolete RRDSET state ()
* move chart_labels to rrdset

* rename chart_labels to rrdlabels

* renamed hash_id to uuid

* turned is_ar_chart into an rrdset flag

* removed rrdset state

* removed unused senders_connected member of rrdhost

* removed unused host flag RRDHOST_FLAG_MULTIHOST

* renamed rrdhost host_labels to rrdlabels

* Update exporting unit tests

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-07 15:28:30 +03:00
Costa Tsaousis
5e1b95cf92
Deduplicate all netdata strings ()
* rrdfamily

* rrddim

* rrdset plugin and module names

* rrdset units

* rrdset type

* rrdset family

* rrdset title

* rrdset title more

* rrdset context

* rrdcalctemplate context and removal of context hash from rrdset

* strings statistics

* rrdset name

* rearranged members of rrdset

* eliminate rrdset name hash; rrdcalc chart converted to STRING

* rrdset id, eliminated rrdset hash

* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate

* rrdcalctemplate

* rrdvar

* eval_variable

* rrddimvar and rrdsetvar

* rrdhost hostname, os and tags

* fix master commits

* added thread cache; implemented string_dup without locks

* faster thread cache

* rrdset and rrddim now use dictionaries for indexing

* rrdhost now uses dictionary

* rrdfamily now uses DICTIONARY

* rrdvar using dictionary instead of AVL

* allocate the right size to rrdvar flag members

* rrdhost remaining char * members to STRING *

* better error handling on indexing

* strings now use a read/write lock to allow parallel searches to the index

* removed AVL support from dictionaries; implemented STRING with native Judy calls

* string releases should be negative

* only 31 bits are allowed for enum flags

* proper locking on strings

* string threading unittest and fixes

* fix lgtm finding

* fixed naming

* stream chart/dimension definitions at the beginning of a streaming session

* thread stack variable is undefined on thread cancel

* rrdcontext garbage collect per host on startup

* worker control in garbage collection

* relaxed deletion of rrdmetrics

* type checking on dictfe

* netdata chart to monitor rrdcontext triggers

* Group chart label updates

* rrdcontext better handling of collected rrdsets

* rrdpush incremental transmission of definitions should use as much buffer as possible

* require 1MB per chart

* empty the sender buffer before enabling metrics streaming

* fill up to 50% of buffer

* reset signaling metrics sending

* use the shared variable for status

* use separate host flag for enabling streaming of metrics

* make sure the flag is clear

* add logging for streaming

* add logging for streaming on buffer overflow

* circular_buffer proper sizing

* removed obsolete logs

* do not execute worker jobs if not necessary

* better messages about compression disabling

* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped

* monitor stream sender used buffer ratio

* Update exporting unit tests

* no need to compare label value with strcmp

* streaming send workers now monitor bandwidth

* workers now use strings

* streaming receiver monitors incoming bandwidth

* parser shift of worker ids

* minor fixes

* Group chart label updates

* Populate context with dimensions that have data

* Fix chart id

* better shift of parser worker ids

* fix for streaming compression

* properly count received bytes

* ensure LZ4 compression ring buffer does not wrap prematurely

* do not stream empty charts; do not process empty instances in rrdcontext

* need_to_send_chart_definition() does not need an rrdset lock any more

* rrdcontext objects are collected, after data have been written to the db

* better logging of RRDCONTEXT transitions

* always set all variables needed by the worker utilization charts

* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes

* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock

* STRING code re-organization for clarity

* thread_cache improvements; double numbers precision on worker threads

* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions; STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS

* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts

* Add index to speed up database context searches

* Removed last_updated optimization (was also buggy after latest merge with master)

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05 19:31:06 +03:00
Stelios Fragkakis
49234f23de
Multi-Tier database backend for long term metrics storage ()
* Tier part 1

* Tier part 2

* Tier part 3

* Tier part 4

* Tier part 5

* Fix some ML compilation errors

* fix more conflicts

* pass proper tier

* move metric_uuid from state to RRDDIM

* move aclk_live_status from state to RRDDIM

* move ml_dimension from state to RRDDIM

* abstracted the data collection interface

* support flushing for mem db too

* abstracted the query api

* abstracted latest/oldest time per metric

* cleanup

* store_metric for tier1

* fix for store_metric

* allow multiple tiers, more than 2

* state to tier

* Change storage type in db. Query param to request min, max, sum or average

* Store tier data correctly

* Fix skipping tier page type

* Add tier grouping in the tier

* Fix to handle archived charts (part 1)

* Temp fix for query granularity when requesting tier1 data

* Fix parameters in the correct order and calculate the anomaly based on the anomaly count

* Proper tiering grouping

* Anomaly calculation based on anomaly count

* force type checking on storage handles

* update cmocka tests

* fully dynamic number of storage tiers

* fix static allocation

* configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode

* use default page dt using the tiering info

* automatic selection of tier

* fix for automatic selection of tier

* working prototype of dynamic tier selection

* automatic selection of tier done right (I hope)

* ask for the proper tier value, based on the grouping function

* fixes for unittests and load_metric_next()

* fixes for lgtm findings

* minor renames

* add dbengine to page cache size setting

* add dbengine to page cache with malloc

* query engine optimized to loop as little as required based on the view_update_every

* query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA

* report db points per tier in jsonwrap

* query planner that switches database tiers on the fly to satisfy the query for the entire timeframe

* dbengine statistics and documentation (in progress)

* calculate average point duration in db

* handle single point pages the best we can

* handle single point pages even better

* Keep page type in the rrdeng_page_descr

* updated doc

* handle future backwards compatibility - improved statistics

* support &tier=X in queries

* enforce increasing iterations on tiers

* tier 1 is always 1 iteration

* backfilling higher tiers on first data collection

* reversed anomaly bit

* set up to 5 tiers

* natural points should only be offered on tier 0, unless a specific tier is selected

* do not allow more than 65535 points of tier0 to be aggregated on any tier

* Work only on actually activated tiers

* fix query interpolation

* fix query interpolation again

* fix lgtm finding

* Activate one tier for now

* backfilling of higher tiers using raw metrics from lower tiers

* fix for crash on start when storage tiers is increased from the default

* more statistics on exit

* fix bug that prevented higher tiers to get any values; added backfilling options

* fixed the statistics log line

* removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle

* fixed division by zero on zero points_wanted

* removed dead code

* Decide on the descr->type for the type of metric

* dont store metrics on unknown page types

* free db_metric_handle on sql based context queries

* Disable STORAGE_POINT value check in the exporting engine unit tests

* fix for db modes other than dbengine

* fix for aclk archived chart queries destroying db_metric_handles of valid rrddims

* fix left-over freez() instead of OWA freez on median queries

Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-07-06 14:01:53 +03:00
Costa Tsaousis
c3dfbe52a6
netdata doubles ()
* netdata doubles

* fix cmocka test

* fix cmocka test again

* fix left-overs of long double to NETDATA_DOUBLE

* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf

* update the memory before saving

* rrdset is now detached from file structures too

* on memory mode map, update the memory mapped structures on every iteration

* allow RRD_ID_LENGTH_MAX to be changed

* granularity secs, back to update every

* fix formatting

* more formatting
2022-06-28 17:04:37 +03:00
Costa Tsaousis
b32ca44319
Query Engine multi-granularity support (and MC improvements) ()
* set grouping functions

* storage engine should check the validity of timestamps, not the query engine

* calculate and store in RRDR anomaly rates for every query

* anomaly rate used by volume metric correlations

* mc volume should use absolute data, to avoid cancelling effect

* return anomaly-rates in jsonwrap with jw-anomaly-rates option to data queries

* dont return null on anomaly rates

* allow passing group query options from the URL

* added countif to the query engine and used it in metric correlations

* fix configure

* fix countif and anomaly rate percentages

* added group_options to metric correlations; updated swagger

* added newline at the end of yaml file

* always check the time the highlighted window was above/below the highlighted window

* properly track time in memory queries

* error for internal checks only

* moved pack_storage_number() into the storage engines

* moved unpack_storage_number() inside the storage engines

* remove old comment

* pass unit tests

* properly detect zero or subnormal values in pack_storage_number()

* fill nulls before the value, not after

* make sure math.h is included

* workaround for isfinite()

* fix for isfinite()

* faster isfinite() alternative

* fix for faster isfinite() alternative

* next_metric() now returns end_time too

* variable step implemented in a generic way

* remove left-over variables

* ensure we always complete the wanted number of points

* fixes

* ensure no infinite loop

* mc-volume-improvements: Add information about invalid condition

* points should have a duration in the past

* removed unneeded info() line

* Fix unit tests for exporting engine

* new_point should only be checked when it is fetched from the db; better comment about the premature breaking of the main query loop

Co-authored-by: Thiago Marques <thiagoftsm@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-22 11:19:08 +03:00
Costa Tsaousis
986a1abf68
73x times faster metrics correlations at the agent ()
* faster correlations

* 4x times faster correlations

* a little bit more help

* 10x times faster metrics correlations

* 6 digits precision; better comments

* enabled metrics correlations by default

* abstracted DIFFS_NUMBER to allow easily changing it

* reworked the entire logic to have more accuracy and support a baseline that is power of two multiple of highlight

* properly calculate shifts

* even more improved version

* added support for timeout; fixed another memory leak; skipped hidden dimensions

* default timeout 1min

* reduce memory even further

* use dictionary for the list of charts and optimize locks

* return 403 forbidden, when mc is not enabled

* added query options

* dont process zero dimensions

* added volume method as an option to metric correlations ; now metric correlations can support multiple implementations

* make sure we will never crash

* spread results evenly for both kstwo and volume

* fixed bug in query engine that was missing misaligned queries when a single point was requested from the db; improved comments; improved query flags

* updated swagger and added sane defaults; query options are now supported, including anomaly-bit

* added "raw" option to allow cross node correlations; added "group" option to allow different time aggregations; allowed calling metric correlations without any parameters; allowed calling metric correlations with relative timestamps; added timeout to volume method; properly handled timeout on ks2 method; json output now sends all parameters back - same for json_wrap; modified query engine to use present time for relative timestamps; modified "allow_past" to mean both past backwards and forwards

* emulate the old behaviour about zero points

* 100% accuracy against python ks_2samp(); now the default is volume and the default points are 500

* added config option to change default metric correlations method

* removed work-arounds now that rrdlabels are merged
2022-06-13 21:31:52 +03:00
Costa Tsaousis
1b0f6c6b22
Labels with dictionary ()
* squashed and rebased to master

* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h

* added unittest for UTF-8 multibyte sanitization

* Fix unit test compilation

* Fix CMake build

* remove double sanitizer for opentsdb; cleanup sanitize_json_string()

* rename error_description to error_message to avoid conflict with json-c

* revert last and undef error_description from json-c

* more unittests; attempt to fix protobuf map issue

* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer

* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted

* better sorted dictionary checking

* proper unittesting for sorted dictionaries

* call dictionary deletion callback when destroying the dictionary

* remove obsolete variable

* Fix exporting unit tests

* Fix k8s label parsing test

* workaround for cmocka and strdupz()

* Bypass cmocka memory allocation check

* Revert "Bypass cmocka memory allocation check"

This reverts commit 4c49923839.

* Revert "workaround for cmocka and strdupz()"

This reverts commit 7bebee0480.

* Bypass cmocka memory allocation checks

* respect json formatting for chart labels

* cloud sends colons

* print the value only once

* allow parenthesis in values and spaces; make stream sender send quotes for values

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-13 20:35:45 +03:00
Costa Tsaousis
7784a16cc7
Dictionary with JudyHS and double linked list ()
* dictionary internals isolation

* more dictionary cleanups

* added unit test

* we should use DICT internally

* disable cups in cmake

* implement DICTIONARY with Judy arrays

* operational JUDY implementation

* JUDY cleanup

* JUDY summary added

* JudyHS implementation with double linked list

* test negative searches too

* optimize destruction

* optimize set to insert first without lookup

* updated stats

* code cleanup; better organization; updated info

* more code cleanup and commenting

* more cleanup, renames and comments

* fix rename

* more cleanups

* use Judy.h from system paths

* added foreach traversal; added flag to add item in front; isolated locks to their own functions; destruction returns the number of bytes freed

* more comments; flags are now 16-bit

* completed unittesting

* addressed comments and added reference counters maintenance

* added unittest in main; tested removal of items in front, back and middle

* added read/write walkthrough and foreach; allowed walkthrough and foreach in write mode to delete the current element (used by cups.plugin); referenced counters removed from the API

* DICTFE.name should be const too

* added API calls for exposing all statistics

* dictionary flags as enum and reference counters as atomic operations

* more comments; improved error handling at unit tests

* added functions to allow unsafe access while traversing the dictionary with locks in place

* check for libcups in cmake

* added delete callback; implemented statsd with this dictionary

* added missing dfe_done()

* added alternative implementation with AVL

* added documentation

* added comments and warning about AVL

* dictionary walkthrough on new code

* simplified foreach; updated docs

* updated docs

* AVL is much faster without hashes

* AVL should follow DBENGINE
2022-06-01 20:01:52 +03:00
Stelios Fragkakis
3071aa055c
Add additional metadata to the data response ()
* Consolidate query params

* Add new option to show full dimensions in the json header (this will include dimensions, charts and chart labels)

* Group and pass parameters with query_params
2022-05-31 21:46:44 +03:00
Stelios Fragkakis
881a1b9e13
Use the chart id instead of chart name in response to incoming cloud context queries () 2021-12-13 16:03:38 +02:00
Stelios Fragkakis
7ebb0a4da2
Fix memory leak when archived data is requested () 2021-03-23 18:09:34 +02:00
Stelios Fragkakis
65bc43d9cb
Add data query support for archived charts () 2021-03-22 09:47:22 +02:00
Stelios Fragkakis
cd443de780
Support multiple chart label keys in data queries () 2021-01-14 18:50:33 +02:00
Ilya Mashchenko
0f8175dd30
Kubernetes labels ()
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2020-12-14 17:27:55 +03:00
Markos Fountoulakis
5ffba490e3
Fix race condition in rrdset_first_entry_t() and rrdset_last_entry_t() () 2020-11-28 15:53:12 +02:00
Stelios Fragkakis
8f6f1baf9a
Added context parameter to the data endpoint ()
Added functionality to support composite charts
2020-09-15 19:41:39 +03:00
Vladimir Kobal
8cf5889194
Clean up host labels in API responses ()
* Remove host labels from the Swagger specification

* Remove host labels from the api responses
2020-01-06 17:34:49 +02:00
Andrew Moss
c8c72f18a6
Labels issues ()
Initial work on host labels from the dedicated branch. Includes work for issues , , , , , , ,  and  by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
2019-12-16 15:12:00 +01:00
Valentin Rakush
92642269f1
Add alarm variables to the response of chart and data ()
##### Summary
Implements feature  

Requests such as

http://localhost:19999/api/v1/chart?chart=example.random
http://localhost:19999/api/v1/data?chart=example.random&options=jsonwrap&options=showcustomvars

now return chart variables in their responses:

- Chart variables include only those with options set to RRDVAR_OPTION_CUSTOM_CHART_VAR
- For /api/v1/data requests, chart variables are returned when the parameters options=jsonwrap and options=showcustomvars are passed
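
As a rough illustration of the request shape described in this commit message, the sketch below builds a `/api/v1/data` URL that repeats the `options` parameter once per value. The helper name `data_url` is hypothetical and not part of netdata. The host, port, chart name and option names are the ones quoted in the message. The sketch only constructs the URL and does not contact an agent.

```python
from urllib.parse import urlencode

# Agent address as quoted in the commit message; adjust for a real setup.
BASE_URL = "http://localhost:19999"


def data_url(chart, options=()):
    """Build a /api/v1/data URL (hypothetical helper, for illustration only).

    Each entry in `options` becomes its own `options=` query parameter,
    matching the repeated-parameter form shown in the commit message.
    """
    params = [("chart", chart)] + [("options", opt) for opt in options]
    return f"{BASE_URL}/api/v1/data?{urlencode(params)}"


# Both options are passed, as the commit message describes for chart variables.
print(data_url("example.random", ["jsonwrap", "showcustomvars"]))
```

Passing the parameters as a list of pairs, not a dict, is what allows `options` to appear more than once in the query string.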

##### Component Name
[/database](https://github.com/netdata/netdata/tree/master/database/)
[/web/api/formatters](https://github.com/netdata/netdata/tree/master/web/api/formatters)
2019-08-20 11:11:43 +02:00
Markos Fountoulakis
6ca6d840dd
Database engine ()
* Database engine prototype version 0

* Database engine initial integration with netdata POC

* Scalable database engine with file and memory management.

* Database engine integration with netdata

* Added MIN MAX definitions to fix alpine build of travis CI

* Bugfix for backends and new DB engine, remove useless rrdset_time2slot() calls and erroneous checks

* DB engine disk protocol correction

* Moved DB engine storage file location to /var/cache/netdata/{host}/dbengine

* Fix configure to require openSSL for DB engine

* Fix netdata daemon health not holding read lock when iterating chart dimensions

* Optimized query API for new DB engine and old netdata DB fallback code-path

* netdata database internal query API improvements and cleanup

* Bugfix for DB engine queries returning empty values

* Added netdata internal check for data queries for old and new DB

* Added statistics to DB engine and fixed memory corruption bug

* Added preliminary charts for DB engine statistics

* Changed DB engine ratio statistics to incremental

* Added netdata statistics charts for DB engine internal statistics

* Fix for netdata not compiling successfully when missing dbengine dependencies

* Added DB engine functional test to netdata unittest command parameter

* Implemented DB engine dataset generator based on example.random chart

* Fix build error in CI

* Support older versions of libuv1

* Fixes segmentation fault when using multiple DB engine instances concurrently

* Fix memory corruption bug

* Fixed createdataset advanced option not exiting

* Fix for DB engine not working on FreeBSD

* Support FreeBSD library paths of new dependencies

* Workaround for unsupported O_DIRECT in OS X

* Fix unittest crashing during cleanup

* Disable DB engine FS caching in Apple OS X since O_DIRECT is not available

* Fix segfault when unittest and DB engine dataset generator don't have permissions to create temporary host

* Modified DB engine dataset generator to create multiple files

* Toned down overzealous page cache prefetcher

* Reduce internal memory fragmentation for page-cache data pages

* Added documentation describing the DB engine

* Documentation bugfixes

* Fixed unit tests compilation errors since last rebase

* Added note to back-up the DB engine files in documentation

* Added codacy fix.

* Support old gcc versions for atomic counters in DB engine
2019-05-15 08:28:06 +03:00
Costa Tsaousis
798c141c49
Split the API formatters in modules ()
* split all API formatters in modules

* added markdown formatting

* updated csv readme

* updated csv readme

* more documentation

* added more documentation

* updated documentation

* fixed typo

* fixed typo
2018-10-27 19:44:27 +03:00