# Database Queries

The Netdata database can be queried with the `/api/v1/data` and `/api/v1/badge.svg` REST API methods.
Every data query accepts the following parameters:
name | required | description
---|---|---
`chart` | yes | The chart to be queried.
`points` | no | The number of points to be returned. Netdata can reduce the number of points by applying query grouping methods. If not given, the result will have the same granularity as the database (although this relates to `gtime`).
`before` | no | The absolute timestamp, or the relative (to now) time, the query should finish evaluating data. If not given, it defaults to the timestamp of the latest point in the database.
`after` | no | The absolute timestamp, or the relative (to `before`) time, the query should start evaluating data. If not given, it defaults to the timestamp of the oldest point in the database.
`group` | no | The grouping method to use when reducing the points the database has. If not given, it defaults to `average`.
`gtime` | no | A resampling period to change the units of the metrics (i.e. setting this to 60 will convert `per second` metrics to `per minute`). If not given, it defaults to the granularity of the database.
`options` | no | A bitmap of options that can affect the operation of the query. Only 2 options are used by the query engine: `unaligned` and `percentage`. All the other options are used by the output formatters. The default is to return aligned data.
`dimensions` | no | A simple pattern to filter the dimensions to be queried. The default is to return all the dimensions of the chart.
## Operation
The query engine works as follows (in this order):
### Time-frame

`after` and `before` define a time-frame, accepting:

- absolute timestamps (unix timestamps, i.e. seconds since epoch).
- relative timestamps: `before` is relative to now and `after` is relative to `before`.

  Example: `before=-60&after=-60` evaluates to the time-frame from -120 up to -60 seconds in the past, relative to the latest entry of the database of the chart.
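The relative-timestamp rules above can be sketched in Python. This is a minimal illustration, not the actual implementation; the function name and the "non-positive means relative" convention are assumptions for the sketch:

```python
import time

def resolve_timeframe(after, before, now=None):
    """Resolve relative (non-positive) timestamps to absolute unix
    timestamps: `before` is relative to now, `after` to `before`."""
    if now is None:
        now = int(time.time())
    if before <= 0:          # relative to now
        before = now + before
    if after <= 0:           # relative to the resolved `before`
        after = before + after
    return after, before

# before=-60&after=-60 at now=1000 resolves to the window (880, 940),
# i.e. from 120 up to 60 seconds in the past.
```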
The engine verifies that the requested time-frame is available in the database:

- If the requested time-frame overlaps with the database, the excess requested will be truncated.
- If the requested time-frame does not overlap with the database, the engine will return an empty data set.

At the end of this operation, `after` and `before` are absolute timestamps.
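The truncation step can be sketched as a simple clamp against the first and last timestamps the database holds (a hypothetical helper, not the engine's real code):

```python
def clamp_timeframe(after, before, db_first, db_last):
    """Truncate an absolute time-frame to what the database holds.
    Returns None when the request does not overlap the database."""
    if before < db_first or after > db_last:
        return None          # no overlap: empty data set
    return max(after, db_first), min(before, db_last)
```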
### Data grouping

Database points grouping is applied when the caller requests a time-frame to be expressed with fewer points than are available in the database.

There are 2 uses that enable this feature:
- The caller requests a specific number of `points` to be returned.

  For example, for a time-frame of 10 minutes, the database has 600 points (1/sec), while the caller requested these 10 minutes to be expressed in 200 points.

  This feature is used by Netdata dashboards when you zoom out the charts. The dashboard requests the number of points the user's screen can show. This saves bandwidth and speeds up the browser (fewer points to evaluate for drawing the charts).

- The caller requests a re-sampling of the database, by setting `gtime` to any value above the granularity of the chart.

  For example, the chart's units are `requests/sec` and the caller wants `requests/min`.
Using `points` and `gtime`, the query engine tries to find a best fit of database points vs result points (we call this ratio `group points`). It always tries to keep `group points` an integer. Keep in mind the query engine may shift `after` if required. See also the example below.
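A rough Python sketch of how such a best fit could be computed. The real logic lives in `query.c`; the function name and the exact shifting rule here are simplified assumptions:

```python
def best_fit_group(after, before, points, granularity=1):
    """Pick an integer number of database points per result point
    ("group points") and shift `after` so the window holds a whole
    number of groups. Simplified sketch of the behaviour described."""
    duration = before - after
    db_points = duration // granularity
    group = max(1, db_points // points) if points else 1
    # shift `after` forward so the window is an exact multiple of `group`
    excess = db_points % group
    after += excess * granularity
    return group, after
```

For a 10-minute window at 1/sec (600 points) reduced to 200 result points, this yields `group = 3` with no shift needed.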
### Time-frame Alignment

Alignment is a very important aspect of Netdata queries. Without it, the animated charts on the dashboards would constantly change shape during incremental updates.

To provide consistent grouping through time, the query engine (by default) aligns `after` and `before` to be a multiple of `group points`.

For example, if `group points` is 60 and alignment is enabled, the engine will return each point with durations XX:XX:00 - XX:XX:59, matching whole minutes.

To disable alignment, pass `&options=unaligned` to the query.
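The alignment step amounts to rounding both edges of the window down to a multiple of the group duration. A minimal sketch (illustrative names, not the engine's API):

```python
def align_timeframe(after, before, group, granularity=1):
    """Align `after` and `before` down to multiples of the group
    duration, so repeated queries return points with stable boundaries."""
    span = group * granularity
    return after - (after % span), before - (before % span)

# with group points = 60, any window snaps to whole minutes:
# align_timeframe(125, 245, 60) -> (120, 240)
```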
### Query Execution

To execute the query, the engine evaluates all dimensions of the chart, one after another.

The engine does not evaluate dimensions that do not match the simple pattern given in the `dimensions` parameter, except when `options=percentage` is given (this option requires all the dimensions to be evaluated, to find the percentage of each dimension vs the chart total).

For each dimension, it starts evaluating values at `after` (not inclusive) towards `before` (inclusive).

For each value, it calls the grouping method given with the `&group=` query parameter (the default is `average`).
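The per-dimension evaluation loop can be sketched as follows. This is a toy model assuming 1-second granularity and dense data; the real engine in `query.c` handles gaps, alignment and many storage details omitted here:

```python
def execute_query(dimensions, after, before, group, grouping_fn):
    """Evaluate each dimension independently: walk values from `after`
    (exclusive) to `before` (inclusive), and flush one result point for
    every `group` collected values. `dimensions` maps name -> {ts: value}."""
    result = {}
    for name, series in dimensions.items():
        points, bucket = [], []
        for ts in range(after + 1, before + 1):
            bucket.append(series.get(ts, 0.0))
            if len(bucket) == group:          # one group is complete
                points.append(grouping_fn(bucket))
                bucket = []
        result[name] = points
    return result
```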
### Grouping methods

The following grouping methods are supported. Each is given all the values in the time-frame and groups the values every `group points`:

method | description
---|---
`min` | finds the minimum value
`max` | finds the maximum value
`average` | finds the average value
`sum` | adds all the values and returns the sum
`median` | sorts the values and returns the value in the middle of the list
`stddev` | finds the standard deviation of the values
`cv` | finds the relative standard deviation (coefficient of variation) of the values
`ses` | finds the exponentially weighted moving average of the values
`des` | applies Holt-Winters double exponential smoothing
`incremental_sum` | finds the difference of the last vs the first value
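Several of these reducers are one-liners when expressed over a bucket of values. A hedged Python sketch of a few of them (the dictionary and its layout are illustrative, not the engine's registration mechanism):

```python
import statistics

# each function reduces one bucket of `group points` values to a single point
GROUPING = {
    "min":             min,
    "max":             max,
    "average":         lambda v: sum(v) / len(v),
    "sum":             sum,
    "median":          statistics.median,
    "stddev":          statistics.pstdev,
    "incremental_sum": lambda v: v[-1] - v[0],
}
```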
## Further processing

The result of the query engine is always a structure that has dimensions and values for each dimension.

Formatting modules are then used to convert this result into many different formats and return it to the caller.
## Performance

The query engine is highly optimized for speed. Most of its modules implement "online" versions of the algorithms, requiring just one pass over the database values to produce the result.
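As an illustration of what "online" means here, the classic Welford recurrence computes mean and standard deviation in a single pass, without storing the values. This is a generic sketch of the technique, not Netdata's code:

```python
class OnlineStats:
    """One-pass (Welford) mean and standard deviation: values are
    consumed one at a time and never stored."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0       # sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def stddev(self):
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0
```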
## Example

When Netdata is reducing metrics, it always tries to return the same boundaries. So, if we want 10s averages, it will always return points starting at a `unix timestamp % 10 = 0`.
Let's see why this is needed by looking at the error case.
Assume we have 5 points:
time | value
---|---
00:01 | 1
00:02 | 2
00:03 | 3
00:04 | 4
00:05 | 5
At 00:04 you ask for 2 points for the past 4 seconds. So `group = 2`. Netdata would return:
point | time | value
---|---|---
1 | 00:01 - 00:02 | 1.5
2 | 00:03 - 00:04 | 3.5
A second later the chart is refreshed and makes the same request again, now at 00:05. These are the points that would be returned:
point | time | value
---|---|---
1 | 00:02 - 00:03 | 2.5
2 | 00:04 - 00:05 | 4.5
Wait a moment! The chart was shifted just one point and it changed value! Point 2 was 3.5 and when shifted to point 1 it is 2.5! If you see this in a chart, it's a mess. The charts change shape constantly.
For this reason, Netdata always aligns the data it returns to the `group`.
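The instability above is easy to reproduce. The sketch below averages the 5-point series in buckets of 2 seconds, once unaligned (windows shifting with the clock) to show the changing values; the helper is hypothetical, written just for this example:

```python
def group_avg(series, start, end, group):
    """Average `series[ts]` into buckets of `group` seconds covering
    (start, end]; a toy for the unaligned-query example above."""
    out, ts = [], start + 1
    while ts + group - 1 <= end:
        bucket = [series[t] for t in range(ts, ts + group)]
        out.append(sum(bucket) / group)
        ts += group
    return out

series = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
# unaligned: the same data yields different values one second apart
group_avg(series, 0, 4, 2)   # [1.5, 3.5]
group_avg(series, 1, 5, 2)   # [2.5, 4.5]
```

With alignment, both queries would snap their windows to the same even-second boundaries and return identical points for the overlapping range.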
When you request `points=1`, Netdata understands that you need 1 point for the whole database, so `group = 3600`. Then it tries to find the starting point, which would be `timestamp % 3600 = 0`. Within a database of 3600 seconds, there is one such point for sure. Then it tries to find the average of 3600 points. But most probably it will not find 3600 of them (for just 1 out of 3600 seconds this query will return something).
So, the proper way to query the database is to also set at least `after`. The following call will return 1 point for the last complete 10-second duration (it starts at `timestamp % 10 = 0`):

`http://netdata.firehol.org/api/v1/data?chart=system.cpu&points=1&after=-10&options=seconds`

When you keep calling this URL, you will see that it returns one new value every 10 seconds, and the timestamp always ends with zero. Similarly, if you say `points=1&after=-5` it will always return timestamps ending with 0 or 5.