mirror of https://github.com/netdata/netdata.git synced 2025-05-21 00:09:54 +00:00
Commit graph

56 commits

Author SHA1 Message Date
Costa Tsaousis
080e0aee27
Faster queries ()
* faster rrdeng_load_metric_next()

* no need to check validity for number - already done at the query side

* solve discrepancy between query create and free

* inline unpack_storage_number
2022-05-24 08:20:40 +03:00
Costa Tsaousis
7b272cbea8
query engine optimizations and cleanup ()
* move number unpacking close to next_metric

* don't miss group value flags
2022-05-21 12:29:38 +03:00
Adrien Béraud
9adf4dd782
Configurable storage engine for Netdata agents: step 2 () 2022-05-11 16:17:40 +03:00
Costa Tsaousis
79444d3645
fix memory leaks and mismatches of the use of the z functions for allocations ()
* fix mismatches of the use of the z functions for allocations

* when there was no memory, the original name of the dimensions was freed, and with a mismatching deallocator

* fixed memory leak at rrdeng_load_metric_*() functions

* fixed memory leak on exit of plugins.d parser

* fixed memory leak on plugins and streaming receiver threads exit

* fixed compiler warnings
2022-05-07 23:00:44 +03:00
Costa Tsaousis
5b48a70abd
speedup queries by providing optimization in the main loop () 2022-05-07 08:38:41 +03:00
Adrien Béraud
d92890b5f1
Configurable storage engine for Netdata agents: step 1 ()
* rrd: move API structures out of rrddim_volatile

In C, unlike C++, it's not possible to reference a nested structure
from outside this structure.

Since we later want to use rrddim_query_ops and rrddim_collect_ops
separately from rrddim_volatile, move these nested structures out.

* rrd: use opaque handle types for different memory modes
2022-05-03 11:34:15 +03:00
Costa Tsaousis
87c0cc2d60
One way allocator to double the speed of parallel context queries ()
* one way allocator to speed up context queries

* fixed a bug while expanding memory pages

* reworked for clarity and finally fixed the bug of allocating memory beyond the page size

* further optimize allocation step to minimize the number of allocations made

* implement strdup with memcpy instead of strcpy

* added documentation

* prevent an uninitialized use of owa

* added callocz() interface

* integrate onewayalloc everywhere - apart from sql queries

* one way allocator is now used in context queries using archived charts in sql

* align on the size of pointers

* forgotten freez()

* removed not needed memcpys

* give unique names to global variables to avoid conflicts with system definitions
2022-05-03 00:31:19 +03:00
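
The bullets of the one-way allocator commit above (page expansion, alignment on the size of pointers, strdup via memcpy, a single release at the end) can be illustrated with a minimal arena sketch. This is not Netdata's onewayalloc code: every name here (`arena_t`, `arena_init`, `arena_alloc`, `arena_strdup`, `arena_destroy`) is hypothetical, and the real implementation may differ in page sizing and bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT Netdata's onewayalloc API. */
typedef struct arena_page {
    struct arena_page *next;   /* previously filled page, if any */
    size_t size;               /* usable bytes in data[] */
    size_t used;               /* bytes handed out so far */
    unsigned char data[];      /* the memory we carve allocations from */
} arena_page_t;

typedef struct arena {
    arena_page_t *head;        /* page currently being filled */
    size_t page_size;          /* default size of a new page */
} arena_t;

/* Round n up to a multiple of the pointer size. */
static size_t align_up(size_t n) {
    size_t a = sizeof(void *);
    return (n + a - 1) / a * a;
}

void arena_init(arena_t *a, size_t page_size) {
    assert(page_size > 0);
    a->head = NULL;
    a->page_size = align_up(page_size);
}

/* Bump-allocate n bytes; individual allocations are never freed. */
void *arena_alloc(arena_t *a, size_t n) {
    n = align_up(n ? n : 1);
    if (!a->head || a->head->size - a->head->used < n) {
        /* Oversized requests get a page of their own size, so we never
           hand out memory beyond the end of a page. */
        size_t size = n > a->page_size ? n : a->page_size;
        arena_page_t *p = malloc(sizeof(*p) + size);
        if (!p) return NULL;
        p->next = a->head;
        p->size = size;
        p->used = 0;
        a->head = p;
    }
    void *ptr = a->head->data + a->head->used;
    a->head->used += n;
    return ptr;
}

/* strdup built on memcpy: the length is computed once and reused. */
char *arena_strdup(arena_t *a, const char *s) {
    size_t len = strlen(s) + 1;
    char *d = arena_alloc(a, len);
    if (d) memcpy(d, s, len);
    return d;
}

/* The only way to release memory: drop every page at once. */
void arena_destroy(arena_t *a) {
    while (a->head) {
        arena_page_t *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```

A query would create one arena, allocate all of its temporary objects from it, and call `arena_destroy()` once when the response has been sent. The appeal of the design is that there is no per-object free and no shared allocator state touched on every allocation.
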
Stelios Fragkakis
81b3d4b71e
Add a timeout parameter to data queries ()
* Add timeout parameter in queries and in calling functions

* Add CANCEL flag in RRDR and code to cancel a query

* Update swagger

* Format swagger file properly
2022-04-11 22:34:04 +03:00
Vladimir Kobal
d8b7b6a25f
Fix compilation warnings on macOS () 2022-02-21 12:23:48 +02:00
Tina Luedtke
c7f2647a62
Docs: Removed Google Analytics tags () 2022-02-17 10:37:46 +00:00
Stelios Fragkakis
15dd0e4b5a
Fix so that allow_past correctly works in memory mode ram and save () 2022-02-17 09:12:58 +02:00
Stelios Fragkakis
454387fcf4
Cleanup compilation warnings ()
* Fix compilation warnings (variables used when debugging is enabled using NETDATA_INTERNAL_CHECKS)
* Fix compilation warning (casting)
2021-11-19 22:12:29 +02:00
vkalintiris
9ed4cea590
Anomaly Detection MVP ()
* Add support for feature extraction and K-Means clustering.

This patch adds support for performing feature extraction and running the
K-Means clustering algorithm on the extracted features.

We use the open-source dlib library to compute the K-Means clustering
centers, which has been added as a new git submodule.

The build system has been updated to recognize two new options:

    1) --enable-ml: build an agent with ml functionality, and
    2) --enable-ml-tests: support running tests with the `-W mltest`
       option in netdata.

The second flag is meant only for internal use. To build tests successfully,
you need to install the GoogleTest framework on your machine.

* Boilerplate code to track hosts/dims and init ML config options.

A new opaque pointer field is added to the database's host and dimension
data structures. The fields point to C++ wrapper classes that will be used
to store ML-related information in follow-up patches.

The ML functionality needs to iterate all tracked dimensions twice per
second. To avoid locking the entire DB multiple times, we use a
separate dictionary to add/remove dimensions as they are created/deleted
by the database.

A global configuration object is initialized during the startup of the
agent. It will allow our users to specify ML-related configuration
options, eg. hosts/charts to skip from training, etc.

* Add support for training and prediction of dimensions.

Every new host spawns a training thread which is used to train the model
of each dimension.

Training of dimensions is done in a non-batching mode in order to avoid
impacting the generated ML model by the CPU, RAM and disk utilization of
the training code itself.

For performance reasons, prediction is done at the time a new value
is pushed in the database. The alternative option, ie. maintaining a
separate thread for prediction, would be ~3-4 times slower and would
increase locking contention considerably.

For similar reasons, we use a custom function to unpack storage_numbers
into doubles, instead of long doubles.

* Add data structures required by the anomaly detector.

This patch adds two data structures that will be used by the anomaly
detector in follow-up patches.

The first data structure is a circular bit buffer which is being used to
count the number of set bits over time.

The second data structure represents an expandable, rolling window that
tracks set/unset bits. It is explicitly modeled as a finite-state
machine in order to make the anomaly detector's behaviour easier to test
and reason about.
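
The first of these data structures, a circular bit buffer that keeps a count of set bits, could look roughly like the sketch below. It is written in C for illustration only; the patch itself uses C++ classes, and the names `bit_ring_t`, `bit_ring_init`, `bit_ring_push` and the fixed `BIT_RING_MAX` capacity are assumptions, not the patch's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT_RING_MAX 64  /* hypothetical upper bound, chosen for the sketch */

typedef struct bit_ring {
    bool bits[BIT_RING_MAX];
    size_t capacity;   /* window length in use, <= BIT_RING_MAX */
    size_t count;      /* number of bits currently stored */
    size_t head;       /* slot that the next push will write */
    size_t set_bits;   /* running count of set bits in the window */
} bit_ring_t;

void bit_ring_init(bit_ring_t *r, size_t capacity) {
    assert(capacity > 0 && capacity <= BIT_RING_MAX);
    r->capacity = capacity;
    r->count = 0;
    r->head = 0;
    r->set_bits = 0;
}

/* Push one bit, evicting the oldest bit when the window is full.
   Returns the number of set bits now inside the window. */
size_t bit_ring_push(bit_ring_t *r, bool bit) {
    if (r->count == r->capacity) {
        /* When full, the slot at head holds the oldest bit. */
        if (r->bits[r->head]) r->set_bits--;
    } else {
        r->count++;
    }
    r->bits[r->head] = bit;
    if (bit) r->set_bits++;
    r->head = (r->head + 1) % r->capacity;
    return r->set_bits;
}
```

Keeping the running count means the number of set bits is available in constant time after every push, without rescanning the window.
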

* Add anomaly detection thread.

This patch creates a new anomaly detection thread per host. Each thread
maintains a BitRateWindow which is updated every second based on the
anomaly status of the corresponding host.

Based on the updated status of the anomaly window, we can identify the
existence/absence of an anomaly event, its start/end time and the
dimensions that participate in it.

* Create/insert/query anomaly events from Sqlite DB.

* Create anomaly event endpoints.

This patch adds two endpoints to expose information about anomaly
events. The first endpoint returns the list of anomalous events within a
specified time range. The second endpoint provides detailed information
about a single anomaly event, ie. the list of anomalous dimensions in
that event along with their anomaly rate.

The `anomaly-bit` option has been added to the `/data` endpoint in order
to allow users to get the anomaly status of individual dimensions per
second.

* Fix build failures on Ubuntu 16.04 & CentOS 7.

These distros do not have toolchains with C++11 enabled by default.
Replacing nullptr with NULL should fix the build problems on these
platforms when the ML feature is not enabled.

* Fix `make dist` to include ML makefiles and dlib sources.

Currently, we add ml/kmeans/dlib to EXTRA_DIST. We might want to
generate an explicit list of source files in the future, in order to
bring down the generated archive's file size.

* Small changes to make the LGTM & Codacy bots happy.

- Cast unused result of function calls to void.
- Pass a const-ref string to Database's constructor.
- Reduce the scope of a local variable in the anomaly detector.

* Add user configuration option to enable/disable anomaly detection.

* Do not log dimension-specific operations.

Training and prediction operations happen every second for each
dimension. To make it easier to run this PR's anomaly detection on
many charts & dimensions, I've removed logs that would cause log
flooding.

* Reset dimensions' bit counter when not above anomaly rate threshold.

* Update the default config options with real values.

With this patch the default configuration options will match the ones
we want our users to use by default.

* Update conditions for creating new ML dimensions.

1. Skip dimensions with update_every != 1,
2. Skip dimensions that come from the ML charts.

With this filtering in place, any configuration value for the
relevant simple_pattern expressions will work correctly.

* Teach buildinfo{,json} about the ML feature.

* Set --enable-ml by default in the configuration options.

This patch is only meant for testing the building of the ML functionality
on Github. It will be reverted once tests pass successfully.

* Minor build system fixes.

- Add path to json header
- Enable C++ linker when ML functionality is enabled
- Rename ml/ml-dummy.cc to ml/ml-dummy.c

* Revert "Set --enable-ml by default in the configuration options."

This reverts commit 28206952a59a577675c86194f2590ec63b60506c.

We pass all Github checks when building the ML functionality, except for
those that run on CentOS 7 due to not having a C++11 toolchain.

* Check for missing dlib and nlohmann files.

We simply check the single-source files upon which our build system
depends. If they are missing, an error message notifies the user
about missing git submodules which are required for the ML
functionality.

* Allow users to specify the maximum number of KMeans iterations.

* Use dlib v19.10

v19.22 broke compatibility with CentOS 7's g++. Development of the
anomaly detection used v19.10, which is the version used by most Debian and
Ubuntu distribution versions that are not past EOL.

No observable performance improvements/regressions specific to the K-Means
algorithm occur between the two versions.

* Detect and use the -std=c++11 flag when building anomaly detection.

This patch automatically adds the -std=c++11 when building netdata
with the ML functionality, if it's supported by the user's toolchain.

With this change we are able to build the agent correctly on CentOS 7.

* Restructure configuration options.

- update default values,
- clamp values to min/max defaults,
- validate and identify conflicting values.

* Add update_every configuration option.

Considering that the MVP does not support per host configuration
options, the update_every option will be used to filter hosts to train.

With this change anomaly detection will be supported on:

    - Single nodes with update_every != 1, and
    - Children nodes with a common update_every value that might differ from
      the value of the parent node.

* Reorganize anomaly detection charts.

This follows Andrew's suggestion to have four charts to show the number
of anomalous/normal dimensions, the anomaly rate, the detector's window
length, and the events that occur in the prediction step.

Context and family values, along with the necessary information in the
dashboard_info.js file, will be updated in a follow-up commit.

* Do not dump anomaly event info in logs.

* Automatically handle low "train every secs" configuration values.

If a user specifies a very low value for the "train every secs", then
it is possible that the time it takes to train a dimension is higher
than its allotted time.

In that case, we want the training thread to:

    - Reduce its CPU usage per second, and
    - Allow the prediction thread to proceed.

We achieve this by limiting the training time of a single dimension to
be equal to half the time allotted to it. This means that the training
thread will never consume more than 50% of a single core.

* Automatically detect if ML functionality should be enabled.

With these changes, we enable ML if:

    - The user has not explicitly specified --disable-ml, and
    - Git submodules have been checked out properly, and
    - The toolchain supports C++11.

If the user has explicitly specified --enable-ml, the build fails if
git submodules are missing, or the toolchain does not support C++11.
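
The rules above form a small truth table. The real check lives in the build system, so the C function below is only a sketch of the decision logic; `ml_request_t`, `ml_decision_t` and `ml_decide` are hypothetical names, and the sketch covers only the three conditions listed in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* What the user asked for on the configure command line. */
typedef enum { ML_REQ_AUTO, ML_REQ_ENABLE, ML_REQ_DISABLE } ml_request_t;

/* What the build system should do as a result. */
typedef enum { ML_BUILD_OFF, ML_BUILD_ON, ML_BUILD_ERROR } ml_decision_t;

ml_decision_t ml_decide(ml_request_t req, bool have_submodules, bool have_cxx11) {
    bool can_build = have_submodules && have_cxx11;

    switch (req) {
        case ML_REQ_DISABLE:
            return ML_BUILD_OFF;                               /* --disable-ml always wins */
        case ML_REQ_ENABLE:
            return can_build ? ML_BUILD_ON : ML_BUILD_ERROR;   /* explicit request: fail loudly */
        case ML_REQ_AUTO:
            return can_build ? ML_BUILD_ON : ML_BUILD_OFF;     /* no flag: degrade silently */
    }

    assert(!"unknown ml_request_t value");
    return ML_BUILD_OFF;
}
```

The only difference between the automatic and the explicit case is what happens when a prerequisite is missing: the first turns ML off, the second stops the build.
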

* Disable anomaly detection by default.

* Do not update charts in locked region.

* Cleanup code reading configuration options.

* Enable C++ linker when building ML.

* Disable ML functionality for CMake builds.

* Skip LGTM for dlib and nlohmann libraries.

* Do not build ML if libuuid is missing.

* Fix dlib path in LGTM's yaml config file.

* Add chart to track duration of prediction step.

* Add chart to track duration of training step.

* Limit the number of dimensions in an anomaly event.

This will ensure our JSON results won't grow without any limit. The
default ML configuration options train approximately 1700 dimensions
in a newly-installed Netdata agent. The hard-limit is set to 2000
dimensions which:

    - Is well above the default number of dimensions we train,
    - If it is ever reached it means that the user has accidentally set a
      very low anomaly rate threshold, and
    - Considering that we sort the result by anomaly score, the cutoff
      dimensions will be the less anomalous, ie. the least important to
      investigate.
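
The cutoff described above amounts to "sort by anomaly score, keep the first N". A minimal sketch in C, assuming a hypothetical `dim_score_t` record and a hypothetical `keep_most_anomalous()` helper (neither is taken from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: one dimension and its anomaly rate in the event. */
typedef struct {
    const char *name;
    double anomaly_rate;
} dim_score_t;

/* qsort comparator: highest anomaly rate first. */
static int by_rate_desc(const void *a, const void *b) {
    double ra = ((const dim_score_t *)a)->anomaly_rate;
    double rb = ((const dim_score_t *)b)->anomaly_rate;
    return (ra < rb) - (ra > rb);
}

/* Sorts dims in place and returns how many entries to report.
   Entries past the returned count are the least anomalous ones. */
size_t keep_most_anomalous(dim_score_t *dims, size_t n, size_t limit) {
    assert(dims != NULL || n == 0);
    if (n > 1)
        qsort(dims, n, sizeof(*dims), by_rate_desc);
    return n < limit ? n : limit;
}
```

With the hard limit from this message, a caller would pass `limit = 2000` and serialize only the first `keep_most_anomalous(...)` entries.
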

* Add information about the ML charts.

* Update family value in ML charts.

This fix will allow us to show the individual charts in the RHS Anomaly
Detection submenu.

* Rename chart type

s/anomalydetection/anomaly_detection/g

* Expose ML feat in /info endpoint.

* Export ML config through /info endpoint.

* Fix CentOS 7 build.

* Reduce the critical region of a host's lock.

Before this change, each host had a single, dedicated lock to protect
its map of dimensions from adding/deleting new dimensions while training
and detecting anomalies. This was problematic because training of a
single dimension can take several seconds in nodes that are under heavy
load.

After this change, the host's lock protects only the insertion/deletion
of new dimensions, and the prediction step. For the training of dimensions
we use a dedicated lock per dimension, which is responsible for protecting
the dimension from deletion while training.

Prediction is fast enough, even on slow machines or under heavy load,
which allows us to use the host's main lock and avoid increasing the
complexity of our implementation in the anomaly detector.

* Improve the way we are tracking anomaly detector's performance.

This change allows us to:

    - track the total training time per update_every period,
    - track the maximum training time of a single dimension per
      update_every period, and
    - export the current number of total, anomalous, normal dimensions
      to the /info endpoint.

Also, now that we use dedicated locks per dimensions, we can train under
heavy load continuously without having to sleep in order to yield the
training thread and allow the prediction thread to progress.

* Use samples instead of seconds in ML configuration.

This commit changes the way we are handling input ML configuration
options from the user. Instead of treating values as seconds, we
interpret all inputs as number of update_every periods. This allows
us to enable anomaly detection on hosts that have update_every != 1
second, and still produce a model for training/prediction & detection
that behaves in an expected way.

Tested by running anomaly detection on an agent with update_every = [1,
2, 4] seconds.
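
To make the effect concrete: a value configured in samples covers a wall-clock span that scales with update_every. The helper below is a sketch only; `ml_samples_to_seconds` is a hypothetical name and the sample count in the usage note is an illustrative number, not a default taken from this patch.

```c
#include <assert.h>

/* Wall-clock span covered by a window that is configured in samples.
   Hypothetical helper, not part of the patch. */
unsigned long ml_samples_to_seconds(unsigned long samples, unsigned long update_every) {
    assert(update_every > 0);
    return samples * update_every;
}
```

For example, a window of 3600 samples spans 3600 seconds at update_every = 1, 7200 seconds at update_every = 2, and 14400 seconds at update_every = 4, while the model always sees the same number of data points.
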

* Remove unnecessary log message in detection thread

* Move ML configuration to global section.

* Update web/gui/dashboard_info.js

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>

* Fix typo

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>

* Rebase.

* Use negative logic for anomaly bit.

* Add info for prediction_stats and training_stats charts.

* Disable ML on PPC64EL.

The CI test fails with -std=c++11 and requires -std=gnu++11 instead.
However, it's not easy to quickly append the required flag to CXXFLAGS.
For the time being, simply disable ML on PPC64EL and if any users
require this functionality we can fix it in the future.

* Add comment on why we disable ML on PPC64EL.

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
2021-10-27 09:26:21 +03:00
vkalintiris
b8cd2bdc50
Remove unnecessary relative paths when including headers. ()
Currently, we add the repository's top-level dir in the compiler's
header search path. This means that code in every top-level directory
within the repo can include headers from sibling top-level directories.

This patch makes header inclusion consistent when it comes to files
that are included from sibling top-level directories within the repo.
2021-05-24 17:44:50 +03:00
Josh Soref
b9fff8a3aa
Spelling web api server () 2021-04-14 13:39:23 +03:00
Stelios Fragkakis
0f78020f04
Add lock check to avoid shutdown when compiled with internal and locking checks () 2021-03-23 14:41:53 +02:00
Stelios Fragkakis
65bc43d9cb
Add data query support for archived charts () 2021-03-22 09:47:22 +02:00
Tomáš Kopal
757e418090
Rename abs to ABS to avoid clash with standard definitions. Fixes . () 2021-03-17 12:18:33 +02:00
Stelios Fragkakis
b76a297de1
Fix the context filtering on the data query endpoint () 2021-02-17 21:13:38 +02:00
Stelios Fragkakis
f7b3283956
Fix memory allocation when computing standard deviation () 2021-01-13 12:53:40 +02:00
Joel Hans
1f321de3cb
Fixes for SEO changes () 2021-01-08 14:54:36 -07:00
Joel Hans
46a8075c8f
Docs housekeeping for SEO and syntax, part 1 ()
* First pass to get the script working right

* Finish adding analytics tags
2021-01-07 11:44:43 -07:00
Markos Fountoulakis
5ffba490e3
Fix race condition in rrdset_first_entry_t() and rrdset_last_entry_t() () 2020-11-28 15:53:12 +02:00
Stelios Fragkakis
f3d00f79ab
Added new data query option "allow_past" () 2020-10-22 21:43:20 +03:00
Stelios Fragkakis
8067642361
Improved the data query when using the context parameter () 2020-09-24 13:05:15 +03:00
Stelios Fragkakis
8f6f1baf9a
Added context parameter to the data endpoint ()
Added functionality to support composite charts
2020-09-15 19:41:39 +03:00
Markos Fountoulakis
3e0661a2af
Fix buffer overflow in rrdr structure when metric timestamps are out of order. () 2020-09-09 17:03:25 +03:00
Joel Hans
e99692f145
Docs: Standardize links between documentation ()
* Trying out some absolute-ish links

* Try one out on installer

* Testing logic

* Trying out some more links

* Fixing links

* Fix links in python collectors

* Changed a bunch more links

* Fix build errors

* Another push of links

* Fix build error and add more links

* Complete first pass

* Fix final broken links

* Fix links to files

* Fix for Netlify

* Two more fixes
2020-04-14 10:26:13 -07:00
Joel Hans
9342704a41
Bulk add frontmatter to all documentation ()
* Bulk add frontmatter

* A few extra edge cases
2020-03-10 14:29:51 -07:00
Konstantinos Natsakis
675383b26a
Makefile.am files indentation ()
* Use 4 spaces for indentation of non-recipe lines in Makefile.am files

* Be more consistent in the use of space before = in Makefile.am files
2019-11-11 01:30:00 +02:00
thiagoftsm
ad8b796621
Clang warnings ()
* clang_warnings: Fix unnecessary comparison

Netdata was verifying whether a pointer that will never be NULL could be NULL.
This commit removes this comparison.

* clang_warnings: Fix unnecessary comparison

Netdata was doing another unnecessary comparison in another file

* clang_warnings: Unnecessary parentheses

This commit removes the excess parentheses in a file

* clang_warnings: Remove unnecessary initialization

Remove from the json file an initial assignment that is overwritten a few lines later

* clang_warnings: Comments

Fix comments on top of the function

* clang_warnings: Missing Cast

Volatile variable generates warnings with Clang sometimes, so it
was necessary to cast variables

* clang_warnings: Return from previous

Considering the possible problems caused by the solution,
I am reverting to the previous state
2019-10-15 20:04:09 +00:00
Markos Fountoulakis
65727b6a30
Restore original alignment behaviour of RRDR ()
* Restore original alignment behaviour of RRDR
2019-09-25 00:11:05 +03:00
Markos Fountoulakis
2a06960117
Variable Granularity support for data collection ()
* Variable Granularity support for data collection in the dbengine.

* Variable Granularity support for data collection in the daemon.

* Added tests to validate the data being queried after having been collected by changing data collection interval

* Fix memory corruption

* Updated database engine documentation about data collection frequency behaviour
2019-08-28 17:33:51 +03:00
Valentin Rakush
92642269f1
Add alarm variables to the response of chart and data ()
##### Summary
Implements feature  

Now requests like 

http://localhost:19999/api/v1/chart?chart=example.random
http://localhost:19999/api/v1/data?chart=example.random&options=jsonwrap&options=showcustomvars

- return chart variables in their responses. Chart variables include only those with options set to RRDVAR_OPTION_CUSTOM_CHART_VAR
- for /api/v1/data requests chart variables are returned when parameter options=jsonwrap and options=showcustomvars

##### Component Name
[/database](https://github.com/netdata/netdata/tree/master/database/)
[/web/api/formatters](https://github.com/netdata/netdata/tree/master/web/api/formatters)
2019-08-20 11:11:43 +02:00
Promise Akpan
f5006d51e8
Fix Markdown Lint warnings ()
* make remark access all directories

* detailed fix after autofix by remark lint

* cross check autofix for this set of files

* crosscheck more files

* crosschecking and small fixes

* crosscheck autofixed md files
2019-08-15 13:06:39 +02:00
Joel Hans
a726c905bd
Change "netdata" to "Netdata" in all docs ()
* First pass of changing netdata to Netdata

* Second pass of netdata -> Netdata

* Starting work on netdata with no whitespace after

* Pass for netdata with no whitespace at the end

* Pass for netdata with no whitespace at the front
2019-08-13 08:07:17 -07:00
Markos Fountoulakis
6ca6d840dd
Database engine ()
* Database engine prototype version 0

* Database engine initial integration with netdata POC

* Scalable database engine with file and memory management.

* Database engine integration with netdata

* Added MIN MAX definitions to fix alpine build of travis CI

* Bugfix for backends and new DB engine, remove useless rrdset_time2slot() calls and erroneous checks

* DB engine disk protocol correction

* Moved DB engine storage file location to /var/cache/netdata/{host}/dbengine

* Fix configure to require openSSL for DB engine

* Fix netdata daemon health not holding read lock when iterating chart dimensions

* Optimized query API for new DB engine and old netdata DB fallback code-path

* netdata database internal query API improvements and cleanup

* Bugfix for DB engine queries returning empty values

* Added netdata internal check for data queries for old and new DB

* Added statistics to DB engine and fixed memory corruption bug

* Added preliminary charts for DB engine statistics

* Changed DB engine ratio statistics to incremental

* Added netdata statistics charts for DB engine internal statistics

* Fix for netdata not compiling successfully when missing dbengine dependencies

* Added DB engine functional test to netdata unittest command parameter

* Implemented DB engine dataset generator based on example.random chart

* Fix build error in CI

* Support older versions of libuv1

* Fixes segmentation fault when using multiple DB engine instances concurrently

* Fix memory corruption bug

* Fixed createdataset advanced option not exiting

* Fix for DB engine not working on FreeBSD

* Support FreeBSD library paths of new dependencies

* Workaround for unsupported O_DIRECT in OS X

* Fix unittest crashing during cleanup

* Disable DB engine FS caching in Apple OS X since O_DIRECT is not available

* Fix segfault when unittest and DB engine dataset generator don't have permissions to create temporary host

* Modified DB engine dataset generator to create multiple files

* Toned down overzealous page cache prefetcher

* Reduce internal memory fragmentation for page-cache data pages

* Added documentation describing the DB engine

* Documentation bugfixes

* Fixed unit tests compilation errors since last rebase

* Added note to back-up the DB engine files in documentation

* Added codacy fix.

* Support old gcc versions for atomic counters in DB engine
2019-05-15 08:28:06 +03:00
Costa Tsaousis
44955f720e
fix incorrect use of isnormal() () 2019-03-21 19:14:14 +02:00
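
The commit title above does not say what the incorrect use was. As general background, a common pitfall with `isnormal()` is treating it as a validity check: it is false for NaN and infinities, but also for zero and subnormal values. The sketch below contrasts it with `isfinite()`; both helper names are hypothetical and this is not a reconstruction of the actual fix.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Pitfall: isnormal() is false for 0.0 and for subnormal values,
   so this check silently drops legitimate zero samples. */
bool naive_value_is_valid(double v) {
    return isnormal(v);
}

/* isfinite() rejects only NaN and +/- infinity. */
bool value_is_valid(double v) {
    return isfinite(v);
}
```

A metric that is legitimately zero (an idle counter, for instance) passes `value_is_valid()` but fails `naive_value_is_valid()`.
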
Chris Akritidis
415f57c5bf
Ga ()
* Added GA tags to markdowns

* Add GA tags to mds
2018-12-07 11:30:04 +01:00
Chris Akritidis
36b8bacf49
Add info from PR 208 ()
Fixes 
2018-11-30 21:09:02 +02:00
Costa Tsaousis
ba75ffc056
bugfix: query engine resampling duration () 2018-11-27 15:14:43 +02:00
Costa Tsaousis
fc1544c4d7
fixed wrong annotations given to google charts ()
* fixed wrong annotations given to google charts

* added default rrdr dimension flag
2018-10-31 22:45:02 +02:00
Costa Tsaousis
798c141c49
Split the API formatters in modules ()
* split all API formatters in modules

* added markdown formatting

* updated csv readme

* updated csv readme

* more documentation

* added more documentation

* updated documentation

* fixed typo

* fixed typo
2018-10-27 19:44:27 +03:00
Costa Tsaousis
bcdfedbe82
fixed rpm build; () 2018-10-26 20:34:48 +03:00
Costa Tsaousis
cdf57a00e1
fix query min-max, again... () 2018-10-26 02:53:22 +03:00
Costa Tsaousis
652654c1ac
restored min-max calculation of RRDR () 2018-10-25 16:32:12 +03:00
Costa Tsaousis
f8001c4a7c
updated queries README () 2018-10-25 04:44:46 +03:00
Costa Tsaousis
c8ae8d380e
updated queries README () 2018-10-25 04:33:38 +03:00
Costa Tsaousis
5d1feb195a
updated queries README () 2018-10-25 04:08:35 +03:00
Costa Tsaousis
93467af78c
query engine documentation and stats () 2018-10-25 03:34:21 +03:00