mirror of https://github.com/netdata/netdata.git synced 2025-05-21 00:09:54 +00:00
80 commits

Author SHA1 Message Date
Stelios Fragkakis
60ee48f7bb
Take into account the in queue wait time when executing a data query ()
Take into account the in queue wait time when executing a query with a timeout
2022-05-12 22:39:18 +03:00
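
As a rough illustration of the idea in the commit above, the sketch below subtracts the time a request spent waiting in a queue from its timeout budget before the query runs. The function name `remaining_timeout_ms`, its millisecond units and its return conventions are assumptions made for this example; they are not taken from Netdata's code.

```python
import time


def remaining_timeout_ms(timeout_ms, queued_at, now=None):
    """Return the timeout budget left after subtracting the queue wait.

    timeout_ms: total timeout requested by the caller (0 means no timeout).
    queued_at:  monotonic timestamp (seconds) when the request was queued.
    now:        monotonic timestamp (seconds) when execution starts.
    Returns None when no timeout was requested, and 0.0 when the budget is
    already exhausted (a caller could then skip or cancel the query).
    """
    if timeout_ms <= 0:
        return None  # hypothetical convention: no timeout requested
    if now is None:
        now = time.monotonic()
    waited_ms = max(0.0, (now - queued_at) * 1000.0)
    return max(0.0, timeout_ms - waited_ms)
```

A query queued for 250 ms with a 1000 ms timeout would be left with 750 ms of execution time under this scheme.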
Emmanuel Vasilakis
4078661e33
Metric correlations ()
* initial attempt at metric correlations

* fix loop

* simplify struct

* change json

* get points from query

* comment

* dont lock the host as much

* add a configuration option to enable/disable metric correlations

* remove KSfbar from header file

* lock charts

* add timeout

* cast multiplication

* add licencing info

* better licencing

* use onewayalloc

* destroy owa
2022-05-04 13:59:58 +03:00
Costa Tsaousis
87c0cc2d60
One way allocator to double the speed of parallel context queries ()
* one way allocator to speed up context queries

* fixed a bug while expanding memory pages

* reworked for clarity and finally fixed the bug of allocating memory beyond the page size

* further optimize allocation step to minimize the number of allocations made

* implement strdup with memcpy instead of strcpy

* added documentation

* prevent an uninitialized use of owa

* added callocz() interface

* integrate onewayalloc everywhere - apart from sql queries

* one way allocator is now used in context queries using archived charts in sql

* align on the size of pointers

* forgotten freez()

* removed not needed memcpys

* give unique names to global variables to avoid conflicts with system definitions
2022-05-03 00:31:19 +03:00
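
The general technique behind a one-way (arena) allocator can be sketched as follows: memory is handed out from large pages by advancing an offset, requests are aligned to the pointer size, and everything is released in a single step. This is a toy model in Python, not Netdata's C implementation; the class name, the 8-byte alignment and the page size are all assumptions chosen for illustration.

```python
class OneWayAllocator:
    """Toy arena: hands out slices of large pages, frees everything at once."""

    ALIGN = 8  # assumed pointer size in bytes

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.pages = []      # list of bytearray pages
        self.offset = 0      # next free byte in the last page

    def alloc(self, size):
        """Return a writable memoryview of `size` bytes."""
        if size <= 0:
            raise ValueError("size must be positive")
        # Round the request up to a multiple of ALIGN.
        aligned = (size + self.ALIGN - 1) // self.ALIGN * self.ALIGN
        if not self.pages or self.offset + aligned > len(self.pages[-1]):
            # Start a new page, large enough for oversized requests.
            self.pages.append(bytearray(max(self.page_size, aligned)))
            self.offset = 0
        start = self.offset
        self.offset += aligned
        return memoryview(self.pages[-1])[start:start + size]

    def strdup(self, text):
        """Copy an encoded string (plus NUL terminator) into the arena."""
        data = text.encode("utf-8") + b"\0"
        view = self.alloc(len(data))
        view[:] = data
        return view

    def destroy(self):
        """Release every page in one step; individual frees do not exist."""
        self.pages.clear()
        self.offset = 0
```

Because there is no per-allocation free, each query can create its own arena, allocate freely while it runs, and call `destroy()` once at the end.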
Stelios Fragkakis
fedab0d7fa
Add a chart label filter parameter in context data queries ()
* Add function to filter chart labels

* Add new parameter to filter chart labels on context queries

* Change swagger

* Better formatting for swagger
2022-04-12 08:18:40 +03:00
Stelios Fragkakis
81b3d4b71e
Add a timeout parameter to data queries ()
* Add timeout parameter in queries and in calling functions

* Add CANCEL flag in RRDR and code to cancel a query

* Update swagger

* Format swagger file properly
2022-04-11 22:34:04 +03:00
Emmanuel Vasilakis
2d293e2f87
Don't do fatal on error writing the health api management key. ()
* dont fatal on error writing the management key

* make message more descriptive
2022-04-07 19:53:38 +03:00
Ilya Mashchenko
c98fcf5754
feat: add support for cloud providers info to /api/v1/info ()
2022-04-06 13:42:41 +03:00
Erdem Ergen
382e021d47
[Agent crash on api/v1/info call] - fixes ()
* [Agent crash on api/v1/info call] - fixes 
returns boolean values
updated swagger files
2022-03-31 00:02:41 +03:00
odynik
57d6c179c8
Stream compression - Deactivate compression at runtime in case of a compressor buffer overflow ()
* [Stream compression] Downgrade stream version if compressor buffer overflows
* [Stream compression] More user friendly compression messages
* [Stream compression] Fix mutex starvation
* [Stream compression] enable compression by default
* Update streaming/README.md

Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
2022-03-24 12:14:19 +02:00
Timotej S
f0afc344fa
adds node_id into mirrored_hosts list ()
2022-03-03 13:56:53 +01:00
vkalintiris
69ea17d6ec
Track anomaly rates with DBEngine. ()
* Track anomaly rates with DBEngine.

This commit adds support for tracking anomaly rates with DBEngine. We
do so by creating a single chart with id "anomaly_detection.anomaly_rates" for
each trainable/predictable host, which is responsible for tracking the anomaly
rate of each dimension that we train/predict for that host.

The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly
rates charts. We use this flag to:

    - Disable exposing the anomaly rates charts through the functionality
      in backends/, exporting/ and streaming/.
    - Skip generation of configuration options for the name, algorithm,
      multiplier, divisor of each dimension in an anomaly rates chart.
    - Skip the creation of health variables for anomaly rates dimensions.
    - Skip the chart/dim queue of ACLK.
    - Post-process the RRDR result of an anomaly rates chart, so that we can
      return a sorted, trimmed number of anomalous dimensions.

In a child/parent configuration where both the child and the parent run
ML for the child, we want to be able to stream the rest of the ML-related
charts to the parent. To be able to do this without any chart name collisions,
the charts are now created on localhost and their IDs and titles have the node's
machine_guid and hostname as a suffix, respectively.

* Fix exporting_engine tests.

* Restore default ML configuration.

The reverted changes were meant for local testing only. This commit
restores the default values that we want to have when someone runs
anomaly detection on their node.

* Set context for anomaly_detection.* charts.

* Check for anomaly rates chart only with a valid pointer.

* Remove duplicate code.

* Use a more descriptive name for id/title pair variable
2022-02-24 10:57:30 +02:00
avstrakhov
b003e5fd40
Add code for LZ4 streaming data compression ()
* Add code for LZ4 streaming data compression

* Fix LGTM alert

* Add lz4 library for link when compression enabled

* Add LZ4_resetStream_fast presence detection

* Disable compression for older LZ4 libraries

* Correct LZ4 API check

* [Testing Stream Compression] Debug msgs and report.md

* Add LZ4 library version using LZ4_initStream

* Fixed bug in SSL mode

* [Testing compression] - Add compression info messages

* Set compression enabled by default, update doc

* Update streaming/README.md

Co-authored-by: DShreve2 <david@netdata.cloud>

* [Agent Negotiation] Compression as separate capability

* [Agent Negotiation] Compression as separate capability - default compression variable always active

* Add code to negotiate compression

* [Agent Negotiation] Based on stream version

* [Agent Negotiation] Version based - fix compilation error

* [Agent Negotiation] Fix glob var default_compression_enbaled=0 affects all the connections - Handle compression - stream version based

* [Agent Negotiation - Compression] - Add control flag in 1. sender/receiver state & 2. stream.conf per child

* [Agent Negotiation - Compression] Fix stream.conf key, mguid control

* [Agent Negotiate Compression] Fine control on stream.conf per key,mguid for each child

* [Agent Negotiation Compression] Stop destroying compressor for runtime configuration + Update Readme.md

* [Agent Negotiation Compression] Use stream_version 4 if compression is disabled

* Correct child's compression check

* [Agent Negotiation Compression] Create streaming compression section in docs.

* [Agent Negotiation Compression] Remove redundant debug msgs

* [Stream Compression] - integrate compression build info & config info in api/v1/info endpoint.

* [Agent Negotiation] Finalize README.md

* [Agent Stream Compression] Fix buildinfo json, Finalize readme.md

* [Agent Stream Compression] Negotiate compression based on stream version

* [Agent Stream Compression] Stream compression control per child in stream.conf |  per AP_KEY, MACHINE_GUID

* [Agent Stream Compression] Avoid destroying compressor enabling runtime configuration + Update Readme.md

* [Agent Stream Compression] - Provide stream compression build info & config info in api/v1/info endpoint + Update Readme.md

* [Agent Stream Compression] Fix rebase conflicts

* [Agent Stream Compression] Fix more rebase conflicts

* [Agent Stream Compression] 1. Stream version based negotiation 2. per child stream.conf control 3. finalize docs 4. stream compression build info in web api

* [Agent Stream Compression] Change unsuccessful buffer check to error

* [Agent Stream Compression] Readme.md proof-read corrections, downgrade to stream_version_clabels, add shields for supported versions, EOF lint

* [Agent Stream Compression] Fix missed lz4 library on Alpine Linux

* Phrasal review

Co-authored-by: odynik <odynik.ee@gmail.com>
Co-authored-by: DShreve2 <david@netdata.cloud>
Co-authored-by: Tina Lüdtke <tina@kickoke.com>
2022-01-19 17:57:49 +02:00
Emmanuel Vasilakis
d229783de4
Send the cloud protocol used to posthog ()
* send analytics for cloud protocol used

* add aclk-available-protocol to api/v1/info

* fix build with --disable-cloud

* remove aclk_legacy and aclk_ng define checks
2022-01-13 12:22:15 +02:00
Vladimir Kobal
4919103c4b
Fix time_t format ()
2022-01-11 13:12:09 +02:00
Timotej S
5736b4bcb1
Removes ACLK Legacy ()
* remove legacy from makefiles
* remove ACLK Legacy from installer
* remove ACLK Legacy from configure.ac
* remove legacy from cmake
* aclk api cleanup
* remove legacy files from packaging
* changes for CI from Austin
2022-01-04 10:11:04 +01:00
vkalintiris
5049de8f7a
Provide runtime ml info from a new endpoint. ()
* Provide runtime ml info from a new endpoint.

* Add hosts & charts to skip from ML in the /info endpoint.

This information belongs in /info, and not in /ml-info, because the
value of these variables can not change at the agent's runtime.

* Use strdupz instead of strdup.
2021-12-22 17:13:45 +02:00
Emmanuel Vasilakis
95a2f16795
Only add comma in api/v1/info if ml-info is going to be printed ()
2021-11-02 10:15:22 +02:00
vkalintiris
9ed4cea590
Anomaly Detection MVP ()
* Add support for feature extraction and K-Means clustering.

This patch adds support for performing feature extraction and running the
K-Means clustering algorithm on the extracted features.

We use the open-source dlib library to compute the K-Means clustering
centers, which has been added as a new git submodule.

The build system has been updated to recognize two new options:

    1) --enable-ml: build an agent with ml functionality, and
    2) --enable-ml-tests: support running tests with the `-W mltest`
       option in netdata.

The second flag is meant only for internal use. To build tests successfully,
you need to install the GoogleTest framework on your machine.

* Boilerplate code to track hosts/dims and init ML config options.

A new opaque pointer field is added to the database's host and dimension
data structures. The fields point to C++ wrapper classes that will be used
to store ML-related information in follow-up patches.

The ML functionality needs to iterate all tracked dimensions twice per
second. To avoid locking the entire DB multiple times, we use a
separate dictionary to add/remove dimensions as they are created/deleted
by the database.

A global configuration object is initialized during the startup of the
agent. It will allow our users to specify ML-related configuration
options, eg. hosts/charts to skip from training, etc.

* Add support for training and prediction of dimensions.

Every new host spawns a training thread which is used to train the model
of each dimension.

Training of dimensions is done in a non-batching mode in order to avoid
impacting the generated ML model by the CPU, RAM and disk utilization of
the training code itself.

For performance reasons, prediction is done at the time a new value
is pushed in the database. The alternative option, ie. maintaining a
separate thread for prediction, would be ~3-4x slower and would
increase locking contention considerably.

For similar reasons, we use a custom function to unpack storage_numbers
into doubles, instead of long doubles.

* Add data structures required by the anomaly detector.

This patch adds two data structures that will be used by the anomaly
detector in follow-up patches.

The first data structure is a circular bit buffer which is being used to
count the number of set bits over time.

The second data structure represents an expandable, rolling window that
tracks set/unset bits. It is explicitly modeled as a finite-state
machine in order to make the anomaly detector's behaviour easier to test
and reason about.
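
The first data structure described above, a circular bit buffer that counts set bits over time, can be sketched roughly as follows. The class name `BitBuffer` and its methods are hypothetical and only illustrate the technique; the commit's actual C++ classes may look quite different.

```python
class BitBuffer:
    """Fixed-capacity circular buffer of bits with a running count of set bits."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.bits = [False] * capacity
        self.next = 0        # slot that the next insert overwrites
        self.size = 0        # number of bits stored so far (<= capacity)
        self.set_bits = 0    # running count of True bits in the buffer

    def insert(self, bit):
        """Add a bit, evicting the oldest one once the buffer is full."""
        bit = bool(bit)
        if self.size == len(self.bits):
            # Buffer is full: the slot at `next` holds the oldest bit.
            self.set_bits -= self.bits[self.next]
        else:
            self.size += 1
        self.bits[self.next] = bit
        self.set_bits += bit
        self.next = (self.next + 1) % len(self.bits)

    def rate(self):
        """Fraction of stored bits that are set (0.0 when empty)."""
        return self.set_bits / self.size if self.size else 0.0
```

Keeping a running count means each insert is constant-time, so the count over the last N samples never has to be recomputed by scanning the buffer.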

* Add anomaly detection thread.

This patch creates a new anomaly detection thread per host. Each thread
maintains a BitRateWindow which is updated every second based on the
anomaly status of the corresponding host.

Based on the updated status of the anomaly window, we can identify the
existence/absence of an anomaly event, its start/end time and the
dimensions that participate in it.

* Create/insert/query anomaly events from Sqlite DB.

* Create anomaly event endpoints.

This patch adds two endpoints to expose information about anomaly
events. The first endpoint returns the list of anomalous events within a
specified time range. The second endpoint provides detailed information
about a single anomaly event, ie. the list of anomalous dimensions in
that event along with their anomaly rate.

The `anomaly-bit` option has been added to the `/data` endpoint in order
to allow users to get the anomaly status of individual dimensions per
second.

* Fix build failures on Ubuntu 16.04 & CentOS 7.

These distros do not have toolchains with C++11 enabled by default.
Replacing nullptr with NULL should fix the build problems on these
platforms when the ML feature is not enabled.

* Fix `make dist` to include ML makefiles and dlib sources.

Currently, we add ml/kmeans/dlib to EXTRA_DIST. We might want to
generate an explicit list of source files in the future, in order to
bring down the generated archive's file size.

* Small changes to make the LGTM & Codacy bots happy.

- Cast unused result of function calls to void.
- Pass a const-ref string to Database's constructor.
- Reduce the scope of a local variable in the anomaly detector.

* Add user configuration option to enable/disable anomaly detection.

* Do not log dimension-specific operations.

Training and prediction operations happen every second for each
dimension. To make it easier to run anomaly detection with this PR on
many charts & dimensions, I've removed logs that would cause log
flooding.

* Reset dimensions' bit counter when not above anomaly rate threshold.

* Update the default config options with real values.

With this patch the default configuration options will match the ones
we want our users to use by default.

* Update conditions for creating new ML dimensions.

1. Skip dimensions with update_every != 1,
2. Skip dimensions that come from the ML charts.

With this filtering in place, any configuration value for the
relevant simple_pattern expressions will work correctly.

* Teach buildinfo{,json} about the ML feature.

* Set --enable-ml by default in the configuration options.

This patch is only meant for testing the building of the ML functionality
on Github. It will be reverted once tests pass successfully.

* Minor build system fixes.

- Add path to json header
- Enable C++ linker when ML functionality is enabled
- Rename ml/ml-dummy.cc to ml/ml-dummy.c

* Revert "Set --enable-ml by default in the configuration options."

This reverts commit 28206952a59a577675c86194f2590ec63b60506c.

We pass all Github checks when building the ML functionality, except for
those that run on CentOS 7 due to not having a C++11 toolchain.

* Check for missing dlib and nlohmann files.

We simply check the single-source files upon which our build system
depends. If they are missing, an error message notifies the user
about missing git submodules which are required for the ML
functionality.

* Allow users to specify the maximum number of KMeans iterations.

* Use dlib v19.10

v19.22 broke compatibility with CentOS 7's g++. Development of the
anomaly detection used v19.10, which is the version used by most Debian and
Ubuntu distribution versions that are not past EOL.

No observable performance improvements/regressions specific to the K-Means
algorithm occur between the two versions.

* Detect and use the -std=c++11 flag when building anomaly detection.

This patch automatically adds the -std=c++11 when building netdata
with the ML functionality, if it's supported by the user's toolchain.

With this change we are able to build the agent correctly on CentOS 7.

* Restructure configuration options.

- update default values,
- clamp values to min/max defaults,
- validate and identify conflicting values.

* Add update_every configuration option.

Considering that the MVP does not support per host configuration
options, the update_every option will be used to filter hosts to train.

With this change anomaly detection will be supported on:

    - Single nodes with update_every != 1, and
    - Children nodes with a common update_every value that might differ from
      the value of the parent node.

* Reorganize anomaly detection charts.

This follows Andrew's suggestion to have four charts to show the number
of anomalous/normal dimensions, the anomaly rate, the detector's window
length, and the events that occur in the prediction step.

Context and family values, along with the necessary information in the
dashboard_info.js file, will be updated in a follow-up commit.

* Do not dump anomaly event info in logs.

* Automatically handle low "train every secs" configuration values.

If a user specifies a very low value for the "train every secs", then
it is possible that the time it takes to train a dimension is higher
than its allotted time.

In that case, we want the training thread to:

    - Reduce its CPU usage per second, and
    - Allow the prediction thread to proceed.

We achieve this by limiting the training time of a single dimension to
be equal to half the time allotted to it. This means that the training
thread will never consume more than 50% of a single core.

* Automatically detect if ML functionality should be enabled.

With these changes, we enable ML if:

    - The user has not explicitly specified --disable-ml, and
    - Git submodules have been checked out properly, and
    - The toolchain supports C++11.

If the user has explicitly specified --enable-ml, the build fails if
git submodules are missing, or the toolchain does not support C++11.
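
The decision rules listed above can be summarized in a small sketch. The function `decide_ml` and its argument names are hypothetical; the real logic lives in the project's build system, and this only restates the three conditions and the explicit `--enable-ml` failure case.

```python
def decide_ml(user_choice, submodules_present, cxx11_supported):
    """Decide whether to build ML support.

    user_choice: "enable" (--enable-ml), "disable" (--disable-ml) or
                 None when neither flag was given.
    Returns True/False, or raises RuntimeError when ML was explicitly
    requested but a prerequisite is missing.
    """
    if user_choice == "disable":
        return False
    prerequisites_ok = submodules_present and cxx11_supported
    if user_choice == "enable" and not prerequisites_ok:
        raise RuntimeError("ML requested but git submodules or C++11 are missing")
    return prerequisites_ok
```

With no flag given, a missing prerequisite silently disables ML, while an explicit request turns the same situation into a build failure.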

* Disable anomaly detection by default.

* Do not update charts in locked region.

* Cleanup code reading configuration options.

* Enable C++ linker when building ML.

* Disable ML functionality for CMake builds.

* Skip LGTM for dlib and nlohmann libraries.

* Do not build ML if libuuid is missing.

* Fix dlib path in LGTM's yaml config file.

* Add chart to track duration of prediction step.

* Add chart to track duration of training step.

* Limit the number of dimensions in an anomaly event.

This will ensure our JSON results won't grow without any limit. The
default ML configuration options train approximately 1700 dimensions
in a newly-installed Netdata agent. The hard-limit is set to 2000
dimensions which:

    - Is well above the default number of dimensions we train,
    - If it is ever reached it means that the user has accidentally set a
      very low anomaly rate threshold, and
    - Considering that we sort the result by anomaly score, the cutoff
      dimensions will be the less anomalous, ie. the least important to
      investigate.
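
A minimal sketch of the sort-then-cut behaviour described above: dimensions are ranked by anomaly score and only the top entries are kept. The function name and the dictionary input shape are assumptions for this example; only the limit of 2000 comes from the commit message.

```python
MAX_DIMENSIONS = 2000  # hard limit mentioned in the commit message


def trim_anomalous_dimensions(scores, limit=MAX_DIMENSIONS):
    """Return (dimension, score) pairs, most anomalous first, capped at limit.

    scores: mapping of dimension id -> anomaly score (hypothetical shape).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Because the cut happens after sorting, any dimensions that are dropped are the least anomalous ones.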

* Add information about the ML charts.

* Update family value in ML charts.

This fix will allow us to show the individual charts in the RHS Anomaly
Detection submenu.

* Rename chart type

s/anomalydetection/anomaly_detection/g

* Expose ML feat in /info endpoint.

* Export ML config through /info endpoint.

* Fix CentOS 7 build.

* Reduce the critical region of a host's lock.

Before this change, each host had a single, dedicated lock to protect
its map of dimensions from adding/deleting new dimensions while training
and detecting anomalies. This was problematic because training of a
single dimension can take several seconds in nodes that are under heavy
load.

After this change, the host's lock protects only the insertion/deletion
of new dimensions, and the prediction step. For the training of dimensions
we use a dedicated lock per dimension, which is responsible for protecting
the dimension from deletion while training.

Prediction is fast enough, even on slow machines or under heavy load,
which allows us to use the host's main lock and avoid increasing the
complexity of our implementation in the anomaly detector.

* Improve the way we are tracking anomaly detector's performance.

This change allows us to:

    - track the total training time per update_every period,
    - track the maximum training time of a single dimension per
      update_every period, and
    - export the current number of total, anomalous, normal dimensions
      to the /info endpoint.

Also, now that we use dedicated locks per dimensions, we can train under
heavy load continuously without having to sleep in order to yield the
training thread and allow the prediction thread to progress.

* Use samples instead of seconds in ML configuration.

This commit changes the way we are handling input ML configuration
options from the user. Instead of treating values as seconds, we
interpret all inputs as number of update_every periods. This allows
us to enable anomaly detection on hosts that have update_every != 1
second, and still produce a model for training/prediction & detection
that behaves in an expected way.

Tested by running anomaly detection on an agent with update_every = [1,
2, 4] seconds.
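
To make the unit change concrete, the sketch below converts a configuration value expressed in samples into wall-clock seconds for a given `update_every`. The function name and the sample count of 3600 are illustrative assumptions, not values taken from the commit.

```python
def samples_to_seconds(samples, update_every):
    """Wall-clock span covered by `samples` collection periods."""
    if samples < 0 or update_every <= 0:
        raise ValueError("samples must be >= 0 and update_every > 0")
    return samples * update_every


# The same (hypothetical) setting of 3600 samples covers a different time
# span on each of the update_every values mentioned in the commit.
spans = {ue: samples_to_seconds(3600, ue) for ue in (1, 2, 4)}
```

Here `spans` maps 1, 2 and 4 seconds to 3600, 7200 and 14400 seconds respectively, while the number of points the model sees stays the same.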

* Remove unnecessary log message in detection thread

* Move ML configuration to global section.

* Update web/gui/dashboard_info.js

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>

* Fix typo

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>

* Rebase.

* Use negative logic for anomaly bit.

* Add info for prediction_stats and training_stats charts.

* Disable ML on PPC64EL.

The CI test fails with -std=c++11 and requires -std=gnu++11 instead.
However, it's not easy to quickly append the required flag to CXXFLAGS.
For the time being, simply disable ML on PPC64EL and if any users
require this functionality we can fix it in the future.

* Add comment on why we disable ML on PPC64EL.

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
2021-10-27 09:26:21 +03:00
Emmanuel Vasilakis
5377adb065
Fix warnings from -Wformat-truncation=2 ()
* mark host as UNUSED

* use snprintfz instead of snprintf. removes warning: %s directive output between 0 and 4096 bytes may exceed minimum required size of 4095

* increase length to 22 to include full int length. stops warning %d directive output may be truncated writing between 1 and 11 bytes into a region of size 5

* increase buffers to stop warning %0.1f directive output may be truncated writing between 3 and 312 bytes into a region of size 100

* use sprintfz
2021-10-22 15:56:46 +03:00
Timotej S
dad48421a6
Makes New Cloud architecture optional for ACLK-NG ()
ACLK-NG supports both new and old cloud protocol. Protobuf and C++ compiler are required only for new cloud protocol.
There is no reason to skip building whole ACLK-NG when protobuf is missing.
2021-09-29 17:53:53 +02:00
Timotej S
96bef38147
Adds api/v1/aclk call to webserver ()
2021-09-29 14:16:42 +02:00
Stelios Fragkakis
59394b5f9d
Add hop count for children ()
2021-07-07 13:23:45 +03:00
Timotej S
59af90b08c
Allows ACLK NG and Legacy to coexist ()
2021-06-14 10:38:58 +02:00
Emmanuel Vasilakis
e9ccc75a45
Provide more agent analytics to posthog ()
* Move statistics related functions to analytics.c

* error message change, space added after if

* start an analytics thread

* use heartbeat instead of sleep

* add late environment (after rrdinit) pick of some attributes

* change loop

* re-enable info messages

* remove possible new line

* log and report hits on allmetrics pages. detect if exporting engines are enabled/in use, and report them

* use lowercase for analytics variables

* add collectors

* add buildinfo

* more attributes from late environment

* add new attributes to v1/info

* re-gather meta data before exit. update allmetrics counters to be available in v1/info

* log hits to dashboard

* add mirrored hosts

* added notification methods

* fix spaces, proper JSON naming

* add alerts, charts and metrics count

* more attributes

* keep the thread up, and report a meta event every 2 hours

* small formatting changes. Disable analytics_log_prometheus when unit testing. Add the new attributes to the anonymous-statistics.sh.in script

* applied clang-format

* dont gather data again on exit

* safe buffer length in snprintfz

* add rrdset lock

* remove show_archived

* remove setenv

* calculate lengths during sets
2021-04-27 10:11:20 +03:00
Emmanuel Vasilakis
3d571ebb44
Revert "Provide more agent analytics to posthog ()" ()
This reverts commit a1ce482f3e.
2021-04-21 20:53:12 +03:00
Emmanuel Vasilakis
a1ce482f3e
Provide more agent analytics to posthog ()
* Move statistics related functions to analytics.c

* error message change, space added after if

* start an analytics thread

* use heartbeat instead of sleep

* add late environment (after rrdinit) pick of some attributes

* change loop

* re-enable info messages

* remove possible new line

* log and report hits on allmetrics pages. detect if exporting engines are enabled/in use, and report them

* use lowercase for analytics variables

* add collectors

* add buildinfo

* more attributes from late environment

* add new attributes to v1/info

* re-gather meta data before exit. update allmetrics counters to be available in v1/info

* log hits to dashboard

* add mirrored hosts

* added notification methods

* fix spaces, proper JSON naming

* add alerts, charts and metrics count

* more attributes

* keep the thread up, and report a meta event every 2 hours

* small formatting changes. Disable analytics_log_prometheus when unit testing. Add the new attributes to the anonymous-statistics.sh.in script

* applied clang-format

* dont gather data again on exit

* safe buffer length in snprintfz

* add rrdset lock

* remove show_archived
2021-04-21 18:24:51 +03:00
Emmanuel Vasilakis
5510b429a6
Add a new parameter 'chart' to the /api/v1/alarm_log. ()
* add a chart parameter to api alarm_log

* Use hash_chart instead

* also do the strcmp

* cleaner?

* save an if

* move simple_hash out of the loop

* Changed if

* formatting changes

* fix formatting
2021-03-26 11:34:23 +02:00
Stelios Fragkakis
c527863c53
Fix agent crash when executing data query with context and non-existing chart_label_key ()
2021-03-24 11:30:43 +02:00
Stelios Fragkakis
65bc43d9cb
Add data query support for archived charts ()
2021-03-22 09:47:22 +02:00
Timotej S
e7e5d0c372
Adds ACLK-NG as fallback ()
* adds a new implementation of ACLK written almost from scratch
* external dependencies only OpenSSL and JSON-C
* fallback for systems where ACLK Legacy can't build (for technical or philosophical reasons)
* can be forced to build by giving "--aclk-ng" to the installer
2021-03-16 12:38:16 +01:00
Ilya Mashchenko
10b745cba8
add _is_k8s_node label to the host labels ()
* auto format system-info.sh

* detect whether the node is k8s node in system-info.sh

* fix unmae=>uname

* add_is_k8s_node_to_host_labels: Add new variable to structure

* add_is_k8s_node_to_host_labels: Add is_k8_node to labels

* add_is_k8s_node_to_host_labels: Add is_k8_node inside endpoint

* add_is_k8s_node_to_host_labels: Add data to swagger file

* change yes/no to true/false

* Update web/api/netdata-swagger.json

* add_is_k8s_node_to_host_labels: Add anonymous statistic

* add_is_k8s_node_to_host_labels: Add information to using-host-labels.md

* add_is_k8s_node_to_host_labels: Add variable to stream

* add_is_k8s_node_to_host_labels: Change swagger.yaml

* add_is_k8s_node_to_host_labels: Adding missing documentation

* add_is_k8s_node_to_host_labels: rename variable

* add_is_k8s_node_to_host_labels: Rename labels to match variable names

* add_is_k8s_node_to_host_labels: Add to wget

* add_is_k8s_node_to_host_labels: Add content to swagger files

* add_is_k8s_node_to_host_labels: update both swagger files

* add_is_k8s_node_to_host_labels: fix wrong exportation

Co-authored-by: Thiago Marques <thiagoftsm@gmail.com>
2021-01-18 07:44:14 -05:00
Stelios Fragkakis
cd443de780
Support multiple chart label keys in data queries ()
2021-01-14 18:50:33 +02:00
Ilya Mashchenko
0f8175dd30
Kubernetes labels ()
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2020-12-14 17:27:55 +03:00
Timotej S
f1db235a36
ACLK Child Availability Messages ()
* new ACLK messages for Claiming MVP1
2020-11-26 17:26:01 +01:00
Stelios Fragkakis
f8f4aed80d
Fixed the data endpoint so that the context param is correctly applied to children ()
2020-11-26 13:00:32 +02:00
Stelios Fragkakis
e9d59e37d9
Migrate metadata log to SQLite ()
2020-11-24 20:00:02 +02:00
Chris Akritidis
9b1e7aa937
Don't cache registry responses ()
Summary
Stop caching responses from the registry, in order to correctly record machine and person GUIDs

Component Name
web

Test Plan
Verify that the registry doesn't cache responses (expires header is lower than 1 day in the future).
2020-11-03 04:54:56 -08:00
Stelios Fragkakis
f3d00f79ab
Added new data query option "allow_past" ()
2020-10-22 21:43:20 +03:00
Stelios Fragkakis
ce79c3d7a8
Fixed the data endpoint to prioritize chart over context if both are present ()
2020-10-06 12:58:55 +03:00
Stelios Fragkakis
8067642361
Improved the data query when using the context parameter ()
2020-09-24 13:05:15 +03:00
Stelios Fragkakis
8f6f1baf9a
Added context parameter to the data endpoint ()
Added functionality to support composite charts
2020-09-15 19:41:39 +03:00
Timotej S
ab7ff3131f
Adds claimed_id streaming ()
* streams claimed_id of child nodes to parents
* adds this information into /api/v1/info
2020-08-26 14:50:37 +02:00
Stelios Fragkakis
eda12f579f
Implemented multihost database ()
* Hard code a node for non-legacy multidb test
Skip dbengine initialization for new incoming children
Add code to switch to multidb ctx when accessing the dbengine

* When a non-legacy streaming connection is detected, use the multidb metadata log context

* Clear the superblock memory to avoid random data written in the metadata log

* Activate the host detection during compaction
Activate the host detection during metadata log chart updates
Keep the host in the user object during replay of the HOST command

* Add defaults for health / rrdpush on HOST metadata replay
Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()

Use defaults from the stream.conf

* Count hosts only if not archived
When host switches from archived to active update rrd_hosts_available
Remove archived hosts from charts and info

* Change parameter from "multidb disk space" to "dbengine multihost disk space"
Remove unused variables
Fix compilation error when dbengine is disabled
Fix condition for machine_guid directory creation under cache_dir

* Enable multidb disk space file creation.

* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.

* Fix old bug in the code that confused obsolete hosts with orphan hosts.

* Do not delete multi-host DB host files.

* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.

* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.

* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.

* Ignore archived hosts in streaming.

* Add documentation before merging to master.

Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-07-28 15:04:39 +03:00
Stelios Fragkakis
1bd8a25544
Add support for persistent metadata ()
* Implemented collector metadata logging 
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update

Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-06-12 10:35:17 +03:00
Andrew Moss
53efa359d6
Regenerate topic base on connect ()
Allow agents to be reclaimed while they are running. Fix a race hazard between claiming and the ACLK. Changes the private key, base topic, username and contents of the LWT.

Co-authored-by: <hilari@hilarimoragrega.com>
2020-05-20 16:28:45 +02:00
Andrew Moss
aa3ec552c8
Enable support for Netdata Cloud.
This PR merges the feature-branch to make the cloud live. It contains the following work:
Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Timotej S <6674623+underhood@users.noreply.github.com>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* dashboard with new navbars, v1.0-alpha.9: PR 
* dashboard v1.0.11: 
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
* Added installer code to bundle JSON-c if it's not present. PR 
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming config PR 
* Adds JSON-c as hard dep. for ACLK PR 
* Fix SSL renegotiation errors in old versions of openssl. PR . Also - we have a transient problem with opensuse CI so this PR disables them with a commit from @prologic.
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming error handling PR 
* Added CI to verify JSON-C bundling code in installer PR 
* Make cloud-enabled flag in web/api/v1/info be independent of ACLK build success PR 
* Reduce ACLK_STABLE_TIMEOUT from 10 to 3 seconds PR 
* remove old-cloud related UI from old dashboard (accessible now via /old suffix) PR 
* dashboard v1.0.13 PR 
* dashboard v1.0.14 PR 
* Provide feedback on proxy setting changes PR 
* Change the name of the connect message to update during an ongoing session PR 
* Fetch active alarms from alarm_log PR 
2020-05-11 16:37:27 +10:00
Andrew Moss
8fe7485a60
Switching over to soft feature flag ()
Preparing for the cloud release. This changes how we handle the feature flag so that it no longer requires installer switches and can be set from the config file. This still requires internal access to use and is not ready for public access yet.
2020-03-31 21:19:34 +02:00
Andrew Moss
d6f0703a09
Updating the info endpoint for cloud notifications ()
2020-03-30 15:03:24 +02:00
Stelios Fragkakis
78c3f35af8
Improved ACLK ()
Improved the stability of the ACLK
2020-03-26 18:02:21 +02:00
Andrew Moss
806b6fadfd
Fix the new cloud info in the info endpoint ()
2020-03-19 10:27:53 +02:00