mirror of https://github.com/netdata/netdata.git synced 2025-04-22 20:42:33 +00:00
Commit graph

32 commits

Author SHA1 Message Date
vkalintiris
afae8971f0
Revert "Configurable storage engine for Netdata agents: step 3 ()" ()
This reverts commit 100a12c6cc.

A couple parent/child startup/shutdown scenarios can lead to crashes.
2022-06-17 14:59:35 +03:00
Adrien Béraud
100a12c6cc
Configurable storage engine for Netdata agents: step 3 ()
* storage engine: add host context API

Add a new API to allow storage engines to manage host contexts.
* Replace single global context with per-engine global context
* Context is fully managed by storage engines: a storage engine
  can use no context, a global engine context, per host contexts,
  or a mix of these.
* Currently, only dbengine uses contexts.
  Following the current logic, legacy hosts use their own context,
  while non-legacy hosts share the global context.

* storage engine: use empty function instead of null for context ops

* rrdhost: don't check return value for void call

* rrdhost: create context with host

* storage engine: move rrddim ops to rrddim_mem.{c,h}

* storage engine: don't use NULL for end-of-list marker

* storage engine: fallback to default engine
2022-06-16 16:53:35 +03:00
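
The registry ideas listed in this commit (no-op context functions instead of NULL pointers, a non-NULL end-of-list marker, and a fallback to the default engine) can be sketched as follows. This is only an illustration: `engine_context`, `storage_engine`, `engine_find`, `DEFAULT_ENGINE`, and the `"ram"` entry are hypothetical names chosen here, not identifiers taken from the Netdata source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: none of these names come from the Netdata source. */
typedef struct engine_context { int refcount; } engine_context;

typedef struct storage_engine {
    const char *name;                           /* "" marks the end of the table */
    engine_context *(*context_create)(void);    /* never NULL */
    void (*context_destroy)(engine_context *);  /* never NULL */
} storage_engine;

/* No-op callbacks for engines that keep no context at all. */
static engine_context *context_create_none(void) { return NULL; }
static void context_destroy_none(engine_context *ctx) { (void)ctx; }

/* One shared context, standing in for an engine-wide (global) context. */
static engine_context shared_ctx = { 0 };

static engine_context *context_create_shared(void) {
    shared_ctx.refcount++;
    return &shared_ctx;
}

static void context_destroy_shared(engine_context *ctx) {
    assert(ctx == &shared_ctx);
    ctx->refcount--;
}

static const storage_engine engines[] = {
    { "ram",      context_create_none,   context_destroy_none   },
    { "dbengine", context_create_shared, context_destroy_shared },
    { "",         context_create_none,   context_destroy_none   }, /* sentinel */
};

#define DEFAULT_ENGINE (&engines[0])

/* Look up an engine by name; unknown names fall back to the default engine. */
static const storage_engine *engine_find(const char *name) {
    assert(name != NULL);
    for (const storage_engine *e = engines; e->name[0] != '\0'; e++)
        if (strcmp(e->name, name) == 0)
            return e;
    return DEFAULT_ENGINE;
}
```

Because every entry carries valid function pointers, a caller can invoke `engine_find(name)->context_create()` without NULL checks, even for an unrecognized name.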
Costa Tsaousis
1b0f6c6b22
Labels with dictionary ()
* squashed and rebased to master

* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h

* added unittest for UTF-8 multibyte sanitization

* Fix unit test compilation

* Fix CMake build

* remove double sanitizer for opentsdb; cleanup sanitize_json_string()

* rename error_description to error_message to avoid conflict with json-c

* revert last and undef error_description from json-c

* more unittests; attempt to fix protobuf map issue

* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer

* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted

* better sorted dictionary checking

* proper unittesting for sorted dictionaries

* call dictionary deletion callback when destroying the dictionary

* remove obsolete variable

* Fix exporting unit tests

* Fix k8s label parsing test

* workaround for cmocka and strdupz()

* Bypass cmocka memory allocation check

* Revert "Bypass cmocka memory allocation check"

This reverts commit 4c49923839.

* Revert "workaround for cmocka and strdupz()"

This reverts commit 7bebee0480.

* Bypass cmocka memory allocation checks

* respect json formatting for chart labels

* cloud sends colons

* print the value only once

* allow parentheses and spaces in values; make stream sender send quotes for values

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-13 20:35:45 +03:00
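
The "safe version that writes the value to a buffer" mentioned in this commit can be pictured with a small sketch. It assumes the motivation is to avoid handing callers a pointer into the label store; the `label` type and `labels_get_value` function are hypothetical and do not reflect the real `rrdlabels` API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical label store; a plain array stands in for the dictionary. */
typedef struct label { const char *key; const char *value; } label;

/*
 * Copy the value of `key` into `dst` (capacity `dst_size`, always
 * NUL-terminated). Returns 1 if the key exists, 0 otherwise.
 * The caller never receives a pointer into the store itself, so a later
 * deletion of the label cannot leave it holding a dangling pointer.
 */
static int labels_get_value(const label *labels, size_t count,
                            const char *key, char *dst, size_t dst_size) {
    assert(dst != NULL && dst_size > 0);
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (strcmp(labels[i].key, key) == 0) {
            size_t n = strlen(labels[i].value);
            if (n >= dst_size)
                n = dst_size - 1;       /* truncate instead of overflowing */
            memcpy(dst, labels[i].value, n);
            dst[n] = '\0';
            return 1;
        }
    }
    return 0;
}
```

On a miss the destination is left as an empty string, so callers that ignore the return value still read a valid C string.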
Stelios Fragkakis
b8958eb8ad
Fix locking access to chart labels ()
No write lock required
2022-06-03 15:24:55 +03:00
Stelios Fragkakis
c261a771cc
Schedule retention message calculation to a worker thread ()
* Move aclk_update_retention to the proper header file

* Do a scan but avoid going through all the dimensions if we have too many to delete -- do not generate a retention message in that case

* Schedule the retention calculation to a worker

* Adjust messages in the access log

* Fix compilation errors with --disable-cloud
2022-06-01 19:10:32 +03:00
Stelios Fragkakis
665f7ba25b
When sending a dimension for the first time, make sure there is a non zero created_at timestamp () 2022-05-31 12:25:39 +03:00
Stelios Fragkakis
4db41c80be
Defer the dimension payload check to the ACLK sync thread ()
Defer payload check to the aclk sync thread
2022-05-18 21:11:27 +03:00
Stelios Fragkakis
3b8d4c21e5
Adjust the dimension liveness status check ()
* Mark a chart to be exposed only if dimension is created or metadata changes

* Add a liveness calculation for dimensions that go from collected to non-collected (live -> stale) and vice versa

* queue_dimension_to_aclk will have the rrdset and either 0 or last collected time
  If 0 then it will be marked as live else it will be marked as stale and last collected time will be sent to the cloud

* Add an extra parameter to indicate if the payload check should be done in the database or it has been done already

* Queue dimension sets dimension liveness and queues the exact payload to store in the database

* Fix compilation error when --disable-cloud is specified
2022-05-17 16:58:49 +03:00
Stelios Fragkakis
7bba071aec
Fix the log entry for incoming cloud start streaming commands ()
Add the correct requested chart sequence id from the cloud and also record the local one we have
2022-05-16 12:38:38 +03:00
Stelios Fragkakis
0b3ee50c76
Resolve coverity issues ()
- Variable "hostname" going out of scope leaks the storage it points to.
- Null-checking "rd->name" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
2022-05-09 10:47:58 +03:00
Stelios Fragkakis
154cf74d6a
Improve agent cloud chart synchronization ()
* Try to queue dimension always when:
 Trying to clean obsolete charts
 If chart has been sent and liveness apparently changed

* delay rotation and skip chart check if not sent to cloud

* No need to CLEAR flag during database rotation
Do not clear chart ACLK status for dimension requests

* Change payload_sent to return timestamp of submitted message

* Clear the dimension ACLK flag if we are processing all the charts again

* Check if dimension is already queued to ACLK and ignore it
If queue fails then reset it to retry
Already try to queue the dimension

* Improve dimension cleanup during the retention message calculation

* Change queue_dimension_to_aclk to return void

* If no time range for this dimension then assume it is deleted

* Start streaming for inactive nodes

* Remove dead code

* Correctly report hostname in the access log

* Schedule a dimension deletion without trying to submit a message immediately

* Enable dimension cleanup -- also delete dimension if not found in the dbengine files
Free hostname
2022-05-03 21:38:12 +03:00
Stelios Fragkakis
e816ee4923
Fix issue with charts not properly synchronized with the cloud ()
* Add function to check a specific chart

* If a chart is not obsoleted, check if the liveness needs to be updated

* Calculate liveness based on a (constant * update_every) for each dimension

* Scan all dimensions when the retention message is constructed and update liveness if needed

* If initial state, set to computed live

* Set computed live state to dimension

* Add a maximum dimension cleanup on startup to prevent message flood

* Schedule chart updates if charts streaming is enabled

* Adjust live state for dimension

* The query executed will have a valid dimension uuid only if memory mode is dbengine
2022-04-01 18:12:50 +03:00
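
The liveness rule in this commit, "(constant * update_every) for each dimension", might look roughly like the sketch below. The multiplier value, the function name `dimension_is_live`, and the treatment of a never-collected dimension as stale are all assumptions made for illustration; the commit does not state them.

```c
#include <assert.h>
#include <time.h>

/* Assumed multiplier; the commit message only says "constant". */
#define LIVENESS_MULTIPLIER 3

/*
 * Returns 1 (live) if the dimension was collected less than
 * LIVENESS_MULTIPLIER * update_every seconds before `now`, else 0 (stale).
 * A last_collected of 0 is treated here as "never collected" (assumption).
 */
static int dimension_is_live(time_t now, time_t last_collected, int update_every) {
    assert(update_every > 0);
    if (last_collected == 0)
        return 0;
    return (now - last_collected) < (time_t)LIVENESS_MULTIPLIER * update_every;
}
```

Scaling the threshold by `update_every` means a dimension collected every 10 seconds is given a proportionally longer window than one collected every second.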
Stelios Fragkakis
5a944497d3
Improve ACLK sync logging ()
* Switch messages to ACLK RES, ACLK REQ, ACLK STA instead of OG, IN and just AC

* Lookup hostname by node id

* Record hostname when receiving an ACK for a chart sequence

* Additional log_access info

* Adjust log message when receiving health log request

* Remove redundant ACK log message

* Remove duplicate log message

* Remove duplicate sql statements

* Rearrange variable definition for clarity

* Make sure node is a valid UUID (check return code)
2022-03-31 21:30:02 +03:00
Emmanuel Vasilakis
026a875146
Replace write with read locks () 2022-03-10 15:29:34 +02:00
Stelios Fragkakis
a706491f77
Improve agent to cloud synchronization performance ()
* Switch to prepare statement when storing active charts / dimensions

* Switch to prepare statement when storing chart labels

* Switch to prepare statement when doing a node id lookup

* Switch to prepare statement when loading the node id for a host

* Improve performance by avoiding db query

* Use prepare statement when counting pending chart messages to send to the cloud

* Delay locking while preparing commands

* No need to use buffer, avoid memory allocation overhead

* Switch to prepare statement when loading pending chart updates to send to the cloud
2022-03-09 19:54:58 +02:00
Timotej S
d8aba23d0f
Adds more info to aclk-state API call () 2022-03-09 14:08:20 +01:00
Stelios Fragkakis
6872df9e6a
Adjust cloud dimension update frequency ()
* Queue a chart immediately to the cloud

* Do not inform the cloud immediately if a dimension stopped collecting; use MAX(obsoletion time, 1.5 * update_every)

* Notify cloud immediately on dimension deletion

* Add debug messages

* Do not schedule an update if we are shutting down
2022-03-08 20:06:30 +02:00
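
The delay rule MAX(obsoletion time, 1.5 * update_every) from this commit can be written as a small helper. The name `stale_notify_delay`, the use of whole seconds, and the choice to round 1.5 * update_every up are assumptions for this sketch.

```c
#include <assert.h>

/*
 * Seconds to wait before telling the cloud that a dimension stopped
 * collecting: MAX(obsoletion time, 1.5 * update_every).
 * The 1.5x factor is computed in integers and rounded up (assumption).
 */
static int stale_notify_delay(int obsoletion_secs, int update_every) {
    assert(obsoletion_secs >= 0 && update_every > 0);
    int grace = (update_every * 3 + 1) / 2;
    return obsoletion_secs > grace ? obsoletion_secs : grace;
}
```

Taking the maximum keeps a short obsoletion time from triggering a notification before at least one and a half collection intervals have passed.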
Emmanuel Vasilakis
bf023b50fe
Try to find worker thread from parked ones () 2022-01-11 15:42:24 +02:00
Vladimir Kobal
3ba9dc6cf0
Fix compilation warnings () 2022-01-10 15:17:45 +02:00
Emmanuel Vasilakis
d13b918ad0
Don't use wc struct if it may not exist () 2021-11-22 20:58:05 +02:00
Stelios Fragkakis
454387fcf4
Cleanup compilation warnings ()
* Fix compilation warnings (variables used when debugging is enabled using NETDATA_INTERNAL_CHECKS)
* Fix compilation warning (casting)
2021-11-19 22:12:29 +02:00
Stelios Fragkakis
11b8588c94
Fix coverity issues ()
* Add check for NULL wc->host

* Use sqlite3_exec, if it fails it will be retried on the next health log entries rotation
2021-11-19 16:56:51 +02:00
Emmanuel Vasilakis
dc42e45c6a
Add some logging for cloud new architecture to access.log ()
* add some logging for ng arch to access.log

* change arrows to IN, OG, AC

* log also the params for aclk requests

* check for wc->host before using wc->host->hostname

* turn two messages to info

* reduce alert event logs

* used thread local variables
2021-11-18 11:56:49 +02:00
Stelios Fragkakis
512e98a397
Remove feature flag and commented out code () 2021-11-11 14:56:55 +02:00
Stelios Fragkakis
a2852377d0
Store and submit dimension delete messages for new cloud architecture ()
* Enhance the dimension delete table and adjust the trigger to include chart_id and host_id
* Add the aclk_process_dimension_deletion function
* Change variable chart_name in aclk_upd_dimension_event (it is st->id from st.type dot st.id)

* Process dimension deletion when retention updates are sent

* Do not send charts if we don't have dimensions

* Add check for uuid_parse return code
2021-11-09 21:25:04 +02:00
Stelios Fragkakis
989c68cac8
Fix retention messages () 2021-11-08 14:29:34 +02:00
Stelios Fragkakis
e9efad18e8
Improve the ACLK sync process for the new cloud architecture ()
* Move retention code to the charts

* Log information about node registration and updates

* Prevent deadlock if aclk_database_enq_cmd locks for a node

* Improve message (indicate that it comes from alerts). This will be improved in a followup PR

* Disable parts that can't be used if the new cloud env is not available

* Set dimension FLAG if message has been queued

* Queue messages using the correct protocol enabled

* Cleanup unused functions
Rename functions that queue charts and dimensions
Improve the generic chart payload add function
Add a counter for pending charts/dimension payloads to avoid polling the db
Delay the retention update message until we are done with the updates
Fix full resync command to handle sequence_id = 0 correctly
Disable functions not needed when the new cloud env functionality is not compiled

* Add chart_payload count and retry count
Output information or error message if we fail to queue chart/dimension PUSH commands
Only try to queue commands if we have chart_payload_count>0
Remove the event loop shutdown opcode handle

* Improve detection of shutdown (check netdata_exit)

* Adjusting info messages
2021-11-03 19:18:35 +02:00
Stelios Fragkakis
12f16063f5
Enable additional functionality for the new cloud architecture () 2021-10-06 20:55:31 +03:00
Timotej S
dad48421a6
Makes New Cloud architecture optional for ACLK-NG ()
ACLK-NG supports both new and old cloud protocol. Protobuf and C++ compiler are required only for new cloud protocol.
There is no reason to skip building whole ACLK-NG when protobuf is missing.
2021-09-29 17:53:53 +02:00
Emmanuel Vasilakis
4ae3199311
Add alert message support for ACLK new architecture ()
* add alert messages

* also clear date_cloud_ack

* move buffer_create

* remove include file

* use wc->node_id
2021-09-23 17:34:34 +03:00
Stelios Fragkakis
dbbb553459
Address coverity report issues CID_373247-373251 ()
* Fix memory leak CID_373251

* Check return value CID_373248

* Check return code CID_373249

* Check return code CID_373250

* Initialize cmd CID_373249
2021-09-22 12:57:59 +03:00
Stelios Fragkakis
2085a518c3
Add chart message support for ACLK new architecture () 2021-09-21 22:37:12 +03:00