mirror of https://github.com/netdata/netdata.git synced 2025-04-22 04:31:08 +00:00

737 commits

Author SHA1 Message Date
Stelios Fragkakis
85f359fc26
Handle ephemeral hosts ()
* Handle ephemeral hosts

* Node ephemeral removal timeout 86400 seconds (1 day)

* Move config from health to global section

* Set a node to queryable false when it is ephemeral and is removed

* Log queryable. Send queryable=0 only when forcing host deletion (the node is ephemeral)

* Switch to "is ephemeral node"
Document stream.conf

* Unregister node id
2023-11-23 23:56:34 +02:00
Costa Tsaousis
3e508c8f95
New logging layer ()
* cleanup of logging - wip

* first working iteration

* add errno annotator

* replace old logging functions with netdata_logger()

* cleanup

* update error_limit

* fix remaining error_limit references

* work on fatal()

* started working on structured logs

* full cleanup

* default logging to files; fix all plugins initialization

* fix formatting of numbers

* cleanup and reorg

* fix coverity issues

* cleanup obsolete code

* fix formatting of numbers

* fix log rotation

* fix for older systems

* add detection of systemd journal via stderr

* finished on access.log

* remove left-over transport

* do not add empty fields to the logs

* journal get compact uuids; X-Transaction-ID header is added in web responses

* allow compiling on systems without memfd sealing

* added libnetdata/uuid directory

* move datetime formatters to libnetdata

* add missing files

* link the makefiles in libnetdata

* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now reads X-Transaction-ID and uses it for functions and web responses

* added stream receiver, sender, proc plugin and pluginsd log stack

* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup

* add message ids to streaming inbound and outbound connections

* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled

* send SIGABRT to external plugins only if we are not shutting down

* fix cross cleanup in pluginsd parser

* fatal when there is a stack error in logs

* compile netdata with -fexceptions

* do not kill external plugins with SIGABRT

* metasync info logs to debug level

* added severity to logs

* added json output; added options per log output; added documentation; fixed issues mentioned

* allow memfd only on linux

* moved journal low level functions to journal.c/h

* move health logs to daemon.log with proper priorities

* fixed a couple of bugs; health log in journal

* updated docs

* systemd-cat-native command to push structured logs to journal from the command line

* fix makefiles

* restored NETDATA_LOG_SEVERITY_LEVEL

* fix makefiles

* systemd-cat-native can also work as the logger of Netdata scripts

* do not require a socket to systemd-journal to log-as-netdata

* alarm notify logs in native format

* properly compare log ids

* fatals log alerts; alarm-notify.sh working

* fix overflow warning

* alarm-notify.sh now logs the request (command line)

* annotate external plugin logs with the function cmd they run

* added context, component and type to alarm-notify.sh; shell sanitization removes control characters and characters that may be expanded by bash

* reformatted alarm-notify logs

* unify cgroup-network-helper.sh

* added quotes around params

* charts.d.plugin switched logging to journal native

* quotes for logfmt

* unify the status codes of streaming receivers and senders

* alarm-notify: don't log anything if there is nothing to do

* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notification methods are needed but are not available

* migrate cgroup-name.sh to new logging

* systemd-cat-native now supports messages with newlines

* socket.c logs use priority

* cleanup log field types

* inherit the systemd-set INVOCATION_ID if found

* allow systemd-cat-native to send messages to a systemd-journal-remote URL

* log2journal command that can convert structured logs to journal export format

* various fixes and documentation of log2journal

* updated log2journal docs

* updated log2journal docs

* updated documentation of fields

* allow compiling without libcurl

* do not use socket as format string

* added version information to newly added tools

* updated documentation and help messages

* fix the namespace socket path

* print errno with error

* do not timeout

* updated docs

* updated docs

* updated docs

* log2journal updated docs and params

* when talking to a remote journal, systemd-cat-native batches the messages

* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote

* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"

This reverts commit b079d53c11.

* note about uncompressed traffic

* log2journal: code reorg and cleanup to make modular

* finished rewriting log2journal

* more comments

* rewriting rules support

* increased limits

* updated docs

* updated docs

* fix old log call

* use journal only when stderr is connected to journal

* update netdata.spec for libcurl, libpcre2 and log2journal

* pcre2-devel

* do not require pcre2 in centos < 8, amazonlinux < 2023, open suse

* log2journal only on systems pcre2 is available

* ignore log2journal in .gitignore

* avoid log2journal on centos 7, amazonlinux 2 and opensuse

* add pcre2-8 to static build

* undo last commit

* Bundle to static

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add build deps for deb packages

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add dependencies; build from source

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Test build for amazon linux and centos; expect it to fail for suse

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* fix minor oversight

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Reorg code

* Add the install from source (deps) as a TODO
* Do not enable the build on the suse ecosystem

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-11-22 10:27:25 +02:00
vkalintiris
b515c74228
Add support for gorilla pages for tier 0. ()
---------

Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
2023-11-21 21:42:59 +02:00
Stelios Fragkakis
85d4369435
Remove queue limit from ACLK sync event loop ()
Code cleanup
2023-11-21 12:00:51 +02:00
Stelios Fragkakis
d41bf12a2b
Switch alarm_log to use the buffer json functions ()
* Switch alarm_log to use the buffer json functions

* Remove commented out code

* Fix finalize when an object is not explicitly closed

* Use buffer_json_member_add_boolean
2023-11-13 16:08:31 +02:00
Stelios Fragkakis
4c867cb3a6
Switch charts / chart to use buffer json functions ()
Switch charts / chart to use buffer json
2023-11-08 16:15:54 +02:00
Ilya Mashchenko
10238fc52a
add rrddim_get_last_stored_value to simplify function code in internal collectors () 2023-11-07 13:53:07 +02:00
Stelios Fragkakis
c78566b2d0
Improve agent to cloud status update process ()
* Send node update info only if the host has finished replication

* Log number of hosts replicating / pending to load context

* Remove prefix (thread name is enough)
2023-11-07 11:39:11 +02:00
Stelios Fragkakis
240cf98375
Better database corruption detection during runtime ()
Detect database corruption during query execution and schedule recovery on next restart
2023-11-07 09:30:09 +02:00
Stelios Fragkakis
2a6d0a35f2
Improve unittests ()
* Update flags during label copy
Update unittests

* Return proper value

* Improve double check in unittests
2023-11-06 21:39:40 +02:00
Costa Tsaousis
c881ddf969
give the streaming function to nightly users () 2023-11-06 17:22:15 +02:00
Stelios Fragkakis
537bab0f18
Keep precompiled statements for alarm log queries to improve performance ()
* Keep precompiled statements for alarm log queries to improve performance

* Check bind result
2023-11-06 08:53:28 +02:00
Costa Tsaousis
236c04b925
Systemd units function ()
* split systemd-journal.c

* split fstat caching

* split systemd-journal further

* working systemd-units function

* do not enable systemd-units when libsystemd does not provide the interface

* move the header to the right place

* mixed parentheses

* update codacy exclusions

* update codacy exclusions

* update codacy exclusions

* added option to show expanded filters by default

* keep the original extension and decode descriptions too

* updated systemd-units function to handle all known unit states

* don't show the path by default

* final touches

* remove trailing spaces
2023-11-02 16:29:55 +02:00
Stelios Fragkakis
fc86034c86
Fix journal file index when collision is detected ()
Add index properly if collision is detected
2023-11-02 00:03:30 +02:00
Stelios Fragkakis
2a7c09257b
Optimize database before agent shutdown ()
Optimize database before shutdown
2023-11-01 21:25:54 +02:00
Stelios Fragkakis
ca592a9630
Improve shutdown when collectors are active ()
* Collectors should not be running at this point, but allow shutdown to continue after several retries (workaround)

* Proceed with shutdown after 10 attempts
2023-11-01 16:22:51 +02:00
Stelios Fragkakis
661f2eb6c5
Improve dimension ML model load ()
* Prepare metadata sync thread cleanup earlier in the shutdown process

* Set flag for the dimensions that need ML MODEL load instead of queueing a message in the event loop

* Process the dimension ML load during the normal dimension metadata save loop

* Use spinlock for cmd queue / dequeue instead of mutex
Cleanup queue structure

* Remove old ML model load code

* Rebase and cleanup
2023-10-31 09:57:03 +02:00
Stelios Fragkakis
d93101b972
Fix label copy ()
Fix label copy, more details in unittest
2023-10-29 22:46:02 +02:00
Costa Tsaousis
532e9b3d8d
fix missing labels from parents ()
* maintain in /tmp/stream-receiver-X.txt a copy of metadata received

* stream log metadata to /tmp/stream-sender-localhost.txt

* log the stream of all senders

* cleanup use of X_update_metadata() functions

* fix for last commit

* rrdlabel unmark/mark/delete unmarked restored
2023-10-29 01:32:47 +03:00
Costa Tsaousis
48af9dc2e0
do not propagate upstream internal label sources () 2023-10-28 19:40:47 +03:00
Costa Tsaousis
a84213ca31
fix various issues identified by coverity () 2023-10-28 18:20:47 +03:00
Costa Tsaousis
d41a4912d4
fix missing labels from parents ()
increase metadata version to resend chart upstream when labels and other metadata are updated
2023-10-28 17:13:30 +03:00
Costa Tsaousis
31e7d85547
fix retention loading () 2023-10-28 06:38:46 +03:00
Costa Tsaousis
2175104d41
Faster parents ()
* cache ctx in collection handle

* cache rd together with rda

* do not repeatedly call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension()

* fix unit tests

* do the absolutely minimum while updating timestamps, ensure validity during reading them

* when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only.

* remove the spinlock from mrg

* remove the metric flags that are not used any more

* mrg writers can be different threads

* update first time when latest clean is also updated

* cleanup

* set hot page with a simple atomic operation

* sender sets chart slot for every chart

* work on senders without SLOT

* enable SLOT capability

* send slot at BEGIN when SLOT is enabled

* fix slot generation and parsing

* send slot while re-streaming

* use the sender capabilities, not the receiver

* cleanup

* add slots support to all chart and dimension related plugin commands

* fix condition

* fix calculation

* check sender capabilities

* assign slots in constructors

* we need the dimension slot at the DIMENSION keyword

* more debug info in case of dimension mismatch

* ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely

* fix renumbering on child restart

* reset rda caching when receiving a chart definition

* optimize pluginsd_end_v2()

* do not do zero sized allocations

* trust the chart slot id of the child

* cleanup charts on pluginsd thread exit

* better cleanup

* find the chart and put it in the slot, if it is not already there

* move slots array to host

* initialize pluginsd slots properly

* add slots to replay begin; do not cleanup slots that don't belong to a chart

* cleanup on obsolete

* cleanup slots on obsoletions

* cleanup and renames about obsoletion

* rewrite obsoletion service code to remove race conditions

* better service obsoletion log

* added debugging

* more debug

* exposed flag now compares versions

* removed debugging messages

* resolve conflicts

* fix replication check for unsent dimensions
2023-10-27 22:42:29 +03:00
Costa Tsaousis
cd584e0357
ZSTD and GZIP/DEFLATE streaming support ()
* move compression header to compression.h

* prototype with zstd compression

* updated capabilities

* no need for resetting compression

* left-over reset function

* use ZSTD_compressStream() instead of ZSTD_compressStream2() for backwards compatibility

* remove call to LZ4_decoderRingBufferSize()

* debug signature failures

* fix the buffers of lz4

* fix decoding of zstd

* detect compression based on initialization; prefer ZSTD over LZ4

* allow both lz4 and zstd

* initialize zstd streams

* define missing ZSTD_CLEVEL_DEFAULT

* log zero compressed size

* debug log

* flush compression buffer

* add sender compression statistics

* removed debugging messages

* do not fail if zstd is not available

* cleanup and buildinfo

* fix max message size, use zstd level 1, add compression ratio reporting

* use compression level 1

* fix ratio title

* better compression error logs

* for backwards compatibility use buffers of COMPRESSION_MAX_CHUNK

* switch to default compression level

* additional streaming error conditions detection

* do not expose compression stats when compression is not enabled

* test for the right lz4 functions

* moved lz4 and zstd to their own files

* add gzip streaming compression

* gzip error handling

* added unittest for streaming compression

* eliminate a copy of the uncompressed data during zstd compression

* eliminate not needed zstd allocations

* cleanup

* decode gzip with Z_SYNC_FLUSH

* set the decoding gzip algorithm

* user configuration for compression levels and compression algorithms order

* fix exclusion of not preferred compressions

* remove now obsolete compression define, since gzip is always available

* rename compression algorithms order in stream.conf

* move common checks in compression.c

* cleanup

* backwards compatible error checking
2023-10-27 17:37:34 +03:00
Emmanuel Vasilakis
59a983abb4
Small optimization of alert queries () 2023-10-27 15:35:32 +03:00
Stelios Fragkakis
8f17bbc159
Fix label copy to correctly handle duplicate keys ()
Fix rrdlabel copy to correctly handle duplicate keys
Enhance the corresponding unit tests
2023-10-20 17:28:55 +03:00
Stelios Fragkakis
243c5cdfbc
Drop an unused index from aclk_alert table ()
* Drop unused aclk_alert index

* Log messages only when compiled with NETDATA_INTERNAL_CHECKS
2023-10-20 10:23:48 +03:00
Stelios Fragkakis
a27aed521f
Improve context load on startup ()
* Retrieve last connected timestamp from the database (host->last_connected)

* Improve context load performance
Check for agent shutdown while context load in progress
Log information about host load start and finish

* Remove check for slot as it will only reach this part when a slot is found
2023-10-18 17:21:18 +03:00
Stelios Fragkakis
9caea28bcd
Reuse ML load prepared statement ()
Reuse ML load prepared statement and release resources on each batch load
Fix parameter to ML model load to be in seconds not usec
2023-10-18 17:20:00 +03:00
Stelios Fragkakis
0542661fff
Improvements for labels handling ()
* Add additional checks
rrdlabels_find_label_with_key_unsafe finds a label with the specified key (not only if it exists but with a different value)
Quick check if label already exists (avoids JudyLIns)

* Add migration unit test
Add additional unit test to verify that adding a label with the same key will replace its value
2023-10-18 15:45:34 +03:00
Stelios Fragkakis
46c7e6f8da
Fix dimension HETEROGENEOUS check () 2023-10-18 12:11:06 +03:00
Stelios Fragkakis
0801f5bdce
Fix statistics calculation in 32bit systems ()
Use uint64_t for calculations
2023-10-18 10:27:42 +03:00
Stelios Fragkakis
d9e8b31ac6
Fix meta unittest ()
Fix log message (queue has no limit now)
Fix unittest
2023-10-17 16:55:58 +03:00
Costa Tsaousis
063c4179b3
dynamic meta queue size ()
* dynamic meta queue size

* meta cleanup
2023-10-16 20:13:57 +03:00
Emmanuel Vasilakis
bdf83311c3
Add summary to /alerts () 2023-10-16 17:09:41 +03:00
Costa Tsaousis
7a3f22e24b
allow patterns in journal queries () 2023-10-16 00:24:34 +01:00
Costa Tsaousis
1b7c15ac09
journal timeout ()
* stop the query 250ms before the timeout, to allow sending back partial responses

* on timeout return partial responses

* give it 500ms

* give some additional timeout to plugins.d garbage collection

* define an extension to the timeout for all intermediate hops

* hunting for the crash...

* set value name and len to zero

* remove unneeded memset()
2023-10-14 18:03:02 +03:00
Stelios Fragkakis
77e5795e8d
Fix access of memory after free ()
* Proper init to avoid use after free

* CID 400083 Unchecked return value
2023-10-13 21:16:46 +03:00
Stelios Fragkakis
e8c770d1a8
Batch ML model load commands ()
Batch ML model load
2023-10-10 19:43:32 +03:00
Emmanuel Vasilakis
00115ced31
Don't queue removed when there is a newer alert ()
* don't queue removed when there is a newer

* proper concat

* add host_id
2023-10-10 17:39:26 +03:00
Stelios Fragkakis
7644910646
Fix compilation warnings ()
Drop warning when parent is not accepting job status updates
2023-10-10 16:21:39 +03:00
Stelios Fragkakis
18e729d670
Code improvements ()
* Remove unused functions

* No need for prepare statement because the function is not used frequently

* Remove db_meta check, already assumed valid

* Remove D_ACLK_SYNC and D_METADATALOG, fix log message

* Reuse prepared statements per run to avoid sql parsing all the time

* Keep rowid in charts and dimensions

* Host and chart labels keep rowids

* Don't store internal flags

* Remove commented out code

* Formatting

* Fix algorithm when updating dimension
2023-10-06 16:33:45 +03:00
Emmanuel Vasilakis
9493fa8682
Remove family from alerts ()
* remove loading and storing families from alert configs

* remove families from silencers

* remove from alarm log

* start remove from alarm-notify.sh.in

* fix test alarm

* rebase

* remove from api/v1/alarm_log

* remove from alert stream

* remove from config stream

* remove from more

* remove from swagger for health api

* revert md changes

* remove from health cmd api test
2023-10-06 00:57:53 +03:00
Costa Tsaousis
9fd9823e07
journal: fix the 1 second latency in play mode ()
provide a relative_to_absolute function that does not touch the current realtime time
2023-10-04 20:54:16 +03:00
Stelios Fragkakis
385d022035
Skip database migration steps in new installation ()
* For new installation skip database migration steps

* Simplify logging

* Count database tables to determine if database is empty

* Report extended error message
2023-10-03 15:11:44 +03:00
Stelios Fragkakis
d34dbf844f
Avoid duplicate keys in labels ()
* Avoid duplicate keys in labels

* Properly delete old label

* Simplify memory statistics logging
2023-10-02 18:07:38 +03:00
Emmanuel Vasilakis
8c9492a476
Send alerts summary field to cloud ()
* new aclk schema

* transmit summary to cloud and expose in v2/alerts

* missing assign
2023-10-02 09:56:01 +03:00
Timotej S
6dfc99a2e0
Dyncfg add streaming support ()
* dyncfg fncnames as constants

* add helper macros to know parser streaming/plugin

* plugins dictionary per RRDHOST

* api_request_v2_config add support for /host/

* streamify pluginsd_register_plugin

* streamify pluginsd_register_module

* streamify report_job_status

* streamify dyncfg get functions

* module_type2str

* add job type and flags

* add DYNCFG_REGISTER_JOB

* implement register job

* push all to parent at startup

* add helper function is_dyncfg_function

* forward virtual functions through streaming

* separate job2json

* add api/v2/job_statuses

* do cleanup on streaming

* streamify set functions

* support FUNCTION_PAYLOAD through streaming

* WIP tests

* don't attempt loading non-localhost configs

* move cfg persistence to proper place

* prevent race

* properly update job state at runtime

* cleanup 1

* job2json add missing reason

* add tests

* correct HTTP code

* add test

* streamify delete_job_cb

* add DELETE_JOB keyword

* job delete over streaming

* add tests for create and delete job over parent

* rrdpush common checks to macro

* add missing forwarders

* fix jobs according to test results

* more tests

* review comment 1

* codacy remove valid warning

* codacy ruby fixes

* fix wrong rc check

* minimal test plugin for child

* add test

* dict walk instead of master lock

* minor - English spelling fixes

* thiago comments 1

* minor - rename folder to dynconf

* enable only when built with -DNETDATA_TEST_DYNCFG

* minor - compiler warning

* create dir post daemonization

* stricter URL check
2023-09-29 17:13:42 +02:00
Stelios Fragkakis
f90c2a23e9
Convert the ML database ()
* Convert a db to WAL with auto vacuum

* Use single sqlite configuration function

* Remove UNUSED statements
2023-09-28 19:40:02 +03:00