mirror of https://github.com/netdata/netdata.git synced 2025-04-15 01:58:34 +00:00
Commit graph

2 commits

Author SHA1 Message Date
Costa Tsaousis
5f72d4279b
Streaming improvements No 3 ()
* ML uses synchronous queries

* do not call malloc_trim() to free memory, since it locks everything

* Reschedule dimensions for training from worker threads.

* when we collect or read from the database, they are SAMPLES; when we generate points for a chart, they are POINTS

* keep the receiver send buffer 10x the default

* support autoscaling stream circular buffers

* nd_poll() prefers sending data vs receiving data - in an attempt to dequeue as soon as possible

* fix last commit

* allow removing receivers and senders inline, if the stream thread is not working on them

* fix logs

* Revert "nd_poll() prefers sending data vs receiving data - in an attempt to dequeue as soon as possible"

This reverts commit 51539a97da.

* do not access receiver or sender after it has been removed

* open cache hot2clean

* open cache hot2clean does not need flushing

* use aral for extent pages up to 65k

* track aral malloc and mmap allocations separately; add 8192 as a possible value to PGD

* do not evict too frequently if not needed

* fix aral metrics

* fix aral metrics again

* accurate accounting of memory for dictionaries, strings, labels and MRG

* log during shutdown the progress of dbengine flushing

* move metasync shutdown after dbengine

* max iterations per I/O events

* max iterations per I/O events - break the loop

* max iterations per I/O events - break the loop - again

* disable inline evictions for all caches

* when writing to sockets, send everything that can be sent

* cleanup code to trigger evictions

* fix calculation of eviction size

* fix calculation of eviction size once more

* fix calculation of eviction size once more - again

* ml and replication stop while backfilling is running

* process opcodes while draining the sockets; log with limit when asking to disconnect a node

* fix log

* ml stops when replication queries are running

* report pgd_padding to pulse

* aral precise memory accounting (an accounting sketch follows this list)

* removed all alignas() and fixed the 2 issues that resulted in unaligned memory accesses (one in mqtt and another in streaming)

* remove the bigger sizes from PGD, but keep multiples of gorilla buffers

* exclude judy from sanitizers

* use 16 bytes alignment on 32 bit machines

* internal check about memory alignment (an alignment-check sketch follows this list)

* experiment: do not allow more children to connect while backfilling or replication queries are running

* when the node is initializing, retry in 30 seconds

* connector cleanup and isolation of control logic about enabling/disabling various parts

* stop also health queries while backfilling is running

* tuning

* drain the input

* improve interactivity when suspending

* more interactive stream_control

* debug logs to find the connection issue

* abstracted everything about stream control (a stream-control sketch follows this list)

* Add ml_host_{start,stop} again.

* Do not create/update anomaly-detection charts when ML is not running for a host.

* rrdhost flag RECEIVER_DISCONNECTED has been reversed to COLLECTOR_ONLINE and is now used for localhost and virtual hosts too, to have a single point of truth about whether collected data is available

* ml_host_start() and ml_host_stop() are used by streaming receivers; ml_host_start() is used for localhost and virtual hosts

* fixed typo

* allow up to 3 backfills at a time

* add throttling based on user queries

* restore cache line paddings

* unify streaming logs to make them easier to grep

* tuning of stream_control

* more logs unification

* use mallocz_release_as_much_memory_to_the_system() under extreme conditions

* do not rely on the response code of evict_pages()

* log the gap of the database every time a node is connected

* updated ram requirements
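
Several bullets above (ML, replication and health queries stopping while backfilling runs, "allow up to 3 backfills at a time", throttling on user queries) describe a central "stream control" gate that expensive subsystems consult before doing work. Below is a minimal sketch of that idea using C11 atomics; all names (stream_control_*, MAX_CONCURRENT_BACKFILLS) are hypothetical and not Netdata's actual API.

```c
// Sketch of a "stream control" gate: subsystems check these predicates
// before doing expensive work. Hypothetical names, not Netdata's API.
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CONCURRENT_BACKFILLS 3   // "allow up to 3 backfills at a time"

static _Atomic int backfills_running = 0;
static _Atomic int user_queries_running = 0;

// a backfill may start only if we are below the concurrency limit
static bool stream_control_backfill_acquire(void) {
    int expected = atomic_load(&backfills_running);
    while (expected < MAX_CONCURRENT_BACKFILLS) {
        if (atomic_compare_exchange_weak(&backfills_running, &expected, expected + 1))
            return true;  // on failure, expected is reloaded and the loop retries
    }
    return false;
}

static void stream_control_backfill_release(void) {
    atomic_fetch_sub(&backfills_running, 1);
}

// ML training backs off while children are being backfilled or users are querying
static bool stream_control_ml_should_run(void) {
    return atomic_load(&backfills_running) == 0
        && atomic_load(&user_queries_running) == 0;
}

// health (and replication) queries back off while backfilling is in progress
static bool stream_control_health_should_run(void) {
    return atomic_load(&backfills_running) == 0;
}
```

The real code is certainly more involved; this only illustrates the pause/limit mechanics the commit messages describe.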
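The alignment bullets ("use 16 bytes alignment on 32 bit machines", "internal check about memory alignment") can be illustrated with a small runtime assertion. The macro and constant names below are made up for illustration; only the 16-byte figure comes from the commit message.

```c
// Illustrative only: a debug-time check that allocator results meet the
// alignment the rest of the code relies on. Names are hypothetical.
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// 16 bytes even on 32-bit machines, per the commit message
#define ARAL_REQUIRED_ALIGNMENT 16

#define INTERNAL_CHECK_ALIGNED(ptr) \
    assert(((uintptr_t)(ptr) % ARAL_REQUIRED_ALIGNMENT) == 0)

int main(void) {
    // aligned_alloc() is C11; the size must be a multiple of the alignment
    void *p = aligned_alloc(ARAL_REQUIRED_ALIGNMENT, 1024);
    INTERNAL_CHECK_ALIGNED(p);
    free(p);
    return 0;
}
```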
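"Track aral malloc and mmap allocations separately" and "aral precise memory accounting" amount to keeping separate counters per allocation source so they can be reported as distinct metrics. A minimal sketch, with hypothetical structure and function names:

```c
// Sketch: account aral memory separately for malloc()-backed and
// mmap()-backed pages. Hypothetical names, not Netdata's API.
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct aral_stats {
    _Atomic size_t malloc_bytes;   // pages allocated with malloc()
    _Atomic size_t mmap_bytes;     // big pages allocated with mmap()
};

static struct aral_stats aral_stats = { 0 };

static void aral_account_alloc(size_t bytes, bool used_mmap) {
    if (used_mmap) atomic_fetch_add(&aral_stats.mmap_bytes, bytes);
    else           atomic_fetch_add(&aral_stats.malloc_bytes, bytes);
}

static void aral_account_free(size_t bytes, bool used_mmap) {
    if (used_mmap) atomic_fetch_sub(&aral_stats.mmap_bytes, bytes);
    else           atomic_fetch_sub(&aral_stats.malloc_bytes, bytes);
}
```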

---------

Co-authored-by: vkalintiris <vasilis@netdata.cloud>
2024-12-11 18:02:17 +02:00
Costa Tsaousis
9ecf021ec2
Streaming improvements ()
* prefer tinysleep over yielding the processor (a spinlock sketch follows this list)

* split spinlocks to separate files

* rename spinlock initializers

* Optimize ML queuing operations.

- Allocate 25% of cores for ML.
- Split queues by request type.
- Accurate stats for queue operations by type.

* abstracted the circular buffer into a new private structure to enable using it on the receiver's sending side - no features added yet, only abstracted the existing functionality - not tested yet

* completed the abstraction of stream circular buffer

* unified list of receivers and senders; opcodes now support both receivers and senders

* use strings in pluginsd

* stream receivers send data back to the child using the event loop

* do not share pgc aral between caches

* pgc uses 4 to 256 partitions, by default equal to the number of CPU cores

* add forgotten worker job

* workers now monitor spinlock contention

* stream sender tries to lock the sender, but does not wait for it - it will be handled later

* increase the number of web server threads to the number of cpu cores, with a minimum of 6

* use the nowait versions of nd_sock functions

* handle EAGAIN properly (a socket-draining sketch follows this list)

* add spinlock contention tracing for rw_spinlock

* aral lock/unlock contention tracing

* allocate the compressed buffer

* use 128KiB for aral default page size; limit memory protection to 5GiB

* aral uses mmap() for big pages

* enrich log messages

* renamed telemetry to pulse

* unified sender and receiver socket event loops

* logging improvements

* NETDATA_LOG_STREAM_SENDER logs inbound and outbound traffic

* 16k receiver buffer size to improve interactivity

* fix NETDATA_LOG_STREAM_SENDER in sender_execute

* do not stream ML models for charts and dimensions that have not been exposed

* add support for sending QUIT to plugins and waiting for some time for them to quit gracefully

* global spinlock contention per function

* use an aral per pgc partition; use 8 partitions for PGD

* rrdcalc: do not change the frequency of alerts - replication uses arbitrary update frequencies, and adopting them would permanently change the alert frequency
replication: use 1/3 of the cores or 1 core for every 10 nodes (whichever is smaller)
pgd: use as many aral partitions as there are CPU cores, up to 256

* aral does 1 allocation per page (the structure and the elements together) instead of two - see the page-layout sketch after this list

* use the evictor thread only when we run out of memory; restore the optimization about prepending or appending clean pages based on their accesses; use the main cache's free memory for the other caches, reducing I/O when the main cache has enough room

* reduce the number of events per poll() to 10

* aral allocates pages of up to 1MiB; restore processing 100 events per nd_poll() call

* drain the sockets while reading

* receiver sockets should be non-blocking

* add stability detector to aral

* increase the receivers send buffer

* do not remove the sender or the receiver while we drain the input sockets
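
"Prefer tinysleep over yielding the processor" together with "workers now monitor spinlock contention" suggests a spin loop that naps very briefly instead of calling sched_yield(), while counting contended acquisitions. A sketch under those assumptions; the names are not Netdata's actual spinlock API.

```c
// Sketch: test-and-set spinlock that sleeps ~1us instead of yielding,
// and counts how often callers found the lock already taken.
// Hypothetical names, not the actual Netdata implementation.
#include <stdatomic.h>
#include <time.h>

typedef struct {
    atomic_flag locked;
    _Atomic unsigned long contentions;  // can be exported to workers/pulse
} spinlock_t;

static inline void tinysleep(void) {
    // nap for ~1 microsecond instead of sched_yield()
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000 };
    nanosleep(&ts, NULL);
}

static inline void spinlock_lock(spinlock_t *sl) {
    if (!atomic_flag_test_and_set_explicit(&sl->locked, memory_order_acquire))
        return;  // fast path: uncontended

    atomic_fetch_add_explicit(&sl->contentions, 1, memory_order_relaxed);
    do {
        tinysleep();
    } while (atomic_flag_test_and_set_explicit(&sl->locked, memory_order_acquire));
}

static inline void spinlock_unlock(spinlock_t *sl) {
    atomic_flag_clear_explicit(&sl->locked, memory_order_release);
}
```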
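"Use the nowait versions of nd_sock functions", "handle EAGAIN properly" and "drain the sockets while reading" describe the standard non-blocking socket pattern: keep reading until the kernel buffer is empty and treat EAGAIN/EWOULDBLOCK as "come back on the next poll event". A generic POSIX sketch; the ingest callback is made up and the buffer handling is simplified.

```c
// Sketch: drain a non-blocking socket completely on every poll event.
// Returns 0 when the socket is drained, -1 when the peer closed or errored.
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef void (*ingest_cb)(const char *data, ssize_t len);  // hypothetical

static int drain_socket(int fd, ingest_cb ingest) {
    char buf[65536];

    for (;;) {
        ssize_t r = recv(fd, buf, sizeof(buf), 0);

        if (r > 0) {
            ingest(buf, r);          // hand the bytes to the parser
            continue;                // keep draining until EAGAIN
        }

        if (r == 0)
            return -1;               // peer closed the connection

        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                // kernel buffer is empty - drained

        if (errno == EINTR)
            continue;                // interrupted by a signal, retry

        return -1;                   // real error
    }
}
```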
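"aral does 1 allocation per page (the structure and the elements together), instead of two" can be read as allocating the page header and the element storage with a single call, for example via a flexible array member. A sketch with hypothetical types; the real page layout is an assumption here.

```c
// Sketch: one malloc() per aral page - the page header and the element
// slots share the same allocation. Hypothetical structures.
#include <stdlib.h>
#include <stddef.h>

struct aral_page {
    size_t element_size;
    size_t elements;
    size_t used;
    unsigned char data[];            // flexible array member: the slots
};

static struct aral_page *aral_page_create(size_t element_size, size_t elements) {
    // single allocation: header + elements together, instead of two
    struct aral_page *p = malloc(sizeof(*p) + element_size * elements);
    if (!p) return NULL;

    p->element_size = element_size;
    p->elements = elements;
    p->used = 0;
    return p;
}

static void *aral_page_get(struct aral_page *p) {
    if (p->used >= p->elements) return NULL;          // page is full
    // note: real code would also keep each slot suitably aligned
    return &p->data[p->used++ * p->element_size];
}
```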

---------

Co-authored-by: vkalintiris <vasilis@netdata.cloud>
2024-12-09 02:37:44 +02:00
Renamed from src/daemon/telemetry/telemetry-aral.c