netdata_netdata

mirror of https://github.com/netdata/netdata.git synced 2025-03-31 11:45:06 +00:00

Costa Tsaousis bc840a7994 DBENGINE: pgc tuning, replication tuning (#19237)	2024-12-29 20:22:24 +02:00

* evict, a page at a time
* 4 replication ahead requests per replication thread
* added per job average timings for workers and dbengine query router
* debug statement to find what is slow
* yield the processor to avoid monopolizing the cache
* test more page sizes in aral
* more polite journal v2 indexing
* pulse macros for atomics
* added profile so that users can control the defaults of the agent
* fix windows warnings; journal v2 generation yields the processor for every page
* replication threads are 1/3 of the cores and they are synchronous
* removed the default from the list of profiles
* turn pgc locks into macros to have tracing on the functions that called them
* log the size of madvise() when failing
* more work on profiles
* restore batch cache evictions, but lower the batch size significantly
* do not spin while searching for pages in the cache - handle currently being deleted pages within the search logic itself
* remove bottleneck in epdl processing while merging extents
* allocate outside the lock
* rw spinlock implemented without spinlocks; both spinlocks and r/w spinlocks now support exponential backoff while waiting
* apply max sleep to spinlocks
* tune replication
* r/w spinlock prefers writers again, but this time recursive readers are bypassing the writer wait
* tuning of rw spinlock
* more tuning of the rw spinlock
* configure glibc arenas based on profile
* moving global variables into nd_profile
* do not accept sockets that have not received any data; once sockets with data have been accepted, check they are not closed already before processing them
* poll_events is now using nd_poll(), resulting in vast simplification of the code; static web files are now served inline resulting in another simplification of the web server logic (was required because epoll does not support normal files)
* startup fixes
* added debug info to poll_fd_close()
* closed sockets are automatically removed from epoll(), by the kernel
* fix for mrg acquired and referenced going negative
* fixed bug in mrg cleanup, not deleting metrics that do not have retention
* monitor strings index size
* strings memory chart is now stacked
* replication: do not lock data collection when running in batches
* explicitly set socket flags for sender and receiver
* normalize the event loop for sending data (receiver and sender)
* normalize the event loop for receiving data (receiver and sender)
* check all sender nodes every half a second
* fix bug on sender, not enabling sending
* first cleanup then destroy
* normalize nd_poll() to handle all possible events
* cleanup
* normalize socket helper functions
* fixed warnings on alpine
* fix for POLLRDHUP missing
* fix cleanup on shutdown
* added detailed replication summary
* moved logs to INFO
* prevent crash when sender is not there
* madvise_dontfork() should not be used with aral, madvise_dontdump() is only used for file backed maps
* fix wording
* fix log wording
* split replication receiver and sender; add logs to find missing replication requests
* fix compilation
* fixed bug in backfilling, having garbage for counters - malloc instead of calloc
* backfilling logs if it misses callbacks
* log replication rcv and replication snd in node info
* remove contention from aral_page_free_lock(), but having 2 free lists per page, one for incoming and another for available items and moving incoming to available when the available is empty - this allows aral_mallocz() and aral_freez() to operate concurrently on the same page
* fix internal checks
* log errors for all replication receiver exceptions
* removed wrong error log
* prevent health crashing
* cleanup logs that are irrelevant with the missing replication events
* replication tracking: added replication tracking to figure out how replication missed requests
* fix compilation and fix bug on spawn server cleanup calling uv_shutdown at exit
* merged receiver initialization
* prevent compilation warnings
* fix race condition in nd_poll() returning events for deleted fds
* for user queries, prepare as many queries as half the processors
* fix log
* add option dont_dump to netdata_mmap and aral_create
* add logging missing receiver and sender charts
* reviewed judy memory accounting; abstracted flags handling to ensure they all work the same way; introduced atomic_flags_set_and_clear() to set and clear atomic flags with a single atomic operation
* improvement(go.d/nats): add server_id label (#19280)
* Regenerate integrations docs (#19281) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com>
* [ci skip] Update changelog and version for nightly build: v2.1.0-30-nightly.
* docs: improve on-prem troubleshooting readability (#19279)
* docs: improve on-prem troubleshooting readability
* Apply suggestions from code review --------- Co-authored-by: Fotis Voutsas <fotis@netdata.cloud>
* improvement(go.d/nats): add leafz metrics (#19282)
* Regenerate integrations docs (#19283) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com>
* [ci skip] Update changelog and version for nightly build: v2.1.0-34-nightly.
* fix go.d/nats tests (#19284)
* improvement(go.d/nats): add basic jetstream metrics (#19285)
* Regenerate integrations docs (#19286) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com>
* [ci skip] Update changelog and version for nightly build: v2.1.0-38-nightly.
* bump dag req jinja version (#19287)
* more strict control on replication counters
* do not flush the log files - to cope with the rate
* [ci skip] Update changelog and version for nightly build: v2.1.0-40-nightly.
* fix aral on windows
* add waiting queue to sender commit, to allow the streaming thread go fast and put replication threads in order
* use the receiver tid
* fix(netdata-updater.sh): remove commit_check_file directory (#19288)
* receiver now has periodic checks too (like the senders have)
* fixed logs
* replication periodic checks: resending of chart definitions
* strict checking on rrdhost state id
* replication periodic checks: added for receivers
* shorter replication status messages
* do not log about ieee754
* receiver logs replication traffic without RSET
* object state: rrdhost_state_id has become object_state in libnetdata so that it can be reused
* fixed metadata; added journal message id for netdata fatal messages
* replication: undo bypassing the pipeline
* receiver cleanup: free all structures at the end, to ensure there are not crashes while cleaning up
* replication periodic checks: do not run it on receivers, when there is replication in progress
* nd_log: prevent fatal statements from recursing
* replication tracking: disabled (compile time)
* fix priority and log
* disconnect on stale replication - detected on both sender and receiver
* update our tagline
* when sending data from within opcode handling do not remove the receiver/sender
* improve interactivity of streaming sockets
* log the replication cmd counters on disconnect and reset them on reconnect
* rrdhost object state activate/deactivate should happen in set/clear receiver
* remove writer preference from rw spinlocks
* show the value in health logs
* move counter to the right place to avoid double counting replication commands
* do not run opcodes when running inline
* fix replication log messages
* make IoT harmless for the moment

---------

Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Co-authored-by: Netdata bot <43409846+netdatabot@users.noreply.github.com>
Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com>
Co-authored-by: netdatabot <bot@netdata.cloud>
Co-authored-by: Fotis Voutsas <fotis@netdata.cloud>
data_structures	DBENGINE: pgc tuning, replication tuning (#19237)	2024-12-29 20:22:24 +02:00
docs	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
build.sh	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
config.puml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
ephemeral-nodes-two-parents.xml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
netdata-for-ephemeral-nodes.xml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
netdata-overview.xml	Updated copyright notices (#19256)	2024-12-20 15:25:45 +02:00
netdata-proxies-example.xml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
registry.puml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
simple-parent-child-no-cloud.xml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
simple-parent-child.xml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00
windows.xml	Move diagrams/ under docs/ (#16998)	2024-02-12 16:58:26 +02:00