Mirror of https://github.com/netdata/netdata.git, synced 2025-04-15 10:04:15 +00:00
9 commits
5f72d4279b
Streaming improvements No 3 (#19168)
* ML uses synchronous queries
* do not call malloc_trim() to free memory, since it locks everything
* Reschedule dimensions for training from worker threads.
* when we collect or read from the database, it is SAMPLES; when we generate points for a chart, it is POINTS
* keep the receiver send buffer 10x the default
* support autoscaling stream circular buffers
* nd_poll() prefers sending data vs receiving data - in an attempt to dequeue as soon as possible
* fix last commit
* allow removing receiver and senders inline, if the stream thread is not working on them
* fix logs
* Revert "nd_poll() prefers sending data vs receiving data - in an attempt to dequeue as soon as possible"
This reverts commit
9ecf021ec2
Streaming improvements #1 (#19137)
* prefer tinysleep over yielding the processor
* split spinlocks to separate files
* rename spinlock initializers
* Optimize ML queuing operations.
  - Allocate 25% of cores for ML.
  - Split queues by request type.
  - Accurate stats for queue operations by type.
* abstracted circular buffer into a new private structure to enable using it in receiver sending side - no features added yet, only abstracted the existing functionality - not tested yet
* completed the abstraction of stream circular buffer
* unified list of receivers and senders; opcodes now support both receivers and senders
* use strings in pluginsd
* stream receivers send data back to the child using the event loop
* do not share pgc aral between caches
* pgc uses 4 to 256 partitions, by default equal to the number of CPU cores
* add forgotten worker job
* workers now monitor spinlock contention
* stream sender tries to lock the sender, but does not wait for it - it will be handled later
* increase the number of web server threads to the number of cpu cores, with a minimum of 6
* use the nowait versions of nd_sock functions
* handle EAGAIN properly
* add spinlock contention tracing for rw_spinlock
* aral lock/unlock contention tracing
* allocate the compressed buffer
* use 128KiB for aral default page size; limit memory protection to 5GiB
* aral uses mmap() for big pages
* enrich log messages
* renamed telemetry to pulse
* unified sender and receiver socket event loops
* logging improvements
* NETDATA_LOG_STREAM_SENDER logs inbound and outbound traffic
* 16k receiver buffer size to improve interactivity
* fix NETDATA_LOG_STREAM_SENDER in sender_execute
* do not stream ML models for charts and dimensions that have not been exposed
* add support for sending QUIT to plugins and waiting for some time for them to quit gracefully
* global spinlock contention per function
* use an aral per pgc partition; use 8 partitions for PGD
* rrdcalc: do not change the frequency of alerts - it uses arbitrary values used during replication, changing permanently the frequency of alerts
  - replication: use 1/3 of the cores or 1 core every 10 nodes (min of the two)
  - pgd: use as many aral partitions as the CPU cores, up to 256
* aral does 1 allocation per page (the structure and the elements together), instead of two
* use the evictor thread only when we run out of memory; restore the optimization about prepending or appending clean pages based on their accesses; use the main cache free memory for the other caches, reducing I/O when the main cache has enough room
* reduce the number of events per poll() to 10
* aral allocates pages of up to 1MiB; restore processing 100 events per nd_poll() call
* drain the sockets while reading
* receiver sockets should be non-blocking
* add stability detector to aral
* increase the receivers send buffer
* do not remove the sender or the receiver while we drain the input sockets
---------
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
6b8c6baac2
Balance streaming parents (#18945)
* recreate the circular buffer from time to time
* do not update cloud url if the node id is not updated
* remove deadlock and optimize pipe size
* removed const
* finer control on randomized delays
* restore children re-connecting to parents
* handle partial pipe reads; sender_commit() now checks if the sender is still connected to avoid bombarding it with data that cannot be sent
* added commented code about optimizing the array of pollfds
* improve interactivity of sender; code cleanup
* do not use the pipe for sending messages, instead use a queue in memory (that can never be full)
* fix dictionaries families
* do not destroy aral on replication exit - it crashes the senders
* support multiple dispatchers and connectors; code cleanup
* more cleanup
* Add serde support for KMeans models.
  - Serialization/Deserialization support of KMeans models.
  - Send/receive ML models between a child/parent.
  - Fix some rare and old crash reports.
  - Reduce allocations by a couple thousand per second when training.
  - Enable ML statistics temporarily which might increase CPU consumption.
* fix ml models streaming
* up to 10 dispatchers and 2 connectors
* experiment: limit the number of receivers to the number of cores - 2
* reworked compression at the receiver to minimize read operations
* multi-core receivers
* use slot 0 on receivers
* use slot 0 on receivers
* use half the cores for receivers with a minimum of 4
* cancel receiver threads
* use offsets instead of pointers in the compressed buffer; track last reads
* fix crash on using freed decompressor; core re-org
* fix incorrect job registration
* fix send_to_plugin() for SSL
* add reason to disconnect message
* fix signaling receivers to stop
* added --dev option to netdata-installer.sh to prevent it from removing the build directory
* Fix serde of double values.
  NaNs and +/- infinities are encoded as strings.
* unused param
* reset max cbuffer size when it is recreated
* struct receiver_state is now private
* 1 dispatcher, 1 connector, 2/3 cores for receivers
* all replication requests are served by replication threads - never the dispatcher threads
* optimize partitions and cache lines for dbengine cache
* fix crash on receiver shutdown
* rw spinlock now prioritizes writers
* backfill all higher tiers
* extent cache to 10%
* automatic sizing of replication threads
* add more replication threads
* configure cache eviction parameters to avoid running in aggressive mode all the time
* run evictions and flushes every 100ms
* add missing initialization
* add missing initialization - again
* add evictors for all caches
* add dedicated evict thread per cache
* destroy the completion
* avoid sending too many signals to eviction threads
* alternative way to make sure there are data to evict
* measure inline cache events
* disable inline evictions and flushing for open and extent cache
* use a spinlock to avoid sending too many signals
* batch evictions are not in steps of pages
* fix wanted cache size when there are no clean entries in it
* fix wanted cache size when there are no clean entries in it
* fix wanted cache size again
* adaptive batch evictions; batch evictions first try all partitions
* move waste events to waste chart
* added evict_traversed
* evict in smaller steps
* removed obsolete code
* disabled inlining of evictions and flushing; added timings for evictions
* more detailed timings for evictions
* use inline evictors
* use aral for gorilla pages of 512 bytes, when they are loaded from disk
* use aral for all gorilla page sizes loaded from disk
* disable inlining again to test it after the memory optimization
* timings for dbengine evictions
* added timing names
* detailed timings
* detailed timings - again
* removed timings and restored inline evictions
* eviction on release only under critical pressure
* cleanup and replication tuning
* tune cache size calculation
* tune replication threads calculation
* make streaming receiver exit
* Do not allocate/copy extent data twice.
* Build/link mimalloc. Just for testing, it will be reverted.
* lower memory requirements
* Link mimalloc statically
* run replication with synchronous queries
* added missing worker jobs in sender dispatcher
* enable batch evictions in pgc
* fix sender-dispatcher workers
* set max dispatchers to 2
* increase the default replication threads
* log stream_info errors
* increase replication threads
* log the json text when we fail to parse json response of stream_info
* stream info response may come back in multiple steps
* print the socket error of stream info
* added debug to stream info socket error
* loop while content-length is smaller than the payload received
* Revert "Link mimalloc statically"
This reverts commit
fc5605da7f
Adjust max possible extent size (#18960)
* Adjust max possible extent size
* Simplify calculation
  Also fix have_read_error
6bcfc4972b
Fix warnings (#17940)
fe06e8495f
Windows Support Phase 1 (#17497)
* abstraction layer for O/S
* updates
* updates
* updates
* temp fix for protobuf
* emulated waitid()
* fix
* fix
* compatibility layer
* fix for idtype
* fix for missing includes
* fix for missing includes
* added missing includes
* added missing includes
* added missing includes
* added missing includes
* added missing includes
* added missing includes
* UUID renamed to ND_UUID to avoid conflict with windows.h
* include libnetdata.h always - no conflicts
* simplify abstraction headers
* fix missing functions
* fix missing functions
* fix missing functions
* fix missing functions
* rename MSYS to WINDOWS
* moved byteorder.h
* structure for an internal windows plugin
* 1st windows plugin
* working plugin
* fix printf
* Special case windows for protobuf
* remove cygwin, compile both as windows
* log windows libraries used
* fix cmake
* fix protobuf
* compilation
* updated compilation script
* added system.ram
* windows uptime
* perflib
* working perflibdump
* minify dump
* updates to windows plugins, enable ML
* minor compatibility fixes for cygwin and msys
* perflib-dump to its own file
* perflib now indexes names
* improvements to the library; disks module WIP
* API for selectively traversing the metrics
* first working perflib chart: disk.space
* working chart on logical and physical disks
* added windows protocols
* fix datatypes for loops
* tinysleep for native smallest sleep support
* remove libuuid dependency on windows
* fix uuid functions for macos compilation
* fix uuid comparison function
* do not overwrite uuid library functions, define them as aliases to our own
* fixed uuid_unparse functions
* fixed typo
* added perflib processor charts
* updates for compiling without posix emulation
* gather common contexts together
* fix includes on linux
* perflib-memory
* windows mem.available
* Update variable names for protobuf
* network traffic
* add network adapters that have traffic as virtual interfaces
* add -pipe to windows compilation
* reset or overflow flag is now per dimension
* dpc is now counted separately
* verified all perflib fields are processed and no text fields are present in the data
* more common contexts
* fix crash
* do not add system.net multiple times
* install deps update and shortcut
* all threads are now joinable behind the scenes
* fix threads cleanup
* prepare for abstracting threads API
* netdata threads full abstraction from pthreads
* more threads abstraction and cleanup
* more compatibility changes
* fix compiler warnings
* add base-devel to packages
* removed duplicate base-devel
* check for strndup
* check headers in quotes
* fix linux compilation
* fix attribute gnu_printf on macos
* fix for threads on macos
* mingw64 compatibility
* enable compilation on windows clion
* added instructions
* enable cloud
* compatibility fixes
* compatibility fixes
* compatibility fixes
* clion works on windows
* support both MSYSTEM=MSYS and MSYSTEM=MINGW64 for configure
* cleanup and docs
* rename uuid_t to nd_uuid_t to avoid conflict with windows uuid_t
* leftovers uuid_t
* do not include uuid.h on macos
* threads signaled cancellations
* do not install v0 dashboard on windows
* script to install openssh server on windows
* update openssh installation script
* update openssh installation script
* update openssh installation script
* update openssh installation script
* update openssh installation script
* update openssh installation script
* update openssh installation script
* update openssh installation script
* update openssh installation script
* use cleanup variable instead of pthreads push and pop
* replace all calls to netdata_thread_cleanup_push() and netdata_thread_cleanup_pop() with __attribute__((cleanup(...)))
* remove left-over freez
* make sure there are no locks acquired at thread exit
* add missing parameter
* stream receivers and senders are now voluntarily cancelled
* plugins.d now voluntarily exits its threads
* uuid_t may not be aligned to word boundaries - fix the uuid_t functions to work on unaligned objects too.
* collectors evloop is now using the new threading cancellation; ml is now not using pthread_cancel; more fixes
* eliminate threads cancellability from the code base
* fix exit timings and logs; fix uv_threads tags
* use SSL_has_pending() only when it is available
* do not use SSL_has_pending()
* dyncfg files on windows escape colon and pipe characters
* fix compilation on older systems
* fix compilation on older systems
* Create windows installer.
  The installer will install everything under C:\netdata by default. It will:
  - Install msys2 at C:\netdata
  - Install netdata dependencies with pacman
  - Install the agent itself under C:\netdata\opt
  You can start the agent by running an MSYS shell with C:\netdata\msys2_shell.cmd and then start the agent normally with:
  /opt/netdata/usr/sbin/netdata -D
  There are a couple more things to work on:
  - Verify publisher.
  - Install all deps not just libuv & protobuf.
  - Figure out how we want to auto-start the agent as a service.
  - Check how to uninstall things.
* fixed typo
* code cleanup
* Create uninstaller
---------
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
f1c26d0e2b
DBENGINE: support ZSTD compression (#17244)
* extract dbengine compression to separate files
* added ZSTD support in dbengine
* automatically select best compression
* handle decompression errors
* eliminate fatals from compression algorithms; fallback to uncompressed pages if compression fails or generates bigger data
* have the unit test generate many data files
00f897a883
Code cleanup (#17237)
* renames in dbengine
* remove leftovers from memory mode save and map
* fix docs about 3 tiers by default
* split linked-lists, bitmaps and storage-points from libnetdata.h
115d074a6c
Create a top-level directory to contain source code. (#16896)
* Move ML under src
* Move spawn under src
* Move cli/ under src/
* move registry/ under src/
* move streaming/ under src/
* Move claim under src. Update docs
* Move database/ under src/
* Move libnetdata/ under src/
* Update references to libnetdata
* Fix logsmanagement includes
* Update generated script path.
Renamed from database/engine/pdc.c