mirror of https://github.com/netdata/netdata.git synced 2025-04-17 03:02:41 +00:00
Commit graph

17 commits

Author SHA1 Message Date
thiagoftsm
3e3ff4bee8
Add Collector log () 2023-01-25 19:04:07 +00:00
Costa Tsaousis
368a26cfee
DBENGINE v2 ()
* count open cache pages referring to datafile

* eliminate waste flush attempts

* remove eliminated variable

* journal v2 scanning split functions

* avoid locking open cache for a long time while migrating to journal v2

* dont acquire datafile for the loop; disable thread cancelability while a query is running

* work on datafile acquiring

* work on datafile deletion

* work on datafile deletion again

* logs of dbengine should start with DBENGINE

* thread specific key for queries to check if a query finishes without a finalize

* page_uuid is not used anymore

* Cleanup judy traversal when building new v2
Remove not needed calls to metric registry

* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent

* disable checks for invalid time-ranges

* Remove type from page details

* report scanning time

* remove infinite loop from datafile acquire for deletion

* remove infinite loop from datafile acquire for deletion again

* trace query handles

* properly allocate array of dimensions in replication

* metrics cleanup

* metrics registry uses arrayalloc

* arrayalloc free should be protected by lock

* use array alloc in page cache

* journal v2 scanning fix

* datafile reference leaking hunting

* do not load metrics of future timestamps

* initialize reasons

* fix datafile reference leak

* do not load pages that are entirely overlapped by others

* expand metric retention atomically

* split replication logic in initialization and execution

* replication prepare ahead queries

* replication prepare ahead queries fixed

* fix replication workers accounting

* add router active queries chart

* restore accounting of pages metadata sources; cleanup replication

* dont count skipped pages as unroutable

* notes on services shutdown

* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file

* do not add pages we dont need to pdc

* time in range re-work to provide info about past and future matches

* finer control on the pages selected for processing; accounting of page related issues

* fix invalid reference to handle->page

* eliminate data collection handle of pg_lookup_next

* accounting for queries with gaps

* query preprocessing the same way the processing is done; cache now supports all operations on Judy

* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)

* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query

* finer gaps calculation; accounting of overlapping pages in queries

* fix gaps accounting

* move datafile deletion to worker thread

* tune libuv workers and thread stack size

* stop netdata threads gradually

* run indexing together with cache flush/evict

* more work on clean shutdown

* limit the number of pages to evict per run

* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction

* economies on flags for smaller page footprint; cleanup and renames

* eviction moves referenced pages to the end of the queue

* use murmur hash for indexing partition

* murmur should be static

* use more indexing partitions

* revert number of partitions to number of cpus

* cancel threads first, then stop services

* revert default thread stack size

* dont execute replication requests of disconnected senders

* wait more time for services that are exiting gradually

* fixed last commit

* finer control on page selection algorithm

* default stacksize of 1MB

* fix formatting

* fix worker utilization going crazy when the number is rotating

* avoid buffer full due to replication preprocessing of requests

* support query priorities

* add count of spins in spinlock when compiled with netdata internal checks

* remove prioritization from dbengine queries; cache now uses mutexes for the queues

* hot pages are now in sections judy arrays, like dirty

* align replication queries to optimal page size

* during flushing add to clean and evict in batches

* Revert "during flushing add to clean and evict in batches"

This reverts commit 8fb2b69d06.

* dont lock clean while evicting pages during flushing

* Revert "dont lock clean while evicting pages during flushing"

This reverts commit d6c82b5f40.

* Revert "Revert "during flushing add to clean and evict in batches""

This reverts commit ca7a187537.

* dont cross locks during flushing, for the fastest flushes possible

* low-priority queries load pages synchronously

* Revert "low-priority queries load pages synchronously"

This reverts commit 1ef2662ddc.

* cache uses spinlock again

* during flushing, dont lock the clean queue at all; each item is added atomically

* do smaller eviction runs

* evict one page at a time to minimize lock contention on the clean queue

* fix eviction statistics

* fix last commit

* plain should be main cache

* event loop cleanup; evictions and flushes can now happen concurrently

* run flush and evictions from tier0 only

* remove not needed variables

* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time

* added worker jobs in timer to find the slow part of it

* support fast eviction of pages when all_of_them is set

* revert default thread stack size

* bypass event loop for dispatching read extent commands to workers - send them directly

* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"

This reverts commit 2c08bc5bab.

* cache work requests

* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors

* publish flushed pages to open cache in the thread pool

* prevent eventloop requests from getting stacked in the event loop

* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c

* more rrdengine.c cleanup

* enable db rotation

* do not log when there is a filter

* do not run multiple migration to journal v2

* load all extents async

* fix wrong paste

* report opcodes waiting, works dispatched, works executing

* cleanup event loop memory every 10 minutes

* dont dispatch more work requests than the number of threads available

* use the dispatched counter instead of the executing counter to check if the worker thread pool is full

* remove UV_RUN_NOWAIT

* replication to fill the queues

* caching of extent buffers; code cleanup

* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation

* single transaction wal

* synchronous flushing

* first cancel the threads, then signal them to exit

* caching of rrdeng query handles; added priority to query target; health is now low prio

* add priority to the missing points; do not allow critical priority in queries

* offload query preparation and routing to libuv thread pool

* updated timing charts for the offloaded query preparation

* caching of WALs

* accounting for struct caches (buffers); do not load extents with invalid sizes

* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset

* also check if the expanded before is not the chart later updated time

* also check if the expanded before is not after the wall clock time of when the query started

* Remove unused variable

* replication to queue less queries; cleanup of internal fatals

* Mark dimension to be updated async

* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)

* disable pgc stress test, under an ifdef

* disable mrg stress test under an ifdef

* Mark chart and host labels, host info for async check and store in the database

* dictionary items use arrayalloc

* cache section pages structure is allocated with arrayalloc

* Add function to wakeup the aclk query threads and check for exit
Register function to be called during shutdown after signaling the service to exit

* parallel preparation of all dimensions of queries

* be more sensitive to enable streaming after replication

* atomically finish chart replication

* fix last commit

* fix last commit again

* fix last commit again again

* fix last commit again again again

* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication

* do not cancel start streaming; use high priority queries when we have locked chart data collection

* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered

* opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue

* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore

* Fix bad memory allocation / cleanup

* Cleanup ACLK sync initialization (part 1)

* Don't update metric registry during shutdown (part 1)

* Prevent crash when dashboard is refreshed and host goes away

* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down

* make ML work

* Fix compile without NETDATA_INTERNAL_CHECKS

* shutdown each ctx independently

* fix completion of quiesce

* do not update shared ML charts

* Create ML charts on child hosts.

When a parent runs a ML for a child, the relevant-ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.

The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.

* check new ml code

* first save the database, then free all memory

* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db

* Cleanup metadata thread (part 2)

* increase refcount before dispatching prep command

* Do not try to stop anomaly detection threads twice.

A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.

* Remove allocations when smoothing samples buffer

The number of dims per sample is always 1, ie. we are training and
predicting only individual dimensions.

* set the orphan flag when loading archived hosts

* track worker dispatch callbacks and threadpool worker init

* make ML threads joinable; mark ctx having flushing in progress as early as possible

* fix allocation counter

* Cleanup metadata thread (part 3)

* Cleanup metadata thread (part 4)

* Skip metadata host scan when running unittest

* unittest support during init

* dont use all the libuv threads for queries

* break an infinite loop when sleep_usec() is interrupted

* ml prediction is a collector for several charts

* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit

* worker_unregister() in netdata threads cleanup

* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h

* fixed ML issues

* removed engine2 directory

* added dbengine2 files in CMakeLists.txt

* move query plan data to query target, so that they can be exposed in jsonwrap

* uniform definition of query plan according to the other query target members

* event_loop should be in daemon, not libnetdata

* metric_retention_by_uuid() is now part of the storage engine abstraction

* unify time_t variables to have the suffix _s (meaning: seconds)

* old dbengine statistics become "dbengine io"

* do not enable ML resource usage charts by default

* unify ml chart families, plugins and modules

* cleanup query plans from query target

* cleanup all extent buffers

* added debug info for rrddim slot to time

* rrddim now does proper gap management

* full rewrite of the mem modes

* use library functions for madvise

* use CHECKSUM_SZ for the checksum size

* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query

* fix dbengine shutdown

* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently

* fine tune cache evictions

* dont initialize health if the health service is not running - prevent crash on shutdown while children get connected

* rename AS threads to ACLK[hostname]

* prevent re-use of uninitialized memory in queries

* use JulyL instead of JudyL for PDC operations - to test it first

* add also JulyL files

* fix July memory accounting

* disable July for PDC (use Judy)

* use the function to remove datafiles from linked list

* fix july and event_loop

* add july to libnetdata subdirs

* rename time_t variables that end in _t to end in _s

* replicate when there is a gap at the beginning of the replication period

* reset postponing of sender connections when a receiver is connected

* Adjust update every properly

* fix replication infinite loop due to last change

* packed enums in rrd.h and cleanup of obsolete rrd structure members

* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()

* void unused variable

* void unused variables

* fix indentation

* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps

* macros to calculate page entries by time and size

* prevent statsd cleanup crash on exit

* cleanup health thread related variables

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
2023-01-10 19:59:21 +02:00
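
The eviction items in the commit above ("eviction moves referenced pages to the end of the queue", "evict one page at a time to minimize lock contention on the clean queue") can be illustrated with a minimal single-threaded sketch. It is written in Python for brevity and is not the dbengine code: `Page`, `refcount` and `evict_one` are hypothetical names, and the locking the real cache needs is left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical cache page; not netdata's real page structure."""
    page_id: int
    refcount: int = 0  # > 0 means something (e.g. a query) still uses the page


def evict_one(clean_queue: deque) -> Optional[Page]:
    """Evict at most one unreferenced page, scanning from the front.

    Referenced pages met on the way are moved to the back of the queue,
    so the next run does not inspect them again immediately.
    """
    for _ in range(len(clean_queue)):
        page = clean_queue.popleft()
        if page.refcount == 0:
            return page           # evicted: the caller would free it
        clean_queue.append(page)  # still in use: retry on a later run
    return None                   # every page is currently referenced
```

Evicting a single page per call keeps each pass short, which is the property the commit message associates with lower contention on the clean queue.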
Costa Tsaousis
cb7af25c09
RRD structures managed by dictionaries ()
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired throughout netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* dont create variables if not initialized for health

* dont create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* dont keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock and automatically all the calls become their unsafe versions; all direct calls to unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* dont use index locks in single threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart id of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
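
Several items in the commit above refer to "acquired" dictionary items and to delayed destruction. A minimal sketch of that idea, assuming a simple per-item reference count, could look like the following. All names (`RefCountedDict`, `acquire`, `release`, `freed`) are hypothetical and do not mirror netdata's dictionary API. The sketch is single-threaded, whereas the commits describe atomic reference counters and locks.

```python
class Item:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.refcount = 0     # number of callers currently holding the item
        self.deleted = False  # removed from the index, waiting for release


class RefCountedDict:
    """Hypothetical sketch of acquire/release with delayed destruction."""

    def __init__(self):
        self._index = {}
        self.freed = []  # names of items whose memory would be released

    def set(self, name, value):
        item = self._index.get(name)
        if item is None:
            self._index[name] = Item(name, value)
        else:
            item.value = value  # existing item: update in place

    def acquire(self, name):
        item = self._index.get(name)
        if item is not None:
            item.refcount += 1
        return item

    def release(self, item):
        item.refcount -= 1
        if item.deleted and item.refcount == 0:
            self.freed.append(item.name)  # last holder gone: free now

    def delete(self, name):
        item = self._index.pop(name, None)
        if item is None:
            return False
        item.deleted = True
        if item.refcount == 0:
            self.freed.append(item.name)  # nobody holds it: free at once
        return True
```

A deleted item disappears from lookups immediately, but a caller that acquired it earlier can keep reading it until it calls `release`.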
Costa Tsaousis
5e1b95cf92
Deduplicate all netdata strings ()
* rrdfamily

* rrddim

* rrdset plugin and module names

* rrdset units

* rrdset type

* rrdset family

* rrdset title

* rrdset title more

* rrdset context

* rrdcalctemplate context and removal of context hash from rrdset

* strings statistics

* rrdset name

* rearranged members of rrdset

* eliminate rrdset name hash; rrdcalc chart converted to STRING

* rrdset id, eliminated rrdset hash

* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate

* rrdcalctemplate

* rrdvar

* eval_variable

* rrddimvar and rrdsetvar

* rrdhost hostname, os and tags

* fix master commits

* added thread cache; implemented string_dup without locks

* faster thread cache

* rrdset and rrddim now use dictionaries for indexing

* rrdhost now uses dictionary

* rrdfamily now uses DICTIONARY

* rrdvar using dictionary instead of AVL

* allocate the right size to rrdvar flag members

* rrdhost remaining char * members to STRING *

* better error handling on indexing

* strings now use a read/write lock to allow parallel searches to the index

* removed AVL support from dictionaries; implemented STRING with native Judy calls

* string releases should be negative

* only 31 bits are allowed for enum flags

* proper locking on strings

* string threading unittest and fixes

* fix lgtm finding

* fixed naming

* stream chart/dimension definitions at the beginning of a streaming session

* thread stack variable is undefined on thread cancel

* rrdcontext garbage collect per host on startup

* worker control in garbage collection

* relaxed deletion of rrdmetrics

* type checking on dictfe

* netdata chart to monitor rrdcontext triggers

* Group chart label updates

* rrdcontext better handling of collected rrdsets

* rrdpush incremental transmission of definitions should use as much buffer as possible

* require 1MB per chart

* empty the sender buffer before enabling metrics streaming

* fill up to 50% of buffer

* reset signaling metrics sending

* use the shared variable for status

* use separate host flag for enabling streaming of metrics

* make sure the flag is clear

* add logging for streaming

* add logging for streaming on buffer overflow

* circular_buffer proper sizing

* removed obsolete logs

* do not execute worker jobs if not necessary

* better messages about compression disabling

* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped

* monitor stream sender used buffer ratio

* Update exporting unit tests

* no need to compare label value with strcmp

* streaming send workers now monitor bandwidth

* workers now use strings

* streaming receiver monitors incoming bandwidth

* parser shift of worker ids

* minor fixes

* Group chart label updates

* Populate context with dimensions that have data

* Fix chart id

* better shift of parser worker ids

* fix for streaming compression

* properly count received bytes

* ensure LZ4 compression ring buffer does not wrap prematurely

* do not stream empty charts; do not process empty instances in rrdcontext

* need_to_send_chart_definition() does not need an rrdset lock any more

* rrdcontext objects are collected, after data have been written to the db

* better logging of RRDCONTEXT transitions

* always set all variables needed by the worker utilization charts

* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes

* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock

* STRING code re-organization for clarity

* thread_cache improvements; double numbers precision on worker threads

* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS

* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts

* Add index to speed up database context searches

* Removed last_updated optimization (was also buggy after latest merge with master)

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05 19:31:06 +03:00
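
The string deduplication described in the commit above amounts to interning: identical texts share one entry, and a reference count decides when the entry leaves the index. The sketch below shows the idea under stated assumptions. `StringPool`, `intern`, `dup` and `free` are hypothetical names, not netdata's STRING API, and a plain Python dict stands in for the Judy index. It is single-threaded, so it does not reproduce the lockless `string_dup()` / `string_freez()` design mentioned in the message.

```python
class InternedString:
    def __init__(self, text):
        self.text = text
        self.refcount = 1  # the creator holds the first reference


class StringPool:
    """Hypothetical interning pool with reference counting."""

    def __init__(self):
        self._index = {}  # text -> InternedString

    def intern(self, text):
        entry = self._index.get(text)
        if entry is None:
            entry = InternedString(text)
            self._index[text] = entry
        else:
            entry.refcount += 1  # same text: share the existing entry
        return entry

    def dup(self, entry):
        # Duplicating only bumps the counter; the index is not touched.
        entry.refcount += 1
        return entry

    def free(self, entry):
        entry.refcount -= 1
        if entry.refcount == 0:
            del self._index[entry.text]  # last reference: drop from index

    def __len__(self):
        return len(self._index)
```

Because equal texts resolve to the same object, comparing two interned strings reduces to an identity check, which fits the "no need to compare label value with strcmp" item in the list.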
Costa Tsaousis
7784a16cc7
Dictionary with JudyHS and double linked list ()
* dictionary internals isolation

* more dictionary cleanups

* added unit test

* we should use DICT internally

* disable cups in cmake

* implement DICTIONARY with Judy arrays

* operational JUDY implementation

* JUDY cleanup

* JUDY summary added

* JudyHS implementation with double linked list

* test negative searches too

* optimize destruction

* optimize set to insert first without lookup

* updated stats

* code cleanup; better organization; updated info

* more code cleanup and commenting

* more cleanup, renames and comments

* fix rename

* more cleanups

* use Judy.h from system paths

* added foreach traversal; added flag to add item in front; isolated locks to their own functions; destruction returns the number of bytes freed

* more comments; flags are now 16-bit

* completed unittesting

* addressed comments and added reference counters maintenance

* added unittest in main; tested removal of items in front, back and middle

* added read/write walkthrough and foreach; allowed walkthrough and foreach in write mode to delete the current element (used by cups.plugin); referenced counters removed from the API

* DICTFE.name should be const too

* added API calls for exposing all statistics

* dictionary flags as enum and reference counters as atomic operations

* more comments; improved error handling at unit tests

* added functions to allow unsafe access while traversing the dictionary with locks in place

* check for libcups in cmake

* added delete callback; implemented statsd with this dictionary

* added missing dfe_done()

* added alternative implementation with AVL

* added documentation

* added comments and warning about AVL

* dictionary walkthrough on new code

* simplified foreach; updated docs

* updated docs

* AVL is much faster without hashes

* AVL should follow DBENGINE
2022-06-01 20:01:52 +03:00
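
The structure named in the commit above, a hash index combined with a double linked list, can be sketched as follows. This is an illustration only: a Python dict stands in for JudyHS, and `LinkedDict`, `set`, `get`, `delete` and `items` are hypothetical names rather than the DICTIONARY API. It covers three behaviours listed in the message: inserting an item in front, removing items from the front, back or middle, and deleting the current element during a traversal.

```python
class Node:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.prev = None
        self.next = None


class LinkedDict:
    """Hypothetical hash index plus doubly linked list."""

    def __init__(self):
        self._index = {}  # stands in for the JudyHS index
        self._head = None
        self._tail = None

    def set(self, name, value, front=False):
        node = self._index.get(name)
        if node is not None:
            node.value = value  # existing name: overwrite the value
            return node
        node = Node(name, value)
        self._index[name] = node
        if self._head is None:
            self._head = self._tail = node
        elif front:
            node.next = self._head
            self._head.prev = node
            self._head = node
        else:
            node.prev = self._tail
            self._tail.next = node
            self._tail = node
        return node

    def get(self, name):
        node = self._index.get(name)
        return None if node is None else node.value

    def delete(self, name):
        node = self._index.pop(name, None)
        if node is None:
            return False
        if node.prev is None:
            self._head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self._tail = node.prev
        else:
            node.next.prev = node.prev
        return True

    def items(self):
        node = self._head
        while node is not None:
            nxt = node.next  # saved first, so the current node may be deleted
            yield node.name, node.value
            node = nxt
```

The index gives lookups by name, while the list preserves a traversal order that does not depend on the hash and makes unlinking an item a constant-time pointer update.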
Stelios Fragkakis
92d48b1778
Return stable or nightly based on version if the file check fails () 2022-05-13 12:48:53 +03:00
Stelios Fragkakis
e9d59e37d9
Migrate metadata log to SQLite () 2020-11-24 20:00:02 +02:00
Stelios Fragkakis
eda12f579f
Implemented multihost database ()
* Hard code a node for non-legacy multidb test
Skip dbengine initialization for new incoming children
Add code to switch to multidb ctx when accessing the dbengine

* When a non-legacy streaming connection is detected, use the multidb metadata log context

* Clear the superblock memory to avoid random data written in the metadata log

* Activate the host detection during compaction
Activate the host detection during metadata log chart updates
Keep the host in the user object during replay of the HOST command

* Add defaults for health / rrdpush on HOST metadata replay
Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()

Use defaults from the stream.conf

* Count hosts only if not archived
When host switches from archived to active update rrd_hosts_available
Remove archived hosts from charts and info

* Change parameter from "multidb disk space" to "dbengine multihost disk space"
Remove unused variables
Fix compilation error when dbengine is disabled
Fix condition for machine_guid directory creation under cache_dir

* Enable multidb disk space file creation.

* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.

* Fix old bug in the code that confused obsolete hosts with orphan hosts.

* Do not delete multi-host DB host files.

* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.

* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.

* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.

* Ignore archived hosts in streaming.

* Add documentation before merging to master.

Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-07-28 15:04:39 +03:00
Stelios Fragkakis
1bd8a25544
Add support for persistent metadata ()
* Implemented collector metadata logging 
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update

Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-06-12 10:35:17 +03:00
Andrew Moss
c6d945200f
Merging the feature branch for the ACLK in the previous sprint. ()
* ACLK connection and protocol improvements ()
* Adding ACLK retry on connection failure ()
* Fixed reconnect issues on the ACLK. ()
* Cleaning up ACLK - part 1 ()

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2020-02-24 12:10:10 +01:00
Vladimir Kobal
8cf5889194
Clean up host labels in API responses ()
* Remove host labels from the Swagger specification

* Remove host labels from the api responses
2020-01-06 17:34:49 +02:00
Andrew Moss
c8c72f18a6
Labels issues ()
Initial work on host labels from the dedicated branch. Includes work for issues , , , , , , ,  and  by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
2019-12-16 15:12:00 +01:00
Jacek Kolasa
788fbb219a
sidebar-info update - DB engine ()
* remove "Netdata is using # MB of memory on HOSTNAME for # hour, # minutes, and # seconds of real-time history."

* Added "memory_mode" key to the get "charts" API call

* don't show db engine tip when user has it already installed

* add back hostname information

* add oxford comma (only for db-engine users)

* update main.js hash

* <b> --> <strong> (but only in sidebar info, main.js)
2019-09-12 15:35:44 +02:00
Chris Akritidis
8f36f5bcee
info API minor enhancements
Return 503 instead of 400 when netdata hasn't started yet, move struct definitions in .c, swagger update ()
2019-05-02 13:04:15 +03:00
Chris Akritidis
ca95332d55
Extend netdata info API call ()
* Add array of collector plugins-modules to api/v1/info

* Add system info to api/v1/info, collect data from separate script, use environment vars in anonymous statistics script
2019-04-18 18:17:03 +03:00
Chris Akritidis
88c6daad79
Correct version check in UI ()
* Correct version check in UI. Support stable and nightly release channel.
* Use github releases instead of latest versions, get nightlies from GCS
* Prevent cross-origin errors by using the google API
2019-02-20 19:56:44 +01:00
Costa Tsaousis
798c141c49
Split the API formatters in modules ()
* split all API formatters in modules

* added markdown formatting

* updated csv readme

* updated csv readme

* more documentation

* added more documentation

* updated documentation

* fixed typo

* fixed typo
2018-10-27 19:44:27 +03:00