mirror of https://github.com/netdata/netdata.git synced 2025-04-06 22:38:55 +00:00

New logging layer ()

* cleanup of logging - wip

* first working iteration

* add errno annotator

* replace old logging functions with netdata_logger()

* cleanup

* update error_limit

* fix remaining error_limit references

* work on fatal()

* started working on structured logs

* full cleanup

* default logging to files; fix all plugins initialization

* fix formatting of numbers

* cleanup and reorg

* fix coverity issues

* cleanup obsolete code

* fix formatting of numbers

* fix log rotation

* fix for older systems

* add detection of systemd journal via stderr

* finished on access.log

* remove left-over transport

* do not add empty fields to the logs

* journal gets compact UUIDs; X-Transaction-ID header is added to web responses

* allow compiling on systems without memfd sealing

* added libnetdata/uuid directory

* move datetime formatters to libnetdata

* add missing files

* link the makefiles in libnetdata

* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now reads X-Transaction-ID and uses it for functions and web responses

* added stream receiver, sender, proc plugin and pluginsd log stack

* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup

* add message ids to streaming inbound and outbound connections

* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks are enabled

* send SIGABRT to external plugins only if we are not shutting down

* fix cross cleanup in pluginsd parser

* fatal when there is a stack error in logs

* compile netdata with -fexceptions

* do not kill external plugins with SIGABRT

* metasync info logs to debug level

* added severity to logs

* added json output; added options per log output; added documentation; fixed issues mentioned

* allow memfd only on linux

* moved journal low level functions to journal.c/h

* move health logs to daemon.log with proper priorities

* fixed a couple of bugs; health log in journal

* updated docs

* systemd-cat-native command to push structured logs to journal from the command line

* fix makefiles

* restored NETDATA_LOG_SEVERITY_LEVEL

* fix makefiles

* systemd-cat-native can also work as the logger of Netdata scripts

* do not require a socket to systemd-journal to log-as-netdata

* alarm notify logs in native format

* properly compare log ids

* fatals log alerts; alarm-notify.sh working

* fix overflow warning

* alarm-notify.sh now logs the request (command line)

* annotate external plugin logs with the function cmd they run

* added context, component and type to alarm-notify.sh; shell sanitization removes control characters and characters that may be expanded by bash

* reformatted alarm-notify logs

* unify cgroup-network-helper.sh

* added quotes around params

* charts.d.plugin switched logging to journal native

* quotes for logfmt

* unify the status codes of streaming receivers and senders

* alarm-notify: don't log anything if there is nothing to do

* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notification methods are needed but not available

* migrate cgroup-name.sh to new logging

* systemd-cat-native now supports messages with newlines

* socket.c logs use priority

* cleanup log field types

* inherit the INVOCATION_ID set by systemd, if found

* allow systemd-cat-native to send messages to a systemd-journal-remote URL

* log2journal command that can convert structured logs to journal export format

* various fixes and documentation of log2journal

* updated log2journal docs

* updated log2journal docs

* updated documentation of fields

* allow compiling without libcurl

* do not use socket as format string

* added version information to newly added tools

* updated documentation and help messages

* fix the namespace socket path

* print errno with error

* do not timeout

* updated docs

* updated docs

* updated docs

* log2journal updated docs and params

* when talking to a remote journal, systemd-cat-native batches the messages

* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote

* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"

This reverts commit b079d53c11.

* note about uncompressed traffic

* log2journal: code reorg and cleanup to make modular

* finished rewriting log2journal

* more comments

* rewriting rules support

* increased limits

* updated docs

* updated docs

* fix old log call

* use journal only when stderr is connected to journal

* update netdata.spec for libcurl, libpcre2 and log2journal

* pcre2-devel

* do not require pcre2 on CentOS < 8, Amazon Linux < 2023 and openSUSE

* build log2journal only on systems where pcre2 is available

* ignore log2journal in .gitignore

* avoid log2journal on centos 7, amazonlinux 2 and opensuse

* add pcre2-8 to static build

* undo last commit

* Bundle pcre2 into the static build

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add build deps for deb packages

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add dependencies; build from source

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Test build for Amazon Linux and CentOS; expected to fail for SUSE

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* fix minor oversight

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Reorg code

* Add the install from source (deps) as a TODO
* Do not enable the build on the SUSE ecosystem

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
Costa Tsaousis committed 2023-11-22 08:27:25 +00:00 (committed via GitHub)
parent 8f31356a0c
commit 3e508c8f95
120 changed files with 8651 additions and 3037 deletions
.gitignore
Makefile.am
aclk
claim
cli
collectors
configure.ac
contrib/debian
daemon
database
exporting/aws_kinesis
health
libnetdata

.gitignore

@ -41,6 +41,8 @@ sha256sums.txt
# netdata binaries
netdata
netdatacli
systemd-cat-native
log2journal
!netdata/
upload/
artifacts/


@ -128,6 +128,7 @@ AM_CFLAGS = \
$(OPTIONAL_CUPS_CFLAGS) \
$(OPTIONAL_XENSTAT_CFLAGS) \
$(OPTIONAL_BPF_CFLAGS) \
$(OPTIONAL_SYSTEMD_CFLAGS) \
$(OPTIONAL_GTEST_CFLAGS) \
$(NULL)
@ -145,12 +146,18 @@ LIBNETDATA_FILES = \
libnetdata/avl/avl.h \
libnetdata/buffer/buffer.c \
libnetdata/buffer/buffer.h \
libnetdata/buffered_reader/buffered_reader.c \
libnetdata/buffered_reader/buffered_reader.h \
libnetdata/circular_buffer/circular_buffer.c \
libnetdata/circular_buffer/circular_buffer.h \
libnetdata/clocks/clocks.c \
libnetdata/clocks/clocks.h \
libnetdata/completion/completion.c \
libnetdata/completion/completion.h \
libnetdata/datetime/iso8601.c \
libnetdata/datetime/iso8601.h \
libnetdata/datetime/rfc7231.c \
libnetdata/datetime/rfc7231.h \
libnetdata/dictionary/dictionary.c \
libnetdata/dictionary/dictionary.h \
libnetdata/eval/eval.c \
@ -167,8 +174,12 @@ LIBNETDATA_FILES = \
libnetdata/libnetdata.c \
libnetdata/libnetdata.h \
libnetdata/required_dummies.h \
libnetdata/line_splitter/line_splitter.c \
libnetdata/line_splitter/line_splitter.h \
libnetdata/locks/locks.c \
libnetdata/locks/locks.h \
libnetdata/log/journal.c \
libnetdata/log/journal.h \
libnetdata/log/log.c \
libnetdata/log/log.h \
libnetdata/onewayalloc/onewayalloc.c \
@ -195,6 +206,8 @@ LIBNETDATA_FILES = \
libnetdata/threads/threads.h \
libnetdata/url/url.c \
libnetdata/url/url.h \
libnetdata/uuid/uuid.c \
libnetdata/uuid/uuid.h \
libnetdata/json/json.c \
libnetdata/json/json.h \
libnetdata/json/jsmn.c \
@ -323,6 +336,16 @@ SYSTEMD_JOURNAL_PLUGIN_FILES = \
$(LIBNETDATA_FILES) \
$(NULL)
SYSTEMD_CAT_NATIVE_FILES = \
libnetdata/log/systemd-cat-native.c \
libnetdata/log/systemd-cat-native.h \
$(LIBNETDATA_FILES) \
$(NULL)
LOG2JOURNAL_FILES = \
libnetdata/log/log2journal.c \
$(NULL)
CUPS_PLUGIN_FILES = \
collectors/cups.plugin/cups_plugin.c \
$(LIBNETDATA_FILES) \
@ -1179,6 +1202,7 @@ NETDATA_COMMON_LIBS = \
$(OPTIONAL_MQTT_LIBS) \
$(OPTIONAL_UV_LIBS) \
$(OPTIONAL_LZ4_LIBS) \
$(OPTIONAL_CURL_LIBS) \
$(OPTIONAL_ZSTD_LIBS) \
$(OPTIONAL_BROTLIENC_LIBS) \
$(OPTIONAL_BROTLIDEC_LIBS) \
@ -1190,6 +1214,7 @@ NETDATA_COMMON_LIBS = \
$(OPTIONAL_YAML_LIBS) \
$(OPTIONAL_ATOMIC_LIBS) \
$(OPTIONAL_DL_LIBS) \
$(OPTIONAL_SYSTEMD_LIBS) \
$(OPTIONAL_GTEST_LIBS) \
$(NULL)
@ -1290,6 +1315,14 @@ if ENABLE_PLUGIN_FREEIPMI
$(NULL)
endif
if ENABLE_LOG2JOURNAL
sbin_PROGRAMS += log2journal
log2journal_SOURCES = $(LOG2JOURNAL_FILES)
log2journal_LDADD = \
$(OPTIONAL_PCRE2_LIBS) \
$(NULL)
endif
if ENABLE_PLUGIN_SYSTEMD_JOURNAL
plugins_PROGRAMS += systemd-journal.plugin
systemd_journal_plugin_SOURCES = $(SYSTEMD_JOURNAL_PLUGIN_FILES)
@ -1299,6 +1332,12 @@ if ENABLE_PLUGIN_SYSTEMD_JOURNAL
$(NULL)
endif
sbin_PROGRAMS += systemd-cat-native
systemd_cat_native_SOURCES = $(SYSTEMD_CAT_NATIVE_FILES)
systemd_cat_native_LDADD = \
$(NETDATA_COMMON_LIBS) \
$(NULL)
if ENABLE_PLUGIN_EBPF
plugins_PROGRAMS += ebpf.plugin
ebpf_plugin_SOURCES = $(EBPF_PLUGIN_FILES)


@ -154,7 +154,9 @@ biofailed:
static int wait_till_cloud_enabled()
{
netdata_log_info("Waiting for Cloud to be enabled");
nd_log(NDLS_DAEMON, NDLP_INFO,
"Waiting for Cloud to be enabled");
while (!netdata_cloud_enabled) {
sleep_usec(USEC_PER_SEC * 1);
if (!service_running(SERVICE_ACLK))
@ -236,14 +238,19 @@ void aclk_mqtt_wss_log_cb(mqtt_wss_log_type_t log_type, const char* str)
case MQTT_WSS_LOG_WARN:
error_report("%s", str);
return;
case MQTT_WSS_LOG_INFO:
netdata_log_info("%s", str);
nd_log(NDLS_DAEMON, NDLP_INFO,
"%s",
str);
return;
case MQTT_WSS_LOG_DEBUG:
netdata_log_debug(D_ACLK, "%s", str);
return;
default:
netdata_log_error("Unknown log type from mqtt_wss");
nd_log(NDLS_DAEMON, NDLP_ERR,
"Unknown log type from mqtt_wss");
}
}
@ -297,7 +304,9 @@ static void puback_callback(uint16_t packet_id)
#endif
if (aclk_shared_state.mqtt_shutdown_msg_id == (int)packet_id) {
netdata_log_info("Shutdown message has been acknowledged by the cloud. Exiting gracefully");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Shutdown message has been acknowledged by the cloud. Exiting gracefully");
aclk_shared_state.mqtt_shutdown_msg_rcvd = 1;
}
}
@ -335,9 +344,11 @@ static int handle_connection(mqtt_wss_client client)
}
if (disconnect_req || aclk_kill_link) {
netdata_log_info("Going to restart connection due to disconnect_req=%s (cloud req), aclk_kill_link=%s (reclaim)",
disconnect_req ? "true" : "false",
aclk_kill_link ? "true" : "false");
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"Going to restart connection due to disconnect_req=%s (cloud req), aclk_kill_link=%s (reclaim)",
disconnect_req ? "true" : "false",
aclk_kill_link ? "true" : "false");
disconnect_req = 0;
aclk_kill_link = 0;
aclk_graceful_disconnect(client);
@ -390,7 +401,9 @@ static inline void mqtt_connected_actions(mqtt_wss_client client)
void aclk_graceful_disconnect(mqtt_wss_client client)
{
netdata_log_info("Preparing to gracefully shutdown ACLK connection");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Preparing to gracefully shutdown ACLK connection");
aclk_queue_lock();
aclk_queue_flush();
@ -403,17 +416,22 @@ void aclk_graceful_disconnect(mqtt_wss_client client)
break;
}
if (aclk_shared_state.mqtt_shutdown_msg_rcvd) {
netdata_log_info("MQTT App Layer `disconnect` message sent successfully");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"MQTT App Layer `disconnect` message sent successfully");
break;
}
}
netdata_log_info("ACLK link is down");
netdata_log_access("ACLK DISCONNECTED");
nd_log(NDLS_DAEMON, NDLP_WARNING, "ACLK link is down");
nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK DISCONNECTED");
aclk_stats_upd_online(0);
last_disconnect_time = now_realtime_sec();
aclk_connected = 0;
netdata_log_info("Attempting to gracefully shutdown the MQTT/WSS connection");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Attempting to gracefully shutdown the MQTT/WSS connection");
mqtt_wss_disconnect(client, 1000);
}
@ -455,7 +473,9 @@ static int aclk_block_till_recon_allowed() {
next_connection_attempt = now_realtime_sec() + (recon_delay / MSEC_PER_SEC);
last_backoff_value = (float)recon_delay / MSEC_PER_SEC;
netdata_log_info("Wait before attempting to reconnect in %.3f seconds", recon_delay / (float)MSEC_PER_SEC);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Wait before attempting to reconnect in %.3f seconds", recon_delay / (float)MSEC_PER_SEC);
// we want to wake up from time to time to check netdata_exit
while (recon_delay)
{
@ -593,7 +613,9 @@ static int aclk_attempt_to_connect(mqtt_wss_client client)
return 1;
}
netdata_log_info("Attempting connection now");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Attempting connection now");
memset(&base_url, 0, sizeof(url_t));
if (url_parse(aclk_cloud_base_url, &base_url)) {
aclk_status = ACLK_STATUS_INVALID_CLOUD_URL;
@ -680,7 +702,9 @@ static int aclk_attempt_to_connect(mqtt_wss_client client)
error_report("Can't use encoding=proto without at least \"proto\" capability.");
continue;
}
netdata_log_info("New ACLK protobuf protocol negotiated successfully (/env response).");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"New ACLK protobuf protocol negotiated successfully (/env response).");
memset(&auth_url, 0, sizeof(url_t));
if (url_parse(aclk_env->auth_endpoint, &auth_url)) {
@ -750,9 +774,9 @@ static int aclk_attempt_to_connect(mqtt_wss_client client)
if (!ret) {
last_conn_time_mqtt = now_realtime_sec();
netdata_log_info("ACLK connection successfully established");
nd_log(NDLS_DAEMON, NDLP_INFO, "ACLK connection successfully established");
aclk_status = ACLK_STATUS_CONNECTED;
netdata_log_access("ACLK CONNECTED");
nd_log(NDLS_ACCESS, NDLP_INFO, "ACLK CONNECTED");
mqtt_connected_actions(client);
return 0;
}
@ -798,7 +822,9 @@ void *aclk_main(void *ptr)
netdata_thread_disable_cancelability();
#if defined( DISABLE_CLOUD ) || !defined( ENABLE_ACLK )
netdata_log_info("Killing ACLK thread -> cloud functionality has been disabled");
nd_log(NDLS_DAEMON, NDLP_INFO,
"Killing ACLK thread -> cloud functionality has been disabled");
static_thread->enabled = NETDATA_MAIN_THREAD_EXITED;
return NULL;
#endif
@ -857,7 +883,7 @@ void *aclk_main(void *ptr)
aclk_stats_upd_online(0);
last_disconnect_time = now_realtime_sec();
aclk_connected = 0;
netdata_log_access("ACLK DISCONNECTED");
nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK DISCONNECTED");
}
} while (service_running(SERVICE_ACLK));
@ -924,7 +950,9 @@ void aclk_host_state_update(RRDHOST *host, int cmd)
rrdhost_aclk_state_unlock(localhost);
create_query->data.bin_payload.topic = ACLK_TOPICID_CREATE_NODE;
create_query->data.bin_payload.msg_name = "CreateNodeInstance";
netdata_log_info("Registering host=%s, hops=%u", host->machine_guid, host->system_info->hops);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Registering host=%s, hops=%u", host->machine_guid, host->system_info->hops);
aclk_queue_query(create_query);
return;
}
@ -947,8 +975,10 @@ void aclk_host_state_update(RRDHOST *host, int cmd)
query->data.bin_payload.payload = generate_node_instance_connection(&query->data.bin_payload.size, &node_state_update);
rrdhost_aclk_state_unlock(localhost);
netdata_log_info("Queuing status update for node=%s, live=%d, hops=%u",(char*)node_state_update.node_id, cmd,
host->system_info->hops);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Queuing status update for node=%s, live=%d, hops=%u",
(char*)node_state_update.node_id, cmd, host->system_info->hops);
freez((void*)node_state_update.node_id);
query->data.bin_payload.msg_name = "UpdateNodeInstanceConnection";
query->data.bin_payload.topic = ACLK_TOPICID_NODE_CONN;
@ -990,9 +1020,10 @@ void aclk_send_node_instances()
node_state_update.claim_id = localhost->aclk_state.claimed_id;
query->data.bin_payload.payload = generate_node_instance_connection(&query->data.bin_payload.size, &node_state_update);
rrdhost_aclk_state_unlock(localhost);
netdata_log_info("Queuing status update for node=%s, live=%d, hops=%d",(char*)node_state_update.node_id,
list->live,
list->hops);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Queuing status update for node=%s, live=%d, hops=%d",
(char*)node_state_update.node_id, list->live, list->hops);
freez((void*)node_state_update.capabilities);
freez((void*)node_state_update.node_id);
@ -1014,8 +1045,11 @@ void aclk_send_node_instances()
node_instance_creation.claim_id = localhost->aclk_state.claimed_id,
create_query->data.bin_payload.payload = generate_node_instance_creation(&create_query->data.bin_payload.size, &node_instance_creation);
rrdhost_aclk_state_unlock(localhost);
netdata_log_info("Queuing registration for host=%s, hops=%d",(char*)node_instance_creation.machine_guid,
list->hops);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Queuing registration for host=%s, hops=%d",
(char*)node_instance_creation.machine_guid, list->hops);
freez((void *)node_instance_creation.machine_guid);
aclk_queue_query(create_query);
}
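The hunks above establish the pattern used throughout this commit: every converted call names a log stream (NDLS_DAEMON for the daemon log, NDLS_ACCESS for the access log) and a syslog-style priority (NDLP_DEBUG through NDLP_ERR), instead of encoding the level in the function name as netdata_log_info()/netdata_log_error() did. A minimal sketch of the call style; the constants are taken from the hunks above, the surrounding function and its arguments are hypothetical:

/* sketch only: the nd_log() call style introduced by this commit;
 * this function and its arguments are hypothetical */
static void example_aclk_event(const char *node_id, int live) {
    nd_log(NDLS_DAEMON, NDLP_DEBUG,
           "Queuing status update for node=%s, live=%d", node_id, live);

    if (!live)
        nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK DISCONNECTED");
}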


@ -90,6 +90,12 @@ static bool aclk_web_client_interrupt_cb(struct web_client *w __maybe_unused, vo
}
static int http_api_v2(struct aclk_query_thread *query_thr, aclk_query_t query) {
ND_LOG_STACK lgs[] = {
ND_LOG_FIELD_TXT(NDF_SRC_TRANSPORT, "aclk"),
ND_LOG_FIELD_END(),
};
ND_LOG_STACK_PUSH(lgs);
int retval = 0;
BUFFER *local_buffer = NULL;
size_t size = 0;
@ -110,7 +116,7 @@ static int http_api_v2(struct aclk_query_thread *query_thr, aclk_query_t query)
usec_t t;
web_client_timeout_checkpoint_set(w, query->timeout);
if(web_client_timeout_checkpoint_and_check(w, &t)) {
netdata_log_access("QUERY CANCELED: QUEUE TIME EXCEEDED %llu ms (LIMIT %d ms)", t / USEC_PER_MS, query->timeout);
nd_log(NDLS_ACCESS, NDLP_ERR, "QUERY CANCELED: QUEUE TIME EXCEEDED %llu ms (LIMIT %d ms)", t / USEC_PER_MS, query->timeout);
retval = 1;
w->response.code = HTTP_RESP_SERVICE_UNAVAILABLE;
aclk_http_msg_v2_err(query_thr->client, query->callback_topic, query->msg_id, w->response.code, CLOUD_EC_SND_TIMEOUT, CLOUD_EMSG_SND_TIMEOUT, NULL, 0);
@ -217,25 +223,8 @@ static int http_api_v2(struct aclk_query_thread *query_thr, aclk_query_t query)
// send msg.
w->response.code = aclk_http_msg_v2(query_thr->client, query->callback_topic, query->msg_id, t, query->created, w->response.code, local_buffer->buffer, local_buffer->len);
struct timeval tv;
cleanup:
now_monotonic_high_precision_timeval(&tv);
netdata_log_access("%llu: %d '[ACLK]:%d' '%s' (sent/all = %zu/%zu bytes %0.0f%%, prep/sent/total = %0.2f/%0.2f/%0.2f ms) %d '%s'",
w->id
, gettid()
, query_thr->idx
, "DATA"
, sent
, size
, size > sent ? -(((size - sent) / (double)size) * 100.0) : ((size > 0) ? (((sent - size ) / (double)size) * 100.0) : 0.0)
, dt_usec(&w->timings.tv_ready, &w->timings.tv_in) / 1000.0
, dt_usec(&tv, &w->timings.tv_ready) / 1000.0
, dt_usec(&tv, &w->timings.tv_in) / 1000.0
, w->response.code
, strip_control_characters((char *)buffer_tostring(w->url_as_received))
);
web_client_log_completed_request(w, false);
web_client_release_to_cache(w);
pending_req_list_rm(query->msg_id);
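The ND_LOG_STACK block added at the top of http_api_v2() is the scoped-annotation side of the new layer: fields pushed on entry are attached to every log line emitted until the scope exits, which is why the hand-rolled access-log line above can be replaced by the generic web_client_log_completed_request() while still carrying the transport. A sketch of the pattern, with a hypothetical handler and message; the macros and field name come from the hunk above:

/* sketch only: scoped log annotation; the handler and message are
 * hypothetical, the macros and field name are from the hunk above */
static int example_query_handler(void) {
    ND_LOG_STACK lgs[] = {
        ND_LOG_FIELD_TXT(NDF_SRC_TRANSPORT, "aclk"),
        ND_LOG_FIELD_END(),
    };
    ND_LOG_STACK_PUSH(lgs); /* fields apply until this scope exits */

    /* every line logged here carries NDF_SRC_TRANSPORT=aclk */
    nd_log(NDLS_ACCESS, NDLP_ERR, "QUERY CANCELED: QUEUE TIME EXCEEDED");
    return 1;
}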


@ -455,7 +455,7 @@ int cancel_pending_req(const char *msg, size_t msg_len)
return 1;
}
netdata_log_access("ACLK CancelPendingRequest REQ: %s, cloud trace-id: %s", cmd.request_id, cmd.trace_id);
nd_log(NDLS_ACCESS, NDLP_NOTICE, "ACLK CancelPendingRequest REQ: %s, cloud trace-id: %s", cmd.request_id, cmd.trace_id);
if (mark_pending_req_cancelled(cmd.request_id))
error_report("CancelPending Request for %s failed. No such pending request.", cmd.request_id);


@ -323,11 +323,11 @@ static bool check_claim_param(const char *s) {
}
void claim_reload_all(void) {
error_log_limit_unlimited();
nd_log_limits_unlimited();
load_claiming_state();
registry_update_cloud_base_url();
rrdpush_send_claimed_id(localhost);
error_log_limit_reset();
nd_log_limits_reset();
}
int api_v2_claim(struct web_client *w, char *url) {


@ -3,25 +3,18 @@
#include "cli.h"
#include "daemon/pipename.h"
void error_int(int is_collector __maybe_unused, const char *prefix __maybe_unused, const char *file __maybe_unused, const char *function __maybe_unused, const unsigned long line __maybe_unused, const char *fmt, ... ) {
FILE *fp = stderr;
void netdata_logger(ND_LOG_SOURCES source, ND_LOG_FIELD_PRIORITY priority, const char *file, const char *function, unsigned long line, const char *fmt, ... ) {
va_list args;
va_start( args, fmt );
vfprintf(fp, fmt, args );
va_end( args );
va_start(args, fmt);
vfprintf(stderr, fmt, args );
va_end(args);
}
#ifdef NETDATA_INTERNAL_CHECKS
uint64_t debug_flags;
void debug_int( const char *file __maybe_unused , const char *function __maybe_unused , const unsigned long line __maybe_unused, const char *fmt __maybe_unused, ... )
{
}
void fatal_int( const char *file __maybe_unused, const char *function __maybe_unused, const unsigned long line __maybe_unused, const char *fmt __maybe_unused, ... )
void netdata_logger_fatal( const char *file __maybe_unused, const char *function __maybe_unused, const unsigned long line __maybe_unused, const char *fmt __maybe_unused, ... )
{
abort();
};


@ -5234,25 +5234,11 @@ static void function_processes(const char *transaction, char *function __maybe_u
static bool apps_plugin_exit = false;
int main(int argc, char **argv) {
// debug_flags = D_PROCFILE;
stderror = stderr;
clocks_init();
nd_log_initialize_for_external_plugins("apps.plugin");
pagesize = (size_t)sysconf(_SC_PAGESIZE);
// set the name for logging
program_name = "apps.plugin";
// disable syslog for apps.plugin
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
bool send_resource_usage = true;
{
const char *s = getenv("NETDATA_INTERNALS_MONITORING");
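This hunk is representative of every external plugin touched by the commit: the manual logging setup (program_name, error_log_syslog, the flood-protection counters) collapses into a single nd_log_initialize_for_external_plugins() call. A sketch of the resulting minimal plugin entry point, with a hypothetical plugin name and log line:

/* sketch only: the one-call logging init that replaces the removed
 * boilerplate; the plugin name and log line are hypothetical */
int main(int argc, char **argv) {
    (void)argc; (void)argv;

    clocks_init();
    nd_log_initialize_for_external_plugins("example.plugin");

    nd_log(NDLS_COLLECTORS, NDLP_INFO, "example.plugin starting");
    /* ... collect metrics and print them to stdout ... */
    return 0;
}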


@ -12,57 +12,106 @@
export PATH="${PATH}:/sbin:/usr/sbin:/usr/local/sbin"
export LC_ALL=C
cmd_line="'${0}' $(printf "'%s' " "${@}")"
# -----------------------------------------------------------------------------
# logging
PROGRAM_NAME="$(basename "${0}")"
LOG_LEVEL_ERR=1
LOG_LEVEL_WARN=2
LOG_LEVEL_INFO=3
LOG_LEVEL="$LOG_LEVEL_INFO"
# these should be the same with syslog() priorities
NDLP_EMERG=0 # system is unusable
NDLP_ALERT=1 # action must be taken immediately
NDLP_CRIT=2 # critical conditions
NDLP_ERR=3 # error conditions
NDLP_WARN=4 # warning conditions
NDLP_NOTICE=5 # normal but significant condition
NDLP_INFO=6 # informational
NDLP_DEBUG=7 # debug-level messages
set_log_severity_level() {
case ${NETDATA_LOG_SEVERITY_LEVEL,,} in
"info") LOG_LEVEL="$LOG_LEVEL_INFO";;
"warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";;
"err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";;
# the max (numerically) log level we will log
LOG_LEVEL=$NDLP_INFO
set_log_min_priority() {
case "${NETDATA_LOG_PRIORITY_LEVEL,,}" in
"emerg" | "emergency")
LOG_LEVEL=$NDLP_EMERG
;;
"alert")
LOG_LEVEL=$NDLP_ALERT
;;
"crit" | "critical")
LOG_LEVEL=$NDLP_CRIT
;;
"err" | "error")
LOG_LEVEL=$NDLP_ERR
;;
"warn" | "warning")
LOG_LEVEL=$NDLP_WARN
;;
"notice")
LOG_LEVEL=$NDLP_NOTICE
;;
"info")
LOG_LEVEL=$NDLP_INFO
;;
"debug")
LOG_LEVEL=$NDLP_DEBUG
;;
esac
}
set_log_severity_level
logdate() {
date "+%Y-%m-%d %H:%M:%S"
}
set_log_min_priority
log() {
local status="${1}"
shift
local level="${1}"
shift 1
echo >&2 "$(logdate): ${PROGRAM_NAME}: ${status}: ${*}"
[[ -n "$level" && -n "$LOG_LEVEL" && "$level" -gt "$LOG_LEVEL" ]] && return
systemd-cat-native --log-as-netdata --newline="{NEWLINE}" <<EOFLOG
INVOCATION_ID=${NETDATA_INVOCATION_ID}
SYSLOG_IDENTIFIER=${PROGRAM_NAME}
PRIORITY=${level}
THREAD_TAG="cgroup-name"
ND_LOG_SOURCE=collector
ND_REQUEST=${cmd_line}
MESSAGE=${*//[$'\r\n']/{NEWLINE}}
EOFLOG
# AN EMPTY LINE IS NEEDED ABOVE
}
info() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return
log INFO "${@}"
log "$NDLP_INFO" "${@}"
}
warning() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return
log WARNING "${@}"
log "$NDLP_WARN" "${@}"
}
error() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return
log ERROR "${@}"
log "$NDLP_ERR" "${@}"
}
fatal() {
log FATAL "${@}"
log "$NDLP_ALERT" "${@}"
exit 1
}
debug() {
log "$NDLP_DEBUG" "${@}"
}
# -----------------------------------------------------------------------------
function parse_docker_like_inspect_output() {
local output="${1}"
eval "$(grep -E "^(NOMAD_NAMESPACE|NOMAD_JOB_NAME|NOMAD_TASK_NAME|NOMAD_SHORT_ALLOC_ID|CONT_NAME|IMAGE_NAME)=" <<<"$output")"


@ -29,65 +29,117 @@
export LC_ALL=C
cmd_line="'${0}' $(printf "'%s' " "${@}")"
# -----------------------------------------------------------------------------
# logging
PROGRAM_NAME="$(basename "${0}")"
LOG_LEVEL_ERR=1
LOG_LEVEL_WARN=2
LOG_LEVEL_INFO=3
LOG_LEVEL="$LOG_LEVEL_INFO"
# these should be the same with syslog() priorities
NDLP_EMERG=0 # system is unusable
NDLP_ALERT=1 # action must be taken immediately
NDLP_CRIT=2 # critical conditions
NDLP_ERR=3 # error conditions
NDLP_WARN=4 # warning conditions
NDLP_NOTICE=5 # normal but significant condition
NDLP_INFO=6 # informational
NDLP_DEBUG=7 # debug-level messages
set_log_severity_level() {
case ${NETDATA_LOG_SEVERITY_LEVEL,,} in
"info") LOG_LEVEL="$LOG_LEVEL_INFO";;
"warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";;
"err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";;
# the max (numerically) log level we will log
LOG_LEVEL=$NDLP_INFO
set_log_min_priority() {
case "${NETDATA_LOG_PRIORITY_LEVEL,,}" in
"emerg" | "emergency")
LOG_LEVEL=$NDLP_EMERG
;;
"alert")
LOG_LEVEL=$NDLP_ALERT
;;
"crit" | "critical")
LOG_LEVEL=$NDLP_CRIT
;;
"err" | "error")
LOG_LEVEL=$NDLP_ERR
;;
"warn" | "warning")
LOG_LEVEL=$NDLP_WARN
;;
"notice")
LOG_LEVEL=$NDLP_NOTICE
;;
"info")
LOG_LEVEL=$NDLP_INFO
;;
"debug")
LOG_LEVEL=$NDLP_DEBUG
;;
esac
}
set_log_severity_level
logdate() {
date "+%Y-%m-%d %H:%M:%S"
}
set_log_min_priority
log() {
local status="${1}"
shift
local level="${1}"
shift 1
echo >&2 "$(logdate): ${PROGRAM_NAME}: ${status}: ${*}"
[[ -n "$level" && -n "$LOG_LEVEL" && "$level" -gt "$LOG_LEVEL" ]] && return
systemd-cat-native --log-as-netdata --newline="{NEWLINE}" <<EOFLOG
INVOCATION_ID=${NETDATA_INVOCATION_ID}
SYSLOG_IDENTIFIER=${PROGRAM_NAME}
PRIORITY=${level}
THREAD_TAG="cgroup-network-helper.sh"
ND_LOG_SOURCE=collector
ND_REQUEST=${cmd_line}
MESSAGE=${*//[$'\r\n']/{NEWLINE}}
EOFLOG
# AN EMPTY LINE IS NEEDED ABOVE
}
info() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return
log INFO "${@}"
log "$NDLP_INFO" "${@}"
}
warning() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return
log WARNING "${@}"
log "$NDLP_WARN" "${@}"
}
error() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return
log ERROR "${@}"
log "$NDLP_ERR" "${@}"
}
fatal() {
log FATAL "${@}"
exit 1
log "$NDLP_ALERT" "${@}"
exit 1
}
debug=${NETDATA_CGROUP_NETWORK_HELPER_DEBUG=0}
debug() {
[ "${debug}" = "1" ] && log DEBUG "${@}"
log "$NDLP_DEBUG" "${@}"
}
debug=0
if [ "${NETDATA_CGROUP_NETWORK_HELPER_DEBUG-0}" = "1" ]; then
debug=1
LOG_LEVEL=$NDLP_DEBUG
fi
# -----------------------------------------------------------------------------
# check for BASH v4+ (required for associative arrays)
[ $(( BASH_VERSINFO[0] )) -lt 4 ] && \
fatal "BASH version 4 or later is required (this is ${BASH_VERSION})."
if [ ${BASH_VERSINFO[0]} -lt 4 ]; then
echo >&2 "BASH version 4 or later is required (this is ${BASH_VERSION})."
exit 1
fi
# -----------------------------------------------------------------------------
# parse the arguments
@ -99,7 +151,10 @@ do
case "${1}" in
--cgroup) cgroup="${2}"; shift 1;;
--pid|-p) pid="${2}"; shift 1;;
--debug|debug) debug=1;;
--debug|debug)
debug=1
LOG_LEVEL=$NDLP_DEBUG
;;
*) fatal "Cannot understand argument '${1}'";;
esac


@ -649,12 +649,11 @@ void usage(void) {
}
int main(int argc, char **argv) {
stderror = stderr;
pid_t pid = 0;
program_name = argv[0];
program_version = VERSION;
error_log_syslog = 0;
clocks_init();
nd_log_initialize_for_external_plugins("cgroup-network");
// since cgroup-network runs as root, prevent it from opening symbolic links
procfile_open_flags = O_RDONLY|O_NOFOLLOW;
@ -687,8 +686,6 @@ int main(int argc, char **argv) {
if(argc != 3)
usage();
log_set_global_severity_for_external_plugins();
int arg = 1;
int helper = 1;


@ -16,24 +16,111 @@
export PATH="${PATH}:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin"
PROGRAM_FILE="$0"
PROGRAM_NAME="$(basename $0)"
PROGRAM_NAME="${PROGRAM_NAME/.plugin/}"
MODULE_NAME="main"
LOG_LEVEL_ERR=1
LOG_LEVEL_WARN=2
LOG_LEVEL_INFO=3
LOG_LEVEL="$LOG_LEVEL_INFO"
# -----------------------------------------------------------------------------
# logging
set_log_severity_level() {
case ${NETDATA_LOG_SEVERITY_LEVEL,,} in
"info") LOG_LEVEL="$LOG_LEVEL_INFO";;
"warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";;
"err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";;
PROGRAM_NAME="$(basename "${0}")"
# these should be the same with syslog() priorities
NDLP_EMERG=0 # system is unusable
NDLP_ALERT=1 # action must be taken immediately
NDLP_CRIT=2 # critical conditions
NDLP_ERR=3 # error conditions
NDLP_WARN=4 # warning conditions
NDLP_NOTICE=5 # normal but significant condition
NDLP_INFO=6 # informational
NDLP_DEBUG=7 # debug-level messages
# the max (numerically) log level we will log
LOG_LEVEL=$NDLP_INFO
set_log_min_priority() {
case "${NETDATA_LOG_PRIORITY_LEVEL,,}" in
"emerg" | "emergency")
LOG_LEVEL=$NDLP_EMERG
;;
"alert")
LOG_LEVEL=$NDLP_ALERT
;;
"crit" | "critical")
LOG_LEVEL=$NDLP_CRIT
;;
"err" | "error")
LOG_LEVEL=$NDLP_ERR
;;
"warn" | "warning")
LOG_LEVEL=$NDLP_WARN
;;
"notice")
LOG_LEVEL=$NDLP_NOTICE
;;
"info")
LOG_LEVEL=$NDLP_INFO
;;
"debug")
LOG_LEVEL=$NDLP_DEBUG
;;
esac
}
set_log_severity_level
set_log_min_priority
log() {
local level="${1}"
shift 1
[[ -n "$level" && -n "$LOG_LEVEL" && "$level" -gt "$LOG_LEVEL" ]] && return
systemd-cat-native --log-as-netdata <<EOFLOG
INVOCATION_ID=${NETDATA_INVOCATION_ID}
SYSLOG_IDENTIFIER=${PROGRAM_NAME}
PRIORITY=${level}
THREAD_TAG="charts.d.plugin"
ND_LOG_SOURCE=collector
MESSAGE=${MODULE_NAME}: ${*//[$'\r\n']}
EOFLOG
# AN EMPTY LINE IS NEEDED ABOVE
}
info() {
log "$NDLP_INFO" "${@}"
}
warning() {
log "$NDLP_WARN" "${@}"
}
error() {
log "$NDLP_ERR" "${@}"
}
fatal() {
log "$NDLP_ALERT" "${@}"
echo "DISABLE"
exit 1
}
debug() {
[ "$debug" = "1" ] && log "$NDLP_DEBUG" "${@}"
}
# -----------------------------------------------------------------------------
# check for BASH v4+ (required for associative arrays)
if [ ${BASH_VERSINFO[0]} -lt 4 ]; then
echo >&2 "BASH version 4 or later is required (this is ${BASH_VERSION})."
exit 1
fi
# -----------------------------------------------------------------------------
# create temp dir
@ -62,39 +149,6 @@ logdate() {
date "+%Y-%m-%d %H:%M:%S"
}
log() {
local status="${1}"
shift
echo >&2 "$(logdate): ${PROGRAM_NAME}: ${status}: ${MODULE_NAME}: ${*}"
}
info() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return
log INFO "${@}"
}
warning() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return
log WARNING "${@}"
}
error() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return
log ERROR "${@}"
}
fatal() {
log FATAL "${@}"
echo "DISABLE"
exit 1
}
debug() {
[ $debug -eq 1 ] && log DEBUG "${@}"
}
# -----------------------------------------------------------------------------
# check a few commands
@ -194,12 +248,14 @@ while [ ! -z "$1" ]; do
if [ "$1" = "debug" -o "$1" = "all" ]; then
debug=1
LOG_LEVEL=$NDLP_DEBUG
shift
continue
fi
if [ -f "$chartsd/$1.chart.sh" ]; then
debug=1
LOG_LEVEL=$NDLP_DEBUG
chart_only="$(echo $1.chart.sh | sed "s/\.chart\.sh$//g")"
shift
continue
@ -207,6 +263,7 @@ while [ ! -z "$1" ]; do
if [ -f "$chartsd/$1" ]; then
debug=1
LOG_LEVEL=$NDLP_DEBUG
chart_only="$(echo $1 | sed "s/\.chart\.sh$//g")"
shift
continue


@ -226,22 +226,8 @@ void reset_metrics() {
}
int main(int argc, char **argv) {
stderror = stderr;
clocks_init();
// ------------------------------------------------------------------------
// initialization of netdata plugin
program_name = "cups.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
nd_log_initialize_for_external_plugins("cups.plugin");
parse_command_line(argc, argv);


@ -159,16 +159,8 @@ static void debugfs_parse_args(int argc, char **argv)
int main(int argc, char **argv)
{
// debug_flags = D_PROCFILE;
stderror = stderr;
// set the name for logging
program_name = "debugfs.plugin";
// disable syslog for debugfs.plugin
error_log_syslog = 0;
log_set_global_severity_for_external_plugins();
clocks_init();
nd_log_initialize_for_external_plugins("debugfs.plugin");
netdata_configured_host_prefix = getenv("NETDATA_HOST_PREFIX");
if (verify_netdata_host_prefix() == -1)


@ -4024,11 +4024,9 @@ static void ebpf_manage_pid(pid_t pid)
*/
int main(int argc, char **argv)
{
stderror = stderr;
log_set_global_severity_for_external_plugins();
clocks_init();
nd_log_initialize_for_external_plugins("ebpf.plugin");
main_thread_id = gettid();
set_global_variables();
@ -4038,16 +4036,6 @@ int main(int argc, char **argv)
if (ebpf_check_conditions())
return 2;
// set name
program_name = "ebpf.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
if (ebpf_adjust_memory_limit())
return 3;


@ -1622,30 +1622,14 @@ static void plugin_exit(int code) {
}
int main (int argc, char **argv) {
clocks_init();
nd_log_initialize_for_external_plugins("freeipmi.plugin");
netdata_threads_init_for_external_plugins(0); // set the default threads stack size here
bool netdata_do_sel = IPMI_ENABLE_SEL_BY_DEFAULT;
stderror = stderr;
clocks_init();
bool debug = false;
// ------------------------------------------------------------------------
// initialization of netdata plugin
program_name = "freeipmi.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
// initialize the threads
netdata_threads_init_for_external_plugins(0); // set the default threads stack size here
// ------------------------------------------------------------------------
// parse command line parameters


@ -747,22 +747,8 @@ void nfacct_signals()
}
int main(int argc, char **argv) {
stderror = stderr;
clocks_init();
// ------------------------------------------------------------------------
// initialization of netdata plugin
program_name = "nfacct.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
nd_log_initialize_for_external_plugins("nfacct.plugin");
// ------------------------------------------------------------------------
// parse command line parameters


@ -1283,22 +1283,8 @@ void parse_command_line(int argc, char **argv) {
}
int main(int argc, char **argv) {
stderror = stderr;
clocks_init();
// ------------------------------------------------------------------------
// initialization of netdata plugin
program_name = "perf.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
nd_log_initialize_for_external_plugins("perf.plugin");
parse_command_line(argc, argv);


@ -47,8 +47,7 @@ static inline bool plugin_is_running(struct plugind *cd) {
return ret;
}
static void pluginsd_worker_thread_cleanup(void *arg)
{
static void pluginsd_worker_thread_cleanup(void *arg) {
struct plugind *cd = (struct plugind *)arg;
worker_unregister();
@ -143,41 +142,64 @@ static void *pluginsd_worker_thread(void *arg) {
netdata_thread_cleanup_push(pluginsd_worker_thread_cleanup, arg);
struct plugind *cd = (struct plugind *)arg;
plugin_set_running(cd);
{
struct plugind *cd = (struct plugind *) arg;
plugin_set_running(cd);
size_t count = 0;
size_t count = 0;
while (service_running(SERVICE_COLLECTORS)) {
FILE *fp_child_input = NULL;
FILE *fp_child_output = netdata_popen(cd->cmd, &cd->unsafe.pid, &fp_child_input);
while(service_running(SERVICE_COLLECTORS)) {
FILE *fp_child_input = NULL;
FILE *fp_child_output = netdata_popen(cd->cmd, &cd->unsafe.pid, &fp_child_input);
if (unlikely(!fp_child_input || !fp_child_output)) {
netdata_log_error("PLUGINSD: 'host:%s', cannot popen(\"%s\", \"r\").", rrdhost_hostname(cd->host), cd->cmd);
break;
}
if(unlikely(!fp_child_input || !fp_child_output)) {
netdata_log_error("PLUGINSD: 'host:%s', cannot popen(\"%s\", \"r\").",
rrdhost_hostname(cd->host), cd->cmd);
break;
}
netdata_log_info("PLUGINSD: 'host:%s' connected to '%s' running on pid %d",
rrdhost_hostname(cd->host), cd->fullfilename, cd->unsafe.pid);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"PLUGINSD: 'host:%s' connected to '%s' running on pid %d",
rrdhost_hostname(cd->host),
cd->fullfilename, cd->unsafe.pid);
count = pluginsd_process(cd->host, cd, fp_child_input, fp_child_output, 0);
const char *plugin = strrchr(cd->fullfilename, '/');
if(plugin)
plugin++;
else
plugin = cd->fullfilename;
netdata_log_info("PLUGINSD: 'host:%s', '%s' (pid %d) disconnected after %zu successful data collections (ENDs).",
rrdhost_hostname(cd->host), cd->fullfilename, cd->unsafe.pid, count);
char module[100];
snprintfz(module, sizeof(module), "plugins.d[%s]", plugin);
ND_LOG_STACK lgs[] = {
ND_LOG_FIELD_TXT(NDF_MODULE, module),
ND_LOG_FIELD_TXT(NDF_NIDL_NODE, rrdhost_hostname(cd->host)),
ND_LOG_FIELD_TXT(NDF_SRC_TRANSPORT, "pluginsd"),
ND_LOG_FIELD_END(),
};
ND_LOG_STACK_PUSH(lgs);
killpid(cd->unsafe.pid);
count = pluginsd_process(cd->host, cd, fp_child_input, fp_child_output, 0);
int worker_ret_code = netdata_pclose(fp_child_input, fp_child_output, cd->unsafe.pid);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"PLUGINSD: 'host:%s', '%s' (pid %d) disconnected after %zu successful data collections (ENDs).",
rrdhost_hostname(cd->host), cd->fullfilename, cd->unsafe.pid, count);
if (likely(worker_ret_code == 0))
pluginsd_worker_thread_handle_success(cd);
else
pluginsd_worker_thread_handle_error(cd, worker_ret_code);
killpid(cd->unsafe.pid);
cd->unsafe.pid = 0;
if (unlikely(!plugin_is_enabled(cd)))
break;
}
int worker_ret_code = netdata_pclose(fp_child_input, fp_child_output, cd->unsafe.pid);
if(likely(worker_ret_code == 0))
pluginsd_worker_thread_handle_success(cd);
else
pluginsd_worker_thread_handle_error(cd, worker_ret_code);
cd->unsafe.pid = 0;
if(unlikely(!plugin_is_enabled(cd)))
break;
}
}
netdata_thread_cleanup_pop(1);
return NULL;


@ -21,8 +21,6 @@
#define PLUGINSD_KEYWORD_REPORT_JOB_STATUS "REPORT_JOB_STATUS"
#define PLUGINSD_KEYWORD_DELETE_JOB "DELETE_JOB"
#define PLUGINSD_MAX_WORDS 30
#define PLUGINSD_MAX_DIRECTORIES 20
extern char *plugin_directories[PLUGINSD_MAX_DIRECTORIES];


@ -153,11 +153,12 @@ static inline bool pluginsd_set_scope_chart(PARSER *parser, RRDSET *st, const ch
if(unlikely(old_collector_tid)) {
if(old_collector_tid != my_collector_tid) {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "PLUGINSD: keyword %s: 'host:%s/chart:%s' is collected twice (my tid %d, other collector tid %d)",
keyword ? keyword : "UNKNOWN",
rrdhost_hostname(st->rrdhost), rrdset_id(st),
my_collector_tid, old_collector_tid);
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_COLLECTORS, NDLP_WARNING,
"PLUGINSD: keyword %s: 'host:%s/chart:%s' is collected twice (my tid %d, other collector tid %d)",
keyword ? keyword : "UNKNOWN",
rrdhost_hostname(st->rrdhost), rrdset_id(st),
my_collector_tid, old_collector_tid);
return false;
}
@ -389,8 +390,9 @@ static inline PARSER_RC PLUGINSD_DISABLE_PLUGIN(PARSER *parser, const char *keyw
parser->user.enabled = 0;
if(keyword && msg) {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "PLUGINSD: keyword %s: %s", keyword, msg);
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_COLLECTORS, NDLP_INFO,
"PLUGINSD: keyword %s: %s", keyword, msg);
}
return PARSER_RC_ERROR;
@ -1109,7 +1111,8 @@ void pluginsd_function_cancel(void *data) {
dfe_done(t);
if(sent <= 0)
netdata_log_error("PLUGINSD: FUNCTION_CANCEL request didn't match any pending function requests in pluginsd.d.");
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"PLUGINSD: FUNCTION_CANCEL request didn't match any pending function requests in pluginsd.d.");
}
// this is the function that is called from
@ -1626,9 +1629,10 @@ static inline PARSER_RC pluginsd_replay_set(char **words, size_t num_words, PARS
if(!st) return PLUGINSD_DISABLE_PLUGIN(parser, NULL, NULL);
if(!parser->user.replay.rset_enabled) {
error_limit_static_thread_var(erl, 1, 0);
error_limit(&erl, "PLUGINSD: 'host:%s/chart:%s' got a %s but it is disabled by %s errors",
rrdhost_hostname(host), rrdset_id(st), PLUGINSD_KEYWORD_REPLAY_SET, PLUGINSD_KEYWORD_REPLAY_BEGIN);
nd_log_limit_static_thread_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_COLLECTORS, NDLP_ERR,
"PLUGINSD: 'host:%s/chart:%s' got a %s but it is disabled by %s errors",
rrdhost_hostname(host), rrdset_id(st), PLUGINSD_KEYWORD_REPLAY_SET, PLUGINSD_KEYWORD_REPLAY_BEGIN);
// we have to return OK here
return PARSER_RC_OK;
@ -1675,8 +1679,10 @@ static inline PARSER_RC pluginsd_replay_set(char **words, size_t num_words, PARS
rd->collector.counter++;
}
else {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "PLUGINSD: 'host:%s/chart:%s/dim:%s' has the ARCHIVED flag set, but it is replicated. Ignoring data.",
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_COLLECTORS, NDLP_WARNING,
"PLUGINSD: 'host:%s/chart:%s/dim:%s' has the ARCHIVED flag set, but it is replicated. "
"Ignoring data.",
rrdhost_hostname(st->rrdhost), rrdset_id(st), rrddim_name(rd));
}
}
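error_limit() becomes nd_log_limit() in the hunks above: the same per-call-site flood protection, but the message now flows through a stream and priority like any other log line. A sketch with a hypothetical call site; the '1, 0' limit parameters mirror the hunks above:

/* sketch only: rate-limited logging replacing error_limit(); the
 * function and message are hypothetical */
static void example_collected_twice(const char *chart) {
    nd_log_limit_static_global_var(erl, 1, 0);
    nd_log_limit(&erl, NDLS_COLLECTORS, NDLP_WARNING,
                 "chart '%s' is collected twice", chart);
}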
@ -2832,61 +2838,6 @@ static inline PARSER_RC streaming_claimed_id(char **words, size_t num_words, PAR
// ----------------------------------------------------------------------------
static inline bool buffered_reader_read(struct buffered_reader *reader, int fd) {
#ifdef NETDATA_INTERNAL_CHECKS
if(reader->read_buffer[reader->read_len] != '\0')
fatal("%s(): read_buffer does not start with zero", __FUNCTION__ );
#endif
ssize_t bytes_read = read(fd, reader->read_buffer + reader->read_len, sizeof(reader->read_buffer) - reader->read_len - 1);
if(unlikely(bytes_read <= 0))
return false;
reader->read_len += bytes_read;
reader->read_buffer[reader->read_len] = '\0';
return true;
}
static inline bool buffered_reader_read_timeout(struct buffered_reader *reader, int fd, int timeout_ms) {
errno = 0;
struct pollfd fds[1];
fds[0].fd = fd;
fds[0].events = POLLIN;
int ret = poll(fds, 1, timeout_ms);
if (ret > 0) {
/* There is data to read */
if (fds[0].revents & POLLIN)
return buffered_reader_read(reader, fd);
else if(fds[0].revents & POLLERR) {
netdata_log_error("PARSER: read failed: POLLERR.");
return false;
}
else if(fds[0].revents & POLLHUP) {
netdata_log_error("PARSER: read failed: POLLHUP.");
return false;
}
else if(fds[0].revents & POLLNVAL) {
netdata_log_error("PARSER: read failed: POLLNVAL.");
return false;
}
netdata_log_error("PARSER: poll() returned positive number, but POLLIN|POLLERR|POLLHUP|POLLNVAL are not set.");
return false;
}
else if (ret == 0) {
netdata_log_error("PARSER: timeout while waiting for data.");
return false;
}
netdata_log_error("PARSER: poll() failed with code %d.", ret);
return false;
}
void pluginsd_process_thread_cleanup(void *ptr) {
PARSER *parser = (PARSER *)ptr;
@ -2905,6 +2856,33 @@ void pluginsd_process_thread_cleanup(void *ptr) {
parser_destroy(parser);
}
bool parser_reconstruct_node(BUFFER *wb, void *ptr) {
PARSER *parser = ptr;
if(!parser || !parser->user.host)
return false;
buffer_strcat(wb, rrdhost_hostname(parser->user.host));
return true;
}
bool parser_reconstruct_instance(BUFFER *wb, void *ptr) {
PARSER *parser = ptr;
if(!parser || !parser->user.st)
return false;
buffer_strcat(wb, rrdset_name(parser->user.st));
return true;
}
bool parser_reconstruct_context(BUFFER *wb, void *ptr) {
PARSER *parser = ptr;
if(!parser || !parser->user.st)
return false;
buffer_strcat(wb, string2str(parser->user.st->context));
return true;
}
inline size_t pluginsd_process(RRDHOST *host, struct plugind *cd, FILE *fp_plugin_input, FILE *fp_plugin_output, int trust_durations)
{
int enabled = cd->unsafe.enabled;
@ -2952,33 +2930,51 @@ inline size_t pluginsd_process(RRDHOST *host, struct plugind *cd, FILE *fp_plugi
// so, parser needs to be allocated before pushing it
netdata_thread_cleanup_push(pluginsd_process_thread_cleanup, parser);
buffered_reader_init(&parser->reader);
BUFFER *buffer = buffer_create(sizeof(parser->reader.read_buffer) + 2, NULL);
while(likely(service_running(SERVICE_COLLECTORS))) {
if (unlikely(!buffered_reader_next_line(&parser->reader, buffer))) {
if(unlikely(!buffered_reader_read_timeout(&parser->reader, fileno((FILE *)parser->fp_input), 2 * 60 * MSEC_PER_SEC)))
break;
{
ND_LOG_STACK lgs[] = {
ND_LOG_FIELD_CB(NDF_REQUEST, line_splitter_reconstruct_line, &parser->line),
ND_LOG_FIELD_CB(NDF_NIDL_NODE, parser_reconstruct_node, parser),
ND_LOG_FIELD_CB(NDF_NIDL_INSTANCE, parser_reconstruct_instance, parser),
ND_LOG_FIELD_CB(NDF_NIDL_CONTEXT, parser_reconstruct_context, parser),
ND_LOG_FIELD_END(),
};
ND_LOG_STACK_PUSH(lgs);
continue;
}
buffered_reader_init(&parser->reader);
BUFFER *buffer = buffer_create(sizeof(parser->reader.read_buffer) + 2, NULL);
while(likely(service_running(SERVICE_COLLECTORS))) {
if(unlikely(parser_action(parser, buffer->buffer)))
break;
if(unlikely(!buffered_reader_next_line(&parser->reader, buffer))) {
buffered_reader_ret_t ret = buffered_reader_read_timeout(
&parser->reader,
fileno((FILE *) parser->fp_input),
2 * 60 * MSEC_PER_SEC, true
);
buffer->len = 0;
buffer->buffer[0] = '\0';
}
buffer_free(buffer);
if(unlikely(ret != BUFFERED_READER_READ_OK))
break;
cd->unsafe.enabled = parser->user.enabled;
count = parser->user.data_collections_count;
continue;
}
if (likely(count)) {
cd->successful_collections += count;
cd->serial_failures = 0;
}
else
cd->serial_failures++;
if(unlikely(parser_action(parser, buffer->buffer)))
break;
buffer->len = 0;
buffer->buffer[0] = '\0';
}
buffer_free(buffer);
cd->unsafe.enabled = parser->user.enabled;
count = parser->user.data_collections_count;
if(likely(count)) {
cd->successful_collections += count;
cd->serial_failures = 0;
}
else
cd->serial_failures++;
}
// free parser with the pop function
netdata_thread_cleanup_pop(1);
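The ND_LOG_FIELD_CB entries in the stack above are the lazy variant of the field macros: instead of a fixed string they store a callback (line_splitter_reconstruct_line(), parser_reconstruct_node(), and friends) that is invoked only when a log line is actually emitted, so annotating the hot parsing loop with request/node/instance/context costs nothing unless something is logged. A sketch of such a callback, under hypothetical names:

/* sketch only: a callback-valued log field, evaluated lazily at the
 * moment a log line is emitted; all names here are hypothetical */
static bool example_reconstruct_label(BUFFER *wb, void *ptr) {
    const char *label = ptr;
    if (!label)
        return false; /* false omits the field from this log line */
    buffer_strcat(wb, label);
    return true;
}

static void example_scope(const char *label) {
    ND_LOG_STACK lgs[] = {
        ND_LOG_FIELD_CB(NDF_NIDL_NODE, example_reconstruct_label, (void *)label),
        ND_LOG_FIELD_END(),
    };
    ND_LOG_STACK_PUSH(lgs);
    /* ... parse; any log line emitted here gets the field via the callback ... */
}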


@ -97,7 +97,6 @@ typedef struct parser {
PARSER_REPERTOIRE repertoire;
uint32_t flags;
int fd; // Socket
size_t line;
FILE *fp_input; // Input source e.g. stream
FILE *fp_output; // Stream to send commands to plugin
@ -111,6 +110,8 @@ typedef struct parser {
PARSER_USER_OBJECT user; // User defined structure to hold extra state between calls
struct buffered_reader reader;
struct line_splitter line;
PARSER_KEYWORD *keyword;
struct {
const char *end_keyword;
@ -162,13 +163,17 @@ static inline PARSER_KEYWORD *parser_find_keyword(PARSER *parser, const char *co
return NULL;
}
bool parser_reconstruct_node(BUFFER *wb, void *ptr);
bool parser_reconstruct_instance(BUFFER *wb, void *ptr);
bool parser_reconstruct_context(BUFFER *wb, void *ptr);
static inline int parser_action(PARSER *parser, char *input) {
#ifdef NETDATA_LOG_STREAM_RECEIVE
static __thread char line[PLUGINSD_LINE_MAX + 1];
strncpyz(line, input, sizeof(line) - 1);
#endif
parser->line++;
parser->line.count++;
if(unlikely(parser->flags & PARSER_DEFER_UNTIL_KEYWORD)) {
char command[100 + 1];
@ -200,24 +205,25 @@ static inline int parser_action(PARSER *parser, char *input) {
return 0;
}
static __thread char *words[PLUGINSD_MAX_WORDS];
size_t num_words = quoted_strings_splitter_pluginsd(input, words, PLUGINSD_MAX_WORDS);
const char *command = get_word(words, num_words, 0);
parser->line.num_words = quoted_strings_splitter_pluginsd(input, parser->line.words, PLUGINSD_MAX_WORDS);
const char *command = get_word(parser->line.words, parser->line.num_words, 0);
if(unlikely(!command))
if(unlikely(!command)) {
line_splitter_reset(&parser->line);
return 0;
}
PARSER_RC rc;
PARSER_KEYWORD *t = parser_find_keyword(parser, command);
if(likely(t)) {
worker_is_busy(t->worker_job_id);
parser->keyword = parser_find_keyword(parser, command);
if(likely(parser->keyword)) {
worker_is_busy(parser->keyword->worker_job_id);
#ifdef NETDATA_LOG_STREAM_RECEIVE
if(parser->user.stream_log_fp && t->repertoire & parser->user.stream_log_repertoire)
if(parser->user.stream_log_fp && parser->keyword->repertoire & parser->user.stream_log_repertoire)
fprintf(parser->user.stream_log_fp, "%s", line);
#endif
rc = parser_execute(parser, t, words, num_words);
rc = parser_execute(parser, parser->keyword, parser->line.words, parser->line.num_words);
// rc = (*t->func)(words, num_words, parser);
worker_is_idle();
}
@ -225,22 +231,13 @@ static inline int parser_action(PARSER *parser, char *input) {
rc = PARSER_RC_ERROR;
if(rc == PARSER_RC_ERROR) {
BUFFER *wb = buffer_create(PLUGINSD_LINE_MAX, NULL);
for(size_t i = 0; i < num_words ;i++) {
if(i) buffer_fast_strcat(wb, " ", 1);
buffer_fast_strcat(wb, "\"", 1);
const char *s = get_word(words, num_words, i);
buffer_strcat(wb, s?s:"");
buffer_fast_strcat(wb, "\"", 1);
}
CLEAN_BUFFER *wb = buffer_create(PLUGINSD_LINE_MAX, NULL);
line_splitter_reconstruct_line(wb, &parser->line);
netdata_log_error("PLUGINSD: parser_action('%s') failed on line %zu: { %s } (quotes added to show parsing)",
command, parser->line, buffer_tostring(wb));
buffer_free(wb);
command, parser->line.count, buffer_tostring(wb));
}
line_splitter_reset(&parser->line);
return (rc == PARSER_RC_ERROR || rc == PARSER_RC_STOP);
}


@ -138,6 +138,12 @@ static bool is_lxcfs_proc_mounted() {
return false;
}
static bool log_proc_module(BUFFER *wb, void *data) {
struct proc_module *pm = data;
buffer_sprintf(wb, "proc.plugin[%s]", pm->name);
return true;
}
void *proc_main(void *ptr)
{
worker_register("PROC");
@ -153,46 +159,56 @@ void *proc_main(void *ptr)
netdata_thread_cleanup_push(proc_main_cleanup, ptr);
config_get_boolean("plugin:proc", "/proc/pagetypeinfo", CONFIG_BOOLEAN_NO);
{
config_get_boolean("plugin:proc", "/proc/pagetypeinfo", CONFIG_BOOLEAN_NO);
// check the enabled status for each module
int i;
for (i = 0; proc_modules[i].name; i++) {
struct proc_module *pm = &proc_modules[i];
// check the enabled status for each module
int i;
for(i = 0; proc_modules[i].name; i++) {
struct proc_module *pm = &proc_modules[i];
pm->enabled = config_get_boolean("plugin:proc", pm->name, CONFIG_BOOLEAN_YES);
pm->rd = NULL;
pm->enabled = config_get_boolean("plugin:proc", pm->name, CONFIG_BOOLEAN_YES);
pm->rd = NULL;
worker_register_job_name(i, proc_modules[i].dim);
}
worker_register_job_name(i, proc_modules[i].dim);
}
usec_t step = localhost->rrd_update_every * USEC_PER_SEC;
heartbeat_t hb;
heartbeat_init(&hb);
usec_t step = localhost->rrd_update_every * USEC_PER_SEC;
heartbeat_t hb;
heartbeat_init(&hb);
inside_lxc_container = is_lxcfs_proc_mounted();
inside_lxc_container = is_lxcfs_proc_mounted();
while (service_running(SERVICE_COLLECTORS)) {
worker_is_idle();
usec_t hb_dt = heartbeat_next(&hb, step);
#define LGS_MODULE_ID 0
if (unlikely(!service_running(SERVICE_COLLECTORS)))
break;
ND_LOG_STACK lgs[] = {
[LGS_MODULE_ID] = ND_LOG_FIELD_TXT(NDF_MODULE, "proc.plugin"),
ND_LOG_FIELD_END(),
};
ND_LOG_STACK_PUSH(lgs);
for (i = 0; proc_modules[i].name; i++) {
if (unlikely(!service_running(SERVICE_COLLECTORS)))
break;
while(service_running(SERVICE_COLLECTORS)) {
worker_is_idle();
usec_t hb_dt = heartbeat_next(&hb, step);
struct proc_module *pm = &proc_modules[i];
if (unlikely(!pm->enabled))
continue;
if(unlikely(!service_running(SERVICE_COLLECTORS)))
break;
netdata_log_debug(D_PROCNETDEV_LOOP, "PROC calling %s.", pm->name);
for(i = 0; proc_modules[i].name; i++) {
if(unlikely(!service_running(SERVICE_COLLECTORS)))
break;
worker_is_busy(i);
pm->enabled = !pm->func(localhost->rrd_update_every, hb_dt);
}
}
struct proc_module *pm = &proc_modules[i];
if(unlikely(!pm->enabled))
continue;
worker_is_busy(i);
lgs[LGS_MODULE_ID] = ND_LOG_FIELD_CB(NDF_MODULE, log_proc_module, pm);
pm->enabled = !pm->func(localhost->rrd_update_every, hb_dt);
lgs[LGS_MODULE_ID] = ND_LOG_FIELD_TXT(NDF_MODULE, "proc.plugin");
}
}
}
netdata_thread_cleanup_pop(1);
return NULL;
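proc_main() adds one more trick: a slot of an already-pushed log stack can be reassigned in place, so each module iteration logs with its own NDF_MODULE value (rendered lazily by log_proc_module() as proc.plugin[name]) and the generic value is restored afterwards. A fragment sketching the pattern; the identifiers are from the hunk above, the wrapper function is hypothetical:

/* sketch only: swapping one slot of a pushed log stack per iteration */
static void example_modules_loop(int update_every, usec_t dt) {
#define LGS_MODULE_ID 0
    ND_LOG_STACK lgs[] = {
        [LGS_MODULE_ID] = ND_LOG_FIELD_TXT(NDF_MODULE, "proc.plugin"),
        ND_LOG_FIELD_END(),
    };
    ND_LOG_STACK_PUSH(lgs);

    for (int i = 0; proc_modules[i].name; i++) {
        struct proc_module *pm = &proc_modules[i];
        lgs[LGS_MODULE_ID] = ND_LOG_FIELD_CB(NDF_MODULE, log_proc_module, pm);
        pm->enabled = !pm->func(update_every, dt); /* logs tagged proc.plugin[name] */
        lgs[LGS_MODULE_ID] = ND_LOG_FIELD_TXT(NDF_MODULE, "proc.plugin"); /* restore */
    }
}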


@ -336,14 +336,11 @@ void usage(void) {
}
int main(int argc, char **argv) {
stderror = stderr;
clocks_init();
nd_log_initialize_for_external_plugins("slabinfo.plugin");
program_name = argv[0];
program_version = "0.1";
error_log_syslog = 0;
log_set_global_severity_for_external_plugins();
int update_every = 1, i, n, freq = 0;


@ -2326,7 +2326,7 @@ static inline void statsd_flush_index_metrics(STATSD_INDEX *index, void (*flush_
if(unlikely(is_metric_checked(m))) break;
if(unlikely(!(m->options & STATSD_METRIC_OPTION_CHECKED_IN_APPS))) {
netdata_log_access("NEW STATSD METRIC '%s': '%s'", statsd_metric_type_string(m->type), m->name);
nd_log(NDLS_ACCESS, NDLP_DEBUG, "NEW STATSD METRIC '%s': '%s'", statsd_metric_type_string(m->type), m->name);
check_if_metric_is_for_app(index, m);
m->options |= STATSD_METRIC_OPTION_CHECKED_IN_APPS;
}


@ -408,35 +408,50 @@ void netdata_systemd_journal_transform_boot_id(FACETS *facets __maybe_unused, BU
};
sd_journal *j = NULL;
if(sd_journal_open_files(&j, files, ND_SD_JOURNAL_OPEN_FLAGS) < 0 || !j) {
internal_error(true, "JOURNAL: cannot open file '%s' to get boot_id", jf_dfe.name);
int r = sd_journal_open_files(&j, files, ND_SD_JOURNAL_OPEN_FLAGS);
if(r < 0 || !j) {
internal_error(true, "JOURNAL: while looking for the first timestamp of boot_id '%s', "
"sd_journal_open_files('%s') returned %d",
boot_id, jf_dfe.name, r);
continue;
}
char m[100];
size_t len = snprintfz(m, sizeof(m), "_BOOT_ID=%s", boot_id);
if(sd_journal_add_match(j, m, len) < 0) {
internal_error(true, "JOURNAL: cannot add match '%s' to file '%s'", m, jf_dfe.name);
r = sd_journal_add_match(j, m, len);
if(r < 0) {
internal_error(true, "JOURNAL: while looking for the first timestamp of boot_id '%s', "
"sd_journal_add_match('%s') on file '%s' returned %d",
boot_id, m, jf_dfe.name, r);
sd_journal_close(j);
continue;
}
if(sd_journal_seek_head(j) < 0) {
internal_error(true, "JOURNAL: cannot seek head to file '%s'", jf_dfe.name);
r = sd_journal_seek_head(j);
if(r < 0) {
internal_error(true, "JOURNAL: while looking for the first timestamp of boot_id '%s', "
"sd_journal_seek_head() on file '%s' returned %d",
boot_id, jf_dfe.name, r);
sd_journal_close(j);
continue;
}
if(sd_journal_next(j) < 0) {
internal_error(true, "JOURNAL: cannot get next of file '%s'", jf_dfe.name);
r = sd_journal_next(j);
if(r < 0) {
internal_error(true, "JOURNAL: while looking for the first timestamp of boot_id '%s', "
"sd_journal_next() on file '%s' returned %d",
boot_id, jf_dfe.name, r);
sd_journal_close(j);
continue;
}
usec_t t_ut = 0;
if(sd_journal_get_realtime_usec(j, &t_ut) < 0 || !t_ut) {
internal_error(true, "JOURNAL: cannot get realtime_usec of file '%s'", jf_dfe.name);
r = sd_journal_get_realtime_usec(j, &t_ut);
if(r < 0 || !t_ut) {
internal_error(r != -EADDRNOTAVAIL, "JOURNAL: while looking for the first timestamp of boot_id '%s', "
"sd_journal_get_realtime_usec() on file '%s' returned %d",
boot_id, jf_dfe.name, r);
sd_journal_close(j);
continue;
}
@ -454,25 +469,21 @@ void netdata_systemd_journal_transform_boot_id(FACETS *facets __maybe_unused, BU
ut = *p_ut;
if(ut != UINT64_MAX) {
time_t timestamp_sec = (time_t)(ut / USEC_PER_SEC);
struct tm tm;
char buffer[30];
gmtime_r(&timestamp_sec, &tm);
strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", &tm);
char buffer[ISO8601_MAX_LENGTH];
iso8601_datetime_ut(buffer, sizeof(buffer), ut, ISO8601_UTC);
switch(scope) {
default:
case FACETS_TRANSFORM_DATA:
case FACETS_TRANSFORM_VALUE:
buffer_sprintf(wb, " (%s UTC) ", buffer);
buffer_sprintf(wb, " (%s) ", buffer);
break;
case FACETS_TRANSFORM_FACET:
case FACETS_TRANSFORM_FACET_SORT:
case FACETS_TRANSFORM_HISTOGRAM:
buffer_flush(wb);
buffer_sprintf(wb, "%s UTC", buffer);
buffer_sprintf(wb, "%s", buffer);
break;
}
}
@ -537,13 +548,9 @@ void netdata_systemd_journal_transform_timestamp_usec(FACETS *facets __maybe_unu
if(*v && isdigit(*v)) {
uint64_t ut = str2ull(buffer_tostring(wb), NULL);
if(ut) {
time_t timestamp_sec = (time_t)(ut / USEC_PER_SEC);
struct tm tm;
char buffer[30];
gmtime_r(&timestamp_sec, &tm);
strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", &tm);
buffer_sprintf(wb, " (%s.%06llu UTC)", buffer, ut % USEC_PER_SEC);
char buffer[ISO8601_MAX_LENGTH];
iso8601_datetime_ut(buffer, sizeof(buffer), ut, ISO8601_UTC | ISO8601_MICROSECONDS);
buffer_sprintf(wb, " (%s)", buffer);
}
}
}
@ -703,6 +710,23 @@ void netdata_systemd_journal_message_ids_init(void) {
// gnome-shell
// https://gitlab.gnome.org/GNOME/gnome-shell/-/blob/main/js/ui/main.js#L56
i.msg = "Gnome shell started";dictionary_set(known_journal_messages_ids, "f3ea493c22934e26811cd62abe8e203a", &i, sizeof(i));
// flathub
// https://docs.flatpak.org/de/latest/flatpak-command-reference.html
i.msg = "Flatpak cache"; dictionary_set(known_journal_messages_ids, "c7b39b1e006b464599465e105b361485", &i, sizeof(i));
// ???
i.msg = "Flathub pulls"; dictionary_set(known_journal_messages_ids, "75ba3deb0af041a9a46272ff85d9e73e", &i, sizeof(i));
i.msg = "Flathub pull errors"; dictionary_set(known_journal_messages_ids, "f02bce89a54e4efab3a94a797d26204a", &i, sizeof(i));
// ??
i.msg = "Boltd starting"; dictionary_set(known_journal_messages_ids, "dd11929c788e48bdbb6276fb5f26b08a", &i, sizeof(i));
// Netdata
i.msg = "Netdata connection from child"; dictionary_set(known_journal_messages_ids, "ed4cdb8f1beb4ad3b57cb3cae2d162fa", &i, sizeof(i));
i.msg = "Netdata connection to parent"; dictionary_set(known_journal_messages_ids, "6e2e3839067648968b646045dbf28d66", &i, sizeof(i));
i.msg = "Netdata alert transition"; dictionary_set(known_journal_messages_ids, "9ce0cb58ab8b44df82c4bf1ad9ee22de", &i, sizeof(i));
i.msg = "Netdata alert notification"; dictionary_set(known_journal_messages_ids, "6db0018e83e34320ae2a659d78019fb7", &i, sizeof(i));
}
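These well-known MESSAGE_ID values let the plugin show a readable label next to each journal entry, and they also make the events directly filterable: for example, `journalctl MESSAGE_ID=ed4cdb8f1beb4ad3b57cb3cae2d162fa` lists the "Netdata connection from child" events registered above.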
void netdata_systemd_journal_transform_message_id(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) {

View file

@ -333,8 +333,8 @@ static void files_registry_delete_cb(const DICTIONARY_ITEM *item, void *value, v
struct journal_file *jf = value; (void)jf;
const char *filename = dictionary_acquired_item_name(item); (void)filename;
string_freez(jf->source);
internal_error(true, "removed journal file '%s'", filename);
string_freez(jf->source);
}
void journal_directory_scan(const char *dirname, int depth, usec_t last_scan_ut) {

View file

@ -165,6 +165,18 @@
"|IMAGE_NAME" /* undocumented */ \
/* "|CONTAINER_PARTIAL_MESSAGE" */ \
\
\
/* --- NETDATA --- */ \
\
"|ND_NIDL_NODE" \
"|ND_NIDL_CONTEXT" \
"|ND_LOG_SOURCE" \
/*"|ND_MODULE" */ \
"|ND_ALERT_NAME" \
"|ND_ALERT_CLASS" \
"|ND_ALERT_COMPONENT" \
"|ND_ALERT_TYPE" \
\
""
// ----------------------------------------------------------------------------

View file

@ -9,19 +9,8 @@ netdata_mutex_t stdout_mutex = NETDATA_MUTEX_INITIALIZER;
static bool plugin_should_exit = false;
int main(int argc __maybe_unused, char **argv __maybe_unused) {
stderror = stderr;
clocks_init();
program_name = "systemd-journal.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
nd_log_initialize_for_external_plugins("systemd-journal.plugin");
netdata_configured_host_prefix = getenv("NETDATA_HOST_PREFIX");
if(verify_netdata_host_prefix() == -1) exit(1);
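The per-plugin logging boilerplate (syslog toggle, flood-protection counters, severity plumbing) collapses into a single call. A minimal sketch of an external plugin's startup under the new layer; the plugin name below is hypothetical:

    int main(int argc __maybe_unused, char **argv __maybe_unused) {
        clocks_init();
        program_name = "example.plugin";                       // hypothetical plugin name
        nd_log_initialize_for_external_plugins(program_name);
        // from here on, nd_log() and netdata_log_error() are routed the way the
        // agent expects from external plugins (plain stderr when run by hand)
        return 0;
    }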

View file

@ -676,8 +676,6 @@ static void update_freezer_state(UnitInfo *u, UnitAttribute *ua) {
// ----------------------------------------------------------------------------
// common helpers
#define _cleanup_(x) __attribute__((__cleanup__(x)))
static void log_dbus_error(int r, const char *msg) {
netdata_log_error("SYSTEMD_UNITS: %s failed with error %d (%s)", msg, r, strerror(-r));
}

View file

@ -920,7 +920,6 @@ static void xenstat_send_domain_metrics() {
}
int main(int argc, char **argv) {
stderror = stderr;
clocks_init();
// ------------------------------------------------------------------------
@ -928,14 +927,7 @@ int main(int argc, char **argv) {
program_name = "xenstat.plugin";
// disable syslog
error_log_syslog = 0;
// set errors flood protection to 100 logs per hour
error_log_errors_per_period = 100;
error_log_throttle_period = 3600;
log_set_global_severity_for_external_plugins();
nd_log_initialize_for_external_plugins();
// ------------------------------------------------------------------------
// parse command line parameters

View file

@ -54,6 +54,8 @@ else
AC_CHECK_TOOL([AR], [ar])
fi
CFLAGS="$CFLAGS -fexceptions"
# -----------------------------------------------------------------------------
# configurable options
@ -571,6 +573,48 @@ AC_CHECK_LIB(
[LZ4_LIBS="-llz4"]
)
# -----------------------------------------------------------------------------
# libcurl
PKG_CHECK_MODULES(
[LIBCURL],
[libcurl],
[AC_CHECK_LIB(
[curl],
[curl_easy_init],
[have_libcurl=yes],
[have_libcurl=no]
)],
[have_libcurl=no]
)
if test "x$have_libcurl" = "xyes"; then
AC_DEFINE([HAVE_CURL], [1], [libcurl usability])
OPTIONAL_CURL_LIBS="-lcurl"
fi
# -----------------------------------------------------------------------------
# PCRE2
PKG_CHECK_MODULES(
[LIBPCRE2],
[libpcre2-8],
[AC_CHECK_LIB(
[pcre2-8],
[pcre2_compile_8],
[have_libpcre2=yes],
[have_libpcre2=no]
)],
[have_libpcre2=no]
)
if test "x$have_libpcre2" = "xyes"; then
AC_DEFINE([HAVE_PCRE2], [1], [PCRE2 usability])
OPTIONAL_PCRE2_LIBS="-lpcre2-8"
fi
AM_CONDITIONAL([ENABLE_LOG2JOURNAL], [test "${have_libpcre2}" = "yes"])
# -----------------------------------------------------------------------------
# zstd
@ -1590,18 +1634,6 @@ PKG_CHECK_MODULES(
[have_libssl=no]
)
PKG_CHECK_MODULES(
[LIBCURL],
[libcurl],
[AC_CHECK_LIB(
[curl],
[curl_easy_init],
[have_libcurl=yes],
[have_libcurl=no]
)],
[have_libcurl=no]
)
PKG_CHECK_MODULES(
[AWS_CPP_SDK_CORE],
[aws-cpp-sdk-core],
@ -1946,6 +1978,8 @@ AC_SUBST([OPTIONAL_UV_LIBS])
AC_SUBST([OPTIONAL_LZ4_LIBS])
AC_SUBST([OPTIONAL_BROTLIENC_LIBS])
AC_SUBST([OPTIONAL_BROTLIDEC_LIBS])
AC_SUBST([OPTIONAL_CURL_LIBS])
AC_SUBST([OPTIONAL_PCRE2_LIBS])
AC_SUBST([OPTIONAL_ZSTD_LIBS])
AC_SUBST([OPTIONAL_SSL_LIBS])
AC_SUBST([OPTIONAL_JSONC_LIBS])
@ -2073,15 +2107,18 @@ AC_CONFIG_FILES([
libnetdata/aral/Makefile
libnetdata/avl/Makefile
libnetdata/buffer/Makefile
libnetdata/buffered_reader/Makefile
libnetdata/clocks/Makefile
libnetdata/completion/Makefile
libnetdata/config/Makefile
libnetdata/datetime/Makefile
libnetdata/dictionary/Makefile
libnetdata/ebpf/Makefile
libnetdata/eval/Makefile
libnetdata/facets/Makefile
libnetdata/functions_evloop/Makefile
libnetdata/july/Makefile
libnetdata/line_splitter/Makefile
libnetdata/locks/Makefile
libnetdata/log/Makefile
libnetdata/onewayalloc/Makefile
@ -2095,6 +2132,7 @@ AC_CONFIG_FILES([
libnetdata/storage_number/tests/Makefile
libnetdata/threads/Makefile
libnetdata/url/Makefile
libnetdata/uuid/Makefile
libnetdata/json/Makefile
libnetdata/health/Makefile
libnetdata/worker_utilization/Makefile

View file

@ -4,6 +4,7 @@ Build-Depends: debhelper (>= 9.20160709),
dpkg-dev (>= 1.13.19),
zlib1g-dev,
uuid-dev,
libcurl4-openssl-dev,
libelf-dev,
libuv1-dev,
liblz4-dev,
@ -15,6 +16,7 @@ Build-Depends: debhelper (>= 9.20160709),
libipmimonitoring-dev,
libnetfilter-acct-dev,
libsnappy-dev,
libpcre2-8-0,
libprotobuf-dev,
libprotoc-dev,
libsystemd-dev,

View file

@ -142,10 +142,10 @@ static cmd_status_t cmd_reload_health_execute(char *args, char **message)
(void)args;
(void)message;
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("COMMAND: Reloading HEALTH configuration.");
health_reload();
error_log_limit_reset();
nd_log_limits_reset();
return CMD_STATUS_SUCCESS;
}
@ -155,11 +155,11 @@ static cmd_status_t cmd_save_database_execute(char *args, char **message)
(void)args;
(void)message;
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("COMMAND: Saving databases.");
rrdhost_save_all();
netdata_log_info("COMMAND: Databases saved.");
error_log_limit_reset();
nd_log_limits_reset();
return CMD_STATUS_SUCCESS;
}
@ -169,10 +169,9 @@ static cmd_status_t cmd_reopen_logs_execute(char *args, char **message)
(void)args;
(void)message;
error_log_limit_unlimited();
netdata_log_info("COMMAND: Reopening all log files.");
reopen_all_log_files();
error_log_limit_reset();
nd_log_limits_unlimited();
nd_log_reopen_log_files();
nd_log_limits_reset();
return CMD_STATUS_SUCCESS;
}
@ -182,7 +181,7 @@ static cmd_status_t cmd_exit_execute(char *args, char **message)
(void)args;
(void)message;
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("COMMAND: Cleaning up to exit.");
netdata_cleanup_and_exit(0);
exit(0);
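Every command handler now brackets its noisy work with the renamed flood-protection controls. The pattern, as a sketch; the work function is hypothetical:

    nd_log_limits_unlimited();                 // lift flood protection for a chatty operation
    netdata_log_info("COMMAND: doing something noisy.");
    do_noisy_work();                           // hypothetical
    nd_log_limits_reset();                     // restore the configured limits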

View file

@ -31,22 +31,6 @@ void get_netdata_execution_path(void) {
dirname(netdata_exe_path);
}
static void chown_open_file(int fd, uid_t uid, gid_t gid) {
if(fd == -1) return;
struct stat buf;
if(fstat(fd, &buf) == -1) {
netdata_log_error("Cannot fstat() fd %d", fd);
return;
}
if((buf.st_uid != uid || buf.st_gid != gid) && S_ISREG(buf.st_mode)) {
if(fchown(fd, uid, gid) == -1)
netdata_log_error("Cannot fchown() fd %d.", fd);
}
}
static void fix_directory_file_permissions(const char *dirname, uid_t uid, gid_t gid, bool recursive)
{
char filename[FILENAME_MAX + 1];
@ -150,9 +134,9 @@ int become_user(const char *username, int pid_fd) {
}
}
nd_log_chown_log_files(uid, gid);
chown_open_file(STDOUT_FILENO, uid, gid);
chown_open_file(STDERR_FILENO, uid, gid);
chown_open_file(stdaccess_fd, uid, gid);
chown_open_file(pid_fd, uid, gid);
if(supplementary_groups && ngroups > 0) {

View file

@ -315,7 +315,7 @@ void netdata_cleanup_and_exit(int ret) {
const char *prev_msg = NULL;
bool timeout = false;
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("NETDATA SHUTDOWN: initializing shutdown with code %d...", ret);
send_statistics("EXIT", ret?"ERROR":"OK","-");
@ -449,8 +449,9 @@ void netdata_cleanup_and_exit(int ret) {
running += rrdeng_collectors_running(multidb_ctx[tier]);
if(running) {
error_limit_static_thread_var(erl, 1, 100 * USEC_PER_MS);
error_limit(&erl, "waiting for %zu collectors to finish", running);
nd_log_limit_static_thread_var(erl, 1, 100 * USEC_PER_MS);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
"waiting for %zu collectors to finish", running);
// sleep_usec(100 * USEC_PER_MS);
cleanup_destroyed_dictionaries();
}
@ -618,8 +619,14 @@ int killpid(pid_t pid) {
int ret;
netdata_log_debug(D_EXIT, "Request to kill pid %d", pid);
int signal = SIGTERM;
//#ifdef NETDATA_INTERNAL_CHECKS
// if(service_running(SERVICE_COLLECTORS))
// signal = SIGABRT;
//#endif
errno = 0;
ret = kill(pid, SIGTERM);
ret = kill(pid, signal);
if (ret == -1) {
switch(errno) {
case ESRCH:
@ -666,7 +673,7 @@ static void set_nofile_limit(struct rlimit *rl) {
}
void cancel_main_threads() {
error_log_limit_unlimited();
nd_log_limits_unlimited();
int i, found = 0;
usec_t max = 5 * USEC_PER_SEC, step = 100000;
@ -756,7 +763,7 @@ int help(int exitcode) {
" | '-' '-' '-' '-' real-time performance monitoring, done right! \n"
" +----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--->\n"
"\n"
" Copyright (C) 2016-2022, Netdata, Inc. <info@netdata.cloud>\n"
" Copyright (C) 2016-2023, Netdata, Inc. <info@netdata.cloud>\n"
" Released under GNU General Public License v3 or later.\n"
" All rights reserved.\n"
"\n"
@ -845,44 +852,49 @@ static void security_init(){
#endif
static void log_init(void) {
nd_log_set_facility(config_get(CONFIG_SECTION_LOGS, "facility", "daemon"));
time_t period = ND_LOG_DEFAULT_THROTTLE_PERIOD;
size_t logs = ND_LOG_DEFAULT_THROTTLE_LOGS;
period = config_get_number(CONFIG_SECTION_LOGS, "logs flood protection period", period);
logs = (unsigned long)config_get_number(CONFIG_SECTION_LOGS, "logs to trigger flood protection", (long long int)logs);
nd_log_set_flood_protection(logs, period);
nd_log_set_priority_level(config_get(CONFIG_SECTION_LOGS, "level", NDLP_INFO_STR));
char filename[FILENAME_MAX + 1];
snprintfz(filename, FILENAME_MAX, "%s/debug.log", netdata_configured_log_dir);
stdout_filename = config_get(CONFIG_SECTION_LOGS, "debug", filename);
nd_log_set_user_settings(NDLS_DEBUG, config_get(CONFIG_SECTION_LOGS, "debug", filename));
snprintfz(filename, FILENAME_MAX, "%s/error.log", netdata_configured_log_dir);
stderr_filename = config_get(CONFIG_SECTION_LOGS, "error", filename);
bool with_journal = is_stderr_connected_to_journal() /* || nd_log_journal_socket_available() */;
if(with_journal)
snprintfz(filename, FILENAME_MAX, "journal");
else
snprintfz(filename, FILENAME_MAX, "%s/daemon.log", netdata_configured_log_dir);
nd_log_set_user_settings(NDLS_DAEMON, config_get(CONFIG_SECTION_LOGS, "daemon", filename));
snprintfz(filename, FILENAME_MAX, "%s/collector.log", netdata_configured_log_dir);
stdcollector_filename = config_get(CONFIG_SECTION_LOGS, "collector", filename);
if(with_journal)
snprintfz(filename, FILENAME_MAX, "journal");
else
snprintfz(filename, FILENAME_MAX, "%s/collector.log", netdata_configured_log_dir);
nd_log_set_user_settings(NDLS_COLLECTORS, config_get(CONFIG_SECTION_LOGS, "collector", filename));
snprintfz(filename, FILENAME_MAX, "%s/access.log", netdata_configured_log_dir);
stdaccess_filename = config_get(CONFIG_SECTION_LOGS, "access", filename);
nd_log_set_user_settings(NDLS_ACCESS, config_get(CONFIG_SECTION_LOGS, "access", filename));
snprintfz(filename, FILENAME_MAX, "%s/health.log", netdata_configured_log_dir);
stdhealth_filename = config_get(CONFIG_SECTION_LOGS, "health", filename);
if(with_journal)
snprintfz(filename, FILENAME_MAX, "journal");
else
snprintfz(filename, FILENAME_MAX, "%s/health.log", netdata_configured_log_dir);
nd_log_set_user_settings(NDLS_HEALTH, config_get(CONFIG_SECTION_LOGS, "health", filename));
#ifdef ENABLE_ACLK
aclklog_enabled = config_get_boolean(CONFIG_SECTION_CLOUD, "conversation log", CONFIG_BOOLEAN_NO);
if (aclklog_enabled) {
snprintfz(filename, FILENAME_MAX, "%s/aclk.log", netdata_configured_log_dir);
aclklog_filename = config_get(CONFIG_SECTION_CLOUD, "conversation log file", filename);
nd_log_set_user_settings(NDLS_ACLK, config_get(CONFIG_SECTION_CLOUD, "conversation log file", filename));
}
#endif
char deffacility[8];
snprintfz(deffacility,7,"%s","daemon");
facility_log = config_get(CONFIG_SECTION_LOGS, "facility", deffacility);
error_log_throttle_period = config_get_number(CONFIG_SECTION_LOGS, "errors flood protection period", error_log_throttle_period);
error_log_errors_per_period = (unsigned long)config_get_number(CONFIG_SECTION_LOGS, "errors to trigger flood protection", (long long int)error_log_errors_per_period);
error_log_errors_per_period_backup = error_log_errors_per_period;
setenv("NETDATA_ERRORS_THROTTLE_PERIOD", config_get(CONFIG_SECTION_LOGS, "errors flood protection period" , ""), 1);
setenv("NETDATA_ERRORS_PER_PERIOD", config_get(CONFIG_SECTION_LOGS, "errors to trigger flood protection", ""), 1);
char *selected_level = config_get(CONFIG_SECTION_LOGS, "severity level", NETDATA_LOG_LEVEL_INFO_STR);
global_log_severity_level = log_severity_string_to_severity_level(selected_level);
setenv("NETDATA_LOG_SEVERITY_LEVEL", selected_level , 1);
}
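Under the new scheme every log source is a [logs] key that accepts either a file path or the token "journal" (the default when stderr is already connected to systemd-journald). A sketch of the resulting netdata.conf options; the values shown are illustrative, not the shipped defaults:

    [logs]
        facility = daemon
        level = info
        logs flood protection period = 60
        logs to trigger flood protection = 1000
        debug = /var/log/netdata/debug.log
        daemon = journal
        collector = journal
        access = /var/log/netdata/access.log
        health = journal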
char *initialize_lock_directory_path(char *prefix)
@ -1054,6 +1066,17 @@ static void backwards_compatible_config() {
config_move(CONFIG_SECTION_GLOBAL, "enable zero metrics",
CONFIG_SECTION_DB, "enable zero metrics");
config_move(CONFIG_SECTION_LOGS, "error",
CONFIG_SECTION_LOGS, "daemon");
config_move(CONFIG_SECTION_LOGS, "severity level",
CONFIG_SECTION_LOGS, "level");
config_move(CONFIG_SECTION_LOGS, "errors to trigger flood protection",
CONFIG_SECTION_LOGS, "logs to trigger flood protection");
config_move(CONFIG_SECTION_LOGS, "errors flood protection period",
CONFIG_SECTION_LOGS, "logs flood protection period");
}
static int get_hostname(char *buf, size_t buf_size) {
@ -1354,6 +1377,7 @@ int pluginsd_parser_unittest(void);
void replication_initialize(void);
void bearer_tokens_init(void);
int unittest_rrdpush_compressions(void);
int uuid_unittest(void);
int main(int argc, char **argv) {
// initialize the system clocks
@ -1363,8 +1387,6 @@ int main(int argc, char **argv) {
usec_t started_ut = now_monotonic_usec();
usec_t last_ut = started_ut;
const char *prev_msg = NULL;
// Initialize stderror avoiding coredump when netdata_log_info() or netdata_log_error() is called
stderror = stderr;
int i;
int config_loaded = 0;
@ -1516,6 +1538,8 @@ int main(int argc, char **argv) {
return 1;
if (ctx_unittest())
return 1;
if (uuid_unittest())
return 1;
fprintf(stderr, "\n\nALL TESTS PASSED\n\n");
return 0;
}
@ -1542,6 +1566,10 @@ int main(int argc, char **argv) {
unittest_running = true;
return buffer_unittest();
}
else if(strcmp(optarg, "uuidtest") == 0) {
unittest_running = true;
return uuid_unittest();
}
#ifdef ENABLE_DBENGINE
else if(strcmp(optarg, "mctest") == 0) {
unittest_running = true;
@ -1919,10 +1947,10 @@ int main(int argc, char **argv) {
// get log filenames and settings
log_init();
error_log_limit_unlimited();
nd_log_limits_unlimited();
// initialize the log files
open_all_log_files();
nd_log_initialize();
netdata_log_info("Netdata agent version \""VERSION"\" is starting");
ieee754_doubles = is_system_ieee754_double();
@ -2103,7 +2131,7 @@ int main(int argc, char **argv) {
// ------------------------------------------------------------------------
// enable log flood protection
error_log_limit_reset();
nd_log_limits_reset();
// Load host labels
delta_startup_time("collect host labels");

View file

@ -203,28 +203,28 @@ void signals_handle(void) {
switch (signals_waiting[i].action) {
case NETDATA_SIGNAL_RELOAD_HEALTH:
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("SIGNAL: Received %s. Reloading HEALTH configuration...", name);
error_log_limit_reset();
nd_log_limits_reset();
execute_command(CMD_RELOAD_HEALTH, NULL, NULL);
break;
case NETDATA_SIGNAL_SAVE_DATABASE:
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("SIGNAL: Received %s. Saving databases...", name);
error_log_limit_reset();
nd_log_limits_reset();
execute_command(CMD_SAVE_DATABASE, NULL, NULL);
break;
case NETDATA_SIGNAL_REOPEN_LOGS:
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("SIGNAL: Received %s. Reopening all log files...", name);
error_log_limit_reset();
nd_log_limits_reset();
execute_command(CMD_REOPEN_LOGS, NULL, NULL);
break;
case NETDATA_SIGNAL_EXIT_CLEANLY:
error_log_limit_unlimited();
nd_log_limits_unlimited();
netdata_log_info("SIGNAL: Received %s. Cleaning up to exit...", name);
commands_exit();
netdata_cleanup_and_exit(0);

View file

@ -2118,7 +2118,7 @@ int test_dbengine(void)
RRDDIM *rd[CHARTS][DIMS];
time_t time_start[REGIONS], time_end[REGIONS];
error_log_limit_unlimited();
nd_log_limits_unlimited();
fprintf(stderr, "\nRunning DB-engine test\n");
default_rrd_memory_mode = RRD_MEMORY_MODE_DBENGINE;
@ -2347,7 +2347,7 @@ void generate_dbengine_dataset(unsigned history_seconds)
(1024 * 1024);
default_rrdeng_disk_quota_mb -= default_rrdeng_disk_quota_mb * EXPECTED_COMPRESSION_RATIO / 100;
error_log_limit_unlimited();
nd_log_limits_unlimited();
fprintf(stderr, "Initializing localhost with hostname 'dbengine-dataset'");
host = dbengine_rrdhost_find_or_create("dbengine-dataset");
@ -2522,7 +2522,7 @@ void dbengine_stress_test(unsigned TEST_DURATION_SEC, unsigned DSET_CHARTS, unsi
unsigned i, j;
time_t time_start, test_duration;
error_log_limit_unlimited();
nd_log_limits_unlimited();
if (!TEST_DURATION_SEC)
TEST_DURATION_SEC = 10;

View file

@ -224,26 +224,31 @@ void rrdcontext_hub_checkpoint_command(void *ptr) {
struct ctxs_checkpoint *cmd = ptr;
if(!rrdhost_check_our_claim_id(cmd->claim_id)) {
netdata_log_error("RRDCONTEXT: received checkpoint command for claim_id '%s', node id '%s', but this is not our claim id. Ours '%s', received '%s'. Ignoring command.",
cmd->claim_id, cmd->node_id,
localhost->aclk_state.claimed_id?localhost->aclk_state.claimed_id:"NOT SET",
cmd->claim_id);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"RRDCONTEXT: received checkpoint command for claim_id '%s', node id '%s', "
"but this is not our claim id. Ours '%s', received '%s'. Ignoring command.",
cmd->claim_id, cmd->node_id,
localhost->aclk_state.claimed_id?localhost->aclk_state.claimed_id:"NOT SET",
cmd->claim_id);
return;
}
RRDHOST *host = rrdhost_find_by_node_id(cmd->node_id);
if(!host) {
netdata_log_error("RRDCONTEXT: received checkpoint command for claim id '%s', node id '%s', but there is no node with such node id here. Ignoring command.",
cmd->claim_id,
cmd->node_id);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"RRDCONTEXT: received checkpoint command for claim id '%s', node id '%s', "
"but there is no node with such node id here. Ignoring command.",
cmd->claim_id, cmd->node_id);
return;
}
if(rrdhost_flag_check(host, RRDHOST_FLAG_ACLK_STREAM_CONTEXTS)) {
netdata_log_info("RRDCONTEXT: received checkpoint command for claim id '%s', node id '%s', while node '%s' has an active context streaming.",
cmd->claim_id, cmd->node_id, rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"RRDCONTEXT: received checkpoint command for claim id '%s', node id '%s', "
"while node '%s' has an active context streaming.",
cmd->claim_id, cmd->node_id, rrdhost_hostname(host));
// disable it temporarily, so that our worker will not attempt to send messages in parallel
rrdhost_flag_clear(host, RRDHOST_FLAG_ACLK_STREAM_CONTEXTS);
@ -252,8 +257,10 @@ void rrdcontext_hub_checkpoint_command(void *ptr) {
uint64_t our_version_hash = rrdcontext_version_hash(host);
if(cmd->version_hash != our_version_hash) {
netdata_log_error("RRDCONTEXT: received version hash %"PRIu64" for host '%s', does not match our version hash %"PRIu64". Sending snapshot of all contexts.",
cmd->version_hash, rrdhost_hostname(host), our_version_hash);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"RRDCONTEXT: received version hash %"PRIu64" for host '%s', does not match our version hash %"PRIu64". "
"Sending snapshot of all contexts.",
cmd->version_hash, rrdhost_hostname(host), our_version_hash);
#ifdef ENABLE_ACLK
// prepare the snapshot
@ -275,41 +282,55 @@ void rrdcontext_hub_checkpoint_command(void *ptr) {
#endif
}
internal_error(true, "RRDCONTEXT: host '%s' enabling streaming of contexts", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRDCONTEXT: host '%s' enabling streaming of contexts",
rrdhost_hostname(host));
rrdhost_flag_set(host, RRDHOST_FLAG_ACLK_STREAM_CONTEXTS);
char node_str[UUID_STR_LEN];
uuid_unparse_lower(*host->node_id, node_str);
netdata_log_access("ACLK REQ [%s (%s)]: STREAM CONTEXTS ENABLED", node_str, rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_DEBUG,
"ACLK REQ [%s (%s)]: STREAM CONTEXTS ENABLED",
node_str, rrdhost_hostname(host));
}
void rrdcontext_hub_stop_streaming_command(void *ptr) {
struct stop_streaming_ctxs *cmd = ptr;
if(!rrdhost_check_our_claim_id(cmd->claim_id)) {
netdata_log_error("RRDCONTEXT: received stop streaming command for claim_id '%s', node id '%s', but this is not our claim id. Ours '%s', received '%s'. Ignoring command.",
cmd->claim_id, cmd->node_id,
localhost->aclk_state.claimed_id?localhost->aclk_state.claimed_id:"NOT SET",
cmd->claim_id);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"RRDCONTEXT: received stop streaming command for claim_id '%s', node id '%s', "
"but this is not our claim id. Ours '%s', received '%s'. Ignoring command.",
cmd->claim_id, cmd->node_id,
localhost->aclk_state.claimed_id?localhost->aclk_state.claimed_id:"NOT SET",
cmd->claim_id);
return;
}
RRDHOST *host = rrdhost_find_by_node_id(cmd->node_id);
if(!host) {
netdata_log_error("RRDCONTEXT: received stop streaming command for claim id '%s', node id '%s', but there is no node with such node id here. Ignoring command.",
cmd->claim_id, cmd->node_id);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"RRDCONTEXT: received stop streaming command for claim id '%s', node id '%s', "
"but there is no node with such node id here. Ignoring command.",
cmd->claim_id, cmd->node_id);
return;
}
if(!rrdhost_flag_check(host, RRDHOST_FLAG_ACLK_STREAM_CONTEXTS)) {
netdata_log_error("RRDCONTEXT: received stop streaming command for claim id '%s', node id '%s', but node '%s' does not have active context streaming. Ignoring command.",
cmd->claim_id, cmd->node_id, rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"RRDCONTEXT: received stop streaming command for claim id '%s', node id '%s', "
"but node '%s' does not have active context streaming. Ignoring command.",
cmd->claim_id, cmd->node_id, rrdhost_hostname(host));
return;
}
internal_error(true, "RRDCONTEXT: host '%s' disabling streaming of contexts", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRDCONTEXT: host '%s' disabling streaming of contexts",
rrdhost_hostname(host));
rrdhost_flag_clear(host, RRDHOST_FLAG_ACLK_STREAM_CONTEXTS);
}

View file

@ -1171,9 +1171,10 @@ static bool evict_pages_with_filter(PGC *cache, size_t max_skip, size_t max_evic
if(all_of_them && !filter) {
pgc_ll_lock(cache, &cache->clean);
if(cache->clean.stats->entries) {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "DBENGINE CACHE: cannot free all clean pages, %zu are still in the clean queue",
cache->clean.stats->entries);
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
"DBENGINE CACHE: cannot free all clean pages, %zu are still in the clean queue",
cache->clean.stats->entries);
}
pgc_ll_unlock(cache, &cache->clean);
}
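The renamed limiter keeps the old call shape: the static-var macro declares the rate-limit state (judging by the call sites, the two numeric arguments are the log-every period in seconds and a sleep in microseconds, but these semantics are assumed), while nd_log_limit() adds the source and priority that plain error_limit() lacked. A sketch:

    size_t entries = cache->clean.stats->entries;  // value to report, as in the hunk above
    nd_log_limit_static_global_var(erl, 1, 0);     // state: log at most every 1s, no sleep (semantics assumed)
    nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
                 "DBENGINE CACHE: cannot free all clean pages, %zu are still in the clean queue",
                 entries);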

View file

@ -479,14 +479,18 @@ int create_new_datafile_pair(struct rrdengine_instance *ctx, bool having_lock)
int ret;
char path[RRDENG_PATH_MAX];
netdata_log_info("DBENGINE: creating new data and journal files in path %s", ctx->config.dbfiles_path);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"DBENGINE: creating new data and journal files in path %s",
ctx->config.dbfiles_path);
datafile = datafile_alloc_and_init(ctx, 1, fileno);
ret = create_data_file(datafile);
if(ret)
goto error_after_datafile;
generate_datafilepath(datafile, path, sizeof(path));
netdata_log_info("DBENGINE: created data file \"%s\".", path);
nd_log(NDLS_DAEMON, NDLP_INFO,
"DBENGINE: created data file \"%s\".", path);
journalfile = journalfile_alloc_and_init(datafile);
ret = journalfile_create(journalfile, datafile);
@ -494,7 +498,8 @@ int create_new_datafile_pair(struct rrdengine_instance *ctx, bool having_lock)
goto error_after_journalfile;
journalfile_v1_generate_path(datafile, path, sizeof(path));
netdata_log_info("DBENGINE: created journal file \"%s\".", path);
nd_log(NDLS_DAEMON, NDLP_INFO,
"DBENGINE: created journal file \"%s\".", path);
ctx_current_disk_space_increase(ctx, datafile->pos + journalfile->unsafe.pos);
datafile_list_insert(ctx, datafile, having_lock);

View file

@ -592,27 +592,30 @@ inline void mrg_update_metric_retention_and_granularity_by_uuid(
time_t update_every_s, time_t now_s)
{
if(unlikely(last_time_s > now_s)) {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "DBENGINE JV2: wrong last time on-disk (%ld - %ld, now %ld), "
"fixing last time to now",
first_time_s, last_time_s, now_s);
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"DBENGINE JV2: wrong last time on-disk (%ld - %ld, now %ld), "
"fixing last time to now",
first_time_s, last_time_s, now_s);
last_time_s = now_s;
}
if (unlikely(first_time_s > last_time_s)) {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "DBENGINE JV2: wrong first time on-disk (%ld - %ld, now %ld), "
"fixing first time to last time",
first_time_s, last_time_s, now_s);
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"DBENGINE JV2: wrong first time on-disk (%ld - %ld, now %ld), "
"fixing first time to last time",
first_time_s, last_time_s, now_s);
first_time_s = last_time_s;
}
if (unlikely(first_time_s == 0 || last_time_s == 0)) {
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl, "DBENGINE JV2: zero on-disk timestamps (%ld - %ld, now %ld), "
"using them as-is",
first_time_s, last_time_s, now_s);
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"DBENGINE JV2: zero on-disk timestamps (%ld - %ld, now %ld), "
"using them as-is",
first_time_s, last_time_s, now_s);
}
bool added = false;

View file

@ -772,7 +772,7 @@ VALIDATED_PAGE_DESCRIPTOR validate_page(
if(unlikely(!vd.is_valid || updated)) {
#ifndef NETDATA_INTERNAL_CHECKS
error_limit_static_global_var(erl, 1, 0);
nd_log_limit_static_global_var(erl, 1, 0);
#endif
char uuid_str[UUID_STR_LEN + 1];
uuid_unparse(*uuid, uuid_str);
@ -788,7 +788,7 @@ VALIDATED_PAGE_DESCRIPTOR validate_page(
#ifdef NETDATA_INTERNAL_CHECKS
internal_error(true,
#else
error_limit(&erl,
nd_log_limit(&erl, NDLS_DAEMON, NDLP_ERR,
#endif
"DBENGINE: metric '%s' %s invalid page of type %u "
"from %ld to %ld (now %ld), update every %ld, page length %zu, entries %zu (flags: %s)",
@ -808,7 +808,7 @@ VALIDATED_PAGE_DESCRIPTOR validate_page(
#ifdef NETDATA_INTERNAL_CHECKS
internal_error(true,
#else
error_limit(&erl,
nd_log_limit(&erl, NDLS_DAEMON, NDLP_ERR,
#endif
"DBENGINE: metric '%s' %s page of type %u "
"from %ld to %ld (now %ld), update every %ld, page length %zu, entries %zu (flags: %s), "
@ -915,8 +915,8 @@ static void epdl_extent_loading_error_log(struct rrdengine_instance *ctx, EPDL *
if(end_time_s)
log_date(end_time_str, LOG_DATE_LENGTH, end_time_s);
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl,
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_ERR,
"DBENGINE: error while reading extent from datafile %u of tier %d, at offset %" PRIu64 " (%u bytes) "
"%s from %ld (%s) to %ld (%s) %s%s: "
"%s",

View file

@ -1478,12 +1478,19 @@ static void *journal_v2_indexing_tp_worker(struct rrdengine_instance *ctx __mayb
spinlock_unlock(&datafile->writers.spinlock);
if(!available) {
netdata_log_info("DBENGINE: journal file %u needs to be indexed, but it has writers working on it - skipping it for now", datafile->fileno);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"DBENGINE: journal file %u needs to be indexed, but it has writers working on it - "
"skipping it for now",
datafile->fileno);
datafile = datafile->next;
continue;
}
netdata_log_info("DBENGINE: journal file %u is ready to be indexed", datafile->fileno);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"DBENGINE: journal file %u is ready to be indexed",
datafile->fileno);
pgc_open_cache_to_journal_v2(open_cache, (Word_t) ctx, (int) datafile->fileno, ctx->config.page_type,
journalfile_migrate_to_v2_callback, (void *) datafile->journalfile);
@ -1496,7 +1503,10 @@ static void *journal_v2_indexing_tp_worker(struct rrdengine_instance *ctx __mayb
}
errno = 0;
internal_error(count, "DBENGINE: journal indexing done; %u files processed", count);
if(count)
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"DBENGINE: journal indexing done; %u files processed",
count);
worker_is_idle();

View file

@ -361,12 +361,12 @@ static void rrdeng_store_metric_create_new_page(struct rrdeng_collect_handle *ha
#ifdef NETDATA_INTERNAL_CHECKS
internal_error(true,
#else
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl,
#endif
"DBENGINE: metric '%s' new page from %ld to %ld, update every %ld, has a conflict in main cache "
"with existing %s%s page from %ld to %ld, update every %ld - "
"is it collected more than once?",
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
#endif
"DBENGINE: metric '%s' new page from %ld to %ld, update every %ld, has a conflict in main cache "
"with existing %s%s page from %ld to %ld, update every %ld - "
"is it collected more than once?",
uuid,
page_entry.start_time_s, page_entry.end_time_s, (time_t)page_entry.update_every_s,
pgc_is_page_hot(pgc_page) ? "hot" : "not-hot",
@ -521,8 +521,8 @@ static void store_metric_next_error_log(struct rrdeng_collect_handle *handle __m
collect_page_flags_to_buffer(wb, handle->page_flags);
}
error_limit_static_global_var(erl, 1, 0);
error_limit(&erl,
nd_log_limit_static_global_var(erl, 1, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
"DBENGINE: metric '%s' collected point at %ld, %s last collection at %ld, "
"update every %ld, %s page from %ld to %ld, position %u (of %u), flags: %s",
uuid,
@ -535,7 +535,7 @@ static void store_metric_next_error_log(struct rrdeng_collect_handle *handle __m
(time_t)(handle->page_end_time_ut / USEC_PER_SEC),
handle->page_position, handle->page_entries_max,
wb ? buffer_tostring(wb) : ""
);
);
buffer_free(wb);
#else

View file

@ -1023,7 +1023,7 @@ typedef enum __attribute__ ((__packed__)) rrdhost_flags {
#ifdef NETDATA_INTERNAL_CHECKS
#define rrdset_debug(st, fmt, args...) do { if(unlikely(debug_flags & D_RRD_STATS && rrdset_flag_check(st, RRDSET_FLAG_DEBUG))) \
debug_int(__FILE__, __FUNCTION__, __LINE__, "%s: " fmt, rrdset_name(st), ##args); } while(0)
netdata_logger(NDLS_DEBUG, NDLP_DEBUG, __FILE__, __FUNCTION__, __LINE__, "%s: " fmt, rrdset_name(st), ##args); } while(0)
#else
#define rrdset_debug(st, fmt, args...) debug_dummy()
#endif

View file

@ -809,10 +809,10 @@ void rrdcalc_delete_alerts_not_matching_host_labels_from_this_host(RRDHOST *host
continue;
if(!rrdlabels_match_simple_pattern_parsed(host->rrdlabels, rc->host_labels_pattern, '=', NULL)) {
netdata_log_health("Health configuration for alarm '%s' cannot be applied, because the host %s does not have the label(s) '%s'",
rrdcalc_name(rc),
rrdhost_hostname(host),
rrdcalc_host_labels(rc));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Health configuration for alarm '%s' cannot be applied, "
"because the host %s does not have the label(s) '%s'",
rrdcalc_name(rc), rrdhost_hostname(host), rrdcalc_host_labels(rc));
rrdcalc_unlink_and_delete(host, rc, false);
}

View file

@ -1007,11 +1007,11 @@ int rrd_function_run(RRDHOST *host, BUFFER *result_wb, int timeout, const char *
// the function can only be executed in async mode
// put the function into the inflight requests
char uuid_str[UUID_STR_LEN];
char uuid_str[UUID_COMPACT_STR_LEN];
if(!transaction) {
uuid_t uuid;
uuid_generate_random(uuid);
uuid_unparse_lower(uuid, uuid_str);
uuid_unparse_lower_compact(uuid, uuid_str);
transaction = uuid_str;
}
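Transaction ids switch to the compact, hyphen-less UUID form; per the commit notes, uuid_parse_flexi() on the parsing side accepts both forms. A small sketch; the exact size semantics of UUID_COMPACT_STR_LEN are assumed:

    uuid_t uuid;
    char compact[UUID_COMPACT_STR_LEN];          // 32 hex digits + NUL (assumed)
    uuid_generate_random(uuid);
    uuid_unparse_lower_compact(uuid, compact);   // e.g. "550e8400e29b41d4a716446655440000"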

View file

@ -80,8 +80,6 @@ static inline void rrdhost_init() {
}
RRDHOST_ACQUIRED *rrdhost_find_and_acquire(const char *machine_guid) {
netdata_log_debug(D_RRD_CALLS, "rrdhost_find_and_acquire() host %s", machine_guid);
return (RRDHOST_ACQUIRED *)dictionary_get_and_acquire_item(rrdhost_root_index, machine_guid);
}
@ -116,8 +114,9 @@ static inline RRDHOST *rrdhost_index_add_by_guid(RRDHOST *host) {
rrdhost_option_set(host, RRDHOST_OPTION_INDEXED_MACHINE_GUID);
else {
rrdhost_option_clear(host, RRDHOST_OPTION_INDEXED_MACHINE_GUID);
netdata_log_error("RRDHOST: %s() host with machine guid '%s' is already indexed",
__FUNCTION__, host->machine_guid);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"RRDHOST: host with machine guid '%s' is already indexed. Not adding it again.",
host->machine_guid);
}
return host;
@ -126,8 +125,9 @@ static inline RRDHOST *rrdhost_index_add_by_guid(RRDHOST *host) {
static void rrdhost_index_del_by_guid(RRDHOST *host) {
if(rrdhost_option_check(host, RRDHOST_OPTION_INDEXED_MACHINE_GUID)) {
if(!dictionary_del(rrdhost_root_index, host->machine_guid))
netdata_log_error("RRDHOST: %s() failed to delete machine guid '%s' from index",
__FUNCTION__, host->machine_guid);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"RRDHOST: failed to delete machine guid '%s' from index",
host->machine_guid);
rrdhost_option_clear(host, RRDHOST_OPTION_INDEXED_MACHINE_GUID);
}
@ -148,8 +148,9 @@ static inline void rrdhost_index_del_hostname(RRDHOST *host) {
if(rrdhost_option_check(host, RRDHOST_OPTION_INDEXED_HOSTNAME)) {
if(!dictionary_del(rrdhost_root_index_hostname, rrdhost_hostname(host)))
netdata_log_error("RRDHOST: %s() failed to delete hostname '%s' from index",
__FUNCTION__, rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"RRDHOST: failed to delete hostname '%s' from index",
rrdhost_hostname(host));
rrdhost_option_clear(host, RRDHOST_OPTION_INDEXED_HOSTNAME);
}
@ -303,11 +304,11 @@ static RRDHOST *rrdhost_create(
int is_localhost,
bool archived
) {
netdata_log_debug(D_RRDHOST, "Host '%s': adding with guid '%s'", hostname, guid);
if(memory_mode == RRD_MEMORY_MODE_DBENGINE && !dbengine_enabled) {
netdata_log_error("memory mode 'dbengine' is not enabled, but host '%s' is configured for it. Falling back to 'alloc'",
hostname);
nd_log(NDLS_DAEMON, NDLP_ERR,
"memory mode 'dbengine' is not enabled, but host '%s' is configured for it. Falling back to 'alloc'",
hostname);
memory_mode = RRD_MEMORY_MODE_ALLOC;
}
@ -392,7 +393,9 @@ int is_legacy = 1;
(host->rrd_memory_mode == RRD_MEMORY_MODE_DBENGINE && is_legacy))) {
int r = mkdir(host->cache_dir, 0775);
if(r != 0 && errno != EEXIST)
netdata_log_error("Host '%s': cannot create directory '%s'", rrdhost_hostname(host), host->cache_dir);
nd_log(NDLS_DAEMON, NDLP_CRIT,
"Host '%s': cannot create directory '%s'",
rrdhost_hostname(host), host->cache_dir);
}
}
@ -418,7 +421,9 @@ int is_legacy = 1;
ret = mkdir(dbenginepath, 0775);
if (ret != 0 && errno != EEXIST)
netdata_log_error("Host '%s': cannot create directory '%s'", rrdhost_hostname(host), dbenginepath);
nd_log(NDLS_DAEMON, NDLP_CRIT,
"Host '%s': cannot create directory '%s'",
rrdhost_hostname(host), dbenginepath);
else
ret = 0; // succeed
@ -459,8 +464,9 @@ int is_legacy = 1;
}
if (ret) { // check legacy or multihost initialization success
netdata_log_error("Host '%s': cannot initialize host with machine guid '%s'. Failed to initialize DB engine at '%s'.",
rrdhost_hostname(host), host->machine_guid, host->cache_dir);
nd_log(NDLS_DAEMON, NDLP_CRIT,
"Host '%s': cannot initialize host with machine guid '%s'. Failed to initialize DB engine at '%s'.",
rrdhost_hostname(host), host->machine_guid, host->cache_dir);
rrd_wrlock();
rrdhost_free___while_having_rrd_wrlock(host, true);
@ -508,10 +514,13 @@ int is_legacy = 1;
RRDHOST *t = rrdhost_index_add_by_guid(host);
if(t != host) {
netdata_log_error("Host '%s': cannot add host with machine guid '%s' to index. It already exists as host '%s' with machine guid '%s'.",
rrdhost_hostname(host), host->machine_guid, rrdhost_hostname(t), t->machine_guid);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"Host '%s': cannot add host with machine guid '%s' to index. It already exists as host '%s' with machine guid '%s'.",
rrdhost_hostname(host), host->machine_guid, rrdhost_hostname(t), t->machine_guid);
if (!is_localhost)
rrdhost_free___while_having_rrd_wrlock(host, true);
rrd_unlock();
return NULL;
}
@ -527,21 +536,22 @@ int is_legacy = 1;
// ------------------------------------------------------------------------
netdata_log_info("Host '%s' (at registry as '%s') with guid '%s' initialized"
", os '%s'"
", timezone '%s'"
", tags '%s'"
", program_name '%s'"
", program_version '%s'"
", update every %d"
", memory mode %s"
", history entries %d"
", streaming %s"
" (to '%s' with api key '%s')"
", health %s"
", cache_dir '%s'"
", alarms default handler '%s'"
", alarms default recipient '%s'"
nd_log(NDLS_DAEMON, NDLP_INFO,
"Host '%s' (at registry as '%s') with guid '%s' initialized"
", os '%s'"
", timezone '%s'"
", tags '%s'"
", program_name '%s'"
", program_version '%s'"
", update every %d"
", memory mode %s"
", history entries %d"
", streaming %s"
" (to '%s' with api key '%s')"
", health %s"
", cache_dir '%s'"
", alarms default handler '%s'"
", alarms default recipient '%s'"
, rrdhost_hostname(host)
, rrdhost_registry_hostname(host)
, host->machine_guid
@ -621,44 +631,56 @@ static void rrdhost_update(RRDHOST *host
host->registry_hostname = string_strdupz((registry_hostname && *registry_hostname)?registry_hostname:hostname);
if(strcmp(rrdhost_hostname(host), hostname) != 0) {
netdata_log_info("Host '%s' has been renamed to '%s'. If this is not intentional it may mean multiple hosts are using the same machine_guid.", rrdhost_hostname(host), hostname);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Host '%s' has been renamed to '%s'. If this is not intentional it may mean multiple hosts are using the same machine_guid.",
rrdhost_hostname(host), hostname);
rrdhost_init_hostname(host, hostname, true);
} else {
rrdhost_index_add_hostname(host);
}
if(strcmp(rrdhost_program_name(host), program_name) != 0) {
netdata_log_info("Host '%s' switched program name from '%s' to '%s'", rrdhost_hostname(host), rrdhost_program_name(host), program_name);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"Host '%s' switched program name from '%s' to '%s'",
rrdhost_hostname(host), rrdhost_program_name(host), program_name);
STRING *t = host->program_name;
host->program_name = string_strdupz(program_name);
string_freez(t);
}
if(strcmp(rrdhost_program_version(host), program_version) != 0) {
netdata_log_info("Host '%s' switched program version from '%s' to '%s'", rrdhost_hostname(host), rrdhost_program_version(host), program_version);
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"Host '%s' switched program version from '%s' to '%s'",
rrdhost_hostname(host), rrdhost_program_version(host), program_version);
STRING *t = host->program_version;
host->program_version = string_strdupz(program_version);
string_freez(t);
}
if(host->rrd_update_every != update_every)
netdata_log_error("Host '%s' has an update frequency of %d seconds, but the wanted one is %d seconds. "
"Restart netdata here to apply the new settings.",
rrdhost_hostname(host), host->rrd_update_every, update_every);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Host '%s' has an update frequency of %d seconds, but the wanted one is %d seconds. "
"Restart netdata here to apply the new settings.",
rrdhost_hostname(host), host->rrd_update_every, update_every);
if(host->rrd_memory_mode != mode)
netdata_log_error("Host '%s' has memory mode '%s', but the wanted one is '%s'. "
"Restart netdata here to apply the new settings.",
rrdhost_hostname(host),
rrd_memory_mode_name(host->rrd_memory_mode),
rrd_memory_mode_name(mode));
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Host '%s' has memory mode '%s', but the wanted one is '%s'. "
"Restart netdata here to apply the new settings.",
rrdhost_hostname(host),
rrd_memory_mode_name(host->rrd_memory_mode),
rrd_memory_mode_name(mode));
else if(host->rrd_memory_mode != RRD_MEMORY_MODE_DBENGINE && host->rrd_history_entries < history)
netdata_log_error("Host '%s' has history of %d entries, but the wanted one is %ld entries. "
"Restart netdata here to apply the new settings.",
rrdhost_hostname(host),
host->rrd_history_entries,
history);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Host '%s' has history of %d entries, but the wanted one is %ld entries. "
"Restart netdata here to apply the new settings.",
rrdhost_hostname(host),
host->rrd_history_entries,
history);
// update host tags
rrdhost_init_tags(host, tags);
@ -700,7 +722,9 @@ static void rrdhost_update(RRDHOST *host
ml_host_new(host);
rrdhost_load_rrdcontext_data(host);
netdata_log_info("Host %s is not in archived mode anymore", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Host %s is not in archived mode anymore",
rrdhost_hostname(host));
}
spinlock_unlock(&host->rrdhost_update_lock);
@ -731,8 +755,6 @@ RRDHOST *rrdhost_find_or_create(
, struct rrdhost_system_info *system_info
, bool archived
) {
netdata_log_debug(D_RRDHOST, "Searching for host '%s' with guid '%s'", hostname, guid);
RRDHOST *host = rrdhost_find_by_guid(guid);
if (unlikely(host && host->rrd_memory_mode != mode && rrdhost_flag_check(host, RRDHOST_FLAG_ARCHIVED))) {
@ -740,10 +762,11 @@ RRDHOST *rrdhost_find_or_create(
return host;
/* If a legacy memory mode instantiates all dbengine state must be discarded to avoid inconsistencies */
netdata_log_error("Archived host '%s' has memory mode '%s', but the wanted one is '%s'. Discarding archived state.",
rrdhost_hostname(host),
rrd_memory_mode_name(host->rrd_memory_mode),
rrd_memory_mode_name(mode));
nd_log(NDLS_DAEMON, NDLP_INFO,
"Archived host '%s' has memory mode '%s', but the wanted one is '%s'. Discarding archived state.",
rrdhost_hostname(host),
rrd_memory_mode_name(host->rrd_memory_mode),
rrd_memory_mode_name(mode));
rrd_wrlock();
rrdhost_free___while_having_rrd_wrlock(host, true);
@ -851,18 +874,26 @@ void dbengine_init(char *hostname) {
if (read_num > 0 && read_num <= MAX_PAGES_PER_EXTENT)
rrdeng_pages_per_extent = read_num;
else {
netdata_log_error("Invalid dbengine pages per extent %u given. Using %u.", read_num, rrdeng_pages_per_extent);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Invalid dbengine pages per extent %u given. Using %u.",
read_num, rrdeng_pages_per_extent);
config_set_number(CONFIG_SECTION_DB, "dbengine pages per extent", rrdeng_pages_per_extent);
}
storage_tiers = config_get_number(CONFIG_SECTION_DB, "storage tiers", storage_tiers);
if(storage_tiers < 1) {
netdata_log_error("At least 1 storage tier is required. Assuming 1.");
nd_log(NDLS_DAEMON, NDLP_WARNING,
"At least 1 storage tier is required. Assuming 1.");
storage_tiers = 1;
config_set_number(CONFIG_SECTION_DB, "storage tiers", storage_tiers);
}
if(storage_tiers > RRD_STORAGE_TIERS) {
netdata_log_error("Up to %d storage tier are supported. Assuming %d.", RRD_STORAGE_TIERS, RRD_STORAGE_TIERS);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Up to %d storage tier are supported. Assuming %d.",
RRD_STORAGE_TIERS, RRD_STORAGE_TIERS);
storage_tiers = RRD_STORAGE_TIERS;
config_set_number(CONFIG_SECTION_DB, "storage tiers", storage_tiers);
}
@ -884,7 +915,9 @@ void dbengine_init(char *hostname) {
int ret = mkdir(dbenginepath, 0775);
if (ret != 0 && errno != EEXIST) {
netdata_log_error("DBENGINE on '%s': cannot create directory '%s'", hostname, dbenginepath);
nd_log(NDLS_DAEMON, NDLP_CRIT,
"DBENGINE on '%s': cannot create directory '%s'",
hostname, dbenginepath);
break;
}
@ -904,9 +937,9 @@ void dbengine_init(char *hostname) {
if(grouping_iterations < 2) {
grouping_iterations = 2;
config_set_number(CONFIG_SECTION_DB, dbengineconfig, grouping_iterations);
netdata_log_error("DBENGINE on '%s': 'dbegnine tier %zu update every iterations' cannot be less than 2. Assuming 2.",
hostname,
tier);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"DBENGINE on '%s': 'dbegnine tier %zu update every iterations' cannot be less than 2. Assuming 2.",
hostname, tier);
}
snprintfz(dbengineconfig, 200, "dbengine tier %zu backfill", tier);
@ -915,7 +948,10 @@ void dbengine_init(char *hostname) {
else if(strcmp(bf, "full") == 0) backfill = RRD_BACKFILL_FULL;
else if(strcmp(bf, "none") == 0) backfill = RRD_BACKFILL_NONE;
else {
netdata_log_error("DBENGINE: unknown backfill value '%s', assuming 'new'", bf);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"DBENGINE: unknown backfill value '%s', assuming 'new'",
bf);
config_set(CONFIG_SECTION_DB, dbengineconfig, "new");
backfill = RRD_BACKFILL_NEW;
}
@ -926,10 +962,10 @@ void dbengine_init(char *hostname) {
if(tier > 0 && get_tier_grouping(tier) > 65535) {
storage_tiers_grouping_iterations[tier] = 1;
netdata_log_error("DBENGINE on '%s': dbengine tier %zu gives aggregation of more than 65535 points of tier 0. Disabling tiers above %zu",
hostname,
tier,
tier);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"DBENGINE on '%s': dbengine tier %zu gives aggregation of more than 65535 points of tier 0. "
"Disabling tiers above %zu",
hostname, tier, tier);
break;
}
@ -957,21 +993,19 @@ void dbengine_init(char *hostname) {
netdata_thread_join(tiers_init[tier].thread, &ptr);
if(tiers_init[tier].ret != 0) {
netdata_log_error("DBENGINE on '%s': Failed to initialize multi-host database tier %zu on path '%s'",
hostname,
tiers_init[tier].tier,
tiers_init[tier].path);
nd_log(NDLS_DAEMON, NDLP_ERR,
"DBENGINE on '%s': Failed to initialize multi-host database tier %zu on path '%s'",
hostname, tiers_init[tier].tier, tiers_init[tier].path);
}
else if(created_tiers == tier)
created_tiers++;
}
if(created_tiers && created_tiers < storage_tiers) {
netdata_log_error("DBENGINE on '%s': Managed to create %zu tiers instead of %zu. Continuing with %zu available.",
hostname,
created_tiers,
storage_tiers,
created_tiers);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"DBENGINE on '%s': Managed to create %zu tiers instead of %zu. Continuing with %zu available.",
hostname, created_tiers, storage_tiers, created_tiers);
storage_tiers = created_tiers;
}
else if(!created_tiers)
@ -984,7 +1018,10 @@ void dbengine_init(char *hostname) {
#else
storage_tiers = config_get_number(CONFIG_SECTION_DB, "storage tiers", 1);
if(storage_tiers != 1) {
netdata_log_error("DBENGINE is not available on '%s', so only 1 database tier can be supported.", hostname);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"DBENGINE is not available on '%s', so only 1 database tier can be supported.",
hostname);
storage_tiers = 1;
config_set_number(CONFIG_SECTION_DB, "storage tiers", storage_tiers);
}
@ -1000,7 +1037,9 @@ int rrd_init(char *hostname, struct rrdhost_system_info *system_info, bool unitt
set_late_global_environment(system_info);
fatal("Failed to initialize SQLite");
}
netdata_log_info("Skipping SQLITE metadata initialization since memory mode is not dbengine");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Skipping SQLITE metadata initialization since memory mode is not dbengine");
}
if (unlikely(sql_init_context_database(system_info ? 0 : 1))) {
@ -1015,23 +1054,28 @@ int rrd_init(char *hostname, struct rrdhost_system_info *system_info, bool unitt
rrdpush_init();
if (default_rrd_memory_mode == RRD_MEMORY_MODE_DBENGINE || rrdpush_receiver_needs_dbengine()) {
netdata_log_info("DBENGINE: Initializing ...");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"DBENGINE: Initializing ...");
dbengine_init(hostname);
}
else {
netdata_log_info("DBENGINE: Not initializing ...");
else
storage_tiers = 1;
}
if (!dbengine_enabled) {
if (storage_tiers > 1) {
netdata_log_error("dbengine is not enabled, but %zu tiers have been requested. Resetting tiers to 1",
storage_tiers);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"dbengine is not enabled, but %zu tiers have been requested. Resetting tiers to 1",
storage_tiers);
storage_tiers = 1;
}
if (default_rrd_memory_mode == RRD_MEMORY_MODE_DBENGINE) {
netdata_log_error("dbengine is not enabled, but it has been given as the default db mode. Resetting db mode to alloc");
nd_log(NDLS_DAEMON, NDLP_WARNING,
"dbengine is not enabled, but it has been given as the default db mode. "
"Resetting db mode to alloc");
default_rrd_memory_mode = RRD_MEMORY_MODE_ALLOC;
}
}
@ -1040,7 +1084,6 @@ int rrd_init(char *hostname, struct rrdhost_system_info *system_info, bool unitt
if(!unittest)
metadata_sync_init();
netdata_log_debug(D_RRDHOST, "Initializing localhost with hostname '%s'", hostname);
localhost = rrdhost_create(
hostname
, registry_get_this_machine_hostname()
@ -1177,7 +1220,9 @@ void rrdhost_free___while_having_rrd_wrlock(RRDHOST *host, bool force) {
if(!host) return;
if (netdata_exit || force) {
netdata_log_info("RRD: 'host:%s' freeing memory...", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: 'host:%s' freeing memory...",
rrdhost_hostname(host));
// ------------------------------------------------------------------------
// first remove it from the indexes, so that it will not be discoverable
@ -1243,7 +1288,10 @@ void rrdhost_free___while_having_rrd_wrlock(RRDHOST *host, bool force) {
#endif
if (!netdata_exit && !force) {
netdata_log_info("RRD: 'host:%s' is now in archive mode...", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: 'host:%s' is now in archive mode...",
rrdhost_hostname(host));
rrdhost_flag_set(host, RRDHOST_FLAG_ARCHIVED | RRDHOST_FLAG_ORPHAN);
return;
}
@ -1313,7 +1361,9 @@ void rrd_finalize_collection_for_all_hosts(void) {
void rrdhost_save_charts(RRDHOST *host) {
if(!host) return;
netdata_log_info("RRD: 'host:%s' saving / closing database...", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: 'host:%s' saving / closing database...",
rrdhost_hostname(host));
RRDSET *st;
@ -1442,7 +1492,9 @@ static void rrdhost_load_config_labels(void) {
int status = config_load(NULL, 1, CONFIG_SECTION_HOST_LABEL);
if(!status) {
char *filename = CONFIG_DIR "/" CONFIG_FILENAME;
netdata_log_error("RRDLABEL: Cannot reload the configuration file '%s', using labels in memory", filename);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"RRDLABEL: Cannot reload the configuration file '%s', using labels in memory",
filename);
}
struct section *co = appconfig_get_section(&netdata_config, CONFIG_SECTION_HOST_LABEL);
@ -1462,12 +1514,13 @@ static void rrdhost_load_kubernetes_labels(void) {
sprintf(label_script, "%s/%s", netdata_configured_primary_plugins_dir, "get-kubernetes-labels.sh");
if (unlikely(access(label_script, R_OK) != 0)) {
netdata_log_error("Kubernetes pod label fetching script %s not found.",label_script);
nd_log(NDLS_DAEMON, NDLP_ERR,
"Kubernetes pod label fetching script %s not found.",
label_script);
return;
}
netdata_log_debug(D_RRDHOST, "Attempting to fetch external labels via %s", label_script);
pid_t pid;
FILE *fp_child_input;
FILE *fp_child_output = netdata_popen(label_script, &pid, &fp_child_input);
@ -1481,7 +1534,9 @@ static void rrdhost_load_kubernetes_labels(void) {
// Here we'll inform with an ERROR that the script failed, show whatever (if anything) was added to the list of labels, free the memory and set the return to null
int rc = netdata_pclose(fp_child_input, fp_child_output, pid);
if(rc)
netdata_log_error("%s exited abnormally. Failed to get kubernetes labels.", label_script);
nd_log(NDLS_DAEMON, NDLP_ERR,
"%s exited abnormally. Failed to get kubernetes labels.",
label_script);
}
void reload_host_labels(void) {
@ -1501,7 +1556,9 @@ void reload_host_labels(void) {
}
void rrdhost_finalize_collection(RRDHOST *host) {
netdata_log_info("RRD: 'host:%s' stopping data collection...", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: 'host:%s' stopping data collection...",
rrdhost_hostname(host));
RRDSET *st;
rrdset_foreach_read(st, host)
@ -1515,7 +1572,9 @@ void rrdhost_finalize_collection(RRDHOST *host) {
void rrdhost_delete_charts(RRDHOST *host) {
if(!host) return;
netdata_log_info("RRD: 'host:%s' deleting disk files...", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: 'host:%s' deleting disk files...",
rrdhost_hostname(host));
RRDSET *st;
@ -1523,8 +1582,8 @@ void rrdhost_delete_charts(RRDHOST *host) {
// we get a write lock
// to ensure only one thread is saving the database
rrdset_foreach_write(st, host){
rrdset_delete_files(st);
}
rrdset_delete_files(st);
}
rrdset_foreach_done(st);
}
@ -1537,7 +1596,9 @@ void rrdhost_delete_charts(RRDHOST *host) {
void rrdhost_cleanup_charts(RRDHOST *host) {
if(!host) return;
netdata_log_info("RRD: 'host:%s' cleaning up disk files...", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: 'host:%s' cleaning up disk files...",
rrdhost_hostname(host));
RRDSET *st;
uint32_t rrdhost_delete_obsolete_charts = rrdhost_option_check(host, RRDHOST_OPTION_DELETE_OBSOLETE_CHARTS);
@ -1564,7 +1625,9 @@ void rrdhost_cleanup_charts(RRDHOST *host) {
// RRDHOST - save all hosts to disk
void rrdhost_save_all(void) {
netdata_log_info("RRD: saving databases [%zu hosts(s)]...", rrdhost_hosts_available());
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: saving databases [%zu hosts(s)]...",
rrdhost_hosts_available());
rrd_rdlock();
@ -1579,7 +1642,9 @@ void rrdhost_save_all(void) {
// RRDHOST - save or delete all hosts from disk
void rrdhost_cleanup_all(void) {
netdata_log_info("RRD: cleaning up database [%zu hosts(s)]...", rrdhost_hosts_available());
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"RRD: cleaning up database [%zu hosts(s)]...",
rrdhost_hosts_available());
rrd_rdlock();

View file

@ -1323,6 +1323,14 @@ void rrddim_store_metric_with_trace(RRDDIM *rd, usec_t point_end_time_ut, NETDAT
#else // !NETDATA_LOG_COLLECTION_ERRORS
void rrddim_store_metric(RRDDIM *rd, usec_t point_end_time_ut, NETDATA_DOUBLE n, SN_FLAGS flags) {
#endif // !NETDATA_LOG_COLLECTION_ERRORS
static __thread struct log_stack_entry lgs[] = {
[0] = ND_LOG_FIELD_STR(NDF_NIDL_DIMENSION, NULL),
[1] = ND_LOG_FIELD_END(),
};
lgs[0].str = rd->id;
log_stack_push(lgs);
#ifdef NETDATA_LOG_COLLECTION_ERRORS
rd->rrddim_store_metric_count++;
@ -1384,6 +1392,7 @@ void rrddim_store_metric(RRDDIM *rd, usec_t point_end_time_ut, NETDATA_DOUBLE n,
}
rrdcontext_collected_rrddim(rd);
log_stack_pop(&lgs);
}
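The log stack annotates, for as long as the entries stay pushed, every log line emitted on this thread with extra structured fields; here the dimension id, which surfaces in the journal as ND_NIDL_DIMENSION per the field list earlier. A sketch of the pattern with comments; the work function is hypothetical:

    static __thread struct log_stack_entry lgs[] = {
        [0] = ND_LOG_FIELD_STR(NDF_NIDL_DIMENSION, NULL),   // value is filled in at runtime
        [1] = ND_LOG_FIELD_END(),                           // terminator
    };
    lgs[0].str = rd->id;          // annotate this thread's log lines with the dimension id
    log_stack_push(lgs);
    do_collection_work(rd);       // hypothetical; any nd_log() in here carries the field
    log_stack_pop(&lgs);          // pop before the annotated scope ends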
void store_metric_collection_completed() {

View file

@ -267,7 +267,7 @@ void aclk_push_alert_event(struct aclk_sync_cfg_t *wc)
int rc;
if (unlikely(!wc->alert_updates)) {
netdata_log_access(
nd_log(NDLS_ACCESS, NDLP_NOTICE,
"ACLK STA [%s (%s)]: Ignoring alert push event, updates have been turned off for this node.",
wc->node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A");
@ -424,7 +424,7 @@ void aclk_push_alert_event(struct aclk_sync_cfg_t *wc)
} else {
if (wc->alerts_log_first_sequence_id)
netdata_log_access(
nd_log(NDLS_ACCESS, NDLP_DEBUG,
"ACLK RES [%s (%s)]: ALERTS SENT from %" PRIu64 " to %" PRIu64 "",
wc->node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A",
@ -523,8 +523,11 @@ void aclk_send_alarm_configuration(char *config_hash)
if (unlikely(!wc))
return;
netdata_log_access(
"ACLK REQ [%s (%s)]: Request to send alert config %s.", wc->node_id, rrdhost_hostname(wc->host), config_hash);
nd_log(NDLS_ACCESS, NDLP_DEBUG,
"ACLK REQ [%s (%s)]: Request to send alert config %s.",
wc->node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A",
config_hash);
aclk_push_alert_config(wc->node_id, config_hash);
}
@ -634,13 +637,13 @@ int aclk_push_alert_config_event(char *node_id __maybe_unused, char *config_hash
}
if (likely(p_alarm_config.cfg_hash)) {
netdata_log_access("ACLK RES [%s (%s)]: Sent alert config %s.", wc->node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A", config_hash);
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK RES [%s (%s)]: Sent alert config %s.", wc->node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A", config_hash);
aclk_send_provide_alarm_cfg(&p_alarm_config);
freez(p_alarm_config.cfg_hash);
destroy_aclk_alarm_configuration(&alarm_config);
}
else
netdata_log_access("ACLK STA [%s (%s)]: Alert config for %s not found.", wc->node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A", config_hash);
nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK STA [%s (%s)]: Alert config for %s not found.", wc->node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A", config_hash);
bind_fail:
rc = sqlite3_finalize(res);
@ -669,20 +672,15 @@ void aclk_start_alert_streaming(char *node_id, bool resets)
return;
if (unlikely(!host->health.health_enabled)) {
netdata_log_access(
"ACLK STA [%s (N/A)]: Ignoring request to stream alert state changes, health is disabled.", node_id);
nd_log(NDLS_ACCESS, NDLP_NOTICE, "ACLK STA [%s (N/A)]: Ignoring request to stream alert state changes, health is disabled.", node_id);
return;
}
if (resets) {
netdata_log_access(
"ACLK REQ [%s (%s)]: STREAM ALERTS ENABLED (RESET REQUESTED)",
node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A");
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK REQ [%s (%s)]: STREAM ALERTS ENABLED (RESET REQUESTED)", node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A");
sql_queue_existing_alerts_to_aclk(host);
} else
netdata_log_access(
"ACLK REQ [%s (%s)]: STREAM ALERTS ENABLED", node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A");
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK REQ [%s (%s)]: STREAM ALERTS ENABLED", node_id, wc->host ? rrdhost_hostname(wc->host) : "N/A");
wc->alert_updates = 1;
wc->alert_queue_removed = SEND_REMOVED_AFTER_HEALTH_LOOPS;
@ -725,7 +723,7 @@ void sql_process_queue_removed_alerts_to_aclk(char *node_id)
rc = execute_insert(res);
if (likely(rc == SQLITE_DONE)) {
netdata_log_access("ACLK STA [%s (%s)]: QUEUED REMOVED ALERTS", wc->node_id, rrdhost_hostname(wc->host));
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK STA [%s (%s)]: QUEUED REMOVED ALERTS", wc->node_id, rrdhost_hostname(wc->host));
rrdhost_flag_set(wc->host, RRDHOST_FLAG_ACLK_STREAM_ALERTS);
wc->alert_queue_removed = 0;
}
@ -758,15 +756,15 @@ void aclk_process_send_alarm_snapshot(char *node_id, char *claim_id __maybe_unus
RRDHOST *host = find_host_by_node_id(node_id);
if (unlikely(!host || !(wc = host->aclk_config))) {
netdata_log_access("ACLK STA [%s (N/A)]: ACLK node id does not exist", node_id);
nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK STA [%s (N/A)]: ACLK node id does not exist", node_id);
return;
}
netdata_log_access(
"IN [%s (%s)]: Request to send alerts snapshot, snapshot_uuid %s",
node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A",
snapshot_uuid);
nd_log(NDLS_ACCESS, NDLP_DEBUG,
"IN [%s (%s)]: Request to send alerts snapshot, snapshot_uuid %s",
node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A",
snapshot_uuid);
if (wc->alerts_snapshot_uuid && !strcmp(wc->alerts_snapshot_uuid,snapshot_uuid))
return;
@ -855,7 +853,7 @@ void aclk_push_alert_snapshot_event(char *node_id __maybe_unused)
RRDHOST *host = find_host_by_node_id(node_id);
if (unlikely(!host)) {
netdata_log_access("AC [%s (N/A)]: Node id not found", node_id);
nd_log(NDLS_ACCESS, NDLP_WARNING, "AC [%s (N/A)]: Node id not found", node_id);
freez(node_id);
return;
}
@ -865,7 +863,7 @@ void aclk_push_alert_snapshot_event(char *node_id __maybe_unused)
// perhaps we don't need this for snapshots
if (unlikely(!wc->alert_updates)) {
netdata_log_access(
nd_log(NDLS_ACCESS, NDLP_NOTICE,
"ACLK STA [%s (%s)]: Ignoring alert snapshot event, updates have been turned off for this node.",
wc->node_id,
wc->host ? rrdhost_hostname(wc->host) : "N/A");
@ -879,7 +877,7 @@ void aclk_push_alert_snapshot_event(char *node_id __maybe_unused)
if (unlikely(!claim_id))
return;
netdata_log_access("ACLK REQ [%s (%s)]: Sending alerts snapshot, snapshot_uuid %s", wc->node_id, rrdhost_hostname(wc->host), wc->alerts_snapshot_uuid);
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK REQ [%s (%s)]: Sending alerts snapshot, snapshot_uuid %s", wc->node_id, rrdhost_hostname(wc->host), wc->alerts_snapshot_uuid);
uint32_t cnt = 0;
@ -1057,9 +1055,9 @@ void aclk_send_alarm_checkpoint(char *node_id, char *claim_id __maybe_unused)
RRDHOST *host = find_host_by_node_id(node_id);
if (unlikely(!host || !(wc = host->aclk_config)))
netdata_log_access("ACLK REQ [%s (N/A)]: ALERTS CHECKPOINT REQUEST RECEIVED FOR INVALID NODE", node_id);
nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK REQ [%s (N/A)]: ALERTS CHECKPOINT REQUEST RECEIVED FOR INVALID NODE", node_id);
else {
netdata_log_access("ACLK REQ [%s (%s)]: ALERTS CHECKPOINT REQUEST RECEIVED", node_id, rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK REQ [%s (%s)]: ALERTS CHECKPOINT REQUEST RECEIVED", node_id, rrdhost_hostname(host));
wc->alert_checkpoint_req = SEND_CHECKPOINT_AFTER_HEALTH_LOOPS;
}
}
@ -1087,14 +1085,14 @@ void aclk_push_alarm_checkpoint(RRDHOST *host __maybe_unused)
#ifdef ENABLE_ACLK
struct aclk_sync_cfg_t *wc = host->aclk_config;
if (unlikely(!wc)) {
netdata_log_access("ACLK REQ [%s (N/A)]: ALERTS CHECKPOINT REQUEST RECEIVED FOR INVALID NODE", rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_WARNING, "ACLK REQ [%s (N/A)]: ALERTS CHECKPOINT REQUEST RECEIVED FOR INVALID NODE", rrdhost_hostname(host));
return;
}
if (rrdhost_flag_check(host, RRDHOST_FLAG_ACLK_STREAM_ALERTS)) {
//postpone checkpoint send
wc->alert_checkpoint_req += 3;
netdata_log_access("ACLK REQ [%s (N/A)]: ALERTS CHECKPOINT POSTPONED", rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_NOTICE, "ACLK REQ [%s (N/A)]: ALERTS CHECKPOINT POSTPONED", rrdhost_hostname(host));
return;
}
@ -1157,9 +1155,9 @@ void aclk_push_alarm_checkpoint(RRDHOST *host __maybe_unused)
aclk_send_provide_alarm_checkpoint(&alarm_checkpoint);
freez(claim_id);
netdata_log_access("ACLK RES [%s (%s)]: ALERTS CHECKPOINT SENT", wc->node_id, rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK RES [%s (%s)]: ALERTS CHECKPOINT SENT", wc->node_id, rrdhost_hostname(host));
} else
netdata_log_access("ACLK RES [%s (%s)]: FAILED TO CREATE ALERTS CHECKPOINT HASH", wc->node_id, rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_ERR, "ACLK RES [%s (%s)]: FAILED TO CREATE ALERTS CHECKPOINT HASH", wc->node_id, rrdhost_hostname(host));
wc->alert_checkpoint_req = 0;
buffer_free(alarms_to_hash);
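Across this file the single-severity netdata_log_access() calls become nd_log(NDLS_ACCESS, ...) with a priority chosen per outcome: NDLP_DEBUG for expected request/response traffic, NDLP_NOTICE for requests deliberately ignored, NDLP_WARNING for failed lookups, and NDLP_ERR for hard failures. A minimal sketch of the convention (log_checkpoint_result() is a hypothetical wrapper, not part of the PR):

    // hypothetical helper illustrating the priority-per-outcome convention
    static void log_checkpoint_result(const char *node_id, const char *hostname, bool sent) {
        if (sent)
            nd_log(NDLS_ACCESS, NDLP_DEBUG,
                   "ACLK RES [%s (%s)]: ALERTS CHECKPOINT SENT", node_id, hostname);
        else
            nd_log(NDLS_ACCESS, NDLP_ERR,
                   "ACLK RES [%s (%s)]: FAILED TO CREATE ALERTS CHECKPOINT HASH", node_id, hostname);
    }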

View file

@ -43,7 +43,7 @@ static void build_node_collectors(RRDHOST *host)
dictionary_destroy(dict);
freez(upd_node_collectors.claim_id);
netdata_log_access("ACLK RES [%s (%s)]: NODE COLLECTORS SENT", wc->node_id, rrdhost_hostname(host));
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK RES [%s (%s)]: NODE COLLECTORS SENT", wc->node_id, rrdhost_hostname(host));
}
static void build_node_info(RRDHOST *host)
@ -103,7 +103,7 @@ static void build_node_info(RRDHOST *host)
node_info.data.host_labels_ptr = host->rrdlabels;
aclk_update_node_info(&node_info);
netdata_log_access("ACLK RES [%s (%s)]: NODE INFO SENT for guid [%s] (%s)", wc->node_id, rrdhost_hostname(wc->host), host->machine_guid, wc->host == localhost ? "parent" : "child");
nd_log(NDLS_ACCESS, NDLP_DEBUG, "ACLK RES [%s (%s)]: NODE INFO SENT for guid [%s] (%s)", wc->node_id, rrdhost_hostname(wc->host), host->machine_guid, wc->host == localhost ? "parent" : "child");
rrd_unlock();
freez(node_info.claim_id);
@ -169,8 +169,9 @@ void aclk_check_node_info_and_collectors(void)
dfe_done(host);
if (context_loading || replicating) {
error_limit_static_thread_var(erl, 10, 100 * USEC_PER_MS);
error_limit(&erl, "%zu nodes loading contexts, %zu replicating data", context_loading, replicating);
nd_log_limit_static_thread_var(erl, 10, 100 * USEC_PER_MS);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_INFO,
"%zu nodes loading contexts, %zu replicating data", context_loading, replicating);
}
}
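The last hunk swaps the old error_limit() helper for the new per-call-site rate limiter: the state variable is static and thread-local, so each call site throttles independently on each thread, and the two numeric parameters are carried over unchanged from the old call. The new form, condensed:

    // limiter state lives at this call site, per thread; parameters as above
    nd_log_limit_static_thread_var(erl, 10, 100 * USEC_PER_MS);
    nd_log_limit(&erl, NDLS_DAEMON, NDLP_INFO,
                 "%zu nodes loading contexts, %zu replicating data",
                 context_loading, replicating);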

View file

@ -914,7 +914,9 @@ void sql_health_alarm_log_load(RRDHOST *host)
if (unlikely(!host->health_log.next_alarm_id || host->health_log.next_alarm_id <= host->health_max_alarm_id))
host->health_log.next_alarm_id = host->health_max_alarm_id + 1;
netdata_log_health("[%s]: Table health_log, loaded %zd alarm entries, errors in %zd entries.", rrdhost_hostname(host), loaded, errored);
nd_log(NDLS_DAEMON, errored ? NDLP_WARNING : NDLP_DEBUG,
"[%s]: Table health_log, loaded %zd alarm entries, errors in %zd entries.",
rrdhost_hostname(host), loaded, errored);
ret = sqlite3_finalize(res);
if (unlikely(ret != SQLITE_OK))

View file

@ -1144,7 +1144,7 @@ void vacuum_database(sqlite3 *database, const char *db_alias, int threshold, int
if (free_pages > (total_pages * threshold / 100)) {
int do_free_pages = (int) (free_pages * vacuum_pc / 100);
netdata_log_info("%s: Freeing %d database pages", db_alias, do_free_pages);
nd_log(NDLS_DAEMON, NDLP_DEBUG, "%s: Freeing %d database pages", db_alias, do_free_pages);
char sql[128];
snprintfz(sql, 127, "PRAGMA incremental_vacuum(%d)", do_free_pages);
@ -1258,7 +1258,7 @@ static void start_all_host_load_context(uv_work_t *req __maybe_unused)
RRDHOST *host;
size_t max_threads = MIN(get_netdata_cpus() / 2, 6);
netdata_log_info("METADATA: Using %zu threads for context loading", max_threads);
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Using %zu threads for context loading", max_threads);
struct host_context_load_thread *hclt = callocz(max_threads, sizeof(*hclt));
size_t thread_index;
@ -1291,7 +1291,7 @@ static void start_all_host_load_context(uv_work_t *req __maybe_unused)
cleanup_finished_threads(hclt, max_threads, true);
freez(hclt);
usec_t ended_ut = now_monotonic_usec(); (void)ended_ut;
netdata_log_info("METADATA: host contexts loaded in %0.2f ms", (double)(ended_ut - started_ut) / USEC_PER_MS);
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: host contexts loaded in %0.2f ms", (double)(ended_ut - started_ut) / USEC_PER_MS);
worker_is_idle();
}
@ -1556,7 +1556,7 @@ static void metadata_event_loop(void *arg)
wc->timer_req.data = wc;
fatal_assert(0 == uv_timer_start(&wc->timer_req, timer_cb, TIMER_INITIAL_PERIOD_MS, TIMER_REPEAT_PERIOD_MS));
netdata_log_info("Starting metadata sync thread");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "Starting metadata sync thread");
struct metadata_cmd cmd;
memset(&cmd, 0, sizeof(cmd));
@ -1684,7 +1684,7 @@ static void metadata_event_loop(void *arg)
freez(loop);
worker_unregister();
netdata_log_info("Shutting down event loop");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "Shutting down event loop");
completion_mark_complete(&wc->start_stop_complete);
if (wc->scan_complete) {
completion_destroy(wc->scan_complete);
@ -1710,15 +1710,15 @@ void metadata_sync_shutdown(void)
struct metadata_cmd cmd;
memset(&cmd, 0, sizeof(cmd));
netdata_log_info("METADATA: Sending a shutdown command");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Sending a shutdown command");
cmd.opcode = METADATA_SYNC_SHUTDOWN;
metadata_enq_cmd(&metasync_worker, &cmd);
/* wait for metadata thread to shut down */
netdata_log_info("METADATA: Waiting for shutdown ACK");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Waiting for shutdown ACK");
completion_wait_for(&metasync_worker.start_stop_complete);
completion_destroy(&metasync_worker.start_stop_complete);
netdata_log_info("METADATA: Shutdown complete");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Shutdown complete");
}
void metadata_sync_shutdown_prepare(void)
@ -1735,20 +1735,20 @@ void metadata_sync_shutdown_prepare(void)
completion_init(compl);
__atomic_store_n(&wc->scan_complete, compl, __ATOMIC_RELAXED);
netdata_log_info("METADATA: Sending a scan host command");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Sending a scan host command");
uint32_t max_wait_iterations = 2000;
while (unlikely(metadata_flag_check(&metasync_worker, METADATA_FLAG_PROCESSING)) && max_wait_iterations--) {
if (max_wait_iterations == 1999)
netdata_log_info("METADATA: Current worker is running; waiting to finish");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Current worker is running; waiting to finish");
sleep_usec(1000);
}
cmd.opcode = METADATA_SCAN_HOSTS;
metadata_enq_cmd(&metasync_worker, &cmd);
netdata_log_info("METADATA: Waiting for host scan completion");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Waiting for host scan completion");
completion_wait_for(wc->scan_complete);
netdata_log_info("METADATA: Host scan complete; can continue with shutdown");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "METADATA: Host scan complete; can continue with shutdown");
}
// -------------------------------------------------------------
@ -1766,7 +1766,7 @@ void metadata_sync_init(void)
completion_wait_for(&wc->start_stop_complete);
completion_destroy(&wc->start_stop_complete);
netdata_log_info("SQLite metadata sync initialization complete");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "SQLite metadata sync initialization complete");
}
@ -1825,7 +1825,7 @@ void metadata_queue_load_host_context(RRDHOST *host)
if (unlikely(!metasync_worker.loop))
return;
queue_metadata_cmd(METADATA_LOAD_HOST_CONTEXT, host, NULL);
netdata_log_info("Queued command to load host contexts");
nd_log(NDLS_DAEMON, NDLP_DEBUG, "Queued command to load host contexts");
}
//

View file

@ -153,8 +153,8 @@ void aws_kinesis_connector_worker(void *instance_p)
char error_message[ERROR_LINE_MAX + 1] = "";
netdata_log_debug(D_EXPORTING,
"EXPORTING: kinesis_put_record(): dest = %s, id = %s, key = %s, stream = %s, partition_key = %s, \ "
" buffer = %zu, record = %zu",
"EXPORTING: kinesis_put_record(): dest = %s, id = %s, key = %s, stream = %s, partition_key = %s, "
"buffer = %zu, record = %zu",
instance->config.destination,
connector_specific_config->auth_key_id,
connector_specific_config->secure_key,

View file

@ -82,10 +82,13 @@ static bool prepare_command(BUFFER *wb,
const char *edit_command,
const char *machine_guid,
uuid_t *transition_id,
const char *summary
const char *summary,
const char *context,
const char *component,
const char *type
) {
char buf[8192];
size_t n = 8192 - 1;
size_t n = sizeof(buf) - 1;
buffer_strcat(wb, "exec");
@ -195,6 +198,18 @@ static bool prepare_command(BUFFER *wb,
return false;
buffer_sprintf(wb, " '%s'", buf);
if (!sanitize_command_argument_string(buf, context, n))
return false;
buffer_sprintf(wb, " '%s'", buf);
if (!sanitize_command_argument_string(buf, component, n))
return false;
buffer_sprintf(wb, " '%s'", buf);
if (!sanitize_command_argument_string(buf, type, n))
return false;
buffer_sprintf(wb, " '%s'", buf);
return true;
}
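Each of the three new arguments follows the same three-step pattern as the existing ones: sanitize into a bounded scratch buffer, abort command construction if sanitization fails, then append the result single-quoted. A condensed sketch (append_sanitized() is a hypothetical helper; sanitize_command_argument_string() and buffer_sprintf() are the functions used above):

    static bool append_sanitized(BUFFER *wb, char *buf, size_t n, const char *arg) {
        // false means the argument could not be made shell-safe
        if (!sanitize_command_argument_string(buf, arg, n))
            return false;
        buffer_sprintf(wb, " '%s'", buf);   // always appended single-quoted
        return true;
    }

With it, the three repeated blocks above would collapse to three one-line calls for context, component and type.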
@ -342,7 +357,9 @@ static void health_reload_host(RRDHOST *host) {
if(unlikely(!host->health.health_enabled) && !rrdhost_flag_check(host, RRDHOST_FLAG_INITIALIZED_HEALTH))
return;
netdata_log_health("[%s]: Reloading health.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Reloading health.",
rrdhost_hostname(host));
char *user_path = health_user_config_dir();
char *stock_path = health_stock_config_dir();
@ -436,8 +453,10 @@ static inline void health_alarm_execute(RRDHOST *host, ALARM_ENTRY *ae) {
if(unlikely(ae->new_status <= RRDCALC_STATUS_CLEAR && (ae->flags & HEALTH_ENTRY_FLAG_NO_CLEAR_NOTIFICATION))) {
// do not send notifications for disabled statuses
netdata_log_debug(D_HEALTH, "Health not sending notification for alarm '%s.%s' status %s (it has no-clear-notification enabled)", ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
netdata_log_health("[%s]: Health not sending notification for alarm '%s.%s' status %s (it has no-clear-notification enabled)", rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Health not sending notification for alarm '%s.%s' status %s (it has no-clear-notification enabled)",
rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
// mark it as run, so that we will send the same alarm if it happens again
goto done;
@ -454,10 +473,10 @@ static inline void health_alarm_execute(RRDHOST *host, ALARM_ENTRY *ae) {
// we have executed this alarm notification in the past
if(last_executed_status == ae->new_status && !(ae->flags & HEALTH_ENTRY_FLAG_IS_REPEATING)) {
// don't send the notification for the same status again
netdata_log_debug(D_HEALTH, "Health not sending again notification for alarm '%s.%s' status %s", ae_chart_id(ae), ae_name(ae)
, rrdcalc_status2string(ae->new_status));
netdata_log_health("[%s]: Health not sending again notification for alarm '%s.%s' status %s", rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae)
, rrdcalc_status2string(ae->new_status));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Health not sending again notification for alarm '%s.%s' status %s",
rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae),
rrdcalc_status2string(ae->new_status));
goto done;
}
}
@ -476,11 +495,16 @@ static inline void health_alarm_execute(RRDHOST *host, ALARM_ENTRY *ae) {
// Check if alarm notifications are silenced
if (ae->flags & HEALTH_ENTRY_FLAG_SILENCED) {
netdata_log_health("[%s]: Health not sending notification for alarm '%s.%s' status %s (command API has disabled notifications)", rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Health not sending notification for alarm '%s.%s' status %s "
"(command API has disabled notifications)",
rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
goto done;
}
netdata_log_health("[%s]: Sending notification for alarm '%s.%s' status %s.", rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Sending notification for alarm '%s.%s' status %s.",
rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), rrdcalc_status2string(ae->new_status));
const char *exec = (ae->exec) ? ae_exec(ae) : string2str(host->health.health_default_exec);
const char *recipient = (ae->recipient) ? ae_recipient(ae) : string2str(host->health.health_default_recipient);
@ -581,7 +605,11 @@ static inline void health_alarm_execute(RRDHOST *host, ALARM_ENTRY *ae) {
edit_command,
host->machine_guid,
&ae->transition_id,
host->health.use_summary_for_notifications && ae->summary?ae_summary(ae):ae_name(ae));
host->health.use_summary_for_notifications && ae->summary?ae_summary(ae):ae_name(ae),
string2str(ae->chart_context),
string2str(ae->component),
string2str(ae->type)
);
const char *command_to_run = buffer_tostring(wb);
if (ok) {
@ -778,7 +806,8 @@ static void health_main_cleanup(void *ptr) {
netdata_log_info("cleaning up...");
static_thread->enabled = NETDATA_MAIN_THREAD_EXITED;
netdata_log_health("Health thread ended.");
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Health thread ended.");
}
static void initialize_health(RRDHOST *host)
@ -790,7 +819,9 @@ static void initialize_health(RRDHOST *host)
rrdhost_flag_set(host, RRDHOST_FLAG_INITIALIZED_HEALTH);
netdata_log_health("[%s]: Initializing health.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Initializing health.",
rrdhost_hostname(host));
host->health.health_default_warn_repeat_every = config_get_duration(CONFIG_SECTION_HEALTH, "default repeat warning", "never");
host->health.health_default_crit_repeat_every = config_get_duration(CONFIG_SECTION_HEALTH, "default repeat critical", "never");
@ -803,7 +834,11 @@ static void initialize_health(RRDHOST *host)
long n = config_get_number(CONFIG_SECTION_HEALTH, "in memory max health log entries", host->health_log.max);
if(n < 10) {
netdata_log_health("Host '%s': health configuration has invalid max log entries %ld. Using default %u", rrdhost_hostname(host), n, host->health_log.max);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Host '%s': health configuration has invalid max log entries %ld. "
"Using default %u",
rrdhost_hostname(host), n, host->health_log.max);
config_set_number(CONFIG_SECTION_HEALTH, "in memory max health log entries", (long)host->health_log.max);
}
else
@ -811,7 +846,11 @@ static void initialize_health(RRDHOST *host)
uint32_t m = config_get_number(CONFIG_SECTION_HEALTH, "health log history", HEALTH_LOG_DEFAULT_HISTORY);
if (m < HEALTH_LOG_MINIMUM_HISTORY) {
netdata_log_health("Host '%s': health configuration has invalid health log history %u. Using minimum %d", rrdhost_hostname(host), m, HEALTH_LOG_MINIMUM_HISTORY);
nd_log(NDLS_DAEMON, NDLP_WARNING,
"Host '%s': health configuration has invalid health log history %u. "
"Using minimum %d",
rrdhost_hostname(host), m, HEALTH_LOG_MINIMUM_HISTORY);
config_set_number(CONFIG_SECTION_HEALTH, "health log history", HEALTH_LOG_MINIMUM_HISTORY);
m = HEALTH_LOG_MINIMUM_HISTORY;
}
@ -823,7 +862,9 @@ static void initialize_health(RRDHOST *host)
} else
host->health_log.health_log_history = m;
netdata_log_health("[%s]: Health log history is set to %u seconds (%u days)", rrdhost_hostname(host), host->health_log.health_log_history, host->health_log.health_log_history / 86400);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Health log history is set to %u seconds (%u days)",
rrdhost_hostname(host), host->health_log.health_log_history, host->health_log.health_log_history / 86400);
conf_enabled_alarms = simple_pattern_create(config_get(CONFIG_SECTION_HEALTH, "enabled alarms", "*"), NULL,
SIMPLE_PATTERN_EXACT, true);
@ -1049,7 +1090,7 @@ void *health_main(void *ptr) {
if (unlikely(check_if_resumed_from_suspension())) {
apply_hibernation_delay = 1;
netdata_log_health(
nd_log(NDLS_DAEMON, NDLP_NOTICE,
"Postponing alarm checks for %"PRId64" seconds, "
"because it seems that the system was just resumed from suspension.",
(int64_t)hibernation_delay);
@ -1058,8 +1099,9 @@ void *health_main(void *ptr) {
if (unlikely(silencers->all_alarms && silencers->stype == STYPE_DISABLE_ALARMS)) {
static int logged=0;
if (!logged) {
netdata_log_health("Skipping health checks, because all alarms are disabled via a %s command.",
HEALTH_CMDAPI_CMD_DISABLEALL);
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Skipping health checks, because all alarms are disabled via a %s command.",
HEALTH_CMDAPI_CMD_DISABLEALL);
logged = 1;
}
}
@ -1081,7 +1123,7 @@ void *health_main(void *ptr) {
rrdcalc_delete_alerts_not_matching_host_labels_from_this_host(host);
if (unlikely(apply_hibernation_delay)) {
netdata_log_health(
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Postponing health checks for %"PRId64" seconds.",
rrdhost_hostname(host),
(int64_t)hibernation_delay);
@ -1094,20 +1136,30 @@ void *health_main(void *ptr) {
continue;
}
netdata_log_health("[%s]: Resuming health checks after delay.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Resuming health checks after delay.",
rrdhost_hostname(host));
host->health.health_delay_up_to = 0;
}
// wait until cleanup of obsolete charts on children is complete
if (host != localhost) {
if (unlikely(host->trigger_chart_obsoletion_check == 1)) {
netdata_log_health("[%s]: Waiting for chart obsoletion check.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Waiting for chart obsoletion check.",
rrdhost_hostname(host));
continue;
}
}
if (!health_running_logged) {
netdata_log_health("[%s]: Health is running.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Health is running.",
rrdhost_hostname(host));
health_running_logged = true;
}
@ -1161,6 +1213,7 @@ void *health_main(void *ptr) {
rrdcalc_isrepeating(rc)?HEALTH_ENTRY_FLAG_IS_REPEATING:0);
if (ae) {
health_log_alert(host, ae);
health_alarm_log_add_entry(host, ae);
rc->old_status = rc->status;
rc->status = RRDCALC_STATUS_REMOVED;
@ -1432,9 +1485,13 @@ void *health_main(void *ptr) {
)
);
health_log_alert(host, ae);
health_alarm_log_add_entry(host, ae);
netdata_log_health("[%s]: Alert event for [%s.%s], value [%s], status [%s].", rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), ae_new_value_string(ae), rrdcalc_status2string(ae->new_status));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Alert event for [%s.%s], value [%s], status [%s].",
rrdhost_hostname(host), ae_chart_id(ae), ae_name(ae), ae_new_value_string(ae),
rrdcalc_status2string(ae->new_status));
rc->last_status_change_value = rc->value;
rc->last_status_change = now;
@ -1519,6 +1576,7 @@ void *health_main(void *ptr) {
)
);
health_log_alert(host, ae);
ae->last_repeat = rc->last_repeat;
if (!(rc->run_flags & RRDCALC_FLAG_RUN_ONCE) && rc->status == RRDCALC_STATUS_CLEAR) {
ae->flags |= HEALTH_ENTRY_RUN_ONCE;

View file

@ -105,4 +105,7 @@ void sql_refresh_hashes(void);
void health_add_host_labels(void);
void health_string2json(BUFFER *wb, const char *prefix, const char *label, const char *value, const char *suffix);
void health_log_alert_transition_with_trace(RRDHOST *host, ALARM_ENTRY *ae, int line, const char *file, const char *function);
#define health_log_alert(host, ae) health_log_alert_transition_with_trace(host, ae, __LINE__, __FILE__, __FUNCTION__)
#endif //NETDATA_HEALTH_H
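Because health_log_alert() is a macro, __LINE__, __FILE__ and __FUNCTION__ expand at the call site, so every transition is logged with the location that raised it rather than the logger's own. A minimal self-contained illustration of the idiom (hypothetical names, not the PR's API):

    #include <stdio.h>

    static void log_with_trace(const char *msg, int line, const char *file, const char *func) {
        fprintf(stderr, "%s:%d %s(): %s\n", file, line, func, msg);
    }

    // the macro captures the caller's location at expansion time
    #define log_here(msg) log_with_trace(msg, __LINE__, __FILE__, __FUNCTION__)

    int main(void) {
        log_here("alert transition");   // reports this line, in main()
        return 0;
    }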

View file

@ -1368,7 +1368,10 @@ void health_readdir(RRDHOST *host, const char *user_path, const char *stock_path
CONFIG_BOOLEAN_YES);
if (!stock_enabled) {
netdata_log_health("[%s]: Netdata will not load stock alarms.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Netdata will not load stock alarms.",
rrdhost_hostname(host));
stock_path = user_path;
}
@ -1376,6 +1379,10 @@ void health_readdir(RRDHOST *host, const char *user_path, const char *stock_path
health_rrdvars = health_rrdvariables_create();
recursive_config_double_dir_load(user_path, stock_path, subpath, health_readfile, (void *) host, 0);
netdata_log_health("[%s]: Read health configuration.", rrdhost_hostname(host));
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"[%s]: Read health configuration.",
rrdhost_hostname(host));
sql_store_hashes = 0;
}

View file

@ -8,6 +8,79 @@ inline void health_alarm_log_save(RRDHOST *host, ALARM_ENTRY *ae) {
sql_health_alarm_log_save(host, ae);
}
void health_log_alert_transition_with_trace(RRDHOST *host, ALARM_ENTRY *ae, int line, const char *file, const char *function) {
ND_LOG_STACK lgs[] = {
ND_LOG_FIELD_UUID(NDF_MESSAGE_ID, &health_alert_transition_msgid),
ND_LOG_FIELD_STR(NDF_NIDL_NODE, host->hostname),
ND_LOG_FIELD_STR(NDF_NIDL_INSTANCE, ae->chart_name),
ND_LOG_FIELD_STR(NDF_NIDL_CONTEXT, ae->chart_context),
ND_LOG_FIELD_U64(NDF_ALERT_ID, ae->alarm_id),
ND_LOG_FIELD_U64(NDF_ALERT_UNIQUE_ID, ae->unique_id),
ND_LOG_FIELD_U64(NDF_ALERT_EVENT_ID, ae->alarm_event_id),
ND_LOG_FIELD_UUID(NDF_ALERT_CONFIG_HASH, &ae->config_hash_id),
ND_LOG_FIELD_UUID(NDF_ALERT_TRANSITION_ID, &ae->transition_id),
ND_LOG_FIELD_STR(NDF_ALERT_NAME, ae->name),
ND_LOG_FIELD_STR(NDF_ALERT_CLASS, ae->classification),
ND_LOG_FIELD_STR(NDF_ALERT_COMPONENT, ae->component),
ND_LOG_FIELD_STR(NDF_ALERT_TYPE, ae->type),
ND_LOG_FIELD_STR(NDF_ALERT_EXEC, ae->exec),
ND_LOG_FIELD_STR(NDF_ALERT_RECIPIENT, ae->recipient),
ND_LOG_FIELD_STR(NDF_ALERT_SOURCE, ae->exec),
ND_LOG_FIELD_STR(NDF_ALERT_UNITS, ae->units),
ND_LOG_FIELD_STR(NDF_ALERT_SUMMARY, ae->summary),
ND_LOG_FIELD_STR(NDF_ALERT_INFO, ae->info),
ND_LOG_FIELD_DBL(NDF_ALERT_VALUE, ae->new_value),
ND_LOG_FIELD_DBL(NDF_ALERT_VALUE_OLD, ae->old_value),
ND_LOG_FIELD_TXT(NDF_ALERT_STATUS, rrdcalc_status2string(ae->new_status)),
ND_LOG_FIELD_TXT(NDF_ALERT_STATUS_OLD, rrdcalc_status2string(ae->old_status)),
ND_LOG_FIELD_I64(NDF_ALERT_DURATION, ae->duration),
ND_LOG_FIELD_I64(NDF_RESPONSE_CODE, ae->exec_code),
ND_LOG_FIELD_U64(NDF_ALERT_NOTIFICATION_REALTIME_USEC, ae->delay_up_to_timestamp * USEC_PER_SEC),
ND_LOG_FIELD_END(),
};
ND_LOG_STACK_PUSH(lgs);
errno = 0;
ND_LOG_FIELD_PRIORITY priority = NDLP_INFO;
switch(ae->new_status) {
case RRDCALC_STATUS_UNDEFINED:
if(ae->old_status >= RRDCALC_STATUS_CLEAR)
priority = NDLP_NOTICE;
else
priority = NDLP_DEBUG;
break;
default:
case RRDCALC_STATUS_UNINITIALIZED:
case RRDCALC_STATUS_REMOVED:
priority = NDLP_DEBUG;
break;
case RRDCALC_STATUS_CLEAR:
priority = NDLP_INFO;
break;
case RRDCALC_STATUS_WARNING:
if(ae->old_status < RRDCALC_STATUS_WARNING)
priority = NDLP_WARNING;
break;
case RRDCALC_STATUS_CRITICAL:
if(ae->old_status < RRDCALC_STATUS_CRITICAL)
priority = NDLP_CRIT;
break;
}
netdata_logger(NDLS_HEALTH, priority, file, function, line,
"ALERT '%s' of instance '%s' on node '%s', transitioned from %s to %s",
string2str(ae->name), string2str(ae->chart), string2str(host->hostname),
rrdcalc_status2string(ae->old_status), rrdcalc_status2string(ae->new_status)
);
}
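Note there is no explicit pop in this function: ND_LOG_STACK / ND_LOG_STACK_PUSH is evidently the scope-managed counterpart of the manual log_stack_push()/log_stack_pop() pair seen earlier, unwinding automatically on return. A self-contained illustration of the scope-cleanup idiom such a macro can be built on (hypothetical names; the PR's actual implementation may differ):

    #include <stdio.h>

    static void pop_on_exit(int **p) { printf("popped %d\n", **p); }

    // GCC/Clang cleanup attribute: pop_on_exit() runs when the variable
    // goes out of scope, on every exit path
    #define SCOPE_PUSH(var) __attribute__((cleanup(pop_on_exit))) int *var##_guard = &(var)

    int main(void) {
        int depth = 1;
        SCOPE_PUSH(depth);       // "pushed"; popped automatically at return
        printf("working...\n");
        return 0;
    }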
// ----------------------------------------------------------------------------
// health alarm log management

View file

@ -42,6 +42,8 @@
# -----------------------------------------------------------------------------
# testing notifications
cmd_line="'${0}' $(printf "'%s' " "${@}")"
if { [ "${1}" = "test" ] || [ "${2}" = "test" ]; } && [ "${#}" -le 2 ]; then
if [ "${2}" = "test" ]; then
recipient="${1}"
@ -78,61 +80,139 @@ export PATH="${PATH}:/sbin:/usr/sbin:/usr/local/sbin"
export LC_ALL=C
# -----------------------------------------------------------------------------
# logging
PROGRAM_NAME="$(basename "${0}")"
LOG_LEVEL_ERR=1
LOG_LEVEL_WARN=2
LOG_LEVEL_INFO=3
LOG_LEVEL="$LOG_LEVEL_INFO"
# these should be the same as syslog() priorities
NDLP_EMERG=0 # system is unusable
NDLP_ALERT=1 # action must be taken immediately
NDLP_CRIT=2 # critical conditions
NDLP_ERR=3 # error conditions
NDLP_WARN=4 # warning conditions
NDLP_NOTICE=5 # normal but significant condition
NDLP_INFO=6 # informational
NDLP_DEBUG=7 # debug-level messages
set_log_severity_level() {
case ${NETDATA_LOG_SEVERITY_LEVEL,,} in
"info") LOG_LEVEL="$LOG_LEVEL_INFO";;
"warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";;
"err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";;
# the max (numerically) log level we will log
LOG_LEVEL=$NDLP_INFO
set_log_min_priority() {
case "${NETDATA_LOG_PRIORITY_LEVEL,,}" in
"emerg" | "emergency")
LOG_LEVEL=$NDLP_EMERG
;;
"alert")
LOG_LEVEL=$NDLP_ALERT
;;
"crit" | "critical")
LOG_LEVEL=$NDLP_CRIT
;;
"err" | "error")
LOG_LEVEL=$NDLP_ERR
;;
"warn" | "warning")
LOG_LEVEL=$NDLP_WARN
;;
"notice")
LOG_LEVEL=$NDLP_NOTICE
;;
"info")
LOG_LEVEL=$NDLP_INFO
;;
"debug")
LOG_LEVEL=$NDLP_DEBUG
;;
esac
}
set_log_severity_level
logdate() {
date "+%Y-%m-%d %H:%M:%S"
}
set_log_min_priority
log() {
local status="${1}"
shift
local level="${1}"
shift 1
echo >&2 "$(logdate): ${PROGRAM_NAME}: ${status}: ${*}"
[[ -n "$level" && -n "$LOG_LEVEL" && "$level" -gt "$LOG_LEVEL" ]] && return
systemd-cat-native --log-as-netdata --newline="{NEWLINE}" <<EOFLOG
INVOCATION_ID=${NETDATA_INVOCATION_ID}
SYSLOG_IDENTIFIER=${PROGRAM_NAME}
PRIORITY=${level}
THREAD_TAG="alarm-notify"
ND_LOG_SOURCE=health
ND_NIDL_NODE=${host}
ND_NIDL_INSTANCE=${chart}
ND_NIDL_CONTEXT=${context}
ND_ALERT_NAME=${name}
ND_ALERT_ID=${alarm_id}
ND_ALERT_UNIQUE_ID=${unique_id}
ND_ALERT_EVENT_ID=${alarm_event_id}
ND_ALERT_TRANSITION_ID=${transition_id//-/}
ND_ALERT_CLASS=${classification}
ND_ALERT_COMPONENT=${component}
ND_ALERT_TYPE=${type}
ND_ALERT_RECIPIENT=${roles}
ND_ALERT_VALUE=${value}
ND_ALERT_VALUE_OLD=${old_value}
ND_ALERT_STATUS=${status}
ND_ALERT_STATUS_OLD=${old_status}
ND_ALERT_UNITS=${units}
ND_ALERT_SUMMARY=${summary}
ND_ALERT_INFO=${info}
ND_ALERT_DURATION=${duration}
ND_REQUEST=${cmd_line}
MESSAGE_ID=6db0018e83e34320ae2a659d78019fb7
MESSAGE=[ALERT NOTIFICATION]: ${*//[$'\r\n']/{NEWLINE}}
EOFLOG
# AN EMPTY LINE IS NEEDED ABOVE
}
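The heredoc above speaks the journal-native protocol: one KEY=VALUE field per line, embedded newlines replaced by the token passed to --newline, and an empty line terminating the entry (hence the comment). The same stream can be produced from any language; a minimal C sketch of a producer whose output is piped into systemd-cat-native --log-as-netdata --newline="{NEWLINE}" (field values are illustrative):

    #include <stdio.h>

    static void field(const char *key, const char *value) {
        printf("%s=%s\n", key, value);   // one structured field per line
    }

    int main(void) {
        field("SYSLOG_IDENTIFIER", "alarm-notify");
        field("PRIORITY", "6");                              // NDLP_INFO, syslog numbering as above
        field("MESSAGE", "test alert{NEWLINE}second line");  // escaped embedded newline
        printf("\n");                                        // empty line ends the entry
        return 0;
    }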
info() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return
log INFO "${@}"
log "$NDLP_INFO" "${@}"
}
warning() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return
log WARNING "${@}"
log "$NDLP_WARN" "${@}"
}
error() {
[[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return
log ERROR "${@}"
log "$NDLP_ERR" "${@}"
}
fatal() {
log FATAL "${@}"
log "$NDLP_ALERT" "${@}"
exit 1
}
debug=${NETDATA_ALARM_NOTIFY_DEBUG-0}
debug() {
[ "${debug}" = "1" ] && log DEBUG "${@}"
log "$NDLP_DEBUG" "${@}"
}
debug=0
if [ "${NETDATA_ALARM_NOTIFY_DEBUG-0}" = "1" ]; then
debug=1
LOG_LEVEL=$NDLP_DEBUG
fi
# -----------------------------------------------------------------------------
# check for BASH v4+ (required for associative arrays)
if [ ${BASH_VERSINFO[0]} -lt 4 ]; then
echo >&2 "BASH version 4 or later is required (this is ${BASH_VERSION})."
exit 1
fi
# -----------------------------------------------------------------------------
docurl() {
if [ -z "${curl}" ]; then
error "${curl} is unset."
@ -199,16 +279,9 @@ ntfy
# this is to be overwritten by the config file
custom_sender() {
info "not sending custom notification for ${status} of '${host}.${chart}.${name}'"
info "custom notification mechanism is not configured; not sending ${notification_description}"
}
# -----------------------------------------------------------------------------
# check for BASH v4+ (required for associative arrays)
if [ ${BASH_VERSINFO[0]} -lt 4 ]; then
fatal "BASH version 4 or later is required (this is ${BASH_VERSION})."
fi
# -----------------------------------------------------------------------------
# defaults to allow running this script by hand
@ -228,8 +301,8 @@ if [[ ${1} = "unittest" ]]; then
status="${4}" # the current status : REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL
old_status="${5}" # the previous status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL
elif [[ ${1} = "dump_methods" ]]; then
dump_methods=1
status="WARNING"
dump_methods=1
status="WARNING"
else
roles="${1}" # the roles that should be notified for this event
args_host="${2}" # the host generated this event
@ -263,6 +336,9 @@ else
child_machine_guid="${28}" # the machine_guid of the child
transition_id="${29}" # the transition_id of the alert
summary="${30}" # the summary text field of the alert
context="${31}" # the context of the chart
component="${32}"
type="${33}"
fi
# -----------------------------------------------------------------------------
@ -276,18 +352,20 @@ else
host="${args_host}"
fi
notification_description="notification to '${roles}' for transition from ${old_status} to ${status}, of alert '${name}' = '${value_string}', of instance '${chart}', context '${context}' on host '${host}'"
# -----------------------------------------------------------------------------
# screen statuses we don't need to send a notification
# don't do anything if this is not WARNING, CRITICAL or CLEAR
if [ "${status}" != "WARNING" ] && [ "${status}" != "CRITICAL" ] && [ "${status}" != "CLEAR" ]; then
info "not sending notification for ${status} of '${host}.${chart}.${name}'"
debug "not sending ${notification_description}"
exit 1
fi
# don't do anything if this is CLEAR, but it was not WARNING or CRITICAL
if [ "${clear_alarm_always}" != "YES" ] && [ "${old_status}" != "WARNING" ] && [ "${old_status}" != "CRITICAL" ] && [ "${status}" = "CLEAR" ]; then
info "not sending notification for ${status} of '${host}.${chart}.${name}' (last status was ${old_status})"
debug "not sending ${notification_description}"
exit 1
fi
@ -434,7 +512,7 @@ else
debug "Loading config file '${CONFIG}'..."
source "${CONFIG}" || error "Failed to load config file '${CONFIG}'."
else
warning "Cannot find file '${CONFIG}'."
debug "Cannot find file '${CONFIG}'."
fi
done
fi
@ -598,7 +676,16 @@ filter_recipient_by_criticality() {
}
# -----------------------------------------------------------------------------
# verify the delivery methods supported
# check the configured targets
# check email
if [ "${SEND_EMAIL}" = "AUTO" ]; then
if command -v curl >/dev/null 2>&1; then
SEND_EMAIL="YES"
else
SEND_EMAIL="NO"
fi
fi
# check slack
[ -z "${SLACK_WEBHOOK_URL}" ] && SEND_SLACK="NO"
@ -677,112 +764,121 @@ filter_recipient_by_criticality() {
# check custom
[ -z "${DEFAULT_RECIPIENT_CUSTOM}" ] && SEND_CUSTOM="NO"
if [ "${SEND_PUSHOVER}" = "YES" ] ||
[ "${SEND_SLACK}" = "YES" ] ||
[ "${SEND_ROCKETCHAT}" = "YES" ] ||
[ "${SEND_ALERTA}" = "YES" ] ||
[ "${SEND_PD}" = "YES" ] ||
[ "${SEND_FLOCK}" = "YES" ] ||
[ "${SEND_DISCORD}" = "YES" ] ||
[ "${SEND_HIPCHAT}" = "YES" ] ||
[ "${SEND_TWILIO}" = "YES" ] ||
[ "${SEND_MESSAGEBIRD}" = "YES" ] ||
[ "${SEND_KAVENEGAR}" = "YES" ] ||
[ "${SEND_TELEGRAM}" = "YES" ] ||
[ "${SEND_PUSHBULLET}" = "YES" ] ||
[ "${SEND_KAFKA}" = "YES" ] ||
[ "${SEND_FLEEP}" = "YES" ] ||
[ "${SEND_PROWL}" = "YES" ] ||
[ "${SEND_MATRIX}" = "YES" ] ||
[ "${SEND_CUSTOM}" = "YES" ] ||
[ "${SEND_MSTEAMS}" = "YES" ] ||
[ "${SEND_DYNATRACE}" = "YES" ] ||
[ "${SEND_OPSGENIE}" = "YES" ] ||
[ "${SEND_GOTIFY}" = "YES" ] ||
[ "${SEND_NTFY}" = "YES" ]; then
# if we need curl, check for the curl command
if [ -z "${curl}" ]; then
curl="$(command -v curl 2>/dev/null)"
fi
if [ -z "${curl}" ]; then
error "Cannot find curl command in the system path. Disabling all curl based notifications."
SEND_PUSHOVER="NO"
SEND_PUSHBULLET="NO"
SEND_TELEGRAM="NO"
SEND_SLACK="NO"
SEND_MSTEAMS="NO"
SEND_ROCKETCHAT="NO"
SEND_ALERTA="NO"
SEND_PD="NO"
SEND_FLOCK="NO"
SEND_DISCORD="NO"
SEND_TWILIO="NO"
SEND_HIPCHAT="NO"
SEND_MESSAGEBIRD="NO"
SEND_KAVENEGAR="NO"
SEND_KAFKA="NO"
SEND_FLEEP="NO"
SEND_PROWL="NO"
SEND_MATRIX="NO"
SEND_CUSTOM="NO"
SEND_DYNATRACE="NO"
SEND_OPSGENIE="NO"
SEND_GOTIFY="NO"
SEND_NTFY="NO"
fi
fi
# -----------------------------------------------------------------------------
# check the availability of targets
if [ "${SEND_SMS}" = "YES" ]; then
if [ -z "${sendsms}" ]; then
sendsms="$(command -v sendsms 2>/dev/null)"
fi
if [ -z "${sendsms}" ]; then
SEND_SMS="NO"
fi
fi
# if we need sendmail, check for the sendmail command
if [ "${SEND_EMAIL}" = "YES" ] && [ -z "${sendmail}" ]; then
sendmail="$(command -v sendmail 2>/dev/null)"
if [ -z "${sendmail}" ]; then
debug "Cannot find sendmail command in the system path. Disabling email notifications."
SEND_EMAIL="NO"
fi
fi
check_supported_targets() {
local log=${1}
shift
# if we need logger, check for the logger command
if [ "${SEND_SYSLOG}" = "YES" ] && [ -z "${logger}" ]; then
logger="$(command -v logger 2>/dev/null)"
if [ -z "${logger}" ]; then
debug "Cannot find logger command in the system path. Disabling syslog notifications."
SEND_SYSLOG="NO"
if [ "${SEND_PUSHOVER}" = "YES" ] ||
[ "${SEND_SLACK}" = "YES" ] ||
[ "${SEND_ROCKETCHAT}" = "YES" ] ||
[ "${SEND_ALERTA}" = "YES" ] ||
[ "${SEND_PD}" = "YES" ] ||
[ "${SEND_FLOCK}" = "YES" ] ||
[ "${SEND_DISCORD}" = "YES" ] ||
[ "${SEND_HIPCHAT}" = "YES" ] ||
[ "${SEND_TWILIO}" = "YES" ] ||
[ "${SEND_MESSAGEBIRD}" = "YES" ] ||
[ "${SEND_KAVENEGAR}" = "YES" ] ||
[ "${SEND_TELEGRAM}" = "YES" ] ||
[ "${SEND_PUSHBULLET}" = "YES" ] ||
[ "${SEND_KAFKA}" = "YES" ] ||
[ "${SEND_FLEEP}" = "YES" ] ||
[ "${SEND_PROWL}" = "YES" ] ||
[ "${SEND_MATRIX}" = "YES" ] ||
[ "${SEND_CUSTOM}" = "YES" ] ||
[ "${SEND_MSTEAMS}" = "YES" ] ||
[ "${SEND_DYNATRACE}" = "YES" ] ||
[ "${SEND_OPSGENIE}" = "YES" ] ||
[ "${SEND_GOTIFY}" = "YES" ] ||
[ "${SEND_NTFY}" = "YES" ]; then
# if we need curl, check for the curl command
if [ -z "${curl}" ]; then
curl="$(command -v curl 2>/dev/null)"
fi
if [ -z "${curl}" ]; then
$log "Cannot find curl command in the system path. Disabling all curl based notifications."
SEND_PUSHOVER="NO"
SEND_PUSHBULLET="NO"
SEND_TELEGRAM="NO"
SEND_SLACK="NO"
SEND_MSTEAMS="NO"
SEND_ROCKETCHAT="NO"
SEND_ALERTA="NO"
SEND_PD="NO"
SEND_FLOCK="NO"
SEND_DISCORD="NO"
SEND_TWILIO="NO"
SEND_HIPCHAT="NO"
SEND_MESSAGEBIRD="NO"
SEND_KAVENEGAR="NO"
SEND_KAFKA="NO"
SEND_FLEEP="NO"
SEND_PROWL="NO"
SEND_MATRIX="NO"
SEND_CUSTOM="NO"
SEND_DYNATRACE="NO"
SEND_OPSGENIE="NO"
SEND_GOTIFY="NO"
SEND_NTFY="NO"
fi
fi
fi
# if we need aws, check for the aws command
if [ "${SEND_AWSSNS}" = "YES" ] && [ -z "${aws}" ]; then
aws="$(command -v aws 2>/dev/null)"
if [ -z "${aws}" ]; then
debug "Cannot find aws command in the system path. Disabling Amazon SNS notifications."
SEND_AWSSNS="NO"
if [ "${SEND_SMS}" = "YES" ]; then
if [ -z "${sendsms}" ]; then
sendsms="$(command -v sendsms 2>/dev/null)"
fi
if [ -z "${sendsms}" ]; then
SEND_SMS="NO"
fi
fi
# if we need sendmail, check for the sendmail command
if [ "${SEND_EMAIL}" = "YES" ] && [ -z "${sendmail}" ]; then
sendmail="$(command -v sendmail 2>/dev/null)"
if [ -z "${sendmail}" ]; then
$log "Cannot find sendmail command in the system path. Disabling email notifications."
SEND_EMAIL="NO"
fi
fi
fi
# if we need nc, check for the nc command
if [ "${SEND_IRC}" = "YES" ] && [ -z "${nc}" ]; then
nc="$(command -v nc 2>/dev/null)"
if [ -z "${nc}" ]; then
debug "Cannot find nc command in the system path. Disabling IRC notifications."
SEND_IRC="NO"
# if we need logger, check for the logger command
if [ "${SEND_SYSLOG}" = "YES" ] && [ -z "${logger}" ]; then
logger="$(command -v logger 2>/dev/null)"
if [ -z "${logger}" ]; then
$log "Cannot find logger command in the system path. Disabling syslog notifications."
SEND_SYSLOG="NO"
fi
fi
fi
# if we need aws, check for the aws command
if [ "${SEND_AWSSNS}" = "YES" ] && [ -z "${aws}" ]; then
aws="$(command -v aws 2>/dev/null)"
if [ -z "${aws}" ]; then
$log "Cannot find aws command in the system path. Disabling Amazon SNS notifications."
SEND_AWSSNS="NO"
fi
fi
# if we need nc, check for the nc command
if [ "${SEND_IRC}" = "YES" ] && [ -z "${nc}" ]; then
nc="$(command -v nc 2>/dev/null)"
if [ -z "${nc}" ]; then
$log "Cannot find nc command in the system path. Disabling IRC notifications."
SEND_IRC="NO"
fi
fi
}
if [ -n "${dump_methods}" ]; then
check_supported_targets debug
for name in "${!SEND_@}"; do
if [ "${!name}" = "YES" ]; then
echo "$name"
fi
done
exit
exit 0
fi
# -----------------------------------------------------------------------------
@ -790,6 +886,7 @@ fi
# netdata may call us with multiple roles, and roles may have multiple but
# overlapping recipients - so, here we find the unique recipients.
have_to_send_something="NO"
for method_name in ${method_names}; do
send_var="SEND_${method_name^^}"
if [ "${!send_var}" = "NO" ]; then
@ -819,7 +916,11 @@ for method_name in ${method_names}; do
to_var="to_${method_name}"
declare to_${method_name}="${!arr_var[*]}"
[ -z "${!to_var}" ] && declare ${send_var}="NO"
if [ -z "${!to_var}" ]; then
declare ${send_var}="NO"
else
have_to_send_something="YES"
fi
done
# -----------------------------------------------------------------------------
@ -884,10 +985,18 @@ for method in "${SEND_EMAIL}" \
break
fi
done
if [ "$proceed" -eq 0 ]; then
fatal "All notification methods are disabled. Not sending notification for host '${host}', chart '${chart}' to '${roles}' for '${name}' = '${value}' for status '${status}'."
if [ "${have_to_send_something}" = "NO" ]; then
debug "All notification methods are disabled; not sending ${notification_description}."
exit 0
else
fatal "All notification methods are disabled; not sending ${notification_description}."
fi
fi
check_supported_targets error
# -----------------------------------------------------------------------------
# get the date the alarm happened
@ -1023,10 +1132,10 @@ send_email() {
ret=$?
if [ ${ret} -eq 0 ]; then
info "sent email notification for: ${host} ${chart}.${name} is ${status} to '${to_email}'"
info "sent email to '${to_email}' for ${notification_description}"
return 0
else
error "failed to send email notification for: ${host} ${chart}.${name} is ${status} to '${to_email}' with error code ${ret} (${cmd_output})."
error "failed to send email to '${to_email}' for ${notification_description}, with error code ${ret} (${cmd_output})."
return 1
fi
fi
@ -1065,10 +1174,10 @@ send_pushover() {
https://api.pushover.net/1/messages.json)
if [ "${httpcode}" = "200" ]; then
info "sent pushover notification for: ${host} ${chart}.${name} is ${status} to '${user}'"
info "sent pushover notification to '${user}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send pushover notification for: ${host} ${chart}.${name} is ${status} to '${user}' with HTTP response status code ${httpcode}."
error "failed to send pushover notification to '${user}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1112,10 +1221,10 @@ EOF
) "https://api.pushbullet.com/v2/pushes" -X POST)
if [ "${httpcode}" = "200" ]; then
info "sent pushbullet notification for: ${host} ${chart}.${name} is ${status} to '${userOrChannelTag}'"
info "sent pushbullet notification to '${userOrChannelTag}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send pushbullet notification for: ${host} ${chart}.${name} is ${status} to '${userOrChannelTag}' with HTTP response status code ${httpcode}."
error "failed to send pushbullet notification to '${userOrChannelTag}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1136,10 +1245,10 @@ send_kafka() {
"${KAFKA_URL}")
if [ "${httpcode}" = "204" ]; then
info "sent kafka data for: ${host} ${chart}.${name} is ${status} and ip '${KAFKA_SENDER_IP}'"
info "sent kafka data to '${KAFKA_SENDER_IP}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send kafka data for: ${host} ${chart}.${name} is ${status} and ip '${KAFKA_SENDER_IP}' with HTTP response status code ${httpcode}."
error "failed to send kafka data to '${KAFKA_SENDER_IP}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
[ ${sent} -gt 0 ] && return 0
@ -1237,10 +1346,10 @@ EOF
fi
httpcode=$(docurl -X POST --data "${payload}" ${url})
if [ "${httpcode}" = "${response_code}" ]; then
info "sent pagerduty notification for: ${host} ${chart}.${name} is ${status}'"
info "sent pagerduty event for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send pagerduty notification for: ${host} ${chart}.${name} is ${status}, with HTTP response status code ${httpcode}."
error "failed to send pagerduty event for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1266,10 +1375,10 @@ send_twilio() {
"https://api.twilio.com/2010-04-01/Accounts/${accountsid}/Messages.json")
if [ "${httpcode}" = "201" ]; then
info "sent Twilio SMS for: ${host} ${chart}.${name} is ${status} to '${user}'"
info "sent Twilio SMS to '${user}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send Twilio SMS for: ${host} ${chart}.${name} is ${status} to '${user}' with HTTP response status code ${httpcode}."
error "failed to send Twilio SMS to '${user}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1315,10 +1424,10 @@ send_hipchat() {
"https://${HIPCHAT_SERVER}/v2/room/${room}/notification")
if [ "${httpcode}" = "204" ]; then
info "sent HipChat notification for: ${host} ${chart}.${name} is ${status} to '${room}'"
info "sent HipChat notification to '${room}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send HipChat notification for: ${host} ${chart}.${name} is ${status} to '${room}' with HTTP response status code ${httpcode}."
error "failed to send HipChat notification to '${room}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1345,10 +1454,10 @@ send_messagebird() {
"https://rest.messagebird.com/messages")
if [ "${httpcode}" = "201" ]; then
info "sent Messagebird SMS for: ${host} ${chart}.${name} is ${status} to '${user}'"
info "sent Messagebird SMS to '${user}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send Messagebird SMS for: ${host} ${chart}.${name} is ${status} to '${user}' with HTTP response status code ${httpcode}."
error "failed to send Messagebird SMS to '${user}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1372,10 +1481,10 @@ send_kavenegar() {
--data-urlencode "message=${title} ${message}")
if [ "${httpcode}" = "200" ]; then
info "sent Kavenegar SMS for: ${host} ${chart}.${name} is ${status} to '${user}'"
info "sent Kavenegar SMS to '${user}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send Kavenegar SMS for: ${host} ${chart}.${name} is ${status} to '${user}' with HTTP response status code ${httpcode}."
error "failed to send Kavenegar SMS to '${user}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1416,21 +1525,21 @@ send_telegram() {
notify_telegram=0
if [ "${httpcode}" = "200" ]; then
info "sent telegram notification for: ${host} ${chart}.${name} is ${status} to '${chatid}'"
info "sent telegram notification to '${chatid}' for ${notification_description}"
sent=$((sent + 1))
elif [ "${httpcode}" = "401" ]; then
error "failed to send telegram notification for: ${host} ${chart}.${name} is ${status} to '${chatid}': Wrong bot token."
error "failed to send telegram notification to '${chatid}' for ${notification_description}, wrong bot token."
elif [ "${httpcode}" = "429" ]; then
if [ "$notify_retries" -gt 0 ]; then
error "failed to send telegram notification for: ${host} ${chart}.${name} is ${status} to '${chatid}': rate limit exceeded, retrying after 1s."
error "failed to send telegram notification to '${chatid}' for ${notification_description}, rate limit exceeded, retrying after 1s."
notify_retries=$((notify_retries - 1))
notify_telegram=1
sleep 1
else
error "failed to send telegram notification for: ${host} ${chart}.${name} is ${status} to '${chatid}': rate limit exceeded."
error "failed to send telegram notification to '${chatid}' for ${notification_description}, rate limit exceeded."
fi
else
error "failed to send telegram notification for: ${host} ${chart}.${name} is ${status} to '${chatid}' with HTTP response status code ${httpcode}."
error "failed to send telegram notification to '${chatid}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
done
@ -1487,10 +1596,10 @@ EOF
httpcode=$(docurl -H "Content-Type: application/json" -d "${payload}" "${cur_webhook}")
if [ "${httpcode}" = "200" ]; then
info "sent Microsoft team notification for: ${host} ${chart}.${name} is ${status} to '${cur_webhook}'"
info "sent Microsoft team notification to '${cur_webhook}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send Microsoft team notification for: ${host} ${chart}.${name} is ${status} to '${cur_webhook}', with HTTP response status code ${httpcode}."
error "failed to send Microsoft team notification to '${cur_webhook}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1558,10 +1667,10 @@ EOF
httpcode=$(docurl -X POST --data-urlencode "payload=${payload}" "${webhook}")
if [ "${httpcode}" = "200" ]; then
info "sent slack notification for: ${host} ${chart}.${name} is ${status} ${chstr}"
info "sent slack notification ${chstr} for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send slack notification for: ${host} ${chart}.${name} is ${status} ${chstr}, with HTTP response status code ${httpcode}."
error "failed to send slack notification ${chstr} for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1616,10 +1725,10 @@ EOF
httpcode=$(docurl -X POST --data-urlencode "payload=${payload}" "${webhook}")
if [ "${httpcode}" = "200" ]; then
info "sent rocketchat notification for: ${host} ${chart}.${name} is ${status} to '${channel}'"
info "sent rocketchat notification to '${channel}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send rocketchat notification for: ${host} ${chart}.${name} is ${status} to '${channel}', with HTTP response status code ${httpcode}."
error "failed to send rocketchat notification to '${channel}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1685,12 +1794,12 @@ EOF
httpcode=$(docurl -X POST "${webhook}/alert" -H "Content-Type: application/json" -H "Authorization: $auth" --data "${payload}")
if [ "${httpcode}" = "200" ] || [ "${httpcode}" = "201" ]; then
info "sent alerta notification for: ${host} ${chart}.${name} is ${status} to '${channel}'"
info "sent alerta notification to '${channel}' for ${notification_description}"
sent=$((sent + 1))
elif [ "${httpcode}" = "202" ]; then
info "suppressed alerta notification for: ${host} ${chart}.${name} is ${status} to '${channel}'"
info "suppressed alerta notification to '${channel}' for ${notification_description}"
else
error "failed to send alerta notification for: ${host} ${chart}.${name} is ${status} to '${channel}', with HTTP response status code ${httpcode}."
error "failed to send alerta notification to '${channel}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1740,10 +1849,10 @@ send_flock() {
]
}")
if [ "${httpcode}" = "200" ]; then
info "sent flock notification for: ${host} ${chart}.${name} is ${status} to '${channel}'"
info "sent flock notification to '${channel}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send flock notification for: ${host} ${chart}.${name} is ${status} to '${channel}', with HTTP response status code ${httpcode}."
error "failed to send flock notification to '${channel}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1801,10 +1910,10 @@ EOF
httpcode=$(docurl -X POST --data-urlencode "payload=${payload}" "${webhook}")
if [ "${httpcode}" = "200" ]; then
info "sent discord notification for: ${host} ${chart}.${name} is ${status} to '${channel}'"
info "sent discord notification to '${channel}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send discord notification for: ${host} ${chart}.${name} is ${status} to '${channel}', with HTTP response status code ${httpcode}."
error "failed to send discord notification to '${channel}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1830,10 +1939,10 @@ send_fleep() {
httpcode=$(docurl -X POST --data "${data}" "https://fleep.io/hook/${hook}")
if [ "${httpcode}" = "200" ]; then
info "sent fleep data for: ${host} ${chart}.${name} is ${status} and user '${FLEEP_SENDER}'"
info "sent fleep data to user '${FLEEP_SENDER}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send fleep data for: ${host} ${chart}.${name} is ${status} and user '${FLEEP_SENDER}' with HTTP response status code ${httpcode}."
error "failed to send fleep data to user '${FLEEP_SENDER}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -1875,10 +1984,10 @@ send_prowl() {
httpcode=$(docurl -X POST --data "${data}" "https://api.prowlapp.com/publicapi/add")
if [ "${httpcode}" = "200" ]; then
info "sent prowl data for: ${host} ${chart}.${name} is ${status}"
info "sent prowl event for ${notification_description}"
sent=1
else
error "failed to send prowl data for: ${host} ${chart}.${name} is ${status} with with error code ${httpcode}."
error "failed to send prowl event for ${notification_description}, with HTTP response status code ${httpcode}."
fi
[ ${sent} -gt 0 ] && return 0
@ -1914,10 +2023,10 @@ send_irc() {
done
if [ "${error}" -eq 0 ]; then
info "sent irc notification for: ${host} ${chart}.${name} is ${status} to '${CHANNEL}'"
info "sent irc notification to '${CHANNEL}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send irc notification for: ${host} ${chart}.${name} is ${status} to '${CHANNEL}', with error code ${code}."
error "failed to send irc notification to '${CHANNEL}' for ${notification_description}, with error code ${code}."
fi
done
fi
@ -1942,10 +2051,10 @@ send_awssns() {
# Extract the region from the target ARN. We need to explicitly specify the region so that it matches up correctly.
region="$(echo ${target} | cut -f 4 -d ':')"
if ${aws} sns publish --region "${region}" --subject "${host} ${status_message} - ${name//_/ } - ${chart}" --message "${message}" --target-arn ${target} &>/dev/null; then
info "sent Amazon SNS notification for: ${host} ${chart}.${name} is ${status} to '${target}'"
info "sent Amazon SNS notification to '${target}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send Amazon SNS notification for: ${host} ${chart}.${name} is ${status} to '${target}'"
error "failed to send Amazon SNS notification to '${target}' for ${notification_description}"
fi
done
@ -1987,10 +2096,10 @@ EOF
httpcode=$(docurl -X POST --data "${payload}" "${webhook}")
if [ "${httpcode}" == "200" ]; then
info "sent Matrix notification for: ${host} ${chart}.${name} is ${status} to '${room}'"
info "sent Matrix notification to '${room}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send Matrix notification for: ${host} ${chart}.${name} is ${status} to '${room}', with HTTP response status code ${httpcode}."
error "failed to send Matrix notification to '${room}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done
@ -2089,10 +2198,10 @@ send_sms() {
errmessage=$($sendsms $phone "$msg" 2>&1)
errcode=$?
if [ ${errcode} -eq 0 ]; then
info "sent smstools3 SMS for: ${host} ${chart}.${name} is ${status} to '${user}'"
info "sent smstools3 SMS to '${user}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send smstools3 SMS for: ${host} ${chart}.${name} is ${status} to '${user}' with error code ${errcode}: ${errmessage}."
error "failed to send smstools3 SMS to '${user}' for ${notification_description}, with error code ${errcode}: ${errmessage}."
fi
done
@ -2139,14 +2248,14 @@ EOF
if [ ${ret} -eq 0 ]; then
if [ "${httpcode}" = "200" ]; then
info "sent ${DYNATRACE_EVENT} to ${DYNATRACE_SERVER}"
info "sent Dynatrace event '${DYNATRACE_EVENT}' to '${DYNATRACE_SERVER}' for ${notification_description}"
return 0
else
warning "Dynatrace ${DYNATRACE_SERVER} responded ${httpcode} notification for: ${host} ${chart}.${name} is ${status} was not sent!"
warning "failed to send Dynatrace event to '${DYNATRACE_SERVER}' for ${notification_description}, with HTTP response status code ${httpcode}"
return 1
fi
else
error "failed to sent ${DYNATRACE_EVENT} notification for: ${host} ${chart}.${name} is ${status} to ${DYNATRACE_SERVER} with error code ${ret}."
error "failed to sent Dynatrace '${DYNATRACE_EVENT}' to '${DYNATRACE_SERVER}' for ${notification_description}, with code ${ret}."
return 1
fi
}
@ -2204,9 +2313,9 @@ EOF
httpcode=$(docurl -X POST -H "Content-Type: application/json" -d "${payload}" "${OPSGENIE_API_URL}/v1/json/integrations/webhooks/netdata?apiKey=${OPSGENIE_API_KEY}")
# https://docs.opsgenie.com/docs/alert-api#create-alert
if [ "${httpcode}" = "200" ]; then
info "sent opsgenie notification for: ${host} ${chart}.${name} is ${status}"
info "sent opsgenie event for ${notification_description}"
else
error "failed to send opsgenie notification for: ${host} ${chart}.${name} is ${status}, with HTTP error code ${httpcode}."
error "failed to send opsgenie event for ${notification_description}, with HTTP response status code ${httpcode}."
return 1
fi
@ -2243,9 +2352,9 @@ EOF
httpcode=$(docurl -X POST -H "Content-Type: application/json" -d "${payload}" "${GOTIFY_APP_URL}/message?token=${GOTIFY_APP_TOKEN}")
if [ "${httpcode}" = "200" ]; then
info "sent gotify notification for: ${host} ${chart}.${name} is ${status}"
info "sent gotify event for ${notification_description}"
else
error "failed to send gotify notification for: ${host} ${chart}.${name} is ${status}, with HTTP error code ${httpcode}."
error "failed to send gotify event for ${notification_description}, with HTTP response status code ${httpcode}."
return 1
fi
@ -2298,10 +2407,10 @@ send_ntfy() {
-d "${msg}" \
${recipient})
if [ "${httpcode}" == "200" ]; then
info "sent ntfy notification for: ${host} ${chart}.${name} is ${status} to '${recipient}'"
info "sent ntfy notification to '${recipient}' for ${notification_description}"
sent=$((sent + 1))
else
error "failed to send ntfy notification for: ${host} ${chart}.${name} is ${status} to '${recipient}', with HTTP response status code ${httpcode}."
error "failed to send ntfy notification to '${recipient}' for ${notification_description}, with HTTP response status code ${httpcode}."
fi
done


@ -211,8 +211,8 @@ sendsms=""
# EMAIL_SENDER="\"User Name\" <user@domain>"
EMAIL_SENDER=""
# enable/disable sending emails
SEND_EMAIL="YES"
# enable/disable sending emails: set this to YES or NO, or to AUTO to enable/disable based on sendmail availability
SEND_EMAIL="AUTO"
# if a role recipient is not configured, an email will be send to:
DEFAULT_RECIPIENT_EMAIL="root"


@ -8,9 +8,11 @@ SUBDIRS = \
aral \
avl \
buffer \
buffered_reader \
clocks \
completion \
config \
datetime \
dictionary \
ebpf \
eval \
@ -19,6 +21,7 @@ SUBDIRS = \
json \
july \
health \
line_splitter \
locks \
log \
onewayalloc \
@ -31,6 +34,7 @@ SUBDIRS = \
string \
threads \
url \
uuid \
worker_utilization \
tests \
$(NULL)


@ -81,6 +81,7 @@ void buffer_snprintf(BUFFER *wb, size_t len, const char *fmt, ...)
va_list args;
va_start(args, fmt);
// vsnprintfz() returns the number of bytes actually written - after possible truncation
wb->len += vsnprintfz(&wb->buffer[wb->len], len, fmt, args);
va_end(args);
@ -89,53 +90,39 @@ void buffer_snprintf(BUFFER *wb, size_t len, const char *fmt, ...)
// the buffer is \0 terminated by vsnprintfz
}
void buffer_vsprintf(BUFFER *wb, const char *fmt, va_list args)
{
inline void buffer_vsprintf(BUFFER *wb, const char *fmt, va_list args) {
if(unlikely(!fmt || !*fmt)) return;
size_t wrote = 0, need = 2, space_remaining = 0;
size_t full_size_bytes = 0, need = 2, space_remaining = 0;
do {
need += space_remaining * 2;
need += full_size_bytes + 2;
netdata_log_debug(D_WEB_BUFFER, "web_buffer_sprintf(): increasing web_buffer at position %zu, size = %zu, by %zu bytes (wrote = %zu)\n", wb->len, wb->size, need, wrote);
buffer_need_bytes(wb, need);
space_remaining = wb->size - wb->len - 1;
wrote = (size_t) vsnprintfz(&wb->buffer[wb->len], space_remaining, fmt, args);
// Use the copy of va_list for vsnprintf
va_list args_copy;
va_copy(args_copy, args);
// vsnprintf() returns the number of bytes required, even if bigger than the buffer provided
full_size_bytes = (size_t) vsnprintf(&wb->buffer[wb->len], space_remaining, fmt, args_copy);
va_end(args_copy);
} while(wrote >= space_remaining);
} while(full_size_bytes >= space_remaining);
wb->len += wrote;
wb->len += full_size_bytes;
// the buffer is \0 terminated by vsnprintf
wb->buffer[wb->len] = '\0';
buffer_overflow_check(wb);
}
void buffer_sprintf(BUFFER *wb, const char *fmt, ...)
{
if(unlikely(!fmt || !*fmt)) return;
va_list args;
size_t wrote = 0, need = 2, space_remaining = 0;
do {
need += space_remaining * 2;
netdata_log_debug(D_WEB_BUFFER, "web_buffer_sprintf(): increasing web_buffer at position %zu, size = %zu, by %zu bytes (wrote = %zu)\n", wb->len, wb->size, need, wrote);
buffer_need_bytes(wb, need);
space_remaining = wb->size - wb->len - 1;
va_start(args, fmt);
wrote = (size_t) vsnprintfz(&wb->buffer[wb->len], space_remaining, fmt, args);
va_end(args);
} while(wrote >= space_remaining);
wb->len += wrote;
// the buffer is \0 terminated by vsnprintf
va_start(args, fmt);
buffer_vsprintf(wb, fmt, args);
va_end(args);
}
// generate a javascript date, the fastest possible way...


@ -94,6 +94,8 @@ typedef struct web_buffer {
} json;
} BUFFER;
#define CLEAN_BUFFER _cleanup_(buffer_freep) BUFFER
#define buffer_cacheable(wb) do { (wb)->options |= WB_CONTENT_CACHEABLE; if((wb)->options & WB_CONTENT_NO_CACHEABLE) (wb)->options &= ~WB_CONTENT_NO_CACHEABLE; } while(0)
#define buffer_no_cacheable(wb) do { (wb)->options |= WB_CONTENT_NO_CACHEABLE; if((wb)->options & WB_CONTENT_CACHEABLE) (wb)->options &= ~WB_CONTENT_CACHEABLE; (wb)->expires = 0; } while(0)
@ -135,6 +137,10 @@ BUFFER *buffer_create(size_t size, size_t *statistics);
void buffer_free(BUFFER *b);
void buffer_increase(BUFFER *b, size_t free_size_required);
static inline void buffer_freep(BUFFER **bp) {
if(bp) buffer_free(*bp);
}
void buffer_snprintf(BUFFER *wb, size_t len, const char *fmt, ...) PRINTFLIKE(3, 4);
void buffer_vsprintf(BUFFER *wb, const char *fmt, va_list args);
void buffer_sprintf(BUFFER *wb, const char *fmt, ...) PRINTFLIKE(2,3);
@ -210,6 +216,13 @@ static inline void buffer_fast_rawcat(BUFFER *wb, const char *txt, size_t len) {
buffer_overflow_check(wb);
}
static inline void buffer_putc(BUFFER *wb, char c) {
buffer_need_bytes(wb, 2);
wb->buffer[wb->len++] = c;
wb->buffer[wb->len] = '\0';
buffer_overflow_check(wb);
}
static inline void buffer_fast_strcat(BUFFER *wb, const char *txt, size_t len) {
if(unlikely(!txt || !*txt || !len)) return;
@ -283,6 +296,19 @@ static inline void buffer_strncat(BUFFER *wb, const char *txt, size_t len) {
buffer_overflow_check(wb);
}
static inline void buffer_memcat(BUFFER *wb, const void *mem, size_t bytes) {
if(unlikely(!mem)) return;
buffer_need_bytes(wb, bytes + 1);
memcpy(&wb->buffer[wb->len], mem, bytes);
wb->len += bytes;
wb->buffer[wb->len] = '\0';
buffer_overflow_check(wb);
}
static inline void buffer_json_strcat(BUFFER *wb, const char *txt) {
if(unlikely(!txt || !*txt)) return;


@ -0,0 +1,8 @@
# SPDX-License-Identifier: GPL-3.0-or-later
AUTOMAKE_OPTIONS = subdir-objects
MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
dist_noinst_DATA = \
README.md \
$(NULL)



@ -0,0 +1,3 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"


@ -0,0 +1,146 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
#ifndef NETDATA_BUFFERED_READER_H
#define NETDATA_BUFFERED_READER_H
struct buffered_reader {
ssize_t read_len;
ssize_t pos;
char read_buffer[PLUGINSD_LINE_MAX + 1];
};
static inline void buffered_reader_init(struct buffered_reader *reader) {
reader->read_buffer[0] = '\0';
reader->read_len = 0;
reader->pos = 0;
}
typedef enum {
BUFFERED_READER_READ_OK = 0,
BUFFERED_READER_READ_FAILED = -1,
BUFFERED_READER_READ_BUFFER_FULL = -2,
BUFFERED_READER_READ_POLLERR = -3,
BUFFERED_READER_READ_POLLHUP = -4,
BUFFERED_READER_READ_POLLNVAL = -5,
BUFFERED_READER_READ_POLL_UNKNOWN = -6,
BUFFERED_READER_READ_POLL_TIMEOUT = -7,
BUFFERED_READER_READ_POLL_FAILED = -8,
} buffered_reader_ret_t;
static inline buffered_reader_ret_t buffered_reader_read(struct buffered_reader *reader, int fd) {
#ifdef NETDATA_INTERNAL_CHECKS
if(reader->read_buffer[reader->read_len] != '\0')
fatal("read_buffer does not start with zero");
#endif
char *read_at = reader->read_buffer + reader->read_len;
ssize_t remaining = sizeof(reader->read_buffer) - reader->read_len - 1;
if(unlikely(remaining <= 0))
return BUFFERED_READER_READ_BUFFER_FULL;
ssize_t bytes_read = read(fd, read_at, remaining);
if(unlikely(bytes_read <= 0))
return BUFFERED_READER_READ_FAILED;
reader->read_len += bytes_read;
reader->read_buffer[reader->read_len] = '\0';
return BUFFERED_READER_READ_OK;
}
static inline buffered_reader_ret_t buffered_reader_read_timeout(struct buffered_reader *reader, int fd, int timeout_ms, bool log_error) {
errno = 0;
struct pollfd fds[1];
fds[0].fd = fd;
fds[0].events = POLLIN;
int ret = poll(fds, 1, timeout_ms);
if (ret > 0) {
/* There is data to read */
if (fds[0].revents & POLLIN)
return buffered_reader_read(reader, fd);
else if(fds[0].revents & POLLERR) {
if(log_error)
netdata_log_error("PARSER: read failed: POLLERR.");
return BUFFERED_READER_READ_POLLERR;
}
else if(fds[0].revents & POLLHUP) {
if(log_error)
netdata_log_error("PARSER: read failed: POLLHUP.");
return BUFFERED_READER_READ_POLLHUP;
}
else if(fds[0].revents & POLLNVAL) {
if(log_error)
netdata_log_error("PARSER: read failed: POLLNVAL.");
return BUFFERED_READER_READ_POLLNVAL;
}
if(log_error)
netdata_log_error("PARSER: poll() returned positive number, but POLLIN|POLLERR|POLLHUP|POLLNVAL are not set.");
return BUFFERED_READER_READ_POLL_UNKNOWN;
}
else if (ret == 0) {
if(log_error)
netdata_log_error("PARSER: timeout while waiting for data.");
return BUFFERED_READER_READ_POLL_TIMEOUT;
}
if(log_error)
netdata_log_error("PARSER: poll() failed with code %d.", ret);
return BUFFERED_READER_READ_POLL_FAILED;
}
/* Produce a full line if one exists; statefully remember where we start next time.
 * If only a partial line is buffered, copy it to dst and reset the reader for the next fill.
 */
static inline bool buffered_reader_next_line(struct buffered_reader *reader, BUFFER *dst) {
buffer_need_bytes(dst, reader->read_len - reader->pos + 2);
size_t start = reader->pos;
char *ss = &reader->read_buffer[start];
char *se = &reader->read_buffer[reader->read_len];
char *ds = &dst->buffer[dst->len];
char *de = &ds[dst->size - dst->len - 2];
if(ss >= se) {
*ds = '\0';
reader->pos = 0;
reader->read_len = 0;
reader->read_buffer[reader->read_len] = '\0';
return false;
}
// copy all bytes to buffer
while(ss < se && ds < de && *ss != '\n') {
*ds++ = *ss++;
dst->len++;
}
// if we have a newline, return the buffer
if(ss < se && ds < de && *ss == '\n') {
// newline found in the r->read_buffer
*ds++ = *ss++; // copy the newline too
dst->len++;
*ds = '\0';
reader->pos = ss - reader->read_buffer;
return true;
}
reader->pos = 0;
reader->read_len = 0;
reader->read_buffer[reader->read_len] = '\0';
return false;
}
#endif //NETDATA_BUFFERED_READER_H
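A minimal usage sketch (not part of the patch), showing how these pieces combine to consume complete lines from a file descriptor; `process_line()` is a hypothetical consumer, and `buffer_create()` / `buffer_flush()` / `buffer_free()` are the libnetdata BUFFER calls:

```c
static void consume_lines(int fd) {
    struct buffered_reader reader;
    buffered_reader_init(&reader);

    // make the line buffer as large as the reader's internal buffer
    BUFFER *line = buffer_create(sizeof(reader.read_buffer), NULL);

    while(true) {
        if(buffered_reader_next_line(&reader, line)) {
            // 'line' holds one complete line, including the trailing newline
            process_line(line->buffer, line->len);
            buffer_flush(line);
            continue;
        }

        // no complete line buffered yet - wait up to 2 seconds for more data
        if(buffered_reader_read_timeout(&reader, fd, 2000, true) != BUFFERED_READER_READ_OK)
            break;
    }

    buffer_free(line);
}
```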


@ -298,8 +298,10 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick) {
// TODO: The heartbeat tick should be specified at the heartbeat_init() function
usec_t tmp = (now_realtime_usec() * clock_realtime_resolution) % (tick / 2);
error_limit_static_global_var(erl, 10, 0);
error_limit(&erl, "heartbeat randomness of %"PRIu64" is too big for a tick of %"PRIu64" - setting it to %"PRIu64"", hb->randomness, tick, tmp);
nd_log_limit_static_global_var(erl, 10, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
"heartbeat randomness of %"PRIu64" is too big for a tick of %"PRIu64" - setting it to %"PRIu64"",
hb->randomness, tick, tmp);
hb->randomness = tmp;
}
@ -325,13 +327,19 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick) {
if(unlikely(now < next)) {
errno = 0;
error_limit_static_global_var(erl, 10, 0);
error_limit(&erl, "heartbeat clock: woke up %"PRIu64" microseconds earlier than expected (can be due to the CLOCK_REALTIME set to the past).", next - now);
nd_log_limit_static_global_var(erl, 10, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
"heartbeat clock: woke up %"PRIu64" microseconds earlier than expected "
"(can be due to the CLOCK_REALTIME set to the past).",
next - now);
}
else if(unlikely(now - next > tick / 2)) {
errno = 0;
error_limit_static_global_var(erl, 10, 0);
error_limit(&erl, "heartbeat clock: woke up %"PRIu64" microseconds later than expected (can be due to system load or the CLOCK_REALTIME set to the future).", now - next);
nd_log_limit_static_global_var(erl, 10, 0);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_NOTICE,
"heartbeat clock: woke up %"PRIu64" microseconds later than expected "
"(can be due to system load or the CLOCK_REALTIME set to the future).",
now - next);
}
if(unlikely(!hb->realtime)) {


@ -0,0 +1,8 @@
# SPDX-License-Identifier: GPL-3.0-or-later
AUTOMAKE_OPTIONS = subdir-objects
MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
dist_noinst_DATA = \
README.md \
$(NULL)


@ -0,0 +1,11 @@
<!--
title: "Datetime"
custom_edit_url: https://github.com/netdata/netdata/edit/master/libnetdata/datetime/README.md
sidebar_label: "Datetime"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->
# Datetime
Formatting dates and timestamps.


@ -0,0 +1,81 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
size_t iso8601_datetime_ut(char *buffer, size_t len, usec_t now_ut, ISO8601_OPTIONS options) {
if(unlikely(!buffer || len == 0))
return 0;
time_t t = (time_t)(now_ut / USEC_PER_SEC);
struct tm *tmp, tmbuf;
if(options & ISO8601_UTC)
// Use gmtime_r for UTC time conversion.
tmp = gmtime_r(&t, &tmbuf);
else
// Use localtime_r for local time conversion.
tmp = localtime_r(&t, &tmbuf);
if (unlikely(!tmp)) {
buffer[0] = '\0';
return 0;
}
// Format the date and time according to the ISO 8601 format.
size_t used_length = strftime(buffer, len, "%Y-%m-%dT%H:%M:%S", tmp);
if (unlikely(used_length == 0)) {
buffer[0] = '\0';
return 0;
}
if(options & ISO8601_MILLISECONDS) {
// Calculate the remaining milliseconds
int milliseconds = (int) ((now_ut % USEC_PER_SEC) / USEC_PER_MS);
if(milliseconds && len - used_length > 4)
used_length += snprintfz(buffer + used_length, len - used_length, ".%03d", milliseconds);
}
else if(options & ISO8601_MICROSECONDS) {
// Calculate the remaining microseconds
int microseconds = (int) (now_ut % USEC_PER_SEC);
if(microseconds && len - used_length > 7)
used_length += snprintfz(buffer + used_length, len - used_length, ".%06d", microseconds);
}
if(options & ISO8601_UTC) {
if(used_length + 1 < len) {
buffer[used_length++] = 'Z';
buffer[used_length] = '\0'; // null-terminate the string.
}
}
else {
// Calculate the timezone offset in hours and minutes from UTC.
long offset = tmbuf.tm_gmtoff;
int hours = (int) (offset / 3600); // Convert offset seconds to hours.
int minutes = (int) ((offset % 3600) / 60); // Convert remainder to minutes (keep the sign for minutes).
// Check if timezone is UTC.
if(hours == 0 && minutes == 0) {
// For UTC, append 'Z' to the timestamp.
if(used_length + 1 < len) {
buffer[used_length++] = 'Z';
buffer[used_length] = '\0'; // null-terminate the string.
}
}
else {
// For non-UTC, format the timezone offset. Omit minutes if they are zero.
if(minutes == 0) {
// Check enough space is available for the timezone offset string.
if(used_length + 3 < len) // "+hh\0"
used_length += snprintfz(buffer + used_length, len - used_length, "%+03d", hours);
}
else {
// Check enough space is available for the timezone offset string.
if(used_length + 6 < len) // "+hh:mm\0"
used_length += snprintfz(buffer + used_length, len - used_length,
"%+03d:%02d", hours, abs(minutes));
}
}
}
return used_length;
}
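A quick illustration of the formatter, assuming `now_realtime_usec()` from libnetdata/clocks:

```c
char buf[ISO8601_MAX_LENGTH];
iso8601_datetime_ut(buf, sizeof(buf), now_realtime_usec(), ISO8601_UTC | ISO8601_MILLISECONDS);
// buf now holds something like "2023-11-20T14:05:12.123Z"
```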


@ -0,0 +1,18 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
#ifndef NETDATA_ISO8601_H
#define NETDATA_ISO8601_H
typedef enum __attribute__((__packed__)) {
ISO8601_UTC = (1 << 0),
ISO8601_LOCAL_TIMEZONE = (1 << 1),
ISO8601_MILLISECONDS = (1 << 2),
ISO8601_MICROSECONDS = (1 << 3),
} ISO8601_OPTIONS;
#define ISO8601_MAX_LENGTH 64
size_t iso8601_datetime_ut(char *buffer, size_t len, usec_t now_ut, ISO8601_OPTIONS options);
#endif //NETDATA_ISO8601_H


@ -0,0 +1,29 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
inline size_t rfc7231_datetime(char *buffer, size_t len, time_t now_t) {
if (unlikely(!buffer || !len))
return 0;
struct tm *tmp, tmbuf;
// Use gmtime_r for UTC time conversion.
tmp = gmtime_r(&now_t, &tmbuf);
if (unlikely(!tmp)) {
buffer[0] = '\0';
return 0;
}
// Format the date and time according to the RFC 7231 format.
size_t ret = strftime(buffer, len, "%a, %d %b %Y %H:%M:%S GMT", tmp);
if (unlikely(ret == 0))
buffer[0] = '\0';
return ret;
}
size_t rfc7231_datetime_ut(char *buffer, size_t len, usec_t now_ut) {
return rfc7231_datetime(buffer, len, (time_t) (now_ut / USEC_PER_SEC));
}
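A quick illustration, assuming `now_realtime_sec()` from libnetdata/clocks:

```c
char date[RFC7231_MAX_LENGTH];
rfc7231_datetime(date, sizeof(date), now_realtime_sec());
// date now holds something like "Mon, 20 Nov 2023 14:05:12 GMT"
```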


@ -0,0 +1,12 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
#ifndef NETDATA_RFC7231_H
#define NETDATA_RFC7231_H
#define RFC7231_MAX_LENGTH 30
size_t rfc7231_datetime(char *buffer, size_t len, time_t now_t);
size_t rfc7231_datetime_ut(char *buffer, size_t len, usec_t now_ut);
#endif //NETDATA_RFC7231_H


@ -64,6 +64,12 @@ static void *rrd_functions_worker_globals_worker_main(void *arg) {
pthread_mutex_unlock(&wg->worker_mutex);
if(acquired) {
ND_LOG_STACK lgs[] = {
ND_LOG_FIELD_TXT(NDF_REQUEST, j->cmd),
ND_LOG_FIELD_END(),
};
ND_LOG_STACK_PUSH(lgs);
last_acquired = true;
j = dictionary_acquired_item_value(acquired);
j->cb(j->transaction, j->cmd, j->timeout, &j->cancelled);


@ -426,13 +426,24 @@ static inline void sanitize_json_string(char *dst, const char *src, size_t dst_s
}
static inline bool sanitize_command_argument_string(char *dst, const char *src, size_t dst_size) {
if(dst_size)
*dst = '\0';
// skip leading dashes
while (src[0] == '-')
while (*src == '-')
src++;
// escape single quotes
while (src[0] != '\0') {
if (src[0] == '\'') {
while (*src != '\0') {
if (dst_size < 1)
return false;
if (iscntrl(*src) || *src == '$') {
// remove control characters and characters that are expanded by bash
*dst++ = '_';
dst_size--;
}
else if (*src == '\'' || *src == '`') {
// escape single quotes
if (dst_size < 4)
return false;
@ -440,14 +451,10 @@ static inline bool sanitize_command_argument_string(char *dst, const char *src,
dst += 4;
dst_size -= 4;
} else {
if (dst_size < 1)
return false;
dst[0] = src[0];
dst += 1;
dst_size -= 1;
}
else {
*dst++ = *src;
dst_size--;
}
src++;
@ -456,6 +463,7 @@ static inline bool sanitize_command_argument_string(char *dst, const char *src,
// make sure we have space to terminate the string
if (dst_size == 0)
return false;
*dst = '\0';
return true;
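For illustration, a sketch of the new behavior on a hostile argument; the exact 4-byte quote escape is an assumption (most likely `'\''`):

```c
char out[64];

// leading dashes are skipped, control characters and '$' become '_',
// single quotes and backticks are escaped for safe single-quoted use
if(sanitize_command_argument_string(out, "--say='$USER'", sizeof(out)))
    printf("%s\n", out); // prints: say='\''_USER'\''
```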
@ -531,10 +539,6 @@ static inline int read_single_base64_or_hex_number_file(const char *filename, un
}
}
static inline int uuid_memcmp(const uuid_t *uu1, const uuid_t *uu2) {
return memcmp(uu1, uu2, sizeof(uuid_t));
}
static inline char *strsep_skip_consecutive_separators(char **ptr, char *s) {
char *p = (char *)"";
while (p && !p[0] && *ptr) p = strsep(ptr, s);


@ -1269,17 +1269,19 @@ char *fgets_trim_len(char *buf, size_t buf_size, FILE *fp, size_t *len) {
return s;
}
// vsnprintfz() returns the number of bytes actually written - after possible truncation
int vsnprintfz(char *dst, size_t n, const char *fmt, va_list args) {
if(unlikely(!n)) return 0;
int size = vsnprintf(dst, n, fmt, args);
dst[n - 1] = '\0';
if (unlikely((size_t) size > n)) size = (int)n;
if (unlikely((size_t) size >= n)) size = (int)(n - 1);
return size;
}
// snprintfz() returns the number of bytes actually written - after possible truncation
int snprintfz(char *dst, size_t n, const char *fmt, ...) {
va_list args;
@ -1694,53 +1696,6 @@ char *find_and_replace(const char *src, const char *find, const char *replace, c
return value;
}
inline int pluginsd_isspace(char c) {
switch(c) {
case ' ':
case '\t':
case '\r':
case '\n':
case '=':
return 1;
default:
return 0;
}
}
inline int config_isspace(char c) {
switch (c) {
case ' ':
case '\t':
case '\r':
case '\n':
case ',':
return 1;
default:
return 0;
}
}
inline int group_by_label_isspace(char c) {
if(c == ',' || c == '|')
return 1;
return 0;
}
bool isspace_map_pluginsd[256] = {};
bool isspace_map_config[256] = {};
bool isspace_map_group_by_label[256] = {};
__attribute__((constructor)) void initialize_is_space_arrays(void) {
for(int c = 0; c < 256 ; c++) {
isspace_map_pluginsd[c] = pluginsd_isspace((char) c);
isspace_map_config[c] = config_isspace((char) c);
isspace_map_group_by_label[c] = group_by_label_isspace((char) c);
}
}
bool run_command_and_copy_output_to_stdout(const char *command, int max_line_length) {
pid_t pid;
FILE *fp = netdata_popen(command, &pid, NULL);


@ -201,6 +201,8 @@ extern "C" {
// ----------------------------------------------------------------------------
// netdata common definitions
#define _cleanup_(x) __attribute__((__cleanup__(x)))
#ifdef __GNUC__
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#endif // __GNUC__
@ -685,103 +687,6 @@ static inline BITMAPX *bitmapX_create(uint32_t bits) {
#define COMPRESSION_MAX_OVERHEAD 128
#define COMPRESSION_MAX_MSG_SIZE (COMPRESSION_MAX_CHUNK - COMPRESSION_MAX_OVERHEAD - 1)
#define PLUGINSD_LINE_MAX (COMPRESSION_MAX_MSG_SIZE - 768)
int pluginsd_isspace(char c);
int config_isspace(char c);
int group_by_label_isspace(char c);
extern bool isspace_map_pluginsd[256];
extern bool isspace_map_config[256];
extern bool isspace_map_group_by_label[256];
static inline size_t quoted_strings_splitter(char *str, char **words, size_t max_words, bool *isspace_map) {
char *s = str, quote = 0;
size_t i = 0;
// skip all white space
while (unlikely(isspace_map[(uint8_t)*s]))
s++;
if(unlikely(!*s)) {
words[i] = NULL;
return 0;
}
// check for quote
if (unlikely(*s == '\'' || *s == '"')) {
quote = *s; // remember the quote
s++; // skip the quote
}
// store the first word
words[i++] = s;
// while we have something
while (likely(*s)) {
// if it is an escape
if (unlikely(*s == '\\' && s[1])) {
s += 2;
continue;
}
// if it is a quote
else if (unlikely(*s == quote)) {
quote = 0;
*s = ' ';
continue;
}
// if it is a space
else if (unlikely(quote == 0 && isspace_map[(uint8_t)*s])) {
// terminate the word
*s++ = '\0';
// skip all white space
while (likely(isspace_map[(uint8_t)*s]))
s++;
// check for a quote
if (unlikely(*s == '\'' || *s == '"')) {
quote = *s; // remember the quote
s++; // skip the quote
}
// if we reached the end, stop
if (unlikely(!*s))
break;
// store the next word
if (likely(i < max_words))
words[i++] = s;
else
break;
}
// anything else
else
s++;
}
if (likely(i < max_words))
words[i] = NULL;
return i;
}
#define quoted_strings_splitter_query_group_by_label(str, words, max_words) \
quoted_strings_splitter(str, words, max_words, isspace_map_group_by_label)
#define quoted_strings_splitter_config(str, words, max_words) \
quoted_strings_splitter(str, words, max_words, isspace_map_config)
#define quoted_strings_splitter_pluginsd(str, words, max_words) \
quoted_strings_splitter(str, words, max_words, isspace_map_pluginsd)
static inline char *get_word(char **words, size_t num_words, size_t index) {
if (unlikely(index >= num_words))
return NULL;
return words[index];
}
bool run_command_and_copy_output_to_stdout(const char *command, int max_line_length);
@ -803,6 +708,8 @@ extern char *netdata_configured_host_prefix;
#define XXH_INLINE_ALL
#include "xxhash.h"
#include "uuid/uuid.h"
#include "libjudy/src/Judy.h"
#include "july/july.h"
#include "os.h"
@ -812,7 +719,10 @@ extern char *netdata_configured_host_prefix;
#include "circular_buffer/circular_buffer.h"
#include "avl/avl.h"
#include "inlined.h"
#include "line_splitter/line_splitter.h"
#include "clocks/clocks.h"
#include "datetime/iso8601.h"
#include "datetime/rfc7231.h"
#include "completion/completion.h"
#include "popen/popen.h"
#include "simple_pattern/simple_pattern.h"
@ -821,7 +731,9 @@ extern char *netdata_configured_host_prefix;
#endif
#include "socket/socket.h"
#include "config/appconfig.h"
#include "log/journal.h"
#include "log/log.h"
#include "buffered_reader/buffered_reader.h"
#include "procfile/procfile.h"
#include "string/string.h"
#include "dictionary/dictionary.h"


@ -0,0 +1,8 @@
# SPDX-License-Identifier: GPL-3.0-or-later
AUTOMAKE_OPTIONS = subdir-objects
MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
dist_noinst_DATA = \
README.md \
$(NULL)


@ -0,0 +1,14 @@
<!--
title: "Log"
custom_edit_url: https://github.com/netdata/netdata/edit/master/libnetdata/log/README.md
sidebar_label: "Log"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->
# Log
The netdata log library supports debug, info, error and fatal error logging.
By default we have an access log, an error log and a collectors log.


@ -0,0 +1,69 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
bool line_splitter_reconstruct_line(BUFFER *wb, void *ptr) {
struct line_splitter *spl = ptr;
if(!spl) return false;
size_t added = 0;
for(size_t i = 0; i < spl->num_words ;i++) {
if(i) buffer_fast_strcat(wb, " ", 1);
buffer_fast_strcat(wb, "'", 1);
const char *s = get_word(spl->words, spl->num_words, i);
buffer_strcat(wb, s?s:"");
buffer_fast_strcat(wb, "'", 1);
added++;
}
return added > 0;
}
inline int pluginsd_isspace(char c) {
switch(c) {
case ' ':
case '\t':
case '\r':
case '\n':
case '=':
return 1;
default:
return 0;
}
}
inline int config_isspace(char c) {
switch (c) {
case ' ':
case '\t':
case '\r':
case '\n':
case ',':
return 1;
default:
return 0;
}
}
inline int group_by_label_isspace(char c) {
if(c == ',' || c == '|')
return 1;
return 0;
}
bool isspace_map_pluginsd[256] = {};
bool isspace_map_config[256] = {};
bool isspace_map_group_by_label[256] = {};
__attribute__((constructor)) void initialize_is_space_arrays(void) {
for(int c = 0; c < 256 ; c++) {
isspace_map_pluginsd[c] = pluginsd_isspace((char) c);
isspace_map_config[c] = config_isspace((char) c);
isspace_map_group_by_label[c] = group_by_label_isspace((char) c);
}
}


@ -0,0 +1,120 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
#ifndef NETDATA_LINE_SPLITTER_H
#define NETDATA_LINE_SPLITTER_H
#define PLUGINSD_MAX_WORDS 30
struct line_splitter {
size_t count; // counts number of lines
char *words[PLUGINSD_MAX_WORDS]; // an array of pointers for the words in this line
size_t num_words; // the number of pointers used in this line
};
bool line_splitter_reconstruct_line(BUFFER *wb, void *ptr);
static inline void line_splitter_reset(struct line_splitter *line) {
line->num_words = 0;
}
int pluginsd_isspace(char c);
int config_isspace(char c);
int group_by_label_isspace(char c);
extern bool isspace_map_pluginsd[256];
extern bool isspace_map_config[256];
extern bool isspace_map_group_by_label[256];
static inline size_t quoted_strings_splitter(char *str, char **words, size_t max_words, bool *isspace_map) {
char *s = str, quote = 0;
size_t i = 0;
// skip all white space
while (unlikely(isspace_map[(uint8_t)*s]))
s++;
if(unlikely(!*s)) {
words[i] = NULL;
return 0;
}
// check for quote
if (unlikely(*s == '\'' || *s == '"')) {
quote = *s; // remember the quote
s++; // skip the quote
}
// store the first word
words[i++] = s;
// while we have something
while (likely(*s)) {
// if it is an escape
if (unlikely(*s == '\\' && s[1])) {
s += 2;
continue;
}
// if it is a quote
else if (unlikely(*s == quote)) {
quote = 0;
*s = ' ';
continue;
}
// if it is a space
else if (unlikely(quote == 0 && isspace_map[(uint8_t)*s])) {
// terminate the word
*s++ = '\0';
// skip all white space
while (likely(isspace_map[(uint8_t)*s]))
s++;
// check for a quote
if (unlikely(*s == '\'' || *s == '"')) {
quote = *s; // remember the quote
s++; // skip the quote
}
// if we reached the end, stop
if (unlikely(!*s))
break;
// store the next word
if (likely(i < max_words))
words[i++] = s;
else
break;
}
// anything else
else
s++;
}
if (likely(i < max_words))
words[i] = NULL;
return i;
}
#define quoted_strings_splitter_query_group_by_label(str, words, max_words) \
quoted_strings_splitter(str, words, max_words, isspace_map_group_by_label)
#define quoted_strings_splitter_config(str, words, max_words) \
quoted_strings_splitter(str, words, max_words, isspace_map_config)
#define quoted_strings_splitter_pluginsd(str, words, max_words) \
quoted_strings_splitter(str, words, max_words, isspace_map_pluginsd)
static inline char *get_word(char **words, size_t num_words, size_t index) {
if (unlikely(index >= num_words))
return NULL;
return words[index];
}
#endif //NETDATA_LINE_SPLITTER_H
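A minimal usage sketch (not part of the patch); note that the input string is modified in place:

```c
char line[] = "SET 'dimension name' = 123";
char *words[PLUGINSD_MAX_WORDS];

size_t n = quoted_strings_splitter_pluginsd(line, words, PLUGINSD_MAX_WORDS);
// n == 3: get_word(words, n, 1) returns "dimension name"
```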


@ -5,4 +5,5 @@ MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
dist_noinst_DATA = \
README.md \
log2journal.md \
$(NULL)


@ -7,8 +7,196 @@ learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->
# Log
# Netdata Logging
The netdata log library supports debug, info, error and fatal error logging.
By default we have an access log, an error log and a collectors log.
This document describes how Netdata generates its own logs, not how Netdata manages and queries log databases.
## Log sources
Netdata supports the following log sources:
1. **daemon**, logs generated by the Netdata daemon.
2. **collector**, logs generated by Netdata collectors, both internal and external.
3. **access**, API requests received by Netdata.
4. **health**, all alert transitions.
## Log outputs
For each log source, Netdata supports the following output methods:
- **off**, to disable this log source.
- **journal**, to send the logs to systemd-journal.
- **syslog**, to send the logs to syslog.
- **system**, to send the output to `stderr` or `stdout` depending on the log source.
- **stdout**, to write the logs to Netdata's `stdout`.
- **stderr**, to write the logs to Netdata's `stderr`.
- **filename**, to send the logs to a file.
For `daemon` and `collector` the default is `journal` when systemd-journal is available.
To decide if systemd-journal is available, Netdata checks:
1. `stderr` is connected to systemd-journald
2. `/run/systemd/journal/socket` exists
3. `/host/run/systemd/journal/socket` exists (`/host` is configurable in containers)
If any of the above is detected, Netdata will select `journal` for `daemon` and `collector` sources.
All other sources default to a file.
## Log formats
Netdata supports the following formats for its logs:
- **journal**, this is automatically selected when logging to systemd-journal.
- **logfmt**, this is the default when logging to any output other than `journal`. In this format, Netdata annotates the fields to make them human readable.
- **json**, to write log lines in JSON format. The output is machine readable, similar to `journal`.
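For example, a `logfmt` line looks similar to this (the values shown are illustrative; the field names are documented in [Log Fields](#log-fields) below):

```
time=2023-11-20T14:05:12.123+02:00 comm=netdata source=daemon level=info tid=12345 thread=MAIN msg="netdata started"
```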
## Log levels
Each time Netdata logs, it assigns a priority to the log. It can be one of these (in order of importance):
- **emergency**, a fatal condition; most likely Netdata will exit immediately after,
- **alert**, a very important issue that may affect how Netdata operates,
- **critical**, a very important issue the user should know about, which Netdata believes it can survive,
- **error**, an error condition indicating that Netdata tried to do something, but it failed,
- **warning**, something that may or may not affect the operation of Netdata, but the outcome cannot be determined at the time Netdata logs,
- **notice**, something that does not affect the operation of Netdata, but the user should notice,
- **info**, the default log level, for information the user should know,
- **debug**, more verbose logs that can usually be ignored.
## Logs Configuration
In `netdata.conf`, there are the following settings:
```
[logs]
# logs to trigger flood protection = 600
# logs flood protection period = 3600
# facility = daemon
# level = info
# daemon = journal
# collector = journal
# access = /var/log/netdata/access.log
# health = /var/log/netdata/health.log
```
- `logs to trigger flood protection` and `logs flood protection period` enable logs flood protection for `daemon` and `collector` sources. It can also be configured per log source.
- `facility` is used only when Netdata logs to syslog.
- `level` defines the minimum [log level](#log-levels) of logs that will be logged. This setting is applied only to `daemon` and `collector` sources. It can also be configured per source.
### Configuring log sources
Each of the sources (`daemon`, `collector`, `access`, `health`) accepts the following:
```
source = {FORMAT},level={LEVEL},protection={LOGS}/{PERIOD}@{OUTPUT}
```
Where:
- `{FORMAT}`, is one of the [log formats](#log-formats),
- `{LEVEL}`, is the minimum [log level](#log-levels) to be logged,
- `{LOGS}` is the number of `logs to trigger flood protection` configured per output,
- `{PERIOD}` is the equivalent of `logs flood protection period` configured per output,
- `{OUTPUT}` is one of the [log outputs](#log-outputs).
All parameters can be omitted, except `{OUTPUT}`. If `{OUTPUT}` is the only given parameter, `@` can be omitted.
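For example, the following is an illustrative configuration (paths and values are hypothetical):

```
[logs]
    # daemon logs to systemd-journal
    daemon = journal

    # collector logs in logfmt, warnings and above,
    # throttled to 1000 logs per hour, written to a file
    collector = logfmt,level=warning,protection=1000/3600@/var/log/netdata/collector.log
```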
### Logs rotation
Netdata comes with `logrotate` configuration to rotate its log files periodically.
The default is usually found in `/etc/logrotate.d/netdata`.
Sending `SIGHUP` to Netdata instructs it to re-open all its log files.
## Log Fields
Netdata exposes the following fields to its logs:
| journal | logfmt | json | Description |
|:--------------------------------------:|:------------------------------:|:------------------------------:|:---------------------------------------------------------------------------------------------------------:|
| `_SOURCE_REALTIME_TIMESTAMP` | `time` | `time` | the timestamp of the event |
| `SYSLOG_IDENTIFIER` | `comm` | `comm` | the program logging the event |
| `ND_LOG_SOURCE` | `source` | `source` | one of the [log sources](#log-sources) |
| `PRIORITY`<br/>numeric | `level`<br/>text | `level`<br/>numeric | one of the [log levels](#log-levels) |
| `ERRNO` | `errno` | `errno` | the numeric value of `errno` |
| `INVOCATION_ID` | - | - | a unique UUID of the Netdata session, reset on every Netdata restart, inherited from systemd when available |
| `CODE_LINE` | - | - | the line number of the source code logging this event |
| `CODE_FILE` | - | - | the filename of the source code logging this event |
| `CODE_FUNCTION` | - | - | the function name of the source code logging this event |
| `TID` | `tid` | `tid` | the thread id of the thread logging this event |
| `THREAD_TAG` | `thread` | `thread` | the name of the thread logging this event |
| `MESSAGE_ID` | `msg_id` | `msg_id` | see [message IDs](#message-ids) |
| `ND_MODULE` | `module` | `module` | the Netdata module logging this event |
| `ND_NIDL_NODE` | `node` | `node` | the hostname of the node the event is related to |
| `ND_NIDL_INSTANCE` | `instance` | `instance` | the instance of the node the event is related to |
| `ND_NIDL_CONTEXT` | `context` | `context` | the context the event is related to (this is usually the chart name, as shown on netdata dashboards) |
| `ND_NIDL_DIMENSION` | `dimension` | `dimension` | the dimension the event is related to |
| `ND_SRC_TRANSPORT` | `src_transport` | `src_transport` | when the event happened during a request, this is the request transport |
| `ND_SRC_IP` | `src_ip` | `src_ip` | when the event happened during an inbound request, this is the IP the request came from |
| `ND_SRC_PORT` | `src_port` | `src_port` | when the event happened during an inbound request, this is the port the request came from |
| `ND_SRC_CAPABILITIES` | `src_capabilities` | `src_capabilities` | when the request came from a child, this is the communication capabilities of the child |
| `ND_DST_TRANSPORT` | `dst_transport` | `dst_transport` | when the event happened during an outbound request, this is the outbound request transport |
| `ND_DST_IP` | `dst_ip` | `dst_ip` | when the event happened during an outbound request, this is the IP of the request destination |
| `ND_DST_PORT` | `dst_port` | `dst_port` | when the event happened during an outbound request, this is the port of the request destination |
| `ND_DST_CAPABILITIES` | `dst_capabilities` | `dst_capabilities` | when the request goes to a parent, this is the communication capabilities of the parent |
| `ND_REQUEST_METHOD` | `req_method` | `req_method` | when the event happened during an inbound request, this is the method with which the request was received |
| `ND_RESPONSE_CODE` | `code` | `code` | when responding to a request, this is the response code |
| `ND_CONNECTION_ID` | `conn` | `conn` | when there is a connection id for an inbound connection, this is the connection id |
| `ND_TRANSACTION_ID` | `transaction` | `transaction` | the transaction id (UUID) of all API requests |
| `ND_RESPONSE_SENT_BYTES` | `sent_bytes` | `sent_bytes` | the bytes sent in API responses |
| `ND_RESPONSE_SIZE_BYTES` | `size_bytes` | `size_bytes` | the uncompressed bytes of the API responses |
| `ND_RESPONSE_PREP_TIME_USEC` | `prep_ut` | `prep_ut` | the time needed to prepare a response |
| `ND_RESPONSE_SENT_TIME_USEC` | `sent_ut` | `sent_ut` | the time needed to send a response |
| `ND_RESPONSE_TOTAL_TIME_USEC` | `total_ut` | `total_ut` | the total time needed to complete a response |
| `ND_ALERT_ID` | `alert_id` | `alert_id` | the alert id this event is related to |
| `ND_ALERT_EVENT_ID` | `alert_event_id` | `alert_event_id` | a sequential number of the alert transition (per host) |
| `ND_ALERT_UNIQUE_ID` | `alert_unique_id` | `alert_unique_id` | a sequential number of the alert transition (per alert) |
| `ND_ALERT_TRANSITION_ID` | `alert_transition_id` | `alert_transition_id` | the unique UUID of this alert transition |
| `ND_ALERT_CONFIG` | `alert_config` | `alert_config` | the alert configuration hash (UUID) |
| `ND_ALERT_NAME` | `alert` | `alert` | the alert name |
| `ND_ALERT_CLASS` | `alert_class` | `alert_class` | the alert classification |
| `ND_ALERT_COMPONENT` | `alert_component` | `alert_component` | the alert component |
| `ND_ALERT_TYPE` | `alert_type` | `alert_type` | the alert type |
| `ND_ALERT_EXEC` | `alert_exec` | `alert_exec` | the alert notification program |
| `ND_ALERT_RECIPIENT` | `alert_recipient` | `alert_recipient` | the alert recipient(s) |
| `ND_ALERT_VALUE` | `alert_value` | `alert_value` | the current alert value |
| `ND_ALERT_VALUE_OLD` | `alert_value_old` | `alert_value_old` | the previous alert value |
| `ND_ALERT_STATUS` | `alert_status` | `alert_status` | the current alert status |
| `ND_ALERT_STATUS_OLD` | `alert_status_old` | `alert_status_old` | the previous alert status |
| `ND_ALERT_UNITS` | `alert_units` | `alert_units` | the units of the alert |
| `ND_ALERT_SUMMARY` | `alert_summary` | `alert_summary` | the summary text of the alert |
| `ND_ALERT_INFO` | `alert_info` | `alert_info` | the info text of the alert |
| `ND_ALERT_DURATION` | `alert_duration` | `alert_duration` | the duration the alert was in its previous state |
| `ND_ALERT_NOTIFICATION_TIMESTAMP_USEC` | `alert_notification_timestamp` | `alert_notification_timestamp` | the timestamp the notification delivery is scheduled |
| `ND_REQUEST` | `request` | `request` | the full request during which the event happened |
| `MESSAGE` | `msg` | `msg` | the event message |
### Message IDs
Netdata assigns specific message IDs to certain events:
- `ed4cdb8f1beb4ad3b57cb3cae2d162fa` when a Netdata child connects to this Netdata
- `6e2e3839067648968b646045dbf28d66` when this Netdata connects to a Netdata parent
- `9ce0cb58ab8b44df82c4bf1ad9ee22de` when alerts change state
- `6db0018e83e34320ae2a659d78019fb7` when notifications are sent
You can view these events using the `MESSAGE_ID` filter of the Netdata systemd-journal.plugin,
or using `journalctl` like this:
```bash
# query children connection
journalctl MESSAGE_ID=ed4cdb8f1beb4ad3b57cb3cae2d162fa
# query parent connection
journalctl MESSAGE_ID=6e2e3839067648968b646045dbf28d66
# query alert transitions
journalctl MESSAGE_ID=9ce0cb58ab8b44df82c4bf1ad9ee22de
# query alert notifications
journalctl MESSAGE_ID=6db0018e83e34320ae2a659d78019fb7
```

libnetdata/log/journal.c

@ -0,0 +1,138 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "journal.h"
bool is_path_unix_socket(const char *path) {
if(!path || !*path)
return false;
struct stat statbuf;
// Use stat to check if the file exists and is a socket
if (stat(path, &statbuf) == -1)
// The file does not exist or cannot be accessed
return false;
// Check if the file is a socket
if (S_ISSOCK(statbuf.st_mode))
return true;
return false;
}
bool is_stderr_connected_to_journal(void) {
const char *journal_stream = getenv("JOURNAL_STREAM");
if (!journal_stream)
return false; // JOURNAL_STREAM is not set
struct stat stderr_stat;
if (fstat(STDERR_FILENO, &stderr_stat) < 0)
return false; // Error in getting stderr info
// Parse device and inode from JOURNAL_STREAM
char *endptr;
long journal_dev = strtol(journal_stream, &endptr, 10);
if (*endptr != ':')
return false; // Format error in JOURNAL_STREAM
long journal_ino = strtol(endptr + 1, NULL, 10);
return (stderr_stat.st_dev == (dev_t)journal_dev) && (stderr_stat.st_ino == (ino_t)journal_ino);
}
int journal_direct_fd(const char *path) {
if(!path || !*path)
path = JOURNAL_DIRECT_SOCKET;
if(!is_path_unix_socket(path))
return -1;
int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
if (fd < 0) return -1;
struct sockaddr_un addr;
memset(&addr, 0, sizeof(struct sockaddr_un));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
// Connect the socket (optional, but can simplify send operations)
if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
close(fd);
return -1;
}
return fd;
}
static inline bool journal_send_with_memfd(int fd, const char *msg, size_t msg_len) {
#if defined(__NR_memfd_create) && defined(MFD_ALLOW_SEALING) && defined(F_ADD_SEALS) && defined(F_SEAL_SHRINK) && defined(F_SEAL_GROW) && defined(F_SEAL_WRITE)
// Create a memory file descriptor
int memfd = (int)syscall(__NR_memfd_create, "journald", MFD_ALLOW_SEALING);
if (memfd < 0) return false;
// Write data to the memfd
if (write(memfd, msg, msg_len) != (ssize_t)msg_len) {
close(memfd);
return false;
}
// Seal the memfd to make it immutable
if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0) {
close(memfd);
return false;
}
struct iovec iov = {0};
struct msghdr msghdr = {0};
struct cmsghdr *cmsghdr;
char cmsgbuf[CMSG_SPACE(sizeof(int))];
msghdr.msg_iov = &iov;
msghdr.msg_iovlen = 1;
msghdr.msg_control = cmsgbuf;
msghdr.msg_controllen = sizeof(cmsgbuf);
cmsghdr = CMSG_FIRSTHDR(&msghdr);
cmsghdr->cmsg_level = SOL_SOCKET;
cmsghdr->cmsg_type = SCM_RIGHTS;
cmsghdr->cmsg_len = CMSG_LEN(sizeof(int));
memcpy(CMSG_DATA(cmsghdr), &memfd, sizeof(int));
ssize_t r = sendmsg(fd, &msghdr, 0);
close(memfd);
return r >= 0;
#else
return false;
#endif
}
bool journal_direct_send(int fd, const char *msg, size_t msg_len) {
// Send the datagram
if (send(fd, msg, msg_len, 0) < 0) {
if(errno != EMSGSIZE)
return false;
// datagram is too large, fallback to memfd
if(!journal_send_with_memfd(fd, msg, msg_len))
return false;
}
return true;
}
void journal_construct_path(char *dst, size_t dst_len, const char *host_prefix, const char *namespace_str) {
if(!host_prefix)
host_prefix = "";
if(namespace_str)
snprintfz(dst, dst_len, "%s/run/systemd/journal.%s/socket",
host_prefix, namespace_str);
else
snprintfz(dst, dst_len, "%s" JOURNAL_DIRECT_SOCKET,
host_prefix);
}
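A minimal usage sketch (not part of the patch): push one structured entry to systemd-journald; fields in the journald native protocol are newline-separated KEY=VALUE pairs:

```c
char path[FILENAME_MAX + 1];
journal_construct_path(path, sizeof(path), netdata_configured_host_prefix, NULL);

int fd = journal_direct_fd(path);
if(fd != -1) {
    const char *msg = "MESSAGE=hello from netdata\nPRIORITY=6\nSYSLOG_IDENTIFIER=example\n";
    if(!journal_direct_send(fd, msg, strlen(msg)))
        netdata_log_error("cannot send entry to systemd-journald");
    close(fd);
}
```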

libnetdata/log/journal.h

@ -0,0 +1,18 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
#ifndef NETDATA_LOG_JOURNAL_H
#define NETDATA_LOG_JOURNAL_H
#define JOURNAL_DIRECT_SOCKET "/run/systemd/journal/socket"
void journal_construct_path(char *dst, size_t dst_len, const char *host_prefix, const char *namespace_str);
int journal_direct_fd(const char *path);
bool journal_direct_send(int fd, const char *msg, size_t msg_len);
bool is_path_unix_socket(const char *path);
bool is_stderr_connected_to_journal(void);
#endif //NETDATA_LOG_JOURNAL_H

File diff suppressed because it is too large.


@ -9,6 +9,181 @@ extern "C" {
#include "../libnetdata.h"
#define ND_LOG_DEFAULT_THROTTLE_LOGS 1200
#define ND_LOG_DEFAULT_THROTTLE_PERIOD 3600
typedef enum __attribute__((__packed__)) {
NDLS_UNSET = 0, // internal use only
NDLS_ACCESS, // access.log
NDLS_ACLK, // aclk.log
NDLS_COLLECTORS, // collectors.log
NDLS_DAEMON, // error.log
NDLS_HEALTH, // health.log
NDLS_DEBUG, // debug.log
// terminator
_NDLS_MAX,
} ND_LOG_SOURCES;
typedef enum __attribute__((__packed__)) {
NDLP_EMERG = LOG_EMERG,
NDLP_ALERT = LOG_ALERT,
NDLP_CRIT = LOG_CRIT,
NDLP_ERR = LOG_ERR,
NDLP_WARNING = LOG_WARNING,
NDLP_NOTICE = LOG_NOTICE,
NDLP_INFO = LOG_INFO,
NDLP_DEBUG = LOG_DEBUG,
} ND_LOG_FIELD_PRIORITY;
typedef enum __attribute__((__packed__)) {
// KEEP THESE IN THE SAME ORDER AS in thread_log_fields (log.c)
// so that it is easy to audit for missing fields
NDF_STOP = 0,
NDF_TIMESTAMP_REALTIME_USEC, // the timestamp of the log message - added automatically
NDF_SYSLOG_IDENTIFIER, // the syslog identifier of the application - added automatically
NDF_LOG_SOURCE, // DAEMON, COLLECTORS, HEALTH, ACCESS, ACLK - set at the log call
NDF_PRIORITY, // the syslog priority (severity) - set at the log call
NDF_ERRNO, // the ERRNO at the time of the log call - added automatically
NDF_INVOCATION_ID, // the INVOCATION_ID of Netdata - added automatically
NDF_LINE, // the source code file line number - added automatically
NDF_FILE, // the source code filename - added automatically
NDF_FUNC, // the source code function - added automatically
NDF_TID, // the thread ID of the thread logging - added automatically
NDF_THREAD_TAG, // the thread tag of the thread logging - added automatically
NDF_MESSAGE_ID, // for specific events
NDF_MODULE, // for internal plugin module, all other get the NDF_THREAD_TAG
NDF_NIDL_NODE, // the node / rrdhost currently being worked
NDF_NIDL_INSTANCE, // the instance / rrdset currently being worked
NDF_NIDL_CONTEXT, // the context of the instance currently being worked
NDF_NIDL_DIMENSION, // the dimension / rrddim currently being worked
// web server, aclk and stream receiver
NDF_SRC_TRANSPORT, // the transport we received the request, one of: http, https, pluginsd
// web server and stream receiver
NDF_SRC_IP, // the streaming / web server source IP
NDF_SRC_PORT, // the streaming / web server source Port
NDF_SRC_CAPABILITIES, // the stream receiver capabilities
// stream sender (established links)
NDF_DST_TRANSPORT, // the transport we send the request, one of: http, https
NDF_DST_IP, // the destination streaming IP
NDF_DST_PORT, // the destination streaming Port
NDF_DST_CAPABILITIES, // the destination streaming capabilities
// web server, aclk and stream receiver
NDF_REQUEST_METHOD, // for http like requests, the http request method
NDF_RESPONSE_CODE, // for http like requests, the http response code, otherwise a status string
// web server (all), aclk (queries)
NDF_CONNECTION_ID, // the web server connection ID
NDF_TRANSACTION_ID, // the web server and API transaction ID
NDF_RESPONSE_SENT_BYTES, // for http like requests, the response bytes
NDF_RESPONSE_SIZE_BYTES, // for http like requests, the uncompressed response size
NDF_RESPONSE_PREPARATION_TIME_USEC, // for http like requests, the preparation time
NDF_RESPONSE_SENT_TIME_USEC, // for http like requests, the time to send the response back
NDF_RESPONSE_TOTAL_TIME_USEC, // for http like requests, the total time to complete the response
// health alerts
NDF_ALERT_ID,
NDF_ALERT_UNIQUE_ID,
NDF_ALERT_EVENT_ID,
NDF_ALERT_TRANSITION_ID,
NDF_ALERT_CONFIG_HASH,
NDF_ALERT_NAME,
NDF_ALERT_CLASS,
NDF_ALERT_COMPONENT,
NDF_ALERT_TYPE,
NDF_ALERT_EXEC,
NDF_ALERT_RECIPIENT,
NDF_ALERT_DURATION,
NDF_ALERT_VALUE,
NDF_ALERT_VALUE_OLD,
NDF_ALERT_STATUS,
NDF_ALERT_STATUS_OLD,
NDF_ALERT_SOURCE,
NDF_ALERT_UNITS,
NDF_ALERT_SUMMARY,
NDF_ALERT_INFO,
NDF_ALERT_NOTIFICATION_REALTIME_USEC,
// NDF_ALERT_FLAGS,
// put new items here
// leave the request URL and the message last
NDF_REQUEST, // the request we are currently working on
NDF_MESSAGE, // the log message, if any
// terminator
_NDF_MAX,
} ND_LOG_FIELD_ID;
typedef enum __attribute__((__packed__)) {
NDFT_UNSET = 0,
NDFT_TXT,
NDFT_STR,
NDFT_BFR,
NDFT_U64,
NDFT_I64,
NDFT_DBL,
NDFT_UUID,
NDFT_CALLBACK,
} ND_LOG_STACK_FIELD_TYPE;
void nd_log_set_user_settings(ND_LOG_SOURCES source, const char *setting);
void nd_log_set_facility(const char *facility);
void nd_log_set_priority_level(const char *setting);
void nd_log_initialize(void);
void nd_log_reopen_log_files(void);
void chown_open_file(int fd, uid_t uid, gid_t gid);
void nd_log_chown_log_files(uid_t uid, gid_t gid);
void nd_log_set_flood_protection(size_t logs, time_t period);
void nd_log_initialize_for_external_plugins(const char *name);
void nd_log_set_thread_source(ND_LOG_SOURCES source);
bool nd_log_journal_socket_available(void);
ND_LOG_FIELD_ID nd_log_field_id_by_name(const char *field, size_t len);
int nd_log_priority2id(const char *priority);
typedef bool (*log_formatter_callback_t)(BUFFER *wb, void *data);
struct log_stack_entry {
ND_LOG_FIELD_ID id;
ND_LOG_STACK_FIELD_TYPE type;
bool set;
union {
const char *txt;
struct netdata_string *str;
BUFFER *bfr;
uint64_t u64;
int64_t i64;
double dbl;
const uuid_t *uuid;
struct {
log_formatter_callback_t formatter;
void *formatter_data;
} cb;
};
};
#define ND_LOG_STACK _cleanup_(log_stack_pop) struct log_stack_entry
#define ND_LOG_STACK_PUSH(lgs) log_stack_push(lgs)
#define ND_LOG_FIELD_TXT(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_TXT, .txt = (value), .set = true, }
#define ND_LOG_FIELD_STR(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_STR, .str = (value), .set = true, }
#define ND_LOG_FIELD_BFR(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_BFR, .bfr = (value), .set = true, }
#define ND_LOG_FIELD_U64(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_U64, .u64 = (value), .set = true, }
#define ND_LOG_FIELD_I64(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_I64, .i64 = (value), .set = true, }
#define ND_LOG_FIELD_DBL(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_DBL, .dbl = (value), .set = true, }
#define ND_LOG_FIELD_CB(field, func, data) (struct log_stack_entry){ .id = (field), .type = NDFT_CALLBACK, .cb = { .formatter = (func), .formatter_data = (data) }, .set = true, }
#define ND_LOG_FIELD_UUID(field, value) (struct log_stack_entry){ .id = (field), .type = NDFT_UUID, .uuid = (value), .set = true, }
#define ND_LOG_FIELD_END() (struct log_stack_entry){ .id = NDF_STOP, .type = NDFT_UNSET, .set = false, }
void log_stack_pop(void *ptr);
void log_stack_push(struct log_stack_entry *lgs);
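A minimal sketch of the intended usage (the same pattern as the functions worker shown earlier); `serve()` is hypothetical:

```c
void serve(const char *request) {
    // annotate every log emitted in this scope with the request being served;
    // the fields pop automatically when lgs goes out of scope (_cleanup_)
    ND_LOG_STACK lgs[] = {
        ND_LOG_FIELD_TXT(NDF_REQUEST, request),
        ND_LOG_FIELD_END(),
    };
    ND_LOG_STACK_PUSH(lgs);

    nd_log(NDLS_DAEMON, NDLP_INFO, "serving request"); // carries NDF_REQUEST
}
```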
#define D_WEB_BUFFER 0x0000000000000001
#define D_WEB_CLIENT 0x0000000000000002
#define D_LISTENER 0x0000000000000004
@ -46,114 +221,75 @@ extern "C" {
#define D_REPLICATION 0x0000002000000000
#define D_SYSTEM 0x8000000000000000
extern int web_server_is_multithreaded;
extern uint64_t debug_flags;
extern const char *program_name;
extern int stdaccess_fd;
extern FILE *stdaccess;
extern int stdhealth_fd;
extern FILE *stdhealth;
extern int stdcollector_fd;
extern FILE *stderror;
extern const char *stdaccess_filename;
extern const char *stderr_filename;
extern const char *stdout_filename;
extern const char *stdhealth_filename;
extern const char *stdcollector_filename;
extern const char *facility_log;
#ifdef ENABLE_ACLK
extern const char *aclklog_filename;
extern int aclklog_fd;
extern FILE *aclklog;
extern int aclklog_enabled;
#endif
extern int access_log_syslog;
extern int error_log_syslog;
extern int output_log_syslog;
extern int health_log_syslog;
extern time_t error_log_throttle_period;
extern unsigned long error_log_errors_per_period, error_log_errors_per_period_backup;
int error_log_limit(int reset);
void open_all_log_files();
void reopen_all_log_files();
#define LOG_DATE_LENGTH 26
void log_date(char *buffer, size_t len, time_t now);
static inline void debug_dummy(void) {}
void error_log_limit_reset(void);
void error_log_limit_unlimited(void);
void nd_log_limits_reset(void);
void nd_log_limits_unlimited(void);
typedef struct error_with_limit {
time_t log_every;
size_t count;
time_t last_logged;
usec_t sleep_ut;
} ERROR_LIMIT;
typedef enum netdata_log_level {
NETDATA_LOG_LEVEL_ERROR,
NETDATA_LOG_LEVEL_INFO,
NETDATA_LOG_LEVEL_END
} netdata_log_level_t;
#define NETDATA_LOG_LEVEL_INFO_STR "info"
#define NETDATA_LOG_LEVEL_ERROR_STR "error"
#define NETDATA_LOG_LEVEL_ERROR_SHORT_STR "err"
extern netdata_log_level_t global_log_severity_level;
netdata_log_level_t log_severity_string_to_severity_level(char *level);
char *log_severity_level_to_severity_string(netdata_log_level_t level);
void log_set_global_severity_level(netdata_log_level_t value);
void log_set_global_severity_for_external_plugins();
#define error_limit_static_global_var(var, log_every_secs, sleep_usecs) static ERROR_LIMIT var = { .last_logged = 0, .count = 0, .log_every = (log_every_secs), .sleep_ut = (sleep_usecs) }
#define error_limit_static_thread_var(var, log_every_secs, sleep_usecs) static __thread ERROR_LIMIT var = { .last_logged = 0, .count = 0, .log_every = (log_every_secs), .sleep_ut = (sleep_usecs) }
#define NDLP_INFO_STR "info"
#ifdef NETDATA_INTERNAL_CHECKS
#define netdata_log_debug(type, args...) do { if(unlikely(debug_flags & type)) debug_int(__FILE__, __FUNCTION__, __LINE__, ##args); } while(0)
#define internal_error(condition, args...) do { if(unlikely(condition)) error_int(0, "IERR", __FILE__, __FUNCTION__, __LINE__, ##args); } while(0)
#define internal_fatal(condition, args...) do { if(unlikely(condition)) fatal_int(__FILE__, __FUNCTION__, __LINE__, ##args); } while(0)
#define netdata_log_debug(type, args...) do { if(unlikely(debug_flags & type)) netdata_logger(NDLS_DEBUG, NDLP_DEBUG, __FILE__, __FUNCTION__, __LINE__, ##args); } while(0)
#define internal_error(condition, args...) do { if(unlikely(condition)) netdata_logger(NDLS_DAEMON, NDLP_DEBUG, __FILE__, __FUNCTION__, __LINE__, ##args); } while(0)
#define internal_fatal(condition, args...) do { if(unlikely(condition)) netdata_logger_fatal(__FILE__, __FUNCTION__, __LINE__, ##args); } while(0)
#else
#define netdata_log_debug(type, args...) debug_dummy()
#define internal_error(args...) debug_dummy()
#define internal_fatal(args...) debug_dummy()
#endif
#define netdata_log_info(args...) info_int(0, __FILE__, __FUNCTION__, __LINE__, ##args)
#define collector_info(args...) info_int(1, __FILE__, __FUNCTION__, __LINE__, ##args)
#define infoerr(args...) error_int(0, "INFO", __FILE__, __FUNCTION__, __LINE__, ##args)
#define netdata_log_error(args...) error_int(0, "ERROR", __FILE__, __FUNCTION__, __LINE__, ##args)
#define collector_infoerr(args...) error_int(1, "INFO", __FILE__, __FUNCTION__, __LINE__, ##args)
#define collector_error(args...) error_int(1, "ERROR", __FILE__, __FUNCTION__, __LINE__, ##args)
#define error_limit(erl, args...) error_limit_int(erl, "ERROR", __FILE__, __FUNCTION__, __LINE__, ##args)
#define fatal(args...) fatal_int(__FILE__, __FUNCTION__, __LINE__, ##args)
#define fatal_assert(expr) ((expr) ? (void)(0) : fatal_int(__FILE__, __FUNCTION__, __LINE__, "Assertion `%s' failed", #expr))
#define fatal(args...) netdata_logger_fatal(__FILE__, __FUNCTION__, __LINE__, ##args)
#define fatal_assert(expr) ((expr) ? (void)(0) : netdata_logger_fatal(__FILE__, __FUNCTION__, __LINE__, "Assertion `%s' failed", #expr))
// ----------------------------------------------------------------------------
// normal logging
void netdata_logger(ND_LOG_SOURCES source, ND_LOG_FIELD_PRIORITY priority, const char *file, const char *function, unsigned long line, const char *fmt, ... ) PRINTFLIKE(6, 7);
#define nd_log(NDLS, NDLP, args...) netdata_logger(NDLS, NDLP, __FILE__, __FUNCTION__, __LINE__, ##args)
#define nd_log_daemon(NDLP, args...) netdata_logger(NDLS_DAEMON, NDLP, __FILE__, __FUNCTION__, __LINE__, ##args)
#define nd_log_collector(NDLP, args...) netdata_logger(NDLS_COLLECTORS, NDLP, __FILE__, __FUNCTION__, __LINE__, ##args)
#define netdata_log_info(args...) netdata_logger(NDLS_DAEMON, NDLP_INFO, __FILE__, __FUNCTION__, __LINE__, ##args)
#define netdata_log_error(args...) netdata_logger(NDLS_DAEMON, NDLP_ERR, __FILE__, __FUNCTION__, __LINE__, ##args)
#define collector_info(args...) netdata_logger(NDLS_COLLECTORS, NDLP_INFO, __FILE__, __FUNCTION__, __LINE__, ##args)
#define collector_error(args...) netdata_logger(NDLS_COLLECTORS, NDLP_ERR, __FILE__, __FUNCTION__, __LINE__, ##args)
#define log_aclk_message_bin(__data, __data_len, __tx, __mqtt_topic, __message_name) \
nd_log(NDLS_ACLK, NDLP_INFO, \
"direction:%s message:'%s' topic:'%s' json:'%.*s'", \
(__tx) ? "OUTGOING" : "INCOMING", __message_name, __mqtt_topic, (int)(__data_len), __data)
// ----------------------------------------------------------------------------
// logging with limits
typedef struct error_with_limit {
SPINLOCK spinlock;
time_t log_every;
size_t count;
time_t last_logged;
usec_t sleep_ut;
} ERROR_LIMIT;
#define nd_log_limit_static_global_var(var, log_every_secs, sleep_usecs) static ERROR_LIMIT var = { .last_logged = 0, .count = 0, .log_every = (log_every_secs), .sleep_ut = (sleep_usecs) }
#define nd_log_limit_static_thread_var(var, log_every_secs, sleep_usecs) static __thread ERROR_LIMIT var = { .last_logged = 0, .count = 0, .log_every = (log_every_secs), .sleep_ut = (sleep_usecs) }
void netdata_logger_with_limit(ERROR_LIMIT *erl, ND_LOG_SOURCES source, ND_LOG_FIELD_PRIORITY priority, const char *file, const char *function, unsigned long line, const char *fmt, ... ) PRINTFLIKE(7, 8);
#define nd_log_limit(erl, NDLS, NDLP, args...) netdata_logger_with_limit(erl, NDLS, NDLP, __FILE__, __FUNCTION__, __LINE__, ##args)
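// Example usage (a sketch, not part of this header): rate-limited logging
// from daemon code, at most once per second:
//
//     nd_log_limit_static_thread_var(erl, 1, 0);
//     nd_log_limit(&erl, NDLS_DAEMON, NDLP_ERR,
//                  "operation on '%s' failed", name);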
// ----------------------------------------------------------------------------
void send_statistics(const char *action, const char *action_result, const char *action_data);
void debug_int( const char *file, const char *function, const unsigned long line, const char *fmt, ... ) PRINTFLIKE(4, 5);
void info_int( int is_collector, const char *file, const char *function, const unsigned long line, const char *fmt, ... ) PRINTFLIKE(5, 6);
void error_int( int is_collector, const char *prefix, const char *file, const char *function, const unsigned long line, const char *fmt, ... ) PRINTFLIKE(6, 7);
void error_limit_int(ERROR_LIMIT *erl, const char *prefix, const char *file __maybe_unused, const char *function __maybe_unused, unsigned long line __maybe_unused, const char *fmt, ... ) PRINTFLIKE(6, 7);;
void fatal_int( const char *file, const char *function, const unsigned long line, const char *fmt, ... ) NORETURN PRINTFLIKE(4, 5);
void netdata_log_access( const char *fmt, ... ) PRINTFLIKE(1, 2);
void netdata_log_health( const char *fmt, ... ) PRINTFLIKE(1, 2);
#ifdef ENABLE_ACLK
void log_aclk_message_bin( const char *data, const size_t data_len, int tx, const char *mqtt_topic, const char *message_name);
#endif
void netdata_logger_fatal( const char *file, const char *function, unsigned long line, const char *fmt, ... ) NORETURN PRINTFLIKE(4, 5);
# ifdef __cplusplus
}

libnetdata/log/log2journal.c (new file, 1015 lines; diff suppressed because it is too large)

@ -0,0 +1,518 @@
# log2journal
`log2journal` and `systemd-cat-native` can be used to convert structured log files, such as the ones generated by web servers, into `systemd-journal` entries.
By combining these tools with the usual UNIX shell utilities, you can create advanced log processing pipelines that send any kind of structured text log to systemd-journald. This is a simple, yet powerful and efficient, way to handle log processing.
The process involves the usual piping of shell commands, to fetch and process the log files in real time.
The overall process looks like this:
```bash
tail -F /var/log/nginx/*.log |\ # outputs log lines
log2journal 'PATTERN' |\ # outputs Journal Export Format
sed -u -e SEARCH-REPLACE-RULES |\ # optional rewriting rules
systemd-cat-native # send to local/remote journald
```
Let's see the steps:
1. `tail -F /var/log/nginx/*.log`<br/>this command will tail all `*.log` files in `/var/log/nginx/`. We use `-F` instead of `-f` to ensure that files will still be tailed after log rotation.
2. `log2journal` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern it accepts. It can also apply some basic operations on the fields, like injecting new fields or duplicating existing ones. The output of `log2journal` is in the systemd Journal Export Format, and it looks like this:
```bash
KEY1=VALUE1 # << start of the first log line
KEY2=VALUE2
# << log lines separator
KEY1=VALUE1 # << start of the second log line
KEY2=VALUE2
```
3. `sed` is an optional step, shown here as an example. Any kind of processing can be applied at this stage, in case we want to alter the fields in some way. For example, we may want to set the PRIORITY field of systemd-journal so that Netdata dashboards and `journalctl` color internal server errors. Or we may want to anonymize the logs, removing sensitive information from them. Or we may even want to remove the variable parts of the requests, to make them uniform. We will see below how such processing can be done.
4. `systemd-cat-native` is a Netdata program. It can send the logs to a local `systemd-journald` (journal namespaces are supported), or to a remote `systemd-journal-remote`.
## Real-life example
We have an nginx server logging in this format:
```bash
log_format access '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'$request_length $request_time '
'"$http_referer" "$http_user_agent"';
```
First, let's find the right pattern for `log2journal`. We ask ChatGPT:
```
My nginx log uses this log format:
log_format access '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'$request_length $request_time '
'"$http_referer" "$http_user_agent"';
I want to use `log2journal` to convert this log for systemd-journal.
`log2journal` accepts a PCRE2 regular expression, using the named groups
in the pattern as the journal fields to extract from the logs.
Prefix all PCRE2 group names with `NGINX_` and use capital characters only.
For the $request, use the field `MESSAGE` (without NGINX_ prefix), so that
it will appear in systemd journals as the message of the log.
Please give me the PCRE2 pattern.
```
ChatGPT replies with this:
```regexp
^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"
```
Let's test it with a sample line (instead of `tail`):
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"'
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
```
As you can see, it extracted all the fields.
The `MESSAGE`, however, contains 3 fields by itself: the method, the URL and the protocol version. Let's ask ChatGPT to extract these too:
```
I see that the MESSAGE has 3 key items in it. The request method (GET, POST,
etc), the URL and HTTP protocol version.
I want to keep the MESSAGE as it is, with all the information in it, but also
extract the 3 items from it as separate fields.
Can this be done?
```
ChatGPT responded with this:
```regexp
^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"
```
Let's test this too:
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"'
MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< MESSAGE
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1 # <<<<<<<<< VERSION
NGINX_METHOD=GET # <<<<<<<<< METHOD
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html # <<<<<<<<< URL
```
Ideally, we would want the 5xx errors to be red in our `journalctl` output. To achieve that we need to add a PRIORITY field to set the log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these:
```c
#define LOG_EMERG 0 /* system is unusable */
#define LOG_ALERT 1 /* action must be taken immediately */
#define LOG_CRIT 2 /* critical conditions */
#define LOG_ERR 3 /* error conditions */
#define LOG_WARNING 4 /* warning conditions */
#define LOG_NOTICE 5 /* normal but significant condition */
#define LOG_INFO 6 /* informational */
#define LOG_DEBUG 7 /* debug-level messages */
```
Avoid setting the priority to 0 (`LOG_EMERG`), because such messages will also appear on your terminals (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red in `journalctl`) or 4 (yellow in `journalctl`).
To set the PRIORITY field in the output, we can use the `NGINX_STATUS` field. We need a copy of it, which we will rewrite later.
We can instruct `log2journal` to duplicate `NGINX_STATUS`, like this: `log2journal --duplicate=STATUS2PRIORITY=NGINX_STATUS`. Let's try it:
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=STATUS2PRIORITY=NGINX_STATUS
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1
NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
STATUS2PRIORITY=200 # <<<<<<<<< STATUS2PRIORITY IS HERE
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html
```
Now that we have the `STATUS2PRIORITY` field equal to the `NGINX_STATUS`, we can use a `sed` command to change it to the `PRIORITY` field we want. The `sed` command could be:
```bash
sed -u -e 's|STATUS2PRIORITY=5.*|PRIORITY=3|' -e 's|STATUS2PRIORITY=.*|PRIORITY=6|'
```
We use `-u` for unbuffered communication.
This command first changes all 5xx `STATUS2PRIORITY` fields to `PRIORITY=3` (error) and then changes all the rest to `PRIORITY=6` (info). Let's see the whole pipeline:
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=STATUS2PRIORITY=NGINX_STATUS | sed -u -e 's|STATUS2PRIORITY=5.*|PRIORITY=3|' -e 's|STATUS2PRIORITY=.*|PRIORITY=6|'
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1
NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
PRIORITY=6 # <<<<<<<<< PRIORITY IS HERE
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html
```
Similarly, we could duplicate `NGINX_URL` to `NGINX_ENDPOINT` and then process it with `sed` to remove any query string, or replace IDs in the URL path with constant names, giving us uniform endpoints independent of their parameters. A sketch of such a rule is shown below.
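This hypothetical `sed` rule (assuming `log2journal` was also given `--duplicate=NGINX_ENDPOINT=NGINX_URL`, which the commands above do not include) keeps only the path part of the duplicated field:
```bash
# drop everything after '?' in NGINX_ENDPOINT, keeping only the path
sed -u -e 's|^NGINX_ENDPOINT=\([^?]*\)?.*$|NGINX_ENDPOINT=\1|'
```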
To complete the example, we can also inject a `SYSLOG_IDENTIFIER` with `log2journal`, using `--inject=SYSLOG_IDENTIFIER=nginx`, like this:
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=STATUS2PRIORITY=NGINX_STATUS --inject=SYSLOG_IDENTIFIER=nginx | sed -u -e 's|STATUS2PRIORITY=5.*|PRIORITY=3|' -e 's|STATUS2PRIORITY=.*|PRIORITY=6|'
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1
NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
PRIORITY=6
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html
SYSLOG_IDENTIFIER=nginx # <<<<<<<<< THIS HAS BEEN ADDED
```
Now the message is ready to be sent to systemd-journal. For this we use `systemd-cat-native`. This command can send such messages to a journal running on localhost, to a local journal namespace, or to a `systemd-journal-remote` running on another server. By just appending `| systemd-cat-native` to the command, the message will be sent to the local journal.
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=STATUS2PRIORITY=NGINX_STATUS --inject=SYSLOG_IDENTIFIER=nginx | sed -u -e 's|STATUS2PRIORITY=5.*|PRIORITY=3|' -e 's|STATUS2PRIORITY=.*|PRIORITY=6|' | systemd-cat-native
# no output
# let's find the message
# journalctl -o verbose SYSLOG_IDENTIFIER=nginx
Sun 2023-11-19 04:34:06.583912 EET [s=1eb59e7934984104ab3b61f5d9648057;i=115b6d4;b=7282d89d2e6e4299969a6030302ff3e4;m=69b419673;t=60a783417ac72;x=2cec5dde8bf01ee7]
PRIORITY=6
_UID=0
_GID=0
_BOOT_ID=7282d89d2e6e4299969a6030302ff3e4
_MACHINE_ID=6b72c55db4f9411dbbb80b70537bf3a8
_HOSTNAME=costa-xps9500
_RUNTIME_SCOPE=system
_TRANSPORT=journal
_CAP_EFFECTIVE=1ffffffffff
_AUDIT_LOGINUID=1000
_AUDIT_SESSION=1
_SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope
_SYSTEMD_OWNER_UID=1000
_SYSTEMD_UNIT=user@1000.service
_SYSTEMD_USER_UNIT=vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope
_SYSTEMD_SLICE=user-1000.slice
_SYSTEMD_USER_SLICE=app-org.gnome.Terminal.slice
_SYSTEMD_INVOCATION_ID=6195d8c4c6654481ac9a30e9a8622ba1
_COMM=systemd-cat-nat
MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< CHECK
NGINX_BODY_BYTES_SENT=4172 # <<<<<<<<< CHECK
NGINX_HTTP_REFERER=- # <<<<<<<<< CHECK
NGINX_HTTP_USER_AGENT=Go-http-client/1.1 # <<<<<<<<< CHECK
NGINX_HTTP_VERSION=1.1 # <<<<<<<<< CHECK
NGINX_METHOD=GET # <<<<<<<<< CHECK
NGINX_REMOTE_ADDR=1.2.3.4 # <<<<<<<<< CHECK
NGINX_REMOTE_USER=- # <<<<<<<<< CHECK
NGINX_REQUEST_LENGTH=104 # <<<<<<<<< CHECK
NGINX_REQUEST_TIME=0.001 # <<<<<<<<< CHECK
NGINX_STATUS=200 # <<<<<<<<< CHECK
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 # <<<<<<<<< CHECK
NGINX_URL=/index.html # <<<<<<<<< CHECK
SYSLOG_IDENTIFIER=nginx # <<<<<<<<< CHECK
_PID=354312
_SOURCE_REALTIME_TIMESTAMP=1700361246583912
```
So, the log line, with all its fields parsed, ended up in systemd-journal.
The complete example would look like the following script.
Running this script with the parameter `test` will produce output on the terminal for you to inspect.
Unmatched log entries are added to the journal with PRIORITY=1 (`LOG_ALERT`), so that you can spot them.
We also used the `--filename-key` option of `log2journal`, which detects the filename when `tail` switches
between files, and adds the field `NGINX_LOG_FILE` with the file each log line comes from.
Finally, the script also adds the field `NGINX_STATUS_FAMILY`, taking values `2xx`, `3xx`, etc., so that
it is easy to find all the logs of a specific status family.
```bash
#!/usr/bin/env bash
test=0
last=0
send_or_show='./systemd-cat-native'
[ "${1}" = "test" ] && test=1 && last=100 && send_or_show=cat
pattern='(?x) # Enable PCRE2 extended mode
^
(?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
(?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
\[
(?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
\]
\s+ "
(?<MESSAGE> # MESSAGE
(?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
(?<NGINX_URL>[^ ]+) \s+ # NGINX_URL
HTTP/(?<NGINX_HTTP_VERSION>[^"]+) # NGINX_HTTP_VERSION
)
" \s+
(?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
(?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
(?<NGINX_REQUEST_LENGTH>\d+) \s+ # NGINX_REQUEST_LENGTH
(?<NGINX_REQUEST_TIME>[\d.]+) \s+ # NGINX_REQUEST_TIME
"(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
"(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
'
tail -n $last -F /var/log/nginx/*access.log |\
log2journal "${pattern}" \
--filename-key=NGINX_LOG_FILE \
--duplicate=STATUS2PRIORITY=NGINX_STATUS \
--duplicate=STATUS_FAMILY=NGINX_STATUS \
--inject=SYSLOG_IDENTIFIER=nginx \
--unmatched-key=MESSAGE \
--inject-unmatched=PRIORITY=1 \
| sed -u \
-e 's|^STATUS2PRIORITY=5.*$|PRIORITY=3|' \
-e 's|^STATUS2PRIORITY=.*$|PRIORITY=6|' \
-e 's|^STATUS_FAMILY=\([0-9]\).*$|NGINX_STATUS_FAMILY=\1xx|' \
-e 's|^STATUS_FAMILY=.*$|NGINX_STATUS_FAMILY=UNKNOWN|' \
| $send_or_show
```
## `log2journal` options
```
Netdata log2journal v1.43.0-337-g116dc1bc3
Convert structured log input to systemd Journal Export Format.
Using PCRE2 patterns, extract the fields from structured logs on the standard
input, and generate output according to systemd Journal Export Format
Usage: ./log2journal [OPTIONS] PATTERN
Options:
--filename-key=KEY
Add a field with KEY as the key and the current filename as value.
Automatically detects filenames when piped after 'tail -F',
and tail matches multiple filenames.
To inject the filename when tailing a single file, use --inject.
--unmatched-key=KEY
Include unmatched log entries in the output with KEY as the field name.
Use this to include unmatched entries to the output stream.
Usually it should be set to --unmatched-key=MESSAGE so that the
unmatched entry will appear as the log message in the journals.
Use --inject-unmatched to inject additional fields to unmatched lines.
--duplicate=TARGET=KEY1[,KEY2[,KEY3[,...]]
Create a new key called TARGET, duplicating the values of the keys
given. Useful for further processing. When multiple keys are given,
their values are separated by comma.
Up to 2048 duplications can be given on the command line, and up to
10 keys per duplication command are allowed.
--inject=LINE
Inject constant fields to the output (both matched and unmatched logs).
--inject entries are added to unmatched lines too, when their key is
not used in --inject-unmatched (--inject-unmatched override --inject).
Up to 2048 fields can be injected.
--inject-unmatched=LINE
Inject lines into the output for each unmatched log entry.
Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched
lines as errors, so that they can easily be spotted in the journals.
Up to 2048 such lines can be injected.
-h, --help
Display this help and exit.
PATTERN
PATTERN should be a valid PCRE2 regular expression.
RE2 regular expressions (like the ones usually used in Go applications),
are usually valid PCRE2 patterns too.
Regular expressions without named groups are ignored.
The maximum line length accepted is 1048576 characters.
The maximum number of fields in the PCRE2 pattern is 8192.
JOURNAL FIELDS RULES (enforced by systemd-journald)
- field names can be up to 64 characters
- the only allowed field characters are A-Z, 0-9 and underscore
- the first character of fields cannot be a digit
- protected journal fields start with underscore:
* they are accepted by systemd-journal-remote
* they are NOT accepted by a local systemd-journald
For best results, always include these fields:
MESSAGE=TEXT
The MESSAGE is the body of the log entry.
This field is what we usually see in our logs.
PRIORITY=NUMBER
PRIORITY sets the severity of the log entry.
0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
- Emergency events (0) are usually broadcast to all terminals.
- Emergency, alert, critical, and error (0-3) are usually colored red.
- Warning (4) entries are usually colored yellow.
- Notice (5) entries are usually bold or have a brighter white color.
- Info (6) entries are the default.
- Debug (7) entries are usually grayed or dimmed.
SYSLOG_IDENTIFIER=NAME
SYSLOG_IDENTIFIER sets the name of application.
Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs
You can find the most common fields at 'man systemd.journal-fields'.
```
## `systemd-cat-native` options
```
Netdata systemd-cat-native v1.43.0-319-g4ada93a6e
This program reads from its standard input, lines in the format:
KEY1=VALUE1\n
KEY2=VALUE2\n
KEYN=VALUEN\n
\n
and sends them to systemd-journal.
- Binary journal fields are not accepted at its input
- Binary journal fields can be generated after newline processing
- Messages have to be separated by an empty line
- Keys starting with underscore are not accepted (by journald)
- Other rules imposed by systemd-journald are imposed (by journald)
Usage:
./systemd-cat-native
[--newline=STRING]
[--log-as-netdata|-N]
[--namespace=NAMESPACE] [--socket=PATH]
[--url=URL [--key=FILENAME] [--cert=FILENAME] [--trust=FILENAME|all]]
The program has the following modes of logging:
* Log to a local systemd-journald or stderr
This is the default mode. If systemd-journald is available, logs will be
sent to systemd, otherwise logs will be printed on stderr, using logfmt
formatting. Options --socket and --namespace are available to configure
the journal destination:
--socket=PATH
The path of a systemd-journald UNIX socket.
The program will use the default systemd-journald socket when this
option is not used.
--namespace=NAMESPACE
The name of a configured and running systemd-journald namespace.
The program will produce the socket path based on its internal
defaults, to send the messages to the systemd journal namespace.
* Log as Netdata, enabled with --log-as-netdata or -N
In this mode the program uses environment variables set by Netdata for
the log destination. Only log fields defined by Netdata are accepted.
If the environment variables expected by Netdata are not found, it
falls back to stderr logging in logfmt format.
* Log to a systemd-journal-remote TCP socket, enabled with --url=URL
In this mode, the program will directly send logs to a remote systemd
journal (systemd-journal-remote expected at the destination).
This mode is available even when the local system does not support
systemd, or even when it is not Linux, allowing a remote Linux systemd
journald to become the logs database of the local system.
--url=URL
The destination systemd-journal-remote address and port, similarly
to what /etc/systemd/journal-upload.conf accepts.
Usually it is in the form: https://ip.address:19532
Both http and https URLs are accepted. When using https, the
following additional options are accepted:
--key=FILENAME
The filename of the private key of the server.
The default is: /etc/ssl/private/journal-upload.pem
--cert=FILENAME
The filename of the public key of the server.
The default is: /etc/ssl/certs/journal-upload.pem
--trust=FILENAME | all
The filename of the trusted CA public key.
The default is: /etc/ssl/ca/trusted.pem
The keyword 'all' can be used to trust all CAs.
NEWLINES PROCESSING
systemd-journal logs entries may have newlines in them. However the
Journal Export Format uses binary formatted data to achieve this,
making it hard for text processing.
To overcome this limitation, this program allows single-line text
formatted values at its input, to be binary formatted multi-line Journal
Export Format at its output.
To achieve that, it allows replacing a given string with a newline.
The parameter --newline=STRING allows setting the string to be replaced
with newlines.
For example by setting --newline='{NEWLINE}', the program will replace
all occurrences of {NEWLINE} with the newline character, within each
VALUE of the KEY=VALUE lines. Once this is done, the program will
switch the field to the binary Journal Export Format before sending the
log event to systemd-journal.
```
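To illustrate the newlines processing described above, this sketch (using an assumed `{NEWLINE}` marker and a hypothetical `demo` identifier) logs a two-line `MESSAGE` as a single journal entry:
```bash
printf 'MESSAGE=line 1{NEWLINE}line 2\nPRIORITY=6\nSYSLOG_IDENTIFIER=demo\n\n' |\
systemd-cat-native --newline='{NEWLINE}'
```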

libnetdata/log/systemd-cat-native.c (new file)

@ -0,0 +1,781 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "systemd-cat-native.h"
#include "../required_dummies.h"
#ifdef __FreeBSD__
#include <sys/endian.h>
#endif
#ifdef __APPLE__
#include <machine/endian.h>
#endif
static void log_message_to_stderr(BUFFER *msg) {
CLEAN_BUFFER *tmp = buffer_create(0, NULL);
for(size_t i = 0; i < msg->len ;i++) {
if(isprint(msg->buffer[i]))
buffer_putc(tmp, msg->buffer[i]);
else {
buffer_putc(tmp, '[');
buffer_print_uint64_hex(tmp, msg->buffer[i]);
buffer_putc(tmp, ']');
}
}
fprintf(stderr, "SENDING: %s\n", buffer_tostring(tmp));
}
static inline buffered_reader_ret_t get_next_line(struct buffered_reader *reader, BUFFER *line, int timeout_ms) {
while(true) {
if(unlikely(!buffered_reader_next_line(reader, line))) {
buffered_reader_ret_t ret = buffered_reader_read_timeout(reader, STDIN_FILENO, timeout_ms, false);
if(unlikely(ret != BUFFERED_READER_READ_OK))
return ret;
continue;
}
else {
// make sure the buffer is NULL terminated
line->buffer[line->len] = '\0';
// remove the trailing newlines
while(line->len && line->buffer[line->len - 1] == '\n')
line->buffer[--line->len] = '\0';
return BUFFERED_READER_READ_OK;
}
}
}
static inline size_t copy_replacing_newlines(char *dst, size_t dst_len, const char *src, size_t src_len, const char *newline) {
if (!dst || !src) return 0;
const char *current_src = src;
const char *src_end = src + src_len; // Pointer to the end of src
char *current_dst = dst;
size_t remaining_dst_len = dst_len;
size_t newline_len = newline && *newline ? strlen(newline) : 0;
size_t bytes_copied = 0; // To track the number of bytes copied
while (remaining_dst_len > 1 && current_src < src_end) {
if (newline_len > 0) {
const char *found = strstr(current_src, newline);
if (found && found < src_end) {
size_t copy_len = found - current_src;
if (copy_len >= remaining_dst_len) copy_len = remaining_dst_len - 1;
memcpy(current_dst, current_src, copy_len);
current_dst += copy_len;
*current_dst++ = '\n';
remaining_dst_len -= (copy_len + 1);
bytes_copied += copy_len + 1; // +1 for the newline character
current_src = found + newline_len;
continue;
}
}
// Copy the remaining part of src to dst
size_t copy_len = src_end - current_src;
if (copy_len >= remaining_dst_len) copy_len = remaining_dst_len - 1;
memcpy(current_dst, current_src, copy_len);
current_dst += copy_len;
remaining_dst_len -= copy_len;
bytes_copied += copy_len;
break;
}
// Ensure the string is null-terminated
*current_dst = '\0';
return bytes_copied;
}
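// Append one KEY=VALUE line to the output buffer. When the value contains the
// newline marker, the field is emitted in the binary Journal Export Format:
// KEY '\n' <64-bit little-endian length> <value with real newlines> '\n'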
static inline void buffer_memcat_replacing_newlines(BUFFER *wb, const char *src, size_t src_len, const char *newline) {
if(!src) return;
const char *equal;
if(!newline || !*newline || !strstr(src, newline) || !(equal = strchr(src, '='))) {
buffer_memcat(wb, src, src_len);
buffer_putc(wb, '\n');
return;
}
size_t key_len = equal - src;
buffer_memcat(wb, src, key_len);
buffer_putc(wb, '\n');
char *length_ptr = &wb->buffer[wb->len];
uint64_t le_size = 0;
buffer_memcat(wb, &le_size, sizeof(le_size));
const char *value = ++equal;
size_t value_len = src_len - key_len - 1;
buffer_need_bytes(wb, value_len + 1);
size_t size = copy_replacing_newlines(&wb->buffer[wb->len], value_len + 1, value, value_len, newline);
wb->len += size;
buffer_putc(wb, '\n');
le_size = htole64(size);
memcpy(length_ptr, &le_size, sizeof(le_size));
}
// ----------------------------------------------------------------------------
// log to a systemd-journal-remote
#ifdef HAVE_CURL
#include <curl/curl.h>
#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX 256
#endif
char global_hostname[HOST_NAME_MAX] = "";
char global_boot_id[UUID_COMPACT_STR_LEN] = "";
#define BOOT_ID_PATH "/proc/sys/kernel/random/boot_id"
#define DEFAULT_PRIVATE_KEY "/etc/ssl/private/journal-upload.pem"
#define DEFAULT_PUBLIC_KEY "/etc/ssl/certs/journal-upload.pem"
#define DEFAULT_CA_CERT "/etc/ssl/ca/trusted.pem"
struct upload_data {
char *data;
size_t length;
};
static size_t systemd_journal_remote_read_callback(void *ptr, size_t size, size_t nmemb, void *userp) {
struct upload_data *upload = (struct upload_data *)userp;
size_t buffer_size = size * nmemb;
if (upload->length) {
size_t copy_size = upload->length < buffer_size ? upload->length : buffer_size;
memcpy(ptr, upload->data, copy_size);
upload->data += copy_size;
upload->length -= copy_size;
return copy_size;
}
return 0;
}
CURL* initialize_connection_to_systemd_journal_remote(const char* url, const char* private_key, const char* public_key, const char* ca_cert, struct curl_slist **headers) {
CURL *curl = curl_easy_init();
if (!curl) {
fprintf(stderr, "Failed to initialize curl\n");
return NULL;
}
*headers = curl_slist_append(*headers, "Content-Type: application/vnd.fdo.journal");
*headers = curl_slist_append(*headers, "Transfer-Encoding: chunked");
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, *headers);
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_POST, 1L);
curl_easy_setopt(curl, CURLOPT_READFUNCTION, systemd_journal_remote_read_callback);
if (strncmp(url, "https://", 8) == 0) {
if (private_key) curl_easy_setopt(curl, CURLOPT_SSLKEY, private_key);
if (public_key) curl_easy_setopt(curl, CURLOPT_SSLCERT, public_key);
if (strcmp(ca_cert, "all") != 0) {
curl_easy_setopt(curl, CURLOPT_CAINFO, ca_cert);
} else {
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
}
}
// curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L); // Remove for less verbose output
return curl;
}
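// Close an event for systemd-journal-remote: append the trailer fields it
// expects (__REALTIME_TIMESTAMP, __MONOTONIC_TIMESTAMP, _BOOT_ID, _HOSTNAME)
// and the empty line that terminates the entry.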
static void journal_remote_complete_event(BUFFER *msg, usec_t *monotonic_ut) {
usec_t ut = now_monotonic_usec();
if(monotonic_ut)
*monotonic_ut = ut;
buffer_sprintf(msg,
""
"__REALTIME_TIMESTAMP=%llu\n"
"__MONOTONIC_TIMESTAMP=%llu\n"
"_BOOT_ID=%s\n"
"_HOSTNAME=%s\n"
"\n"
, now_realtime_usec()
, ut
, global_boot_id
, global_hostname
);
}
static CURLcode journal_remote_send_buffer(CURL* curl, BUFFER *msg) {
// log_message_to_stderr(msg);
struct upload_data upload = {0};
if (!curl || !buffer_strlen(msg))
return CURLE_FAILED_INIT;
upload.data = (char *) buffer_tostring(msg);
upload.length = buffer_strlen(msg);
curl_easy_setopt(curl, CURLOPT_READDATA, &upload);
curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)upload.length);
return curl_easy_perform(curl);
}
typedef enum {
LOG_TO_JOURNAL_REMOTE_BAD_PARAMS = -1,
LOG_TO_JOURNAL_REMOTE_CANNOT_INITIALIZE = -2,
LOG_TO_JOURNAL_REMOTE_CANNOT_SEND = -3,
LOG_TO_JOURNAL_REMOTE_CANNOT_READ = -4,
} log_to_journal_remote_ret_t;
static log_to_journal_remote_ret_t log_input_to_journal_remote(const char *url, const char *key, const char *cert, const char *trust, const char *newline, int timeout_ms) {
if(!url || !*url) {
fprintf(stderr, "No URL is given.\n");
return LOG_TO_JOURNAL_REMOTE_BAD_PARAMS;
}
if(timeout_ms < 10)
timeout_ms = 10;
global_boot_id[0] = '\0';
char boot_id[1024];
if(read_file(BOOT_ID_PATH, boot_id, sizeof(boot_id)) == 0) {
uuid_t uuid;
if(uuid_parse_flexi(boot_id, uuid) == 0)
uuid_unparse_lower_compact(uuid, global_boot_id);
else
fprintf(stderr, "WARNING: cannot parse the UUID found in '%s'.\n", BOOT_ID_PATH);
}
if(global_boot_id[0] == '\0') {
fprintf(stderr, "WARNING: cannot read '%s'. Will generate a random _BOOT_ID.\n", BOOT_ID_PATH);
uuid_t uuid;
uuid_generate_random(uuid);
uuid_unparse_lower_compact(uuid, global_boot_id);
}
if(global_hostname[0] == '\0') {
if(gethostname(global_hostname, sizeof(global_hostname)) != 0) {
fprintf(stderr, "WARNING: cannot get system's hostname. Will use internal default.\n");
snprintfz(global_hostname, sizeof(global_hostname), "systemd-cat-native-unknown-hostname");
}
}
if(!key)
key = DEFAULT_PRIVATE_KEY;
if(!cert)
cert = DEFAULT_PUBLIC_KEY;
if(!trust)
trust = DEFAULT_CA_CERT;
char full_url[4096];
snprintfz(full_url, sizeof(full_url), "%s/upload", url);
CURL *curl;
CURLcode res = CURLE_OK;
struct curl_slist *headers = NULL;
curl_global_init(CURL_GLOBAL_ALL);
curl = initialize_connection_to_systemd_journal_remote(full_url, key, cert, trust, &headers);
if(!curl)
return LOG_TO_JOURNAL_REMOTE_CANNOT_INITIALIZE;
struct buffered_reader reader;
buffered_reader_init(&reader);
CLEAN_BUFFER *line = buffer_create(sizeof(reader.read_buffer), NULL);
CLEAN_BUFFER *msg = buffer_create(sizeof(reader.read_buffer), NULL);
size_t msg_full_events = 0;
size_t msg_partial_fields = 0;
usec_t msg_started_ut = 0;
size_t failures = 0;
size_t messages_logged = 0;
log_to_journal_remote_ret_t ret = 0;
while(true) {
buffered_reader_ret_t rc = get_next_line(&reader, line, timeout_ms);
if(rc == BUFFERED_READER_READ_POLL_TIMEOUT) {
if(msg_full_events && !msg_partial_fields) {
res = journal_remote_send_buffer(curl, msg);
if(res != CURLE_OK) {
fprintf(stderr, "journal_remote_send_buffer() failed: %s\n", curl_easy_strerror(res));
failures++;
ret = LOG_TO_JOURNAL_REMOTE_CANNOT_SEND;
goto cleanup;
}
else
messages_logged++;
msg_full_events = 0;
buffer_flush(msg);
}
}
else if(rc == BUFFERED_READER_READ_OK) {
if(!line->len) {
// an empty line - we are done for this message
if(msg_partial_fields) {
msg_partial_fields = 0;
usec_t ut;
journal_remote_complete_event(msg, &ut);
if(!msg_full_events)
msg_started_ut = ut;
msg_full_events++;
if(ut - msg_started_ut >= USEC_PER_SEC / 2) {
res = journal_remote_send_buffer(curl, msg);
if(res != CURLE_OK) {
fprintf(stderr, "journal_remote_send_buffer() failed: %s\n", curl_easy_strerror(res));
failures++;
ret = LOG_TO_JOURNAL_REMOTE_CANNOT_SEND;
goto cleanup;
}
else
messages_logged++;
msg_full_events = 0;
buffer_flush(msg);
}
}
}
else {
buffer_memcat_replacing_newlines(msg, line->buffer, line->len, newline);
msg_partial_fields++;
}
buffer_flush(line);
}
else {
fprintf(stderr, "cannot read input data, failed with code %d\n", rc);
ret = LOG_TO_JOURNAL_REMOTE_CANNOT_READ;
break;
}
}
if (msg_full_events || msg_partial_fields) {
if(msg_partial_fields) {
msg_partial_fields = 0;
msg_full_events++;
journal_remote_complete_event(msg, NULL);
}
if(msg_full_events) {
res = journal_remote_send_buffer(curl, msg);
if(res != CURLE_OK) {
fprintf(stderr, "journal_remote_send_buffer() failed: %s\n", curl_easy_strerror(res));
failures++;
}
else
messages_logged++;
msg_full_events = 0;
buffer_flush(msg);
}
}
cleanup:
curl_easy_cleanup(curl);
curl_slist_free_all(headers);
curl_global_cleanup();
return ret;
}
#endif
static int help(void) {
fprintf(stderr,
"\n"
"Netdata systemd-cat-native " PACKAGE_VERSION "\n"
"\n"
"This program reads from its standard input, lines in the format:\n"
"\n"
"KEY1=VALUE1\\n\n"
"KEY2=VALUE2\\n\n"
"KEYN=VALUEN\\n\n"
"\\n\n"
"\n"
"and sends them to systemd-journal.\n"
"\n"
" - Binary journal fields are not accepted at its input\n"
" - Binary journal fields can be generated after newline processing\n"
" - Messages have to be separated by an empty line\n"
" - Keys starting with underscore are not accepted (by journald)\n"
" - Other rules imposed by systemd-journald are imposed (by journald)\n"
"\n"
"Usage:\n"
"\n"
" %s\n"
" [--newline=STRING]\n"
" [--log-as-netdata|-N]\n"
" [--namespace=NAMESPACE] [--socket=PATH]\n"
#ifdef HAVE_CURL
" [--url=URL [--key=FILENAME] [--cert=FILENAME] [--trust=FILENAME|all]]\n"
#endif
"\n"
"The program has the following modes of logging:\n"
"\n"
" * Log to a local systemd-journald or stderr\n"
"\n"
" This is the default mode. If systemd-journald is available, logs will be\n"
" sent to systemd, otherwise logs will be printed on stderr, using logfmt\n"
" formatting. Options --socket and --namespace are available to configure\n"
" the journal destination:\n"
"\n"
" --socket=PATH\n"
" The path of a systemd-journald UNIX socket.\n"
" The program will use the default systemd-journald socket when this\n"
" option is not used.\n"
"\n"
" --namespace=NAMESPACE\n"
" The name of a configured and running systemd-journald namespace.\n"
" The program will produce the socket path based on its internal\n"
" defaults, to send the messages to the systemd journal namespace.\n"
"\n"
" * Log as Netdata, enabled with --log-as-netdata or -N\n"
"\n"
" In this mode the program uses environment variables set by Netdata for\n"
" the log destination. Only log fields defined by Netdata are accepted.\n"
" If the environment variables expected by Netdata are not found, it\n"
" falls back to stderr logging in logfmt format.\n"
#ifdef HAVE_CURL
"\n"
" * Log to a systemd-journal-remote TCP socket, enabled with --url=URL\n"
"\n"
" In this mode, the program will directly sent logs to a remote systemd\n"
" journal (systemd-journal-remote expected at the destination)\n"
" This mode is available even when the local system does not support\n"
" systemd, or even it is not Linux, allowing a remote Linux systemd\n"
" journald to become the logs database of the local system.\n"
"\n"
" Unfortunately systemd-journal-remote does not accept compressed\n"
" data over the network, so the stream will be uncompressed.\n"
"\n"
" --url=URL\n"
" The destination systemd-journal-remote address and port, similarly\n"
" to what /etc/systemd/journal-upload.conf accepts.\n"
" Usually it is in the form: https://ip.address:19532\n"
" Both http and https URLs are accepted. When using https, the\n"
" following additional options are accepted:\n"
"\n"
" --key=FILENAME\n"
" The filename of the private key of the server.\n"
" The default is: " DEFAULT_PRIVATE_KEY "\n"
"\n"
" --cert=FILENAME\n"
" The filename of the public key of the server.\n"
" The default is: " DEFAULT_PUBLIC_KEY "\n"
"\n"
" --trust=FILENAME | all\n"
" The filename of the trusted CA public key.\n"
" The default is: " DEFAULT_CA_CERT "\n"
" The keyword 'all' can be used to trust all CAs.\n"
"\n"
" --keep-trying\n"
" Keep trying to send the message, if the remote journal is not there.\n"
#endif
"\n"
" NEWLINES PROCESSING\n"
" systemd-journal logs entries may have newlines in them. However the\n"
" Journal Export Format uses binary formatted data to achieve this,\n"
" making it hard for text processing.\n"
"\n"
" To overcome this limitation, this program allows single-line text\n"
" formatted values at its input, to be binary formatted multi-line Journal\n"
" Export Format at its output.\n"
"\n"
" To achieve that it allows replacing a given string to a newline.\n"
" The parameter --newline=STRING allows setting the string to be replaced\n"
" with newlines.\n"
"\n"
" For example by setting --newline='{NEWLINE}', the program will replace\n"
" all occurrences of {NEWLINE} with the newline character, within each\n"
" VALUE of the KEY=VALUE lines. Once this this done, the program will\n"
" switch the field to the binary Journal Export Format before sending the\n"
" log event to systemd-journal.\n"
"\n",
program_name);
return 1;
}
// ----------------------------------------------------------------------------
// log as Netdata
static void lgs_reset(struct log_stack_entry *lgs) {
for(size_t i = 1; i < _NDF_MAX ;i++) {
if(lgs[i].type == NDFT_TXT && lgs[i].set && lgs[i].txt)
freez((void *)lgs[i].txt);
lgs[i] = ND_LOG_FIELD_TXT(i, NULL);
}
lgs[0] = ND_LOG_FIELD_TXT(NDF_MESSAGE, NULL);
lgs[_NDF_MAX] = ND_LOG_FIELD_END();
}
static const char *strdupz_replacing_newlines(const char *src, const char *newline) {
if(!src) src = "";
size_t src_len = strlen(src);
char *buffer = mallocz(src_len + 1);
copy_replacing_newlines(buffer, src_len + 1, src, src_len, newline);
return buffer;
}
static int log_input_as_netdata(const char *newline, int timeout_ms) {
struct buffered_reader reader;
buffered_reader_init(&reader);
CLEAN_BUFFER *line = buffer_create(sizeof(reader.read_buffer), NULL);
ND_LOG_STACK lgs[_NDF_MAX + 1] = { 0 };
ND_LOG_STACK_PUSH(lgs);
lgs_reset(lgs);
size_t fields_added = 0;
size_t messages_logged = 0;
ND_LOG_FIELD_PRIORITY priority = NDLP_INFO;
while(get_next_line(&reader, line, timeout_ms) == BUFFERED_READER_READ_OK) {
if(!line->len) {
// an empty line - we are done for this message
nd_log(NDLS_HEALTH, priority,
"added %d fields", // if the user supplied a MESSAGE, this will be ignored
fields_added);
lgs_reset(lgs);
fields_added = 0;
messages_logged++;
}
else {
char *equal = strchr(line->buffer, '=');
if(equal) {
const char *field = line->buffer;
size_t field_len = equal - line->buffer;
ND_LOG_FIELD_ID id = nd_log_field_id_by_name(field, field_len);
if(id != NDF_STOP) {
const char *value = ++equal;
if(lgs[id].txt)
freez((void *) lgs[id].txt);
lgs[id].txt = strdupz_replacing_newlines(value, newline);
lgs[id].set = true;
fields_added++;
if(id == NDF_PRIORITY)
priority = nd_log_priority2id(value);
}
else {
struct log_stack_entry backup = lgs[NDF_MESSAGE];
lgs[NDF_MESSAGE] = ND_LOG_FIELD_TXT(NDF_MESSAGE, NULL);
nd_log(NDLS_COLLECTORS, NDLP_ERR,
"Field '%.*s' is not a Netdata field. Ignoring it.",
(int)field_len, field);
lgs[NDF_MESSAGE] = backup;
}
}
else {
struct log_stack_entry backup = lgs[NDF_MESSAGE];
lgs[NDF_MESSAGE] = ND_LOG_FIELD_TXT(NDF_MESSAGE, NULL);
nd_log(NDLS_COLLECTORS, NDLP_ERR,
"Line does not contain an = sign; ignoring it: %s",
line->buffer);
lgs[NDF_MESSAGE] = backup;
}
}
buffer_flush(line);
}
if(fields_added) {
nd_log(NDLS_HEALTH, priority, "added %d fields", fields_added);
messages_logged++;
}
return messages_logged ? 0 : 1;
}
// ----------------------------------------------------------------------------
// log to a local systemd-journald
static bool journal_local_send_buffer(int fd, BUFFER *msg) {
// log_message_to_stderr(msg);
bool ret = journal_direct_send(fd, msg->buffer, msg->len);
if (!ret)
fprintf(stderr, "Cannot send message to systemd journal.\n");
return ret;
}
static int log_input_to_journal(const char *socket, const char *namespace, const char *newline, int timeout_ms) {
char path[FILENAME_MAX + 1];
int fd = -1;
if(socket)
snprintfz(path, sizeof(path), "%s", socket);
else
journal_construct_path(path, sizeof(path), NULL, namespace);
fd = journal_direct_fd(path);
if (fd == -1) {
fprintf(stderr, "Cannot open '%s' as a UNIX socket (errno = %d)\n",
path, errno);
return 1;
}
struct buffered_reader reader;
buffered_reader_init(&reader);
CLEAN_BUFFER *line = buffer_create(sizeof(reader.read_buffer), NULL);
CLEAN_BUFFER *msg = buffer_create(sizeof(reader.read_buffer), NULL);
size_t messages_logged = 0;
size_t failed_messages = 0;
while(get_next_line(&reader, line, timeout_ms) == BUFFERED_READER_READ_OK) {
if (!line->len) {
// an empty line - we are done for this message
if (msg->len) {
if(journal_local_send_buffer(fd, msg))
messages_logged++;
else {
failed_messages++;
goto cleanup;
}
}
buffer_flush(msg);
}
else
buffer_memcat_replacing_newlines(msg, line->buffer, line->len, newline);
buffer_flush(line);
}
if (msg && msg->len) {
if(journal_local_send_buffer(fd, msg))
messages_logged++;
else
failed_messages++;
}
cleanup:
return !failed_messages && messages_logged ? 0 : 1;
}
int main(int argc, char *argv[]) {
clocks_init();
nd_log_initialize_for_external_plugins(argv[0]);
int timeout_ms = -1; // wait forever
bool log_as_netdata = false;
const char *newline = NULL;
const char *namespace = NULL;
const char *socket = getenv("NETDATA_SYSTEMD_JOURNAL_PATH");
#ifdef HAVE_CURL
const char *url = NULL;
const char *key = NULL;
const char *cert = NULL;
const char *trust = NULL;
bool keep_trying = false;
#endif
for(int i = 1; i < argc ;i++) {
const char *k = argv[i];
if(strcmp(k, "--help") == 0 || strcmp(k, "-h") == 0)
return help();
else if(strcmp(k, "--log-as-netdata") == 0 || strcmp(k, "-N") == 0)
log_as_netdata = true;
else if(strncmp(k, "--namespace=", 12) == 0)
namespace = &k[12];
else if(strncmp(k, "--socket=", 9) == 0)
socket = &k[9];
else if(strncmp(k, "--newline=", 10) == 0)
newline = &k[10];
#ifdef HAVE_CURL
else if (strncmp(k, "--url=", 6) == 0)
url = &k[6];
else if (strncmp(k, "--key=", 6) == 0)
key = &k[6];
else if (strncmp(k, "--cert=", 7) == 0)
cert = &k[7];
else if (strncmp(k, "--trust=", 8) == 0)
trust = &k[8];
else if (strcmp(k, "--keep-trying") == 0)
keep_trying = true;
#endif
else {
fprintf(stderr, "Unknown parameter '%s'\n", k);
return 1;
}
}
#ifdef HAVE_CURL
if(log_as_netdata && url) {
fprintf(stderr, "Cannot log to a systemd-journal-remote URL as Netdata. "
"Please either give --url or --log-as-netdata, not both.\n");
return 1;
}
if(socket && url) {
fprintf(stderr, "Cannot log to a systemd-journal-remote URL using a UNIX socket. "
"Please either give --url or --socket, not both.\n");
return 1;
}
if(url && namespace) {
fprintf(stderr, "Cannot log to a systemd-journal-remote URL using a namespace. "
"Please either give --url or --namespace, not both.\n");
return 1;
}
#endif
if(log_as_netdata && namespace) {
fprintf(stderr, "Cannot log as netdata using a namespace. "
"Please either give --log-as-netdata or --namespace, not both.\n");
return 1;
}
if(log_as_netdata)
return log_input_as_netdata(newline, timeout_ms);
#ifdef HAVE_CURL
if(url) {
log_to_journal_remote_ret_t rc;
do {
rc = log_input_to_journal_remote(url, key, cert, trust, newline, timeout_ms);
} while(keep_trying && rc == LOG_TO_JOURNAL_REMOTE_CANNOT_SEND);
return rc == 0 ? 0 : 1;
}
#endif
return log_input_to_journal(socket, namespace, newline, timeout_ms);
}

libnetdata/log/systemd-cat-native.h (new file)

@ -0,0 +1,8 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
#ifndef NETDATA_SYSTEMD_CAT_NATIVE_H
#define NETDATA_SYSTEMD_CAT_NATIVE_H
#endif //NETDATA_SYSTEMD_CAT_NATIVE_H

libnetdata/socket/security.c

@ -24,7 +24,7 @@ static SOCKET_PEERS netdata_ssl_peers(NETDATA_SSL *ssl) {
}
static void netdata_ssl_log_error_queue(const char *call, NETDATA_SSL *ssl, unsigned long err) {
error_limit_static_thread_var(erl, 1, 0);
nd_log_limit_static_thread_var(erl, 1, 0);
if(err == SSL_ERROR_NONE)
err = ERR_get_error();
@ -103,8 +103,9 @@ static void netdata_ssl_log_error_queue(const char *call, NETDATA_SSL *ssl, unsi
ERR_error_string_n(err, str, 1024);
str[1024] = '\0';
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: %s() on socket local [[%s]:%d] <-> remote [[%s]:%d], returned error %lu (%s): %s",
call, peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, err, code, str);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_ERR,
"SSL: %s() on socket local [[%s]:%d] <-> remote [[%s]:%d], returned error %lu (%s): %s",
call, peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, err, code, str);
} while((err = ERR_get_error()));
}
@ -179,7 +180,7 @@ void netdata_ssl_close(NETDATA_SSL *ssl) {
}
static inline bool is_handshake_complete(NETDATA_SSL *ssl, const char *op) {
error_limit_static_thread_var(erl, 1, 0);
nd_log_limit_static_thread_var(erl, 1, 0);
if(unlikely(!ssl->conn)) {
internal_error(true, "SSL: trying to %s on a NULL connection", op);
@ -189,22 +190,25 @@ static inline bool is_handshake_complete(NETDATA_SSL *ssl, const char *op) {
switch(ssl->state) {
case NETDATA_SSL_STATE_NOT_SSL: {
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on non-SSL connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on non-SSL connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
return false;
}
case NETDATA_SSL_STATE_INIT: {
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on an incomplete connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on an incomplete connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
return false;
}
case NETDATA_SSL_STATE_FAILED: {
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on a failed connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on a failed connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
return false;
}
@ -296,7 +300,7 @@ ssize_t netdata_ssl_write(NETDATA_SSL *ssl, const void *buf, size_t num) {
}
static inline bool is_handshake_initialized(NETDATA_SSL *ssl, const char *op) {
error_limit_static_thread_var(erl, 1, 0);
nd_log_limit_static_thread_var(erl, 1, 0);
if(unlikely(!ssl->conn)) {
internal_error(true, "SSL: trying to %s on a NULL connection", op);
@ -306,8 +310,9 @@ static inline bool is_handshake_initialized(NETDATA_SSL *ssl, const char *op) {
switch(ssl->state) {
case NETDATA_SSL_STATE_NOT_SSL: {
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on non-SSL connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on non-SSL connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
return false;
}
@ -317,15 +322,17 @@ static inline bool is_handshake_initialized(NETDATA_SSL *ssl, const char *op) {
case NETDATA_SSL_STATE_FAILED: {
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on a failed connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on a failed connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
return false;
}
case NETDATA_SSL_STATE_COMPLETE: {
SOCKET_PEERS peers = netdata_ssl_peers(ssl);
error_limit(&erl, "SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on an complete connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
nd_log_limit(&erl, NDLS_DAEMON, NDLP_WARNING,
"SSL: on socket local [[%s]:%d] <-> remote [[%s]:%d], attempt to %s on an complete connection",
peers.local.ip, peers.local.port, peers.peer.ip, peers.peer.port, op);
return false;
}
}

(file diff suppressed because it is too large)

libnetdata/threads/threads.c

@ -134,8 +134,6 @@ size_t netdata_threads_init(void) {
i = pthread_attr_getstacksize(netdata_threads_attr, &stacksize);
if(i != 0)
fatal("pthread_attr_getstacksize() failed with code %d.", i);
else
netdata_log_debug(D_OPTIONS, "initial pthread stack size is %zu bytes", stacksize);
return stacksize;
}
@ -152,12 +150,12 @@ void netdata_threads_init_after_fork(size_t stacksize) {
if(netdata_threads_attr && stacksize > (size_t)PTHREAD_STACK_MIN) {
i = pthread_attr_setstacksize(netdata_threads_attr, stacksize);
if(i != 0)
netdata_log_error("pthread_attr_setstacksize() to %zu bytes, failed with code %d.", stacksize, i);
nd_log(NDLS_DAEMON, NDLP_WARNING, "pthread_attr_setstacksize() to %zu bytes, failed with code %d.", stacksize, i);
else
netdata_log_info("Set threads stack size to %zu bytes", stacksize);
nd_log(NDLS_DAEMON, NDLP_DEBUG, "Set threads stack size to %zu bytes", stacksize);
}
else
netdata_log_error("Invalid pthread stacksize %zu", stacksize);
nd_log(NDLS_DAEMON, NDLP_WARNING, "Invalid pthread stacksize %zu", stacksize);
}
// ----------------------------------------------------------------------------
@ -183,12 +181,12 @@ void rrd_collector_finished(void);
static void thread_cleanup(void *ptr) {
if(netdata_thread != ptr) {
NETDATA_THREAD *info = (NETDATA_THREAD *)ptr;
netdata_log_error("THREADS: internal error - thread local variable does not match the one passed to this function. Expected thread '%s', passed thread '%s'", netdata_thread->tag, info->tag);
nd_log(NDLS_DAEMON, NDLP_ERR, "THREADS: internal error - thread local variable does not match the one passed to this function. Expected thread '%s', passed thread '%s'", netdata_thread->tag, info->tag);
}
spinlock_lock(&netdata_thread->detach_lock);
if(!(netdata_thread->options & NETDATA_THREAD_OPTION_DONT_LOG_CLEANUP))
netdata_log_info("thread with task id %d finished", gettid());
nd_log(NDLS_DAEMON, NDLP_DEBUG, "thread with task id %d finished", gettid());
rrd_collector_finished();
sender_thread_buffer_free();
@ -222,9 +220,9 @@ static void thread_set_name_np(NETDATA_THREAD *nt) {
#endif
if (ret != 0)
netdata_log_error("cannot set pthread name of %d to %s. ErrCode: %d", gettid(), threadname, ret);
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot set pthread name of %d to %s. ErrCode: %d", gettid(), threadname, ret);
else
netdata_log_info("set name of thread %d to %s", gettid(), threadname);
nd_log(NDLS_DAEMON, NDLP_DEBUG, "set name of thread %d to %s", gettid(), threadname);
}
}
@ -247,7 +245,7 @@ void uv_thread_set_name_np(uv_thread_t ut, const char* name) {
thread_name_get(true);
if (ret)
netdata_log_info("cannot set libuv thread name to %s. Err: %d", threadname, ret);
nd_log(NDLS_DAEMON, NDLP_NOTICE, "cannot set libuv thread name to %s. Err: %d", threadname, ret);
}
void os_thread_get_current_name_np(char threadname[NETDATA_THREAD_NAME_MAX + 1])
@ -264,13 +262,13 @@ static void *netdata_thread_init(void *ptr) {
netdata_thread = (NETDATA_THREAD *)ptr;
if(!(netdata_thread->options & NETDATA_THREAD_OPTION_DONT_LOG_STARTUP))
netdata_log_info("thread created with task id %d", gettid());
nd_log(NDLS_DAEMON, NDLP_DEBUG, "thread created with task id %d", gettid());
if(pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL) != 0)
netdata_log_error("cannot set pthread cancel type to DEFERRED.");
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot set pthread cancel type to DEFERRED.");
if(pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) != 0)
netdata_log_error("cannot set pthread cancel state to ENABLE.");
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot set pthread cancel state to ENABLE.");
thread_set_name_np(ptr);
@ -294,13 +292,13 @@ int netdata_thread_create(netdata_thread_t *thread, const char *tag, NETDATA_THR
int ret = pthread_create(thread, netdata_threads_attr, netdata_thread_init, info);
if(ret != 0)
netdata_log_error("failed to create new thread for %s. pthread_create() failed with code %d", tag, ret);
nd_log(NDLS_DAEMON, NDLP_ERR, "failed to create new thread for %s. pthread_create() failed with code %d", tag, ret);
else {
if (!(options & NETDATA_THREAD_OPTION_JOINABLE)) {
int ret2 = pthread_detach(*thread);
if (ret2 != 0)
netdata_log_error("cannot request detach of newly created %s thread. pthread_detach() failed with code %d", tag, ret2);
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot request detach of newly created %s thread. pthread_detach() failed with code %d", tag, ret2);
}
}
@ -318,9 +316,9 @@ int netdata_thread_cancel(netdata_thread_t thread) {
int ret = pthread_cancel(thread);
if(ret != 0)
#ifdef NETDATA_INTERNAL_CHECKS
netdata_log_error("cannot cancel thread. pthread_cancel() failed with code %d at %d@%s, function %s()", ret, line, file, function);
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot cancel thread. pthread_cancel() failed with code %d at %d@%s, function %s()", ret, line, file, function);
#else
netdata_log_error("cannot cancel thread. pthread_cancel() failed with code %d.", ret);
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot cancel thread. pthread_cancel() failed with code %d.", ret);
#endif
return ret;
@@ -332,7 +330,7 @@ int netdata_thread_cancel(netdata_thread_t thread) {
int netdata_thread_join(netdata_thread_t thread, void **retval) {
int ret = pthread_join(thread, retval);
if(ret != 0)
netdata_log_error("cannot join thread. pthread_join() failed with code %d.", ret);
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot join thread. pthread_join() failed with code %d.", ret);
return ret;
}
@@ -340,7 +338,7 @@ int netdata_thread_join(netdata_thread_t thread, void **retval) {
int netdata_thread_detach(pthread_t thread) {
int ret = pthread_detach(thread);
if(ret != 0)
netdata_log_error("cannot detach thread. pthread_detach() failed with code %d.", ret);
nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot detach thread. pthread_detach() failed with code %d.", ret);
return ret;
}

libnetdata/uuid/Makefile.am (new file)

@@ -0,0 +1,8 @@
# SPDX-License-Identifier: GPL-3.0-or-later
AUTOMAKE_OPTIONS = subdir-objects
MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
dist_noinst_DATA = \
README.md \
$(NULL)

libnetdata/uuid/README.md (new file)

@@ -0,0 +1,13 @@
<!--
title: "UUID"
custom_edit_url: https://github.com/netdata/netdata/edit/master/libnetdata/uuid/README.md
sidebar_label: "UUID"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->
# UUID

Netdata uses `libuuid` for managing UUIDs. This folder contains a few custom
helpers that complement it: a compact (hyphen-less) formatter and parser, and
a flexible parser that accepts UUIDs both with and without hyphens.
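
A quick sketch of the compact round trip (the include path below is an assumption of this example; everything else is declared in `uuid.h`):

```c
#include "libnetdata/libnetdata.h"

void compact_roundtrip_example(void) {
    uuid_t uu, copy;
    char buf[UUID_COMPACT_STR_LEN];      // 32 hex characters + NUL

    uuid_generate(uu);                   // plain libuuid
    uuid_unparse_lower_compact(uu, buf); // lowercase, no hyphens

    if (uuid_parse_compact(buf, copy) == 0 && uuid_memcmp(&uu, &copy) == 0) {
        // the round trip preserved the binary value
    }
}
```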

libnetdata/uuid/uuid.c (new file)

@@ -0,0 +1,179 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "../libnetdata.h"
void uuid_unparse_lower_compact(const uuid_t uuid, char *out) {
static const char *hex_chars = "0123456789abcdef";
for (int i = 0; i < 16; i++) {
out[i * 2] = hex_chars[(uuid[i] >> 4) & 0x0F];
out[i * 2 + 1] = hex_chars[uuid[i] & 0x0F];
}
out[32] = '\0'; // Null-terminate the string
}
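// Parse exactly 32 hex characters (no hyphens) back into a uuid_t.
// Returns 0 on success, -1 on invalid length or invalid character.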
inline int uuid_parse_compact(const char *in, uuid_t uuid) {
if (strlen(in) != 32)
return -1; // Invalid input length
for (int i = 0; i < 16; i++) {
int high = hex_char_to_int(in[i * 2]);
int low = hex_char_to_int(in[i * 2 + 1]);
if (high < 0 || low < 0)
return -1; // Invalid hexadecimal character
uuid[i] = (high << 4) | low;
}
return 0; // Success
}
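// Parse a UUID in either canonical (hyphenated) or compact form.
// Returns 0 on success or a distinct negative code per failure mode;
// uu is left untouched on error.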
int uuid_parse_flexi(const char *in, uuid_t uu) {
if(!in || !*in)
return -1;
size_t hexCharCount = 0;
size_t hyphenCount = 0;
const char *s = in;
int byteIndex = 0;
uuid_t uuid; // work in a temporary buffer, so the previous value of uu is not corrupted if parsing fails
while (*s && byteIndex < 16) {
if (*s == '-') {
s++;
hyphenCount++;
if (unlikely(hyphenCount > 4))
// Too many hyphens
return -2;
}
if (likely(isxdigit(*s))) {
int high = hex_char_to_int(*s++);
hexCharCount++;
if (likely(isxdigit(*s))) {
int low = hex_char_to_int(*s++);
hexCharCount++;
uuid[byteIndex++] = (high << 4) | low;
}
else
// Not a valid UUID (expected a pair of hex digits)
return -3;
}
else
// Not a valid UUID
return -4;
}
if (unlikely(*s != '\0'))
// Trailing characters after a complete UUID
return -8;
if (unlikely(byteIndex < 16))
// Not enough data to form a UUID
return -5;
if (unlikely(hexCharCount != 32))
// wrong number of hex digits
return -6;
if(unlikely(hyphenCount != 0 && hyphenCount != 4))
// wrong number of hyphens
return -7;
// copy the final value
memcpy(uu, uuid, sizeof(uuid_t));
return 0;
}
// ----------------------------------------------------------------------------
// unit test
static inline void remove_hyphens(const char *uuid_with_hyphens, char *uuid_without_hyphens) {
while (*uuid_with_hyphens) {
if (*uuid_with_hyphens != '-') {
*uuid_without_hyphens++ = *uuid_with_hyphens;
}
uuid_with_hyphens++;
}
*uuid_without_hyphens = '\0';
}
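// Round-trip test: generate random UUIDs, unparse them with and without
// hyphens (lower, upper and default case), and verify that uuid_parse_flexi()
// and uuid_unparse_lower_compact() reproduce the original binary value.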
int uuid_unittest(void) {
const int num_tests = 100000;
int failed_tests = 0;
int i;
for (i = 0; i < num_tests; i++) {
uuid_t original_uuid, parsed_uuid;
char uuid_str_with_hyphens[UUID_STR_LEN], uuid_str_without_hyphens[UUID_COMPACT_STR_LEN];
// Generate a random UUID
switch(i % 2) {
case 0:
uuid_generate(original_uuid);
break;
case 1:
uuid_generate_random(original_uuid);
break;
}
// Unparse it with hyphens
bool lower = false;
switch(i % 3) {
case 0:
uuid_unparse_lower(original_uuid, uuid_str_with_hyphens);
lower = true;
break;
case 1:
uuid_unparse(original_uuid, uuid_str_with_hyphens);
break;
case 2:
uuid_unparse_upper(original_uuid, uuid_str_with_hyphens);
break;
}
// Remove the hyphens
remove_hyphens(uuid_str_with_hyphens, uuid_str_without_hyphens);
if(lower) {
char test[UUID_COMPACT_STR_LEN];
uuid_unparse_lower_compact(original_uuid, test);
if(strcmp(test, uuid_str_without_hyphens) != 0) {
printf("uuid_unparse_lower_compact() failed, expected '%s', got '%s'\n",
uuid_str_without_hyphens, test);
failed_tests++;
}
}
// Parse the UUID string with hyphens
int parse_result = uuid_parse_flexi(uuid_str_with_hyphens, parsed_uuid);
if (parse_result != 0) {
printf("uuid_parse_flexi() returned -1 (parsing error) for UUID with hyphens: %s\n", uuid_str_with_hyphens);
failed_tests++;
} else if (uuid_compare(original_uuid, parsed_uuid) != 0) {
printf("uuid_parse_flexi() parsed value mismatch for UUID with hyphens: %s\n", uuid_str_with_hyphens);
failed_tests++;
}
// Parse the UUID string without hyphens
parse_result = uuid_parse_flexi(uuid_str_without_hyphens, parsed_uuid);
if (parse_result != 0) {
printf("uuid_parse_flexi() returned -1 (parsing error) for UUID without hyphens: %s\n", uuid_str_without_hyphens);
failed_tests++;
}
else if(uuid_compare(original_uuid, parsed_uuid) != 0) {
printf("uuid_parse_flexi() parsed value mismatch for UUID without hyphens: %s\n", uuid_str_without_hyphens);
failed_tests++;
}
if(failed_tests)
break;
}
printf("UUID: failed %d out of %d tests.\n", failed_tests, i);
return failed_tests;
}

libnetdata/uuid/uuid.h (new file)

@@ -0,0 +1,29 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#ifndef NETDATA_UUID_H
#define NETDATA_UUID_H
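// Well-known message IDs (128-bit UUIDs) used to tag structured log entries,
// so specific event types can be filtered for, e.g. in the system journal.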
UUID_DEFINE(streaming_from_child_msgid, 0xed,0x4c,0xdb, 0x8f, 0x1b, 0xeb, 0x4a, 0xd3, 0xb5, 0x7c, 0xb3, 0xca, 0xe2, 0xd1, 0x62, 0xfa);
UUID_DEFINE(streaming_to_parent_msgid, 0x6e, 0x2e, 0x38, 0x39, 0x06, 0x76, 0x48, 0x96, 0x8b, 0x64, 0x60, 0x45, 0xdb, 0xf2, 0x8d, 0x66);
UUID_DEFINE(health_alert_transition_msgid, 0x9c, 0xe0, 0xcb, 0x58, 0xab, 0x8b, 0x44, 0xdf, 0x82, 0xc4, 0xbf, 0x1a, 0xd9, 0xee, 0x22, 0xde);
// this is also defined in alarm-notify.sh.in
UUID_DEFINE(health_alert_notification_msgid, 0x6d, 0xb0, 0x01, 0x8e, 0x83, 0xe3, 0x43, 0x20, 0xae, 0x2a, 0x65, 0x9d, 0x78, 0x01, 0x9f, 0xb7);
#define UUID_COMPACT_STR_LEN 33
void uuid_unparse_lower_compact(const uuid_t uuid, char *out);
int uuid_parse_compact(const char *in, uuid_t uuid);
int uuid_parse_flexi(const char *in, uuid_t uuid);
static inline int uuid_memcmp(const uuid_t *uu1, const uuid_t *uu2) {
return memcmp(uu1, uu2, sizeof(uuid_t));
}
static inline int hex_char_to_int(char c) {
if (c >= '0' && c <= '9') return c - '0';
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
return -1; // Invalid hexadecimal character
}
#endif //NETDATA_UUID_H
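
For reference, a small self-contained caller sketch (the libnetdata umbrella include is an assumption; all other identifiers come from the header above) showing that uuid_parse_flexi() accepts the same value with and without hyphens:

```c
#include <stdio.h>
#include "libnetdata/libnetdata.h" // assumed umbrella include that pulls in uuid.h

int main(void) {
    // the same UUID in canonical and compact form
    const char *forms[] = {
        "6e2e3839-0676-4896-8b64-6045dbf28d66",
        "6e2e3839067648968b646045dbf28d66",
    };
    uuid_t uu;
    char compact[UUID_COMPACT_STR_LEN];

    for (int i = 0; i < 2; i++) {
        if (uuid_parse_flexi(forms[i], uu) != 0) {
            fprintf(stderr, "not a valid UUID: %s\n", forms[i]);
            continue;
        }
        uuid_unparse_lower_compact(uu, compact);
        printf("%s -> %s\n", forms[i], compact); // both print the compact form
    }
    return 0;
}
```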

Some files were not shown because too many files have changed in this diff.