mirror of https://github.com/netdata/netdata.git synced 2025-04-14 17:48:37 +00:00
netdata_netdata/collectors/cgroups.plugin/cgroup-network-helper.sh
Costa Tsaousis 3e508c8f95
New logging layer ()
* cleanup of logging - wip

* first working iteration

* add errno annotator

* replace old logging functions with netdata_logger()

* cleanup

* update error_limit

* fix remaining error_limit references

* work on fatal()

* started working on structured logs

* full cleanup

* default logging to files; fix all plugins initialization

* fix formatting of numbers

* cleanup and reorg

* fix coverity issues

* cleanup obsolete code

* fix formatting of numbers

* fix log rotation

* fix for older systems

* add detection of systemd journal via stderr

* finished on access.log

* remove left-over transport

* do not add empty fields to the logs

* journal get compact uuids; X-Transaction-ID header is added in web responses

* allow compiling on systems without memfd sealing

* added libnetdata/uuid directory

* move datetime formatters to libnetdata

* add missing files

* link the makefiles in libnetdata

* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now reads X-Transaction-ID and uses it for functions and web responses

* added stream receiver, sender, proc plugin and pluginsd log stack

* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup

* add message ids to streaming inbound and outbound connections

* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled

* send SIGABRT to external plugins only if we are not shutting down

* fix cross cleanup in pluginsd parser

* fatal when there is a stack error in logs

* compile netdata with -fexceptions

* do not kill external plugins with SIGABRT

* metasync info logs to debug level

* added severity to logs

* added json output; added options per log output; added documentation; fixed issues mentioned

* allow memfd only on linux

* moved journal low level functions to journal.c/h

* move health logs to daemon.log with proper priorities

* fixed a couple of bugs; health log in journal

* updated docs

* systemd-cat-native command to push structured logs to journal from the command line

* fix makefiles

* restored NETDATA_LOG_SEVERITY_LEVEL

* fix makefiles

* systemd-cat-native can also work as the logger of Netdata scripts

* do not require a socket to systemd-journal to log-as-netdata

* alarm notify logs in native format

* properly compare log ids

* fatals log alerts; alarm-notify.sh working

* fix overflow warning

* alarm-notify.sh now logs the request (command line)

* annotate external plugins logs with the function cmd they run

* added context, component and type to alarm-notify.sh; shell sanitization removes control characters and characters that may be expanded by bash

* reformatted alarm-notify logs

* unify cgroup-network-helper.sh

* added quotes around params

* charts.d.plugin switched logging to journal native

* quotes for logfmt

* unify the status codes of streaming receivers and senders

* alarm-notify: dont log anything, if there is nothing to do

* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notification methods are needed but are not available

* migrate cgroup-name.sh to new logging

* systemd-cat-native now supports messages with newlines

* socket.c logs use priority

* cleanup log field types

* inherit the systemd set INVOCATION_ID if found

* allow systemd-cat-native to send messages to a systemd-journal-remote URL

* log2journal command that can convert structured logs to journal export format

* various fixes and documentation of log2journal

* updated log2journal docs

* updated log2journal docs

* updated documentation of fields

* allow compiling without libcurl

* do not use socket as format string

* added version information to newly added tools

* updated documentation and help messages

* fix the namespace socket path

* print errno with error

* do not timeout

* updated docs

* updated docs

* updated docs

* log2journal updated docs and params

* when talking to a remote journal, systemd-cat-native batches the messages

* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote

* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"

This reverts commit b079d53c11.

* note about uncompressed traffic

* log2journal: code reorg and cleanup to make modular

* finished rewriting log2journal

* more comments

* rewriting rules support

* increased limits

* updated docs

* updated docs

* fix old log call

* use journal only when stderr is connected to journal

* update netdata.spec for libcurl, libpcre2 and log2journal

* pcre2-devel

* do not require pcre2 in centos < 8, amazonlinux < 2023, open suse

* log2journal only on systems pcre2 is available

* ignore log2journal in .gitignore

* avoid log2journal on centos 7, amazonlinux 2 and opensuse

* add pcre2-8 to static build

* undo last commit

* Bundle to static

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add build deps for deb packages

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add dependencies; build from source

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Test build for amazon linux and centos expect to fail for suse

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* fix minor oversight

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Reorg code

* Add the install from source (deps) as a TODO
* Not enable the build on suse ecosystem

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-11-22 10:27:25 +02:00

375 lines
10 KiB
Bash
Executable file

#!/usr/bin/env bash
# shellcheck disable=SC1117
# cgroup-network-helper.sh
# detect container and virtual machine interfaces
#
# (C) 2017 Costa Tsaousis
# SPDX-License-Identifier: GPL-3.0-or-later
#
# This script is called as root (by cgroup-network), with either a pid or a cgroup path.
# It tries to find all the network interfaces that belong to the same cgroup.
#
# It supports several methods for this detection:
#
# 1. cgroup-network (the parent binary of this script) detects veth network interfaces
#    by examining iflink and ifindex IDs and switching namespaces
#    (it also detects the interface name as it is used by the container).
#
# 2. this script uses /proc/PID/fdinfo to find tun/tap network interfaces.
#
# 3. this script calls virsh to find libvirt network interfaces.
#
# 4. this script uses lsns and 'ip link' to find interfaces by their link-netnsid.
#
# -----------------------------------------------------------------------------
# the system path is cleared by cgroup-network
# shellcheck source=/dev/null
[ -f /etc/profile ] && source /etc/profile
export LC_ALL=C
cmd_line="'${0}' $(printf "'%s' " "${@}")"
# -----------------------------------------------------------------------------
# logging
PROGRAM_NAME="$(basename "${0}")"
# these should be the same as the syslog() priorities
NDLP_EMERG=0 # system is unusable
NDLP_ALERT=1 # action must be taken immediately
NDLP_CRIT=2 # critical conditions
NDLP_ERR=3 # error conditions
NDLP_WARN=4 # warning conditions
NDLP_NOTICE=5 # normal but significant condition
NDLP_INFO=6 # informational
NDLP_DEBUG=7 # debug-level messages
# the max (numerically) log level we will log
LOG_LEVEL=$NDLP_INFO
set_log_min_priority() {
  case "${NETDATA_LOG_PRIORITY_LEVEL,,}" in
    "emerg" | "emergency")
      LOG_LEVEL=$NDLP_EMERG
      ;;
    "alert")
      LOG_LEVEL=$NDLP_ALERT
      ;;
    "crit" | "critical")
      LOG_LEVEL=$NDLP_CRIT
      ;;
    "err" | "error")
      LOG_LEVEL=$NDLP_ERR
      ;;
    "warn" | "warning")
      LOG_LEVEL=$NDLP_WARN
      ;;
    "notice")
      LOG_LEVEL=$NDLP_NOTICE
      ;;
    "info")
      LOG_LEVEL=$NDLP_INFO
      ;;
    "debug")
      LOG_LEVEL=$NDLP_DEBUG
      ;;
  esac
}
set_log_min_priority
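# Example of the mapping above (a sketch; the variable is presumably set by the caller):
#   NETDATA_LOG_PRIORITY_LEVEL=warning  -> LOG_LEVEL=4, so info() and debug() messages are dropped
#   NETDATA_LOG_PRIORITY_LEVEL=DEBUG    -> LOG_LEVEL=7, because the value is lowercased before matching
#   unset or unrecognized value         -> LOG_LEVEL keeps its default, NDLP_INFO (6)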
log() {
  local level="${1}"
  shift 1
  # drop messages that are less severe (numerically higher) than LOG_LEVEL
  [[ -n "$level" && -n "$LOG_LEVEL" && "$level" -gt "$LOG_LEVEL" ]] && return
  systemd-cat-native --log-as-netdata --newline="{NEWLINE}" <<EOFLOG
INVOCATION_ID=${NETDATA_INVOCATION_ID}
SYSLOG_IDENTIFIER=${PROGRAM_NAME}
PRIORITY=${level}
THREAD_TAG="cgroup-network-helper.sh"
ND_LOG_SOURCE=collector
ND_REQUEST=${cmd_line}
MESSAGE=${*//[$'\r\n']/{NEWLINE}}

EOFLOG
  # AN EMPTY LINE IS NEEDED ABOVE
}
info() {
  log "$NDLP_INFO" "${@}"
}
warning() {
  log "$NDLP_WARN" "${@}"
}
error() {
  log "$NDLP_ERR" "${@}"
}
fatal() {
  log "$NDLP_ALERT" "${@}"
  exit 1
}
debug() {
  log "$NDLP_DEBUG" "${@}"
}
debug=0
if [ "${NETDATA_CGROUP_NETWORK_HELPER_DEBUG-0}" = "1" ]; then
  debug=1
  LOG_LEVEL=$NDLP_DEBUG
fi
# -----------------------------------------------------------------------------
# check for BASH v4+ (required for associative arrays)
if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
  echo >&2 "BASH version 4 or later is required (this is ${BASH_VERSION})."
  exit 1
fi
# -----------------------------------------------------------------------------
# parse the arguments
pid=
cgroup=
while [ -n "${1}" ]
do
  case "${1}" in
    --cgroup) cgroup="${2}"; shift 1;;
    --pid|-p) pid="${2}"; shift 1;;
    --debug|debug)
      debug=1
      LOG_LEVEL=$NDLP_DEBUG
      ;;
    *) fatal "Cannot understand argument '${1}'";;
  esac
  shift
done
if [ -z "${pid}" ] && [ -z "${cgroup}" ]
then
  fatal "Either --pid or --cgroup is required"
fi
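# Example invocations (a sketch; the pid and the cgroup path below are hypothetical,
# and the script expects to be run as root):
#   cgroup-network-helper.sh --pid 1234
#   cgroup-network-helper.sh --cgroup /sys/fs/cgroup/machine.slice/example.scope --debug
#   NETDATA_CGROUP_NETWORK_HELPER_DEBUG=1 cgroup-network-helper.sh -p 1234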
# -----------------------------------------------------------------------------
set_source() {
  [ ${debug} -eq 1 ] && echo "SRC ${*}"
}
# -----------------------------------------------------------------------------
# veth interfaces via cgroup
# cgroup-network can detect veth interfaces by itself (it is written in C).
# If you are looking for a shell version of what it does, see:
# https://github.com/netdata/netdata/issues/474#issuecomment-317866709
# -----------------------------------------------------------------------------
# tun/tap interfaces via /proc/PID/fdinfo
# find any tun/tap devices linked to a pid
proc_pid_fdinfo_iff() {
  local p="${1}" # the pid
  debug "Searching for tun/tap interfaces for pid ${p}..."
  set_source "fdinfo"
  grep "^iff:.*" "${NETDATA_HOST_PREFIX}/proc/${p}/fdinfo"/* 2>/dev/null | cut -f 2
}
find_tun_tap_interfaces_for_cgroup() {
  local c="${1}" # the cgroup path
  [ -d "${c}/emulator" ] && c="${c}/emulator" # check for 'emulator' subdirectory
  c="${c}/cgroup.procs" # make full path
  # for each pid of the cgroup
  # find any tun/tap devices linked to the pid
  if [ -f "${c}" ]
  then
    local p
    for p in $(< "${c}" )
    do
      proc_pid_fdinfo_iff "${p}"
    done
  else
    debug "Cannot find file '${c}', not searching for tun/tap interfaces."
  fi
}
# -----------------------------------------------------------------------------
# virsh domain network interfaces
virsh_cgroup_to_domain_name() {
  local c="${1}" # the cgroup path
  debug "extracting a possible virsh domain from cgroup ${c}..."
  # extract the domain name from the cgroup path
  sed -n -e "s|.*/machine-qemu\\\\x2d[0-9]\+\\\\x2d\(.*\)\.scope$|\1|p" \
         -e "s|.*/machine/qemu-[0-9]\+-\(.*\)\.libvirt-qemu$|\1|p" \
         -e "s|.*/machine/\(.*\)\.libvirt-qemu$|\1|p" \
         <<EOF
${c}
EOF
}
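# Examples of the extraction above (a sketch with hypothetical cgroup paths;
# only the trailing part of the path matters to the patterns):
#   .../machine-qemu\x2d1\x2dvm01.scope    -> vm01
#   .../machine/qemu-1-vm01.libvirt-qemu   -> vm01
#   .../machine/vm01.libvirt-qemu          -> vm01
# A path that matches none of the patterns produces no output.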
virsh_find_all_interfaces_for_cgroup() {
  local c="${1}" # the cgroup path
  # the virsh command
  local virsh
  # shellcheck disable=SC2230
  virsh="$(which virsh 2>/dev/null || command -v virsh 2>/dev/null)"
  if [ -n "${virsh}" ]
  then
    local d
    d="$(virsh_cgroup_to_domain_name "${c}")"
    # convert hex to character
    # e.g.: vm01\x2dweb => vm01-web (https://github.com/netdata/netdata/issues/11088#issuecomment-832618149)
    d="$(printf '%b' "${d}")"
    if [ -n "${d}" ]
    then
      debug "running: virsh domiflist ${d}; to find the network interfaces"
      # 'virsh -r domiflist <domain>' example output
      # Interface  Type       Source     Model       MAC
      #--------------------------------------------------------------
      # vnet3      bridge     br0        virtio      52:54:00:xx:xx:xx
      # vnet4      network    default    virtio      52:54:00:yy:yy:yy
      # match only 'network' and 'bridge' interfaces from virsh output
      set_source "virsh"
      "${virsh}" -r domiflist "${d}" |\
        sed -n \
          -e "s|^[[:space:]]\?\([^[:space:]]\+\)[[:space:]]\+network[[:space:]]\+\([^[:space:]]\+\)[[:space:]]\+[^[:space:]]\+[[:space:]]\+[^[:space:]]\+$|\1 \1_\2|p" \
          -e "s|^[[:space:]]\?\([^[:space:]]\+\)[[:space:]]\+bridge[[:space:]]\+\([^[:space:]]\+\)[[:space:]]\+[^[:space:]]\+[[:space:]]\+[^[:space:]]\+$|\1 \1_\2|p"
    else
      debug "no virsh domain extracted from cgroup ${c}"
    fi
  else
    debug "virsh command is not available"
  fi
}
# -----------------------------------------------------------------------------
# netnsid detected interfaces
netnsid_find_all_interfaces_for_pid() {
  local pid="${1}"
  [ -z "${pid}" ] && return 1
  local nsid
  nsid=$(lsns -t net -p "${pid}" -o NETNSID -nr 2>/dev/null)
  if [ -z "${nsid}" ] || [ "${nsid}" = "unassigned" ]; then
    return 1
  fi
  set_source "netnsid"
  ip link show |\
    grep -B 1 -E " link-netnsid ${nsid}($| )" |\
    sed -n -e "s|^[[:space:]]*[0-9]\+:[[:space:]]\+\([A-Za-z0-9_]\+\)\(@[A-Za-z0-9_]\+\)*:[[:space:]].*$|\1|p"
}
netnsid_find_all_interfaces_for_cgroup() {
  local c="${1}" # the cgroup path
  if [ -f "${c}/cgroup.procs" ]; then
    netnsid_find_all_interfaces_for_pid "$(head -n 1 "${c}/cgroup.procs" 2>/dev/null)"
  else
    debug "Cannot find file '${c}/cgroup.procs', not searching for netnsid interfaces."
  fi
}
# -----------------------------------------------------------------------------
find_all_interfaces_of_pid_or_cgroup() {
  local p="${1}" c="${2}" # the pid and the cgroup path
  # test the function's own argument, not the global ${pid}
  if [ -n "${p}" ]
  then
    # we have been called with a pid
    proc_pid_fdinfo_iff "${p}"
    netnsid_find_all_interfaces_for_pid "${p}"
  elif [ -n "${c}" ]
  then
    # we have been called with a cgroup
    info "searching for network interfaces of cgroup '${c}'"
    find_tun_tap_interfaces_for_cgroup "${c}"
    virsh_find_all_interfaces_for_cgroup "${c}"
    netnsid_find_all_interfaces_for_cgroup "${c}"
  else
    error "Either a pid or a cgroup path is needed"
    return 1
  fi
  return 0
}
# -----------------------------------------------------------------------------
# an associative array to store the interfaces
# the index is the interface name as seen by the host
# the value is the interface name as seen by the guest / container
declare -A devs=()
# store all interfaces found in the associative array
# this will also give the unique devices, as seen by the host
last_src=
# shellcheck disable=SC2162
while read host_device guest_device
do
  [ -z "${host_device}" ] && continue
  [ "${host_device}" = "SRC" ] && last_src="${guest_device}" && continue
  # the default guest_device is the host_device
  [ -z "${guest_device}" ] && guest_device="${host_device}"
  # when we run in debug, show the source
  debug "Found host device '${host_device}', guest device '${guest_device}', detected via '${last_src}'"
  if [ -z "${devs[${host_device}]}" ] || [ "${devs[${host_device}]}" = "${host_device}" ]; then
    devs[${host_device}]="${guest_device}"
  fi
done < <( find_all_interfaces_of_pid_or_cgroup "${pid}" "${cgroup}" )
# print the interfaces found, in the format netdata expects them
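# each output line is "<host interface> <guest interface>"; for example, the
# 'bridge' row of the virsh sample output shown earlier would be printed as:
#   vnet3 vnet3_br0
# (a sketch; the order of the lines is not guaranteed, because the keys of an
# associative array are iterated in no particular order)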
found=0
for x in "${!devs[@]}"
do
  found=$((found + 1))
  echo "${x} ${devs[${x}]}"
done
debug "found ${found} network interfaces for pid '${pid}', cgroup '${cgroup}', run as ${USER}, ${UID}"
# let netdata know if we found any
[ ${found} -eq 0 ] && exit 1
exit 0