* Centralize cache path handling.
* Add better debugging support for static build infrastructure.
* Centralize handling of git repository fetching for static builds.
* Centralize build directory handling in the static build.
And quit using a sub-directory of packaging/makeself from the source
tree for it.
* Remove version numbers from static build jobs.
* Better organize static build jobs.
- 0x numbers for non-build prep-work
- 1x numbers for libraries that potentially impact multiple other things
  we vendor
- 2x numbers for libraries that are direct dependencies of Netdata
- 3x numbers for combined libraries and tooling used by Netdata
- 4x numbers for general tooling used by Netdata
- 5x numbers for tooling used by single specific components
- 6x numbers for non-build prep-work for the Netdata build
- 7x numbers for the actual build of Netdata
- 8x numbers for post-build checks
- 9x numbers for the actual packaging
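The numbering scheme above can be read as a lookup on the tens digit of a job's number. A minimal sketch, assuming job file names start with a two-digit prefix (the file name used in the usage note and the `job_category` helper are hypothetical, not taken from the build scripts):

```python
import re

# Categories copied from the list above, keyed by the tens digit of the job number.
JOB_CATEGORIES = {
    0: "non-build prep-work",
    1: "libraries that potentially impact multiple other things we vendor",
    2: "libraries that are direct dependencies of Netdata",
    3: "combined libraries and tooling used by Netdata",
    4: "general tooling used by Netdata",
    5: "tooling used by single specific components",
    6: "non-build prep-work for the Netdata build",
    7: "the actual build of Netdata",
    8: "post-build checks",
    9: "the actual packaging",
}


def job_category(filename: str) -> str:
    """Return the category for a job file whose name starts with two digits.

    The two-digit prefix convention is an assumption made for this sketch.
    """
    match = re.match(r"(\d)\d", filename)
    if match is None:
        raise ValueError(f"no two-digit job prefix in {filename!r}")
    return JOB_CATEGORIES[int(match.group(1))]
```

For example, `job_category("20-example.install.sh")` would fall into the "direct dependencies of Netdata" group, while a name without a numeric prefix raises `ValueError`.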
* Clean up variable handling in Netdata build job.
* Split post-build handling steps to their own jobs.
This will make it easier to see what is actually going on in the build
process.
* Clean up CI messages.
* Split archive creation sub-steps into individual jobs.
* Disable shell tracing for archive creation job.
It’s not needed in 99.9% of cases, and should only be enabled locally if
it is needed.
* Assorted fixes for code restructuring.
* Tidy up paths for runtime check.
* Fix CI handling of artifacts.
* initial implementation of libbacktrace
* in buildinfo show the parameters of libbacktrace
* do not disable libbacktrace if threading is not supported
* Don’t install libbacktrace, only build it.
* Disable libbacktrace for 32-bit ARM builds.
* Make libunwind and libbacktrace mutually exclusive at configure time.
Instead of relying on it being mutually exclusive at build time. This
ensures we don’t waste time on libunwind when using libbacktrace.
* Only use libbacktrace on Linux and Windows
* Work around broken logic in openSUSE rpmbuild.
* Fix handling of libbacktrace for 32-bit ARM static builds.
---------
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
* enable libunwind in static builds
* add libunwind and backtrace to buildinfo
* add libunwind to alpine packages
* add -dev packages
* remove libunwind binary from the packages
* Vendor libunwind in static builds instead of using a copy from the build environment.
This is required to ensure that the C++ exception handling functionality
in libunwind is _disabled_, because it does not play nice with static
linking when using C++ with exception handling support enabled.
* Remove changes from local testing.
* Fix cross architecture builds.
* Disable libunwind on 64-bit POWER builds.
musl libc does not include functions that are required to build
libunwind for this platform, so just disable it there for now.
---------
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
* detect the system ca bundle at runtime
* minor fix
* fix for older libcurl versions
* added X509_get_default_cert_file()
* added validation for the certificates
* moved ssl/curl code to separate file; it now configures both libcurl and openssl; added defaults to libcurl static install
* run the new code only in netdata static builds
* auto to check
* disable runtime ssl checks
* Enforce usage of specific CPU models for static build runtime checks.
* Add explicit architecture overrides for ARMv6l static builds.
* Fix handling of source paths.
* Enable tracing for static build code.
* Fix cflags and version handling.
* Restructure cflags handling and add Go architecture flags.
* Don't use symlinks when preparing static build artifacts.
* Roll back to v4.4.0 for actions/upload-artifact action.
There appears to be a bug in the latest release that is causing some
files to not be found when creating artifacts.
* Bump actions/upload-artifact to v4.4.2 which fixes the bugs.
* split claiming into multiple files; WIP claiming with api
* pidfile is now dynamically allocated
* netdata_exe_path is now dynamically allocated
* remove ENABLE_CLOUD and ENABLE_ACLK
* fix compilation
* remove ENABLE_HTTPS and ENABLE_OPENSSL
* remove the ability to disable cloud
* remove netdata_cloud_enabled variable; split rooms into a json array
* global libcurl initialization
* detect common claiming errors
* more common claiming errors
* finished claiming via API
* same as before
* same as before
* remove the old claiming logic that runs the claim script
* working claim.conf
* cleanup
* fix log message; default proxy is env
* fix log message
* remove netdata-claim.sh from run.sh
* remove netdata-claim.sh from everywhere, except kickstart scripts
* create cloud.d if it does not exist.
* better error handling and logging
* handle proxy disable
* merged master
* fix cmakelists for new files
* left-overs removal
* Include libcurl in required dependencies.
* Fix typo in dependency script.
* Use pkg-config for finding cURL.
This properly handles transitive dependencies, unlike the FindCURL
module.
* netdata installer writes claiming info to /etc/netdata/claim.conf
* remove claim from netdata
* add libcurl to windows packages
* add libcurl to windows packages
* compile-on-windows.sh installs too
* add NODE_ID streaming back to child and INDIRECT cloud status
* log child kill on windows
* fixes for spawn server on windows to ensure we have a valid pid and the process is properly terminated
* better handling to windows processes exit code
* pass the cloud url from parents to children
* add retries and timeout to claiming curl request
* remove FILE * from plugins.d
* spawn-tester to unittest spawning processes communication
* spawn-tester now tests FILE pointer I/O
* external plugins run in posix mode
* set blocking I/O on all pipes
* working spawn server on windows
* latest changes in spawn_popen applied to linux tools
* push environment
* repeated tests of fds
* export variable CYGWIN_BASE_PATH
* renamed to NETDATA_CYGWIN_BASE_PATH
* added cmd and help to adapt the command and the information to be presented to users during claiming
* split spawn server versions into files
* restored spawn server libuv based
* working libuv based spawn server
* fixes in libuv for windows
* working spawn server based on posix_spawn()
* fix fd leaks on all spawn servers
* fixed windows spawn server
* fix signal handling to ensure proper cooperation with libuv
* switched windows to posix_spawn() based spawn server
* improvement on libuv version
* callocz() event loop
* simplification of libuv spawn server
* minor fixes in libuv and spawn tester
* api split into parts and separated by version; introduced /api/v3; no changes to old /api/v1 and /api/v2
* completed APIs splitting
* function renames
* remove dead code
* split basic functions into a directory
* execute external plugins in nofork spawn server with posix_spawn() for improved performance
* reset signals when using posix_spawn()
* fix spawn server logs and log cmdline in posix server
* bearer_get_token() implemented as function
* agent cloud status now exposes parent claim_id in indirect mode
* fixes for node id streaming from parent to children
* extract claimed id to separate file
* claim_id is no longer in host structure; there is a global claim_id for this agent and there are parent and origin claim ids in host structure
* fix issue on older compilers
* implement /api/v3 using calls from v1 and v2
* prevent asan leaks on local-sockets callback
* codacy fixes
* moved claim web api to web/api/v2
* when the agent is offline, prefer indirect connection when available; log a warning when a node changes node id
* improve inheritance of claim id from parent
* claim_id for bearer token should match any of the claim ids known
* aclk_connected replaced with functions
* aclk api can now be limited to node information, implementing [cloud].scope = license manager
* comment out most options in stream.conf so that internal defaults will be applied
* respect negative matches for send charts matching
* hidden functions are not accessible via the API; bearer_get_token function checks the request is coming from Netdata Cloud
* /api/v3/settings API
* added error logs to settings api
* saving and loading of bearer tokens
* Fix parameter when calling send_to_plugin
* Prevent overflow
* expose struct parser and typedef PARSER to enforce strict type checking on send_to_plugin()
* ensure the parser will not go away randomly from the receiver - it is now cleared when the receiver lock is acquired; also ensure the output sockets are set in the parser as long as the parser runs
* Add newline
* Send parent claim id downstream
* do not send anything when nodeid is zero
* code re-organization and cleanup
* add aclk capabilities, nodes summary and api version and protection to /api/v2,3/info
* added /api/v3/me which returns information about the current user
* make /api/v3/info accessible always
* Partially revert "remove netdata-claim.sh from everywhere, except kickstart scripts"
Due to how we handle files in our static builds and local builds, we
actually need to continue installing `netdata-claim.sh` to enable a
seamless transition to the new claiming mechanism without breaking
compatibility with existing installs or existing automation tooling that
is directly invoking the claiming script.
The script itself will be rewritten in a subsequent commit to simply
wrap the new claiming methodology, together with some additional changes
to ensure that a warning is issued if the script is invoked by anything
other than the kickstart script.
* Rewrite claiming script to use new claiming method.
* Revert "netdata installer writes claiming info to /etc/netdata/claim.conf"
Same reasoning as for 2e27bedb3fbf9df523bff407f2e8c8428e350e38.
We need to keep the old claiming support code in the kickstart script
for the foreseeable future so that existing installs can still be
claimed, since the kickstart script is _NOT_ versioned with the agent.
A later commit will add native support for the new claiming method and
use that in preference to the claiming script if it appears to be
available.
* Add support for new claiming method to kickstart.sh.
This adds native support to the kickstart script to use the new claiming
method without depending on the claiming script, as well as adding a few
extra tweaks to the claiming script to enable it to better handle the
transition.
Expected behavior is for the kickstart script to use the new claiming
code path if the claiming script is either not installed, or does not
contain the specific string `%%NEW_CLAIMING_METHOD%%`. This way we will
skip the claiming script on systems which have the updated copy that
uses the new claiming approach, which should keep kickstart behavior
consistent with what Netdata itself supports.
* Depend on JSON-C 0.14 as a minimum supported version.
Needed for uint64 functions.
* Fix claiming option validation in kickstart script.
* do not cache auth in web client
* reuse bearer tokens when the request to create one matches an existing one
* dictionaries dfe loops now allow using return statement
* bearer token files are now fixed for specific agents by having the machine guid of the agent in them
* systemd journal now respects facets and disables the default facets when not given
* fixed commands.c
* restored log for not opening config file
* Fix Netdata group templating for claiming script.
* Warn on failed templating in claiming script.
* Make `--require-cloud` a silent no-op.
We don’t need to warn users that it does nothing; we should just have it
do nothing.
* added debugging info to claiming
* log also the response
* do not send double / at the url
* properly remove keyword from parameters
* disable debug during claiming
* fix log messages
* Update packaging/installer/kickstart.sh
* Update packaging/installer/kickstart.sh
* implemented POST request payload parsing for systemd-journal
* added missing reset of facets in json parsing
* JSON payload does not need hashes any more. I can accept the raw values
---------
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
* Check for supplementary components in libexec during runtime checks.
This should catch issues like what the PR it’s part of is fixing.
* Fix building DEB packages with CPack.
This both disables the logs-management plugin in the builds (which has
never actually worked properly in our packages for multiple reasons)
and fixes a botched merge involving the OS detection in the build system.
* Fix up CI checks.
* Skip building Go components for Docker CI if they have not changed.
* Properly handle Go code in general checks PR.
* Skip Go code in build checks if it hasn’t changed.
* Fix linting issues.
* Fix propagation of installer flags.
* Fix propagation of environment variables through static build process.
* Fix handling of extra install options in static builds.
* Skip starting the agent in updater checks.
* Fix actionlint warning.
To optimize the CI, we are trying to cache build artifacts such as all the software we build and statically bundle for static binaries (for each arch). In a nutshell, these are the artifacts of the job source files in https://github.com/netdata/netdata/tree/master/packaging/makeself/jobs. The https://github.com/netdata/netdata/blob/master/.github/scripts/get-static-cache-key.sh script generates the keys for these cached artifacts, taking into account the source files of the jobs, the versions of the software, and the static packages bundled in the base images. The effort in #16303 to make a centralized file for all the versions widened the problem of the keys not considering the exact versions.
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
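The cache key idea described in the message above (job source files, software versions, and base-image packages all feeding one key) can be sketched as follows. This is only an illustration of the technique and not the contents of `get-static-cache-key.sh`; the function name, its parameters, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def static_cache_key(job_sources, versions, base_packages):
    """Derive a deterministic key from every input that should invalidate the cache.

    job_sources:   mapping of job file name -> file contents (bytes)
    versions:      mapping of component name -> exact version string
    base_packages: iterable of package identifiers bundled in the base image
    All three parameters are hypothetical inputs for this sketch.
    """
    digest = hashlib.sha256()
    # Sort everything so the key does not depend on iteration order.
    for name in sorted(job_sources):
        digest.update(name.encode() + b"\0" + job_sources[name] + b"\0")
    for name in sorted(versions):
        digest.update(f"{name}={versions[name]}\0".encode())
    for package in sorted(base_packages):
        digest.update(f"{package}\0".encode())
    return digest.hexdigest()
```

Because the exact version strings are hashed, bumping a single component version yields a different key even when no job source file changed.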
* cleanup of logging - wip
* first working iteration
* add errno annotator
* replace old logging functions with netdata_logger()
* cleanup
* update error_limit
* fix remaining error_limit references
* work on fatal()
* started working on structured logs
* full cleanup
* default logging to files; fix all plugins initialization
* fix formatting of numbers
* cleanup and reorg
* fix coverity issues
* cleanup obsolete code
* fix formatting of numbers
* fix log rotation
* fix for older systems
* add detection of systemd journal via stderr
* finished on access.log
* remove left-over transport
* do not add empty fields to the logs
* journal get compact uuids; X-Transaction-ID header is added in web responses
* allow compiling on systems without memfd sealing
* added libnetdata/uuid directory
* move datetime formatters to libnetdata
* add missing files
* link the makefiles in libnetdata
* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now reads X-Transaction-ID and uses it for functions and web responses
* added stream receiver, sender, proc plugin and pluginsd log stack
* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup
* add message ids to streaming inbound and outbound connections
* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks are enabled
* send SIGABRT to external plugins only if we are not shutting down
* fix cross cleanup in pluginsd parser
* fatal when there is a stack error in logs
* compile netdata with -fexceptions
* do not kill external plugins with SIGABRT
* metasync info logs to debug level
* added severity to logs
* added json output; added options per log output; added documentation; fixed issues mentioned
* allow memfd only on linux
* moved journal low level functions to journal.c/h
* move health logs to daemon.log with proper priorities
* fixed a couple of bugs; health log in journal
* updated docs
* systemd-cat-native command to push structured logs to journal from the command line
* fix makefiles
* restored NETDATA_LOG_SEVERITY_LEVEL
* fix makefiles
* systemd-cat-native can also work as the logger of Netdata scripts
* do not require a socket to systemd-journal to log-as-netdata
* alarm notify logs in native format
* properly compare log ids
* fatals log alerts; alarm-notify.sh working
* fix overflow warning
* alarm-notify.sh now logs the request (command line)
* annotate external plugins logs with the function cmd they run
* added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash
* reformatted alarm-notify logs
* unify cgroup-network-helper.sh
* added quotes around params
* charts.d.plugin switched logging to journal native
* quotes for logfmt
* unify the status codes of streaming receivers and senders
* alarm-notify: don't log anything, if there is nothing to do
* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notification methods are needed but are not available
* migrate cgroup-name.sh to new logging
* systemd-cat-native now supports messages with newlines
* socket.c logs use priority
* cleanup log field types
* inherit the systemd set INVOCATION_ID if found
* allow systemd-cat-native to send messages to a systemd-journal-remote URL
* log2journal command that can convert structured logs to journal export format
* various fixes and documentation of log2journal
* updated log2journal docs
* updated log2journal docs
* updated documentation of fields
* allow compiling without libcurl
* do not use socket as format string
* added version information to newly added tools
* updated documentation and help messages
* fix the namespace socket path
* print errno with error
* do not timeout
* updated docs
* updated docs
* updated docs
* log2journal updated docs and params
* when talking to a remote journal, systemd-cat-native batches the messages
* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote
* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"
This reverts commit b079d53c11.
* note about uncompressed traffic
* log2journal: code reorg and cleanup to make modular
* finished rewriting log2journal
* more comments
* rewriting rules support
* increased limits
* updated docs
* updated docs
* fix old log call
* use journal only when stderr is connected to journal
* update netdata.spec for libcurl, libpcre2 and log2journal
* pcre2-devel
* do not require pcre2 in centos < 8, amazonlinux < 2023, open suse
* log2journal only on systems where pcre2 is available
* ignore log2journal in .gitignore
* avoid log2journal on centos 7, amazonlinux 2 and opensuse
* add pcre2-8 to static build
* undo last commit
* Bundle to static
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Add build deps for deb packages
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Add dependencies; build from source
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Test build for amazon linux and centos expect to fail for suse
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* fix minor oversight
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Reorg code
* Add the install from source (deps) as a TODO
* Do not enable the build on suse ecosystem
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
---------
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
* Update bundled makeself to v2.5.0.
It includes numerous fixes that should resolve a number of the problems
we’ve seen crop up recently with our static build installation process.
* Update static archive metadata.
* Add the makeself scripts to the shellcheck CI exclude list.
They have numerous warnings, but we intentionally want to stay as close
as possible to being in-sync with the upstream copies, so just ignore
them with shellcheck in CI.
* Silence shellcheck warnings.
* Ensure required directories actually exist in static archive.
Bundle the nfacct dependencies into our Netdata static binaries: libnetfilter_acct as a static lib built from source archives, and libmnl as a static lib from Alpine's packages (regular package manager).
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
* Move systemd-specific system files to their own directory.
* Move non-systemd init scripts to individual subdirectories.
* Move cron files to their own directory.
* Move logrotate config to its own directory.
* Fix typos in Makefile.am.
* Fix Debian package builds.
* Fixed issues reported by @andrewm4894.
* Enable use of Podman for static builds.
This way those of us who have migrated away from Docker can still easily
test static builds.
* Add basic runtime testing to the static build process.
This will help catch basic runtime issues that would not show up just
from building the agent.
* Add basic build caching support to static builds.
Cache is stored in `artifacts/cache/${BUILDARCH}`. Each third-party
component utilizes a separate build cache. Invalidation is only done for
version changes (more rigorous invalidation is expected to be handled
externally).
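The caching rule described above (one cache per component under `artifacts/cache/${BUILDARCH}`, invalidated only when the version changes) can be sketched roughly as below. The `version` marker file and all function names are hypothetical choices for this example; the actual build scripts may record the cached version differently.

```python
from pathlib import Path


def cache_dir(root, build_arch, component):
    """Per-component cache location under artifacts/cache/<BUILDARCH>."""
    return Path(root) / "artifacts" / "cache" / build_arch / component


def is_cache_hit(directory, version):
    """A hit requires a cached build recorded for exactly this version."""
    marker = directory / "version"  # hypothetical marker file
    return marker.is_file() and marker.read_text().strip() == version


def store_cache(directory, version):
    """Record which version the cached build in this directory belongs to."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "version").write_text(version + "\n")
```

Keeping one directory per component means a version bump for one library leaves the caches of the others untouched.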
* Integrate static build caching with CI.
* Fix fping cache handling.
* Test caching in CI.
* Properly skip rebuilds on cache hits.
* Remove static build container when done with it.
* Reuse existing image automatically if it’s for the correct platform.
* Test CI build caching.
* Fix static build job names.
* Add `-pipe` to CFLAGS in most cases for builds.
This trades marginally higher memory usage at build time (on the order
of a few hundred kB in the worst case scenario) for improved build
times by avoiding using temporary files for passing data from the
compiler to commands it invokes.
* Suppress bogus shellcheck warnings.
* Fix handling of CFLAGS in netdata-installer.sh.
- Switch from `-O3` to `-O2` for optimization CFLAGS, bringing us
in-line with all of our other builds. This should help avoid strange
edge cases specific to static builds resulting from the build process.
- Stop stripping binaries in static builds so that we can get proper
symbolic backtraces when the agent crashes in the static builds.
* Add log grouping in installer code when running under GitHub Actions.
This will make our CI logs much easier to understand.
* Add log grouping to static build process.
* Use oneliner style group commands in netdata-installer.sh
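The log grouping mentioned above relies on the GitHub Actions `::group::` and `::endgroup::` workflow commands, emitted only when running under Actions (where the `GITHUB_ACTIONS` environment variable is set to `true`). The installer itself is a shell script; the Python sketch below only illustrates the pattern, and the `log_group` helper is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def log_group(title):
    """Wrap output in a collapsible group when running under GitHub Actions."""
    in_actions = os.environ.get("GITHUB_ACTIONS") == "true"
    if in_actions:
        print(f"::group::{title}", flush=True)
    try:
        yield
    finally:
        # Always close the group, even if the wrapped step fails.
        if in_actions:
            print("::endgroup::", flush=True)
```

Outside of GitHub Actions the helper prints nothing extra, so local runs keep their plain output.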