
Install Netdata on a Kubernetes cluster
This document details how to install Netdata on an existing Kubernetes (k8s) cluster. By following these directions, you will use Netdata's Helm chart to bootstrap a Netdata deployment on your cluster. The Helm chart installs one parent pod for storing metrics and managing alarm notifications plus an additional child pod for every node in the cluster.
Each child pod collects metrics from the node it runs on and, via service discovery, from compatible applications and any endpoints covered by our generic Prometheus collector. Each child pod also collects cgroups, Kubelet, and kube-proxy metrics from its node.
To install Netdata on a Kubernetes cluster, you need:
- A working cluster running Kubernetes v1.9 or newer.
- The kubectl command line tool, within one minor version difference of your cluster, on an administrative system.
- The Helm package manager v3.0.0 or newer on the same administrative system.
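The Helm requirement above can be checked from the administrative system before you start. The following is a minimal sketch, not part of the Netdata tooling: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD versions do) and the `v3.x.y+g<commit>` output format that Helm 3 prints for `helm version --short`.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort) to order the two strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query Helm if it is installed on this system.
if command -v helm >/dev/null 2>&1; then
  # Strip the leading "v" and the "+g<commit>" suffix from the short version.
  helm_version=$(helm version --short | sed 's/^v//; s/+.*//')
  if version_ge "$helm_version" 3.0.0; then
    echo "helm $helm_version: OK"
  else
    echo "helm $helm_version: upgrade to v3.0.0 or newer"
  fi
else
  echo "helm not found on this system"
fi
```

The kubectl requirement is relative to your cluster's version, so compare the client and server versions reported by `kubectl version` by eye rather than against a fixed number.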
The default configuration creates one parent pod, installed on one of your cluster's nodes, and a DaemonSet for additional child pods. This DaemonSet ensures that every node in your k8s cluster also runs a child pod, including the node that also runs the parent pod. The child pods collect metrics and stream the information to the parent pod, which uses two persistent volumes to store metrics and alarms. The parent pod also handles alarm notifications and enables the Netdata dashboard using an ingress controller.
Install the Netdata Helm chart
We recommend you install the Helm chart using our Helm repository. In the helm install command, replace the first netdata with the release name of your choice.
helm repo add netdata https://netdata.github.io/helmchart/
helm install netdata netdata/netdata
You can also install the Netdata Helm chart by cloning the repository and manually running Helm against the included chart.
Post-installation
Run kubectl get services and kubectl get pods to confirm that your cluster now runs a netdata service, one parent pod, and one child pod per node (for example, three child pods on a three-node cluster).
You've now installed Netdata on your Kubernetes cluster. See how to access the Netdata dashboard to confirm it's working as expected, or see the next section to configure the Helm chart to suit your cluster's particular setup.
Configure the Netdata Helm chart
Read up on the various configuration options in the Helm chart documentation to see if you need to change any of the options based on your cluster's setup.
To change a setting, use the --set or --values arguments with helm install for the initial deployment, or with helm upgrade to change an existing deployment.
helm install --set a.b.c=xyz netdata netdata/netdata
helm upgrade --set a.b.c=xyz netdata netdata/netdata
For example, to change the size of the persistent metrics volume on the parent pod:
helm install --set parent.database.volumesize=4Gi netdata netdata/netdata
helm upgrade --set parent.database.volumesize=4Gi netdata netdata/netdata
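When you need to change more than one or two settings, the --values argument may be easier to manage than a long list of --set flags, because Helm reads the same keys from a YAML file as nested mappings. A minimal sketch, assuming an override file named `override.yml` (the filename is an arbitrary choice, not something the chart requires):

```shell
# Write the override file. Each dotted --set key becomes one level of
# YAML nesting: parent.database.volumesize=4Gi turns into the mapping below.
cat > override.yml <<'EOF'
parent:
  database:
    volumesize: 4Gi
EOF

# Then pass the file to Helm instead of --set (run this against your cluster):
#   helm upgrade --values override.yml netdata netdata/netdata
```

Keeping the file under version control gives you a record of how the deployment differs from the chart defaults.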
Configure service discovery
As mentioned in the introduction, Netdata has a service discovery plugin to identify compatible pods and collect metrics from the service they run. The Netdata Helm chart installs this service discovery plugin into your k8s cluster.
Service discovery scans your cluster for pods exposed on certain ports and with certain image names. By default, it looks for its supported services on the ports they most commonly listen on and under their default image names. Service discovery currently supports popular applications, plus any endpoints covered by our generic Prometheus collector.
If you haven't changed listening ports, image names, or other defaults, service discovery should find your pods, create the proper configuration based on the service each pod runs, and begin monitoring them immediately after deployment.
However, if you have changed some of these defaults, you need to copy a file from the Netdata Helm chart repository, make your edits, and pass the changed file to helm install/helm upgrade.
First, copy the file to your administrative system.
curl https://raw.githubusercontent.com/netdata/helmchart/master/charts/netdata/sdconfig/child.yml -o child.yml
Edit the new child.yml file according to your needs. See the Helm chart configuration and the file itself for details.
You can then run helm install/helm upgrade with the --set-file argument to use your configured child.yml file instead of the default, changing the path if you copied it elsewhere.
helm install --set-file sd.child.configmap.from.value=./child.yml netdata netdata/netdata
helm upgrade --set-file sd.child.configmap.from.value=./child.yml netdata netdata/netdata
Your configured service discovery is now pushed to your cluster.
Access the Netdata dashboard
Accessing the Netdata dashboard itself depends on how you set up your k8s cluster and the Netdata Helm chart. If you installed the Helm chart with the default service.type=ClusterIP, you will need to forward a port to the parent pod.
kubectl port-forward netdata-parent-0 19999:19999
Because kubectl port-forward listens on localhost by default, you can now access the dashboard at http://localhost:19999 from the system where you ran the command.
If you set up the Netdata Helm chart with service.type=LoadBalancer, you can find the external IP for the load balancer with kubectl get services, under the EXTERNAL-IP column.
kubectl get services
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
cockroachdb          ClusterIP      None             <none>        26257/TCP,8080/TCP   46h
cockroachdb-public   ClusterIP      10.245.148.233   <none>        26257/TCP,8080/TCP   46h
kubernetes           ClusterIP      10.245.0.1       <none>        443/TCP              47h
netdata              LoadBalancer   10.245.160.131   203.0.113.0   19999:32231/TCP      74m
In the above example, access the dashboard by navigating to http://203.0.113.0:19999.
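If you want to pick the address out of that table in a script, a small filter over the kubectl get services output is enough. This is a sketch only: `external_ip` is a hypothetical helper name, and it assumes the default column layout shown above, where EXTERNAL-IP is the fourth whitespace-separated column.

```shell
# Hypothetical helper: read `kubectl get services` output on stdin and
# print the EXTERNAL-IP column (field 4) for the service named in $1.
external_ip() {
  awk -v svc="$1" '$1 == svc { print $4 }'
}

# Usage against a live cluster:
#   kubectl get services | external_ip netdata
```

Services without a load balancer report `<none>` in this column, and some providers take a while to assign an address, so check the value before building a URL from it.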
Claim a Kubernetes cluster's parent pod
You can claim a cluster's parent Netdata pod to see its real-time metrics alongside any other nodes you monitor using Netdata Cloud.
Netdata Cloud does not currently support claiming child nodes because the Helm chart does not allocate a persistent volume for them.
Ensure persistence is enabled on the parent pod by running the following helm upgrade command.
helm upgrade \
--set parent.database.persistence=true \
--set parent.alarms.persistence=true \
netdata netdata/netdata
Next, find your claiming script in Netdata Cloud by clicking on your Space's dropdown, then Manage your Space. Click the Nodes tab. Netdata Cloud shows a script similar to the following:
sudo netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud
You will need the values of TOKEN and ROOM1,ROOM2 for the command, which sets parent.claiming.enabled, parent.claiming.token, and parent.claiming.rooms to complete the parent pod claiming process.
Run the following helm upgrade command after replacing TOKEN and ROOM1,ROOM2 with the values found in the claiming script from Netdata Cloud. The quotation marks are required. If you list more than one room, escape each comma with a backslash, because Helm otherwise treats a comma in a --set value as a separator.
helm upgrade \
--set parent.claiming.enabled=true \
--set parent.claiming.token="TOKEN" \
--set parent.claiming.rooms="ROOM1\,ROOM2" \
netdata netdata/netdata
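To avoid copying the token and rooms by hand, you can derive the helm upgrade command from the claiming line that Netdata Cloud shows. The sketch below is illustrative only: `claim_to_helm` is a hypothetical helper, it assumes the `-token=` and `-rooms=` arguments appear exactly as in the sample script above, and it prints the command for review rather than running it. It escapes commas in the room list because Helm splits --set values on unescaped commas.

```shell
# Hypothetical helper: print the helm upgrade command that matches a
# claiming line copied from Netdata Cloud. Nothing is executed.
claim_to_helm() {
  line="$1"
  # Pull out the values that follow -token= and -rooms=.
  token=$(printf '%s\n' "$line" | sed -n 's/.*-token=\([^ ]*\).*/\1/p')
  rooms=$(printf '%s\n' "$line" | sed -n 's/.*-rooms=\([^ ]*\).*/\1/p')
  # Escape commas so Helm keeps the room list as a single value.
  rooms=$(printf '%s\n' "$rooms" | sed 's/,/\\,/g')
  printf 'helm upgrade --set parent.claiming.enabled=true --set parent.claiming.token="%s" --set parent.claiming.rooms="%s" netdata netdata/netdata\n' "$token" "$rooms"
}

# Usage: paste the line from Netdata Cloud inside single quotes.
claim_to_helm 'sudo netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud'
```

Review the printed command before running it, since the token grants the ability to claim nodes into your Space.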
The cluster terminates the old parent pod and creates a new one with the proper claiming configuration. You can see your parent pod in Netdata Cloud after a few moments. You can now build new dashboards using the parent pod's metrics or run Metric Correlations to troubleshoot anomalies.
Update/reinstall the Netdata Helm chart
If you update the Helm chart's configuration, run helm upgrade to redeploy your Netdata service, replacing netdata with the name of the release, if you changed it upon installation:
helm upgrade netdata netdata/netdata
What's next?
Read the monitoring a Kubernetes cluster guide for details on the various metrics and charts created by the Helm chart and some best practices on real-time troubleshooting using Netdata.
Check out our infrastructure for details about additional k8s monitoring features, and learn more about configuring the Netdata Agent to better understand the settings you might be interested in changing.
To further configure Netdata for your cluster, see our Helm chart repository and the service discovery repository.