Mirror of https://github.com/netdata/netdata.git
Synced 2025-04-28 06:32:30 +00:00

On-prem docs edits 2 (#19105)
Co-authored-by: ilyam8 <ilya@netdata.cloud>
Parent: 8dea7e4702
Commit: 9db4982a7b
2 changed files with 31 additions and 46 deletions

docs/netdata-cloud/netdata-cloud-on-prem

@@ -1,8 +1,8 @@

# Netdata Cloud On-Prem PoC without k8s

These instructions describe installing a light version of Netdata Cloud for clients who do not have a Kubernetes cluster installed. This setup is **for demonstration purposes only**, as it has no built-in resiliency against failures of any kind.

## Prerequisites

- Ubuntu 22.04 (a clean installation works best).
- 10 CPU cores and 24 GiB of memory.
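
Before running the installer, the host can be checked against these prerequisites with standard Ubuntu tooling; this is just a convenience sketch, nothing Netdata-specific:

```bash
# Confirm the host meets the PoC prerequisites before running the installer.
lsb_release -d   # expect: Ubuntu 22.04
nproc            # expect: 10 or more CPU cores
free -h          # expect: around 24 GiB of memory or more
```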

@@ -25,9 +25,9 @@ sudo ./provision.sh install \
    -private-key-path ""
```
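
Only the tail of the install command is visible in this excerpt. Based on the flags documented below, a full invocation likely looks something like the following sketch; every value is a placeholder, and any flags not shown in this excerpt (for example, the URL the PoC is exposed on) are omitted:

```bash
# Hypothetical full invocation, reconstructed from the flags documented below.
# All values are placeholders; flags elided from this excerpt are not shown.
sudo ./provision.sh install \
    -key-id "<AWS ECR access key ID>" \
    -access-key "<AWS ECR access key>" \
    -onprem-license-key "<Netdata Cloud On-Prem license key>" \
    -certificate-path "/path/to/certificate.pem" \
    -private-key-path "/path/to/private-key.pem"
```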

The script above is responsible for:

1. Prompting the user to provide:
   - `-key-id` - AWS ECR access key ID.
   - `-access-key` - AWS ECR access key.
   - `-onprem-license-key` - Netdata Cloud On-Prem license key.
   - `-certificate-path` - path to your PEM encoded certificate.
   - `-private-key-path` - path to your PEM encoded key.

2. Installation will begin. The script will install:
   - Helm
   - Kubectl
   - AWS CLI
   - K3s cluster (single node)

3. Once all the required software is installed, the script provisions the K3s cluster with the gathered data.

After cluster provisioning, the PoC Cloud is ready to be used.
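
To sanity-check the provisioned cluster, standard kubectl commands against the K3s kubeconfig can be used; the kubeconfig path below is the K3s default and may differ in your setup:

```bash
# Verify that the single-node K3s cluster and its pods came up.
# /etc/rancher/k3s/k3s.yaml is the default K3s kubeconfig location.
sudo kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes
sudo kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get pods --all-namespaces
```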

> **Warning**
>
> This script will automatically expose not only Netdata but also a mailcatcher under `<URL from point 1.>/mailcatcher`.

## Logging in

Only login by email works without further configuration. Every email this PoC Cloud sends will appear in the mailcatcher, which acts as the SMTP server with a simple GUI for reading the mails.

1. Open the PoC Cloud in your web browser at the URL you specified.
2. Provide an email address and use the button to confirm.
3. Mailcatcher catches all the emails, so go to `<URL from point 1.>/mailcatcher`, find yours, and click the link.
4. You are now logged into the PoC Cloud. Add your first nodes!
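
If you prefer the command line, and assuming the bundled mailcatcher is the standard MailCatcher with its HTTP API reachable under the same path, the caught messages can also be listed with curl (the URL is the same placeholder as above):

```bash
# List messages caught by mailcatcher from the command line
# (assumes the standard MailCatcher HTTP API is exposed under /mailcatcher).
curl -s "https://<URL from point 1.>/mailcatcher/messages" | jq '.[] | {id, subject, recipients}'
```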

## Uninstalling

To uninstall the whole PoC, use the same script that installed it, with the `uninstall` switch.
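
A minimal sketch of that invocation, assuming the script is run from the same directory it was installed from:

```bash
# Remove the whole PoC with the same provisioning script.
sudo ./provision.sh uninstall
```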

@@ -1,37 +1,23 @@

# Netdata Cloud On-Prem Troubleshooting

Netdata Cloud On-Prem is an enterprise-grade monitoring solution that relies on several infrastructure components:

- Databases: PostgreSQL, Redis, Elasticsearch
- Message Brokers: Pulsar, EMQX
- Traffic Controllers: Ingress, Traefik
- Kubernetes Cluster

These components should be monitored and managed according to your organization's established practices and requirements.
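
A quick way to get an overview of those components on the cluster, assuming shell access and a single deployment namespace (the namespace name is a placeholder):

```bash
# High-level view of the workloads backing the On-Prem installation
# (the namespace name is a placeholder for your deployment).
kubectl get pods -n <netdata-cloud-namespace> -o wide
kubectl get ingress,svc -n <netdata-cloud-namespace>
```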

## Common Issues

### Slow Chart Loading or Chart Errors

When charts take a long time to load or fail with errors, the issue typically stems from data collection challenges. The `charts` service must gather data from multiple Agents within a Room, requiring successful responses from all queried Agents.

| Issue | Symptoms | Cause | Solution |
|-------|----------|-------|----------|
| Agent Connectivity | - Queries stall or time out<br/>- Inconsistent chart loading | Slow Agents or unreliable network connections prevent timely data collection | Deploy additional [Parent](/docs/observability-centralization-points/README.md) nodes to provide reliable backends. The system will automatically prefer these for queries when available |
| Kubernetes Resources | - Service throttling<br/>- Slow data processing<br/>- Delayed dashboard updates | Resource saturation at the node level or restrictive container limits | Review and adjust container resource limits and node capacity as needed |
| Database Performance | - Slow query responses<br/>- Increased latency across services | PostgreSQL performance bottlenecks | Monitor and optimize database resource utilization:<br/>- CPU usage<br/>- Memory allocation<br/>- Disk I/O performance |
| Message Broker | - Delayed node status updates (online/offline/stale)<br/>- Slow alert transitions<br/>- Dashboard update delays | Message accumulation in Pulsar due to processing bottlenecks | - Review Pulsar configuration<br/>- Adjust microservice resource allocation<br/>- Monitor message processing rates |
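
The following generic commands can help narrow down which row of the table applies. They are standard kubectl and pulsar-admin invocations, not commands from the On-Prem documentation, and all namespaces, labels, pod names, and topic names are placeholders:

```bash
# Check for node-level pressure and throttled or OOM-killed containers
# (names in angle brackets are placeholders for your deployment).
kubectl top nodes
kubectl top pods -n <netdata-cloud-namespace>
kubectl describe pods -n <netdata-cloud-namespace> -l app=<charts-service-label> | grep -iE 'restart|oom|limit'

# Check whether messages are accumulating (backlog) on a Pulsar topic.
kubectl exec -n <pulsar-namespace> <pulsar-broker-pod> -- \
    bin/pulsar-admin topics stats <tenant>/<namespace>/<topic>
```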