Netdata Cloud On-Prem Troubleshooting
Netdata Cloud On-Prem is an enterprise-grade monitoring solution that relies on several infrastructure components:
- Databases: PostgreSQL, Redis, Elasticsearch
- Message Brokers: Pulsar, EMQX
- Traffic Controllers: Ingress, Traefik
- Kubernetes Cluster
These components should be monitored and managed according to your organization's established practices and requirements.
Common Issues
Slow Chart Loading or Chart Errors
When charts load slowly or fail with errors, the cause is usually slow or failed data collection. The charts
service must gather data from multiple Agents within a Room, and a query succeeds only when every queried Agent responds.
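The effect of a single slow Agent can be illustrated with a small simulation. This is a minimal sketch, not the charts service's actual implementation: the Agent names, latencies, and the `query_room` and `chart_loads` helpers are all hypothetical. It only shows that when every Agent must respond, the slowest one determines whether the whole query finishes in time.

```python
import asyncio

# Hypothetical per-Agent response times in seconds. These names and numbers
# are illustrative only and do not come from a real deployment.
AGENT_LATENCY = {"parent-1": 0.01, "child-a": 0.02, "child-slow": 0.2}


async def query_agent(name: str, latency: float) -> str:
    """Simulate one Agent answering a chart query after `latency` seconds."""
    await asyncio.sleep(latency)
    return name


async def query_room(latencies: dict[str, float], timeout: float) -> list[str]:
    """Fan out to every Agent and require a response from all of them.

    Raises asyncio.TimeoutError if the combined query exceeds `timeout`,
    which happens as soon as any single Agent is too slow.
    """
    tasks = [query_agent(name, lat) for name, lat in latencies.items()]
    return await asyncio.wait_for(asyncio.gather(*tasks), timeout=timeout)


def chart_loads(latencies: dict[str, float], timeout: float) -> bool:
    """Return True if every simulated Agent answered within the timeout."""
    try:
        asyncio.run(query_room(latencies, timeout))
        return True
    except asyncio.TimeoutError:
        return False
```

In this model, removing `child-slow` from the set of queried Agents (or replacing it with a faster backend such as a Parent node) lets the same query complete within a much tighter timeout.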
Issue | Symptoms | Cause | Solution |
---|---|---|---|
Agent Connectivity | Queries stall or time out; inconsistent chart loading | Slow Agents or unreliable network connections prevent timely data collection | Deploy additional Parent nodes to provide reliable backends. The system automatically prefers them for queries when they are available |
Kubernetes Resources | Service throttling; slow data processing; delayed dashboard updates | Resource saturation at the node level or restrictive container limits | Review and adjust container resource limits and node capacity as needed |
Database Performance | Slow query responses; increased latency across services | PostgreSQL performance bottlenecks | Monitor and optimize database resource utilization: CPU usage, memory allocation, and disk I/O performance |
Message Broker | Delayed node status updates (online/offline/stale); slow alert transitions; dashboard update delays | Message accumulation in Pulsar due to processing bottlenecks | Review the Pulsar configuration, adjust microservice resource allocation, and monitor message processing rates |
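One way to check the "Service throttling" symptom from the Kubernetes Resources row is to look at a container's CPU throttling counters. The sketch below assumes the node uses cgroup v2, where the kernel exposes `nr_periods` and `nr_throttled` in a `cpu.stat` file (typically `/sys/fs/cgroup/cpu.stat` inside the container). The helper names and the sample values are hypothetical, and the ratio at which you act is a judgment call for your environment.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Fraction of CPU enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # No CPU limit set, or no periods recorded yet.
    return stats.get("nr_throttled", 0) / periods


# Made-up sample contents, for illustration only.
SAMPLE_CPU_STAT = """\
usage_usec 5000000
user_usec 3000000
system_usec 2000000
nr_periods 1000
nr_throttled 250
throttled_usec 1200000
"""

ratio = throttled_ratio(parse_cpu_stat(SAMPLE_CPU_STAT))
print(f"throttled in {ratio:.0%} of periods")
```

A persistently high ratio suggests the container's CPU limit is too restrictive for its workload, which is the case where adjusting resource limits is likely to help.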