Introduction
Production Kubernetes clusters need robust monitoring. Prometheus scrapes and stores metrics, and Grafana visualizes them, giving you visibility into your cluster’s health and performance.
Architecture
Kubernetes Cluster
↓
Prometheus (Metrics Collection)
↓
Grafana (Visualization)
↓
Dashboards & Alerts
Installing with Helm
Add the Prometheus Community Helm repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Install kube-prometheus-stack:
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
Verify installation:
kubectl get pods -n monitoring
kubectl get svc -n monitoring
Accessing Grafana
Forward a local port to the Grafana service (keep the command running while you use Grafana):
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
Default credentials:
- Username: admin
- Password: retrieve it with:
kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
Access: http://localhost:3000
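The kubectl pipeline above is needed because Kubernetes stores Secret values base64-encoded under .data. As a minimal sketch, the same decoding step in Python, assuming the Secret was exported as JSON (for example with -o json). The helper name decode_secret_field and the sample password are made up for illustration.

```python
import base64
import json


def decode_secret_field(secret_json: str, key: str) -> str:
    """Return the decoded value of one key from a Secret's .data map."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Hypothetical sample shaped like a Secret exported as JSON; not real credentials.
sample = json.dumps({
    "kind": "Secret",
    "data": {
        "admin-user": base64.b64encode(b"admin").decode("ascii"),
        "admin-password": base64.b64encode(b"example-password").decode("ascii"),
    },
})

print(decode_secret_field(sample, "admin-password"))
```

Base64 is an encoding, not encryption, so anyone who can read the Secret can read the password.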
Key Metrics to Monitor
Cluster Level:
- Node CPU/Memory usage
- Pod count
- Namespace resource usage
- API server latency
Application Level:
- Request rate
- Error rate
- Response time
- Resource consumption
Pre-built Dashboards
The kube-prometheus-stack chart provisions Grafana with dashboards for:
- Kubernetes Cluster Monitoring
- Node Exporter
- Pod Metrics
- Persistent Volumes
- API Server
Custom Metrics
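A ServiceMonitor tells the Prometheus Operator which Services to scrape. For a custom app:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: monitoring
  labels:
    # With the chart's default settings, Prometheus only picks up
    # ServiceMonitors that carry the Helm release label.
    release: prometheus
spec:
  # Without a namespaceSelector, only Services in this namespace are matched.
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics  # name of the Service port that serves metrics
      interval: 30s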
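To make the selection logic concrete, here is a simplified model of how matchLabels and the named port decide which Services are scraped. It is a sketch, not the operator's implementation; the function names and the sample Services are hypothetical.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """True when every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def selected_services(match_labels: dict, port_name: str, services: list) -> list:
    """Names of Services whose labels match and that expose the named port."""
    return [
        svc["name"]
        for svc in services
        if selector_matches(match_labels, svc["labels"])
        and port_name in svc["ports"]
    ]


# Hypothetical Services in the same namespace as the ServiceMonitor.
services = [
    {"name": "myapp", "labels": {"app": "myapp", "tier": "web"}, "ports": ["http", "metrics"]},
    {"name": "myapp-legacy", "labels": {"app": "myapp"}, "ports": ["http"]},
    {"name": "other", "labels": {"app": "other"}, "ports": ["metrics"]},
]

print(selected_services({"app": "myapp"}, "metrics", services))
```

In this model, extra labels on a Service do not prevent a match, while a Service without a port named metrics yields no scrape target.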
Alerting
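A PrometheusRule defines alerting rules. This one fires when memory usage stays above 1e9 bytes (about 1 GB) for 5 minutes. Note that the expression matches every series of that metric, not only those belonging to myapp:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    # With the chart's default settings, only rules carrying the Helm release label are loaded.
    release: prometheus
spec:
  groups:
    - name: myapp
      rules:
        - alert: HighPodMemory
          expr: container_memory_usage_bytes > 1e9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage"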
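The for: 5m clause means the alert does not fire the moment the threshold is crossed. It first becomes pending and only fires once the condition has held across evaluations for the whole duration. The following sketch models that state machine for a single series, assuming a hypothetical 60-second evaluation interval; alert_states is an illustrative helper, not Prometheus code.

```python
def alert_states(values, threshold, for_seconds, eval_interval):
    """Return the alert state at each evaluation of one series."""
    states = []
    active_since = None  # time at which the condition first became true
    for i, value in enumerate(values):
        now = i * eval_interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            active_since = None
            states.append("inactive")
    return states


# Made-up memory readings in bytes, one per evaluation.
readings = [0.5e9] + [1.2e9] * 6 + [0.8e9, 1.1e9]
print(alert_states(readings, threshold=1e9, for_seconds=300, eval_interval=60))
```

A single dip below the threshold restarts the clock, which is why short spikes do not page anyone.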
Production Configuration
values.yaml for production:
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  persistence:
    enabled: true
    size: 10Gi
  # Example value only: replace it, and keep real passwords out of version control.
  adminPassword: SecurePassword123!
alertmanager:
  enabled: true
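Apply the custom values. If the release from the earlier step already exists, a second helm install with the same name fails, so use helm upgrade --install, which covers both a fresh install and an upgrade:
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
  -f values.yaml \
  --namespace monitoring \
  --create-namespace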
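Whether a 50Gi volume is enough for 30 days of retention depends on how many samples Prometheus ingests. The Prometheus documentation gives a rough rule of thumb: disk space ≈ retention in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies that formula; the ingestion rates are hypothetical, and you can measure your own with rate(prometheus_tsdb_head_samples_appended_total[5m]).

```python
GIB = 1024 ** 3


def estimated_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds x ingestion rate x bytes per sample."""
    return retention_days * 86400 * samples_per_second * bytes_per_sample


def fits(retention_days, samples_per_second, volume_gib):
    """True when the estimate is within the volume size (no headroom included)."""
    return estimated_disk_bytes(retention_days, samples_per_second) <= volume_gib * GIB


# Hypothetical ingestion rates, using the conservative 2 bytes per sample.
for rate in (10_000, 12_000):
    needed_gib = estimated_disk_bytes(30, rate) / GIB
    print(f"{rate} samples/s -> {needed_gib:.1f} GiB, fits in 50Gi: {fits(30, rate, 50)}")
```

The estimate ignores headroom for the write-ahead log and compaction, so treat a result close to the volume size as too tight.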
Best Practices
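- Enable persistence for Prometheus so metrics survive pod restarts
- Set a retention period that fits the provisioned storage
- Configure alerts for critical metrics
- Keep dashboards focused on the metrics you act on
- Monitor resource usage, including that of the monitoring stack itself
- Restrict access with Kubernetes RBAC
- Secure Grafana access: change the default admin password and do not expose it without authentication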
Useful PromQL Queries
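# CPU usage by pod (container!="" skips the pod-level aggregate series to avoid double counting)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
# Memory usage by namespace
sum(container_memory_usage_bytes{container!=""}) by (namespace)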
# Pod restart count
kube_pod_container_status_restarts_total
# API server request rate
rate(apiserver_request_total[5m])
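Several of the queries above rely on rate(), which turns an ever-increasing counter into a per-second rate and tolerates counter resets. The sketch below is a deliberately simplified model of that idea: it sums the increases between consecutive samples and treats any drop as a reset. Prometheus additionally extrapolates to the edges of the window, which this sketch omits. The name simple_rate and the sample data are illustrative.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter dropped, so assume it restarted from zero.
            increase += current
    return increase / elapsed


# Made-up counter samples with a reset between t=120 and t=180.
samples = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
print(simple_rate(samples))
```

This is why a raw counter such as kube_pod_container_status_restarts_total is usually wrapped in rate() or increase() when you care about recent change rather than the lifetime total.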
Conclusion
Prometheus and Grafana provide comprehensive Kubernetes monitoring, essential for production operations.