Introduction

Production Kubernetes clusters require robust monitoring. Prometheus collects and stores metrics, and Grafana visualizes them. Together they give you metrics-based visibility into your cluster’s health and performance.

Architecture

Kubernetes Cluster
    ↓
Prometheus (Metrics Collection)
    ↓
Grafana (Visualization)
    ↓
Dashboards & Alerts

Installing with Helm

Add the Prometheus Community Helm repo:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install kube-prometheus-stack:

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

Verify installation:

kubectl get pods -n monitoring
kubectl get svc -n monitoring
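
To spot problems quickly in a long pod list, the output can be filtered. The following is a minimal sketch, not part of the chart or kubectl: the helper name not_ready_pods is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names of
# pods that are neither fully ready nor Completed.
# Columns assumed: NAME READY STATUS RESTARTS AGE
not_ready_pods() {
  awk '
    $3 == "Completed" { next }               # finished Jobs are fine
    {
      split($2, ready, "/")                  # READY looks like "2/3"
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Usage (requires cluster access):
#   kubectl get pods -n monitoring --no-headers | not_ready_pods
```

An empty result suggests that every pod in the namespace is running with all of its containers ready.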

Accessing Grafana

Port forward:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Default credentials:

  • Username: admin
  • Password: retrieve it from the Kubernetes secret with: kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Access: http://localhost:3000
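
Before logging in, it can help to confirm that the port-forward actually reaches Grafana. Grafana exposes a health endpoint at /api/health that returns a small JSON document. The sketch below only inspects that JSON; the helper name grafana_healthy is hypothetical.

```shell
# Succeeds when a Grafana /api/health JSON body (read from stdin)
# reports its database as "ok".
grafana_healthy() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage, with the port-forward running in another terminal:
#   curl -s http://localhost:3000/api/health | grafana_healthy && echo "Grafana is up"
```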

Key Metrics to Monitor

Cluster Level:

  • Node CPU/Memory usage
  • Pod count
  • Namespace resource usage
  • API server latency

Application Level:

  • Request rate
  • Error rate
  • Response time
  • Resource consumption

Pre-built Dashboards

The kube-prometheus-stack chart provisions Grafana with dashboards for:

  • Kubernetes Cluster Monitoring
  • Node Exporter
  • Pod Metrics
  • Persistent Volumes
  • API Server

Custom Metrics

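ServiceMonitor for a custom app:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: monitoring
  labels:
    # With default chart values, Prometheus only selects ServiceMonitors
    # labeled with the Helm release name ("prometheus" in this guide).
    release: prometheus
spec:
  # Matches Services labeled app: myapp in the same namespace; add a
  # namespaceSelector if the app runs elsewhere.
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics   # name of the Service port, not a number
    interval: 30s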
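
The port field of a ServiceMonitor endpoint refers to a named port on the target Service, and the selector matches the Service's labels. As an illustration, the snippet below writes a Service manifest that would line up with the ServiceMonitor above. The Service name, port number 8080, and file name are assumptions, not values from the chart.

```shell
# Write a hypothetical Service manifest whose labels and port name match the
# ServiceMonitor (app: myapp, port name "metrics"). Port 8080 is an assumption.
cat > myapp-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: monitoring   # same namespace as the ServiceMonitor
  labels:
    app: myapp            # matched by spec.selector.matchLabels
spec:
  selector:
    app: myapp
  ports:
  - name: metrics         # matched by endpoints[].port
    port: 8080
    targetPort: 8080
EOF

# Apply it (requires cluster access):
#   kubectl apply -f myapp-service.yaml
```

Once the target is discovered, it should appear on the Targets page of the Prometheus UI.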

Alerting

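PrometheusRule for alerts:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    # With default chart values, only rules labeled with the Helm release
    # name are loaded.
    release: prometheus
spec:
  groups:
  - name: myapp
    rules:
    - alert: HighPodMemory
      # Sum per pod; container!="" skips the pod-level cgroup series so
      # memory is not double-counted.
      expr: sum(container_memory_usage_bytes{container!=""}) by (namespace, pod) > 1e9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is using more than 1 GB of memory"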
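
Note that the threshold 1e9 is a decimal gigabyte, roughly 0.93 GiB. Kubernetes memory limits are usually written in binary units such as 1Gi, so a threshold meant to match such a limit needs the binary value. A small sketch of the conversion (the helper name gib_to_bytes is hypothetical):

```shell
# Convert a whole number of GiB to bytes for use in a PromQL threshold.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 1   # prints 1073741824
```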

Production Configuration

values.yaml for production:

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  persistence:
    enabled: true
    size: 10Gi
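  # Placeholder only: do not commit a real password. The Grafana chart also
  # accepts admin.existingSecret to read credentials from a Secret.
  adminPassword: SecurePassword123!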
alertmanager:
  enabled: true
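
Whether a 50Gi volume is enough for 30 days of retention depends on how many samples the cluster produces. The Prometheus documentation gives a rough formula: disk space = retention time in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample being typical. The sketch below applies it; the helper name and the ingestion rate of 10,000 samples per second are assumptions for illustration only.

```shell
# Estimate Prometheus disk usage in bytes:
#   retention_seconds * ingested_samples_per_second * bytes_per_sample
# Arguments: retention in days, samples per second, bytes per sample.
prom_disk_bytes() {
  echo $(( $1 * 86400 * $2 * $3 ))
}

# 30 days at a hypothetical 10,000 samples/s and 2 bytes per sample:
prom_disk_bytes 30 10000 2   # prints 51840000000 (about 48 GiB)
```

At that assumed rate the 50Gi request leaves very little headroom, so measure your actual ingestion rate and size the volume accordingly.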

Install with custom values (upgrade --install also updates the release if it was already installed earlier):

helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
  -f values.yaml \
  --namespace monitoring \
  --create-namespace

Best Practices

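  1. Enable persistent storage for Prometheus so metrics survive pod restarts
  2. Set a retention period that fits the provisioned volume
  3. Configure alerts for critical metrics
  4. Build dashboards around the key metrics above rather than charting everything
  5. Monitor resource usage, including that of Prometheus and Grafana themselves
  6. Restrict access to the monitoring namespace with RBAC
  7. Secure Grafana access: change the admin password and avoid exposing it publicly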

Useful PromQL Queries

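# CPU usage (cores) by pod; container!="" avoids double-counting pod-level series
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)

# Memory usage (bytes) by namespace
sum(container_memory_usage_bytes{container!=""}) by (namespace)

# Pod restart count
kube_pod_container_status_restarts_total

# API server request rate by verb and response code
sum(rate(apiserver_request_total[5m])) by (verb, code)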
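
These queries can be run in the Prometheus UI or in Grafana's Explore view, and also from the command line through the Prometheus HTTP API endpoint /api/v1/query. The following is a sketch under assumptions: the helper name prom_query is hypothetical, and it expects Prometheus to be reachable on localhost:9090, for example through a port-forward to the Prometheus service (check its exact name with kubectl get svc -n monitoring).

```shell
# Run an instant PromQL query against the Prometheus HTTP API (/api/v1/query).
# PROM_URL defaults to a local port-forward; adjust it to your environment.
prom_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage (assumes a port-forward to the Prometheus service on 9090):
#   prom_query 'sum(container_memory_usage_bytes) by (namespace)'
```

The response is JSON. Piping it through a JSON formatter makes the result vector easier to read.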

Conclusion

Prometheus and Grafana provide comprehensive Kubernetes monitoring, essential for production operations.
