Kubernetes Monitoring with Prometheus and Grafana: Complete Setup Guide

Introduction

Production Kubernetes clusters need monitoring you can rely on. Prometheus scrapes and stores metrics and evaluates alert rules, while Grafana queries Prometheus and renders dashboards. Together they give you metrics-based visibility into your cluster's health and performance.

Architecture

Kubernetes Cluster
    ↓
Prometheus (Metrics Collection)
    ↓
Grafana (Visualization)
    ↓
Dashboards & Alerts

Installing with Helm

Add the Prometheus Community Helm repo:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install kube-prometheus-stack:

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

Verify installation:

kubectl get pods -n monitoring
kubectl get svc -n monitoring
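
The listing above is a snapshot, and some pods are usually still starting in the first minutes after the install. As a minimal sketch, assuming kubectl points at the right cluster, the helper below blocks until every pod in the namespace reports Ready. The function name wait_for_monitoring is hypothetical and the 300-second timeout is an arbitrary choice. Completed Job pods never become Ready, so if any are left in the namespace the wait will time out; treat this as a convenience check rather than a guarantee.

```shell
# Block until all pods in the namespace are Ready, or fail after the timeout.
# wait_for_monitoring is a hypothetical helper; the namespace defaults to "monitoring".
wait_for_monitoring() {
  local ns="${1:-monitoring}"
  kubectl wait pods --all --for=condition=Ready -n "$ns" --timeout=300s
}

# Usage (needs access to the cluster):
#   wait_for_monitoring
```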

Accessing Grafana

Port forward:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Default credentials:

Username: admin
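Password: stored in the prometheus-grafana Secret created by the chart. Decode it with:

kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

With the port-forward running, open http://localhost:3000 and log in.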

Key Metrics to Monitor

Cluster Level:

Node CPU/Memory usage
Pod count
Namespace resource usage
API server latency

Application Level:

Request rate
Error rate
Response time
Resource consumption
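
The application-level signals above map onto PromQL fairly directly once the application exports request metrics. The sketch below assumes a counter named http_requests_total with a status label, a histogram named http_request_duration_seconds, and a job label of myapp. All of these names are hypothetical and depend on how your application is instrumented.

```shell
# Hypothetical metric and label names; substitute whatever your application exports.

# Requests per second over the last 5 minutes.
REQUEST_RATE='sum(rate(http_requests_total{job="myapp"}[5m]))'

# Share of requests that returned a 5xx status.
ERROR_RATIO='sum(rate(http_requests_total{job="myapp",status=~"5.."}[5m])) / sum(rate(http_requests_total{job="myapp"}[5m]))'

# 95th percentile response time in seconds, computed from histogram buckets.
P95_LATENCY='histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="myapp"}[5m])) by (le))'

# Print the queries so they can be pasted into the Prometheus or Grafana query editor.
printf '%s\n' "$REQUEST_RATE" "$ERROR_RATIO" "$P95_LATENCY"
```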

Pre-built Dashboards

The kube-prometheus-stack chart provisions Grafana with dashboards covering:

Kubernetes Cluster Monitoring
Node Exporter
Pod Metrics
Persistent Volumes
API Server

Custom Metrics

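ServiceMonitor for a custom app. With the chart's default settings, Prometheus only picks up ServiceMonitors that carry the Helm release label, and a ServiceMonitor only matches Services in its own namespace unless a namespaceSelector is set:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: monitoring
  labels:
    release: prometheus   # must match the Helm release name used at install time
spec:
  selector:
    matchLabels:
      app: myapp          # label on the Service, not on the Pod
  endpoints:
  - port: metrics         # name of the Service port that serves /metrics
    interval: 30s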
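
A ServiceMonitor selects Services by label and refers to a Service port by name, so the application needs a Service that lines up with both. The sketch below writes such a manifest to a file. The Service name myapp, the port number 8080, and the file name myapp-service.yaml are assumptions for illustration. The Service is placed in the monitoring namespace only because the ServiceMonitor above has no namespaceSelector.

```shell
# Write a Service manifest that the ServiceMonitor can match.
# Port 8080 and the names used here are hypothetical; adjust them to your application.
cat > myapp-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: monitoring
  labels:
    app: myapp          # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: myapp          # selects the application Pods
  ports:
  - name: metrics       # matched by the ServiceMonitor's endpoints[].port
    port: 8080
    targetPort: 8080
EOF
```

Apply the file with kubectl apply -f myapp-service.yaml, then check the Targets page in the Prometheus UI to confirm the endpoint is being scraped.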

Alerting

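PrometheusRule for alerts. With the chart's default settings, Prometheus only loads PrometheusRules that carry the Helm release label:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    release: prometheus   # must match the Helm release name
spec:
  groups:
  - name: myapp
    rules:
    - alert: HighPodMemory
      # Fires when a container stays above 1 GB (1e9 bytes) for 5 minutes.
      # container!="" skips the pod-level cgroup series that would otherwise match too.
      expr: container_memory_usage_bytes{container!=""} > 1e9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage in {{ $labels.namespace }}/{{ $labels.pod }}"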

Production Configuration

values.yaml for production:

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  persistence:
    enabled: true
    size: 10Gi
  adminPassword: SecurePassword123!  # placeholder: replace it and keep this file out of version control
alertmanager:
  enabled: true
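
Whether 50Gi is enough for 30 days of retention depends on the ingestion rate. The Prometheus documentation gives a rough formula: disk space = retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that formula in shell arithmetic. The function name and the figure of 10,000 samples per second are illustrative assumptions, not measurements. On a running server the real rate can be read with rate(prometheus_tsdb_head_samples_appended_total[5m]).

```shell
# Rough disk estimate in GiB: retention_seconds * samples_per_second * bytes_per_sample.
# estimate_prometheus_disk_gib is a hypothetical helper; bytes per sample defaults to 2.
estimate_prometheus_disk_gib() {
  local retention_days="$1"
  local samples_per_sec="$2"
  local bytes_per_sample="${3:-2}"
  local bytes=$(( retention_days * 86400 * samples_per_sec * bytes_per_sample ))
  echo $(( bytes / 1024 / 1024 / 1024 ))
}

# 30 days at an assumed 10,000 samples per second
estimate_prometheus_disk_gib 30 10000   # prints 48
```

Under these assumptions the data alone comes to about 48 GiB, which leaves little headroom on a 50Gi volume, so size the claim from your own measured rate.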

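Apply the custom values. Because the release was already installed earlier in this guide, use helm upgrade --install, which updates an existing release or installs it if it does not exist yet:

helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
  -f values.yaml \
  --namespace monitoring \
  --create-namespace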
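
Keeping the Grafana admin password in values.yaml means it ends up wherever that file is stored. One alternative, sketched here, is to create a Kubernetes Secret and point the Grafana subchart at it through its admin.existingSecret setting. The secret name grafana-admin, the helper name, and the file name grafana-admin-values.yaml are hypothetical. The key names admin-user and admin-password follow the Grafana chart's documented defaults; verify them against the chart version you install.

```shell
# Create the Secret that Grafana reads its admin credentials from.
# "grafana-admin" is a hypothetical name; a random password is generated locally.
create_grafana_admin_secret() {
  local ns="${1:-monitoring}"
  local password
  password="$(head -c 24 /dev/urandom | base64)"
  kubectl create secret generic grafana-admin \
    --namespace "$ns" \
    --from-literal=admin-user=admin \
    --from-literal=admin-password="$password"
}

# Values override that points Grafana at the Secret instead of adminPassword.
cat > grafana-admin-values.yaml <<'EOF'
grafana:
  admin:
    existingSecret: grafana-admin
    userKey: admin-user
    passwordKey: admin-password
EOF
```

Run create_grafana_admin_secret once the monitoring namespace exists, remove adminPassword from values.yaml, and pass -f grafana-admin-values.yaml after -f values.yaml in the helm command.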

Best Practices

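Enable persistent storage for Prometheus so metrics survive pod restarts
Set a retention period that fits the provisioned volume
Configure alerts for critical metrics
Keep dashboards focused on the metrics you act on
Monitor resource usage, including that of the monitoring stack itself
Implement RBAC
Secure Grafana access: change the default admin password and do not expose it without authentication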

Useful PromQL Queries

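# CPU usage by pod, in cores; container!="" avoids double-counting pod-level series
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)

# Memory usage by namespace, in bytes
sum(container_memory_usage_bytes{container!=""}) by (namespace)

# Pod restart count (cumulative, one series per container)
kube_pod_container_status_restarts_total

# API server request rate, in requests per second, by verb and status code
sum(rate(apiserver_request_total[5m])) by (verb, code)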
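
These queries can be run in the Prometheus UI, in Grafana's Explore view, or from a terminal through the Prometheus HTTP API. The sketch below wraps the instant-query endpoint /api/v1/query in a small helper. It assumes Prometheus is reachable on localhost:9090, for example through kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090. That service name is what the chart is expected to generate for a release called prometheus, so confirm it with kubectl get svc -n monitoring. The helper name promql and the PROM_URL variable are hypothetical.

```shell
# Base URL of the Prometheus server; assumes a local port-forward on 9090.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query and print the raw JSON response.
# promql is a hypothetical helper; -G sends the URL-encoded query as a GET parameter.
promql() {
  curl -sS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage (needs the port-forward to be running):
#   promql 'sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)'
```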

Conclusion

Prometheus and Grafana provide comprehensive Kubernetes monitoring, essential for production operations.

