Introduction
Not all workloads need to run continuously or be replicated across nodes. DaemonSets ensure one Pod per node (perfect for logging agents), while Jobs and CronJobs handle batch processing and scheduled tasks.
Understanding Different Workload Types
Kubernetes Workload Controllers:
Controller | Purpose | Replicas | Lifecycle | Use Case |
---|---|---|---|---|
Deployment | Stateless apps | Multiple | Continuous | Web servers, APIs |
StatefulSet | Stateful apps | Multiple | Continuous | Databases, queues |
DaemonSet | Node services | One per node | Continuous | Logging, monitoring |
Job | Batch tasks | Configurable | Run to completion | Data processing |
CronJob | Scheduled tasks | Configurable | Scheduled | Backups, reports |
Why Different Controllers?
- Deployments: For applications that can scale horizontally
- DaemonSets: For node-level infrastructure services
- Jobs: For one-time or batch processing tasks
- CronJobs: For recurring scheduled operations
Part 1: DaemonSets - Running One Pod Per Node
What is a DaemonSet? A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed, those Pods are garbage collected.
Why Use DaemonSets?
- Node-Level Services: Every node needs the service (logging, monitoring)
- Automatic Scaling: New nodes automatically get the Pod
- Infrastructure Services: Network plugins, storage daemons
- Cluster-Wide Operations: Security agents, performance monitoring
DaemonSet vs Deployment:
Feature | DaemonSet | Deployment |
---|---|---|
Pods per Node | Exactly 1 | Variable (based on replicas) |
Scaling | Automatic with nodes | Manual or HPA |
Node Selection | All or selected nodes | Scheduler decides |
Use Case | Node services | Application services |
Example | Log collector | Web application |
How DaemonSets Work:
- DaemonSet controller watches for nodes
- Creates one Pod on each matching node
- If node added → Pod created automatically
- If node removed → Pod deleted automatically
- If Pod fails → Recreated on same node
Common Use Cases:
1. Log Collection:
- Fluentd, Filebeat, Logstash
- Collect logs from all nodes
- Forward to centralized logging
2. Monitoring:
- Prometheus Node Exporter
- cAdvisor
- Collect metrics from each node
3. Network:
- Calico, Weave, Flannel
- CNI plugins for networking
- Run on every node
4. Storage:
- Ceph, GlusterFS
- Distributed storage daemons
- Node-level storage services
5. Security:
- Security agents
- Vulnerability scanners
- Compliance monitoring
Example 1: Basic DaemonSet (Log Collector)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      # Tolerate the control-plane node taint
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.14
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
Example 2: DaemonSet on Specific Nodes (Monitoring)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # Only run on nodes carrying the monitoring label
      nodeSelector:
        monitoring: "true"
      hostNetwork: true   # Use the host network
      hostPID: true       # Access host processes
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
Example 3: DaemonSet with Update Strategy
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: security-agent
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # Update one node at a time
  selector:
    matchLabels:
      app: security-agent
  template:
    metadata:
      labels:
        app: security-agent
    spec:
      containers:
      - name: agent
        image: security-agent:v2.0
DaemonSet Commands:
# List DaemonSets
kubectl get daemonsets
kubectl get ds # Short form
kubectl get ds -A # All namespaces
# Describe DaemonSet
kubectl describe daemonset fluentd
# Check which nodes have DaemonSet pods
kubectl get pods -o wide -l app=fluentd
# Update DaemonSet image, then watch the rollout
kubectl set image daemonset/fluentd fluentd=fluent/fluentd:v1.15
kubectl rollout status daemonset/fluentd
# Delete DaemonSet
kubectl delete daemonset fluentd
# Delete DaemonSet but keep pods
kubectl delete daemonset fluentd --cascade=orphan
Part 2: Jobs - Running Tasks to Completion
What is a Job? A Job creates one or more Pods and ensures that a specified number of them successfully terminate. Jobs track successful completions and retry failed Pods.
Why Use Jobs?
- Batch Processing: Process large datasets
- One-time Tasks: Database migrations, data imports
- Parallel Processing: Distribute work across multiple Pods
- Finite Workloads: Tasks that complete and exit
Job vs Deployment:
Feature | Job | Deployment |
---|---|---|
Lifecycle | Run to completion | Continuous |
Restart | On failure only | Always |
Success Criteria | Completions count | Always running |
Use Case | Batch tasks | Long-running services |
How Jobs Work:
- Job controller creates Pods
- Pods run until successful completion
- Failed Pods are retried (up to backoffLimit)
- Job completes when desired completions reached
- Pods remain for log inspection (unless cleaned up)
Job Patterns:
1. Simple Job (Single Completion)
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculation
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never   # Never or OnFailure
  backoffLimit: 4            # Retry up to 4 times
2. Parallel Jobs (Work Queue Pattern)
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-processing
spec:
  parallelism: 3    # Run 3 pods in parallel
  completions: 10   # Complete 10 tasks total
  template:
    spec:
      containers:
      - name: worker
        image: worker:latest
        command: ["./process-task.sh"]
      restartPolicy: Never
How it works:
- Creates 3 Pods initially
- As each Pod completes, new Pod starts
- Continues until 10 successful completions
3. Parallel Jobs (Fixed Completion Count)
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  parallelism: 5
  completions: 100
  template:
    spec:
      containers:
      - name: processor
        image: data-processor:v1
        env:
        - name: TASK_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      restartPolicy: OnFailure
4. Job with Timeout
apiVersion: batch/v1
kind: Job
metadata:
  name: timeout-job
spec:
  activeDeadlineSeconds: 300   # Fail after 5 minutes
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: task
        image: long-running-task:latest
      restartPolicy: Never
5. Job with Resource Limits
apiVersion: batch/v1
kind: Job
metadata:
  name: resource-intensive-job
spec:
  template:
    spec:
      containers:
      - name: processor
        image: heavy-processor:latest
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
      restartPolicy: Never
Job Configuration Options:
Field | Description | Default |
---|---|---|
completions | Number of successful completions needed | 1 |
parallelism | Max pods running in parallel | 1 |
backoffLimit | Number of retries before marking failed | 6 |
activeDeadlineSeconds | Max time job can run | None |
ttlSecondsAfterFinished | Auto-delete after completion | None |
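All of these fields sit side by side at the top level of the Job spec. As a minimal sketch combining them (the Job name and worker image are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tuned-job              # hypothetical name
spec:
  completions: 10              # need 10 successful Pods
  parallelism: 2               # at most 2 Pods running at once
  backoffLimit: 3              # mark the Job failed after 3 retries
  activeDeadlineSeconds: 600   # terminate the whole Job after 10 minutes
  ttlSecondsAfterFinished: 300 # garbage-collect 5 minutes after it finishes
  template:
    spec:
      containers:
      - name: worker
        image: worker:latest   # hypothetical image
      restartPolicy: Never
```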
Job Commands:
# List jobs
kubectl get jobs
kubectl get jobs -w # Watch
# Describe job
kubectl describe job pi-calculation
# View logs
kubectl logs job/pi-calculation
kubectl logs -f job/pi-calculation # Follow
# Check job status
kubectl get job pi-calculation -o yaml
# Delete job
kubectl delete job pi-calculation
# Delete job and pods
kubectl delete job pi-calculation --cascade=foreground
# Auto-cleanup completed jobs (add to spec)
ttlSecondsAfterFinished: 100
Part 3: CronJobs - Scheduled Jobs
What is a CronJob? A CronJob creates Jobs on a repeating schedule. It’s like cron in Linux but for Kubernetes Jobs.
Why Use CronJobs?
- Scheduled Backups: Database, file backups
- Report Generation: Daily/weekly reports
- Data Cleanup: Remove old data periodically
- Health Checks: Periodic system checks
- Batch Processing: Scheduled data processing
CronJob vs Job:
Feature | CronJob | Job |
---|---|---|
Execution | Scheduled | One-time |
Trigger | Time-based | Manual |
Recurrence | Repeating | Single |
Use Case | Backups, reports | Migrations, imports |
Basic CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"   # Every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:latest
            command: ["/bin/sh", "-c", "backup-script.sh"]
          restartPolicy: OnFailure
Cron Schedule Examples:
# minute (0-59) | hour (0-23) | day of month | month | day of week
*/5 * * * *      # Every 5 minutes
0 */2 * * *      # Every 2 hours
0 0 * * 0        # Every Sunday at midnight
0 0 1 * *        # First day of every month, at midnight
0 9-17 * * 1-5   # On the hour from 9 AM to 5 PM, Monday-Friday
Advanced CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid   # Don't start a run if the previous one is still running
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
            command:
            - /bin/sh
            - -c
            - pg_dump -h db-host -U postgres mydb > /backup/backup-$(date +%Y%m%d).sql
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
Concurrency Policies:
- Allow: Allow concurrent jobs
- Forbid: Skip if previous still running
- Replace: Cancel previous, start new
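For a frequently-scheduled CronJob, Replace is often paired with startingDeadlineSeconds so that a stuck or missed run never blocks the next one. A sketch (the CronJob name and image are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrics-rollup             # hypothetical name
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace       # cancel a still-running Job, start the new one
  startingDeadlineSeconds: 120     # skip a run missed by more than 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: rollup
            image: rollup:latest   # hypothetical image
          restartPolicy: OnFailure
```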
Commands:
kubectl get cronjobs
kubectl describe cronjob backup-job
kubectl get jobs --watch
kubectl delete cronjob backup-job
Production Examples
Log Collection DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
      # On clusters older than v1.24, use node-role.kubernetes.io/master instead
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Database Backup CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 3 * * *"
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14-alpine
            env:
            - name: PGHOST
              value: postgres-service
            - name: PGUSER
              value: postgres
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            command:
            - /bin/sh
            - -c
            - |
              BACKUP_FILE="/backup/db-$(date +%Y%m%d-%H%M%S).sql.gz"
              pg_dump mydb | gzip > $BACKUP_FILE
              echo "Backup completed: $BACKUP_FILE"
              # Keep only the last 7 days of backups
              find /backup -name "db-*.sql.gz" -mtime +7 -delete
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
Best Practices
DaemonSets:
- Use for node-level services only
- Set resource limits
- Use tolerations for master nodes if needed
- Monitor DaemonSet health
Jobs:
- Set backoffLimit appropriately
- Use activeDeadlineSeconds for timeouts
- Clean up completed jobs
- Use parallelism for batch processing
CronJobs:
- Set history limits
- Use concurrencyPolicy wisely
- Test schedules before production
- Monitor job failures
- Implement idempotency
Troubleshooting
DaemonSet not on all nodes:
kubectl describe daemonset fluentd
# Check: Node selectors, taints, resource constraints
Job not completing:
kubectl describe job my-job
kubectl logs job/my-job
# Check: Container errors, resource limits, backoffLimit
CronJob not running:
kubectl describe cronjob backup-job
kubectl get jobs
# Check: Schedule syntax, concurrency policy, suspended status
Conclusion
DaemonSets, Jobs, and CronJobs handle specialized workloads:
- DaemonSets: Node-level services
- Jobs: One-time batch tasks
- CronJobs: Scheduled tasks
Next: Kubernetes Ingress