Introduction
Not all workloads need to run continuously or be replicated across nodes. DaemonSets ensure one Pod per node (perfect for logging agents), while Jobs and CronJobs handle batch processing and scheduled tasks.
Understanding Different Workload Types
Kubernetes Workload Controllers:
Controller | Purpose | Replicas | Lifecycle | Use Case |
---|---|---|---|---|
Deployment | Stateless apps | Multiple | Continuous | Web servers, APIs |
StatefulSet | Stateful apps | Multiple | Continuous | Databases, queues |
DaemonSet | Node services | One per node | Continuous | Logging, monitoring |
Job | Batch tasks | Configurable | Run to completion | Data processing |
CronJob | Scheduled tasks | Configurable | Scheduled | Backups, reports |
Why Different Controllers?
- Deployments: For applications that can scale horizontally
- DaemonSets: For node-level infrastructure services
- Jobs: For one-time or batch processing tasks
- CronJobs: For recurring scheduled operations
Part 1: DaemonSets - Running One Pod Per Node
What is a DaemonSet? A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed, those Pods are garbage collected.
Why Use DaemonSets?
- Node-Level Services: Every node needs the service (logging, monitoring)
- Automatic Scaling: New nodes automatically get the Pod
- Infrastructure Services: Network plugins, storage daemons
- Cluster-Wide Operations: Security agents, performance monitoring
DaemonSet vs Deployment:
Feature | DaemonSet | Deployment |
---|---|---|
Pods per Node | Exactly 1 | Variable (based on replicas) |
Scaling | Automatic with nodes | Manual or HPA |
Node Selection | All or selected nodes | Scheduler decides |
Use Case | Node services | Application services |
Example | Log collector | Web application |
How DaemonSets Work:
- DaemonSet controller watches for nodes
- Creates one Pod on each matching node
- If node added → Pod created automatically
- If node removed → Pod deleted automatically
- If Pod fails → Recreated on same node
Common Use Cases:
1. Log Collection:
- Fluentd, Filebeat, Logstash
- Collect logs from all nodes
- Forward to centralized logging
2. Monitoring:
- Prometheus Node Exporter
- cAdvisor
- Collect metrics from each node
3. Network:
- Calico, Weave, Flannel
- CNI plugins for networking
- Run on every node
4. Storage:
- Ceph, GlusterFS
- Distributed storage daemons
- Node-level storage services
5. Security:
- Security agents
- Vulnerability scanners
- Compliance monitoring
Example 1: Basic DaemonSet (Log Collector)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      # Tolerate the control-plane node taint
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.14
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
Example 2: DaemonSet on Specific Nodes (Monitoring)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # Only run on nodes carrying the monitoring label
      nodeSelector:
        monitoring: "true"
      hostNetwork: true   # Use the host network
      hostPID: true       # Access host processes
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
Example 3: DaemonSet with Update Strategy
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: security-agent
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # Update one node at a time
  selector:
    matchLabels:
      app: security-agent
  template:
    metadata:
      labels:
        app: security-agent
    spec:
      containers:
      - name: agent
        image: security-agent:v2.0
DaemonSet Commands:
# List DaemonSets
kubectl get daemonsets
kubectl get ds # Short form
kubectl get ds -A # All namespaces
# Describe DaemonSet
kubectl describe daemonset fluentd
# Check which nodes have DaemonSet pods
kubectl get pods -o wide -l app=fluentd
# Update DaemonSet image, then watch the rollout
kubectl set image daemonset/fluentd fluentd=fluent/fluentd:v1.15
kubectl rollout status daemonset/fluentd
# Delete DaemonSet
kubectl delete daemonset fluentd
# Delete DaemonSet but keep pods
kubectl delete daemonset fluentd --cascade=orphan
Part 2: Jobs - Running Tasks to Completion
What is a Job? A Job creates one or more Pods and ensures that a specified number of them successfully terminate. Jobs track successful completions and retry failed Pods.
Why Use Jobs?
- Batch Processing: Process large datasets
- One-time Tasks: Database migrations, data imports
- Parallel Processing: Distribute work across multiple Pods
- Finite Workloads: Tasks that complete and exit
Job vs Deployment:
Feature | Job | Deployment |
---|---|---|
Lifecycle | Run to completion | Continuous |
Restart | On failure only | Always |
Success Criteria | Completions count | Always running |
Use Case | Batch tasks | Long-running services |
How Jobs Work:
- Job controller creates Pods
- Pods run until successful completion
- Failed Pods are retried (up to backoffLimit)
- Job completes when desired completions reached
- Pods remain for log inspection (unless cleaned up)
Job Patterns:
1. Simple Job (Single Completion)
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculation
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never   # Never or OnFailure
  backoffLimit: 4            # Retry up to 4 times
2. Parallel Jobs (Work Queue Pattern)
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-processing
spec:
  parallelism: 3    # Run 3 pods in parallel
  completions: 10   # Complete 10 tasks total
  template:
    spec:
      containers:
      - name: worker
        image: worker:latest
        command: ["./process-task.sh"]
      restartPolicy: Never
How it works:
- Creates 3 Pods initially
- As each Pod completes, new Pod starts
- Continues until 10 successful completions
3. Parallel Jobs (Fixed Completion Count)
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  parallelism: 5
  completions: 100
  template:
    spec:
      containers:
      - name: processor
        image: data-processor:v1
        env:
        - name: TASK_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      restartPolicy: OnFailure
4. Job with Timeout
apiVersion: batch/v1
kind: Job
metadata:
  name: timeout-job
spec:
  activeDeadlineSeconds: 300   # Fail after 5 minutes
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: task
        image: long-running-task:latest
      restartPolicy: Never
5. Job with Resource Limits
apiVersion: batch/v1
kind: Job
metadata:
  name: resource-intensive-job
spec:
  template:
    spec:
      containers:
      - name: processor
        image: heavy-processor:latest
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
      restartPolicy: Never
Job Configuration Options:
Field | Description | Default |
---|---|---|
completions | Number of successful completions needed | 1 |
parallelism | Max pods running in parallel | 1 |
backoffLimit | Number of retries before marking failed | 6 |
activeDeadlineSeconds | Max time job can run | None |
ttlSecondsAfterFinished | Auto-delete after completion | None |
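All of these fields sit side by side at the top level of the Job spec. As a minimal sketch combining them (the Job name and worker image are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tuned-job              # hypothetical name
spec:
  completions: 10              # need 10 successful Pods
  parallelism: 2               # at most 2 Pods running at once
  backoffLimit: 3              # mark the Job failed after 3 retries
  activeDeadlineSeconds: 600   # terminate the whole Job after 10 minutes
  ttlSecondsAfterFinished: 300 # garbage-collect 5 minutes after it finishes
  template:
    spec:
      containers:
      - name: worker
        image: worker:latest   # hypothetical image
      restartPolicy: Never
```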
Job Commands:
# List jobs
kubectl get jobs
kubectl get jobs -w # Watch
# Describe job
kubectl describe job pi-calculation
# View logs
kubectl logs job/pi-calculation
kubectl logs -f job/pi-calculation # Follow
# Check job status
kubectl get job pi-calculation -o yaml
# Delete job
kubectl delete job pi-calculation
# Delete job and pods
kubectl delete job pi-calculation --cascade=foreground
# Auto-cleanup completed jobs (add to spec)
ttlSecondsAfterFinished: 100
Part 3: CronJobs - Scheduled Jobs
What is a CronJob? A CronJob creates Jobs on a repeating schedule. It’s like cron in Linux but for Kubernetes Jobs.
Why Use CronJobs?
- Scheduled Backups: Database, file backups
- Report Generation: Daily/weekly reports
- Data Cleanup: Remove old data periodically
- Health Checks: Periodic system checks
- Batch Processing: Scheduled data processing
CronJob vs Job:
Feature | CronJob | Job |
---|---|---|
Execution | Scheduled | One-time |
Trigger | Time-based | Manual |
Recurrence | Repeating | Single |
Use Case | Backups, reports | Migrations, imports |
Basic CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"   # Every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:latest
            command: ["/bin/sh", "-c", "backup-script.sh"]
          restartPolicy: OnFailure
Cron Schedule Examples:
# minute (0-59) | hour (0-23) | day of month | month | day of week
*/5 * * * *      # Every 5 minutes
0 */2 * * *      # Every 2 hours
0 0 * * 0        # Every Sunday at midnight
0 0 1 * *        # First day of every month, at midnight
0 9-17 * * 1-5   # On the hour from 9 AM to 5 PM, Monday-Friday
Advanced CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid   # Don't start a run if the previous one is still running
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
            command:
            - /bin/sh
            - -c
            - pg_dump -h db-host -U postgres mydb > /backup/backup-$(date +%Y%m%d).sql
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
Concurrency Policies:
- Allow: Allow concurrent jobs
- Forbid: Skip if previous still running
- Replace: Cancel previous, start new
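For a frequently-scheduled CronJob, Replace is often paired with startingDeadlineSeconds so that a stuck or missed run never blocks the next one. A sketch (the CronJob name and image are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrics-rollup             # hypothetical name
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace       # cancel a still-running Job, start the new one
  startingDeadlineSeconds: 120     # skip a run missed by more than 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: rollup
            image: rollup:latest   # hypothetical image
          restartPolicy: OnFailure
```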
Commands:
kubectl get cronjobs
kubectl describe cronjob backup-job
kubectl get jobs --watch
kubectl delete cronjob backup-job
Production Examples
Log Collection DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
      # On clusters older than v1.24, use node-role.kubernetes.io/master instead
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Database Backup CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 3 * * *"
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14-alpine
            env:
            - name: PGHOST
              value: postgres-service
            - name: PGUSER
              value: postgres
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            command:
            - /bin/sh
            - -c
            - |
              BACKUP_FILE="/backup/db-$(date +%Y%m%d-%H%M%S).sql.gz"
              pg_dump mydb | gzip > $BACKUP_FILE
              echo "Backup completed: $BACKUP_FILE"
              # Keep only the last 7 days of backups
              find /backup -name "db-*.sql.gz" -mtime +7 -delete
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
Best Practices
DaemonSets:
- Use for node-level services only
- Set resource limits
- Use tolerations for master nodes if needed
- Monitor DaemonSet health
Jobs:
- Set backoffLimit appropriately
- Use activeDeadlineSeconds for timeouts
- Clean up completed jobs
- Use parallelism for batch processing
CronJobs:
- Set history limits
- Use concurrencyPolicy wisely
- Test schedules before production
- Monitor job failures
- Implement idempotency
Troubleshooting
DaemonSet not on all nodes:
kubectl describe daemonset fluentd
# Check: Node selectors, taints, resource constraints
Job not completing:
kubectl describe job my-job
kubectl logs job/my-job
# Check: Container errors, resource limits, backoffLimit
CronJob not running:
kubectl describe cronjob backup-job
kubectl get jobs
# Check: Schedule syntax, concurrency policy, suspended status
Conclusion
DaemonSets, Jobs, and CronJobs handle specialized workloads:
- DaemonSets: Node-level services
- Jobs: One-time batch tasks
- CronJobs: Scheduled tasks
Next: Kubernetes Ingress