Introduction
A Dockerfile is the blueprint for building Docker images. It’s a text file containing a series of instructions that Docker reads from top to bottom to assemble an image. Understanding how to write efficient, secure, and maintainable Dockerfiles is fundamental to mastering Docker.
This comprehensive guide takes you from Dockerfile basics to advanced optimization techniques, covering every instruction, when to use it, and best practices for production-ready images.
Part 1: How Dockerfiles Work
The Build Process
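When you run docker build, Docker reads your Dockerfile and executes each instruction in order. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; the remaining instructions only record metadata in the image configuration.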
# Layer 1: base image
FROM ubuntu:20.04
# Layer 2
RUN apt-get update
# Layer 3
RUN apt-get install -y python3
# Layer 4
COPY app.py /app/
# Metadata only (no filesystem layer)
CMD ["python3", "/app/app.py"]
Key Concepts:
- Layers are Immutable: Once created, a layer never changes. If you modify a file in a later layer, Docker creates a new layer with the changes.
- Layer Caching: Docker caches each layer. If an instruction and its context haven’t changed, Docker reuses the cached layer, making rebuilds fast.
- Union File System: Docker uses a Union File System to stack layers. The final image is the combination of all layers.
- Build Context: When you run docker build ., the . is the build context: the files in that directory, minus anything excluded by .dockerignore, are sent to the Docker daemon.
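To make the immutability point concrete, here is a minimal sketch. The file name /tmp/big.bin and the 50 MB size are arbitrary choices for illustration. Deleting a file in a later instruction hides it from the final filesystem, but the earlier layer still stores it, so the image does not shrink.

```dockerfile
FROM ubuntu:20.04

# Variant A (keep only one variant when building): two layers.
# The first layer permanently stores the 50 MB file; the second layer
# only records that the file was deleted.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50
RUN rm /tmp/big.bin

# Variant B: one layer. The file is created and removed before the layer
# is committed, so it never takes up space in the image.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50 && rm /tmp/big.bin
```

Building each variant on its own and comparing the results with docker history <image> should show the extra 50 MB only in variant A.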
The Anatomy of a Dockerfile Instruction
INSTRUCTION arguments
- INSTRUCTION: The command (e.g., FROM, RUN, COPY)
- arguments: The parameters for that instruction
Part 2: Essential Dockerfile Instructions (Basic)
1. FROM - Setting the Base Image
Purpose: Every Dockerfile must start with FROM. It specifies the base image to build upon.
Syntax:
FROM <image>:<tag>
FROM <image>@<digest>
Examples:
# Use a specific version (recommended)
FROM python:3.9-slim
# Use Alpine for minimal size
FROM node:16-alpine
# Use a specific digest for maximum reproducibility
FROM ubuntu@sha256:abc123...
# Multi-stage builds can have multiple FROM statements
FROM golang:1.19 AS builder
FROM alpine:3.19
When to Use:
- Always: it’s required as the first instruction (only ARG may come before FROM)
- Choose the smallest base image that meets your needs
Best Practices:
- Pin to specific versions, not latest
- Use -slim or -alpine variants for smaller images
- For compiled languages (Go, Rust), use multi-stage builds with scratch or distroless as the final base
2. RUN - Executing Commands
Purpose: Executes commands in a new layer and commits the results.
Syntax:
# Shell form (runs in /bin/sh -c)
RUN <command>
# Exec form (doesn't invoke a shell)
RUN ["executable", "param1", "param2"]
Examples:
# Install packages (shell form)
RUN apt-get update && apt-get install -y \
    curl \
    git \
    vim
# Run a command through bash (exec form)
RUN ["/bin/bash", "-c", "echo hello"]
# Chain commands to reduce layers
RUN apt-get update && \
    apt-get install -y python3 && \
    rm -rf /var/lib/apt/lists/*
When to Use:
- Installing packages
- Running build scripts
- Creating directories
- Downloading files
Best Practices:
- Combine related commands with && to reduce layers
- Clean up in the same RUN command (e.g., remove package caches)
- Use \ for multi-line commands for readability
- Use --no-install-recommends with apt-get to avoid unnecessary packages
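The practices above can be combined into a single instruction. This is a sketch rather than a canonical recipe; the package list is only an example. Note that ca-certificates is listed explicitly because --no-install-recommends no longer pulls it in alongside curl.

```dockerfile
FROM ubuntu:20.04

# Update, install, and clean up in one RUN so the package index
# never ends up stored in a layer of the image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      git \
 && rm -rf /var/lib/apt/lists/*
```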
3. COPY - Copying Files
Purpose: Copies files or directories from the build context into the image.
Syntax:
COPY <src>... <dest>
COPY ["<src>",... "<dest>"] # For paths with spaces
Examples:
# Copy a single file
COPY app.py /app/
# Copy multiple files
COPY app.py requirements.txt /app/
# Copy a directory
COPY ./src /app/src
# Copy with wildcard
COPY *.py /app/
# Copy and rename
COPY config.json /app/settings.json
When to Use:
- Copying application code
- Copying configuration files
- Copying static assets
Best Practices:
- Copy only what you need
- Use .dockerignore to exclude unnecessary files
- Copy dependency files first (e.g., package.json) to leverage caching
- Copy source code last (it changes most frequently)
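As an illustration of the ordering advice, here is a small Python variant. It assumes a project with a requirements.txt and a single app.py, which are stand-ins for whatever your project actually contains.

```dockerfile
FROM python:3.9-slim
WORKDIR /app

# Dependency manifest first: this layer and the install below are reused
# from cache until requirements.txt itself changes.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Application code last: editing app.py invalidates only this layer.
COPY app.py ./

CMD ["python3", "app.py"]
```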
4. ADD - Advanced Copy
Purpose: Like COPY, but with extra features (auto-extraction of local tar archives, URL downloads).
Syntax:
ADD <src>... <dest>
Examples:
# Auto-extract a tar file
ADD archive.tar.gz /app/
# Download from URL (not recommended)
ADD https://example.com/file.tar.gz /tmp/
When to Use:
- Only when you need auto-extraction of local tar files
- Prefer COPY for everything else
Best Practices:
- Use COPY unless you specifically need ADD’s features
- For URLs, use RUN curl or RUN wget instead for better control
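One way to get that extra control is to download, verify, extract, and delete the archive in a single RUN. The sketch below is hypothetical: the URL is the placeholder from the example above, FILE_SHA256 must be replaced with the real published checksum before the build can succeed, and /opt/app is an arbitrary target directory.

```dockerfile
FROM debian:bookworm-slim

# Hypothetical values: pass the real URL and SHA-256 with --build-arg.
ARG FILE_URL=https://example.com/file.tar.gz
ARG FILE_SHA256=replace-with-real-sha256

# Download, verify, extract, and remove the archive in one layer.
# The build fails if the checksum does not match.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/* \
 && curl -fsSL "$FILE_URL" -o /tmp/file.tar.gz \
 && echo "${FILE_SHA256}  /tmp/file.tar.gz" | sha256sum -c - \
 && mkdir -p /opt/app \
 && tar -xzf /tmp/file.tar.gz -C /opt/app \
 && rm /tmp/file.tar.gz
```

Unlike ADD with a URL, the archive itself never persists in a layer, and a tampered download stops the build.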
5. WORKDIR - Setting the Working Directory
Purpose: Sets the working directory for subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.
Syntax:
WORKDIR /path/to/directory
Examples:
WORKDIR /app
# Creates the directory if it doesn't exist
WORKDIR /app/src
# Can use environment variables
ENV APP_HOME=/application
WORKDIR $APP_HOME
When to Use:
- Setting a consistent working directory for your application
- Organizing file structure in the image
Best Practices:
- Use absolute paths
- Prefer WORKDIR over RUN cd /path (which doesn’t persist)
- Set it early in the Dockerfile
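The difference between cd and WORKDIR is easy to see in a throwaway build. This is a small demonstration, not something to ship; the alpine:3.19 tag is just a convenient small base.

```dockerfile
FROM alpine:3.19

# cd only affects the shell that runs this one instruction.
RUN mkdir -p /app && cd /app

# Each RUN starts a fresh shell, so this prints "/" rather than "/app".
RUN pwd

# WORKDIR is recorded in the image and applies to every later instruction.
WORKDIR /app

# This prints "/app".
RUN pwd
```

Running the build with docker build --progress=plain . makes the output of both pwd commands visible.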
6. CMD - Default Command
Purpose: Provides the default command to run when a container starts. Only the last CMD in a Dockerfile takes effect.
Syntax:
# Exec form (preferred)
CMD ["executable", "param1", "param2"]
# Shell form
CMD command param1 param2
# As default parameters to ENTRYPOINT
CMD ["param1", "param2"]
Examples:
# Run a Python script
CMD ["python3", "app.py"]
# Start a web server
CMD ["nginx", "-g", "daemon off;"]
# Shell form (runs in /bin/sh -c)
CMD python3 app.py
When to Use:
- Defining the default command for your container
- Can be overridden by docker run <image> <command>
Best Practices:
- Use exec form for better signal handling
- Don’t use shell form unless you need shell features
- Combine with ENTRYPOINT for more flexibility
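A typical "shell feature" is variable expansion. The sketch below uses a made-up GREETING variable to show that exec form passes arguments through untouched, and that you can still opt into a shell explicitly when you need one.

```dockerfile
FROM alpine:3.19
ENV GREETING=hello

# Exec form without a shell: no expansion happens, so this variant
# would print the literal text $GREETING.
# CMD ["echo", "$GREETING"]

# Exec form that invokes a shell explicitly: sh expands the variable
# at container start, so this prints "hello".
CMD ["sh", "-c", "echo $GREETING"]
```

Spelling out sh -c keeps the shell visible in the Dockerfile instead of relying on the implicit /bin/sh -c wrapper that shell form adds.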
7. ENTRYPOINT - Configurable Command
Purpose: Configures a container to run as an executable. Unlike CMD, it is not replaced by arguments passed to docker run; overriding it requires the --entrypoint flag.
Syntax:
# Exec form (preferred)
ENTRYPOINT ["executable", "param1"]
# Shell form
ENTRYPOINT command param1
Examples:
# Make the container run as a specific command
ENTRYPOINT ["python3"]
CMD ["app.py"] # Default argument to python3
# Create a wrapper script
ENTRYPOINT ["/entrypoint.sh"]
# Combined with CMD for default args
ENTRYPOINT ["nginx"]
CMD ["-g", "daemon off;"]
When to Use:
- When you want the container to always run a specific executable
- When you want to accept arguments from docker run
Best Practices:
- Use exec form
- Combine with CMD to provide default arguments
- Use for creating “executable” containers
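How ENTRYPOINT and CMD interact with docker run arguments is easiest to see with a tiny "executable" image. The choice of ping here is only for illustration; <image> stands for whatever tag you build it under.

```dockerfile
FROM alpine:3.19

# The executable that always runs.
ENTRYPOINT ["ping"]

# Default arguments, used only when docker run supplies none.
CMD ["-c", "3", "localhost"]

# docker run <image>                    runs: ping -c 3 localhost
# docker run <image> -c 1 127.0.0.1     runs: ping -c 1 127.0.0.1
# docker run -it --entrypoint sh <image>  starts a shell instead of ping
```

Arguments after the image name replace CMD but are appended to ENTRYPOINT, which is what makes the container behave like the wrapped command.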
Part 3: Intermediate Dockerfile Instructions
8. ENV - Environment Variables
Purpose: Sets environment variables that persist in the container.
Syntax:
ENV <key>=<value> ...
# Legacy form (one variable per instruction; discouraged)
ENV <key> <value>
Examples:
# Single variable
ENV NODE_ENV=production
# Multiple variables
ENV APP_HOME=/app \
APP_USER=appuser \
APP_PORT=8080
# Use in subsequent instructions
ENV APP_DIR=/application
WORKDIR $APP_DIR
When to Use:
- Setting configuration values
- Defining paths used in multiple places
- Configuring application behavior
Best Practices:
- Use for values that should persist at runtime
- Group related variables
- Use ARG for build-time only variables
9. ARG - Build Arguments
Purpose: Defines build-time variables that users can pass at build time with --build-arg.
Syntax:
ARG <name>[=<default value>]
Examples:
# With default value
ARG VERSION=1.0
ARG NODE_ENV=production
# Use in FROM
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim
# Use in RUN
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y package
Build with:
docker build --build-arg VERSION=2.0 --build-arg NODE_ENV=development .
When to Use:
- Parameterizing base image versions
- Conditional build logic
- Build-time configuration
Best Practices:
- Provide sensible defaults
- Document required build args
- Use ENV to persist ARG values if needed at runtime
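Two details of ARG are easy to miss: an ARG declared before FROM goes out of scope once the build stage starts, and ARG values do not exist in the running container unless copied into ENV. A minimal sketch, where APP_VERSION is an illustrative name:

```dockerfile
# Declared before FROM: usable only in FROM lines.
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim

# Redeclare without a value to bring the earlier default into this stage.
ARG PYTHON_VERSION

# Build-time only by default.
ARG APP_VERSION=1.0

# Copy the build-time value into ENV so the running container can read it.
ENV APP_VERSION=${APP_VERSION}

RUN echo "python ${PYTHON_VERSION}, app ${APP_VERSION}"
```

Building with docker build --build-arg APP_VERSION=2.0 . would bake 2.0 into the image's environment. Because build arguments can be recovered from the image history, they are a poor place for secrets.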
10. EXPOSE - Document Ports
Purpose: Documents which ports the container listens on. Does not actually publish the port.
Syntax:
EXPOSE <port> [<port>/<protocol>...]
Examples:
# Single port
EXPOSE 8080
# Multiple ports
EXPOSE 80 443
# Specify protocol
EXPOSE 8080/tcp
EXPOSE 53/udp
When to Use:
- Documenting which ports your application uses
- Metadata for tools and developers
Best Practices:
- Always document exposed ports
- Remember: you still need the -p flag in docker run to actually publish
11. VOLUME - Define Mount Points
Purpose: Creates a mount point and marks it as holding externally mounted volumes.
Syntax:
VOLUME ["/data"]
VOLUME /var/log /var/db
Examples:
# Single volume
VOLUME /app/data
# Multiple volumes
VOLUME ["/var/log", "/var/db"]
When to Use:
- Marking directories that should persist data
- Indicating directories that should be mounted from the host
Best Practices:
- Use for data that should persist beyond container lifecycle
- Prefer named volumes in docker-compose or the -v flag for more control
12. USER - Set User Context
Purpose: Sets the user (and optionally group) to use for subsequent RUN, CMD, and ENTRYPOINT instructions.
Syntax:
USER <user>[:<group>]
USER <UID>[:<GID>]
Examples:
# Create and switch to non-root user
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
USER appuser
# Use UID/GID
USER 1000:1000
# Switch back to root if needed
USER root
RUN apt-get update
USER appuser
When to Use:
- Always for production containers (security)
- Running as non-root user
Best Practices:
- Never run production containers as root
- Create a dedicated user for your application
- Set ownership of files before switching users
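Putting these three practices together might look like the following sketch. The user and group names match the example above; app.py is an assumed entry point.

```dockerfile
FROM python:3.9-slim

# Create a dedicated system user and group while still root.
RUN groupadd -r appgroup && useradd -r -g appgroup appuser

WORKDIR /app

# Assign ownership at copy time instead of a separate RUN chown,
# which would duplicate the files in an extra layer.
COPY --chown=appuser:appgroup . .

# Everything from here on, including the running container, is non-root.
USER appuser

CMD ["python3", "app.py"]
```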
Part 4: Advanced Dockerfile Techniques
Multi-Stage Builds
Purpose: Use multiple FROM statements to create intermediate images, copying only necessary artifacts to the final image.
Example: Go Application
# Stage 1: Build
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o main .
# Stage 2: Production
FROM alpine:3.19
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]
Example: Node.js Application
# Stage 1: Dependencies
FROM node:16-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Stage 2: Build
FROM node:16-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 3: Production
FROM node:16-alpine
WORKDIR /app
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package*.json ./
CMD ["node", "dist/index.js"]
Benefits:
- Dramatically smaller final images
- Separate build and runtime dependencies
- Better security (no build tools in production)
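For a statically linked binary, the final stage can go further and start from scratch. The following is a sketch of that variation on the Go example, assuming the same project layout (go.mod, go.sum, and a main package in the build context); the /out path and the numeric user ID are illustrative choices.

```dockerfile
# Stage 1: Build a static binary
FROM golang:1.19-alpine AS builder
RUN apk add --no-cache ca-certificates
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/main .

# Stage 2: Empty base image, no shell or package manager
FROM scratch
# CA bundle so the binary can make outbound TLS connections.
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /out/main /main
# Numeric IDs work without an /etc/passwd file.
USER 65534:65534
# Exec form is required here because scratch has no /bin/sh.
ENTRYPOINT ["/main"]
```

The trade-off is debuggability: with no shell in the image, docker exec into the container is not possible.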
HEALTHCHECK - Container Health Monitoring
Purpose: Tells Docker how to test if the container is still working.
Syntax:
HEALTHCHECK [OPTIONS] CMD command
# Disable any healthcheck inherited from the base image
HEALTHCHECK NONE
Options:
- --interval=DURATION (default: 30s)
- --timeout=DURATION (default: 30s)
- --start-period=DURATION (default: 0s)
- --retries=N (default: 3)
Examples:
# HTTP health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# Custom script
HEALTHCHECK CMD /app/healthcheck.sh
# Disable inherited healthcheck
HEALTHCHECK NONE
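The curl-based check only works if curl is installed in the image. On Alpine-based images, one possible alternative is the BusyBox wget that ships with the base image. The sketch below assumes an application that listens on port 8080 and serves a /health endpoint, as in the example above, and that dist/index.js is its entry point.

```dockerfile
FROM node:16-alpine
WORKDIR /app
COPY . .
EXPOSE 8080

# --spider checks the URL without writing a file; a failed request makes
# wget exit non-zero, which marks this check as failed.
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:8080/health || exit 1

CMD ["node", "dist/index.js"]
```

Once the container is running, docker inspect --format '{{.State.Health.Status}}' <container> reports starting, healthy, or unhealthy.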
ONBUILD - Trigger Instructions
Purpose: Adds a trigger instruction to be executed when the image is used as a base for another build.
Syntax:
ONBUILD <INSTRUCTION>
Example:
# In base image
FROM node:16-alpine
WORKDIR /app
ONBUILD COPY package*.json ./
ONBUILD RUN npm install
ONBUILD COPY . .
# In child image: the FROM line triggers the ONBUILD instructions
FROM my-node-base
CMD ["npm", "start"]
When to Use:
- Creating base images for a team
- Standardizing build processes
Part 5: Optimization Best Practices
1. Minimize Layers
Bad:
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
Good:
RUN apt-get update && apt-get install -y \
curl \
git \
&& rm -rf /var/lib/apt/lists/*
2. Leverage Build Cache
Bad:
WORKDIR /app
COPY . .
RUN npm install
Good:
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
3. Use .dockerignore
Create a .dockerignore file:
node_modules
.git
.env
*.log
.DS_Store
4. Choose the Right Base Image
# General-purpose OS base (no language runtime): ~72MB
FROM ubuntu:20.04
# Debian-based Python runtime: ~120MB
FROM python:3.9-slim
# Alpine-based Python runtime: ~45MB
FROM python:3.9-alpine
# Empty base for static binaries: 0MB
FROM scratch
Conclusion
Mastering Dockerfiles is essential for creating efficient, secure, and maintainable container images. This guide covered:
- How Dockerfiles work (layers, caching, build process)
- All essential instructions (FROM, RUN, COPY, CMD, etc.)
- When to use each instruction
- Advanced techniques (multi-stage builds, health checks)
- Optimization best practices
Key Takeaways:
- Start with the right base image
- Order instructions from least to most frequently changing
- Combine RUN commands to minimize layers
- Use multi-stage builds for compiled languages
- Always run as a non-root user in production
- Leverage build cache for faster builds
- Use .dockerignore to exclude unnecessary files
By following these practices, you’ll create Docker images that are small, fast, secure, and production-ready.