Introduction

A Dockerfile is the blueprint for building Docker images. It’s a text file containing a series of instructions that Docker reads from top to bottom to assemble an image. Understanding how to write efficient, secure, and maintainable Dockerfiles is fundamental to mastering Docker.

This guide takes you from Dockerfile basics to advanced optimization techniques, covering the most commonly used instructions, when to use each one, and best practices for production-ready images.

Part 1: How Dockerfiles Work

The Build Process

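When you run docker build, Docker reads your Dockerfile and executes each instruction in order. Instructions that change the filesystem (mainly RUN, COPY, and ADD) each create a new layer; the others (such as CMD, ENV, or EXPOSE) only record metadata in the image configuration.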

# Layer 1 (the base image)
FROM ubuntu:20.04
# Layer 2
RUN apt-get update
# Layer 3
RUN apt-get install -y python3
# Layer 4
COPY app.py /app/
# Metadata only (no filesystem layer)
CMD ["python3", "/app/app.py"]

Key Concepts:

  1. Layers are Immutable: Once created, a layer never changes. If you modify or delete a file in a later layer, Docker creates a new layer recording the change, but the original data still occupies space in the earlier layer.

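  2. Layer Caching: Docker caches each layer. If an instruction and its inputs (for COPY and ADD, the contents of the copied files) haven’t changed, Docker reuses the cached layer, making rebuilds fast. Once one instruction misses the cache, every instruction after it is rebuilt.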

  3. Union File System: Docker uses a Union File System to stack layers. The final image is the combination of all layers.

  4. Build Context: When you run docker build ., the . is the build context: the files in that directory, minus anything excluded by .dockerignore, are made available to the builder.
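
To make layer immutability concrete, here is a minimal sketch. The file name /tmp/scratch.bin and its size are arbitrary choices for illustration. It contrasts deleting a file in a separate instruction with deleting it in the instruction that created it.

```dockerfile
FROM ubuntu:20.04

# Two layers: the 100MB file is stored in the first layer, and the second
# layer only records that it was deleted. The image still carries the data.
RUN dd if=/dev/zero of=/tmp/scratch.bin bs=1M count=100
RUN rm /tmp/scratch.bin

# One layer: the file is created and removed before the layer is committed,
# so it never becomes part of the image.
RUN dd if=/dev/zero of=/tmp/scratch.bin bs=1M count=100 && \
    rm /tmp/scratch.bin
```

Running docker history on the resulting image lists the size of each layer, which is a quick way to see where the space goes.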

The Anatomy of a Dockerfile Instruction

INSTRUCTION arguments
  • INSTRUCTION: The command (e.g., FROM, RUN, COPY)
  • arguments: The parameters for that instruction

Part 2: Essential Dockerfile Instructions (Basic)

1. FROM - Setting the Base Image

Purpose: Specifies the base image to build upon. FROM must be the first instruction in a Dockerfile; only comments, parser directives, and ARG may come before it.

Syntax:

FROM <image>:<tag>
FROM <image>@<digest>

Examples:

# Use a specific version (recommended)
FROM python:3.9-slim

# Use Alpine for minimal size
FROM node:16-alpine

# Use a specific digest for maximum reproducibility
FROM ubuntu@sha256:abc123...

# Multi-stage builds can have multiple FROM statements
FROM golang:1.19 AS builder
FROM alpine:3.18

When to Use:

  • Always - It’s required as the first instruction (except for ARG before FROM)
  • Choose the smallest base image that meets your needs

Best Practices:

  • Pin to specific versions, not latest
  • Use -slim or -alpine variants for smaller images
  • For compiled languages (Go, Rust), use multi-stage builds with scratch or distroless as the final base

2. RUN - Executing Commands

Purpose: Executes commands in a new layer and commits the results.

Syntax:

# Shell form (runs in /bin/sh -c)
RUN <command>

# Exec form (doesn't invoke a shell)
RUN ["executable", "param1", "param2"]

Examples:

# Install packages (shell form)
RUN apt-get update && apt-get install -y \
    curl \
    git \
    vim

# Run a command through bash explicitly (exec form)
RUN ["/bin/bash", "-c", "echo hello"]

# Chain commands to reduce layers
RUN apt-get update && \
    apt-get install -y python3 && \
    rm -rf /var/lib/apt/lists/*

When to Use:

  • Installing packages
  • Running build scripts
  • Creating directories
  • Downloading files

Best Practices:

  • Combine related commands with && to reduce layers
  • Clean up in the same RUN command (e.g., remove package caches)
  • Use \ for multi-line commands for readability
  • Use --no-install-recommends with apt-get to avoid unnecessary packages
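
The practices above can be combined into a single instruction. This is a sketch rather than a canonical recipe; the package list is only an example.

```dockerfile
FROM ubuntu:20.04

# Avoid interactive prompts during package installation (build-time only).
ARG DEBIAN_FRONTEND=noninteractive

# Update, install, and clean up in a single layer so the package index
# never persists in the image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        git && \
    rm -rf /var/lib/apt/lists/*
```

Keeping apt-get update and apt-get install in the same RUN also avoids installing from a stale cached package index.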

3. COPY - Copying Files

Purpose: Copies files or directories from the build context into the image.

Syntax:

COPY <src>... <dest>
# JSON form, required for paths containing whitespace
COPY ["<src>",... "<dest>"]

Examples:

# Copy a single file
COPY app.py /app/

# Copy multiple files
COPY app.py requirements.txt /app/

# Copy a directory
COPY ./src /app/src

# Copy with wildcard
COPY *.py /app/

# Copy and rename
COPY config.json /app/settings.json

When to Use:

  • Copying application code
  • Copying configuration files
  • Copying static assets

Best Practices:

  • Copy only what you need
  • Use .dockerignore to exclude unnecessary files
  • Copy dependency files first (e.g., package.json) to leverage caching
  • Copy source code last (it changes most frequently)

4. ADD - Advanced Copy

Purpose: Like COPY, but with extra features: it auto-extracts local tar archives and can download files from URLs (downloaded archives are not extracted).

Syntax:

ADD <src>... <dest>

Examples:

# Auto-extract a tar file
ADD archive.tar.gz /app/

# Download from URL (not recommended)
ADD https://example.com/file.tar.gz /tmp/

When to Use:

  • Only when you need auto-extraction of local tar files
  • Prefer COPY for everything else

Best Practices:

  • Use COPY unless you specifically need ADD’s features
  • For URLs, use RUN curl or RUN wget instead for better control
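
As a sketch of the RUN curl alternative, the following downloads the archive, verifies it, extracts it, and removes it in one layer. The FILE_SHA256 build argument and the /opt/file destination are hypothetical names introduced for illustration; the URL is the placeholder from the example above.

```dockerfile
FROM ubuntu:20.04

# Hypothetical build argument: pass the real digest with
#   docker build --build-arg FILE_SHA256=<digest> .
ARG FILE_SHA256

RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    rm -rf /var/lib/apt/lists/* && \
    # -f fails on HTTP errors instead of saving an error page
    curl -fsSL -o /tmp/file.tar.gz https://example.com/file.tar.gz && \
    # Abort the build if the download does not match the expected checksum
    echo "${FILE_SHA256}  /tmp/file.tar.gz" | sha256sum -c - && \
    mkdir -p /opt/file && \
    tar -xzf /tmp/file.tar.gz -C /opt/file && \
    rm /tmp/file.tar.gz
```

Unlike ADD with a URL, the downloaded archive never ends up in a layer of its own, because it is deleted before the RUN instruction completes.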

5. WORKDIR - Setting the Working Directory

Purpose: Sets the working directory for subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.

Syntax:

WORKDIR /path/to/directory

Examples:

WORKDIR /app

# Creates the directory if it doesn't exist
WORKDIR /app/src

# Can use environment variables
ENV APP_HOME=/application
WORKDIR $APP_HOME

When to Use:

  • Setting a consistent working directory for your application
  • Organizing file structure in the image

Best Practices:

  • Use absolute paths
  • Prefer WORKDIR over RUN cd /path (which doesn’t persist)
  • Set it early in the Dockerfile

6. CMD - Default Command

Purpose: Provides the default command to run when a container starts. Only the last CMD in a Dockerfile takes effect.

Syntax:

# Exec form (preferred)
CMD ["executable", "param1", "param2"]

# Shell form
CMD command param1 param2

# As default parameters to ENTRYPOINT
CMD ["param1", "param2"]

Examples:

# Run a Python script
CMD ["python3", "app.py"]

# Start a web server
CMD ["nginx", "-g", "daemon off;"]

# Shell form (runs in /bin/sh -c)
CMD python3 app.py

When to Use:

  • Defining the default command for your container
  • Can be overridden by docker run <image> <command>

Best Practices:

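  • Use exec form so the process runs as PID 1 and receives signals such as SIGTERM from docker stop directly; in shell form it runs as a child of /bin/sh, which does not forward them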
  • Don’t use shell form unless you need shell features
  • Combine with ENTRYPOINT for more flexibility

7. ENTRYPOINT - Configurable Command

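Purpose: Configures a container to run as an executable. Unlike CMD, it is not replaced by arguments passed to docker run; those are appended to it instead. Overriding it requires the --entrypoint flag.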

Syntax:

# Exec form (preferred)
ENTRYPOINT ["executable", "param1"]

# Shell form
ENTRYPOINT command param1

Examples:

# Make the container run as a specific command
ENTRYPOINT ["python3"]
# Default argument to python3
CMD ["app.py"]

# Create a wrapper script
ENTRYPOINT ["/entrypoint.sh"]

# Combined with CMD for default args
ENTRYPOINT ["nginx"]
CMD ["-g", "daemon off;"]

When to Use:

  • When you want the container to always run a specific executable
  • When you want to accept arguments from docker run

Best Practices:

  • Use exec form
  • Combine with CMD to provide default arguments
  • Use for creating “executable” containers
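
How ENTRYPOINT and CMD interact is easiest to see side by side. In this sketch, my-image and other.py are hypothetical names used only to show the resulting commands.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY app.py .

# Fixed executable: it runs no matter which arguments docker run receives.
ENTRYPOINT ["python3"]

# Default arguments: used only when docker run supplies none.
CMD ["app.py"]

# Resulting behavior:
#   docker run my-image                  runs: python3 app.py
#   docker run my-image other.py         runs: python3 other.py
#   docker run --entrypoint sh my-image  runs: sh (ENTRYPOINT replaced, CMD dropped)
```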

Part 3: Intermediate Dockerfile Instructions

8. ENV - Environment Variables

Purpose: Sets environment variables that persist in the container.

Syntax:

ENV <key>=<value> ...
# Legacy form (one variable per instruction; discouraged)
ENV <key> <value>

Examples:

# Single variable
ENV NODE_ENV=production

# Multiple variables
ENV APP_HOME=/app \
    APP_USER=appuser \
    APP_PORT=8080

# Use in subsequent instructions
ENV APP_DIR=/application
WORKDIR $APP_DIR

When to Use:

  • Setting configuration values
  • Defining paths used in multiple places
  • Configuring application behavior

Best Practices:

  • Use for values that should persist at runtime
  • Group related variables
  • Use ARG for build-time only variables

9. ARG - Build Arguments

Purpose: Defines build-time variables that users can pass at build time with --build-arg.

Syntax:

ARG <name>[=<default value>]

Examples:

# With default value
ARG VERSION=1.0
ARG NODE_ENV=production

# Use in FROM
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim

# Use in RUN
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y package

Build with:

docker build --build-arg VERSION=2.0 --build-arg NODE_ENV=development .

When to Use:

  • Parameterizing base image versions
  • Conditional build logic
  • Build-time configuration

Best Practices:

  • Provide sensible defaults
  • Document required build args
  • Use ENV to persist ARG values if needed at runtime
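
ARG scoping is a common source of confusion, so here is a minimal sketch. The names APP_VERSION and APP_PYTHON_VERSION are hypothetical. An ARG declared before FROM is only visible to FROM lines and must be redeclared inside the stage before it can be used there.

```dockerfile
# Declared before FROM: usable only in FROM lines.
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim

# Redeclare without a value to bring the same ARG into this build stage.
ARG PYTHON_VERSION

# Build-time only: not set in containers started from the image.
ARG VERSION=1.0

# Copy the values into ENV so they are still set when the container runs.
ENV APP_VERSION=${VERSION} \
    APP_PYTHON_VERSION=${PYTHON_VERSION}

RUN echo "Building version ${VERSION} on Python ${PYTHON_VERSION}"
```

Build argument values can be seen with docker history, so they are not a safe place for passwords or tokens.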

10. EXPOSE - Document Ports

Purpose: Documents which ports the container listens on. Does not actually publish the port.

Syntax:

EXPOSE <port> [<port>/<protocol>...]

Examples:

# Single port
EXPOSE 8080

# Multiple ports
EXPOSE 80 443

# Specify protocol
EXPOSE 8080/tcp
EXPOSE 53/udp

When to Use:

  • Documenting which ports your application uses
  • Metadata for tools and developers

Best Practices:

  • Always document exposed ports
  • Remember: you still need -p flag in docker run to actually publish

11. VOLUME - Define Mount Points

Purpose: Creates a mount point and marks it as holding externally mounted volumes.

Syntax:

VOLUME ["/data"]
VOLUME /var/log /var/db

Examples:

# Single volume
VOLUME /app/data

# Multiple volumes
VOLUME ["/var/log", "/var/db"]

When to Use:

  • Marking directories that should persist data
  • Indicating directories that should be backed by a volume rather than the container’s writable layer

Best Practices:

  • Use for data that should persist beyond container lifecycle
  • Prefer named volumes in docker-compose or -v flag for more control

12. USER - Set User Context

Purpose: Sets the user (and optionally group) to use for subsequent RUN, CMD, and ENTRYPOINT instructions.

Syntax:

USER <user>[:<group>]
USER <UID>[:<GID>]

Examples:

# Create and switch to non-root user
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
USER appuser

# Use UID/GID
USER 1000:1000

# Switch back to root if needed
USER root
RUN apt-get update
USER appuser

When to Use:

  • Always for production containers (security)
  • Running as non-root user

Best Practices:

  • Never run production containers as root
  • Create a dedicated user for your application
  • Set ownership of files before switching users
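
The last point is the one most often missed. The sketch below assumes a Debian-based image and a hypothetical writable directory /app/data; ownership is set while the build still runs as root, and USER comes last.

```dockerfile
FROM python:3.9-slim

# Create a dedicated system user and group while still root.
RUN groupadd -r appgroup && useradd -r -g appgroup appuser

WORKDIR /app

# Set ownership at copy time instead of adding a separate chown layer.
COPY --chown=appuser:appgroup app.py .

# Directories the app must write to need explicit ownership.
RUN mkdir -p /app/data && chown appuser:appgroup /app/data

# Everything from here on, including the running container, is non-root.
USER appuser
CMD ["python3", "app.py"]
```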

Part 4: Advanced Dockerfile Techniques

Multi-Stage Builds

Purpose: Use multiple FROM statements to create intermediate images, copying only necessary artifacts to the final image.

Example: Go Application

# Stage 1: Build
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o main .

# Stage 2: Production
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]

Example: Node.js Application

# Stage 1: Dependencies
FROM node:16-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Stage 2: Build
FROM node:16-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: Production
FROM node:16-alpine
WORKDIR /app
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package*.json ./
CMD ["node", "dist/index.js"]

Benefits:

  • Dramatically smaller final images
  • Separate build and runtime dependencies
  • Better security (no build tools in production)
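
For a statically linked Go binary, the final stage can go one step further and use scratch, as suggested in the FROM best practices. This is a sketch under the assumption that the program needs nothing from the filesystem except TLS root certificates.

```dockerfile
# Stage 1: build a statically linked binary
FROM golang:1.19-alpine AS builder
RUN apk --no-cache add ca-certificates
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o main .

# Stage 2: empty base image containing only what the binary needs
FROM scratch
# TLS root certificates, needed only if the program makes HTTPS requests
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/main /main
# scratch has no /etc/passwd, so the user is given as numeric UID:GID
USER 65534:65534
ENTRYPOINT ["/main"]
```

The trade-off is that a scratch image has no shell or package manager, so debugging inside the container is not possible.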

HEALTHCHECK - Container Health Monitoring

Purpose: Tells Docker how to test if the container is still working.

Syntax:

HEALTHCHECK [OPTIONS] CMD command
# Disable any healthcheck inherited from the base image
HEALTHCHECK NONE

Options:

  • --interval=DURATION (default: 30s)
  • --timeout=DURATION (default: 30s)
  • --start-period=DURATION (default: 0s)
  • --retries=N (default: 3)

Examples:

# HTTP health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Custom script
HEALTHCHECK CMD /app/healthcheck.sh

# Disable inherited healthcheck
HEALTHCHECK NONE
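
The curl-based check only works if curl is installed in the image. On Alpine-based images, a sketch using the BusyBox wget that ships with Alpine avoids the extra package. The index.js entry point is a hypothetical name, and the application is assumed to serve /health on port 8080.

```dockerfile
FROM node:16-alpine
WORKDIR /app
COPY . .

# --spider checks that the URL responds without saving the body;
# a non-zero exit status marks the check as failed.
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:8080/health || exit 1

CMD ["node", "index.js"]
```

The current status appears in the STATUS column of docker ps as starting, healthy, or unhealthy.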

ONBUILD - Trigger Instructions

Purpose: Adds a trigger instruction to be executed when the image is used as a base for another build.

Syntax:

ONBUILD <INSTRUCTION>

Example:

# In base image
FROM node:16-alpine
WORKDIR /app
ONBUILD COPY package*.json ./
ONBUILD RUN npm install
ONBUILD COPY . .

# In child image (the FROM line triggers the base image's ONBUILD instructions)
FROM my-node-base
CMD ["npm", "start"]

When to Use:

  • Creating base images for a team
  • Standardizing build processes

Part 5: Optimization Best Practices

1. Minimize Layers

Bad:

RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git

Good:

RUN apt-get update && apt-get install -y \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

2. Leverage Build Cache

Bad (any source change invalidates the npm install layer):

WORKDIR /app
COPY . .
RUN npm install

Good (npm install is rerun only when the package files change):

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .

3. Use .dockerignore

Create a .dockerignore file:

node_modules
.git
.env
*.log
.DS_Store

4. Choose the Right Base Image

# General-purpose OS base, no language runtime (~72MB)
FROM ubuntu:20.04

# Debian-based Python runtime (~120MB)
FROM python:3.9-slim

# Alpine-based Python runtime (~45MB)
FROM python:3.9-alpine

# Empty base for static binaries (0MB)
FROM scratch

Conclusion

Mastering Dockerfiles is essential for creating efficient, secure, and maintainable container images. This guide covered:

  • How Dockerfiles work (layers, caching, build process)
  • All essential instructions (FROM, RUN, COPY, CMD, etc.)
  • When to use each instruction
  • Advanced techniques (multi-stage builds, health checks)
  • Optimization best practices

Key Takeaways:

  • Start with the right base image
  • Order instructions from least to most frequently changing
  • Combine RUN commands to minimize layers
  • Use multi-stage builds for compiled languages
  • Always run as a non-root user in production
  • Leverage build cache for faster builds
  • Use .dockerignore to exclude unnecessary files

By following these practices, you’ll create Docker images that are small, fast, secure, and production-ready.