Problem Statement
Explain Docker multi-stage builds for optimizing images. Include build strategies, layer caching, security considerations, and CI/CD integration.
Explanation
Multi-stage builds separate the build and runtime environments. The final image contains only what the application needs to run, so it is smaller and exposes less attack surface: build tools and intermediate artifacts stay behind in earlier stages.
Basic multi-stage example (Node.js):
```dockerfile
# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
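# Drop devDependencies so only runtime packages reach the final image
# (--omit=dev replaces the deprecated --production flag in npm 8+)
RUN npm prune --omit=dev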
# Production stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
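Benefits: the builder stage carries devDependencies and build tools, while the production stage contains only runtime dependencies and compiled code. The final image is usually much smaller (reductions of 50-80% are plausible, though the exact figure depends on the base images and the dependency set).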
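One variation on the example above, sketched under assumptions rather than offered as the required layout: install production dependencies in their own stage instead of pruning the builder's `node_modules`. The stage name `deps` is hypothetical, and the sketch assumes a `package-lock.json` exists and that the build writes to `dist`. Installing in an Alpine-based stage also means any native addons are compiled against the same libc as the runtime image, which is not the case when `node_modules` is copied from the Debian-based `node:18` image.

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1 (hypothetical name "deps"): production dependencies only,
# installed on the same base as the runtime image
FROM node:18-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Stage 2: full install (including devDependencies) and build
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: runtime image assembled from the two stages above
FROM node:18-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package*.json ./
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

Because `deps` and `builder` do not depend on each other, BuildKit can build them in parallel.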
Java example with Maven:
```dockerfile
# Build stage
FROM maven:3.8-jdk-11 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# Production stage
FROM openjdk:11-jre-slim
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
USER nobody
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```
Layer caching optimization - order instructions from least to most frequently changing:
```dockerfile
FROM node:18 AS builder
WORKDIR /app
# Cache layer 1: package files (rarely change)
COPY package*.json ./
RUN npm ci
# Cache layer 2: source code (changes frequently)
COPY . .
RUN npm run build
# Production
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
```
Changes to source code invalidate only the layers from `COPY . .` onward, so the dependency installation layer is reused as long as the package files are unchanged.
BuildKit features (available since Docker 18.09; the default builder since Docker Engine 23.0):
```dockerfile
# syntax=docker/dockerfile:1
FROM node:18 AS builder
WORKDIR /app
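# Install dependencies: bind-mount the manifests (npm ci needs both files)
# and keep npm's download cache between builds
RUN --mount=type=bind,source=package.json,target=package.json \
    --mount=type=bind,source=package-lock.json,target=package-lock.json \
    --mount=type=cache,target=/root/.npm \
    npm ci
# Mount source without copying it into a layer (bind mounts are read-only by default)
RUN --mount=type=bind,source=package.json,target=package.json \
    --mount=type=bind,source=src,target=src \
    npm run build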
```
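The same cache-mount idea applies to the Maven build shown earlier. The sketch below is illustrative and assumes the build runs as root in the official `maven` image, where the local repository lives under `/root/.m2`. With a cache mount, downloaded artifacts persist across builds on the same builder even when `pom.xml` changes, so the separate `dependency:go-offline` layer becomes less important.

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.8-jdk-11 AS builder
WORKDIR /app
COPY pom.xml .
COPY src ./src
# The cache mount keeps the local Maven repository between builds;
# its contents are not written into any image layer
RUN --mount=type=cache,target=/root/.m2 \
    mvn package -DskipTests
```

Cache mounts are local to the builder, so a fresh CI runner starts with an empty cache unless the cache is persisted some other way.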
Security best practices:
```dockerfile
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
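RUN npm run build
# Remove devDependencies before node_modules is copied to the production stage
RUN npm prune --omit=dev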
# Production with security
FROM node:18-alpine
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs
WORKDIR /app
# Set ownership
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs package*.json ./
# Switch to non-root
USER nodejs
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
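Security measures: use minimal base images (such as Alpine), run as a non-root user, scan images for vulnerabilities, keep secrets out of the image (values passed through `ARG` or `ENV` remain visible in the image history), and use .dockerignore to keep unnecessary or sensitive files out of the build context.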
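As a minimal sketch of keeping a secret out of the image, BuildKit's secret mounts expose a file to a single `RUN` step without writing it to a layer. The secret id `npmrc` and the idea that the build needs a private registry token in `.npmrc` are assumptions made for illustration.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# The secret file exists only while this RUN step executes and is not
# stored in the resulting layer or in the image history.
# Hypothetical build command:
#   docker build --secret id=npmrc,src=$HOME/.npmrc .
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci
```

This requires BuildKit; the legacy builder does not support `--mount`.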
.dockerignore:
```
node_modules
npm-debug.log
.git
.env
*.md
.DS_Store
tests
.github
```
CI/CD integration (GitLab):
```yaml
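build_image:
  stage: build
  image: docker:20.10
  services:
    - docker:20.10-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_BUILDKIT: "1"
  script:
    # Pass the password on stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    # BUILDKIT_INLINE_CACHE=1 embeds cache metadata so --cache-from can reuse the pushed image
    - |
      docker build \
        --cache-from $CI_REGISTRY_IMAGE:latest \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --tag $CI_REGISTRY_IMAGE:latest \
        .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest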
```
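A pipeline can also use a dedicated stage to run tests before the production image is built. The following is a sketch only: the stage name `test` and the presence of an `npm test` script are assumptions, and the `.dockerignore` shown earlier excludes `tests`, so that entry would have to be removed for the test files to reach the build context.

```dockerfile
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Test stage (hypothetical): reuses the builder's dependencies and source.
# A CI job can run it with: docker build --target test .
FROM builder AS test
RUN npm test

# Production stage: does not depend on the test stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
USER node
CMD ["node", "dist/index.js"]
```

With BuildKit, a plain `docker build .` skips the `test` stage because the final stage does not reference it. The legacy builder would run every stage in order.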
GitHub Actions with caching:
```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
- name: Build and push
  uses: docker/build-push-action@v4
  with:
    context: .
    push: true
    tags: |
      myregistry.com/myapp:${{ github.sha }}
      myregistry.com/myapp:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
```
Image scanning:
```yaml
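- name: Scan image
  # In production, pin to a release tag or commit SHA rather than a moving branch
  uses: aquasecurity/trivy-action@master
  with:
    # Must match a tag produced by the build step
    image-ref: 'myregistry.com/myapp:${{ github.sha }}'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'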
```
Best practices: use multi-stage builds wherever the build produces artifacts that the runtime does not need, order instructions for optimal caching, use minimal base images, run as non-root, scan images for vulnerabilities, leverage BuildKit caching, use .dockerignore, tag images with the commit SHA and a semantic version, and implement image signing. Applied together, these practices produce smaller, more secure container images for production deployments.