docker-security

Expert Docker and container security covering image vulnerability scanning with Trivy and Grype, distroless and scratch minimal base images, non-root user enforcement, read-only root filesystem, Linux capability dropping, seccomp and AppArmor profiles, secret handling patterns, image signing

MoltbotDen

Security & Passwords

Docker Security

Container security is defense in depth: secure the image (supply chain), secure the runtime (least
privilege), secure the orchestration (Kubernetes policy), and secure the registry (signing and scanning).
A gap at any layer can be exploited. Most breaches are preventable with a few well-understood patterns
applied consistently.

Core Mental Model

Container security has four layers: (1) Image layer — what's in the image matters enormously; unused
packages are attack surface. (2) Runtime layer — drop capabilities, read-only filesystem, non-root
user, seccomp/AppArmor. (3) Orchestration layer — Pod Security Standards, network policies, RBAC.
(4) Supply chain layer — know what you're running, sign what you ship. The goal is *minimal attack
surface + explicit deny of everything not required*. A container that can't write to disk, can't bind
low ports, can't gain new privileges, and has no shell is extraordinarily hard to exploit even if there's
a vulnerability in the app.
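
As a rough illustration of the runtime half of this model, the properties named above map onto a handful of `docker run` flags. The sketch below wraps them in a helper. The function name `run_hardened` and UID/GID 10001 are assumptions chosen for the example, and an image that writes outside `/tmp` would need additional `--tmpfs` or volume mounts.

```shell
# Sketch: start a container that cannot write to its root filesystem,
# holds no Linux capabilities, cannot gain privileges, and runs as non-root.
# run_hardened and UID/GID 10001 are illustrative choices, not fixed names.
run_hardened() {
  image="$1"
  shift
  docker run --rm \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    --user 10001:10001 \
    "$image" "$@"
}

# Usage: run_hardened myapp:latest
```

The "no shell" property is not something a flag can provide; it comes from the choice of base image.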

Image Vulnerability Scanning

Trivy: Comprehensive Scanner

# GitHub Actions: scan on every PR
name: Container Security Scan
on:
  pull_request:
    paths: ['Dockerfile*', '**/*.dockerfile']
  push:
    branches: [main]

jobs:
  trivy-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Build image
        run: docker build -t ${{ github.repository }}:${{ github.sha }} .
      
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master  # Pin to a release tag or commit SHA in production
        with:
          image-ref: '${{ github.repository }}:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'           # Fail build on HIGH/CRITICAL
          ignore-unfixed: true     # Don't fail on unfixable CVEs
          vuln-type: 'os,library'  # Scan OS packages and language libraries
      
      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'
      
      # Also scan IaC misconfigurations (scan-type 'config' does not detect secrets)
      - name: Trivy config scan (Dockerfile + manifests)
        uses: aquasecurity/trivy-action@master  # Pin to a release tag or commit SHA in production
        with:
          scan-type: 'config'
          scan-ref: '.'
          format: 'table'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

# Local Trivy scanning
# Install: brew install trivy

# Scan image for vulnerabilities
trivy image --severity HIGH,CRITICAL myapp:latest

# Scan and generate SBOM (Software Bill of Materials)
trivy image --format spdx-json --output sbom.json myapp:latest

# Scan Dockerfile for misconfigurations
trivy config --severity HIGH,CRITICAL Dockerfile

# Scan an image exported as a tarball (e.g., for air-gapped hosts)
docker save -o myapp.tar myapp:latest
trivy image --input myapp.tar

# Update vulnerability database
trivy image --download-db-only

Grype: Alternative Scanner

# Install: brew install anchore/grype/grype

# Scan image
grype myapp:latest

# Exit non-zero if any finding is HIGH severity or above
grype myapp:latest --fail-on high

# Generate an SPDX SBOM with Syft, then scan the SBOM with Grype
syft myapp:latest -o spdx-json > sbom.json
grype sbom:./sbom.json

# Example .grype.yaml that ignores a specific CVE
cat .grype.yaml
# ignore:
#   - vulnerability: CVE-2023-12345
#     reason: "Not exploitable in our deployment context; review by 2024-06-01"
# Track the review date yourself (here, in the reason text)

Minimal Base Images

Distroless: No Shell, No Package Manager

# Multi-stage: build in full image, copy to distroless
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="-w -s" -o server ./cmd/server

# Production stage: distroless (no shell, no package manager, no OS utilities)
FROM gcr.io/distroless/static-debian12:nonroot AS production
# nonroot variant: runs as UID 65532 by default

COPY --from=builder /app/server /server

# Distroless has no shell — CMD must be exec form, not shell form
CMD ["/server"]

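# Python distroless
# Builder must match the Python minor version in distroless python3-debian12 (3.11)
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
COPY . .

FROM gcr.io/distroless/python3-debian12:nonroot
COPY --from=builder /install /usr/local
COPY --from=builder /app /app
WORKDIR /app
# Debian's interpreter does not search /usr/local site-packages by default
ENV PYTHONPATH=/usr/local/lib/python3.11/site-packages
# The image ENTRYPOINT is already the Python interpreter, so CMD only names the script
CMD ["app.py"]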

Comparing Base Image Sizes and Attack Surface

ubuntu:22.04      → 78MB  + full toolset (apt, bash, curl, etc.)
debian:12-slim    → 31MB  + minimal OS but has apt, shell
python:3.12-slim  → 132MB + Python + slim Debian
python:3.12-alpine → 58MB + musl libc (may cause compatibility issues)
gcr.io/distroless/python3-debian12  → 52MB + Python only, no shell
gcr.io/distroless/static-debian12   → 2.5MB + CA certs only (for compiled binaries)
scratch           → 0MB  + nothing (requires static binary + embedded certs)

Rule: Use distroless for applications. Use scratch only for Go/Rust fully static binaries.
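
The sizes above vary by tag, architecture, and whether compressed or unpacked layers are measured, so it may be worth checking them locally. A minimal sketch, assuming the images are already pulled; the helper names `to_mb` and `image_size_mb` are made up for this example.

```shell
# to_mb BYTES: print the size rounded to the nearest decimal megabyte
to_mb() {
  echo $(( ($1 + 500000) / 1000000 ))
}

# image_size_mb IMAGE: unpacked size of a locally present image, in MB
image_size_mb() {
  to_mb "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Usage:
#   for img in python:3.12-slim gcr.io/distroless/python3-debian12; do
#     echo "$img: $(image_size_mb "$img")MB"
#   done
```

`docker image inspect` reports the unpacked size on disk; the compressed size transferred from a registry is smaller.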

Non-Root User Enforcement

# Method 1: Create dedicated user
FROM python:3.12-slim

# Create non-root user and group
RUN groupadd --gid 10001 appgroup && \
    useradd --uid 10001 --gid appgroup --shell /bin/false --create-home appuser

WORKDIR /app
COPY --chown=appuser:appgroup requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY --chown=appuser:appgroup . .

# Switch to non-root user
USER appuser

EXPOSE 8080
CMD ["python", "-m", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

# Kubernetes: enforce non-root at pod spec level
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsNonRoot: true          # Fail if image runs as root
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001              # Mounted volumes are group-owned by this GID
  
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false   # Cannot gain more privileges
        readOnlyRootFilesystem: true      # No writes to container filesystem
        runAsNonRoot: true
        runAsUser: 10001
        capabilities:
          drop: ["ALL"]          # Drop ALL Linux capabilities
          add: ["NET_BIND_SERVICE"]  # Add back only what's needed (if port < 1024)

Read-Only Root Filesystem + Volume Mounts

# Force read-only root filesystem with explicit writable mounts
spec:
  containers:
    - name: app
      securityContext:
        readOnlyRootFilesystem: true    # App cannot write to its own filesystem
      
      volumeMounts:
        # Provide writable temp directory
        - name: tmp
          mountPath: /tmp
        # Application-specific writable paths
        - name: app-data
          mountPath: /app/data
        # Shared memory (needed by some apps)
        - name: shm
          mountPath: /dev/shm
  
  volumes:
    - name: tmp
      emptyDir: {}              # Node-disk backed by default (tmpfs only with medium: Memory); pod-scoped
    - name: app-data
      emptyDir: {}
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 128Mi

Dropping Linux Capabilities

# Linux capabilities: granular root privileges
# Default container capabilities (too many!):
# CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_MKNOD,
# CAP_NET_RAW, CAP_SETGID, CAP_SETUID, CAP_SETFCAP, CAP_SETPCAP,
# CAP_NET_BIND_SERVICE, CAP_SYS_CHROOT, CAP_KILL, CAP_AUDIT_WRITE

# Minimal capability set for most web services:
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp:latest
# NET_BIND_SERVICE: only if you need to bind port < 1024
# For ports >= 1024 (e.g., 8080): --cap-drop ALL is sufficient!

# Rarely needed by application containers (never add these; NET_RAW is the only one in the default set):
# CAP_SYS_ADMIN     → Mount filesystems, change kernel parameters (close to full root)
# CAP_NET_RAW       → Raw sockets and packet crafting (ARP/DNS spoofing on the container network)
# CAP_SYS_PTRACE    → Trace and inspect other processes
# CAP_SYS_MODULE    → Load kernel modules
# CAP_NET_ADMIN     → Configure network interfaces and firewall rules

Seccomp Profiles

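# Docker applies its default seccomp profile automatically (it blocks roughly 44 syscalls); no flag is needed.
# The seccomp option takes a path to a profile JSON file, or "unconfined", which disables filtering.
docker run myapp:latest

# Custom seccomp profile for a minimal syscall set
# Apply with: docker run --security-opt seccomp=seccomp-profile.json myapp:latest
# seccomp-profile.json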
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "names": [
        "accept4", "access", "arch_prctl", "bind", "brk", "clone",
        "close", "connect", "epoll_create1", "epoll_ctl", "epoll_wait",
        "execve", "exit", "exit_group", "fcntl", "fstat", "futex",
        "getcwd", "getpid", "getuid", "listen", "lseek", "mmap",
        "mprotect", "munmap", "nanosleep", "openat", "pipe2", "poll",
        "read", "recvfrom", "rt_sigaction", "rt_sigprocmask",
        "sendto", "setuid", "sigaltstack", "socket", "stat",
        "write", "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

# Kubernetes: apply seccomp profile
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault      # Use CRI's default profile
      # Or: type: Localhost, localhostProfile: "seccomp-profile.json" (path relative to the kubelet's seccomp directory)
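
Whether a profile is actually in effect can be checked from inside the container, since the kernel reports the seccomp mode of every process in `/proc/<pid>/status`. A small sketch, with hypothetical helper names:

```shell
# seccomp_mode_name N: translate the Seccomp field of /proc/<pid>/status
seccomp_mode_name() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# current_seccomp_mode: report the mode of the calling process (Linux only)
current_seccomp_mode() {
  seccomp_mode_name "$(awk '/^Seccomp:/ { print $2 }' /proc/self/status)"
}
```

A container running under the default or a custom profile should report `filter`; `disabled` suggests the container was started unconfined.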

Secret Handling: Never in ENV (for sensitive secrets)

# ❌ WRONG: Secret in ENV is visible to all processes and in docker inspect
ENV DB_PASSWORD="my-secret-password"

# ❌ WRONG: Secret in build arg is in layer history
ARG DB_PASSWORD
RUN ./configure --db-pass=$DB_PASSWORD

# ✅ RIGHT: Docker BuildKit secrets (not in image layers)
# Build: docker build --secret id=db_password,env=DB_PASSWORD .
RUN --mount=type=secret,id=db_password \
    DB_PASSWORD="$(cat /run/secrets/db_password)" && \
    ./configure --db-pass="$DB_PASSWORD"

# ✅ RIGHT: At runtime, inject via Kubernetes Secret or Vault
# The secret is never in the image — it's mounted at runtime

# Kubernetes: mount secrets as files (not environment variables for sensitive data)
spec:
  containers:
    - name: app
      # ⚠️ ENV vars are okay for non-sensitive config
      env:
        - name: PORT
          value: "8080"
        # ⚠️ If you must use env vars for secrets (some apps require it),
        # reference the Secret instead of hardcoding the value:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      
      # ✅ Sensitive secrets as file mounts
      volumeMounts:
        - name: db-secret
          mountPath: /run/secrets/db
          readOnly: true
  
  volumes:
    - name: db-secret
      secret:
        secretName: db-credentials
        defaultMode: 0400        # Owner read-only
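
On the consuming side, a file-mounted secret is read at startup rather than taken from the environment. The sketch below assumes an image that ships a shell and an entrypoint script; a distroless image would do the same read in application code. The helper name `read_secret` is hypothetical, and the path follows from the mount above (`/run/secrets/db` plus the Secret key `password`).

```shell
# read_secret PATH: print the secret stored at PATH, or fail loudly
read_secret() {
  if [ ! -r "$1" ]; then
    echo "secret file not readable: $1" >&2
    return 1
  fi
  cat "$1"
}

# Usage in an entrypoint script:
#   DB_PASSWORD="$(read_secret /run/secrets/db/password)" || exit 1
```

Command substitution strips the trailing newline that many tools append when a secret is created.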

Image Signing with Cosign (Sigstore)

# Install cosign: brew install cosign

# Generate key pair
cosign generate-key-pair
# Creates cosign.key (private, protect this!) and cosign.pub

# Sign image by digest (after push to registry); tags are mutable, digests are not
cosign sign --key cosign.key \
  us-central1-docker.pkg.dev/my-project/app/order-api@sha256:abc123

# Verify signature
cosign verify --key cosign.pub \
  us-central1-docker.pkg.dev/my-project/app/order-api@sha256:abc123

# Keyless signing (Sigstore: uses OIDC, no key management)
# Works with GitHub Actions OIDC, Google accounts, etc.
# Cosign 2.x signs keyless by default when --key is omitted; COSIGN_EXPERIMENTAL=1 was only needed on 1.x
cosign sign \
  us-central1-docker.pkg.dev/my-project/app/order-api@sha256:abc123
# Signs with the caller's OIDC identity (e.g., GitHub Actions), recorded in the transparency log (Rekor)

# GitHub Actions workflow with cosign (the job needs "permissions: id-token: write" for keyless signing)
- name: Sign the published Docker image
  run: |
    cosign sign --yes ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build-push.outputs.digest }}
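
A keyless signature is only meaningful if the verifier pins who was allowed to sign. With Cosign 2.x that is done with the `--certificate-identity` and `--certificate-oidc-issuer` flags. A minimal sketch, assuming the image was signed from a GitHub Actions workflow; the helper name `verify_keyless` and the identity value in the usage comment are placeholders.

```shell
# verify_keyless IMAGE IDENTITY: check a keyless signature against the
# expected signer. IDENTITY is the workflow URL embedded in the signing
# certificate; the issuer below is the GitHub Actions OIDC issuer.
verify_keyless() {
  cosign verify \
    --certificate-identity "$2" \
    --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
    "$1"
}

# Usage (identity value is a placeholder):
#   verify_keyless "$IMAGE@$DIGEST" \
#     "https://github.com/my-org/my-repo/.github/workflows/release.yml@refs/heads/main"
```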

Kubernetes Pod Security Standards (PSS)

# Pod Security Standards replace PodSecurityPolicies (deprecated in 1.21, removed in 1.25)
# Three levels: privileged (no restrictions), baseline (prevents known privilege escalations), restricted (hardened)

# Apply at namespace level
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted     # Reject violating pods
    pod-security.kubernetes.io/audit: restricted       # Audit log violations
    pod-security.kubernetes.io/warn: restricted        # Warning on violations

---
# What "restricted" requires:
# ✅ runAsNonRoot: true
# ✅ allowPrivilegeEscalation: false
# ✅ seccompProfile.type: RuntimeDefault or Localhost
# ✅ capabilities: drop ALL
# ✅ volumes: restricted to configMap, csi, downwardAPI, emptyDir, ephemeral, 
#             persistentVolumeClaim, projected, secret
# ❌ hostNetwork, hostPID, hostIPC must be false
# ❌ privileged: must be false
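
Before switching a namespace to `enforce`, it can help to ask the API server which existing pods would be rejected. A server-side dry run of the label change prints warnings for violating pods without modifying anything. The sketch below assumes `kubectl` is configured for the target cluster; the helper name `pss_dry_run` is hypothetical.

```shell
# pss_dry_run NAMESPACE LEVEL: ask the API server which existing pods would
# violate LEVEL, without changing the namespace.
pss_dry_run() {
  case "$2" in
    privileged|baseline|restricted) ;;
    *) echo "unknown Pod Security level: $2" >&2; return 2 ;;
  esac
  kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2"
}

# Usage: pss_dry_run production restricted
```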

OPA/Gatekeeper Admission Policies

# Gatekeeper: enforce image registry allowlist
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos
        
        # Checks spec.containers only; add matching rules for initContainers and ephemeralContainers
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not any_repo_matches(container.image, input.parameters.repos)
          msg := sprintf("Container '%v' image '%v' is not from an allowed registry", [container.name, container.image])
        }
        
        any_repo_matches(image, repos) {
          startswith(image, repos[_])
        }

---
# Apply the constraint
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production", "staging"]
  parameters:
    repos:
      - "us-central1-docker.pkg.dev/my-project/"  # Internal registry only
      - "gcr.io/distroless/"                        # Distroless base images

Anti-Patterns

❌ FROM ubuntu:latest or FROM python:latest — pin exact versions; latest breaks builds silently
❌ Running as root in containers — if exploited, attacker has root on the host (with --privileged) or in the container
❌ ENV SECRET=value in Dockerfile — visible in docker inspect, image history, and any process
❌ Skipping vulnerability scans — images accumulate CVEs; scan on every build
❌ --privileged flag — equivalent to running as root on the host; almost never needed
❌ Installing debugging tools in production images — shell + curl in the image means attacker has them too
❌ Not setting CPU/memory limits — noisy neighbor can DoS your other containers
❌ Unsigned images in production — without signing, you can't verify what's running is what was built
❌ Kubernetes Secrets in plain base64 — use Sealed Secrets, SOPS, or Vault for secret encryption at rest

Quick Reference

Docker run security flags:
  --read-only              → Read-only root filesystem
  --tmpfs /tmp             → Writable temp (in memory)
  --cap-drop ALL           → Drop all Linux capabilities
  --cap-add NET_BIND_SERVICE  → Add back if needed (port < 1024)
  --security-opt no-new-privileges → Prevent privilege escalation via setuid binaries
  --security-opt seccomp=profile.json → Custom seccomp profile (the default profile applies when omitted)
  --user 10001:10001       → Run as specific non-root UID:GID
  --network none           → No network access (for build stages, batch jobs)

Dockerfile best practices checklist:
  □ Pin base image to digest (FROM python@sha256:...)
  □ Use distroless or slim base
  □ Non-root USER before CMD/ENTRYPOINT
  □ COPY --chown=user:group for file ownership
  □ No secrets in ENV, ARG, or layers
  □ .dockerignore to exclude .git, .env, credentials
  □ Minimal final image (multi-stage build)
  □ No package manager in final image (apt, pip, npm absent from the final stage)

Vulnerability severity priority:
  CRITICAL: Fix immediately, block deployment
  HIGH:     Fix in next sprint, block deployment (with exceptions process)
  MEDIUM:   Fix in next quarter
  LOW:      Accept or schedule for next major version
  NEGLIGIBLE: Accept
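
One way to turn this priority list into a build gate is to run the scanner twice: once to report lower severities, once to fail on the blocking ones. A sketch using Trivy's `--exit-code` and `--severity` flags; the function name `scan_gate` is an assumption, and since Trivy has no NEGLIGIBLE level, LOW is the lowest severity reported here.

```shell
# scan_gate IMAGE: report LOW/MEDIUM findings, fail only on HIGH/CRITICAL
scan_gate() {
  # Informational pass: always exits 0, findings are only printed
  trivy image --exit-code 0 --severity LOW,MEDIUM "$1"
  # Blocking pass: exits 1 if any fixable HIGH or CRITICAL finding exists
  trivy image --exit-code 1 --severity HIGH,CRITICAL --ignore-unfixed "$1"
}

# Usage in CI: scan_gate myapp:latest || exit 1
```

The exceptions process for HIGH findings would live in an ignore file reviewed alongside the code, not in this script.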
