
cicd-expert

CI/CD pipeline architect. GitHub Actions workflows, blue-green and canary deployments, GitOps with ArgoCD, secrets management with OIDC, container builds, semantic-release automation, and quality gates for every build.

Installation

npx clawhub@latest install cicd-expert


Documentation

CI/CD Pipeline Architecture Expert

CI/CD Philosophy

CI: Every push is tested — fast feedback catches bugs early
CD: Every passing build can deploy — automation reduces human error

Principles:
1. Fast feedback: < 10 min for critical checks
2. Fail fast: Stop pipeline at first failure
3. Reproducible: Same result every run
4. Idempotent: Safe to run multiple times
5. Minimal blast radius: Failed deploys don't break prod

GitHub Actions: Production Patterns

Complete CI Workflow

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

# Cancel superseded runs for the same PR or branch; let in-progress runs on main finish
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  # Fast checks first — fail early
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      
      - run: pip install ruff mypy
      - run: ruff check . --output-format=github
      - run: ruff format --check .
      - run: mypy src/ --ignore-missing-imports
  
  test:
    needs: [lint]  # Only run if lint passes
    runs-on: ${{ matrix.os }}
    
    # Matrix for multiple Python versions (service containers such as postgres
    # and redis below run only on Linux runners, so os stays ubuntu-only)
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
        os: [ubuntu-latest]
      fail-fast: true  # Cancel other matrix jobs on first failure
    
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      
      redis:
        image: redis:7
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
        ports:
          - 6379:6379
    
    steps:
      - uses: actions/checkout@v4
      
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      
      - name: Install dependencies
        run: |
          pip install -e ".[test]"
      
      - name: Run tests with coverage
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
        run: |
          pytest tests/ -v \
            --cov=src \
            --cov-report=xml \
            --cov-report=term-missing \
            --cov-fail-under=80 \
            -x  # Stop on first failure
      
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        if: matrix.python-version == '3.12' && matrix.os == 'ubuntu-latest'
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
  
  # Security scanning
  security:
    needs: [lint]
    runs-on: ubuntu-latest
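    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history so TruffleHog can diff against the base branch
      
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      
      # Dependency vulnerability scan
      - name: Run pip-audit
        run: |
          pip install pip-audit
          pip-audit --requirement requirements.txt
      
      # Secret scanning
      - uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
      
      # SAST with the Semgrep CLI (returntocorp/semgrep-action is deprecated);
      # --error fails the job when any finding is reported
      - name: Run Semgrep
        run: |
          pip install semgrep
          semgrep scan --config p/owasp-top-ten --config p/python --error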
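The `--cov-fail-under=80` flag enforces the coverage gate inside pytest. The same check can be run against the `coverage.xml` (Cobertura format) that the test job writes, for example in a later reporting job. A minimal sketch, assuming only overall line coverage matters; the helper names `line_coverage_percent` and `coverage_gate` are hypothetical:

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Read the overall line rate from a Cobertura-style coverage.xml."""
    root = ET.fromstring(xml_text)
    # The root <coverage> element carries line-rate as a fraction in [0, 1]
    return float(root.attrib["line-rate"]) * 100.0


def coverage_gate(xml_text: str, threshold: float = 80.0) -> bool:
    """Return True when line coverage meets or exceeds the threshold."""
    return line_coverage_percent(xml_text) >= threshold


if __name__ == "__main__":
    sample = '<coverage line-rate="0.8312" branch-rate="0"></coverage>'
    print(f"{line_coverage_percent(sample):.2f}%")  # 83.12%
    print(coverage_gate(sample))                    # True
```

When branch coverage is enabled, pytest-cov's total blends lines and branches, so its number can differ from the line rate read here.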

Container Build and Push

# .github/workflows/docker.yml
name: Build and Push

on:
  push:
    branches: [main]
    tags: ["v*.*.*"]  # Release tags drive the type=semver image tags
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write  # For OIDC image signing
    
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
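      # Single immutable image reference for deploy jobs (steps.meta.outputs.tags is a newline-separated list)
      image-tag: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}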
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=sha-
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            # On main: also tag as 'latest'
            type=raw,value=latest,enable={{is_default_branch}}
      
      - name: Set up QEMU (for multi-arch)
        uses: docker/setup-qemu-action@v3
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Login to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64  # Multi-arch
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
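          # Build args can end up in image history, so never use them for secrets
          # (build-push-action has a separate `secrets` input for that)
          build-args: |
            BUILD_DATE=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
            GIT_SHA=${{ github.sha }}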
      
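      # Sign image with Cosign (supply chain security); PR builds are not pushed, so skip them
      - uses: sigstore/cosign-installer@v3
        if: github.event_name != 'pull_request'
      - name: Sign image
        if: github.event_name != 'pull_request'
        run: cosign sign --yes ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}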
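Deploy steps are safest when they reference the image by digest rather than by a mutable tag such as `latest`, so the exact image that was built and signed is the one that runs. A minimal sketch of building and validating that reference; the helper `image_ref` is hypothetical. It lowercases the repository path because registry repository names must be lowercase, while `github.repository` can contain capitals:

```python
import re

# sha256 digests are "sha256:" followed by 64 lowercase hex characters
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def image_ref(registry: str, repository: str, digest: str) -> str:
    """Build an immutable registry/repository@digest reference."""
    if DIGEST_RE.fullmatch(digest) is None:
        raise ValueError(f"not a sha256 digest: {digest!r}")
    # Repository paths must be lowercase; github.repository may not be
    return f"{registry}/{repository.lower()}@{digest}"


if __name__ == "__main__":
    digest = "sha256:" + "ab" * 32
    print(image_ref("ghcr.io", "MyOrg/MyApp", digest))
```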

Deployment Workflows

Blue-Green Deployment

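# Deploy to the idle GREEN slot, smoke test it, then switch production traffic to it.
# This fragment assumes GREEN is idle; the next release must target BLUE instead.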
deploy:
  needs: [build, test]
  runs-on: ubuntu-latest
  environment: production  # Waits for approval if the environment has required reviewers
  
  steps:
    - name: Deploy to GREEN environment
      run: |
        kubectl set image deployment/myapp-green \
          myapp=${{ needs.build.outputs.image-tag }} \
          -n production
        kubectl rollout status deployment/myapp-green -n production
    
    - name: Run smoke tests against GREEN
      run: |
        ENDPOINT=$(kubectl get svc myapp-green -n production -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
        ./scripts/smoke-test.sh "https://$ENDPOINT"
    
    - name: Switch traffic to GREEN (promote)
      run: |
        kubectl patch service myapp-prod -n production \
          -p '{"spec":{"selector":{"slot":"green"}}}'
    
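    # Once BLUE is at zero replicas, switching the selector back no longer restores
    # service; keep BLUE running until GREEN has proven itself in production
    - name: Scale down OLD BLUE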
      run: |
        kubectl scale deployment/myapp-blue --replicas=0 -n production
    
    # Automatic rollback on failure
    - name: Rollback on failure
      if: failure()
      run: |
        kubectl patch service myapp-prod -n production \
          -p '{"spec":{"selector":{"slot":"blue"}}}'
        echo "Rolled back to blue environment"
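Because the fragment always targets GREEN, it only works for one release. A minimal sketch of picking the idle slot from the live one and building the same selector patch the workflow passes to `kubectl patch`; the `slot` label values come from the workflow, while the helper names `idle_slot` and `selector_patch` are hypothetical:

```python
import json


def idle_slot(live_slot: str) -> str:
    """Return the slot that is not currently receiving production traffic."""
    slots = {"blue": "green", "green": "blue"}
    if live_slot not in slots:
        raise ValueError(f"unknown slot: {live_slot!r}")
    return slots[live_slot]


def selector_patch(slot: str) -> str:
    """Patch body that points the production Service at the given slot."""
    return json.dumps({"spec": {"selector": {"slot": slot}}})


if __name__ == "__main__":
    live = "blue"  # e.g. read from the Service's .spec.selector.slot
    target = idle_slot(live)
    print(target)                  # deploy and smoke test this slot
    print(selector_patch(target))  # then switch traffic to it
    print(selector_patch(live))    # rollback patch if promotion fails
```

Reading the live slot from the Service itself, rather than hardcoding it, keeps the rollback target correct across consecutive releases.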

Canary Deployment with Argo Rollouts

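# rollout.yaml (Argo Rollouts resource; Argo CD can sync it like any other manifest)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  # replicas, selector and pod template omitted (same fields as a Deployment)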
  strategy:
    canary:
      steps:
        - setWeight: 5    # 5% of traffic
        - pause: {duration: 5m}
        - setWeight: 20   # 20% after 5 min
        - pause:
            untilApproved: true  # Manual gate
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
      
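      # Background analysis starts at step 2 (setWeight: 20); if the success-rate
      # AnalysisTemplate (defined separately) fails, the rollout aborts back to stable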
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: myapp-canary
      
      canaryService: myapp-canary
      stableService: myapp-stable
      trafficRouting:
        nginx:
          stableIngress: myapp-stable
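The `success-rate` AnalysisTemplate is not shown above. A minimal sketch of the decision such a template typically encodes, assuming a success condition of at least 95% non-5xx responses per measurement and a `failureLimit` of 1; both values, the zero-traffic rule, and all names here are assumptions. Argo Rollouts marks a metric Failed once its failed measurements exceed `failureLimit`, and a failed analysis aborts the rollout, sending traffic back to the stable service:

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    total_requests: int
    error_requests: int  # e.g. HTTP 5xx responses in the interval

    @property
    def success_rate(self) -> float:
        if self.total_requests == 0:
            return 1.0  # assumption: no traffic counts as healthy
        return 1.0 - self.error_requests / self.total_requests


def analysis_failed(
    measurements: list[Measurement],
    min_success_rate: float = 0.95,
    failure_limit: int = 1,
) -> bool:
    """True once failed measurements exceed failure_limit (rollout should abort)."""
    failures = sum(1 for m in measurements if m.success_rate < min_success_rate)
    return failures > failure_limit


if __name__ == "__main__":
    healthy = [Measurement(1000, 10), Measurement(1200, 30)]
    degraded = healthy + [Measurement(900, 90), Measurement(800, 100)]
    print(analysis_failed(healthy))   # False
    print(analysis_failed(degraded))  # True
```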

GitOps with ArgoCD

# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  namespace: argocd
spec:
  project: production
  
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    targetRevision: main
    path: apps/myapp/production
  
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Re-sync if cluster drifts from Git
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  
  # Keep the last 10 synced revisions in history for rollback
  revisionHistoryLimit: 10
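The `retry` block bounds how long Argo CD keeps retrying a failed sync. Assuming each delay is `duration × factor^attempt`, capped at `maxDuration`, the settings above give delays of 5, 10, 20, 40 and 80 seconds (155 s in total) before the sync is reported as failed. A small sketch of that arithmetic; the function name `backoff_delays` is hypothetical:

```python
def backoff_delays(duration_s: float, factor: float, max_s: float, limit: int) -> list[float]:
    """Delay before each retry: duration * factor**attempt, capped at max_s."""
    return [min(duration_s * factor**attempt, max_s) for attempt in range(limit)]


if __name__ == "__main__":
    # Values from the Application's syncPolicy.retry block
    delays = backoff_delays(duration_s=5, factor=2, max_s=180, limit=5)
    print(delays)       # [5, 10, 20, 40, 80]
    print(sum(delays))  # 155
```

With these values the 3-minute cap never applies; it would only start to matter from the seventh retry (5 × 2⁶ = 320 s, capped to 180 s).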

Secrets Management

# NEVER store secrets in repository — even encrypted
# Use OIDC for cloud provider auth (no long-lived keys)

# GitHub Actions + AWS OIDC
# The job needs `permissions: id-token: write` to request the OIDC token
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions
    aws-region: us-east-1
    # No AWS keys needed — OIDC token exchange

# Use environment-scoped secrets
# Settings > Environments > production > Secrets
# Only jobs that declare environment: production can read them, after any protection rules (e.g. required reviewers) pass
env:
  DB_PASSWORD: ${{ secrets.DB_PASSWORD }}  # Environment secret (prod only)
  API_KEY: ${{ secrets.API_KEY }}          # Repository secret (all envs)

# Use Vault for dynamic secrets (JWT auth with GitHub's OIDC token, so this also needs id-token: write)
- name: Get dynamic DB credentials from Vault
  uses: hashicorp/vault-action@v3
  with:
    url: https://vault.internal.com
    method: jwt
    role: github-actions
    secrets: |
      database/creds/readonly username | DB_USER ;
      database/creds/readonly password | DB_PASSWORD
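With OIDC, the cloud side decides which workflows may assume a role by matching claims in GitHub's token, chiefly `sub`. By default GitHub formats `sub` as `repo:OWNER/REPO:environment:NAME` for jobs that declare an environment and `repo:OWNER/REPO:ref:refs/heads/BRANCH` for branch runs without one. A minimal sketch of that check, using `fnmatchcase` as a stand-in for IAM's `StringLike` wildcards (they are not identical: `fnmatchcase` also treats `[...]` as a character class); the repository `myorg/myapp` and both helper names are hypothetical:

```python
from fnmatch import fnmatchcase
from typing import Optional


def sub_claim(repo: str, environment: Optional[str] = None, ref: Optional[str] = None) -> str:
    """Build GitHub's default OIDC `sub` claim for an environment job or a ref."""
    if environment is not None:
        return f"repo:{repo}:environment:{environment}"
    if ref is not None:
        return f"repo:{repo}:ref:{ref}"
    raise ValueError("need an environment or a ref")


def trust_allows(sub: str, allowed_patterns: list[str]) -> bool:
    """Approximate an IAM StringLike condition on the token's sub claim."""
    return any(fnmatchcase(sub, pattern) for pattern in allowed_patterns)


if __name__ == "__main__":
    # Hypothetical trust policy: only production-environment jobs may assume the role
    allowed = ["repo:myorg/myapp:environment:production"]
    prod = sub_claim("myorg/myapp", environment="production")
    branch = sub_claim("myorg/myapp", ref="refs/heads/feature-x")
    print(trust_allows(prod, allowed))    # True
    print(trust_allows(branch, allowed))  # False
```

Scoping the trust policy to the environment ties cloud access to the same approval gate that protects environment secrets.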

Release Automation

# semantic-release based automatic versioning
# .github/workflows/release.yml
name: Release

on:
  push:
    branches: [main]

jobs:
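  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write       # Push the release commit and tag
      issues: write         # Comment on issues included in the release
      pull-requests: write  # Comment on PRs included in the release
    steps: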
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GITHUB_TOKEN }}
      
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      
      - name: Install semantic-release
        run: npm install -g semantic-release @semantic-release/git @semantic-release/github @semantic-release/changelog
      
      - name: Release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: semantic-release

# .releaserc.json
{
  "branches": ["main"],
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/changelog",
    "@semantic-release/git",
    "@semantic-release/github"
  ]
}
# Commit format (default Angular preset): fix: → patch, feat: → minor, "BREAKING CHANGE:" footer → major; the feat!: shorthand needs the conventionalcommits preset
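A simplified model of how commit messages map to the next version under the default release rules (fix and perf bump the patch version, feat bumps the minor version, a `BREAKING CHANGE:` note bumps the major version). It ignores reverts and custom `releaseRules`, searches the whole message for the breaking-change note, and the function names are hypothetical; semantic-release's commit-analyzer plugin is the real implementation:

```python
import re
from typing import Optional

BUMP_ORDER = {"patch": 1, "minor": 2, "major": 3}
TYPE_TO_BUMP = {"fix": "patch", "perf": "patch", "feat": "minor"}
# Header: type, optional (scope), colon, space, subject
HEADER_RE = re.compile(r"^(\w+)(?:\([^)]*\))?: .+")


def bump_for(message: str) -> Optional[str]:
    """Release type implied by one commit message, or None for no release."""
    if "BREAKING CHANGE:" in message:  # simplification: searches the whole message
        return "major"
    match = HEADER_RE.match(message)
    if match is None:
        return None
    return TYPE_TO_BUMP.get(match.group(1))


def next_version(current: str, messages: list[str]) -> Optional[str]:
    """Apply the highest bump found across all commits since the last release."""
    bumps = [b for b in map(bump_for, messages) if b is not None]
    if not bumps:
        return None  # nothing releasable
    highest = max(bumps, key=BUMP_ORDER.__getitem__)
    major, minor, patch = (int(part) for part in current.split("."))
    if highest == "major":
        return f"{major + 1}.0.0"
    if highest == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    commits = ["fix: handle empty payload", "feat(api): add /health endpoint", "docs: update README"]
    print(next_version("1.4.2", commits))  # 1.5.0
    print(next_version("1.4.2", ["fix: typo\n\nBREAKING CHANGE: drop Python 3.10"]))  # 2.0.0
```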

Notification and Alerting

# Notify Slack on deployment
- name: Notify Slack
  if: always()
  uses: slackapi/slack-github-action@v1
  with:
    channel-id: 'deployments'
    # The commit message is left out: quotes or newlines in it would break this JSON
    payload: |
      {
        "text": "${{ job.status == 'success' && '✅' || '❌' }} Deployment to production",
        "blocks": [
          {
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*${{ job.status == 'success' && 'Deployed ✅' || 'Failed ❌' }}* `${{ github.repository }}` to production\n*Commit:* ${{ github.sha }}\n*Author:* ${{ github.actor }}"
            },
            "accessory": {
              "type": "button",
              "text": { "type": "plain_text", "text": "View Run" },
              "url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }
          }
        ]
      }
  env:
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
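Interpolating free text such as a commit message straight into a JSON payload breaks as soon as the text contains a quote or a newline. When the message matters, build the payload with a JSON encoder instead of string interpolation. A minimal sketch producing the same Block Kit structure as the step above; the function `deployment_payload` and its arguments are hypothetical:

```python
import json


def deployment_payload(ok: bool, repo: str, sha: str, author: str, message: str, run_url: str) -> str:
    """Slack Block Kit payload; json.dumps escapes quotes and newlines in every field."""
    status = "Deployed ✅" if ok else "Failed ❌"
    text = (
        f"*{status}* `{repo}` to production\n"
        f"*Commit:* {sha}\n*Author:* {author}\n*Message:* {message}"
    )
    payload = {
        "text": f"{'✅' if ok else '❌'} Deployment to production",
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": text},
                "accessory": {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "View Run"},
                    "url": run_url,
                },
            }
        ],
    }
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    body = deployment_payload(
        ok=True,
        repo="myorg/myapp",
        sha="abc1234",
        author="octocat",
        message='Fix "quoted" bug\n\nDetails on a second line',
        run_url="https://github.com/myorg/myapp/actions/runs/1",
    )
    print(json.loads(body)["blocks"][0]["text"]["text"])
```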

Performance Optimization

# Speed up CI with caching
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      .venv
      node_modules
      ~/.gradle/caches
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json', '**/*.gradle*') }}
    restore-keys: |
      ${{ runner.os }}-deps-

# Parallelize independent jobs
jobs:
  unit-test:
    runs-on: ubuntu-latest
    # ... runs in parallel with integration-test
  integration-test:
    runs-on: ubuntu-latest
    # ... runs in parallel with unit-test
  security-scan:
    runs-on: ubuntu-latest
    # ... runs in parallel
  deploy:
    needs: [unit-test, integration-test, security-scan]  # Waits for all 3
    runs-on: ubuntu-latest
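GitHub Actions starts every job whose `needs` have completed, so the parallelism follows from the dependency graph. A minimal sketch that groups jobs into stages (a level-by-level topological sort, in the style of Kahn's algorithm) using the job names from the example above; the function name `parallel_stages` is hypothetical. Real runners start each job as soon as its own `needs` finish rather than waiting for a whole stage, so this is a coarse view:

```python
def parallel_stages(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into stages; every job in a stage depends only on earlier stages."""
    remaining = {job: set(deps) for job, deps in needs.items()}
    stages: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # Jobs whose dependencies have all finished can run together
        ready = sorted(job for job, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle or unknown job among: {sorted(remaining)}")
        stages.append(ready)
        done.update(ready)
        for job in ready:
            del remaining[job]
    return stages


if __name__ == "__main__":
    jobs = {
        "unit-test": [],
        "integration-test": [],
        "security-scan": [],
        "deploy": ["unit-test", "integration-test", "security-scan"],
    }
    print(parallel_stages(jobs))
    # [['integration-test', 'security-scan', 'unit-test'], ['deploy']]
```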

CI/CD Quality Gates

Gate	Tool	Threshold

Lint	ruff, eslint, golangci-lint	0 errors
Type check	mypy, tsc	0 errors
Unit tests	pytest, jest, go test	100% pass
Coverage	pytest-cov, codecov	≥ 80%
Dependency CVEs	pip-audit, npm audit	0 critical
Secret scanning	trufflehog, gitleaks	0 found
SAST	semgrep, snyk	0 high/critical
Integration tests	pytest, cypress	100% pass
Smoke tests (post-deploy)	k6, custom scripts	100% pass
Performance regression	k6, artillery	< 10% p95 latency increase
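The performance gate compares p95 latency between a baseline run and the candidate. A minimal sketch using the nearest-rank percentile; load tools such as k6 compute their own percentiles and may interpolate, so treat this as an illustration of the threshold check rather than their method. The function names and sample data are hypothetical:

```python
def p95(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def latency_gate(baseline_ms: list[float], candidate_ms: list[float], max_increase: float = 0.10) -> bool:
    """Pass when the candidate p95 is less than max_increase above the baseline p95."""
    base, cand = p95(baseline_ms), p95(candidate_ms)
    if base <= 0:
        raise ValueError("baseline p95 must be positive")
    return (cand - base) / base < max_increase


if __name__ == "__main__":
    baseline = [float(v) for v in range(1, 101)]  # p95 = 95 ms
    print(latency_gate(baseline, [v + 5 for v in baseline]))    # True: about 5.3% slower
    print(latency_gate(baseline, [v * 1.2 for v in baseline]))  # False: 20% slower
```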
