cicd-expert
CI/CD pipeline architect. GitHub Actions workflows, blue-green and canary deployments, GitOps with ArgoCD, secrets management with OIDC, container builds, semantic-release automation, and quality gates for every build.
Installation
npx clawhub@latest install cicd-expert
View the full skill documentation and source below.
Documentation
CI/CD Pipeline Architecture Expert
CI/CD Philosophy
CI: Every push is tested — fast feedback catches bugs early
CD: Every passing build can deploy — automation reduces human error
Principles:
1. Fast feedback: < 10 min for critical checks
2. Fail fast: Stop pipeline at first failure
3. Reproducible: Same result every run
4. Idempotent: Safe to run multiple times
5. Minimal blast radius: Failed deploys don't break prod
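The "fail fast" principle can be illustrated with a minimal sketch. The stage names and the `run_pipeline` helper are hypothetical and only model the control flow; a real CI system runs the stages as separate jobs.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure (fail fast)."""
    log: List[str] = []
    for name, check in stages:
        ok = check()
        log.append(f"{name}: {'pass' if ok else 'FAIL'}")
        if not ok:
            break  # later stages never start, so a bad build cannot deploy
    return log


if __name__ == "__main__":
    demo = [("lint", lambda: True), ("test", lambda: False), ("deploy", lambda: True)]
    print(run_pipeline(demo))
```

Ordering the cheapest checks first keeps the feedback loop short, because a failing lint run stops the pipeline before the slower stages consume runner time.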
GitHub Actions: Production Patterns
Complete CI Workflow
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

# Cancel in-progress runs for the same branch or PR
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # Fast checks first: fail early
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install ruff mypy
      - run: ruff check . --output-format=github
      - run: ruff format --check .
      - run: mypy src/ --ignore-missing-imports

  test:
    needs: [lint]  # Only run if lint passes
    runs-on: ${{ matrix.os }}
    # Matrix for multiple Python versions
    strategy:
      fail-fast: true  # Cancel other matrix jobs on first failure
      matrix:
        python-version: ["3.11", "3.12"]
        # Service containers need a Linux runner, so macos-latest is not listed
        os: [ubuntu-latest]
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      redis:
        image: redis:7
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - name: Install dependencies
        run: |
          pip install -e ".[test]"
      - name: Run tests with coverage
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
        run: |
          pytest tests/ -v \
            --cov=src \
            --cov-report=xml \
            --cov-report=term-missing \
            --cov-fail-under=80 \
            -x  # Stop on first failure
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        if: matrix.python-version == '3.12' && matrix.os == 'ubuntu-latest'
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  # Security scanning
  security:
    needs: [lint]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history so the secret scan can diff against the base branch
      # Dependency vulnerability scan
      - name: Run pip-audit
        run: |
          pip install pip-audit
          pip-audit --requirement requirements.txt
      # Secret scanning (consider pinning to a release tag or commit SHA instead of main)
      - uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          head: HEAD
      # SAST
      - uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/owasp-top-ten
            p/python
Container Build and Push
# .github/workflows/docker.yml
name: Build and Push

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write  # For OIDC image signing
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
      # Newline-separated list when several tag rules match; deploy by digest instead
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v4
      - name: Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=sha-
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            # On main: also tag as 'latest'
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Set up QEMU (for multi-arch)
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64  # Multi-arch
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          # Build metadata only. Never pass secrets as build args: they persist in image history.
          # Note: repository.updated_at is the repo's last-update time, not the build time.
          build-args: |
            BUILD_DATE=${{ github.event.repository.updated_at }}
            GIT_SHA=${{ github.sha }}
      # Sign image with Cosign (supply chain security).
      # Skipped on pull requests, where nothing is pushed and there is no digest to sign.
      - uses: sigstore/cosign-installer@v3
        if: github.event_name != 'pull_request'
      - name: Sign image
        if: github.event_name != 'pull_request'
        run: cosign sign --yes ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
Deployment Workflows
Blue-Green Deployment
# Deploy to the idle GREEN slot, smoke test it, then switch production traffic
deploy:
  needs: [build, test]
  runs-on: ubuntu-latest
  environment: production  # Approval gate if the environment has required reviewers
  steps:
    # Checkout is needed for ./scripts/smoke-test.sh.
    # The steps below assume kubectl is already authenticated against the cluster.
    - uses: actions/checkout@v4
    - name: Deploy to GREEN environment
      run: |
        # Deploy by digest: the image-tag output can hold several newline-separated tags
        kubectl set image deployment/myapp-green \
          myapp=ghcr.io/${{ github.repository }}@${{ needs.build.outputs.image-digest }} \
          -n production
        kubectl rollout status deployment/myapp-green -n production
    - name: Run smoke tests against GREEN
      run: |
        ENDPOINT=$(kubectl get svc myapp-green -n production -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
        ./scripts/smoke-test.sh "https://$ENDPOINT"
    - name: Switch traffic to GREEN (promote)
      run: |
        kubectl patch service myapp-prod -n production \
          -p '{"spec":{"selector":{"slot":"green"}}}'
    - name: Scale down OLD BLUE
      run: |
        kubectl scale deployment/myapp-blue --replicas=0 -n production
    # Automatic rollback on failure: point the service back at blue
    - name: Rollback on failure
      if: failure()
      run: |
        kubectl patch service myapp-prod -n production \
          -p '{"spec":{"selector":{"slot":"blue"}}}'
        echo "Rolled back to blue environment"
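The workflow calls `./scripts/smoke-test.sh`, which the source does not show. As a rough illustration of what such a check might do, here is a minimal Python sketch. The endpoint paths (`/healthz`, `/ready`) and the function names are assumptions, not part of the original script.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if no response was received."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError as exc:
        # HTTPError carries the status in .code; plain network errors do not
        return getattr(exc, "code", 0) or 0


def smoke_test(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Return {path: status} for every path that did not answer 200."""
    failures: Dict[str, int] = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status != 200:
            failures[path] = status
    return failures


if __name__ == "__main__":
    # Hypothetical endpoints, replaced by a stub so the demo needs no network
    stub = {"https://green.example/healthz": 200, "https://green.example/ready": 503}
    print(smoke_test("https://green.example", ["/healthz", "/ready"], fetch=stub.__getitem__))
```

A non-empty result would map to a non-zero exit code in the pipeline, which is what triggers the `if: failure()` rollback step.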
Canary Deployment with Argo Rollouts
# rollout.yaml (Argo Rollouts)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  # replicas, selector, and the pod template are omitted here
  strategy:
    canary:
      steps:
        - setWeight: 5   # 5% of traffic
        - pause: {duration: 5m}
        - setWeight: 20  # 20% after 5 min
        - pause: {}      # Manual gate: waits until the rollout is promoted
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
      # Auto-rollback on bad metrics
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: myapp-canary
      canaryService: myapp-canary
      stableService: myapp-stable
      trafficRouting:
        nginx:
          stableIngress: myapp-stable
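The interaction between weight steps and metric analysis can be modelled with a small sketch. This is not how the controller is implemented. The 95% threshold and the `(ok, total)` request samples are assumptions chosen for illustration, and the sketch checks every step rather than starting at a later one.

```python
from typing import List, Sequence, Tuple


def run_canary(
    weights: Sequence[int],
    samples: Sequence[Tuple[int, int]],
    threshold: float = 0.95,  # assumed success-rate floor
) -> List[str]:
    """Walk the traffic weights; abort and roll back once the success rate dips."""
    events: List[str] = []
    for weight, (ok, total) in zip(weights, samples):
        rate = ok / total if total else 1.0
        if rate < threshold:
            events.append(f"abort at {weight}%: success rate {rate:.2%}")
            events.append("rollback to stable")
            return events
        events.append(f"{weight}% ok")
    events.append("fully promoted")
    return events


if __name__ == "__main__":
    # (successful requests, total requests) observed at each weight
    observed = [(99, 100), (98, 100), (90, 100), (100, 100)]
    for line in run_canary([5, 20, 50, 100], observed):
        print(line)
```

In the example the third sample falls to 90%, so the run stops at the 50% step and never reaches full traffic.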
GitOps with ArgoCD
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    targetRevision: main
    path: apps/myapp/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # Delete resources removed from Git
      selfHeal: true  # Re-sync if cluster drifts from Git
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  # Number of past revisions kept in the application history (used for rollbacks)
  revisionHistoryLimit: 10
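The retry settings describe an exponential backoff. The sketch below works out the delay schedule they imply, assuming each delay is `duration * factor^attempt` capped at `maxDuration`; the controller's exact timing may differ in detail, and both helper names are hypothetical.

```python
from typing import List


def parse_seconds(value: str) -> int:
    """Convert a simple duration such as '5s' or '3m' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]


def retry_delays(duration: str, factor: int, max_duration: str, limit: int) -> List[int]:
    """Delay in seconds before each retry, capped at max_duration."""
    base = parse_seconds(duration)
    cap = parse_seconds(max_duration)
    return [min(base * factor ** attempt, cap) for attempt in range(limit)]


if __name__ == "__main__":
    # Values from the syncPolicy.retry block
    print(retry_delays("5s", 2, "3m", 5))
```

With `limit: 5` the 3-minute cap is never reached; it would only take effect from the seventh retry onward.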
Secrets Management
# NEVER store secrets in the repository, even encrypted
# Use OIDC for cloud provider auth (no long-lived keys)

# GitHub Actions + AWS OIDC
# The job needs `permissions: id-token: write` to request the OIDC token
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions
    aws-region: us-east-1
    # No AWS keys needed: OIDC token exchange

# Use environment-scoped secrets
# Settings > Environments > production > Secrets
# A job can read them only if it declares `environment: production`;
# protection rules (approval gates) can be added on the environment
env:
  DB_PASSWORD: ${{ secrets.DB_PASSWORD }}  # Environment secret (prod only)
  API_KEY: ${{ secrets.API_KEY }}          # Repository secret (all envs)

# Use Vault for dynamic secrets
- name: Get dynamic DB credentials from Vault
  uses: hashicorp/vault-action@v3
  with:
    url: https://vault.internal.com
    method: jwt
    role: github-actions
    secrets: |
      database/creds/readonly username | DB_USER ;
      database/creds/readonly password | DB_PASSWORD
Release Automation
# semantic-release based automatic versioning
# .github/workflows/release.yml
name: Release

on:
  push:
    branches: [main]

jobs:
  release:
    runs-on: ubuntu-latest
    # semantic-release pushes tags and commits, and comments on issues and PRs
    permissions:
      contents: write
      issues: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history and tags are needed to compute the next version
          token: ${{ secrets.GITHUB_TOKEN }}
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install semantic-release
        run: npm install -g semantic-release @semantic-release/git @semantic-release/github @semantic-release/changelog
      - name: Release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: semantic-release

# .releaserc.json
{
  "branches": ["main"],
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/changelog",
    "@semantic-release/git",
    "@semantic-release/github"
  ]
}
# Commit format: feat: → minor, fix: → patch, a "BREAKING CHANGE:" footer → major
# (whether the feat!: shorthand is recognized depends on the commit-analyzer preset in use)
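To make the commit-to-version mapping concrete, here is a small sketch of how a release type could be derived from commit messages. It follows the Conventional Commits convention, treating both `!` and a `BREAKING CHANGE:` footer as major. It is an illustration, not semantic-release's own analyzer, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# type, optional (scope), optional "!", then ": "
HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<bang>!)?: ")
ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> Optional[str]:
    """Map one commit message to 'major', 'minor', 'patch', or None."""
    match = HEADER.match(message)
    if match is None:
        return None
    if match.group("bang") or "BREAKING CHANGE:" in message:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match.group("type"))


def release_type(messages: Iterable[str]) -> Optional[str]:
    """The highest bump across all commits since the last release."""
    best: Optional[str] = None
    for message in messages:
        bump = bump_for(message)
        if ORDER[bump] > ORDER[best]:
            best = bump
    return best


def next_version(current: str, bump: Optional[str]) -> str:
    """Apply a bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # no releasable commits


if __name__ == "__main__":
    commits = ["fix: handle empty payload", "feat(api): add pagination", "docs: update README"]
    print(next_version("1.4.2", release_type(commits)))
```

Commits with types such as `docs:` or `chore:` produce no bump in this sketch, so a push containing only those would not cut a release.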
Notification and Alerting
# Notify Slack on deployment
- name: Notify Slack
  if: always()
  uses: slackapi/slack-github-action@v1
  with:
    channel-id: 'deployments'
    payload: |
      {
        "text": "${{ job.status == 'success' && '✅' || '❌' }} Deployment to production",
        "blocks": [
          {
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*${{ job.status == 'success' && 'Deployed ✅' || 'Failed ❌' }}* `${{ github.repository }}` to production\n*Commit:* ${{ github.sha }}\n*Author:* ${{ github.actor }}\n*Message:* ${{ github.event.head_commit.message }}"
            },
            "accessory": {
              "type": "button",
              "text": { "type": "plain_text", "text": "View Run" },
              "url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }
          }
        ]
      }
  env:
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
Performance Optimization
# Speed up CI with caching
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      .venv
      node_modules
      ~/.gradle/caches
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-deps-

# Parallelize independent jobs
jobs:
  unit-test:
    runs-on: ubuntu-latest
    # ... runs in parallel with integration-test
  integration-test:
    runs-on: ubuntu-latest
    # ... runs in parallel with unit-test
  security-scan:
    runs-on: ubuntu-latest
    # ... runs in parallel
  deploy:
    needs: [unit-test, integration-test, security-scan]  # Waits for all 3
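How `needs` shapes parallelism can be seen by grouping jobs into waves, where every job in a wave has all of its dependencies finished. The sketch below is a simplified model of that scheduling; the `waves` helper is hypothetical and ignores runner availability.

```python
from typing import Dict, List, Set


def waves(needs: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs into waves; all jobs in one wave can run in parallel."""
    remaining: Dict[str, Set[str]] = {job: set(deps) for job, deps in needs.items()}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        # A job is ready once every job it needs has finished
        ready = sorted(job for job, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown job in needs")
        result.append(ready)
        done.update(ready)
        for job in ready:
            del remaining[job]
    return result


if __name__ == "__main__":
    graph = {
        "unit-test": [],
        "integration-test": [],
        "security-scan": [],
        "deploy": ["unit-test", "integration-test", "security-scan"],
    }
    print(waves(graph))
```

Total wall-clock time is then roughly the sum of the slowest job in each wave, which is why splitting one long test job into independent jobs shortens the pipeline.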
CI/CD Quality Gates
| Gate | Tool | Threshold |
| --- | --- | --- |
| Lint | ruff, eslint, golangci-lint | 0 errors |
| Type check | mypy, tsc | 0 errors |
| Unit tests | pytest, jest, go test | 100% pass |
| Coverage | pytest-cov, codecov | ≥ 80% |
| Dependency CVEs | pip-audit, npm audit | 0 critical |
| Secret scanning | trufflehog, gitleaks | 0 found |
| SAST | semgrep, snyk | 0 high/critical |
| Integration tests | pytest, cypress | 100% pass |
| Smoke tests (post-deploy) | k6, custom scripts | 100% pass |
| Performance regression | k6, artillery | < 10% p95 latency increase |
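As a rough illustration, the numeric gates in the table could be evaluated in one place before a deploy is allowed. The metric names below are hypothetical, and the coverage rule uses "at least 80" to match the `--cov-fail-under=80` flag in the CI workflow.

```python
from typing import Callable, Dict, List

# Thresholds follow the table; the metric names are made up for this sketch.
GATES: Dict[str, Callable[[float], bool]] = {
    "lint_errors": lambda v: v == 0,
    "type_errors": lambda v: v == 0,
    "coverage_pct": lambda v: v >= 80,
    "critical_cves": lambda v: v == 0,
    "secrets_found": lambda v: v == 0,
    "p95_latency_increase_pct": lambda v: v < 10,
}


def failed_gates(metrics: Dict[str, float]) -> List[str]:
    """Return the gates that fail; a missing metric counts as a failure."""
    return [
        name
        for name, passes in GATES.items()
        if name not in metrics or not passes(metrics[name])
    ]


if __name__ == "__main__":
    report = {
        "lint_errors": 0,
        "type_errors": 0,
        "coverage_pct": 72.5,
        "critical_cves": 0,
        "p95_latency_increase_pct": 3.1,
    }
    print(failed_gates(report))
```

Treating a missing metric as a failure keeps a skipped scan from silently passing the gate.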