ansible-expert
Expert Ansible automation covering playbook structure, inventory design, variable precedence, idempotency patterns, roles with dependencies, handlers, Jinja2 templating, Vault secrets, selective execution with tags, Molecule for testing, and AWX/Tower integration.
Ansible Expert
Ansible is infrastructure as code without a daemon — push-based, SSH-native, and readable by anyone
who can read YAML. Its power and peril are the same thing: it's easy to write Ansible that works once
but isn't idempotent. Expert Ansible means every task can run 100 times and leave the system in exactly
the same state as after the first run.
Core Mental Model
Ansible's execution model is: inventory (what hosts exist) + playbook (what to do) + variables
(how to customize). The play runs tasks against hosts, and tasks call modules. Modules do the
idempotent work. Your job is to use modules, not shell commands — modules check state before changing
it; shell commands don't. Variables have a complex precedence order (22 levels!), but in practice: role
defaults < group_vars < host_vars < playbook vars < extra-vars. Understand this or spend hours debugging
mysterious variable values.
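As a minimal sketch of that model, the play below targets one inventory group, takes one variable, and calls one module. The file name and the variable app_packages are hypothetical, chosen only for illustration.

```yaml
# playbooks/example.yml (hypothetical file name, for illustration only)
---
- name: Minimal play showing inventory, variables, and a module
  hosts: webservers                 # a group defined in the inventory
  become: true
  vars:
    app_packages: [nginx, git]      # illustrative play var; -e on the command line overrides it
  tasks:
    - name: Ensure packages are present
      ansible.builtin.apt:          # the module checks current state before changing anything
        name: "{{ app_packages }}"
        state: present
```

On a second run against the same hosts, the task should report "ok" rather than "changed", because the packages are already installed.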
Inventory Design
inventory/
├── production/
│   ├── hosts.ini              # Static hosts
│   ├── gcp.yaml               # Dynamic inventory plugin
│   ├── group_vars/
│   │   ├── all.yaml           # Variables for ALL hosts
│   │   ├── webservers.yaml    # Variables for webservers group
│   │   └── databases/
│   │       ├── vars.yaml      # Non-sensitive vars
│   │       └── vault.yaml     # Encrypted secrets (ansible-vault)
│   └── host_vars/
│       └── db-prod-01.yaml    # Host-specific variables
└── staging/
    ├── hosts.ini
    └── group_vars/
        └── all.yaml
Static Inventory (hosts.ini)
[webservers]
web-01.example.com ansible_host=10.0.1.10
web-02.example.com ansible_host=10.0.1.11
[databases]
db-primary.example.com ansible_host=10.0.2.10 db_role=primary
db-replica.example.com ansible_host=10.0.2.11 db_role=replica
[webservers:vars]
ansible_user=ubuntu
ansible_python_interpreter=/usr/bin/python3
nginx_version=1.24
[all:vars]
ansible_ssh_private_key_file=~/.ssh/production_key
ansible_ssh_common_args='-o StrictHostKeyChecking=accept-new'
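The same static inventory can also be written in Ansible's YAML inventory format, which some teams find easier to diff and lint. This is a sketch of an equivalent file; the file name hosts.yaml is an assumption, and the hosts and variables are copied from the INI example above.

```yaml
# inventory/production/hosts.yaml (hypothetical YAML equivalent of hosts.ini)
all:
  vars:
    ansible_ssh_private_key_file: ~/.ssh/production_key
    ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
  children:
    webservers:
      hosts:
        web-01.example.com:
          ansible_host: 10.0.1.10
        web-02.example.com:
          ansible_host: 10.0.1.11
      vars:
        ansible_user: ubuntu
        ansible_python_interpreter: /usr/bin/python3
        nginx_version: "1.24"       # quoted so YAML keeps it a string, not a float
    databases:
      hosts:
        db-primary.example.com:
          ansible_host: 10.0.2.10
          db_role: primary
        db-replica.example.com:
          ansible_host: 10.0.2.11
          db_role: replica
```

Keep only one of the two files in the inventory directory; if both are present, Ansible loads both and merges them.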
Dynamic Inventory (GCP)
# inventory/production/gcp.yaml
plugin: google.cloud.gcp_compute
projects:
  - my-gcp-project
regions:
  - us-central1
filters:
  - status = RUNNING
hostnames:
  - name
groups:
  webservers: "'webserver' in labels"
  databases: "'database' in labels"
compose:
  ansible_host: networkInterfaces[0].networkIP
  db_tier: labels.tier
  environment: labels.environment
Variable Precedence (Low → High, 22 levels)
1. command line values (for example, -u my_user; these are not variables)
2. role defaults (defaults/main.yml)
3. inventory file or script group vars
4. inventory group_vars/all
5. playbook group_vars/all
6. inventory group_vars/*
7. playbook group_vars/*
8. inventory file or script host vars
9. inventory host_vars/*
10. playbook host_vars/*
11. host facts / cached set_facts
12. play vars
13. play vars_prompt
14. play vars_files
15. role vars (vars/main.yml)
16. block vars
17. task vars (only for that task)
18. include_vars
19. set_facts / registered vars
20. role (and include_role) params
21. include params
22. extra vars (command line -e) ← ALWAYS WINS
Practical rule: Use defaults/main.yml for role defaults (easily overridden). Use vars/main.yml
only for role-internal constants that must not be overridden. Use group_vars for environment config.
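To make the practical rule concrete, here is a sketch of the same role seen from three files. The variable nginx_config_dir and its values are hypothetical; nginx_worker_connections comes from the role defaults shown later in this document.

```yaml
# roles/nginx/defaults/main.yml (low precedence: inventory and play variables override it)
nginx_worker_connections: 4096
---
# roles/nginx/vars/main.yml (high precedence: beats inventory and play vars)
nginx_config_dir: /etc/nginx        # hypothetical role-internal constant
---
# inventory/production/group_vars/webservers.yaml
nginx_worker_connections: 8192      # wins over the role default
nginx_config_dir: /opt/nginx        # has no effect: role vars rank above group_vars
```

With these files, hosts in webservers would get 8192 worker connections and keep /etc/nginx as the config directory. Passing -e on the command line would still override either value.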
Playbook Structure Best Practices
# site.yml: top-level playbook
---
- import_playbook: playbooks/common.yml
- import_playbook: playbooks/webservers.yml
- import_playbook: playbooks/databases.yml

# playbooks/webservers.yml
---
- name: Configure web servers
  hosts: webservers
  become: yes          # sudo
  gather_facts: yes    # Set to no for speed in large inventories
  pre_tasks:
    - name: Update apt cache (once per play)
      apt:
        update_cache: yes
        cache_valid_time: 3600    # Skip if cache is < 1 hour old
      when: ansible_os_family == "Debian"
      tags: [always]              # Run even when --tags is specified
  roles:
    - role: common
      tags: [common]
    - role: nginx
      tags: [nginx]
      vars:
        nginx_worker_processes: 4
  post_tasks:
    - name: Verify nginx is running
      service:
        name: nginx
        state: started
      check_mode: yes             # Test without changing
      tags: [verify]
Idempotency Patterns
# ✅ Use modules instead of shell: modules are idempotent
- name: Install packages
  apt:
    name: [nginx, git, python3-pip]
    state: present    # Ensures installed; doesn't reinstall if present

# ❌ Shell is NOT idempotent
- name: Install nginx
  shell: apt-get install -y nginx    # Runs every time, no idempotency check

# ✅ When you MUST use shell: add changed_when / failed_when
- name: Run custom migration script
  shell: /opt/app/migrate.sh
  args:
    creates: /opt/app/.migration_complete    # Skip if this file exists (idempotent!)
  register: migration_result
  changed_when: migration_result.rc == 0 and 'already up to date' not in migration_result.stdout
  failed_when: migration_result.rc != 0 and 'already up to date' not in migration_result.stdout

# ✅ Lineinfile for config file modification (idempotent)
- name: Set kernel parameter
  lineinfile:
    path: /etc/sysctl.conf
    regexp: '^net.core.somaxconn'
    line: 'net.core.somaxconn = 65535'
    state: present
  notify: Apply sysctl

# ✅ Template for full file management (idempotent, diffs on change)
- name: Deploy nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: 'nginx -t -c %s'    # Validate before deploying!
  notify: Reload nginx
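Two further patterns are common when no purpose-built module exists: checking state explicitly before acting, and marking read-only commands so they never report a change. The sketch below assumes a hypothetical marker file and init script; neither appears in the original examples.

```yaml
- name: Check whether the app is already initialised
  ansible.builtin.stat:
    path: /opt/app/.initialised               # hypothetical marker file
  register: app_marker

- name: Initialise the app only when the marker is missing
  ansible.builtin.command: /opt/app/init.sh   # hypothetical script, assumed to create the marker
  when: not app_marker.stat.exists

- name: Read the running kernel version (never reports a change)
  ansible.builtin.command: uname -r
  register: kernel_version
  changed_when: false                         # read-only command, so it is always "ok"
```

The stat-then-when form does the same job as creates: but leaves the check result in a registered variable that later tasks can reuse.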
Role Structure
roles/nginx/
├── defaults/
│   └── main.yml          # Default variables (lowest priority, easily overridden)
├── vars/
│   └── main.yml          # Role-internal constants (high priority, rarely override)
├── tasks/
│   ├── main.yml          # Entry point (import other task files)
│   ├── install.yml
│   ├── configure.yml
│   └── ssl.yml
├── handlers/
│   └── main.yml          # Handlers (triggered by notify)
├── templates/
│   └── nginx.conf.j2     # Jinja2 templates
├── files/
│   └── dhparam.pem       # Static files
├── meta/
│   └── main.yml          # Role metadata and dependencies
└── molecule/             # Molecule tests
    └── default/
        ├── molecule.yml
        ├── converge.yml
        └── verify.yml
# roles/nginx/defaults/main.yml
nginx_version: "1.24"
nginx_worker_processes: "auto"
nginx_worker_connections: 4096
nginx_keepalive_timeout: 65
nginx_server_tokens: "off"
nginx_gzip_enabled: true
nginx_ssl_protocols: "TLSv1.2 TLSv1.3"
# roles/nginx/meta/main.yml
galaxy_info:
  author: platform-team
  description: Nginx web server installation and configuration
  license: MIT
  min_ansible_version: "2.12"
  platforms:
    - name: Ubuntu
      versions: ["20.04", "22.04"]
dependencies:
  - role: common     # Ensure common role runs first
  - role: certbot    # Install certbot before nginx tries to use certs
    when: nginx_ssl_enabled | default(false)
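The role layout lists tasks/main.yml as the entry point that imports the other task files, but its contents are not shown. A sketch of what it might look like follows; the tag names are assumptions, chosen to line up with the --tags nginx,ssl usage in the quick reference.

```yaml
# roles/nginx/tasks/main.yml (illustrative; tag names are assumptions)
---
- name: Install nginx
  ansible.builtin.import_tasks: install.yml
  tags: [nginx, install]

- name: Configure nginx
  ansible.builtin.import_tasks: configure.yml
  tags: [nginx, configure]

- name: Configure TLS
  ansible.builtin.import_tasks: ssl.yml
  when: nginx_ssl_enabled | default(false)    # condition is applied to every imported task
  tags: [nginx, ssl]
```

Because import_tasks is static, the tags and the when condition are inherited by each task inside the imported file, so --tags ssl would select only the TLS tasks.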
Handlers
# roles/nginx/handlers/main.yml
---
- name: Reload nginx
  service:
    name: nginx
    state: reloaded

- name: Restart nginx
  service:
    name: nginx
    state: restarted

- name: Apply sysctl
  command: sysctl -p /etc/sysctl.conf

# Handlers run ONCE at the end of a play, even if notified multiple times
# Tasks notify handlers like this:
# - name: Deploy config
#   template:
#     src: nginx.conf.j2
#     dest: /etc/nginx/nginx.conf
#   notify: Reload nginx
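Sometimes a later task in the same play depends on the handler having already run, for example a health check right after a config change. One way to handle that is to flush pending handlers explicitly, as sketched here. The health URL is borrowed from the Molecule verify playbook and is an assumption about the application.

```yaml
- name: Deploy nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: Reload nginx

- name: Run pending handlers now instead of waiting for the end of the play
  ansible.builtin.meta: flush_handlers

- name: Check that nginx answers after the reload
  ansible.builtin.uri:
    url: http://localhost/health    # assumed health endpoint
    status_code: 200
```

Flushing is best kept for cases like this one; used after every task it gives up the benefit of handlers running only once.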
Jinja2 Templating
# templates/nginx.conf.j2
worker_processes {{ nginx_worker_processes }};
worker_rlimit_nofile {{ nginx_worker_rlimit_nofile | default(65535) }};

events {
    worker_connections {{ nginx_worker_connections }};
    use epoll;
    multi_accept on;
}

http {
    # Generate upstream block for each server in the group
    upstream app_backend {
        least_conn;
        {% for host in groups['webservers'] %}
        server {{ hostvars[host]['ansible_host'] }}:{{ app_port | default(8080) }};
        {% endfor %}
        keepalive 32;
    }

    {% if nginx_gzip_enabled %}
    gzip on;
    gzip_comp_level {{ nginx_gzip_level | default(6) }};
    gzip_types {% for type in nginx_gzip_types %}{{ type }}{% if not loop.last %} {% endif %}{% endfor %};
    {% endif %}

    # Conditionally include SSL config
    {% if nginx_ssl_enabled | default(false) %}
    ssl_protocols {{ nginx_ssl_protocols }};
    ssl_certificate {{ nginx_ssl_cert_path }};
    ssl_certificate_key {{ nginx_ssl_key_path }};
    {% endif %}

    # Template from variable map
    {% for key, value in nginx_headers.items() %}
    add_header {{ key }} "{{ value }}" always;
    {% endfor %}
}
# Useful Jinja2 filters in Ansible
{{ my_list | join(', ') }}
{{ my_string | upper | trim }}
{{ my_dict | to_json }}
{{ my_dict | to_nice_yaml }}
{{ my_path | basename }}
{{ my_path | dirname }}
{{ my_var | default('fallback') }}
{{ my_var | default(omit) }} # Omit key entirely if undefined
{{ my_list | selectattr('enabled', 'equalto', true) | list }}
{{ my_list | map(attribute='name') | list }}
{{ my_string | regex_replace('^prefix_', '') }}
{{ (1024 * 1024) | human_readable }} # Parentheses needed: filters bind tighter than *
{{ lookup('env', 'HOME') }} # Lookup from controller environment
{{ lookup('file', '/etc/hosts') }} # Read file on controller
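Filters are used in task arguments as often as in templates. The sketch below shows default() and default(omit) inside a task; the variables app_user, app_uid, and app_shell are hypothetical.

```yaml
- name: Create an application user
  ansible.builtin.user:
    name: "{{ app_user | default('app') }}"          # falls back to 'app' when undefined
    uid: "{{ app_uid | default(omit) }}"             # parameter is dropped entirely when app_uid is undefined
    shell: "{{ app_shell | default('/bin/bash') }}"
    state: present
```

With default(omit) the module behaves as if uid had never been written in the task, so the system picks the next free UID instead of receiving an empty value.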
Ansible Vault
# Encrypt a variables file
ansible-vault encrypt group_vars/production/vault.yaml
# Encrypt a single value (for embedding in plaintext files)
ansible-vault encrypt_string 'my-secret-password' --name 'db_password'
# Result:
# db_password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# 38623435...
# Edit encrypted file
ansible-vault edit group_vars/production/vault.yaml
# Run playbook with vault password from file
ansible-playbook site.yml --vault-password-file ~/.vault_password
# Or use environment variable
export ANSIBLE_VAULT_PASSWORD_FILE=~/.vault_password
ansible-playbook site.yml
# vault.yaml example (contents before encryption)
db_password: "production-db-password"
api_secret_key: "production-api-key"
ssl_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  ...
  -----END RSA PRIVATE KEY-----
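A common convention, sketched here as an assumption rather than something the layout above mandates, is to prefix vaulted values with vault_ and reference them from the plaintext vars file. Variable names then stay searchable even though the values are encrypted. The template name and destination path in the task are hypothetical.

```yaml
# inventory/production/group_vars/databases/vault.yaml (encrypted on disk; shown decrypted)
vault_db_password: "production-db-password"
---
# inventory/production/group_vars/databases/vars.yaml (plaintext)
db_password: "{{ vault_db_password }}"
---
# Task that consumes the secret
- name: Write database credentials file
  ansible.builtin.template:
    src: db.env.j2            # hypothetical template that references db_password
    dest: /etc/app/db.env     # hypothetical path
    mode: "0600"
  no_log: true                # keep the rendered secret out of task output and logs
```

The no_log flag matters because a failed or verbose task would otherwise print the decrypted value.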
Molecule Testing
# molecule/default/molecule.yml
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: ubuntu:22.04
    pre_build_image: true
    command: /lib/systemd/systemd
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
provisioner:
  name: ansible
  playbooks:
    converge: converge.yml
    verify: verify.yml
  inventory:
    host_vars:
      ubuntu-22:
        nginx_ssl_enabled: false
        nginx_worker_processes: 2
verifier:
  name: ansible
# molecule/default/verify.yml
---
- name: Verify nginx role
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather service facts
      service_facts:

    # On systemd hosts, service_facts keys carry the .service suffix
    - name: Assert nginx is active
      assert:
        that:
          - "'nginx.service' in ansible_facts.services"
          - "ansible_facts.services['nginx.service'].state == 'running'"
          - "ansible_facts.services['nginx.service'].status == 'enabled'"

    - name: Check nginx port is open
      wait_for:
        port: 80
        timeout: 10

    - name: Verify nginx config is valid
      command: nginx -t
      changed_when: false

    - name: Make HTTP request to verify response
      uri:
        url: http://localhost/health
        status_code: 200
      register: health_response

    - name: Assert health response
      assert:
        that: health_response.status == 200
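The scenario above references converge.yml without showing it. A minimal sketch, assuming the role directory is named nginx as in the role layout, could be:

```yaml
# molecule/default/converge.yml
---
- name: Converge
  hosts: all
  become: true
  tasks:
    - name: Apply the nginx role
      ansible.builtin.include_role:
        name: nginx
```

Running molecule test from the role directory applies this playbook, re-runs it as an idempotence check, and then runs verify.yml.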
Anti-Patterns
❌ shell: or command: without creates: or changed_when: — breaks idempotency
❌ ignore_errors: yes everywhere, which hides failures until they're catastrophic
❌ Hardcoded passwords in tasks — use ansible-vault encrypted group_vars
❌ become: yes on every task — only elevate where actually needed
❌ gather_facts: no everywhere for speed — facts are needed for OS-conditional tasks
❌ No handlers for service restarts — tasks that change config should notify handlers
❌ Huge playbooks instead of roles — roles make logic reusable and testable
❌ No molecule tests — untested roles break in production when you change the base image
❌ --extra-vars in CI for secrets — use vault-encrypted vars with vault password from CI secret
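As one illustration of avoiding two of these anti-patterns together, the sketch below handles failure explicitly with block/rescue instead of ignore_errors, and elevates privileges on a single task rather than the whole play. The messages are illustrative.

```yaml
- name: Apply a risky change with explicit failure handling
  block:
    - name: Deploy new nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        backup: true                      # keep the previous file for manual rollback
        validate: 'nginx -t -c %s'
      become: true                        # elevate only where it is needed
      notify: Reload nginx
  rescue:
    - name: Report what failed
      ansible.builtin.debug:
        msg: "Config deploy failed on {{ inventory_hostname }}: {{ ansible_failed_result.msg | default('unknown error') }}"

    - name: Stop processing this host
      ansible.builtin.fail:
        msg: "nginx config deploy failed; see the previous task output"
```

Unlike ignore_errors, the failure still surfaces in the play recap, but only after the rescue section has had a chance to log context.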
Quick Reference
Ansible commands:
ansible all -m ping -i inventory/ → Test connectivity
ansible-playbook site.yml -i inventory/ --check → Dry run (check mode)
ansible-playbook site.yml --tags nginx,ssl → Run specific tags
ansible-playbook site.yml --skip-tags common → Skip tags
ansible-playbook site.yml --limit webservers → Limit to host group
ansible-playbook site.yml --limit web-01.example.com → Single host
ansible-playbook site.yml -e "nginx_version=1.25" → Override variable
ansible-playbook site.yml --start-at-task "Task name" → Resume from task
ansible all -a "systemctl status nginx" -i inventory/ → Ad-hoc command
Useful modules cheat sheet:
apt/yum/dnf: Package management
service: Start/stop/enable services
template: Deploy Jinja2 templates
copy: Deploy static files
file: Create files/dirs/symlinks, set permissions
lineinfile: Manage single lines in files
blockinfile: Manage blocks of text in files
user: Manage Linux users
git: Clone/update git repos
uri: Make HTTP requests
assert: Test conditions (use in verify.yml)
wait_for: Wait for port/file/condition
debug: Print variable values (use during development)
set_fact: Create/update variables