incident-response
Complete incident response lifecycle: detection, triage, containment, eradication, recovery, and lessons learned. IR runbooks, forensic preservation, cloud-specific IR (CloudTrail, GuardDuty), communication templates, IOC hunting with SIEM queries, and tabletop exercise
Incident Response
A security incident handled well is a company stress test you survive. Handled poorly, it becomes a data breach disclosure, a regulatory fine, or a company-ending event. The difference between the two is almost always preparation — documented runbooks, practiced procedures, and clear communication chains — not technical sophistication.
Core Mental Model
The six-phase IR lifecycle (the SANS model; NIST SP 800-61 Rev. 2 groups the same work into four phases) runs Preparation → Identification → Containment → Eradication → Recovery → Lessons Learned. In a real incident, these phases overlap and loop back. Containment may reveal new scope that requires returning to identification. Eradication may trigger another containment step. Think of it as a cycle, not a waterfall. The most important phase is Preparation: everything you do before an incident happens.
IR Lifecycle
Phase 1: PREPARATION (before incident)
✓ Document asset inventory and crown jewels
✓ Deploy detection: SIEM, EDR, CloudTrail logs
✓ Write and test runbooks
✓ Establish contact tree (legal, PR, exec, IR team)
✓ Practice with tabletop exercises quarterly
Phase 2: IDENTIFICATION
✓ Alert fires from SIEM / EDR / user report
✓ Triage: Is this a real incident? Severity? Scope?
✓ Declare incident and open incident channel
✓ Assign Incident Commander (IC) and Comms Lead
Phase 3: CONTAINMENT
✓ Short-term: Stop the bleeding (network isolation, account lock)
✓ Preserve evidence BEFORE wiping
✓ Long-term: Apply patches, rotate credentials, segment
Phase 4: ERADICATION
✓ Remove malware / malicious access
✓ Patch the vulnerability
✓ Harden the environment
Phase 5: RECOVERY
✓ Restore from clean backups
✓ Monitor closely for 72 hours
✓ Gradual service restoration
Phase 6: LESSONS LEARNED
✓ Post-incident review within 5 business days
✓ Root cause analysis
✓ Action items with owners and due dates
Triage Checklist
# Incident Triage — First 15 Minutes
**Incident ID:** INC-YYYY-NNN
**Declared:** [timestamp + timezone]
**Incident Commander:** [name]
**Comms Lead:** [name]
## Scope Assessment
- [ ] What systems are potentially affected?
Systems: _______________
- [ ] What data may have been accessed?
Data types: _______________
- [ ] What is the earliest possible compromise date?
Est. start: _______________
- [ ] Is the attacker still active?
Active: YES / NO / UNKNOWN
## Detection Source
- [ ] SIEM alert: [alert name]
- [ ] EDR detection: [detection]
- [ ] User report
- [ ] Third-party notification
- [ ] Automated scan finding
## Severity Classification
- P1 CRITICAL: Active breach, data exfiltration in progress, production down
- P2 HIGH: Confirmed breach, contained; sensitive data at risk
- P3 MEDIUM: Indicators of compromise, investigation ongoing
- P4 LOW: Security event, likely not a breach
**Current Severity:** ___
## Immediate Actions Required
- [ ] Open #incident-INC-YYYY-NNN Slack channel
- [ ] Notify IC chain per severity level
- [ ] Start forensic evidence collection NOW (before any remediation)
- [ ] Begin incident timeline log
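The first-15-minutes checklist above can be partly scripted so that IDs, severity, and the timeline log are consistent across incidents. The following is a minimal sketch, not a definitive implementation: the function names and the three yes/no inputs are assumptions made for illustration, and the severity mapping simply mirrors the P1 to P4 definitions in the checklist.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the yes/no inputs are illustrative assumptions.

# classify_severity ATTACKER_ACTIVE BREACH_CONFIRMED IOC_FOUND  (each "yes" or "no")
classify_severity() {
  local active="$1" confirmed="$2" ioc="$3"
  if [ "$active" = "yes" ]; then
    echo "P1"   # active breach or exfiltration in progress
  elif [ "$confirmed" = "yes" ]; then
    echo "P2"   # confirmed breach, contained
  elif [ "$ioc" = "yes" ]; then
    echo "P3"   # indicators of compromise, investigation ongoing
  else
    echo "P4"   # security event, likely not a breach
  fi
}

# incident_id YEAR SEQUENCE -> INC-YYYY-NNN (pass the sequence without leading zeros)
incident_id() {
  printf 'INC-%04d-%03d\n' "$1" "$2"
}

# log_timeline FILE MESSAGE -> appends one UTC-timestamped line to the timeline log
log_timeline() {
  printf '%s  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" >> "$1"
}
```

A typical use would be `id=$(incident_id 2024 42)`, then `log_timeline "${id}-timeline.log" "Declared $(classify_severity no yes no); channel #incident-${id}"`. Keeping every timeline entry in UTC avoids reconciling time zones during the post-incident review.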
Containment Runbook
Order matters: preserve evidence first, then isolate, then investigate.
# AWS Containment Runbook — Compromised EC2 Instance
# STEP 1: Snapshot everything BEFORE touching the instance
INSTANCE_ID="i-0abc123"
REGION="us-east-2"
# Create forensic snapshot of root volume
VOLUME_ID=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID \
--query 'Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId' \
--output text --region $REGION)
SNAPSHOT_ID=$(aws ec2 create-snapshot \
--volume-id "$VOLUME_ID" \
--description "FORENSIC: Incident INC-2024-042 - $(date -u +%Y%m%dT%H%M%SZ)" \
--tag-specifications "ResourceType=snapshot,Tags=[{Key=incident,Value=INC-2024-042},{Key=forensic,Value=true}]" \
--region "$REGION" \
--query 'SnapshotId' --output text)
echo "Forensic snapshot created: $SNAPSHOT_ID"
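# STEP 2: Capture instance memory (via SSM before isolation)
# send-command is asynchronous: wait for it to finish, otherwise the isolation
# in STEP 3 cuts off the dump and the S3 upload
COMMAND_ID=$(aws ssm send-command \
--instance-ids "$INSTANCE_ID" \
--document-name "AWS-RunShellScript" \
--parameters 'commands=["sudo avml /tmp/memory.lime && aws s3 cp /tmp/memory.lime s3://forensic-evidence-bucket/INC-2024-042/memory.lime"]' \
--region "$REGION" \
--query 'Command.CommandId' --output text)
aws ssm wait command-executed --command-id "$COMMAND_ID" --instance-id "$INSTANCE_ID" --region "$REGION"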
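# STEP 3: Isolate by applying a restrictive security group (deny all traffic)
VPC_ID=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
--query 'Reservations[0].Instances[0].VpcId' --output text --region "$REGION")
ISOLATE_SG=$(aws ec2 create-security-group \
--group-name "FORENSIC-ISOLATION-INC-2024-042" \
--description "Blocks all traffic for forensic isolation" \
--vpc-id "$VPC_ID" --region "$REGION" \
--query 'GroupId' --output text)
# A new security group has no ingress rules but allows ALL egress by default; revoke that rule
aws ec2 revoke-security-group-egress \
--group-id "$ISOLATE_SG" --region "$REGION" \
--ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
aws ec2 modify-instance-attribute \
--instance-id "$INSTANCE_ID" --region "$REGION" \
--groups "$ISOLATE_SG"
echo "Instance $INSTANCE_ID isolated with SG $ISOLATE_SG"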
# STEP 4: Revoke compromised IAM credentials
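# Get the instance profile attached to the instance, then the role inside it
# (the profile name and the role name are not always the same)
PROFILE_NAME=$(aws ec2 describe-iam-instance-profile-associations \
--filters "Name=instance-id,Values=$INSTANCE_ID" \
--query 'IamInstanceProfileAssociations[0].IamInstanceProfile.Arn' \
--output text --region "$REGION" | awk -F/ '{print $NF}')
ROLE_NAME=$(aws iam get-instance-profile \
--instance-profile-name "$PROFILE_NAME" \
--query 'InstanceProfile.Roles[0].RoleName' --output text)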
# Revoke all active sessions for the role
aws iam put-role-policy \
--role-name $ROLE_NAME \
--policy-name "INCIDENT-REVOKE-ALL" \
--policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"DateLessThan":{"aws:TokenIssueTime":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}}}]}'
echo "All IAM sessions revoked for role $ROLE_NAME"
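The inline policy JSON in STEP 4 is hard to review because of its nested shell quoting. A small helper that prints the same deny-before-cutoff document makes it easier to inspect before it is attached. This is a sketch: `build_revoke_policy` is a hypothetical name, and the policy body is the one already used in the runbook.

```shell
#!/usr/bin/env bash
# Sketch: build the "deny sessions issued before the cutoff" policy document
# separately so it can be reviewed before use. Function name is hypothetical.

build_revoke_policy() {
  # $1: ISO 8601 UTC cutoff; defaults to the current time.
  local cutoff="${1:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {"DateLessThan": {"aws:TokenIssueTime": "${cutoff}"}}
  }]
}
EOF
}
```

It would be passed to the existing command as `--policy-document "$(build_revoke_policy)"`. The condition only denies sessions whose tokens were issued before the cutoff, so credentials issued afterwards still work; that is why the instance is isolated in STEP 3 before this step runs.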
SIEM Queries for IOC Hunting
-- Splunk: Detect lateral movement via unusual internal connections
index=vpc_flow action=ACCEPT
| eval is_internal=if(match(dst_ip,"^10\.|^172\.(1[6-9]|2[0-9]|3[0-1])\.|^192\.168\."), 1, 0)
| stats count by src_ip, dst_ip, dst_port, is_internal
| where is_internal=1 AND count > 50
| sort -count
-- AWS CloudTrail: Detect privilege escalation attempts
-- (AttachRolePolicy, CreateAccessKey, PutUserPolicy from unusual IAM)
index=cloudtrail eventSource=iam.amazonaws.com
(eventName=AttachRolePolicy OR eventName=CreateAccessKey OR
eventName=PutUserPolicy OR eventName=CreateLoginProfile)
| where 'userIdentity.type' != "AWSService"
| fillnull value="None" errorCode
| stats count by userIdentity.arn, eventName, sourceIPAddress, errorCode
| where errorCode="None"
| sort -count
-- GuardDuty: High-severity findings in last 24h
-- (via Athena on GuardDuty findings exported to S3)
SELECT
type,
severity,
title,
description,
json_extract_scalar(resource, '$.instanceDetails.instanceId') as instance_id,
updatedAt
FROM guardduty_findings
WHERE severity >= 7.0
AND updatedAt > date_add('hour', -24, now())
ORDER BY severity DESC;
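-- Okta: Impossible travel detection (consecutive logins from geographically distant locations)
-- geo_distance_km() is a placeholder for whatever IP-geolocation function your platform provides;
-- date_diff() follows Athena/Presto syntax
WITH logins AS (
SELECT
actor_id,
actor_login,
client_ip,
published,
LAG(client_ip) OVER (PARTITION BY actor_id ORDER BY published) AS prev_ip,
LAG(published) OVER (PARTITION BY actor_id ORDER BY published) AS prev_published
FROM okta_system_log
WHERE event_type = 'user.session.start'
AND outcome_result = 'SUCCESS'
)
SELECT actor_id, actor_login, prev_ip, client_ip, prev_published, published
FROM logins
WHERE prev_ip IS NOT NULL
AND geo_distance_km(prev_ip, client_ip) > 500 -- km
AND date_diff('minute', prev_published, published) < 120;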
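The impossible-travel query above depends on a distance function that most platforms do not ship. The arithmetic behind it is the haversine great-circle distance, sketched here with `awk` for floating-point math. The function names are hypothetical, the coordinates are assumed to come from whatever IP-geolocation source you already use (not shown), and the 500 km / 120 minute thresholds are taken from the query.

```shell
#!/usr/bin/env bash
# Sketch: the distance/time test behind "impossible travel".
# Function names are illustrative; geolocation lookup is out of scope.

# haversine_km LAT1 LON1 LAT2 LON2 -> great-circle distance in whole km
haversine_km() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    pi = atan2(0, -1); r = 6371                 # mean Earth radius in km
    p1 = lat1 * pi / 180; p2 = lat2 * pi / 180  # latitudes in radians
    dphi = (lat2 - lat1) * pi / 180
    dlam = (lon2 - lon1) * pi / 180
    a = sin(dphi / 2) ^ 2 + cos(p1) * cos(p2) * sin(dlam / 2) ^ 2
    printf "%d\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# impossible_travel LAT1 LON1 LAT2 LON2 MINUTES
# Exits 0 when the two logins are more than 500 km apart and less than
# 120 minutes apart; exits non-zero otherwise.
impossible_travel() {
  local km
  km=$(haversine_km "$1" "$2" "$3" "$4")
  [ "$km" -gt 500 ] && [ "$5" -lt 120 ]
}
```

For example, `impossible_travel 51.5074 -0.1278 40.7128 -74.0060 60` succeeds for a London login followed by a New York login an hour later. IP geolocation is coarse and VPN exits move users around, so a hit is a lead for triage rather than proof of compromise.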
Forensic Log Collection
#!/bin/bash
# forensic_collect.sh — Collect volatile evidence before containment changes
INCIDENT="INC-2024-042"
OUTPUT_DIR="/forensic/${INCIDENT}/$(hostname)"
mkdir -p "$OUTPUT_DIR"
echo "[$(date -u)] Starting forensic collection for $INCIDENT" | tee "$OUTPUT_DIR/collection.log"
# 1. Running processes (volatile — collect first)
ps aux > "$OUTPUT_DIR/processes.txt"
ps auxf > "$OUTPUT_DIR/process_tree.txt"
# 2. Network connections
netstat -tulpn > "$OUTPUT_DIR/netstat.txt" 2>&1
ss -tulpn > "$OUTPUT_DIR/ss.txt" 2>&1
# 3. Active logins
who > "$OUTPUT_DIR/who.txt"
last -F > "$OUTPUT_DIR/last.txt"
lastlog > "$OUTPUT_DIR/lastlog.txt"
# 4. Scheduled tasks (common persistence mechanism)
crontab -l > "$OUTPUT_DIR/crontab_root.txt" 2>&1
ls -la /etc/cron* > "$OUTPUT_DIR/cron_dirs.txt" 2>&1
cat /etc/cron.d/* >> "$OUTPUT_DIR/cron_dirs.txt" 2>&1
systemctl list-units --type=service > "$OUTPUT_DIR/systemd_services.txt"
# 5. Recent file modifications (last 7 days)
find /etc /usr /bin /sbin -mtime -7 -type f 2>/dev/null > "$OUTPUT_DIR/recent_modifications.txt"
find /tmp /var/tmp -type f 2>/dev/null -ls >> "$OUTPUT_DIR/recent_modifications.txt"
# 6. Auth logs
cp /var/log/auth.log "$OUTPUT_DIR/" 2>/dev/null
cp /var/log/secure "$OUTPUT_DIR/" 2>/dev/null
# 7. Hash all collected files for chain of custody
# (write the final log entry first so collection.log does not change after hashing)
echo "[$(date -u)] Collection complete" | tee -a "$OUTPUT_DIR/collection.log"
sha256sum "$OUTPUT_DIR"/* > "$OUTPUT_DIR/CHECKSUMS.sha256"
# 8. Upload to forensic evidence bucket (immutable, versioned)
# S3 bucket names must be lowercase, so the incident ID goes in the key prefix, not the bucket name
aws s3 cp "$OUTPUT_DIR" "s3://forensic-evidence-bucket/${INCIDENT}/$(hostname)/" --recursive \
--no-guess-mime-type \
--metadata "incident=${INCIDENT},collected=$(date -u +%Y%m%dT%H%M%SZ),collector=$(whoami)"
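The checksum step above only helps chain of custody if someone re-verifies the hashes after the evidence has been copied. The sketch below writes a manifest with relative paths, so it stays valid wherever the directory ends up, and then checks it. It assumes GNU coreutils and findutils; `write_manifest` and `verify_manifest` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: build and verify a SHA-256 manifest for an evidence directory.
# Assumes GNU sha256sum, find, sort and xargs. Helper names are illustrative.

MANIFEST="CHECKSUMS.sha256"

# write_manifest DIR -> hashes every regular file in DIR except the manifest
write_manifest() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # Relative paths keep the manifest usable after the directory is moved.
    find . -type f ! -name "$MANIFEST" -print0 \
      | sort -z \
      | xargs -0 -r sha256sum > "$MANIFEST"
  )
}

# verify_manifest DIR -> exits 0 only if every listed file still matches
verify_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum --quiet -c "$MANIFEST" )
}
```

Running `verify_manifest` on the analyst's copy before any examination, and recording the result in the incident timeline, gives a simple record that the evidence was unchanged at hand-off.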
Communication Templates
# Internal Escalation (P1 Incident — send within 15 minutes)
**TO:** [CISO, CTO, Legal, CEO]
**SUBJECT:** [P1 SECURITY INCIDENT] INC-2024-042 — Active Investigation
We have declared a P1 security incident at [time] UTC.
**What we know:**
- Detection source: [GuardDuty / EDR / user report]
- Affected systems: [system names]
- Potential data exposure: [data types or "investigating"]
- Attacker status: [active / contained / unknown]
**Actions taken:**
- Incident Commander assigned: [Name]
- Systems isolated: [yes/no]
- Evidence preservation: [in progress / complete]
**Next update:** [time + 30 minutes] or sooner if material changes.
Incident channel: #incident-INC-2024-042
IC: [Name] | [phone]
---
# Regulatory Notification Template (GDPR — 72-hour deadline)
[Company] hereby notifies [supervisory authority] of a personal data breach pursuant to
Article 33 of the GDPR.
**Nature of the breach:** Unauthorized access to [system] resulting in potential exposure of
[data categories] affecting approximately [N] data subjects.
**Date of breach:** [date or "investigation ongoing"]
**Date discovered:** [date]
**Date of notification:** [date]
**Categories of personal data:** [names, emails, etc.]
**Approximate number of data subjects:** [N]
**Categories of recipients:** [internal / third parties if shared]
**Likely consequences:** [risk assessment]
**Measures taken:**
1. [Containment action]
2. [Remediation action]
3. [Prevention measure]
**Contact:** [DPO name, email, phone]
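Because the GDPR clock in the template above starts when the organization becomes aware of the breach, it helps to compute the deadline once and put it in the incident channel. This is a minimal sketch assuming GNU `date` (the `-d` option); the function names are hypothetical and the hour count is a parameter so other regimes can reuse it.

```shell
#!/usr/bin/env bash
# Sketch: compute a notification deadline from the moment of awareness.
# Assumes GNU date. Function names are illustrative.

# notification_deadline AWARE_UTC HOURS -> deadline as ISO 8601 UTC
notification_deadline() {
  local aware_epoch
  aware_epoch=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$(( aware_epoch + $2 * 3600 ))" +%Y-%m-%dT%H:%M:%SZ
}

# hours_remaining DEADLINE_UTC [NOW_UTC] -> whole hours left, truncated toward zero
hours_remaining() {
  local deadline_epoch now_epoch
  deadline_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (deadline_epoch - now_epoch) / 3600 ))
}
```

For instance, `notification_deadline 2024-03-01T10:00:00Z 72` gives the latest time for an Article 33 notification when the breach was discovered at 10:00 UTC on 1 March 2024. Working in epoch seconds avoids ambiguity in how relative date strings are parsed.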
Tabletop Exercise Design
# Tabletop Scenario: Ransomware via Phishing
Duration: 90 minutes | Participants: IR team, IT, legal, comms, exec
## Inject Timeline
T+0:00 — User reports their files have strange extensions
T+0:05 — EDR shows Emotet → Cobalt Strike → ransomware chain on 3 endpoints
T+0:10 — Business asks: should we pay the ransom?
**Discussion 1:** What is your immediate containment action?
T+0:20 — Backup systems found encrypted (attacker had 14-day dwell time)
T+0:25 — PR receives press inquiry from reporter
**Discussion 2:** Who approves the PR response? What do you say?
T+0:40 — Legal confirms customer PII was on compromised systems
**Discussion 3:** What is your GDPR/CCPA notification timeline and obligation?
T+0:55: Attacker posts sample data on a dark web forum
**Discussion 4:** How does this change your response strategy?
## Questions to Drive Discussion
- Who has authority to isolate production systems?
- What's the process for notifying regulators in each jurisdiction?
- At what point do we engage an external IR firm?
- How do we communicate with customers before we know full scope?
- What evidence must we preserve for law enforcement?
Anti-Patterns
❌ Remediating before preserving evidence
The instinct is to patch and clean immediately. This destroys forensic evidence. Always take a snapshot, capture memory, and collect logs before any remediation action.
❌ No pre-approved communication templates
During an incident, you don't have time to write communications from scratch. Legal approval takes hours. Pre-approve templates for all scenarios before an incident.
❌ IC trying to do everything
The IC coordinates and does not execute. Assign specific roles: forensics lead, comms lead, legal liaison, exec briefer. An IC who does not delegate becomes a bottleneck.
❌ Not practicing with tabletop exercises
Incident response is a skill that degrades without practice. Teams that have never run a tabletop exercise will make basic coordination mistakes in a real incident.
❌ Declaring victory too early
Attackers frequently maintain persistence after initial remediation. Monitor for 72 hours after "eradication" before closing the incident; re-compromise through a missed backdoor or unrotated credential is common.
Quick Reference
Severity levels:
P1 CRITICAL → Active breach, data exfil, production down → IC + exec NOW
P2 HIGH → Confirmed breach, contained → IC + legal within 1h
P3 MEDIUM → IOCs found, investigation → IC + IR team
P4 LOW → Security event, no breach → IR team
Containment order:
1. Preserve evidence (snapshot, memory dump, logs)
2. Isolate (network block, account disable)
3. Investigate (forensics on preserved evidence)
NEVER: remediate before preserving
Regulatory timelines:
GDPR → 72 hours after becoming aware
CCPA → No mandatory timeline (notify "expeditiously")
HIPAA → 60 days after discovery
Evidence preservation:
EC2: EBS snapshot → memory dump via avml → VPC flow logs
SaaS: Export audit logs immediately (often 90-day retention)
Endpoints: EDR telemetry, process dump, disk image