sast-scanner
Expert SAST (Static Application Security Testing) guide: Semgrep custom rules (pattern syntax, metavariables, taint tracking, autofix), CodeQL dataflow analysis, Bandit for Python, ESLint security plugins, SonarQube integration, false positive management, taint
SAST Scanner Expert
Static Application Security Testing (SAST) analyzes source code without executing it to find security vulnerabilities. When configured correctly, it catches SQL injection, XSS, hardcoded credentials, and insecure API usage before code reaches production. When configured poorly, it generates hundreds of false positives that developers learn to ignore. This skill covers expert-level SAST configuration that actually improves security without destroying developer productivity.
Core Mental Model
SAST tools work at three levels of sophistication: pattern matching (grep-like; Semgrep's basic mode), data flow analysis (tracking data from source to sink; Semgrep's taint mode, CodeQL), and semantic analysis (understanding code intent; CodeQL, SonarQube). Pattern matching is fast but generates false positives; data flow analysis is slower but much more precise. The right approach: use pattern matching for high-confidence, low-FP rules (hardcoded secrets, banned functions) and data flow analysis for complex vulnerabilities (SQL injection, XSS) where user input travels through multiple functions.
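To make the first level concrete, here is a minimal sketch of purely syntactic pattern matching, written with Python's standard `ast` module rather than any SAST tool. The names `SAMPLE`, `ALIASED`, and `find_md5_calls` are hypothetical and exist only for this illustration.

```python
import ast

# Hypothetical code under test: one weak hash call, one acceptable call.
SAMPLE = '''
import hashlib

def fingerprint(data):
    return hashlib.md5(data).hexdigest()

def checksum(data):
    return hashlib.sha256(data).hexdigest()
'''

# Same weak hash, but imported under a bare name.
ALIASED = "from hashlib import md5\nmd5(b'x')\n"


def find_md5_calls(source: str) -> list[int]:
    """Return line numbers of calls spelled exactly as hashlib.md5(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "md5"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "hashlib"
        ):
            hits.append(node.lineno)
    return sorted(hits)


print(find_md5_calls(SAMPLE))   # prints [5]
print(find_md5_calls(ALIASED))  # prints [] because the call is spelled differently
```

The second call shows the limit of this level: a check that only looks at how code is spelled knows nothing about where a name or a value came from, which is the gap the data flow and semantic levels are meant to close.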
Taint Analysis Concepts
Taint analysis tracks "tainted" (untrusted) data from SOURCE → through transforms → to SINK.
A vulnerability exists when tainted data reaches a dangerous SINK without a SANITIZER.
SOURCES (untrusted input):
- HTTP request parameters, headers, body
- Database reads (if source data was originally user-supplied)
- File reads from user-uploaded files
- Environment variables in some contexts
SINKS (dangerous operations):
- SQL execution (SQLi if tainted)
- HTML rendering (XSS if tainted)
- Shell command execution (command injection)
- File path operations (path traversal)
- HTTP requests (SSRF if URL is tainted)
- Serialization/deserialization (RCE if tainted)
SANITIZERS (functions that neutralize taint):
- Parameterized query (sanitizes SQL sink)
- HTML encoding (sanitizes HTML sink)
- URL allowlist check (sanitizes SSRF sink)
- Input validation with strict allowlist
Example taint flow (VULNERABLE):
user_id = request.args.get("user_id")  # SOURCE: HTTP param
    ↓
username = format_username(user_id)  # PROPAGATION: taint flows through
    ↓
db.execute(f"SELECT * FROM users WHERE id = {username}")  # SINK: SQL exec → VULNERABILITY
Example taint flow (SAFE):
user_id = request.args.get("user_id")  # SOURCE
    ↓
db.execute("SELECT * FROM users WHERE id = %s", (user_id,))  # SANITIZER + SINK: parameterized
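The two flows above can be reproduced end to end with the standard library. This is a minimal sketch that assumes an in-memory SQLite database in place of the real one; the table, the rows, and the attacker string are made up for the illustration, and SQLite uses `?` placeholders where the example above uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Stands in for request.args.get("user_id"): a SOURCE of untrusted data.
user_id = "1 OR 1=1"

# VULNERABLE: the tainted string becomes part of the SQL text (SINK without SANITIZER).
vulnerable_rows = conn.execute(f"SELECT * FROM users WHERE id = {user_id}").fetchall()

# SAFE: the value is bound as a parameter and is never parsed as SQL.
safe_rows = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()

print(len(vulnerable_rows), len(safe_rows))  # prints 2 0
```

The vulnerable query returns every row because the injected `OR 1=1` is evaluated as SQL; the parameterized query compares the column against the literal string and matches nothing.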
Semgrep: Pattern Syntax
# Basic Semgrep patterns
rules:
  # 1. Exact match
  - id: use-of-md5
    pattern: hashlib.md5(...)
    message: MD5 is cryptographically broken. Use SHA-256 or better.
    languages: [python]
    severity: WARNING
  # 2. Metavariables (capture any expression)
  - id: pickle-loads
    patterns:
      - pattern: pickle.loads($DATA)
    message: "pickle.loads() with untrusted data leads to RCE. Use json.loads() instead."
    languages: [python]
    severity: ERROR
  # 3. Multiple patterns (AND logic): every entry under patterns must match
  - id: flask-debug-mode
    patterns:
      - pattern: $APP.run(...)
      - pattern: $APP.run(..., debug=True, ...)
    message: "Flask debug mode exposes interactive debugger; never enable in production."
    languages: [python]
    severity: ERROR
  # 4. Pattern-not (exclude false positives)
  # Comments are not part of the parsed code, so a pattern cannot match on them.
  # Suppress a single confirmed false positive with an inline nosemgrep comment instead.
  - id: sql-string-format
    patterns:
      - pattern: $DB.execute($QUERY % ...)
      - pattern-not: $DB.execute("SELECT 1" % ...)  # Exclude health checks
    message: "Possible SQL injection via string formatting. Use parameterized queries."
    languages: [python]
    severity: ERROR
  # 5. Metavariable regex (constrain what a metavariable matches) plus path excludes
  - id: hardcoded-secret
    patterns:
      - pattern: $VAR = "..."
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i).*(password|secret|token|api_key).*
    message: "Hardcoded credential found."
    languages: [python, javascript, typescript]
    severity: WARNING
    paths:
      exclude:
        - "**/test_fixtures/**"
        - "**/*.test.*"
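One way to check a rule such as `use-of-md5` is a small test file next to it. Semgrep's rule test mode (`semgrep --test`) reads `# ruleid:` and `# ok:` annotations placed on the line above the code under test. The sketch below assumes the rule is saved as `use-of-md5.yaml` with this file beside it as `use-of-md5.py`; both file names and both function names are illustrative.

```python
import hashlib


def legacy_fingerprint(data: bytes) -> str:
    # ruleid: use-of-md5
    return hashlib.md5(data).hexdigest()


def fingerprint(data: bytes) -> str:
    # ok: use-of-md5
    return hashlib.sha256(data).hexdigest()
```

The first annotation asserts that the rule fires on the next line, the second that it stays silent, which covers the "known-vulnerable sample" and "safe equivalent" cases in one file.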
Semgrep: Taint Mode (Data Flow)
# Semgrep taint mode: tracks data from source to sink.
# The open-source engine follows flows within a single function; tracking across
# functions and files needs the Pro engine.
rules:
  - id: sql-injection-taint
    mode: taint
    pattern-sources:
      # pattern-either = OR; a plain patterns list would require all four to match at once
      - pattern-either:
          - pattern: request.args.get(...)
          - pattern: request.form.get(...)
          - pattern: request.json
          - pattern: flask.request.get_json()
    pattern-sinks:
      - patterns:
          - pattern-either:
              - pattern: $DB.execute($QUERY, ...)
              - pattern: $CURSOR.executemany($QUERY, ...)
          - focus-metavariable: $QUERY  # Only flag when QUERY is tainted (not the params)
    pattern-sanitizers:
      # sqlalchemy.text() is not a sanitizer: text(f"... {user_input}") is still injectable.
      # List only functions that really neutralize the input, for example a numeric cast.
      - pattern: int(...)
    message: |
      SQL injection: User-controlled input reaches SQL execution without parameterization.
      Use: db.execute("SELECT ... WHERE id = %s", (user_input,))
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-89"
      owasp: "A03:2021"
      confidence: high
  - id: xss-taint
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
    pattern-sinks:
      - patterns:
          - pattern: flask.render_template_string($TEMPLATE, ...)
          - focus-metavariable: $TEMPLATE
      - patterns:
          - pattern: Markup($HTML)
          - focus-metavariable: $HTML
    pattern-sanitizers:
      - pattern: markupsafe.escape(...)
      - pattern: bleach.clean(...)
    message: "XSS: User input rendered as HTML without sanitization."
    languages: [python]
    severity: ERROR
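To show what a taint rule's three lists mean operationally, here is a toy tracker built on Python's `ast` module. It is a sketch, not how Semgrep works internally: it only follows straight-line, top-level assignments, the source and sink sets are simplified, and `int()` is treated as a sanitizer purely for illustration. All names (`SOURCE_CALLS`, `find_tainted_sinks`, and so on) are hypothetical.

```python
import ast

# Hypothetical, simplified spec in the spirit of a taint rule.
SOURCE_CALLS = {"request.args.get", "request.form.get"}
SINK_METHODS = {"execute", "executemany"}
SANITIZER_CALLS = {"int"}


def dotted_name(node: ast.AST) -> str:
    """Render a Name/Attribute chain such as request.args.get as a string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted_name(node.value)}.{node.attr}"
    return ""


def is_tainted(expr: ast.AST, tainted: set[str]) -> bool:
    """True if expr holds a source call or tainted variable not wrapped in a sanitizer."""
    if isinstance(expr, ast.Call):
        name = dotted_name(expr.func)
        if name in SANITIZER_CALLS:
            return False
        if name in SOURCE_CALLS:
            return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_tainted_sinks(source: str) -> list[int]:
    """Walk top-level statements in order; return lines where taint reaches a sink."""
    tainted: set[str] = set()
    findings = []
    for stmt in ast.parse(source).body:
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SINK_METHODS
                and node.args
                and is_tainted(node.args[0], tainted)  # only the query text, not the params
            ):
                findings.append(node.lineno)
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            target = stmt.targets[0].id
            if is_tainted(stmt.value, tainted):
                tainted.add(target)      # propagation
            else:
                tainted.discard(target)  # overwritten with clean data


    return findings


SAMPLE = '''
user_id = request.args.get("user_id")
query = f"SELECT * FROM users WHERE id = {user_id}"
db.execute(query)
safe_id = int(request.args.get("user_id"))
db.execute(f"SELECT * FROM users WHERE id = {safe_id}")
db.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''

print(find_tainted_sinks(SAMPLE))  # prints [4]
```

Only line 4 is reported: the sanitized value on line 6 is clean, and on line 7 the tainted value sits in the parameter tuple rather than in the query text, which is the distinction a focus on the query argument is meant to draw.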
Semgrep Autofix
# Semgrep can automatically fix some patterns
rules:
  - id: assert-in-production
    pattern: assert $CONDITION, $MSG
    # Parentheses keep compound conditions such as "a or b" negated as a whole
    fix: |
      if not ($CONDITION):
          raise AssertionError($MSG)
    message: "assert statements are disabled with Python -O flag. Use explicit checks."
    languages: [python]
    severity: WARNING
  - id: print-to-logger
    pattern: print($MSG)
    fix: logger.info($MSG)  # Assumes a logger object is already defined in the module
    message: "Replace print() with logger for production code."
    languages: [python]
    severity: INFO
    paths:
      include:
        - "src/**"
      exclude:
        - "scripts/**"
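The reason behind the `assert-in-production` rule can be checked directly. This small sketch runs the same failing assertion in a fresh interpreter with and without `-O`; `SNIPPET` and `exit_code` are names invented for the illustration, and it assumes the `PYTHONOPTIMIZE` environment variable is not set.

```python
import subprocess
import sys

SNIPPET = "assert 1 == 2, 'invariant violated'"


def exit_code(*flags: str) -> int:
    """Run SNIPPET in a new interpreter with the given flags and return its exit code."""
    result = subprocess.run([sys.executable, *flags, "-c", SNIPPET], capture_output=True)
    return result.returncode


print(exit_code())      # non-zero: the assertion raises AssertionError
print(exit_code("-O"))  # 0: the assert statement is compiled out
```

An explicit `if not (...): raise` survives optimization, which is what the autofix rewrites the statement into.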
CodeQL: Deep Semantic Analysis
// CodeQL query: find SQL injection via dataflow analysis
// CodeQL follows flows across functions, classes, and imports.
// Save each query below in its own .ql file; the @name and @id values are placeholders.
/**
 * @name SQL injection (built-in library)
 * @kind path-problem
 * @problem.severity error
 * @id py/example-sql-injection
 */
import python
import semmle.python.security.dataflow.SqlInjectionQuery
import SqlInjectionFlow::PathGraph
// Using the built-in SQL injection library
from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink,
  "SQL query constructed from user-controlled $@", source.getNode(), "value"
// Custom CodeQL query: find unvalidated redirect (open redirect vulnerability)
/**
 * @name Open redirect (local flow only)
 * @kind problem
 * @problem.severity warning
 * @id py/example-open-redirect
 */
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.ApiGraphs
class FlaskRedirect extends DataFlow::CallCfgNode {
  FlaskRedirect() {
    this = API::moduleImport("flask").getMember("redirect").getACall()
  }
  // Named getUrlArg because DataFlow::Node already defines getLocation()
  DataFlow::Node getUrlArg() {
    result = this.getArg(0)
  }
}
class UserRequest extends DataFlow::Node {
  UserRequest() {
    this = API::moduleImport("flask").getMember("request")
        .getMember("args").getMember("get").getACall()
  }
}
// Track user input to flask redirect (localFlow stays within a single function)
from UserRequest source, FlaskRedirect redirect
where DataFlow::localFlow(source, redirect.getUrlArg())
select redirect, "Potential open redirect: user-controlled URL passed to redirect()"
# GitHub Actions: CodeQL integration
name: CodeQL Analysis
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly full scan on Mondays
jobs:
  analyze:
    name: CodeQL
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write
    strategy:
      matrix:
        language: ['python', 'javascript', 'go']
    steps:
      - uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: security-extended  # More thorough than default
          config-file: .github/codeql-config.yml
      - name: Autobuild
        uses: github/codeql-action/autobuild@v3
      - name: Analyze
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{ matrix.language }}"
          output: sarif-results
          upload: true
Bandit: Python Security Linting
# Bandit configuration (.bandit, INI format; keys mirror the command-line options)
cat > .bandit << 'EOF'
[bandit]
# Skipped: assert_used (B101) and random (B311, not always security-relevant)
exclude = tests,docs,.venv
tests = B102,B103,B301,B302,B303,B304,B305,B306,B307,B321,B323,B324,B401,B403,B404,B501,B502,B503,B504,B505,B506,B601,B602,B603,B604,B605,B606,B607,B608,B609,B610,B611
skips = B101,B311
EOF
# Run Bandit: report only medium severity and confidence or higher.
# A comment after a trailing backslash breaks the line continuation, so keep comments up here.
# SARIF output may need the optional formatter: pip install "bandit[sarif]"
bandit -r src/ \
  --severity-level medium \
  --confidence-level medium \
  --format sarif \
  --output bandit-results.sarif
# Bandit key checks:
# B301-B302: pickle/marshal (RCE risk)
# B303-B305, B324: Crypto (MD5, SHA1, weak ciphers and modes)
# B501-B504: SSL/TLS misconfig (B505: weak keys, B506: unsafe yaml.load)
# B601-B611: Injection (shell, SQL)
# B104: Hardcoded bind all interfaces
# Common Bandit findings and fixes
# B602 — subprocess shell injection
import subprocess
# ❌ B602: shell=True with user input
subprocess.call(f"echo {user_input}", shell=True) # Command injection!
# ✅ Correct: list args, shell=False (default)
subprocess.call(["echo", user_input], shell=False)
# B303 / B324: MD5 for security purposes (recent Bandit releases report hashlib.md5 as B324)
import hashlib
# ❌ MD5 is cryptographically broken
hashlib.md5(data.encode()).hexdigest()
# ✅ Correct: SHA-256 minimum (but use Argon2id for passwords)
hashlib.sha256(data.encode()).hexdigest()
# B501 — SSL verification disabled
import requests
# ❌ B501: Never disable SSL verification in production
requests.get(url, verify=False)
# ✅ Correct
requests.get(url) # verify=True is default
# Or specify CA bundle: requests.get(url, verify='/path/to/ca-bundle.crt')
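The MD5 fix above notes that passwords need a dedicated password hash rather than a plain digest. Argon2id requires a third-party package, so this sketch substitutes PBKDF2-HMAC-SHA256 from the standard library to show the shape of the approach: a per-password salt, a work factor, and a constant-time comparison. The iteration count and the function names are assumptions for illustration, not a recommendation from the source.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # prints True
print(verify_password("wrong guess", salt, stored))                   # prints False
```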
ESLint Security Plugin
// .eslintrc.js: security-focused ESLint configuration
module.exports = {
  // Each plugin must be installed as eslint-plugin-<name>; 'react' provides react/no-danger
  plugins: ['security', 'no-secrets', 'xss', 'react'],
  extends: [
    // With eslint-plugin-security v2 or later, the eslintrc preset is
    // 'plugin:security/recommended-legacy'
    'plugin:security/recommended',
  ],
  rules: {
    // Detect potential ReDoS (regex denial of service)
    'security/detect-unsafe-regex': 'error',
    // Detect non-literal RegExp constructor (user-controlled regex)
    'security/detect-non-literal-regexp': 'warn',
    // Detect eval() and similar (code injection)
    'security/detect-eval-with-expression': 'error',
    'no-eval': 'error',
    'no-new-func': 'error',
    // Detect possible object prototype injection
    'security/detect-object-injection': 'warn',
    // Detect hardcoded secrets
    'no-secrets/no-secrets': ['error', {tolerance: 4.0}],
    // Warn on any use of dangerouslySetInnerHTML
    'react/no-danger': 'warn',
    // Detect fs calls with a non-literal filename (path traversal)
    'security/detect-non-literal-fs-filename': 'warn',
  }
};
SonarQube Integration
# GitHub Actions: SonarQube scan with quality gate
# @master tracks a moving branch; pinning to a release tag or commit SHA is safer
sonarqube:
  name: SonarQube Scan
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0  # Full history for blame annotations
    - name: SonarQube Scan
      uses: sonarsource/sonarqube-scan-action@master
      env:
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
        SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
      with:
        # sonar.qualitygate.wait=true fails the step if the quality gate fails.
        # The Bandit import reads a JSON report (bandit --format json), not SARIF.
        # Comments must stay outside the folded block, or they become part of args.
        args: >
          -Dsonar.projectKey=my-project
          -Dsonar.python.coverage.reportPaths=coverage.xml
          -Dsonar.python.bandit.reportPaths=bandit-results.json
          -Dsonar.qualitygate.wait=true
    - name: Check Quality Gate
      uses: sonarsource/sonarqube-quality-gate-action@master
      timeout-minutes: 5
      env:
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
False Positive Management
# Inline suppression: use only when the false positive is confirmed
# MUST include a justification comment
import subprocess
# nosec B603: subprocess with list args is safe; no shell injection possible
result = subprocess.run(  # nosec B603
    ["git", "log", "--oneline"],  # All hardcoded, no user input
    capture_output=True,
)
# noqa: S608 (Ruff / flake8-bandit): this is a test fixture, not production SQL
TEST_QUERY = "SELECT * FROM test_table"  # noqa: S608
# Semgrep suppression uses nosemgrep: the value is loaded from config, not hardcoded
secret_key = config.SECRET_KEY  # nosemgrep: hardcoded-secret
# Semgrep baseline: report only findings introduced since a Git baseline commit
# (the baseline is a commit, not a JSON file)
# Use: semgrep --config=auto --baseline-commit=$(git merge-base origin/main HEAD)
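The "must include justification" rule can be enforced mechanically. The following is a rough sketch of such an audit, assuming a suppression counts as justified when explanatory text follows the marker on the same line or a comment line sits directly above it. The regular expression and the function name `unjustified_nosec` are hypothetical and not part of Bandit.

```python
import re

# "# nosec", optionally followed by Bandit test IDs such as B603 or B603,B607.
NOSEC = re.compile(r"#\s*nosec\b(?P<ids>(?:[ ,:]*B\d{3})*)(?P<rest>.*)$")


def unjustified_nosec(source: str) -> list[int]:
    """Return 1-based line numbers of # nosec markers that carry no explanation."""
    flagged = []
    previous = ""
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = NOSEC.search(line)
        if match:
            same_line = re.search(r"[A-Za-z]{3,}", match.group("rest"))
            line_above = previous.strip().startswith("#")
            if not (same_line or line_above):
                flagged.append(lineno)
        previous = line
    return flagged


SAMPLE = """\
# nosec B603: list args are hardcoded, no user input
result = subprocess.run(["git", "log"])  # nosec B603
subprocess.run(cmd, shell=True)  # nosec
os.system(cmd)  # nosec B605 cmd is a module-level constant
"""

print(unjustified_nosec(SAMPLE))  # prints [3]
```

A check like this could run as a pre-commit hook or CI step so that bare suppressions are caught at review time rather than during an audit.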
# Severity thresholds in GitHub Actions
- name: Check for blocking findings
  run: |
    CRITICAL=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' semgrep-results.json)
    HIGH=$(jq '[.results[] | select(.extra.severity == "WARNING")] | length' semgrep-results.json)
    echo "Critical findings: $CRITICAL"
    echo "High findings: $HIGH"
    if [ "$CRITICAL" -gt 0 ]; then
      echo "❌ BLOCKING: $CRITICAL critical security findings. Fix before merging."
      exit 1
    fi
    if [ "$HIGH" -gt 10 ]; then
      echo "⚠️ WARNING: $HIGH high-severity findings. Review before merging."
      # Don't block on HIGH unless exceeds threshold
    fi
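The same gate can be expressed as a small function, which is easier to unit test than inline shell. This is a sketch that assumes the report has the shape the `jq` filters above rely on (`results[].extra.severity`); the function name `gate` and the default threshold of 10 mirror the step above but are otherwise illustrative.

```python
import json


def gate(report: dict, warning_threshold: int = 10) -> tuple[int, str]:
    """Return (exit_code, message) for a parsed Semgrep JSON report."""
    severities = [r.get("extra", {}).get("severity") for r in report.get("results", [])]
    errors = severities.count("ERROR")
    warnings = severities.count("WARNING")
    if errors > 0:
        return 1, f"BLOCKING: {errors} ERROR-severity findings"
    if warnings > warning_threshold:
        return 0, f"WARNING: {warnings} WARNING-severity findings, review before merging"
    return 0, "OK"


# Hypothetical report with one blocking and one non-blocking finding.
sample = json.loads(
    '{"results": [{"extra": {"severity": "ERROR"}}, {"extra": {"severity": "WARNING"}}]}'
)
print(gate(sample))  # prints (1, 'BLOCKING: 1 ERROR-severity findings')
```

In CI, the script would load `semgrep-results.json`, print the message, and pass the first element to `sys.exit`.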
SARIF Upload to GitHub (PR Annotations)
# Many SAST tools can emit SARIF; upload it to GitHub for PR annotations
# The Security tab in GitHub shows findings across tools
- name: Upload SARIF results
  uses: github/codeql-action/upload-sarif@v3
  if: always()  # Upload even on scan failure
  with:
    # sarif_file takes one file or one directory, not a list. The sarif-results/ directory
    # is assumed to hold semgrep-results.sarif, bandit-results.sarif, and trivy-results.sarif.
    sarif_file: sarif-results/
    # Results appear in PR "Files changed" view as inline annotations
    # Also visible in repo Security → Code scanning alerts tab
    category: "sast-${{ github.job }}"
    wait-for-processing: true
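Before uploading, it can help to see what each SARIF file actually contains. The sketch below counts results per tool and level using the standard SARIF layout (`runs[].tool.driver.name` and `runs[].results[].level`). The function name and the sample log are hypothetical, and treating a missing `level` as `warning` is a simplifying assumption.

```python
import json
from collections import Counter


def summarize_sarif(sarif: dict) -> Counter:
    """Count results per (tool name, level) across all runs in one SARIF log."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Assumption: a result without an explicit level is counted as "warning".
            counts[(tool, result.get("level", "warning"))] += 1
    return counts


# Hypothetical SARIF log with a single run and two results.
sample = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "ExampleScanner"}},
      "results": [
        {"ruleId": "rule-a", "level": "error"},
        {"ruleId": "rule-b"}
      ]
    }
  ]
}
""")

summary = summarize_sarif(sample)
print(summary[("ExampleScanner", "error")], summary[("ExampleScanner", "warning")])  # prints 1 1
```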
Anti-Patterns
❌ Running all rules without tuning
Default rulesets generate hundreds of false positives. Tune severity thresholds, exclude test directories, and create baselines before enforcing in CI.
❌ Blocking CI on medium-severity findings without triage
A rule that blocks all Medium findings will generate bypass pressure. Block on Critical/High with high confidence; warn on Medium; never block on Low/Informational.
❌ Ignoring without justification
A bare # nosec with no explanation creates technical debt and makes audits impossible. Always require a reason alongside the marker, for example: # nosec B603 reason: list args, no user input.
❌ Only running SAST, skipping SCA
Your code may be perfect; your dependencies are not. Run SCA (Snyk, Dependabot) alongside SAST — they catch different vulnerability classes.
❌ Not writing custom rules for business logic
Generic rules won't find that your app is supposed to always validate user ownership but doesn't. Write custom Semgrep rules for your domain-specific security invariants.
Quick Reference
Tool selection by use case:
  Fast pattern matching → Semgrep (YAML rules, easy to write)
  Deep semantic analysis → CodeQL (QL queries, more setup)
  Python security → Bandit (fast, Python-only)
  JavaScript/TypeScript → ESLint-security + Semgrep
  Multi-language comprehensive → Semgrep + CodeQL + SonarQube
Severity threshold for CI blocking:
  BLOCK: Critical (ERROR in Semgrep), high confidence, exploitable
  WARN: High (WARNING), review required, but don't block PR
  INFORM: Medium/Low, show in PR but never block
Taint analysis coverage:
  Sources to always track: HTTP params, headers, body, file uploads
  Sinks to always check: SQL, HTML render, shell exec, file paths, HTTP fetch
  Sanitizers to define: parameterized queries, HTML encode, URL validate
Semgrep rule writing checklist:
  ☐ Test with a known-vulnerable code sample (rule fires)
  ☐ Test with a safe equivalent (rule doesn't fire)
  ☐ Add pattern-not for common false positive patterns
  ☐ Include fix suggestion in message
  ☐ Add CWE and OWASP metadata
  ☐ Test performance (avoid patterns that time out on large files)
Skill Information
- Source: MoltbotDen
- Category: Security & Passwords