Regex Master
Regular expressions are a domain-specific language for pattern matching embedded in almost
every programming language. A well-crafted regex can replace 30 lines of parsing code; a
poorly crafted one can take down a server (ReDoS). The key skills are: knowing the engine
you're working with, understanding greedy vs lazy vs possessive quantifiers, and recognizing
when regex is the wrong tool.
Core Mental Model
Most regex engines (Perl, PCRE, Python, Java, JavaScript) match by trying the pattern against
the input string, character by character, and backtracking when a path fails. Automaton-based
engines such as RE2 do not backtrack, which is why they can guarantee linear time.
Understanding backtracking is the key to both correctness and performance. Greedy quantifiers
consume as much as possible, then back off; lazy quantifiers consume as little as possible,
then expand. Possessive quantifiers and atomic groups disable backtracking into a sub-pattern;
where the engine supports them, they are your main tool for preventing catastrophic backtracking.
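To make the backtracking idea concrete, here is a minimal sketch of a toy matcher that supports only literals, `.`, and a greedy `*`. The names `match_here` and `match_star` are hypothetical, and this is an illustration of the principle rather than how any production engine is implemented.

```python
def match_here(pattern: str, text: str) -> bool:
    """Return True if pattern matches at the start of text.

    Supports literal characters, '.', and a greedy '*' after a single character.
    """
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(char: str, rest: str, text: str) -> bool:
    """Greedy c*: consume as many c as possible, then give characters back."""
    count = 0
    while count < len(text) and char in (".", text[count]):
        count += 1
    # Backtracking loop: try the longest run first, then shorter and shorter ones.
    while count >= 0:
        if match_here(rest, text[count:]):
            return True
        count -= 1
    return False
```

For `a*ab` against `aaab`, the star first takes all three `a` characters, the rest of the pattern fails, and the loop hands one character back before the match succeeds. A lazy quantifier would run the same loop in the opposite direction, starting from zero.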
Syntax Reference
Character Classes and Anchors
. Any character except newline (unless DOTALL flag)
\d Digit [0-9] (in Python 3 str patterns, any Unicode decimal digit unless re.ASCII is set)
\D Non-digit
\w Word character [a-zA-Z0-9_] (plus Unicode letters and digits in Python 3 unless re.ASCII is set)
\W Non-word character
\s Whitespace [ \t\n\r\f\v]
\S Non-whitespace
[abc] Character class: a, b, or c
[^abc] Negated class: anything except a, b, c
[a-z] Range: lowercase letters
[a-zA-Z0-9] Alphanumeric
^ Start of string (or line in MULTILINE mode)
$ End of string, or just before a trailing newline (or end of each line in MULTILINE mode)
\b Word boundary (between \w and \W)
\B Non-word boundary
\A Absolute start of string (not affected by MULTILINE)
\Z Absolute end of string (Python; in PCRE and Java, \Z allows a trailing newline and \z is the absolute end)
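The anchor and shorthand rules above are easy to mix up, so a short sketch of how they behave in Python's `re` may help. The sample string and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\n"

# Without MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# \A ignores MULTILINE and still matches only at the start of the string.
starts_absolute = re.findall(r"\A\w+", text, re.MULTILINE)

# $ tolerates one trailing newline; Python's \Z does not.
dollar_matches = re.search(r"line$", text) is not None
z_matches = re.search(r"line\Z", text) is not None

# \d on str patterns is Unicode-aware unless re.ASCII is set.
arabic_indic_three = "\u0663"
unicode_digit = re.fullmatch(r"\d", arabic_indic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None
```

When validating input, `\A...\Z` or `re.fullmatch` is usually the safer choice, because `$` quietly accepts a trailing newline.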
Quantifiers — Greedy vs Lazy vs Possessive
Greedy (default): consume maximum, backtrack if needed
* 0 or more
+ 1 or more
? 0 or 1
{n} Exactly n
{n,} n or more
{n,m} Between n and m
Lazy: consume minimum, expand if needed
*? 0 or more (lazy)
+? 1 or more (lazy)
?? 0 or 1 (lazy)
{n,m}? n to m (lazy)
Possessive (PCRE, Java, Python 3.11+): consume maximum, never give characters back
*+ 0 or more possessive
++ 1 or more possessive
?+ 0 or 1 possessive
(?>...) Atomic group (same as possessive for the group)
import re
text = "<b>bold</b> and <i>italic</i>"
# Greedy: matches longest possible
re.findall(r"<.+>", text)
# ['<b>bold</b> and <i>italic</i>'] ← too greedy
# Lazy: matches shortest possible
re.findall(r"<.+?>", text)
# ['<b>', '</b>', '<i>', '</i>'] ← as expected
# Better: character class that excludes >
re.findall(r"<[^>]+>", text)
# ['<b>', '</b>', '<i>', '</i>'] ← fast, no backtracking
Groups — Capturing, Non-Capturing, Named
# Capturing group: ( )
# Matches and captures for backreference or extraction
m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2026-03-14")
m.group(1) # "2026"
m.group(2) # "03"
m.group(3) # "14"
# Non-capturing group: (?: )
# Grouping without capturing (faster, cleaner)
re.match(r"(?:https?|ftp)://([^/]+)", "https://api.moltbotden.com/v1")
# Only captures the host, not the scheme
# Named groups: (?P<name>...) in Python and Go, (?<name>...) in JS (Go 1.22+ accepts both)
pattern = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
)
m = pattern.match("2026-03-14")
m.group("year") # "2026"
m.groupdict() # {"year": "2026", "month": "03", "day": "14"}
# Alternation within a group
log_text = "ERROR: disk full; Warning: retrying" # sample input
re.findall(r"\b(?:error|warning|critical)\b", log_text, re.IGNORECASE)
# ["ERROR", "Warning"]
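One consequence of capturing groups that often surprises people is how they change what `re.findall` returns. The sketch below uses a made-up sample string to show the three cases, plus `re.finditer` as the way to keep both the whole match and the groups.

```python
import re

dates = "released 2026-03-14, patched 2026-04-02"

# No capturing groups: findall returns each whole match.
whole = re.findall(r"\d{4}-\d{2}-\d{2}", dates)

# One capturing group: findall returns only that group's text.
years = re.findall(r"(\d{4})-\d{2}-\d{2}", dates)

# Several groups: findall returns one tuple per match.
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", dates)

# finditer yields Match objects, so the whole match and named groups stay available.
named = [
    (m.group(0), m.group("year"))
    for m in re.finditer(r"(?P<year>\d{4})-\d{2}-\d{2}", dates)
]
```

This is one more reason to prefer `(?:...)` when a group exists only for grouping: it leaves the `findall` result unchanged.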
Lookahead and Lookbehind Assertions
# Positive lookahead: (?=...) — matches if followed by
# Find prices (numbers followed by a currency code)
re.findall(r"\d+(?=\s*USD)", "100 USD and 200 EUR")
# ["100"] — only the USD amount
# Negative lookahead: (?!...) — matches if NOT followed by
# Match words starting with "agent" unless "Error" follows immediately
re.findall(r"\bagent(?!Error)\w*", "agentError agentInfo agent")
# ["agentInfo", "agent"]
# Positive lookbehind: (?<=...) — matches if preceded by
# Find amounts preceded by dollar sign
re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "$100 and $200.50")
# ["100", "200.50"]
# Negative lookbehind: (?<!...) — matches if NOT preceded by
# Match names ending in .js but not .min.js
# (findall returns the whole match, so the pattern must include the name)
re.findall(r"^.*(?<!\.min)\.js$", "app.js\napp.min.js\nlib.js", re.MULTILINE)
# ["app.js", "lib.js"]
# Combining assertions
# Match a word that is preceded by "agent: " and followed by " ("
re.findall(r"(?<=agent: )\w+(?= \()", "agent: optimus (active)")
# ["optimus"]
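Because lookarounds are zero-width, a pattern built only from them matches positions rather than characters. A small sketch of that idea: inserting thousands separators into a string of digits. The helper name `add_thousands_separators` is hypothetical.

```python
import re


def add_thousands_separators(digits: str) -> str:
    """Insert commas into a string of digits, e.g. for display."""
    # (?<=\d)            there is a digit to the left of this position
    # (?=(?:\d{3})+$)    the digits to the right form complete groups of three
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", digits)
```

Nothing is consumed by the match, so `re.sub` only inserts text. This works in backtracking engines; engines without lookaround, such as RE2, need a different approach.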
Backreferences
# Backreference: \1 (by number) or (?P=name) (by name)
# Match repeated words
re.findall(r"\b(\w+)\s+\1\b", "the the quick brown fox fox")
# ["the", "fox"]
# Named backreference: the closing tag must repeat the opening tag's name
re.search(r"<(?P<tag>\w+)>.*?</(?P=tag)>", "<b>bold text</b>")
# In substitution: \1 or \g<name>
re.sub(r"\b(\w+)\s+\1\b", r"\1", "the the quick") # remove duplicated words
# "the quick"
re.sub(r"(?P<first>\w+)\s+(?P<last>\w+)", r"\g<last>, \g<first>", "John Doe")
# "Doe, John"
Atomic Groups and Possessive Quantifiers
# Problem: nested quantifiers cause catastrophic backtracking
# Pattern: (a+)+$ against "aaaaab"
# The engine tries on the order of 2^n ways to split the a's before failing
import re
dangerous = re.compile(r"(a+)+$")
# dangerous.match("a" * 30 + "b") # each extra "a" roughly doubles the running time
# Fix 1: Possessive quantifier (PCRE, Java, Python 3.11+)
# (a++)+$ stops the inner + from giving characters back
# Fix 2: Atomic group (Python 3.11+ re, or the third-party regex module on older versions)
import regex
safe = regex.compile(r"(?>a+)+$")
# Fix 3: Rewrite to avoid ambiguity (best approach)
fixed = re.compile(r"a+$") # same intent, unambiguous
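Before swapping in a rewritten pattern, it is worth checking that it accepts the same inputs as the original. The sketch below does this by brute force over every string of `a` and `b` up to a small length; the helper name `accepted` and the length limit are assumptions chosen to keep the nested pattern cheap to run.

```python
import re
from itertools import product

nested = re.compile(r"(a+)+$")
rewritten = re.compile(r"a+$")


def accepted(pattern: re.Pattern, max_len: int = 8) -> set[str]:
    """Every string over {a, b} up to max_len that the pattern matches from the start."""
    results = set()
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            candidate = "".join(chars)
            if pattern.match(candidate):
                results.add(candidate)
    return results


same_language = accepted(nested) == accepted(rewritten)
```

An exhaustive check over short strings is not a proof, but it catches most accidental changes in behavior when simplifying a pattern.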
PCRE vs RE2 vs POSIX
| Feature | PCRE | RE2 | POSIX |
|---|---|---|---|
| Named groups | ✅ (?P<n>...) | ✅ (?P<n>...) | ❌ |
| Lookahead | ✅ | ❌ | ❌ |
| Lookbehind | ✅ | ❌ | ❌ |
| Backreferences | ✅ | ❌ | ✅ (BRE) |
| Possessive | ✅ | N/A | ❌ |
| Atomic groups | ✅ | N/A | ❌ |
| Performance | O(2^n) worst | O(n) guaranteed | Implementation-dependent |
| Used in | PHP; Perl, Python, Java, and JavaScript have similar backtracking engines | Go, RE2, Rust (regex) | grep, sed |
RE2 key constraints:
- Guaranteed O(n) time in the input length, so it is safe for user input
- No backreferences (by design, since they are incompatible with the linear-time guarantee)
- No lookahead or lookbehind
- No possessive quantifiers or atomic groups (not needed with linear engine)
PCRE-style engines (PCRE, Python re, JavaScript) key differences:
- Support backreferences and lookaround; lookbehind must be fixed-width in Python re, while JavaScript allows variable-width
- Can be exploited with ReDoS if used on untrusted input
- On Python before 3.11, use the `regex` module for possessive quantifiers and atomic groups
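When a pattern has to run on a linear-time engine, a lookahead can often be replaced by matching the context and capturing only the part you want. The sketch below shows both forms in Python for comparison; the second pattern uses only syntax that RE2-style engines also accept.

```python
import re

text = "100 USD and 200 EUR"

# Lookahead version: needs a backtracking engine.
with_lookahead = re.findall(r"\d+(?=\s*USD)", text)

# Portable version: match the context too, but capture only the number.
with_capture = re.findall(r"(\d+)\s*USD", text)
```

The two are not always interchangeable: the capture version consumes the trailing context, so matches that would need to overlap it are found by the lookahead form only.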
When NOT to Use Regex
❌ Don't use regex for:
HTML/XML parsing
<div class="(\w+)">.*?</div> — fails on nested tags, attributes
✅ Use: BeautifulSoup (Python), DOMParser (JS), html.parser
Nested structures (JSON, S-expressions, balanced parens)
(?:\([^)]*\))+ cannot handle nesting such as (outer (inner (deep)))
✅ Use: JSON.parse() (JS), json.loads() (Python), or a proper parser
Dates with complex rules (leap years, month lengths)
✅ Use: datetime.strptime(), date-fns, Temporal
Email validation (the full address grammar in RFC 5321/5322 is too complex for a practical regex)
✅ Use: simple heuristic regex + send verification email
URLs (there is no universally correct URL regex)
✅ Use: URL() constructor (JS), urllib.parse (Python)
CSV with quoted fields containing commas
"field1","field with, comma","field3"
✅ Use: csv module (Python), papaparse (JS)
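As a brief illustration of two of the alternatives listed above, here is a sketch using Python's standard `csv` and `urllib.parse` modules. The CSV line is the one from the list; the query string on the URL is an assumption added for the example.

```python
import csv
import io
from urllib.parse import urlsplit

# CSV: the csv module understands quoted fields, so embedded commas survive.
line = '"field1","field with, comma","field3"'
fields = next(csv.reader(io.StringIO(line)))

# URLs: urlsplit separates the components without a hand-written pattern.
parts = urlsplit("https://api.moltbotden.com/v1?limit=10")
host, path, query = parts.hostname, parts.path, parts.query
```

Both calls handle the edge cases (escaped quotes, ports, missing components) that a quick regex tends to miss.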
Performance Pitfalls — Catastrophic Backtracking
# Catastrophic patterns (avoid on user input):
r"(a+)+" # ← O(2^n) — exponential
r"(a|aa)+" # ← O(2^n) — overlapping alternatives
r"(\w+\s?)+$" # ← O(2^n) — on non-matching string
# The rule: if a quantified group contains another quantifier
# AND the inner and outer patterns can match the same characters
# → potential catastrophic backtracking
# Detecting ReDoS vulnerability:
# 1. Input that almost matches → triggers max backtracking
# 2. Long input of repeating chars + one non-matching char at end
"a" * 30 + "!" # test with your pattern
# Fixes:
# 1. Remove ambiguity: (\w+\s?)+ → \w+(\s\w+)*
# 2. Use possessive/atomic: (?>a+)+
# 3. Use RE2-based engine for untrusted input
# 4. Set a timeout (Python's re has no timeout parameter)
# The SIGALRM approach below works only on Unix and only in the main thread
import signal
def timeout_handler(signum, frame):
    raise TimeoutError()
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1) # 1 second timeout
try:
    result = re.match(pattern, user_input)
finally:
    signal.alarm(0)
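The detection steps above (near-miss input, growing length) can be sketched as a small timing probe. The function name `probe_backtracking` and the default sizes are assumptions; the sizes are kept small so that even an exponential pattern finishes quickly.

```python
import re
import time


def probe_backtracking(pattern: str, sizes=(10, 12, 14, 16, 18)) -> list[tuple[int, float]]:
    """Time a pattern against near-miss inputs of growing length.

    Each input is a run of "a" followed by "!", so the match almost succeeds
    and then fails at the last character.
    """
    compiled = re.compile(pattern)
    timings = []
    for size in sizes:
        attack = "a" * size + "!"
        start = time.perf_counter()
        compiled.match(attack)
        timings.append((size, time.perf_counter() - start))
    return timings


for size, seconds in probe_backtracking(r"(a+)+$"):
    print(f"n={size:2d}  {seconds:.6f}s")
```

If the time roughly doubles with each step in length, the pattern is likely vulnerable. The repeated character and the failing suffix should be adapted to whatever the pattern under test actually matches.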
Language-Specific: Python re Module
import re
# Flags
re.IGNORECASE # re.I — case-insensitive
re.MULTILINE # re.M — ^ and $ match line boundaries
re.DOTALL # re.S — dot matches newline
re.VERBOSE # re.X — allow whitespace and comments
re.ASCII # re.A — \w, \d, etc. match ASCII only (not Unicode)
# Functions
re.match(pattern, string) # match at START of string only
re.fullmatch(pattern, string) # match the WHOLE string
re.search(pattern, string) # match ANYWHERE in string
re.findall(pattern, string) # return list of all matches
re.finditer(pattern, string) # return iterator of Match objects
re.sub(pattern, repl, string) # substitute matches
re.split(pattern, string) # split by pattern
# Compile for reuse (skips the pattern-cache lookup on each call)
EMAIL_RE = re.compile(
    r"""
    (?P<local>[a-zA-Z0-9._%+\-]+)   # local part
    @
    (?P<domain>[a-zA-Z0-9.\-]+)     # domain
    \.
    (?P<tld>[a-zA-Z]{2,})           # TLD
    """,
    re.VERBOSE,
)
# Named groups + verbose mode
def parse_email(email: str) -> dict | None:
    # fullmatch rejects trailing text that match() would silently accept
    m = EMAIL_RE.fullmatch(email)
    return m.groupdict() if m else None
# Practical example: log parser
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"\s+(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r"\s+(?P<logger>[\w.]+)"
    r"\s+(?P<message>.+)"
)
def parse_log_line(line: str) -> dict | None:
    m = LOG_PATTERN.match(line.strip())
    return m.groupdict() if m else None
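A possible way to use a pattern like the log parser above on a whole file at once is `re.finditer` with `re.MULTILINE`. In this sketch the pattern is restated with `^` and `$` anchors so the block is self-contained, and the sample log is invented for the example.

```python
import re
from collections import Counter

# Same shape as the log pattern above, anchored per line with MULTILINE.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"\s+(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r"\s+(?P<logger>[\w.]+)"
    r"\s+(?P<message>.+)$",
    re.MULTILINE,
)

# Hypothetical sample log, invented for this example.
sample_log = """\
2026-03-14T09:00:00 INFO app.server started
2026-03-14T09:00:05 ERROR app.db connection refused
not a log line
2026-03-14T09:00:09 ERROR app.db retry failed
"""

# Lines that do not fit the pattern are simply skipped.
level_counts = Counter(m.group("level") for m in LINE_RE.finditer(sample_log))
```

Scanning the text in one pass avoids splitting it into lines first, at the cost of silently ignoring malformed lines; whether that is acceptable depends on the use case.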
Language-Specific: JavaScript
// Regex literals and constructor
const emailRe = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;
const dynamic = new RegExp(`^${escapeRegex(prefix)}.*$`, "i");
// Flags: i (case-insensitive), g (global), m (multiline), s (dotAll), u (unicode), d (indices)
// matchAll with the global flag: iterate all matches with named groups
const LOG_RE = /(?<ts>\d{4}-\d{2}-\d{2}) (?<level>\w+): (?<msg>.+)/g;
for (const match of logText.matchAll(LOG_RE)) {
  console.log(match.groups.ts, match.groups.level, match.groups.msg);
}
// Named groups in replace
const formatted = "2026-03-14".replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  "$<day>/$<month>/$<year>"
);
// "14/03/2026"
// String.matchAll: returns iterator of match objects (requires /g flag)
const urls = [...text.matchAll(/https?:\/\/[^\s>]+/g)].map(m => m[0]);
// Escape user input before inserting into regex
function escapeRegex(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
Language-Specific: Go (RE2)
import "regexp"
// Go uses RE2 — no backreferences, guaranteed O(n)
var agentIDRe = regexp.MustCompile(`^[a-z0-9-]{3,64}Regex Master
Regular expressions are a domain-specific language for pattern matching embedded in almost
every programming language. A well-crafted regex can replace 30 lines of parsing code; a
poorly crafted one can take down a server (ReDoS). The key skills are: knowing the engine
you're working with, understanding greedy vs lazy vs possessive quantifiers, and recognizing
when regex is the wrong tool.
Core Mental Model
A regex engine works by trying to match the pattern against the input string, character by
character, using backtracking when a path fails. Understanding backtracking is the key to
understanding both correctness and performance. Greedy quantifiers consume as much as
possible then back off; lazy quantifiers consume as little as possible then expand. Possessive
quantifiers and atomic groups disable backtracking for a sub-pattern — they're your main
tool for preventing catastrophic backtracking.
Syntax Reference
Character Classes and Anchors
. Any character except newline (unless DOTALL flag)
\d Digit [0-9]
\D Non-digit
\w Word character [a-zA-Z0-9_]
\W Non-word character
\s Whitespace [ \t\n\r\f\v]
\S Non-whitespace
[abc] Character class: a, b, or c
[^abc] Negated class: anything except a, b, c
[a-z] Range: lowercase letters
[a-zA-Z0-9] Alphanumeric
^ Start of string (or line in MULTILINE mode)
$ End of string (or line in MULTILINE mode)
\b Word boundary (between \w and \W)
\B Non-word boundary
\A Absolute start of string (not affected by MULTILINE)
\Z Absolute end of string
Quantifiers — Greedy vs Lazy vs Possessive
Greedy (default): consume maximum, backtrack if needed
* 0 or more
+ 1 or more
? 0 or 1
{n} Exactly n
{n,} n or more
{n,m} Between n and m
Lazy: consume minimum, expand if needed
*? 0 or more (lazy)
+? 1 or more (lazy)
?? 0 or 1 (lazy)
{n,m}? n to m (lazy)
Possessive (PCRE/Java): consume maximum, NO backtracking
*+ 0 or more possessive
++ 1 or more possessive
?+ 0 or 1 possessive
(?>...) Atomic group (same as possessive for the group)
import re
text = "<b>bold</b> and <i>italic</i>"
# Greedy: matches longest possible
re.findall(r"<.+>", text)
# ['<b>bold</b> and <i>italic</i>'] ← too greedy
# Lazy: matches shortest possible
re.findall(r"<.+?>", text)
# ['<b>', '</b>', '<i>', '</i>'] ← as expected
# Better: character class that excludes >
re.findall(r"<[^>]+>", text)
# ['<b>', '</b>', '<i>', '</i>'] ← fast, no backtracking
Groups — Capturing, Non-Capturing, Named
# Capturing group: ( )
# Matches and captures for backreference or extraction
m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2026-03-14")
m.group(1) # "2026"
m.group(2) # "03"
m.group(3) # "14"
# Non-capturing group: (?: )
# Grouping without capturing (faster, cleaner)
re.match(r"(?:https?|ftp)://([^/]+)", "https://api.moltbotden.com/v1")
# Only captures the host, not the scheme
# Named groups: (?P<name>...) in Python, (?<name>...) in JS/Go
pattern = re.compile(
r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
)
m = pattern.match("2026-03-14")
m.group("year") # "2026"
m.groupdict() # {"year": "2026", "month": "03", "day": "14"}
# Alternation within group
re.findall(r"\b(?:error|warning|critical)\b", log_text, re.IGNORECASE)
Lookahead and Lookbehind Assertions
# Positive lookahead: (?=...) — matches if followed by
# Find prices (numbers followed by a currency symbol)
re.findall(r"\d+(?=\s*USD)", "100 USD and 200 EUR")
# ["100"] — only the USD amount
# Negative lookahead: (?!...) — matches if NOT followed by
# Match "agent" not followed by "Error"
re.findall(r"\bagent(?!Error)\b\w*", text)
# Positive lookbehind: (?<=...) — matches if preceded by
# Find amounts preceded by dollar sign
re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "$100 and $200.50")
# ["100", "200.50"]
# Negative lookbehind: (?<!...) — matches if NOT preceded by
# Match .js but not .min.js
re.findall(r"(?<!\.min)\.js$", "app.js\napp.min.js\nlib.js", re.MULTILINE)
# ["app.js", "lib.js"]
# Combining assertions
# Match a word that is preceded by "agent: " and followed by " ("
re.findall(r"(?<=agent: )\w+(?= \()", "agent: optimus (active)")
# ["optimus"]
Backreferences
# Backreference: \1 (by number) or (?P=name) (by name)
# Match repeated words
re.findall(r"\b(\w+)\s+\1\b", "the the quick brown fox fox")
# ["the", "fox"]
# Named backreference
re.search(r"(?P<tag>\w+)>.*?</(?P=tag)>", "<b>bold text</b>")
# In substitution: \1 or \g<name>
re.sub(r"(\w+)\s+\1", r"\1", "the the quick") # remove duplicates
# "the quick"
re.sub(r"(?P<first>\w+)\s+(?P<last>\w+)", r"\g<last>, \g<first>", "John Doe")
# "Doe, John"
Atomic Groups and Possessive Quantifiers
# Problem: nested quantifiers cause catastrophic backtracking
# Pattern: (a+)+ against "aaaaab"
# Engine tries 2^n combinations before failing
import re, time
dangerous = re.compile(r"(a+)+$")
# dangerous.match("aaaaaaaaaaaaaaaab") # ← will hang!
# Fix 1: Possessive quantifier (PCRE only — not Python's re)
# (a++)+ would prevent backtracking on inner +
# Fix 2: Atomic group (not in Python re, available in regex module)
import regex
safe = regex.compile(r"(?>a+)+$")
# Fix 3: Rewrite to avoid ambiguity (best approach)
fixed = re.compile(r"a+$") # same intent, unambiguous
PCRE vs RE2 vs POSIX
Feature PCRE RE2 POSIX
Named groups ✅ __INLINE_CODE_0__ ✅ __INLINE_CODE_1__ ❌
Lookahead ✅ ✅ ❌
Lookbehind ✅ ✅ (fixed-width) ❌
Backreferences ✅ ❌ ✅
Possessive ✅ N/A ❌
Atomic groups ✅ N/A ❌
Performance O(2^n) worst O(n) guaranteed O(n)
Used in Python, PHP, Perl, Java Go, RE2, Rust (regex) grep, sed
RE2 key constraints:
- Guaranteed O(n) time — safe for user input
- No backreferences (by design — prevent exponential backtracking)
- Fixed-width lookbehind only
- No possessive quantifiers or atomic groups (not needed with linear engine)
PCRE (Python re, JavaScript) key differences:
- Supports backreferences and variable-width lookbehind
- Can be exploited with ReDoS if used on untrusted input
- Use the `regex` module in Python for possessive quantifiers
When NOT to Use Regex
❌ Don't use regex for:
HTML/XML parsing
<div class="(\w+)">.*?</div> — fails on nested tags, attributes
✅ Use: BeautifulSoup (Python), DOMParser (JS), html.parser
Nested structures (JSON, S-expressions, balanced parens)
(?:\([^)]*\))+ — can't handle (\(inner (\(deep\))\))
✅ Use: json.parse(), proper parser
Dates with complex rules (leap years, month lengths)
✅ Use: datetime.strptime(), date-fns, Temporal
Email validation (RFC 5321 is 100+ pages)
✅ Use: simple heuristic regex + send verification email
URLs (there is no universally correct URL regex)
✅ Use: URL() constructor (JS), urllib.parse (Python)
CSV with quoted fields containing commas
"field1","field with, comma","field3"
✅ Use: csv module (Python), papaparse (JS)
Performance Pitfalls — Catastrophic Backtracking
# Catastrophic patterns (avoid on user input):
r"(a+)+" # ← O(2^n) — exponential
r"(a|aa)+" # ← O(2^n) — overlapping alternatives
r"(\w+\s?)+$" # ← O(2^n) — on non-matching string
# The rule: if a quantified group contains another quantifier
# AND the inner and outer patterns can match the same characters
# → potential catastrophic backtracking
# Detecting ReDoS vulnerability:
# 1. Input that almost matches → triggers max backtracking
# 2. Long input of repeating chars + one non-matching char at end
"a" * 30 + "!" # test with your pattern
# Fixes:
# 1. Remove ambiguity: (\w+\s?)+ → \w+(\s\w+)*
# 2. Use possessive/atomic: (?>a+)+
# 3. Use RE2-based engine for untrusted input
# 4. Set timeout (Python's re doesn't support timeout natively)
import signal
def timeout_handler(signum, frame): raise TimeoutError()
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1) # 1 second timeout
try:
result = re.match(pattern, user_input)
finally:
signal.alarm(0)
Language-Specific: Python re Module
import re
# Flags
re.IGNORECASE # re.I — case-insensitive
re.MULTILINE # re.M — ^ and $ match line boundaries
re.DOTALL # re.S — dot matches newline
re.VERBOSE # re.X — allow whitespace and comments
re.ASCII # re.A — \w, \d, etc. match ASCII only (not Unicode)
# Functions
re.match(pattern, string) # match at START of string only
re.search(pattern, string) # match ANYWHERE in string
re.findall(pattern, string) # return list of all matches
re.finditer(pattern, string) # return iterator of Match objects
re.sub(pattern, repl, string) # substitute matches
re.split(pattern, string) # split by pattern
# Compile for reuse (faster in loops)
EMAIL_RE = re.compile(
r"""
(?P<local>[a-zA-Z0-9._%+\-]+) # local part
@
(?P<domain>[a-zA-Z0-9.\-]+) # domain
\.
(?P<tld>[a-zA-Z]{2,}) # TLD
""",
re.VERBOSE,
)
# Named groups + verbose mode
def parse_email(email: str) -> dict | None:
m = EMAIL_RE.match(email)
return m.groupdict() if m else None
# Practical example: log parser
LOG_PATTERN = re.compile(
r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
r"\s+(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
r"\s+(?P<logger>[\w.]+)"
r"\s+(?P<message>.+)"
)
def parse_log_line(line: str) -> dict | None:
m = LOG_PATTERN.match(line.strip())
return m.groupdict() if m else None
Language-Specific: JavaScript
// Regex literals and constructor
const emailRe = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;
const dynamic = new RegExp(`^${escapeRegex(prefix)}.*Regex Master
Regular expressions are a domain-specific language for pattern matching embedded in almost
every programming language. A well-crafted regex can replace 30 lines of parsing code; a
poorly crafted one can take down a server (ReDoS). The key skills are: knowing the engine
you're working with, understanding greedy vs lazy vs possessive quantifiers, and recognizing
when regex is the wrong tool.
Core Mental Model
A regex engine works by trying to match the pattern against the input string, character by
character, using backtracking when a path fails. Understanding backtracking is the key to
understanding both correctness and performance. Greedy quantifiers consume as much as
possible then back off; lazy quantifiers consume as little as possible then expand. Possessive
quantifiers and atomic groups disable backtracking for a sub-pattern — they're your main
tool for preventing catastrophic backtracking.
Syntax Reference
Character Classes and Anchors
. Any character except newline (unless DOTALL flag)
\d Digit [0-9]
\D Non-digit
\w Word character [a-zA-Z0-9_]
\W Non-word character
\s Whitespace [ \t\n\r\f\v]
\S Non-whitespace
[abc] Character class: a, b, or c
[^abc] Negated class: anything except a, b, c
[a-z] Range: lowercase letters
[a-zA-Z0-9] Alphanumeric
^ Start of string (or line in MULTILINE mode)
$ End of string (or line in MULTILINE mode)
\b Word boundary (between \w and \W)
\B Non-word boundary
\A Absolute start of string (not affected by MULTILINE)
\Z Absolute end of string
Quantifiers — Greedy vs Lazy vs Possessive
Greedy (default): consume maximum, backtrack if needed
* 0 or more
+ 1 or more
? 0 or 1
{n} Exactly n
{n,} n or more
{n,m} Between n and m
Lazy: consume minimum, expand if needed
*? 0 or more (lazy)
+? 1 or more (lazy)
?? 0 or 1 (lazy)
{n,m}? n to m (lazy)
Possessive (PCRE/Java): consume maximum, NO backtracking
*+ 0 or more possessive
++ 1 or more possessive
?+ 0 or 1 possessive
(?>...) Atomic group (same as possessive for the group)
import re
text = "<b>bold</b> and <i>italic</i>"
# Greedy: matches longest possible
re.findall(r"<.+>", text)
# ['<b>bold</b> and <i>italic</i>'] ← too greedy
# Lazy: matches shortest possible
re.findall(r"<.+?>", text)
# ['<b>', '</b>', '<i>', '</i>'] ← as expected
# Better: character class that excludes >
re.findall(r"<[^>]+>", text)
# ['<b>', '</b>', '<i>', '</i>'] ← fast, no backtracking
Groups — Capturing, Non-Capturing, Named
# Capturing group: ( )
# Matches and captures for backreference or extraction
m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2026-03-14")
m.group(1) # "2026"
m.group(2) # "03"
m.group(3) # "14"
# Non-capturing group: (?: )
# Grouping without capturing (faster, cleaner)
re.match(r"(?:https?|ftp)://([^/]+)", "https://api.moltbotden.com/v1")
# Only captures the host, not the scheme
# Named groups: (?P<name>...) in Python, (?<name>...) in JS/Go
pattern = re.compile(
r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
)
m = pattern.match("2026-03-14")
m.group("year") # "2026"
m.groupdict() # {"year": "2026", "month": "03", "day": "14"}
# Alternation within group
re.findall(r"\b(?:error|warning|critical)\b", log_text, re.IGNORECASE)
Lookahead and Lookbehind Assertions
# Positive lookahead: (?=...) — matches if followed by
# Find prices (numbers followed by a currency symbol)
re.findall(r"\d+(?=\s*USD)", "100 USD and 200 EUR")
# ["100"] — only the USD amount
# Negative lookahead: (?!...) — matches if NOT followed by
# Match "agent" not followed by "Error"
re.findall(r"\bagent(?!Error)\b\w*", text)
# Positive lookbehind: (?<=...) — matches if preceded by
# Find amounts preceded by dollar sign
re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "$100 and $200.50")
# ["100", "200.50"]
# Negative lookbehind: (?<!...) — matches if NOT preceded by
# Match .js file names but not .min.js
re.findall(r"^[\w.-]+(?<!\.min)\.js$", "app.js\napp.min.js\nlib.js", re.MULTILINE)
# ["app.js", "lib.js"]
# Combining assertions
# Match a word that is preceded by "agent: " and followed by " ("
re.findall(r"(?<=agent: )\w+(?= \()", "agent: optimus (active)")
# ["optimus"]
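Because lookaheads consume nothing, several of them can be stacked at one position to express independent conditions. The sketch below applies that to a password rule; the policy itself (a digit, a lowercase letter, an uppercase letter, and at least 8 characters) and the name `is_strong` are assumptions chosen for illustration.

```python
import re

# Each lookahead scans from the start without consuming input,
# so the three conditions are checked independently of each other.
PASSWORD_RE = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}")

def is_strong(candidate: str) -> bool:
    """True when candidate satisfies every lookahead and the length rule."""
    return PASSWORD_RE.fullmatch(candidate) is not None

for sample in ["Passw0rdX", "password1", "Ab1"]:
    print(sample, is_strong(sample))
```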
Backreferences
# Backreference: \1 (by number) or (?P=name) (by name)
# Match repeated words
re.findall(r"\b(\w+)\s+\1\b", "the the quick brown fox fox")
# ["the", "fox"]
# Named backreference: the closing tag must repeat the opening tag's name
re.search(r"<(?P<tag>\w+)>.*?</(?P=tag)>", "<b>bold text</b>")
# In substitution: \1 or \g<name>
re.sub(r"\b(\w+)\s+\1\b", r"\1", "the the quick") # remove duplicated words
# "the quick"
re.sub(r"(?P<first>\w+)\s+(?P<last>\w+)", r"\g<last>, \g<first>", "John Doe")
# "Doe, John"
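A common use of numbered backreferences is requiring a closing delimiter to match the opening one. This is a small sketch; `quoted_strings` and the sample text are hypothetical names and data introduced for illustration.

```python
import re

# Group 1 captures the opening quote; \1 demands the same character to close.
QUOTED_RE = re.compile(r"""(["'])(.*?)\1""")

def quoted_strings(text: str) -> list[str]:
    """Return the contents of each single- or double-quoted span."""
    return [m.group(2) for m in QUOTED_RE.finditer(text)]

sample = 'say "it\'s fine" or \'a "b" c\''
print(quoted_strings(sample))
```

The other quote character is allowed inside a span because only the character captured by group 1 can end it.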
Atomic Groups and Possessive Quantifiers
# Problem: nested quantifiers cause catastrophic backtracking
# Pattern: (a+)+$ against "aaaaab"
# On a failed match the engine tries on the order of 2^n ways to split the run of a's
import re
dangerous = re.compile(r"(a+)+$")
# dangerous.match("a" * 30 + "b") # ← effectively hangs: each extra "a" roughly doubles the work
# Fix 1: Possessive quantifier (PCRE, Java, Python re since 3.11, or the third-party regex module)
# (a++)+$ prevents backtracking into the inner +
# Fix 2: Atomic group (same engine support as possessive quantifiers)
import regex # third-party package: pip install regex
safe = regex.compile(r"(?>a+)+$")
# Fix 3: Rewrite to avoid ambiguity (best approach)
fixed = re.compile(r"a+$") # same matches, but only one way to produce them
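When rewriting an ambiguous pattern, it helps to confirm that the rewrite still accepts and rejects the same inputs. The check below is a rough sketch under that assumption; `same_verdict` and the sample list are illustrative, and the samples are deliberately short because the ambiguous pattern is exponential on failures.

```python
import re

ambiguous = re.compile(r"(a+)+$")   # many ways to split a run of a's
rewritten = re.compile(r"a+$")      # exactly one way

def same_verdict(text: str) -> bool:
    """True when both patterns agree on whether text matches."""
    return bool(ambiguous.match(text)) == bool(rewritten.match(text))

# Short inputs only: a long failing input would stall the ambiguous pattern.
samples = ["", "a", "aaaa", "aaab", "baaa", "aa\n"]
print(all(same_verdict(s) for s in samples))
```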
PCRE vs RE2 vs POSIX
Feature          PCRE                      RE2                            POSIX (BRE/ERE)
Named groups     ✅ (?<name>...)           ✅ (?P<name>...)               ❌
Lookahead        ✅                        ❌                             ❌
Lookbehind       ✅                        ❌                             ❌
Backreferences   ✅                        ❌                             ✅ (BRE)
Possessive       ✅                        N/A                            ❌
Atomic groups    ✅                        N/A                            ❌
Performance      O(2^n) worst case         O(n) guaranteed                varies by implementation
Used in          PHP (PCRE); Perl, Python, Java (similar backtracking engines)   Go, RE2, Rust (regex)   grep, sed
RE2 key constraints:
- Guaranteed O(n) time in the input length, so it is safe for user input
- No backreferences (by design: they cannot be matched in linear time)
- No lookahead or lookbehind
- No possessive quantifiers or atomic groups (not needed with a linear-time engine)
Backtracking engines (PCRE, Python re, JavaScript) key differences:
- Support backreferences and lookaround; lookbehind must be fixed-width in Python re, while JavaScript allows variable-width
- Can be exploited with ReDoS if used on untrusted input
- Python re has possessive quantifiers and atomic groups since 3.11; on older versions use the third-party `regex` module
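Since feature support differs per engine and per version, one pragmatic option is to probe the engine at hand rather than rely on a table. The sketch below does this for Python's `re`; the helper name `supported` and the probe patterns are assumptions made for illustration, and the possessive-quantifier result depends on the interpreter version.

```python
import re

def supported(pattern: str) -> bool:
    """Report whether this interpreter's re module can compile pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

probes = {
    "lookahead": r"a(?=b)",
    "fixed-width lookbehind": r"(?<=ab)c",
    "variable-width lookbehind": r"(?<=a+)c",
    "backreference": r"(a)\1",
    "possessive quantifier": r"a++b",  # result depends on the Python version
}
for name, pat in probes.items():
    print(f"{name}: {supported(pat)}")
```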
When NOT to Use Regex
❌ Don't use regex for:
HTML/XML parsing
<div class="(\w+)">.*?</div> (fails on nested tags and on extra or reordered attributes)
✅ Use: BeautifulSoup (Python), DOMParser (JS), html.parser
Nested structures (JSON, S-expressions, balanced parens)
(?:\([^)]*\))+ (can't handle nesting such as ((inner ((deep)))))
✅ Use: JSON.parse() (JS), json.loads() (Python), or a proper parser
Dates with complex rules (leap years, month lengths)
✅ Use: datetime.strptime(), date-fns, Temporal
Email validation (the RFC 5321/5322 address grammar is far more permissive than any short regex)
✅ Use: simple heuristic regex + send verification email
URLs (there is no universally correct URL regex)
✅ Use: URL() constructor (JS), urllib.parse (Python)
CSV with quoted fields containing commas
"field1","field with, comma","field3"
✅ Use: csv module (Python), papaparse (JS)
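For the Python side of these recommendations, the standard library already covers several cases. The sketch below is illustrative only: the sample inputs are taken from the examples above, and the `balanced` helper is a hypothetical stand-in for "a proper parser" on the balanced-parentheses case.

```python
import csv
import io
import json
from urllib.parse import urlparse

# CSV: the csv module understands quoted fields that contain commas.
row = next(csv.reader(io.StringIO('"field1","field with, comma","field3"')))

# URLs: urlparse splits the components without a hand-written pattern.
host = urlparse("https://api.moltbotden.com/v1").hostname

# Nested structures: a real parser handles arbitrary depth.
nested = json.loads('{"a": {"b": {"c": [1, 2, 3]}}}')

def balanced(text: str) -> bool:
    """Check balanced parentheses with a depth counter instead of a regex."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
    return depth == 0

print(row, host, nested["a"]["b"]["c"], balanced("((inner ((deep))))"))
```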
Performance Pitfalls — Catastrophic Backtracking
# Catastrophic patterns (avoid on user input):
r"(a+)+" # ← O(2^n) — exponential
r"(a|aa)+" # ← O(2^n) — overlapping alternatives
r"(\w+\s?)+$" # ← O(2^n) — on non-matching string
# The rule: if a quantified group contains another quantifier
# AND the inner and outer patterns can match the same characters
# → potential catastrophic backtracking
# Detecting ReDoS vulnerability:
# 1. Input that almost matches → triggers max backtracking
# 2. Long input of repeating chars + one non-matching char at end
"a" * 30 + "!" # test with your pattern
# Fixes:
# 1. Remove ambiguity: (\w+\s?)+ → \w+(?:\s\w+)*\s?
# 2. Use possessive/atomic: (?>a+)+
# 3. Use RE2-based engine for untrusted input
# 4. Set a timeout (Python's re has no timeout parameter; SIGALRM works on Unix, main thread only)
import signal
def timeout_handler(signum, frame):
    raise TimeoutError("regex match timed out")
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1)  # 1 second timeout
try:
    result = re.match(pattern, user_input)  # pattern and user_input come from the caller
finally:
    signal.alarm(0)  # always cancel the pending alarm
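The detection recipe above (repeated characters plus one non-matching character at the end) can be turned into a small timing harness. This is a rough sketch, not a reliable ReDoS scanner; `time_failures` is a hypothetical helper, and the sizes are kept small on purpose because a vulnerable pattern grows exponentially.

```python
import re
import time

def time_failures(pattern: str, sizes: list[int]) -> list[tuple[int, float]]:
    """Time a failing match for each size: n repeated chars plus one bad char."""
    compiled = re.compile(pattern)
    timings = []
    for n in sizes:
        probe = "a" * n + "!"
        start = time.perf_counter()
        compiled.match(probe)
        timings.append((n, time.perf_counter() - start))
    return timings

# Small sizes only: each extra character roughly doubles a vulnerable pattern's time.
for n, seconds in time_failures(r"(a+)+$", [10, 12, 14, 16, 18]):
    print(f"n={n:2d}  {seconds:.4f}s")
```

If the timings climb steeply as n grows by a couple of characters, the pattern is a candidate for one of the fixes listed above.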
Language-Specific: Python re Module
import re
# Flags
re.IGNORECASE # re.I — case-insensitive
re.MULTILINE # re.M — ^ and $ match line boundaries
re.DOTALL # re.S — dot matches newline
re.VERBOSE # re.X — allow whitespace and comments
re.ASCII # re.A — \w, \d, etc. match ASCII only (not Unicode)
# Functions
re.match(pattern, string) # match at START of string only
re.search(pattern, string) # match ANYWHERE in string
re.findall(pattern, string) # return list of all matches
re.finditer(pattern, string) # return iterator of Match objects
re.sub(pattern, repl, string) # substitute matches
re.split(pattern, string) # split by pattern
# Compile for reuse (faster in loops); named groups + verbose mode keep complex patterns readable
EMAIL_RE = re.compile(
    r"""
    (?P<local>[a-zA-Z0-9._%+\-]+)   # local part
    @
    (?P<domain>[a-zA-Z0-9.\-]+)     # domain
    \.
    (?P<tld>[a-zA-Z]{2,})           # TLD
    """,
    re.VERBOSE,
)
def parse_email(email: str) -> dict | None:
    # fullmatch rejects trailing junk that match() would silently accept
    m = EMAIL_RE.fullmatch(email)
    return m.groupdict() if m else None
# Practical example: log parser
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"\s+(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r"\s+(?P<logger>[\w.]+)"
    r"\s+(?P<message>.+)"
)
def parse_log_line(line: str) -> dict | None:
    m = LOG_PATTERN.match(line.strip())
    return m.groupdict() if m else None
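The MULTILINE and DOTALL flags listed above are easy to confuse, since one changes the anchors and the other changes the dot. A minimal side-by-side sketch, using a made-up three-line string:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ anchors only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also anchors after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, . stops at the first newline.
dot_default = re.match(r".+", text).group()
# With DOTALL, . crosses newlines and consumes the whole string.
dot_all = re.match(r".+", text, re.DOTALL).group()

print(starts_default, starts_multiline)
print(repr(dot_default), repr(dot_all))
```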
Language-Specific: JavaScript
, "i");
// Flags: i (case-insensitive), g (global), m (multiline), s (dotAll), u (unicode), d (indices)
// matchAll with the global flag: iterate all matches with named groups
const LOG_RE = /(?<ts>\d{4}-\d{2}-\d{2}) (?<level>\w+): (?<msg>.+)/g;
for (const match of logText.matchAll(LOG_RE)) {
  console.log(match.groups.ts, match.groups.level, match.groups.msg);
}
// Named groups in replace
const formatted = "2026-03-14".replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  "$<day>/$<month>/$<year>"
);
// "14/03/2026"
// String.matchAll: returns iterator of match objects (requires /g flag)
const urls = [...text.matchAll(/https?:\/\/[^\s>]+/g)].map(m => m[0]);
// Escape user input before inserting into regex
function escapeRegex(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
Language-Specific: Go (RE2)
)
func ValidateAgentID(id string) bool {
    return agentIDRe.MatchString(id)
}
// Named groups (SubexpNames)
// Package-level declarations need var; := is only valid inside a function
var logRe = regexp.MustCompile(
    `(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<msg>.+)`,
)
func ParseLog(line string) map[string]string {
    match := logRe.FindStringSubmatch(line)
    if match == nil {
        return nil
    }
    result := make(map[string]string)
    for i, name := range logRe.SubexpNames() {
        if i != 0 && name != "" {
            result[name] = match[i]
        }
    }
    return result
}
// ReplaceAllStringFunc for complex substitutions (re is any compiled *regexp.Regexp)
result := re.ReplaceAllStringFunc(input, func(s string) string {
    return strings.ToUpper(s)
})
Practical Patterns
# Email: pragmatic (not RFC-perfect — verify by sending)
EMAIL = r"^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$"
# URL extraction (handles common cases)
URL = r"https?://(?:[a-zA-Z0-9\-._~:/?#\[\]@!$&'()*+,;=%]|(?:%[0-9a-fA-F]{2}))+"
# Agent ID validation
AGENT_ID = r"^[a-z0-9][a-z0-9\-]{1,62}[a-z0-9]$"
# ISO 8601 date
ISO_DATE = r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"
# Semantic version (simplified: numeric pre-release identifiers may carry leading zeros here)
SEMVER = r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<pre>[0-9A-Za-z\-]+(?:\.[0-9A-Za-z\-]+)*))?(?:\+(?P<build>[0-9A-Za-z\-]+(?:\.[0-9A-Za-z\-]+)*))?$"
# Log line with structured data
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)"
    r"\s+(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r"\s+\[(?P<req_id>[a-f0-9\-]+)\]"
    r"\s+(?P<message>.+)$"
)
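# CSV fields with backslash-escaped quotes (handles commas in quotes; standard CSV doubles quotes instead and
# empty fields are skipped, so prefer the csv module for real files)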
CSV_FIELD = re.compile(r'"(?:[^"\\]|\\.)*"|[^,\n]+')
# Markdown headings
MD_HEADING = re.compile(r"^(?P<level>#{1,6})\s+(?P<text>.+)$", re.MULTILINE)
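Patterns like these are worth pinning down with a few accept/reject cases before use. The sketch below copies three of the patterns above (with the semantic version trimmed to its `major.minor.patch` core for brevity); `is_valid` and the sample values are illustrative assumptions.

```python
import re

# Copies of patterns from the list above; anchors are dropped because
# fullmatch already requires the whole string to match.
AGENT_ID = re.compile(r"[a-z0-9][a-z0-9\-]{1,62}[a-z0-9]")
ISO_DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")

def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Whole-string validation; unlike $, fullmatch also rejects a trailing newline."""
    return pattern.fullmatch(value) is not None

cases = [
    (AGENT_ID, "optimus-01", True),
    (AGENT_ID, "-optimus", False),
    (ISO_DATE, "2026-03-14", True),
    (ISO_DATE, "2026-13-01", False),
    (ISO_DATE, "2026-02-31", True),   # shape-valid only: the regex cannot know month lengths
    (SEMVER_CORE, "1.2.3", True),
    (SEMVER_CORE, "01.2.3", False),
]
for pattern, value, expected in cases:
    print(value, is_valid(pattern, value) == expected)
```

The `2026-02-31` case is the reason calendar validation belongs in a date library rather than in the pattern.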
Anti-Patterns
# ❌ Parsing HTML with regex
re.findall(r"<div class=\"content\">(.*?)</div>", html)
# ✅ Use BeautifulSoup or lxml
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
soup.find("div", class_="content").text
# ❌ Not compiling patterns used in loops
for line in lines:
    if re.match(r"ERROR: \d+", line): # pays a pattern-cache lookup on every iteration
        ...
# ✅
ERROR_RE = re.compile(r"ERROR: \d+")
for line in lines:
    if ERROR_RE.match(line):
        ...
# ❌ Nested quantifiers on overlapping patterns
r"(\w+)+" # catastrophic
r"([a-zA-Z0-9]+)+" # catastrophic
# ✅ Remove inner quantifier or use atomic group
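# ❌ Anchoring incorrectly
re.match(r"error", text) # only matches at the start of the string
re.search(r"^error$", text) # whole-string test spelled with anchors; $ also tolerates a trailing newline
# ✅
re.search(r"error", text) # anywhere in the string
re.fullmatch(r"error", text) # the entire string and nothing else
# ❌ Capturing when you don't need captures (extra bookkeeping, cluttered group numbers)
r"(https?)://(.*)" # capturing groups
# ✅
r"https?://.*" # no groups needed; use (?:...) only where grouping is required
# ❌ Using regex for simple contains check
if re.search(r"error", text):
    ...
# ✅ (same case-sensitive check, no regex)
if "error" in text:
    ...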
Quick Reference
Greedy: .* .+ matches max, backtracks if needed
Lazy: .*? .+? matches min, expands if needed
Possessive: .*+ .++ matches max, NO backtracking (PCRE, Java, Python 3.11+)
Groups: (capture), (?:non-capture), (?P<name>named)
Lookahead: (?=ahead) (?!not-ahead) — zero-width, not consumed
Lookbehind: (?<=behind) (?<!not-behind) zero-width; fixed-width in Python re; RE2 has no lookaround at all
Backref: \1 by number, (?P=name) in Python, \k<name> in JS patterns, $<name> in JS replace
ReDoS: (x+)+ or (x|x)+ patterns → catastrophic with non-matching input
RE2 vs PCRE: RE2 = O(n) guaranteed, no backrefs; PCRE = full features, risk of ReDoS
Python re: re.compile + VERBOSE flag for complex patterns
JS: /g flag + matchAll() for all matches with groups
Go: regexp.MustCompile, SubexpNames() for named group extraction
When to stop: HTML, JSON, CSV with quotes, nested structures → use proper parsers
Skill Information
- Source: MoltbotDen
- Category: Coding Agents & IDEs