system-design-architect
Design scalable, reliable distributed systems. Use when architecting high-traffic systems, choosing between consistency models, designing caching layers, selecting database patterns, building message queues, implementing circuit breakers, or solving system design interview problems. Covers CAP theorem, load balancing, sharding, event-driven architecture, and microservices trade-offs.
System Design Architect
The Design Framework (Use for Every System)
1. Requirements (5-10 min)
- Functional: What does the system DO?
- Non-functional: Scale, latency, consistency, availability
2. Estimates (2-5 min)
- QPS (queries per second) — read vs write ratio
- Storage: data size * growth rate * retention
- Bandwidth: QPS * payload size
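The estimate steps above can be sketched as a quick back-of-envelope calculation. All inputs here are illustrative assumptions for a hypothetical photo-sharing service, not real measurements:

```python
# Back-of-envelope estimates for a hypothetical photo service.
# Every input below is an assumption chosen for illustration.
SECONDS_PER_DAY = 86_400

daily_uploads = 100_000_000       # assumed: 100M photos/day
read_write_ratio = 100            # assumed: 100 reads per write
photo_size_bytes = 300 * 1024     # ~300 KB per photo (rule of thumb above)

write_qps = daily_uploads / SECONDS_PER_DAY
read_qps = write_qps * read_write_ratio
daily_storage_tb = daily_uploads * photo_size_bytes / 1e12
five_year_storage_pb = daily_storage_tb * 365 * 5 / 1000

print(f"write QPS:      {write_qps:,.0f}")        # ~1,157
print(f"read QPS:       {read_qps:,.0f}")         # ~115,741
print(f"storage/day:    {daily_storage_tb:.1f} TB")
print(f"5-year storage: {five_year_storage_pb:.1f} PB")
```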
3. High-Level Design (10-15 min)
- API design: endpoints, request/response
- Core components: draw the boxes
- Data flow: how does a request travel?
4. Deep Dive (20-30 min)
- Bottlenecks and how to solve them
- Database choice and schema
- Caching strategy
- Failure modes and resilience
5. Trade-offs
- What you chose and why
- What you sacrificed
- What you'd change with more time
Numbers Every Engineer Should Know
L1 cache hit: 1 ns
L2 cache hit: 4 ns
L3 cache hit: 10 ns
RAM read: 100 ns
NVMe SSD read: 120 µs
Network (same datacenter): 500 µs
Network (cross-region): 50-150 ms
Disk seek (HDD): 10 ms
Order of magnitude:
1 server handles: ~100,000 req/s (simple)
1 database handles: ~10,000 QPS (Postgres, with indexes)
Redis: ~1,000,000 ops/s
1 Kafka partition: ~100 MB/s throughput
Storage rough estimates:
1 char = 1 byte
1 tweet-sized object = 256 bytes
1 photo = 300 KB
1 minute HD video = 50 MB
1 TB = 10^12 bytes
Scale estimates:
Twitter: 500M tweets/day = ~6K tweets/s
YouTube: 500 hours of video uploaded per minute
Instagram: 100M photos/day = ~1,000 photos/s
CAP Theorem & Consistency Models
CAP: A distributed system can guarantee only 2 of 3:
C (Consistency) — All nodes see the same data simultaneously
A (Availability) — Every request gets a (non-error) response
P (Partition tolerance) — System works when network partitions occur
Since network partitions ALWAYS happen, choose CP or AP:
CP: Consistent + Partition-tolerant (accept downtime over stale data)
Examples: HBase, ZooKeeper, Etcd, MongoDB
AP: Available + Partition-tolerant (accept stale data over downtime)
Examples: Cassandra, DynamoDB, CouchDB, DNS
Consistency Levels (Ordered Strong → Weak)
| Level | Description | Use When |
|---|---|---|
| Linearizable | Like a single machine; reads always see the latest write | Financial transactions |
| Sequential | All nodes see the same order, not necessarily real-time order | Collaborative editing |
| Causal | Causally related ops are seen in order | Social feeds |
| Eventual | Nodes converge eventually | DNS, social likes |
Session guarantees (client-centric, orthogonal to the scale above):
| Guarantee | Description | Use When |
|---|---|---|
| Read-your-writes | A client always sees its own writes | Profile updates |
| Monotonic reads | A client's reads never go backwards in time | Social timeline |
Load Balancing
Algorithms:
Round Robin → simple, works if servers are identical
Weighted RR → different server capacities
Least Connections → server with fewest active connections
IP Hash → same client → same server (session stickiness)
Consistent Hashing → distribute with minimal redistribution on changes
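The consistent-hashing idea above can be sketched as a hash ring with virtual nodes; the server names are made up, and MD5 is used only because it is a stable stdlib hash:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes: adding or removing a
    server only remaps the keys adjacent to it on the ring."""
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []            # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # Walk clockwise: first ring point at or after the key's hash
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["server-a", "server-b", "server-c"])
before = {k: ring.get(k) for k in (f"user:{i}" for i in range(1000))}
ring.add("server-d")
moved = sum(1 for k, v in before.items() if ring.get(k) != v)
print(f"keys moved after adding a node: {moved}/1000")  # roughly 1/4
```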
Layer 4 (Transport): Route by TCP/IP. Fast, no app visibility.
Use for: TCP traffic, very high throughput
Layer 7 (Application): Route by HTTP headers, URL, cookies.
Use for: HTTP services, A/B testing, canary deployments
Health checks:
Active: LB sends probe requests every 5-30s
Passive: Monitor actual traffic for errors
Session stickiness options:
1. Stateless apps (best) — any server can handle any request
2. Server-side sessions + shared cache (Redis)
3. JWT tokens — client carries state, stateless servers
Caching Strategy
Cache Tiers
Browser → CDN → API Cache → Application Cache → Database
CDN (CloudFront, Fastly, Cloudflare):
- Static assets: CSS, JS, images, videos
- Cache-Control: max-age=31536000 (1 year) for versioned assets
- Purge on deploy
Application Cache (Redis/Memcached):
- Session data
- API response cache
- Computed results, leaderboards, counters
Database Query Cache:
- Materialized views
- Query result caching
Cache Patterns
Cache-Aside (Lazy Loading):
1. Read from cache
2. Miss: read from DB, write to cache, return
3. TTL-based invalidation
Good for: read-heavy, can tolerate stale data
Problem: cache miss on first request, thundering herd
Write-Through:
1. Write to cache AND DB synchronously
Good for: consistency, no stale reads
Problem: write latency, cache full of data never read
Write-Behind (Async):
1. Write to cache immediately (return success)
2. Async: flush to DB
Good for: write performance
Problem: data loss risk if cache fails
Read-Through:
Cache sits in front of DB, handles misses
Good for: simple app code (cache is transparent)
Refresh-Ahead:
Proactively refresh before expiry
Good for: predictable access patterns, critical data
```python
# Cache-aside pattern (assumes a `db` module exposing query_user/update_user)
import json

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user(user_id: str) -> dict:
    cache_key = f"user:{user_id}"
    # Try cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    # Cache miss: fetch from DB
    user = db.query_user(user_id)  # slow path
    if user:
        r.setex(cache_key, 3600, json.dumps(user))  # TTL: 1 hour
    return user

# Cache invalidation on write
def update_user(user_id: str, data: dict):
    db.update_user(user_id, data)
    r.delete(f"user:{user_id}")  # Invalidate; the next read repopulates
    # Or write the fresh value back:
    # r.set(f"user:{user_id}", json.dumps(updated_user), ex=3600)
```
Cache Busting Problems
Thundering herd: Cache expires, 10K requests hit DB simultaneously
Solution:
- Cache lock: Only 1 request rebuilds cache, others wait
- Jitter: Randomize TTL ±10% to stagger expiry
- Cache warming: Proactively rebuild before expiry
Cache penetration: Request for non-existent data bypasses cache, hits DB every time
Solution: Cache negative results (NULL) with short TTL (60s)
Cache stampede: Cache server restart / mass invalidation causes all caches to miss at once
Solution: circuit breaker on the DB path, per-key locking, cache warming script
Database Design Patterns
Sharding Strategies
Horizontal Sharding (partitioning rows across databases):
1. Range-based sharding
- User IDs 1-1M → Shard 1, 1M-2M → Shard 2
- Problem: Hot spots if range is popular
2. Hash-based sharding
- shard = hash(user_id) % num_shards
- Problem: Resharding requires migrating data
3. Consistent hashing
- Virtual nodes on a ring
- Adding a shard only moves 1/N data
- Used by: DynamoDB, Cassandra, Redis Cluster
4. Directory-based sharding
- Lookup table: user_id → shard_id
- Flexible, but lookup table is a bottleneck
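A quick sketch of why `hash(user_id) % num_shards` makes resharding painful: growing from 4 to 5 shards remaps most keys (MD5 is used here only as a stable, process-independent hash):

```python
import hashlib

def shard(key: str, num_shards: int) -> int:
    # Stable hash (Python's built-in hash() is salted per process)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if shard(k, 4) != shard(k, 5))
print(f"{moved / len(keys):.0%} of keys move going from 4 to 5 shards")  # ~80%
```

In general, `N -> N+1` shards remaps about `N/(N+1)` of all keys, which is what consistent hashing avoids.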
Vertical sharding: Split by table/feature (users in DB1, orders in DB2)
Good for: feature isolation, different scaling needs
Problem: Cross-shard joins are expensive (do in app layer)
Read Replicas
Primary (read + write) → Replica 1 (read only)
→ Replica 2 (read only)
→ Replica 3 (read only + analytics)
Route reads to replicas:
- Most reads go to replicas
- Writes always go to primary
- Inconsistency window: replication lag (usually <1s)
Use read-your-writes routing:
- After a write, route that user's reads to primary for a short time
- Or: read from primary for operations that must see latest data
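The read-your-writes routing above can be sketched as pinning a user's reads to the primary for a short window after their last write. The window and names are illustrative:

```python
import time

PIN_WINDOW_S = 5.0                    # assumed replication-lag budget
_last_write: dict[str, float] = {}    # user_id -> timestamp of last write

def record_write(user_id: str) -> None:
    _last_write[user_id] = time.monotonic()

def choose_target(user_id: str) -> str:
    """Route to the primary shortly after this user's own write,
    otherwise to any read replica."""
    last = _last_write.get(user_id)
    if last is not None and time.monotonic() - last < PIN_WINDOW_S:
        return "primary"
    return "replica"

record_write("alice")
print(choose_target("alice"))   # primary (inside the pin window)
print(choose_target("bob"))     # replica (no recent write)
```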
Message Queues and Event-Driven Architecture
Point-to-point queue (RabbitMQ, SQS):
Producer → [Queue] → Consumer (one consumer per message)
Use for: task distribution, work queues, one-time processing
Pub/Sub topic (Kafka, SNS, Google Pub/Sub):
Producer → [Topic] → Consumer Group 1 (all messages)
→ Consumer Group 2 (all messages)
Use for: event streaming, audit logs, multiple consumers
Kafka key concepts:
Topic: named stream of events
Partition: parallel lane within a topic (ordered within partition)
Consumer group: set of consumers sharing partition processing
Offset: position within a partition (Kafka stores until retention)
Ordering guarantee:
- Within a partition: always ordered
- Across partitions: no ordering (use single partition for strict ordering)
- To keep related events together: partition by entity ID (user_id, order_id)
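Partitioning by entity ID comes down to hashing the key modulo the partition count. A sketch of the idea (Kafka's default partitioner uses murmur2 on the key bytes; MD5 here just keeps the sketch dependency-free):

```python
import hashlib

NUM_PARTITIONS = 12

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key -> same partition, so one entity's events stay ordered."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_partitions

# All events for one order land on one partition, hence stay ordered:
events = ["OrderCreated", "OrderPaid", "OrderShipped"]
partitions = {partition_for("order:42") for _ in events}
print(partitions)   # a single partition id
```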
Event-Driven Patterns
Event Sourcing:
Store ALL events, not just current state
State = replay of all events from beginning
+ Complete audit trail
+ Easy to rebuild state at any point in time
+ Event-driven by design
- Replay can be slow (snapshots solve this)
- Schema evolution is hard
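A minimal event-sourcing sketch: the current balance is never stored directly, it is derived by replaying the event log, and replaying a prefix gives the state at any earlier point. Event names are illustrative:

```python
# Append-only event log; state is a pure function of the events.
events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]

def replay(events) -> int:
    """Fold the event log into the current balance."""
    balance = 0
    for e in events:
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance

print(replay(events))       # 75 — current state
print(replay(events[:2]))   # 70 — state as of the second event
```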
CQRS (Command Query Responsibility Segregation):
Commands → Write Model (normalized, consistent)
Queries → Read Model (denormalized, fast, eventual)
+ Read model optimized for queries
+ Write model optimized for consistency
- Eventual consistency between models
- More complexity
Saga Pattern (distributed transactions):
Choreography: Each service emits events, others react
Orchestration: Central coordinator tells services what to do
Order Service → emit OrderCreated
← Inventory Service reserves items
← Payment Service charges card
← Shipping Service creates shipment
If any step fails: compensating transactions to undo
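The orchestration variant can be sketched as: run steps in order, and if one fails, run the compensations of the completed steps in reverse. Step names and the simulated failure are illustrative:

```python
# Orchestration-style saga sketch with compensating transactions.
log = []

def step(name, fail=False):
    def action():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(name)
    def compensate():
        log.append(f"undo:{name}")
    return action, compensate

saga = [
    step("reserve_inventory"),
    step("charge_card", fail=True),   # simulate a payment failure
    step("create_shipment"),
]

def run(saga):
    done = []
    try:
        for action, compensate in saga:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):   # undo completed steps in reverse
            compensate()
        return False
    return True

ok = run(saga)
print(ok, log)   # False ['reserve_inventory', 'undo:reserve_inventory']
```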
Circuit Breaker Pattern
```python
from enum import Enum
import time
from threading import Lock

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""

class State(Enum):
    CLOSED = "closed"        # Normal operation, requests flow
    OPEN = "open"            # Failing, reject requests fast
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold=5, success_threshold=2, timeout=60):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self.state = State.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self._lock = Lock()

    def call(self, func, *args, **kwargs):
        with self._lock:
            if self.state == State.OPEN:
                if time.time() - self.last_failure_time > self.timeout:
                    self.state = State.HALF_OPEN
                    self.success_count = 0
                else:
                    raise CircuitOpenError("Circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        with self._lock:
            if self.state == State.HALF_OPEN:
                self.success_count += 1
                if self.success_count >= self.success_threshold:
                    self.state = State.CLOSED
                    self.failure_count = 0
            else:
                self.failure_count = 0   # a healthy call resets the count

    def _on_failure(self):
        with self._lock:
            self.failure_count += 1
            self.last_failure_time = time.time()
            # Any failure during the HALF_OPEN probe reopens immediately
            if (self.state == State.HALF_OPEN
                    or self.failure_count >= self.failure_threshold):
                self.state = State.OPEN
```
Design Patterns for Common Systems
URL Shortener (bit.ly)
Key decisions:
1. ID generation: Base62 encode auto-increment OR random 7-char string
2. Storage: Redis for hot links, SQL for persistence
3. Redirect: 301 (permanent, cached by browser) vs 302 (temporary, track every click)
4. Scale: 100M URLs, 10B redirects/day = ~115K redirects/s
Schema:
short_code TEXT PRIMARY KEY
long_url TEXT
created_at TIMESTAMP
expires_at TIMESTAMP
click_count INTEGER -- eventually consistent counter
Bottleneck: 115K reads/s → Redis cache with LRU eviction
Cache key: short_code → long_url
Cache hit ratio goal: 95%+ (Zipf distribution: 20% of links get 80% traffic)
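The Base62 option from the key decisions above can be sketched in a few lines; 62^7 is about 3.5 trillion codes, far more than the 100M URLs estimated:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    """Encode an auto-increment ID as a short code (standard base
    conversion; the digit order of ALPHABET is a convention choice)."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

print(base62(125))           # "21"
print(len(base62(62 ** 6)))  # 7 — the smallest 7-character code
```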
Rate Limiter
Token bucket: Most common
- Bucket starts with N tokens
- Refill at rate R tokens/second
- Request consumes 1 token; reject if empty
Implementation: Redis + Lua script (atomic)
Fixed window: Simple
- Count requests per minute window
- Problem: burst at window boundary (2x limit in 2s)
Sliding window log:
- Store timestamp of each request
- Count requests in last N seconds
- Problem: memory-intensive (1 entry per request)
Sliding window counter:
- Interpolate between two fixed windows
- Balance of accuracy and memory
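The token bucket described above, sketched in-process (the production version keeps its state in Redis so every app server shares one bucket):

```python
import time

class TokenBucket:
    """Token bucket: capacity N, refilled at R tokens/second."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1)   # burst of 5, then 1 req/s
results = [bucket.allow() for _ in range(7)]
print(results)   # first 5 allowed, the rest rejected until refill
```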
Redis Lua script (a simplified atomic bucket; the key's TTL acts as the refill, resetting the bucket to full when it expires, whereas a full implementation stores a timestamp and refills proportionally to elapsed time):
```lua
-- KEYS[1] = bucket key, ARGV[1] = capacity, ARGV[2] = window in seconds
local tokens = tonumber(redis.call('GET', KEYS[1]) or ARGV[1])
if tokens > 0 then
  redis.call('SET', KEYS[1], tokens - 1, 'EX', ARGV[2])
  return 1  -- allowed
end
return 0  -- rejected
```
Reliability Patterns
SLA/SLO/SLI:
SLI: The indicator you measure (e.g., fraction of requests served in <200ms)
SLO: The target for that SLI (e.g., 99.5% of requests in <200ms, per month)
SLA: The contract with consequences (credits, penalties) if the SLO is missed
Error budget:
99.9% availability = ~8.8 hours downtime/year
99.99% = ~53 minutes/year
Use error budget to decide when to ship new features vs focus on reliability
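The arithmetic behind these budgets is a one-liner:

```python
HOURS_PER_YEAR = 365 * 24   # 8760

def downtime_budget_hours(availability: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1 - availability) * HOURS_PER_YEAR

print(f"{downtime_budget_hours(0.999):.1f} h/year")          # 8.8
print(f"{downtime_budget_hours(0.9999) * 60:.0f} min/year")  # 53
```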
Bulkhead pattern: Isolate failures
- Separate thread pools per downstream dependency
- Database timeouts can't exhaust HTTP thread pool
Retry with exponential backoff:
wait = min(cap, base * 2^attempt) + jitter
jitter prevents synchronized retries from causing thundering herd
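The backoff formula above, sketched with the "full jitter" variant (a random wait in `[0, computed_wait]`); base, cap, and attempt count are illustrative:

```python
import random

def backoff_delays(base=0.1, cap=10.0, attempts=6, jitter=True):
    """wait = min(cap, base * 2^attempt); full jitter picks uniformly in
    [0, wait] so synchronized clients don't retry in lockstep."""
    delays = []
    for attempt in range(attempts):
        wait = min(cap, base * 2 ** attempt)
        delays.append(random.uniform(0, wait) if jitter else wait)
    return delays

print(backoff_delays(jitter=False))  # [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
```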
Health checks:
Shallow: "Am I up?" → responds 200 (load balancer check)
Deep: "Are my dependencies up?" → checks DB, cache, downstream (startup check only)