
redis-expert

Expert knowledge of Redis data structures, eviction policies, Lua scripting, messaging patterns, cluster topologies, memory optimization, and advanced caching strategies. Trigger phrases: when using Redis, Redis data structure selection, distributed locks with Redis,

MoltbotDen
Data & Analytics

Redis Expert

Redis is simultaneously a cache, message broker, session store, leaderboard engine, and stream processor — but only if you choose the right data structure. The biggest Redis mistakes are: using Strings when a Hash would save memory, using blocking operations in application hot paths, and not planning for eviction. Redis is single-threaded for command execution (I/O is multi-threaded since 6.0), so O(n) commands like KEYS, SMEMBERS on large sets, and LRANGE 0 -1 can block the server.

Core Mental Model

Every Redis data structure solves a different problem class. Strings are versatile but wasteful at scale. Hashes are memory-efficient objects. Sorted Sets are the Swiss army knife for ranking, scheduling, and range queries. Streams are the correct answer for durable messaging — not pub/sub. Memory is finite and Redis will evict or OOM if you don't plan TTLs and eviction policy. Cluster adds horizontal scale but complicates multi-key operations and Lua scripts.


Data Structure Selection Guide

Structure      Use For                                                    Avoid When
String         Simple KV, counters, locks, small serialized objects       Many fields per key (use Hash)
Hash           Objects, session data, user profiles                       More than a few thousand fields
List           Work queues, activity feeds (bounded), stacks              Large random-access needs
Set            Unique membership, tagging, intersection/union             Ordered access needed
Sorted Set     Leaderboards, rate limiting, scheduling, priority queues   Pure unordered membership
Stream         Durable event log, consumer groups, audit trail            Fire-and-forget (use pub/sub)
HyperLogLog    Approximate unique counts (±0.81% error)                   Exact counts needed
Bloom Filter   "Definitely not in set" checks (RedisBloom)                Membership must be certain
Geo            Distance queries, nearby search                            Complex polygon queries
Bitmap         Bit-level flags, DAU counting                              Non-integer keys
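
The ±0.81% figure for HyperLogLog is not arbitrary: it is the standard error of the HLL estimator, 1.04/√m, with Redis's m = 2^14 = 16384 registers. A quick check:

```python
import math

# HyperLogLog standard error = 1.04 / sqrt(m); Redis uses m = 16384 registers
m = 2 ** 14
std_error = 1.04 / math.sqrt(m)
print(f"{std_error:.4%}")  # 0.8125%, i.e. the ±0.81% quoted above
```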
# Memory comparison: 1000 user objects
# 1000 individual Strings (JSON serialized)
SET user:1 '{"id":1,"name":"Alice","email":"[email protected]","score":42}'
# ~120 bytes per value × 1000 keys = ~120KB, plus per-key overhead → ~200KB total

# vs 1 Hash per user (listpack-encoded while fields ≤ hash-max-listpack-entries)
HSET user:1 name Alice email [email protected] score 42
# ~65 bytes × 1000 = ~65KB — nearly 3x more memory-efficient

TTL and Eviction Policies

Eviction Policy Selection

maxmemory-policy options (set in redis.conf or CONFIG SET):

noeviction          → Return error when memory full. Use for: queues, data you can't afford to lose
allkeys-lru         → Evict least recently used from ALL keys. Use for: general cache
volatile-lru        → Evict LRU from keys WITH expiry only. Use for: cache + persistent mix
allkeys-lfu         → Evict least frequently used. Use for: Zipf-distributed access patterns
volatile-lfu        → LFU from keys with expiry
allkeys-random      → Random eviction. Rarely correct
volatile-ttl        → Evict keys with shortest remaining TTL first

Recommendation:
  Pure cache:                     allkeys-lru or allkeys-lfu
  Cache + durable data:           volatile-lru (set TTL on cache keys, not on durable keys)
  Queue / stream (no eviction):   noeviction + monitor memory
  Hot/cold access patterns:       allkeys-lfu (LFU handles Zipf better than LRU)
# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lfu
maxmemory-samples 10  # LRU/LFU approximation sample size (higher = more accurate, more CPU)

# Runtime change
CONFIG SET maxmemory-policy allkeys-lfu
CONFIG SET maxmemory 4gb

# Check eviction stats
INFO stats | grep evicted_keys
INFO memory | grep used_memory_human

Distributed Lock (Redlock)

# Single-instance lock (sufficient for most use cases)
import redis
import uuid
import time

def acquire_lock(r: redis.Redis, lock_name: str, timeout_ms: int = 30000) -> str | None:
    """Returns lock token if acquired, None if lock is held."""
    token = str(uuid.uuid4())
    acquired = r.set(
        f"lock:{lock_name}",
        token,
        px=timeout_ms,   # expiry in milliseconds
        nx=True          # only set if Not eXists
    )
    return token if acquired else None

def release_lock(r: redis.Redis, lock_name: str, token: str) -> bool:
    """Atomic release — only release if we own the lock."""
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    else
        return 0
    end
    """
    result = r.eval(script, 1, f"lock:{lock_name}", token)
    return bool(result)

# Usage
token = acquire_lock(r, "payment_processor_user_42", timeout_ms=10000)
if token:
    try:
        process_payment(user_id=42)
    finally:
        release_lock(r, "payment_processor_user_42", token)
else:
    raise Exception("Could not acquire lock — another process is running")
// Redlock for multi-node Redis (true distributed lock)
import Redlock from "redlock";

const redlock = new Redlock([client1, client2, client3], {
  retryCount:  3,
  retryDelay:  200,   // ms between retries
  retryJitter: 100,   // random jitter to prevent thundering herd
  driftFactor: 0.01   // clock drift tolerance
});

const lock = await redlock.acquire(["lock:payment:user:42"], 10000);
try {
  await processPayment(userId);
} finally {
  await lock.release();
}

Rate Limiter

# Sliding window rate limiter using Sorted Set
def is_rate_limited(r: redis.Redis, user_id: str,
                    limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"rate:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in window
    pipe.zcard(key)
    # Add current request (score = timestamp)
    pipe.zadd(key, {str(uuid.uuid4()): now})
    # Set TTL to clean up idle keys
    pipe.expire(key, window_seconds + 1)

    _, count, _, _ = pipe.execute()

    return count >= limit  # True means rate limited

# Fixed window counter (simpler, tiny thundering herd at window boundary)
def fixed_window_limit(r: redis.Redis, user_id: str,
                       limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"ratelimit:{user_id}:{int(time.time() // window_seconds)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)
    return count > limit
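
To make the window-boundary caveat concrete, here is a toy in-memory counter (no Redis; the `FixedWindow` class is illustrative only) showing that a client can land 2× the limit within a fraction of a second straddling a window edge:

```python
class FixedWindow:
    """In-memory stand-in for the INCR + EXPIRE pattern above."""
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts: dict[int, int] = {}

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)  # same bucket key as the Redis version
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit

rl = FixedWindow(limit=100, window_seconds=60)
before = sum(rl.allow(59.9) for _ in range(100))  # last instant of window 0
after = sum(rl.allow(60.1) for _ in range(100))   # first instant of window 1
# before + after == 200: twice the nominal limit inside 0.2 simulated seconds
```

The sliding-window Sorted Set version above avoids this burst at the cost of one ZADD per request.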

Leaderboard with Sorted Set

# Add/update scores
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2300 "player:bob"
ZADD leaderboard 890  "player:carol"
ZINCRBY leaderboard 100 "player:alice"   # atomic increment

# Top N players (descending score)
ZREVRANGEBYSCORE leaderboard +inf -inf WITHSCORES LIMIT 0 10

# Player's rank (0-indexed, use ZREVRANK for highest-first)
ZREVRANK leaderboard "player:alice"   # → 1 (0-based, so rank 2)

# Players in score range
ZRANGEBYSCORE leaderboard 1000 2000 WITHSCORES

# Player's score
ZSCORE leaderboard "player:alice"

# Window around player (show neighbors)
ZREVRANK leaderboard "player:alice"     # get rank first (say it returns 5)
ZREVRANGE leaderboard 3 7 WITHSCORES    # then fetch ranks rank-2 .. rank+2
# Full leaderboard service
class Leaderboard:
    def __init__(self, r: redis.Redis, key: str):
        self.r = r
        self.key = key

    def add_score(self, player_id: str, score: float):
        self.r.zadd(self.key, {player_id: score})

    def increment_score(self, player_id: str, delta: float) -> float:
        return self.r.zincrby(self.key, delta, player_id)

    def get_rank(self, player_id: str) -> int | None:
        rank = self.r.zrevrank(self.key, player_id)
        return rank + 1 if rank is not None else None  # 1-indexed

    def get_top(self, n: int = 10) -> list[dict]:
        entries = self.r.zrevrangebyscore(self.key, "+inf", "-inf",
                                          withscores=True, start=0, num=n)
        return [{"player": p.decode(), "score": s, "rank": i+1}
                for i, (p, s) in enumerate(entries)]

    def get_around_player(self, player_id: str, radius: int = 5) -> list[dict]:
        rank = self.r.zrevrank(self.key, player_id)
        if rank is None:
            return []
        start = max(0, rank - radius)
        stop  = rank + radius
        entries = self.r.zrevrange(self.key, start, stop, withscores=True)
        return [{"player": p.decode(), "score": s, "rank": start + i + 1}
                for i, (p, s) in enumerate(entries)]

Pub/Sub vs Streams vs Lists

Pattern              Delivery              History        Consumer Groups   Persistence   Use For
Pub/Sub              Fire-and-forget       ❌ None        ❌                ❌            Real-time notifications, live updates
List (LPUSH/BRPOP)   At-least-once         ❌ (consumed)  ❌                AOF/RDB       Simple work queue
Streams              At-least-once + ACK   ✅             ✅                AOF/RDB       Durable event log, multi-consumer
# Redis Streams: durable event processing with consumer groups

# Producer
r.xadd("orders", {
    "order_id": "ord_123",
    "customer": "alice",
    "total": "99.99",
    "status": "new"
})

# Create consumer group (read from beginning: '0', or only new entries: '$')
r.xgroup_create("orders", "order_processors", id="0", mkstream=True)

# Consumer (in worker process)
while True:
    # XREADGROUP: read up to 10 messages, block 2s if empty
    messages = r.xreadgroup(
        groupname="order_processors",
        consumername="worker-1",
        streams={"orders": ">"},  # ">" = new undelivered messages
        count=10,
        block=2000
    )
    for stream_name, entries in messages or []:
        for msg_id, fields in entries:
            try:
                process_order(fields)
                r.xack("orders", "order_processors", msg_id)  # ACK = processed
            except Exception as e:
                log_error(e)  # Message stays in PEL (pending entry list) for retry/DLQ

# Claim stale messages (pending > 60s — worker may have crashed)
stale = r.xautoclaim("orders", "order_processors", "worker-1",
                     min_idle_time=60000, start_id="0-0")

Lua Scripting for Atomic Operations

-- redis-lua: atomic check-and-set with complex logic
-- KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = ttl_seconds
local current = redis.call('GET', KEYS[1])
if current and tonumber(current) >= tonumber(ARGV[1]) then
    return 0  -- rate limited
end
local new_val = redis.call('INCR', KEYS[1])
if new_val == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1  -- allowed
# Load and execute Lua script (cached by SHA)
rate_limit_script = r.register_script("""
    local current = redis.call('GET', KEYS[1])
    if current and tonumber(current) >= tonumber(ARGV[1]) then
        return 0
    end
    local new_val = redis.call('INCR', KEYS[1])
    if new_val == 1 then
        redis.call('EXPIRE', KEYS[1], ARGV[2])
    end
    return 1
""")

allowed = rate_limit_script(keys=[f"rate:{user_id}"], args=[limit, window_seconds])

Cache Stampede Prevention

# Problem: cache expires, 1000 requests all miss and query DB simultaneously

# Solution 1: Probabilistic early recomputation (XFetch algorithm)
import math
import random

def get_with_xfetch(r: redis.Redis, key: str, ttl: int,
                    fetch_fn, beta: float = 1.0):
    """Proactively recompute before expiry using probabilistic early refresh."""
    data = r.get(key)
    if data:
        value, expiry = deserialize(data)
        remaining = expiry - time.time()
        delta = fetch_fn.last_duration  # how long the last recompute took
        # log(random()) is negative, so this fires more often as expiry
        # nears and as the recompute gets more expensive
        if remaining + delta * beta * math.log(random.random()) < 0:
            return refresh(r, key, ttl, fetch_fn)  # early refresh
        return value
    return refresh(r, key, ttl, fetch_fn)

# Solution 2: Lock-based (only one refresh, others wait)
def get_or_compute(r: redis.Redis, key: str, ttl: int, compute_fn):
    value = r.get(key)
    if value:
        return deserialize(value)

    lock_key = f"{key}:computing"
    lock_token = acquire_lock(r, lock_key, timeout_ms=5000)

    if lock_token:
        try:
            # Double-check after acquiring lock
            value = r.get(key)
            if value:
                return deserialize(value)
            result = compute_fn()
            r.setex(key, ttl, serialize(result))
            return result
        finally:
            release_lock(r, lock_key, lock_token)
    else:
        # Another worker is computing — wait briefly and retry
        time.sleep(0.1)
        return get_or_compute(r, key, ttl, compute_fn)
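
As a sanity check on the XFetch trigger (recompute when `remaining + delta * beta * ln(rand()) < 0`), the early-refresh probability has a closed form, so you can verify the `beta` knob behaves sensibly. The helper below is illustrative, not part of any library:

```python
import math

def early_refresh_probability(remaining: float, delta: float,
                              beta: float = 1.0) -> float:
    """P(remaining + delta*beta*ln(U) < 0) for U ~ Uniform(0, 1),
    which rearranges to P(U < exp(-remaining / (delta * beta)))."""
    return math.exp(-remaining / (delta * beta))

# Near expiry a refresh is almost certain; far from expiry it is rare
print(early_refresh_probability(remaining=0.1, delta=1.0))   # ≈ 0.905
print(early_refresh_probability(remaining=10.0, delta=1.0))  # ≈ 0.000045
```

Raising `beta` shifts refreshes earlier; lowering it defers them closer to expiry.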

Memory Optimization

# Check encoding of a key
OBJECT ENCODING mykey
# Possible values: int, embstr, raw, listpack, ziplist, intset, hashtable, skiplist, quicklist

# redis.conf thresholds for compact encoding
hash-max-listpack-entries 128   # Hash uses listpack if ≤ 128 fields
hash-max-listpack-value   64    # and all values ≤ 64 bytes
zset-max-listpack-entries 128   # Sorted Set uses listpack if ≤ 128 members
zset-max-listpack-value   64
set-max-intset-entries    512   # Set uses intset if all members are integers and ≤ 512 members

# Memory analysis
MEMORY USAGE mykey              # bytes for a specific key
MEMORY DOCTOR                   # recommendations
DEBUG OBJECT mykey              # encoding + serialized length

# Find large keys (use SCAN, never KEYS in production)
redis-cli --bigkeys             # scans and reports largest keys per type
redis-cli --memkeys             # reports memory usage per key

# SCAN instead of KEYS
SCAN 0 MATCH "user:*" COUNT 100  # cursor-based, non-blocking
# Iterate until cursor returns 0

Cluster vs Sentinel vs Standalone

Standalone:    Single node. Simple ops. Zero HA. Dev/test only.

Sentinel:      HA with automatic failover. 3+ sentinel processes.
               Primary + replicas. Reads can go to replicas.
               No horizontal scaling. Good for < ~25GB, moderate throughput.

Cluster:       Horizontal scaling + HA. 3+ primary nodes.
               Data automatically sharded across nodes (16384 hash slots).
               Multi-key ops require keys on same slot (use hash tags: {user}.profile, {user}.session)
               Lua scripts limited to keys on same slot.
               Use for: large datasets, high throughput needs.

Hash tags for cluster co-location:
  MSET {user:42}.profile "..." {user:42}.session "..."  ✅ same slot
  MSET user:42:profile "..." user:42:session "..."      ❌ potentially different slots
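
The slot assignment above can be reproduced in a few lines: Redis Cluster hashes the key (or, if present, the non-empty hash-tag substring between the first `{` and the following `}`) with CRC16/XMODEM, modulo 16384. A sketch of that mapping (illustrative, not the redis-py implementation):

```python
def crc16(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash slot for a key, honoring {hash tags} for co-location."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Hash-tagged keys land on the same slot, so MSET / Lua can touch both
assert key_slot("{user:42}.profile") == key_slot("{user:42}.session")
```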

Anti-Patterns

# ❌ KEYS in production (blocks server while scanning ALL keys)
KEYS user:*
# ✅ SCAN with cursor
SCAN 0 MATCH "user:*" COUNT 100

# ❌ Large collections without pagination
SMEMBERS huge_set          # O(N) — blocks if N is large
LRANGE mylist 0 -1         # entire list
# ✅ Paginate
SSCAN myset 0 COUNT 100
LRANGE mylist 0 99         # page 1

# ❌ Storing large blobs (> 100KB) per key
SET user:42:avatar [500KB binary]
# ✅ Store in object storage (S3/GCS), store URL in Redis

# ❌ No TTL on cache keys (memory fills, eviction kicks in unpredictably)
SET cache:user:42 "..."
# ✅ Always set TTL
SETEX cache:user:42 3600 "..."

# ❌ pub/sub for reliable messaging (messages lost if subscriber is down)
PUBLISH notifications '{"event":"payment_complete"}'
# ✅ Streams for reliability
XADD notifications * event payment_complete user_id 42

# ❌ String for every field of an object (1 key per field)
SET user:42:name "Alice"
SET user:42:email "[email protected]"
# ✅ Hash for objects
HSET user:42 name Alice email [email protected]

Quick Reference

Data Structure Decision:
  Simple KV / counter / flag       → String
  Object with multiple fields       → Hash
  Work queue / stack                → List (LPUSH/BRPOP)
  Unique membership / tag sets      → Set
  Ranking / scheduling / ranges     → Sorted Set
  Durable event log / multi-consumer → Stream
  Approx unique count               → HyperLogLog
  "Definitely not present" check    → Bloom Filter (RedisBloom)

Eviction Policy:
  Pure cache                        → allkeys-lru or allkeys-lfu
  Mixed cache + persistent          → volatile-lru
  Queue / stream (no loss allowed)  → noeviction + alerting

Distributed Lock:
  Single node                       → SET NX PX + Lua release
  Multi-node (true distributed)     → Redlock (3+ nodes)

Rate Limiting:
  Sliding window (accurate)         → ZADD + ZREMRANGEBYSCORE
  Fixed window (simple)             → INCR + EXPIRE

Topology:
  Dev / test                        → Standalone
  HA, < 25GB                        → Sentinel (3 nodes)
  Scale out, > 25GB                 → Cluster (6+ nodes: 3 primary + 3 replica)

