dynamodb-expert
Expert knowledge of DynamoDB single table design, access pattern driven modeling, GSI/LSI design, conditional writes, DynamoDB Streams, TTL, transactions, and cost optimization. Trigger phrases: when working with DynamoDB, single table design DynamoDB, DynamoDB GSI design,
DynamoDB Expert
DynamoDB is the most misunderstood database in AWS. Developers from relational backgrounds model it like a relational database and end up with either an inefficient multi-table design or a poorly performing single-table one. The key inversion: define your access patterns first, then design your table. DynamoDB forces you to commit to query patterns at design time, and rewards that discipline with single-digit-millisecond latency at any scale, zero operational overhead, and a cost model that is genuinely cheap when used correctly.
Core Mental Model
DynamoDB is a key-value and document store with exactly two native retrieval methods: GetItem (fetch by exact partition key + optional sort key) and Query (fetch items matching a partition key, optionally filtered by a sort key range). Everything else — relationships, secondary lookups, list views — must be baked into your key design or solved with a Global Secondary Index (GSI). The partition key determines which physical partition stores the item; the sort key determines ordering within that partition. All items sharing a partition key are stored together — this group is the item collection.
Access Pattern Driven Design
Design workflow (MANDATORY order):
1. List ALL required access patterns
2. For each pattern, determine: what is the partition key? sort key? filters?
3. Design the primary key to satisfy the most critical patterns
4. Add GSIs for remaining patterns (each GSI adds write cost and storage)
5. Denormalize aggressively — duplicate data to avoid multiple round-trips
Example entity model: E-commerce platform
Entities: Customer, Order, Product, OrderItem, Review
Access patterns:
AP1: Get customer by ID
AP2: Get order by ID
AP3: Get all orders for a customer (sorted by date desc)
AP4: Get all items in an order
AP5: Get all orders in PENDING status
AP6: Get product by ID
AP7: Get all reviews for a product (sorted by rating desc)
AP8: Get all reviews by a customer
Single Table Design
Key pattern strategy: use type-prefixed composite keys
PK format: ENTITY_TYPE#id
SK format: ENTITY_TYPE#id or ENTITY_TYPE#date or METADATA
Entity → PK/SK mapping:
Customer PK=CUSTOMER#alice-1 SK=PROFILE
Order PK=ORDER#ord-123 SK=METADATA
Order items PK=ORDER#ord-123 SK=ITEM#prod-456
Customer orders PK=CUSTOMER#alice-1 SK=ORDER#2024-01-15T10:30:00#ord-123
Product PK=PRODUCT#prod-456 SK=METADATA
Review PK=PRODUCT#prod-456 SK=REVIEW#4.5#rev-789
import boto3
from boto3.dynamodb.conditions import Key, Attr
from datetime import datetime
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
table = dynamodb.Table('ecommerce')
# AP1: Get customer by ID
def get_customer(customer_id: str) -> dict:
response = table.get_item(
Key={
'PK': f'CUSTOMER#{customer_id}',
'SK': 'PROFILE'
}
)
return response.get('Item')
# AP3: Get all orders for a customer (sorted by date desc)
def get_customer_orders(customer_id: str, limit: int = 20) -> list:
response = table.query(
KeyConditionExpression=
Key('PK').eq(f'CUSTOMER#{customer_id}') &
Key('SK').begins_with('ORDER#'),
ScanIndexForward=False, # descending order
Limit=limit
)
return response['Items']
# AP4: Get all items in an order
def get_order_items(order_id: str) -> list:
response = table.query(
KeyConditionExpression=
Key('PK').eq(f'ORDER#{order_id}') &
Key('SK').begins_with('ITEM#')
)
return response['Items']
# AP7: Get top-rated product reviews
def get_product_reviews(product_id: str, limit: int = 10) -> list:
response = table.query(
KeyConditionExpression=
Key('PK').eq(f'PRODUCT#{product_id}') &
Key('SK').begins_with('REVIEW#'),
ScanIndexForward=False, # highest rating first (sort key = REVIEW#5.0#rev-123)
Limit=limit
)
return response['Items']
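The mapping above stores each order twice: once under its own ORDER# partition and once in the customer's item collection (for AP3). A minimal sketch of writing both denormalized copies, with illustrative helper names; note that a batch write is not atomic, so use TransactWriteItems instead if both copies must succeed or fail together:

```python
from datetime import datetime, timezone

def build_order_items(order_id: str, customer_id: str, created_at: str) -> list[dict]:
    """Build both denormalized copies of an order record."""
    shared = {
        'entity_type': 'ORDER',
        'order_id': order_id,
        'customer_id': customer_id,
        'created_at': created_at,
    }
    return [
        # Canonical copy, keyed by order (AP2)
        {'PK': f'ORDER#{order_id}', 'SK': 'METADATA', **shared},
        # Duplicate in the customer's item collection, date-sorted (AP3)
        {'PK': f'CUSTOMER#{customer_id}', 'SK': f'ORDER#{created_at}#{order_id}', **shared},
    ]

def write_order(table, order_id: str, customer_id: str) -> None:
    """Write both copies in one batched round-trip (not atomic)."""
    created_at = datetime.now(timezone.utc).isoformat()
    with table.batch_writer() as batch:
        for item in build_order_items(order_id, customer_id, created_at):
            batch.put_item(Item=item)
```

Duplicating the order attributes into both items is what lets AP3 run as a single Query with no second round-trip; the cost is that updates to shared attributes must touch both copies.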
GSI Design
# GSI1: Access pattern AP5 — get orders by status
# GSI1PK = status (e.g., "PENDING")
# GSI1SK = order_date
# When writing an order, include GSI keys
def create_order(order_id: str, customer_id: str, status: str = 'PENDING'):
now = datetime.utcnow().isoformat()
table.put_item(Item={
'PK': f'ORDER#{order_id}',
'SK': 'METADATA',
'GSI1PK': f'STATUS#{status}', # GSI partition key
'GSI1SK': now, # GSI sort key
'entity_type': 'ORDER',
'order_id': order_id,
'customer_id': customer_id,
'status': status,
'created_at': now
})
# Query GSI1 for pending orders
def get_orders_by_status(status: str, limit: int = 100) -> list:
response = table.query(
IndexName='GSI1',
KeyConditionExpression=
Key('GSI1PK').eq(f'STATUS#{status}'),
ScanIndexForward=False,
Limit=limit
)
return response['Items']
# GSI overloading: use same GSI for multiple access patterns
# Entity A: GSI1PK = "TYPE#customer_id", GSI1SK = timestamp
# Entity B: GSI1PK = "STATUS#pending", GSI1SK = order_date
# Same GSI serves multiple entity types via different key formats
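As a sketch of overloading, the same GSI1 attributes can carry a different key shape per entity type; the review variant below would also cover AP8 (all reviews by a customer). The helper names are illustrative, but the key formats mirror the conventions above:

```python
def order_gsi_keys(status: str, order_date: str) -> dict:
    # Orders on GSI1: partitioned by status, sorted by date (AP5)
    return {'GSI1PK': f'STATUS#{status}', 'GSI1SK': order_date}

def review_gsi_keys(customer_id: str, created_at: str) -> dict:
    # Reviews on GSI1: partitioned by reviewing customer, sorted by time (AP8)
    return {'GSI1PK': f'CUSTOMER#{customer_id}', 'GSI1SK': created_at}
```

A Query on GSI1 with GSI1PK = 'CUSTOMER#alice-1' then returns only that customer's reviews, and GSI1PK = 'STATUS#PENDING' only pending orders; the type prefixes keep the two entity populations from colliding in one index.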
Conditional Writes (Optimistic Locking)
# Optimistic locking with version counter
def update_order_status(order_id: str, new_status: str, expected_version: int):
try:
table.update_item(
Key={
'PK': f'ORDER#{order_id}',
'SK': 'METADATA'
},
UpdateExpression='SET #s = :new_status, version = :new_version, updated_at = :now',
ConditionExpression='version = :expected_version',
ExpressionAttributeNames={
'#s': 'status' # status is a reserved word
},
ExpressionAttributeValues={
':new_status': new_status,
':new_version': expected_version + 1,
':now': datetime.utcnow().isoformat(),
':expected_version': expected_version
}
)
except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
# Another process updated this item — retry with fresh version
raise OptimisticLockException(f"Order {order_id} was modified concurrently")
# Conditional create (create only if not exists)
def create_unique_username(username: str, user_id: str):
try:
table.put_item(
Item={
'PK': f'USERNAME#{username}',
'SK': 'RESERVATION',
'user_id': user_id,
'created_at': datetime.utcnow().isoformat()
},
ConditionExpression='attribute_not_exists(PK)'
)
except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
raise UsernameAlreadyTakenException(username)
Transactions
# TransactWriteItems: atomic cross-item writes (up to 100 items, 4MB)
def place_order(order_id: str, customer_id: str,
items: list[dict], payment_method_id: str):
"""Atomically create order and deduct inventory."""
transact_items = [
# Create the order
{
'Put': {
'TableName': 'ecommerce',
'Item': {
'PK': {'S': f'ORDER#{order_id}'},
'SK': {'S': 'METADATA'},
'status': {'S': 'PENDING'},
'customer_id': {'S': customer_id},
'created_at': {'S': datetime.utcnow().isoformat()}
},
'ConditionExpression': 'attribute_not_exists(PK)'
}
}
]
# Deduct inventory for each item
for item in items:
transact_items.append({
'Update': {
'TableName': 'ecommerce',
'Key': {
'PK': {'S': f'PRODUCT#{item["product_id"]}'},
'SK': {'S': 'INVENTORY'}
},
'UpdateExpression': 'SET stock = stock - :qty',
'ConditionExpression': 'stock >= :qty', # prevent overselling
'ExpressionAttributeValues': {
':qty': {'N': str(item['quantity'])}
}
}
})
try:
dynamodb.meta.client.transact_write_items(TransactItems=transact_items)
except dynamodb.meta.client.exceptions.TransactionCanceledException as e:
reasons = e.response['CancellationReasons']
for i, reason in enumerate(reasons):
if reason['Code'] == 'ConditionalCheckFailed':
if i == 0:
raise DuplicateOrderError(order_id)
else:
raise InsufficientStockError(items[i-1]['product_id'])
raise
DynamoDB Streams
# Lambda handler for DynamoDB Streams
def handle_stream(event, context):
for record in event['Records']:
event_name = record['eventName'] # INSERT, MODIFY, REMOVE
if event_name == 'INSERT':
new_item = deserialize(record['dynamodb']['NewImage'])
handle_new_item(new_item)
elif event_name == 'MODIFY':
old_item = deserialize(record['dynamodb']['OldImage'])
new_item = deserialize(record['dynamodb']['NewImage'])
# Detect status change
if old_item.get('status') != new_item.get('status'):
handle_status_change(
item_id=new_item['PK'],
old_status=old_item['status'],
new_status=new_item['status']
)
elif event_name == 'REMOVE':
old_item = deserialize(record['dynamodb']['OldImage'])
handle_deletion(old_item)
# DynamoDB Streams: 24-hour retention, ordered per partition key
# Use StreamViewType:
# NEW_IMAGE: only new item (after write)
# OLD_IMAGE: only old item (before write) — useful for REMOVE
# NEW_AND_OLD_IMAGES: both — needed for MODIFY change detection
# KEYS_ONLY: only PK+SK — cheapest, use when you'll re-fetch
TTL for Automatic Expiration
import time
# TTL is stored as a Unix epoch seconds attribute
def create_session(session_id: str, user_id: str, ttl_seconds: int = 86400):
expires_at = int(time.time()) + ttl_seconds
table.put_item(Item={
'PK': f'SESSION#{session_id}',
'SK': 'DATA',
'user_id': user_id,
'expires_at': expires_at, # TTL attribute — must be a number
'created_at': datetime.utcnow().isoformat()
})
# Enable TTL on the table (one-time setup)
# aws dynamodb update-time-to-live \
# --table-name ecommerce \
# --time-to-live-specification "Enabled=true, AttributeName=expires_at"
# TTL notes:
# - Deletion is eventual (typically within 48 hours of expiry)
# - Expired items can still appear in reads until actually deleted; filter them in application code or with a FilterExpression
# - TTL deletions appear in Streams — filter them with userIdentity.type = "Service"
# - Free! TTL deletions don't consume write capacity
Cost Model
Read/Write capacity units:
1 RCU = 1 strongly consistent read of up to 4 KB
or 2 eventually consistent reads of up to 4 KB each
1 WCU = 1 write of up to 1 KB
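The rounding rules above can be captured in a small calculator; this is a sketch for estimation only (sizes round up to the next 4 KB for reads and 1 KB for writes, and transactional operations consume twice these amounts):

```python
import math

def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs consumed by a single GetItem of the given size."""
    units = math.ceil(item_size_bytes / 4096)  # round up to 4 KB chunks
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs consumed by a single PutItem/UpdateItem of the given size."""
    return math.ceil(item_size_bytes / 1024)  # round up to 1 KB chunks

# A 6 KB item: 2 RCUs strongly consistent, 1 RCU eventually consistent, 6 WCUs
```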
Cost tips:
✅ Eventually consistent reads cost half (use for most reads)
✅ Batch operations (BatchGetItem, BatchWriteItem) amortize per-request overhead
✅ DAX cache eliminates read costs for hot items
✅ On-demand mode: pay per request, no capacity planning (can cost several times more than well-utilized provisioned capacity)
✅ Provisioned + auto-scaling: cheaper for predictable workloads
❌ GSIs multiply write cost (every base-table write is replicated to each GSI whose keys it touches)
❌ Scans read every item — avoid in production
❌ Large items are expensive to write (every additional 1 KB consumes another WCU)
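As a sketch of the batching tip, BatchGetItem accepts at most 100 keys (and 16 MB) per call, so larger key lists need chunking. Helper names are illustrative; this assumes the low-level client's typed-attribute key format, and a production version should also retry any UnprocessedKeys in the response:

```python
def chunk(seq: list, size: int = 100) -> list[list]:
    """Split a key list into BatchGetItem-sized chunks (max 100 keys per call)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def batch_get_orders(client, order_ids: list[str], table_name: str = 'ecommerce') -> list[dict]:
    """Fetch many order METADATA items with far fewer round-trips than N GetItems."""
    keys = [{'PK': {'S': f'ORDER#{oid}'}, 'SK': {'S': 'METADATA'}} for oid in order_ids]
    items = []
    for batch in chunk(keys):
        resp = client.batch_get_item(RequestItems={table_name: {'Keys': batch}})
        items.extend(resp['Responses'].get(table_name, []))
        # Production code: re-request resp.get('UnprocessedKeys') with backoff
    return items
```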
On-demand vs Provisioned:
On-demand: unpredictable traffic, new apps, traffic spikes
Provisioned: steady state, cost optimization, > 80% utilization
Reserved capacity: commit to provisioned throughput for 1 or 3 years for steep discounts (provisioned mode only)
Anti-Patterns
# ❌ Scan: reads every item in the table (expensive, slow)
table.scan(FilterExpression=Attr('status').eq('pending'))
# ✅ Query on GSI where GSI1PK = 'STATUS#pending'
# ❌ Storing relationships in separate tables (N+1 problem)
customer = customers_table.get_item(Key={'id': customer_id})
orders = orders_table.query(...) # second round-trip
# ✅ Single table design: query PK=CUSTOMER#alice, SK begins_with ORDER#
# ❌ Hot partition key with low cardinality
# PK = 'ORDER' for all orders → every write hits the same partition
# ✅ PK = ORDER#{order_id} (high cardinality, even distribution)
# ❌ Using DynamoDB for complex analytics
# Multi-join reporting, aggregations across all items → Athena/Redshift instead
# ✅ DynamoDB Streams → Kinesis → S3 → Athena for analytics
# ❌ Boolean attribute as partition key
# PK = 'true' / PK = 'false' → only 2 partitions for the entire table
# ✅ Use GSI with compound key: GSI1PK = 'STATUS#active' (filtered set)
# ❌ Not paginating Query results
response = table.query(KeyConditionExpression=...)
items = response['Items']  # only first page (up to 1 MB)
# ✅ Paginate with LastEvaluatedKey
items, last_key = [], None
while True:
    kwargs = {'KeyConditionExpression': ...}
    if last_key:
        kwargs['ExclusiveStartKey'] = last_key
    response = table.query(**kwargs)
    items.extend(response['Items'])
    last_key = response.get('LastEvaluatedKey')
    if not last_key:
        break
Quick Reference
Key Design Patterns:
Entity lookup: PK=TYPE#id, SK=METADATA
1:many relationship: PK=PARENT#id, SK=CHILD#date#child_id
Inverted access: GSI1PK=CHILD#id, GSI1SK=date
Access Pattern → Operation:
Fetch one item: GetItem (exact PK + SK)
Fetch items by parent: Query (PK = parent, SK begins_with/between)
Secondary lookup: Query on GSI
Never: Scan (reads all items)
Capacity Mode:
Spiky / unpredictable → On-demand
Steady state → Provisioned + auto-scaling
High RCU, cacheable → DAX
Transaction Limits:
Up to 100 items, 4 MB total, across multiple tables
2x more expensive than individual writes (use sparingly)
DynamoDB vs Aurora:
DynamoDB: unlimited scale, no ops, single-digit ms, schema-less
Aurora: SQL, complex queries, foreign keys, joins, analytics
Choose DynamoDB when: unpredictable scale, simple access patterns, items under the 400 KB limit
Choose Aurora when: complex reporting, schema flexibility needed, relational integrity