product-analytics
Data-driven product decision making with expert-level analytics methodology. Covers north star metrics, funnel and cohort analysis, A/B testing, event taxonomy design, attribution modeling, session recording patterns, and the balance between data-informed and data-driven decisions.
Product Analytics
Analytics without a measurement strategy is just expensive data hoarding. Most product teams track dozens of metrics but can't tell you what their north star is or whether it moved last quarter. This skill covers how to instrument products correctly, analyze data rigorously, and — critically — know when to override the data with judgment.
Core Mental Model
Metrics are proxies for value, not value itself. Optimizing a metric blindly causes Goodhart's Law failures: "When a measure becomes a target, it ceases to be a good measure." Daily active users can be gamed with dark patterns. Conversion rates can be inflated with low-quality sign-ups. The key is building a metric tree where improving each leaf metric reflects genuine value creation for users.
Layer your metrics:
- North Star Metric — single metric most correlated with long-term value delivery
- Leading Indicators — metrics that predict north star movement (can act on now)
- Guardrail Metrics — metrics you cannot degrade while chasing the north star
- Diagnostic Metrics — help you understand why north star moved
North Star Metric
The north star is the ONE metric that best captures the value your product delivers to customers at scale.
Characteristics of a Good North Star
- Reflects customer value received, not activity
- Predictive of long-term revenue
- Lagging enough to matter, leading enough to act on
- Understandable by the whole company
- One number (not a composite)
Company    North Star
--------   ---------------------------------
Airbnb     Nights booked
Spotify    Time spent listening
Slack      Messages sent within a workspace
Facebook   Daily active users (their original, now controversial)
Stripe     Total payment volume
Duolingo   Daily active learners
HubSpot    Weekly active teams using ≥3 features
Anti-patterns:
- Revenue (lagging; can mask user-value loss until churn shows up)
- Pageviews (activity, not value)
- Sign-ups (output, not engagement)
- App installs (output, not engagement)
North Star Metric Framework
Step 1: List the value moments in your product
"User gets value when they: send first message / complete a project /
receive first payment / reach their goal"
Step 2: Find the metric that best proxies that moment at scale
"Weekly projects completed" captures the value moment
Step 3: Stress-test it
- Can it be gamed without delivering real value?
- Does it degrade if we compromise quality?
- Does it grow when our best customers engage more?
Step 4: Build the metric tree under it
North Star: "Weekly projects completed"
└── Activation: % who complete first project in 7 days
└── Engagement: Projects/user/week
└── Retention: % users active week-over-week
└── Expansion: Teams inviting 2+ members
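The metric tree above can be sketched in code. This is a minimal sketch with an illustrative in-memory event log (the event name `project_completed` and the weekly windows are assumptions for the example), showing how the north star and two leaf metrics decompose from the same events:

```python
# Illustrative event log: (user_id, week_number, event_name)
events = [
    (1, 1, "project_completed"), (1, 2, "project_completed"),
    (2, 1, "project_completed"),
    (3, 2, "project_completed"), (3, 2, "project_completed"),
]

week = 2
active = {u for u, w, e in events if w == week and e == "project_completed"}
prev_active = {u for u, w, e in events if w == week - 1 and e == "project_completed"}

# North star: total projects completed this week
north_star = sum(1 for u, w, e in events if w == week and e == "project_completed")
# Engagement leaf: projects per active user per week
engagement = north_star / len(active)
# Retention leaf: share of last week's active users active again this week
retention = len(active & prev_active) / len(prev_active)

print(f"Weekly projects completed: {north_star}")   # 3
print(f"Projects/user/week: {engagement:.1f}")      # 1.5
print(f"W/W retention: {retention:.0%}")            # 50%
```

The point of the decomposition: if the north star moves, the leaf metrics tell you whether it came from more users, more usage per user, or better retention.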
Funnel Analysis
Funnels show conversion rates between sequential steps. Use them to find where users drop off and prioritize optimization.
Funnel Construction
-- Example: signup funnel (Mixpanel/Amplitude SQL equivalent)
SELECT
  step,
  COUNT(DISTINCT user_id) AS users,
  COUNT(DISTINCT user_id) * 100.0 / MAX(COUNT(DISTINCT user_id)) OVER () AS pct_of_top
FROM (
  SELECT user_id, 'visited_landing' AS step, created_at FROM page_views WHERE path = '/'
  UNION ALL
  SELECT user_id, 'clicked_signup', created_at FROM events WHERE event = 'signup_clicked'
  UNION ALL
  SELECT user_id, 'completed_signup', created_at FROM events WHERE event = 'signup_completed'
  UNION ALL
  SELECT user_id, 'completed_onboarding', created_at FROM events WHERE event = 'onboarding_finished'
) AS funnel_steps  -- derived tables need an alias in most SQL dialects
GROUP BY step
Interpreting Drop-Off
Funnel:
Visited landing: 100,000 (100%)
Clicked sign-up: 22,000 (22%) ← 78% drop here
Completed sign-up: 15,000 (68% of prev, 15% total)
Completed onboarding: 6,000 (40% of prev, 6% total)
Activated (week 1): 3,200 (53% of prev, 3.2% total)
Analysis:
- Biggest absolute drop: landing → click (78K users lost)
→ A/B test headline, CTA, value prop
- Biggest % drop: sign-up → onboarding (40% lost)
→ Investigate: too many steps? Email verification blocking?
- Highest leverage: a 5pp lift in landing→click adds ~5K clickers, which at the current 68% sign-up completion rate yields ~3.4K extra sign-ups
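The leverage arithmetic above is easy to get wrong, so here is a minimal sketch (using the illustrative funnel numbers from this section) that models step-to-step conversion and the downstream effect of a 5pp lift at the top:

```python
# Funnel counts from the example above
funnel = [
    ("visited_landing",      100_000),
    ("clicked_signup",        22_000),
    ("completed_signup",      15_000),
    ("completed_onboarding",   6_000),
    ("activated_week_1",       3_200),
]

# Step-to-step conversion rates
for (step, users), (_, prev) in zip(funnel[1:], funnel):
    print(f"{step}: {users:,} ({users / prev:.0%} of previous)")

# Leverage of +5pp on landing -> click, holding downstream rates constant
extra_clicks = 100_000 * 0.05          # 5,000 more clickers
signup_rate = 15_000 / 22_000          # ~68% click -> sign-up completion
print(f"Extra sign-ups: {extra_clicks * signup_rate:,.0f}")  # ~3,409
```

Holding downstream rates constant is itself an assumption worth checking: users pulled in by a stronger CTA may convert at a lower rate than today's self-selected clickers.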
Statistical Significance for Funnel Changes
Before declaring a funnel improvement, ask: did the change cause it?

# Chi-squared test for conversion rate changes
from scipy import stats

control_conversions = 150
control_visitors = 1000
test_conversions = 175
test_visitors = 1000

chi2, p_value = stats.chi2_contingency([
    [control_conversions, control_visitors - control_conversions],
    [test_conversions, test_visitors - test_conversions],
])[:2]

print(f"p-value: {p_value:.4f}")  # < 0.05 = statistically significant
Cohort Analysis
Cohorts group users by when they first performed an action. Essential for understanding retention and the impact of product changes on different user groups.
Retention Cohort Table
Sign-up cohort   Day 1   Day 7   Day 14   Day 30   Day 60   Day 90
Jan 2025          45%     28%     22%      18%      15%      14%   ← flattening
Feb 2025          48%     30%     24%      19%      16%       -
Mar 2025          52%     35%     28%      22%       -        -
Apr 2025          55%     38%     30%       -        -        -
Reading: Jan's 14% D90 retention means 14% of Jan signups
were still active 90 days later.
Sign of PMF: Retention flattens (stops declining) — means
you have a core audience that keeps coming back.
Sign of trouble: Retention approaches 0% — bleeding all users.
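A cohort table like the one above can be built directly from an activity log. This is a minimal sketch with illustrative data; it uses unbounded retention (active on or after day N), which is one of several common definitions, and monthly cohorts:

```python
from collections import defaultdict
from datetime import date

# Illustrative activity log: (user_id, signup_date, activity_date)
activity = [
    (1, date(2025, 1, 5), date(2025, 1, 6)),
    (1, date(2025, 1, 5), date(2025, 2, 10)),
    (2, date(2025, 1, 8), date(2025, 1, 9)),
    (3, date(2025, 2, 2), date(2025, 2, 3)),
]

signups = defaultdict(set)    # cohort month -> user ids
retained = defaultdict(set)   # (cohort month, day N) -> users active on/after day N

for user, signup, active_day in activity:
    cohort = signup.strftime("%Y-%m")
    signups[cohort].add(user)
    days_since = (active_day - signup).days
    for n in (1, 7, 30):
        if days_since >= n:
            retained[(cohort, n)].add(user)

for cohort in sorted(signups):
    cells = [f"D{n}: {len(retained[(cohort, n)]) / len(signups[cohort]):.0%}"
             for n in (1, 7, 30)]
    print(cohort, " ".join(cells))
```

Swap the retention definition (active exactly on day N, or within a day-N window) to match whatever your analytics tool reports, or the numbers will not reconcile.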
Cohort Analysis for Feature Impact
Scenario: Did the new onboarding (shipped March 15) improve retention?
Pre-onboarding cohorts (Jan-Mar): D30 retention = 18% avg
Post-onboarding cohorts (Apr-Jun): D30 retention = 24% avg
Is this the feature? Check:
1. Did other things change? (marketing channel, seasonality)
2. Are cohort sizes similar? (mix shift can distort)
3. Is the difference statistically significant? (run t-test on user-level data)
4. Is the new cohort large enough? (wait 30 days for D30 data)
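Check #3 above can be sketched as follows. The cohort sizes are assumptions chosen to match the illustrative ~18% vs ~24% D30 numbers; each user is coded 1 (retained at D30) or 0 (not):

```python
from scipy import stats

# User-level D30 retention flags (illustrative cohorts of 1,000 users each)
pre  = [1] * 180 + [0] * 820    # pre-onboarding cohorts:  18% D30 retention
post = [1] * 240 + [0] * 760    # post-onboarding cohorts: 24% D30 retention

# Two-sample t-test on the user-level 0/1 data
t_stat, p_value = stats.ttest_ind(pre, post)
print(f"p-value: {p_value:.4f}")  # small p -> difference unlikely to be chance
```

A two-proportion z-test is the more textbook choice for binary outcomes; with cohorts this size the two give nearly identical answers, and either way a significant p-value still does not rule out the confounds in checks #1 and #2.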
A/B Testing Fundamentals
Before You Test — Sample Size
Calculate required sample size BEFORE running the experiment. Running until it "looks significant" inflates false positive rates.
import math

def required_sample_size(baseline_rate, min_detectable_effect, power=0.8, significance=0.05):
    """
    baseline_rate: current conversion rate (e.g., 0.05 for 5%)
    min_detectable_effect: smallest relative change worth detecting (e.g., 0.10 for 10%)
    power: probability of detecting a real effect (0.8 = 80%)
    significance: acceptable false positive rate (0.05 = 5%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_effect)
    z_alpha = 1.96   # for 95% confidence (two-tailed)
    z_beta = 0.842   # for 80% power
    n = (z_alpha * math.sqrt(2 * p1 * (1 - p1))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return math.ceil(n)

# Example: 5% baseline, want to detect a 10% relative lift
n = required_sample_size(0.05, 0.10)
print(f"Required per variant: {n:,}")  # ~30,000 per variant
# Total duration = (n × 2) / daily_traffic
Interpreting Results
p-value: probability of seeing this result if null hypothesis is true
p < 0.05 → reject null (likely real effect)
p > 0.05 → don't reject null (effect may not be real)
Confidence interval: [lower, upper] — if the interval excludes 0, the effect is statistically distinguishable from zero
Practical significance: Is the lift large enough to matter?
Statistically significant 0.1% lift on checkout: not worth shipping complexity
Statistically significant 5% lift on checkout: ship immediately
Novelty effect: New features often show inflated early results.
Run tests for at least 2 full business cycles (2 weeks minimum).
Segment: "new users during test" vs "existing users during test"
— existing users show the novelty effect; new users show steady state.
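The confidence interval for a lift can be computed with the normal approximation. This is a sketch (the helper name and approximation choice are mine, not from an analytics library), reusing the 150/1000 vs 175/1000 numbers from the funnel section:

```python
import math

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% CI for the difference in conversion rates (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

ci_low, ci_high = diff_ci(150, 1000, 175, 1000)
print(f"95% CI for lift: [{ci_low:+.3f}, {ci_high:+.3f}]")
# Interval includes 0 -> this 2.5pp observed lift is not yet distinguishable
# from noise at these sample sizes; extend the test rather than ship.
```

This is the same verdict the chi-squared test gives on those numbers, with the bonus that the interval width tells you how uncertain the estimate still is.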
A/B Test Decision Framework
Result Action
----------- --------
Significant positive → Ship (verify guardrails didn't degrade)
Significant negative → Drop + analyze why
Inconclusive → Assess: extend runtime? Increase sample? Simplify hypothesis?
"Directionally positive" → Be skeptical. Either extend or run a bigger bet.
Multi-armed bandit: Use for content/messaging experiments where you
want to exploit winning variants quickly.
Use classic A/B for feature experiments where you need
clean before/after attribution.
Event Taxonomy Design
Well-designed events are the foundation of all analytics. Bad taxonomy = unmaintainable, uninterpretable data.
Noun-Verb Convention
Format: {object}_{action} (snake_case, past tense)
✅ Good events:
user_signed_up
project_created
payment_completed
team_member_invited
feature_flag_enabled
file_exported
search_performed
onboarding_step_completed
❌ Bad events:
button_clicked (what button? what did it do?)
page_viewed (use consistent noun: dashboard_viewed)
action_performed (meaningless)
UserSignedUp (wrong casing convention)
sign up complete (spaces, ambiguous)
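Naming conventions only hold if they are enforced at instrumentation time. A small sketch (the regex and helper are assumptions, not any tracking SDK's API) that rejects the bad shapes above; note a regex can check casing and structure but not that the verb is past tense or semantic:

```python
import re

# At least one object token and one action token, all lowercase snake_case
EVENT_NAME = re.compile(r"^[a-z]+(_[a-z]+)+$")

def validate_event_name(name):
    if not EVENT_NAME.match(name):
        raise ValueError(f"Bad event name: {name!r} (want snake_case object_action)")
    return name

validate_event_name("user_signed_up")     # passes
validate_event_name("project_created")    # passes
try:
    validate_event_name("UserSignedUp")   # wrong casing convention
except ValueError as e:
    print(e)
```

Wiring a check like this into a thin wrapper around `analytics.track` (or a CI lint over your tracking plan) catches drift before it reaches production data.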
Event Properties (the real value is in properties)
// user_signed_up event with rich properties
analytics.track('user_signed_up', {
// Identity
user_id: 'usr_abc123',
email_domain: 'acme.com', // not full email — privacy
// Acquisition
signup_source: 'organic', // 'organic' | 'paid' | 'referral' | 'direct'
utm_campaign: 'spring-sale',
utm_medium: 'email',
referrer_user_id: 'usr_xyz', // if invited
// Context
signup_method: 'google_oauth', // 'email_password' | 'google_oauth' | 'github'
plan_selected: 'pro',
// Experiment
experiment_variant: 'onboarding_v2', // which A/B variant they saw
// Timing
time_from_landing_to_signup_seconds: 142,
})
Event Taxonomy Schema (Amplitude/Mixpanel)
Object Type → Action → Properties
------------------------------------
User → signed_up, logged_in, profile_updated, deleted_account
Project → created, renamed, archived, deleted, shared, exported
Content → created, published, edited, deleted, viewed, liked, shared
Payment → initiated, completed, failed, refunded, subscription_created
Team → created, member_invited, member_removed, role_changed
Feature → enabled, disabled, usage (with feature_name property)
Error → api_error, validation_error (with error_code, error_message)
Amplitude/Mixpanel Implementation Patterns
// Mixpanel: identify user with traits
mixpanel.identify(user.id)
mixpanel.people.set({
'$email': user.email,
'$name': user.name,
'plan': user.plan,
'company': user.company,
'created_at': user.createdAt,
})
// Amplitude: set user properties
amplitude.setUserId(user.id)
const identify = new amplitude.Identify()
.set('plan', user.plan)
.set('company_size', user.companySize)
.setOnce('signup_date', user.createdAt) // setOnce prevents overwrites
amplitude.identify(identify)
// Group analytics (org-level metrics)
mixpanel.set_group('company', user.companyId)
mixpanel.get_group('company', user.companyId).set({
'plan': org.plan,
'seat_count': org.seats,
})
Attribution Modeling
How do you credit conversions across multiple touchpoints?
Customer journey:
Day 1: Saw Twitter ad → Ad spend: $0.50
Day 3: Read blog post (organic)
Day 7: Clicked Google Search ad → Ad spend: $2.00
Day 8: Opened welcome email
Day 10: Converted (paid $99)
Attribution models:
First-touch: Twitter ad gets 100% credit ($99)
Last-touch: Google Search ad gets 100% credit ($99) ← default in GA4
Linear: Each touchpoint gets $24.75 (4 touchpoints)
Time-decay: More credit to touchpoints closer to conversion
Email: ~40%, Google Search: ~30%, Blog: ~20%, Twitter: ~10%
Data-driven: ML model based on historical patterns (requires lots of data)
What to use when:
- First-touch: Understanding top-of-funnel channel value
- Last-touch: SEM/paid campaigns where click → convert is the model
- Linear/Time-decay: Content marketing attribution
- Data-driven: Large companies with enough events for ML models
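The linear and time-decay models above can be sketched for the example journey. The 7-day half-life is an assumption (tools let you configure it); credit is proportional to a weight that halves for every 7 days between touchpoint and conversion:

```python
revenue = 99.0
touchpoints = [                 # (channel, days before conversion)
    ("twitter_ad", 9),
    ("blog_post", 7),
    ("google_search_ad", 3),
    ("welcome_email", 2),
]

# Linear: equal credit per touchpoint
linear = {ch: revenue / len(touchpoints) for ch, _ in touchpoints}

# Time-decay: weight halves every 7 days before conversion
weights = {ch: 0.5 ** (days / 7) for ch, days in touchpoints}
total = sum(weights.values())
time_decay = {ch: revenue * w / total for ch, w in weights.items()}

for ch, _ in touchpoints:
    print(f"{ch:>16}: linear ${linear[ch]:.2f}, time-decay ${time_decay[ch]:.2f}")
```

With this half-life the email (day 8 of 10, closest to conversion) earns the most time-decay credit and the Twitter ad the least, while linear splits the $99 into equal $24.75 shares.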
Session Recording Analysis
Tools: Hotjar, FullStory, Microsoft Clarity (free)
Patterns to Look For
Rage clicks: User clicks same area repeatedly → something isn't working
Dead clicks: Clicking non-interactive elements → perceived affordance mismatch
Scroll depth: Where do users stop reading? → CTA placement optimization
U-turns: Back-and-forth between two pages → navigation confusion
Form abandonment: Which field causes drop-off? → form friction analysis
Hotjar heatmap reading:
Dark red spots → most attention/interaction
Cold blue spots → ignored content
Common findings:
- Hero image gets more clicks than CTA
- Users try to click non-link text
- Footer links have surprising engagement
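Rage clicks are simple enough to detect from raw click events yourself. A sketch under assumed thresholds (≥3 clicks on the same element within 2 seconds; real tools also cluster by screen position):

```python
def find_rage_clicks(clicks, min_clicks=3, window_s=2.0):
    """clicks: list of (timestamp_seconds, element_id), assumed time-sorted."""
    flagged = set()
    for i, (t0, el) in enumerate(clicks):
        # Count clicks on the same element within the window starting here
        run = [c for c in clicks[i:] if c[1] == el and c[0] - t0 <= window_s]
        if len(run) >= min_clicks:
            flagged.add(el)
    return flagged

clicks = [
    (0.0, "#save-btn"), (0.4, "#save-btn"), (0.9, "#save-btn"),  # rage clicks
    (5.0, "#nav-home"), (9.0, "#nav-home"),                      # normal usage
]
print(find_rage_clicks(clicks))  # -> {'#save-btn'}
```

Feeding the flagged element IDs back into your event data (e.g. joining on page and element) turns a session-recording hunch into a countable metric.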
Data-Informed vs Data-Driven
The most important distinction in analytics philosophy.
Data-driven: The data makes the decision. The metrics determine the action. No override.
Data-informed: Data is a critical input, but judgment, strategy, and ethics also inform the decision.
When to trust the data over judgment:
- A/B test with sufficient power and clear result
- Funnel drop-off is obvious and unambiguous
- Retention cohort shows clear inflection from product change
When to override data with judgment:
- Metric being optimized conflicts with long-term user trust
("notification click rates are up" — but we're burning goodwill)
- Small n — the sample is too small for the result to be even directionally reliable
- Survivorship bias — data only reflects users who stayed
- The strategy requires investment before metrics improve
(new market expansion looks "bad" in data before it gets good)
- Ethical concerns about a tactic that "works" in the data
Anti-Patterns
❌ Vanity metrics — pageviews, downloads, Twitter followers. They move easily, don't predict revenue, and create a false sense of progress.
❌ Peeking at A/B tests — checking results before hitting your required sample size inflates false positives dramatically (up to 3x more false positives).
❌ One-size-fits-all metrics — different user segments should have segment-specific KPIs. Power users and casual users have different value patterns.
❌ Event names that change — signup_complete becomes registration_finished in v2. Now you can't compare cohorts. Lock naming conventions and enforce them.
❌ Tracking everything — 500 events with no ownership creates a graveyard. Each event should answer a specific question. Delete events not used in 6 months.
❌ Correlation is causation — "Sign-ups increased the week we shipped X" is not evidence X caused it. You need controlled experiments.
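The peeking anti-pattern is worth seeing with numbers. A simulation sketch (illustrative parameters, fixed seed): both variants have the identical 5% rate, so every "significant" result is a false positive; checking a z-test at 10 interim points and counting a stop at any of them inflates the false positive rate well past the nominal 5%:

```python
import math
import random

def z_significant(conv_a, conv_b, n):
    """Two-proportion z-test at alpha = 0.05, equal sample sizes n."""
    p = (conv_a + conv_b) / (2 * n)
    if p in (0.0, 1.0):
        return False
    se = math.sqrt(2 * p * (1 - p) / n)
    return abs(conv_a - conv_b) / n / se > 1.96

random.seed(0)
sims, total_n, peeks = 1000, 1000, 10
peek_fp = fixed_fp = 0

for _ in range(sims):
    a_hits = b_hits = 0
    peek_sig = False
    for step in range(1, peeks + 1):
        batch = total_n // peeks
        # Both variants convert at the SAME 5% rate: no real effect exists
        a_hits += sum(random.random() < 0.05 for _ in range(batch))
        b_hits += sum(random.random() < 0.05 for _ in range(batch))
        if z_significant(a_hits, b_hits, batch * step):
            peek_sig = True   # a peeker would stop and declare a winner here
    peek_fp += peek_sig
    fixed_fp += z_significant(a_hits, b_hits, total_n)

print(f"fixed-horizon FPR: {fixed_fp / sims:.1%}")   # near the nominal 5%
print(f"peeking FPR:       {peek_fp / sims:.1%}")    # substantially higher
```

If you genuinely need to look early, use a sequential testing procedure with adjusted thresholds rather than repeated fixed-alpha checks.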
Quick Reference
A/B Test Checklist
- [ ] Hypothesis written (If X, then Y, because Z)
- [ ] Primary metric defined before launch
- [ ] Guardrail metrics defined (won't degrade)
- [ ] Sample size calculated (not based on time, based on events)
- [ ] Test runs minimum 2 full business cycles
- [ ] Segment analysis planned (new vs existing users)
- [ ] Ship/no-ship threshold defined upfront
North Star Metric Check
- [ ] Reflects value delivered to customers (not activity)
- [ ] Predictive of long-term revenue
- [ ] Can be decomposed into leading indicators
- [ ] Can't be easily gamed without delivering real value
- [ ] Whole team understands what it means and how to move it
Event Taxonomy Rules
Format: {noun}_{verb} (past tense, snake_case)
Properties: Always include user_id, timestamp (auto), experiment_variant
Avoid: PII (email, phone), button/UI names (use semantic names)
Test in: Dev environment before production
Review: audit the taxonomy every 6 months — delete unused events