Add A/B testing framework for recommendation algorithms
Problem
Currently we deploy new recommendation algorithms to 100% of users immediately. This is risky:
- No gradual rollout
- Can't measure impact before full deployment
- Difficult to roll back if metrics degrade
- No statistical significance testing
Proposed Solution
Implement an A/B testing framework for controlled experiments:
Features
- Traffic Splitting (see the assignment sketch after this list)
  - Route users to variants based on user_id hash
  - Support 2-10 variants per experiment
  - Sticky assignment (same user always sees same variant)
- Metrics Tracking
  - Click-through rate (CTR)
  - Conversion rate
  - Time on page
  - Revenue per user
  - Statistical significance (p-value, confidence intervals)
- Experiment Configuration
  - YAML config files for experiments
  - Dynamic allocation (increase traffic to winning variant)
  - Automatic winner promotion
- Dashboard
  - Real-time metric comparison
  - Statistical significance indicators
  - Experiment history and results
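A minimal sketch of the sticky assignment described above, assuming SHA-256 bucketing on user_id salted with the experiment name; assign_variant and the (name, traffic) tuple format are illustrative, not an existing API in our stack or in Statsig/GrowthBook:

```python
import hashlib

# Illustrative sketch only: assign_variant and the (name, traffic) tuples
# are assumptions, not an existing API.

def assign_variant(user_id: str, experiment_id: str,
                   variants: list[tuple[str, float]]) -> str:
    """Deterministically map a user to a variant; the same
    (user_id, experiment_id) pair always gets the same answer (sticky)."""
    # Salt with the experiment id so users are re-bucketed independently
    # for each concurrent experiment.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]

    cumulative = 0.0
    for name, traffic in variants:
        cumulative += traffic
        if bucket <= cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding

# Example: the 50/50 split from the config below.
print(assign_variant("user_12345", "cf_optimization_v2",
                     [("control", 0.5), ("treatment", 0.5)]))
```

Salting the hash with the experiment id keeps bucketing independent across experiments, so running two experiments at once does not correlate their cohorts.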
Example Config
```yaml
experiments:
  cf_optimization_v2:
    variants:
      - name: control
        traffic: 0.5
        model: collaborative_filtering_v1
      - name: treatment
        traffic: 0.5
        model: collaborative_filtering_v2_cache_opt
    metrics:
      - ctr
      - conversion
      - latency_p99
    duration_days: 14
    min_sample_size: 10000
```
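As a sketch of how the framework might consume these files, assuming PyYAML as the parser; load_experiments is a hypothetical helper, and the validation rules (2-10 variants, traffic summing to 1.0) come from the feature list above:

```python
import yaml  # PyYAML, assumed as the config parser

# Hypothetical loader/validator for the layout shown above.

def load_experiments(path: str) -> dict:
    with open(path) as f:
        config = yaml.safe_load(f)

    for name, exp in config["experiments"].items():
        variants = exp["variants"]
        if not 2 <= len(variants) <= 10:
            raise ValueError(f"{name}: expected 2-10 variants, got {len(variants)}")
        total = sum(v["traffic"] for v in variants)
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"{name}: traffic weights sum to {total}, not 1.0")
    return config["experiments"]
```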
Tech Stack
- Experimentation: Statsig or GrowthBook
- Metrics: Prometheus + custom events
- Analysis: Python notebooks (scipy.stats)
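For the scipy.stats piece, a sketch of a two-proportion z-test comparing CTR between control and treatment; ctr_significance is a hypothetical helper and the counts in the usage lines are purely illustrative:

```python
from math import sqrt
from scipy.stats import norm

# Illustrative significance check; ctr_significance is a hypothetical helper.

def ctr_significance(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Two-sided two-proportion z-test for a difference in CTR,
    plus a 95% confidence interval for that difference."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se_pooled
    p_value = 2 * norm.sf(abs(z))  # two-sided p-value

    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (p_a - p_b - 1.96 * se_diff, p_a - p_b + 1.96 * se_diff)
    return z, p_value, ci

# Hypothetical counts, purely for illustration.
z, p, ci = ctr_significance(clicks_a=520, n_a=10_000, clicks_b=590, n_b=10_000)
print(f"z={z:.2f}  p={p:.4f}  95% CI for CTR diff: ({ci[0]:.4f}, {ci[1]:.4f})")
```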
Use Cases
- Test cache optimization impact on user engagement
- Compare SVD vs neural collaborative filtering
- Test different recommendation blending strategies
- Validate personalization vs trending items
Acceptance Criteria
- Traffic splitting working (user_id hash based)
- Metrics collection integrated with analytics (see the counter sketch after this list)
- Statistical significance calculator
- Dashboard showing live experiment results
- 2+ experiments run successfully
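For the metrics-collection criterion, a sketch of how exposure and click events could be counted with the Python prometheus_client library; the metric names and label set are assumptions:

```python
from prometheus_client import Counter

# Assumed metric names and labels; per-variant CTR is then clicks / exposures,
# computed in PromQL or in the analysis notebooks.
EXPOSURES = Counter("recs_ab_exposures_total",
                    "Recommendation lists served", ["experiment", "variant"])
CLICKS = Counter("recs_ab_clicks_total",
                 "Recommendation clicks", ["experiment", "variant"])

def record_exposure(experiment: str, variant: str) -> None:
    EXPOSURES.labels(experiment=experiment, variant=variant).inc()

def record_click(experiment: str, variant: str) -> None:
    CLICKS.labels(experiment=experiment, variant=variant).inc()
```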
Timeline
- Week 1: Framework implementation
- Week 2: Dashboard + metrics integration
- Week 3: First experiment (cache optimization)