
Add A/B testing framework for recommendation algorithms

Problem

Currently, we deploy new recommendation algorithms to 100% of users immediately. This is risky:

  • No gradual rollout
  • Can't measure impact before full deployment
  • Difficult to roll back if metrics degrade
  • No statistical significance testing

Proposed Solution

Implement an A/B testing framework for controlled experiments:

Features

  1. Traffic Splitting

    • Route users to variants based on a user_id hash (see the sketch after this list)
    • Support 2-10 variants per experiment
    • Sticky assignment (same user always sees same variant)
  2. Metrics Tracking

    • Click-through rate (CTR)
    • Conversion rate
    • Time on page
    • Revenue per user
    • Statistical significance (p-value, confidence intervals)
  3. Experiment Configuration

    • YAML config files for experiments
    • Dynamic allocation (increase traffic to winning variant)
    • Automatic winner promotion
  4. Dashboard

    • Real-time metric comparison
    • Statistical significance indicators
    • Experiment history and results
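
For reference, a minimal sketch of the sticky, hash-based assignment described under Traffic Splitting. The function name, signature, and variant list here are illustrative assumptions, not a final API:

import hashlib

def assign_variant(user_id: str, experiment: str, variants: list[tuple[str, float]]) -> str:
    # Deterministic, sticky assignment: the same (experiment, user_id) pair
    # always hashes to the same bucket, so a user keeps their variant.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]

    cumulative = 0.0
    for name, share in variants:  # (name, traffic_share) pairs summing to 1.0
        cumulative += share
        if bucket <= cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding

# Example: the 50/50 split used in the config below
variant = assign_variant("user-42", "cf_optimization_v2",
                         [("control", 0.5), ("treatment", 0.5)])

Hashing the experiment name together with the user_id keeps a user's bucket position uncorrelated across experiments.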

Example Config

experiments:
  cf_optimization_v2:
    variants:
      - name: control
        traffic: 0.5
        model: collaborative_filtering_v1
      - name: treatment  
        traffic: 0.5
        model: collaborative_filtering_v2_cache_opt
    metrics:
      - ctr
      - conversion
      - latency_p99
    duration_days: 14
    min_sample_size: 10000
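
A possible loader for this config, assuming PyYAML and the schema shown above; the function name and file path are placeholders:

import yaml  # PyYAML, assumed here as the parser for the YAML config files

def load_experiments(path: str) -> dict:
    # Load the experiments file and check two basic invariants from the spec:
    # traffic shares must sum to 1.0 and each experiment has 2-10 variants.
    with open(path) as f:
        config = yaml.safe_load(f)

    for name, exp in config["experiments"].items():
        shares = [v["traffic"] for v in exp["variants"]]
        if abs(sum(shares) - 1.0) > 1e-6:
            raise ValueError(f"{name}: variant traffic must sum to 1.0, got {sum(shares)}")
        if not 2 <= len(shares) <= 10:
            raise ValueError(f"{name}: expected 2-10 variants, got {len(shares)}")
    return config["experiments"]

experiments = load_experiments("experiments.yml")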

Tech Stack

  • Experimentation: Statsig or GrowthBook
  • Metrics: Prometheus + custom events
  • Analysis: Python notebooks (scipy.stats)
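
A rough sketch of the kind of check the analysis notebooks could run for a binary metric such as CTR, using a two-sided two-proportion z-test built on scipy.stats; the function and the sample counts below are placeholders:

import math
from scipy import stats

def ctr_significance(clicks_a, users_a, clicks_b, users_b, confidence=0.95):
    # Two-proportion z-test comparing control (a) and treatment (b) CTRs.
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value

    # Confidence interval for the CTR difference (unpooled standard error)
    se_diff = math.sqrt(p_a * (1 - p_a) / users_a + p_b * (1 - p_b) / users_b)
    margin = stats.norm.ppf(1 - (1 - confidence) / 2) * se_diff
    return {"p_value": p_value, "diff": p_b - p_a,
            "ci": (p_b - p_a - margin, p_b - p_a + margin)}

# Placeholder counts, only to show the shape of the output
result = ctr_significance(clicks_a=1200, users_a=10000, clicks_b=1320, users_b=10000)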

Use Cases

  1. Test cache optimization impact on user engagement
  2. Compare SVD vs neural collaborative filtering
  3. Test different recommendation blending strategies
  4. Validate personalization vs trending items

Acceptance Criteria

  • Traffic splitting working (user_id hash-based)
  • Metrics collection integrated with analytics
  • Statistical significance calculator
  • Dashboard showing live experiment results
  • 2+ experiments run successfully

Timeline

  • Week 1: Framework implementation
  • Week 2: Dashboard + metrics integration
  • Week 3: First experiment (cache optimization)

cc @dmitry @jean_gabriel