
Fix: Add sentiment pre-filter to reduce false positives on idioms

Problem

Issue #9 (closed): Customer escalation about false positives on positive idioms.

Example false positive:

"You killed it with that presentation!"

Flagged as: threat (0.72), toxic (0.65)

Should be: allow

Solution

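Implement a sentiment-analysis pre-filter using DistilBERT fine-tuned on SST-2. The pre-filter runs before the toxicity detector and dampens toxicity scores for text that reads as positive.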

How it works

  1. Idiom whitelist (fast path)

    • "killed it", "dying laughing", "sick moves", etc.
    • Instant dampening if matched
  2. Sentiment analysis (slower path)

    • DistilBERT predicts positive/negative
    • If positive score > 0.8: dampen by 70% (multiply by 0.3)
    • Otherwise, if positive score > 0.7: dampen by 40% (multiply by 0.6)
  3. Apply to toxicity scores

    • Multiply raw scores by damping factor
    • Recompute action (allow/review/block)
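
A minimal sketch of the three steps above, not the code in this MR. The 0.8/0.7 thresholds and the 70%/40% dampening come from the description; everything else is an assumption made for illustration: the names `damping_factor`, `apply_damping`, and `decide`, the whitelist damping value `IDIOM_DAMPING`, and the action thresholds `REVIEW_THRESHOLD` and `BLOCK_THRESHOLD`. The sentiment model is passed in as a callable so the slower path only runs on a whitelist miss.

```python
from typing import Callable, Dict

# Only the three idioms named in the description; the real list is longer.
IDIOM_WHITELIST = ("killed it", "dying laughing", "sick moves")

# ASSUMPTION: the description does not give the whitelist damping value.
IDIOM_DAMPING = 0.3

# ASSUMPTION: the action thresholds are not stated in the description.
REVIEW_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.8


def damping_factor(text: str, sentiment_fn: Callable[[str], float]) -> float:
    """Return the multiplier to apply to raw toxicity scores.

    sentiment_fn maps text to the model's positive-class probability.
    """
    lowered = text.lower()
    # Step 1: fast path, skip the model entirely on a whitelist hit.
    if any(idiom in lowered for idiom in IDIOM_WHITELIST):
        return IDIOM_DAMPING
    # Step 2: slower path, ask the sentiment model.
    positive = sentiment_fn(text)
    if positive > 0.8:
        return 0.3  # dampen by 70%
    if positive > 0.7:
        return 0.6  # dampen by 40%
    return 1.0  # no dampening


def apply_damping(raw_scores: Dict[str, float], factor: float) -> Dict[str, float]:
    """Step 3a: multiply every raw score by the damping factor."""
    return {label: score * factor for label, score in raw_scores.items()}


def decide(scores: Dict[str, float]) -> str:
    """Step 3b: recompute the action from the highest remaining score."""
    worst = max(scores.values(), default=0.0)
    if worst >= BLOCK_THRESHOLD:
        return "block"
    if worst >= REVIEW_THRESHOLD:
        return "review"
    return "allow"


if __name__ == "__main__":
    text = "You killed it with that presentation!"
    raw = {"threat": 0.72, "toxic": 0.65}  # scores from the Problem section
    factor = damping_factor(text, sentiment_fn=lambda _t: 0.0)  # stub model
    print(decide(apply_damping(raw, factor)))
```

Under these assumed thresholds the example from the Problem section moves from review to allow once the whitelist damping is applied.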

Results

Test set (500 examples with idioms):

  • False positive rate: 18.4% → 7.2%
  • Relative reduction: 60.9% (11.2 percentage points)
  • False negative rate: 2.1% → 2.3% (acceptable)
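
For reference, the reduction figure is the relative change in false positive rate. A small check of the arithmetic (the helper name `relative_reduction` is illustrative, not part of this MR):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# False positive rate on the 500-example idiom test set, in percent.
fpr_before, fpr_after = 18.4, 7.2
print(f"{relative_reduction(fpr_before, fpr_after):.1%}")  # 60.9%
```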

Latency impact:

  • Idiom whitelist: +2ms
  • Sentiment model: +40ms
  • Total: +42ms (acceptable for quality)

Customer examples fixed:

  • "You killed it!"
  • "I'm dying laughing"
  • "This is sick!" (meaning cool)
  • "Insanely good work"

Architecture

Input text

[Sentiment Pre-Filter]
    ├─ Check idiom whitelist
    ├─ Analyze sentiment (DistilBERT)
    └─ Compute damping factor

[Toxicity Detector]
    ├─ Get raw scores
    └─ Apply sentiment dampening

Final moderation decision
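
The flow above could be wired together roughly as follows. This is a sketch under assumptions, not the MR's actual module layout: `ModerationPipeline` and its three callables are hypothetical names, and the stubs stand in for the real sentiment pre-filter, toxicity detector, and decision rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Scores = Dict[str, float]


@dataclass
class ModerationPipeline:
    """Hypothetical wiring of the two stages in the diagram."""

    damping_fn: Callable[[str], float]    # Sentiment Pre-Filter: text -> factor
    toxicity_fn: Callable[[str], Scores]  # Toxicity Detector: text -> raw scores
    decide_fn: Callable[[Scores], str]    # scores -> allow/review/block

    def moderate(self, text: str) -> str:
        factor = self.damping_fn(text)  # whitelist + sentiment
        raw = self.toxicity_fn(text)    # get raw scores
        damped = {label: score * factor for label, score in raw.items()}
        return self.decide_fn(damped)   # final moderation decision


def stub_decide(scores: Scores) -> str:
    # ASSUMPTION: a single 0.5 review threshold, for illustration only.
    return "review" if max(scores.values(), default=0.0) >= 0.5 else "allow"


if __name__ == "__main__":
    pipeline = ModerationPipeline(
        damping_fn=lambda _text: 0.3,  # stub: pretend the pre-filter fired
        toxicity_fn=lambda _text: {"threat": 0.72, "toxic": 0.65},  # stub
        decide_fn=stub_decide,
    )
    print(pipeline.moderate("You killed it with that presentation!"))
```

Keeping each stage behind a plain callable makes it straightforward to unit test the pre-filter and the detector separately, and to swap in a no-op pre-filter for the control arm of an A/B test.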

Testing

  • Unit tests for sentiment analysis
  • Integration tests with toxicity detector
  • Validation on 500 idiom examples
  • Latency profiling
  • Staging deployment
  • Production A/B test

Deployment Plan

  1. Merge to main
  2. Deploy to staging (3 days)
  3. A/B test 20% production traffic (1 week)
  4. Full rollout if the false positive rate (FPR) improvement exceeds 50%

Related

  • Fixes #9 (closed): false positive escalation
  • Works with #4 (closed): review queue, fewer items to review
  • Complements #5 (closed): fairness, reduces bias from context misunderstanding

/cc @bob_wilson @bill_staples @michael_usanchenko
