FALSE POSITIVE: Customer complaint about legitimate content flagged as toxic

Customer Report

Customer ID: enterprise_customer_001
Date: 2025-01-18
Severity: High - Customer escalation

Issue

Customer reported that the following message was incorrectly flagged as toxic:

"This is absolutely fantastic work! You killed it with this presentation!"

The word "killed" triggered a false positive (threat category, confidence: 0.72).

Context

  • "Killed it" is an idiom meaning "did very well"
  • The model reads "killed" literally and ignores the positive context around it
  • The customer has threatened to churn if this is not fixed

Analysis

  • Current model: BERT fine-tuned on Civil Comments
  • Training data may lack positive idioms with violent words
  • Confidence calibration may need adjustment for idiomatic expressions

Proposed Solution

  1. Add idiom detection preprocessing layer
  2. Augment training data with positive idioms
  3. Context-aware scoring (sentiment + toxicity)
  4. Raise the flagging threshold for cases where high-confidence positive sentiment co-occurs with a violent word
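
A minimal sketch of how items 3 and 4 could fit together, assuming a separate sentiment model that returns a positive-sentiment probability. The names `Scores` and `should_flag` and all three constants are hypothetical placeholders, not values from the current pipeline. They would need tuning on labeled data. The idea is that a confidently positive message must reach a stricter threat score before it is flagged.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; tune on labeled data.
DEFAULT_FLAG_THRESHOLD = 0.5      # assumed normal cutoff for the threat category
POSITIVE_SENTIMENT_CUTOFF = 0.8   # sentiment at or above this counts as "confidently positive"
STRICT_FLAG_THRESHOLD = 0.9       # stricter cutoff applied to confidently positive messages


@dataclass
class Scores:
    threat: float     # threat-category score from the toxicity model, in [0, 1]
    sentiment: float  # positive-sentiment probability from a separate model, in [0, 1]


def should_flag(scores: Scores) -> bool:
    """Flag a message, requiring a higher threat score when sentiment is clearly positive."""
    threshold = DEFAULT_FLAG_THRESHOLD
    if scores.sentiment >= POSITIVE_SENTIMENT_CUTOFF:
        threshold = STRICT_FLAG_THRESHOLD
    return scores.threat >= threshold
```

With the reported threat score of 0.72 and an assumed sentiment score of 0.95, `should_flag(Scores(threat=0.72, sentiment=0.95))` returns `False`. The same threat score with low sentiment is still flagged, and so is a very high threat score regardless of sentiment.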

Action Items

  • Collect more examples of false positives on idioms
  • Create idiom whitelist (temporary fix)
  • Retrain model with idiom-aware data augmentation
  • Add sentiment analysis as additional signal
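
The temporary whitelist could look roughly like the sketch below. The idiom list, the `mask_idioms` name, and the neutral placeholder are all assumptions for illustration. The real list should come from the collected false positives. Replacing the idiom with a neutral phrase before scoring, rather than skipping the message entirely, means the rest of the text is still checked by the model.

```python
import re

# Hypothetical starter list; extend it from the false positives collected above.
POSITIVE_IDIOMS = [
    r"\bkilled it\b",
    r"\bkilling it\b",
    r"\bcrushed it\b",
    r"\bnailed it\b",
    r"\bknocked it out of the park\b",
]
_IDIOM_RE = re.compile("|".join(POSITIVE_IDIOMS), re.IGNORECASE)


def mask_idioms(text: str, placeholder: str = "did great") -> str:
    """Replace whitelisted idioms with a neutral phrase before toxicity scoring."""
    return _IDIOM_RE.sub(placeholder, text)
```

Applied to the reported message, `mask_idioms` turns "You killed it with this presentation!" into "You did great with this presentation!", while a sentence such as "I will kill you" passes through unchanged and is scored as before.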

Priority: High - customer impact
cc: @bob @sabrina_farmer