FALSE POSITIVE: Customer complaint about legitimate content flagged as toxic
Customer Report
Customer ID: enterprise_customer_001
Date: 2025-01-18
Severity: High (customer escalation)
Issue
Customer reported that the following message was incorrectly flagged as toxic:
"This is absolutely fantastic work! You killed it with this presentation!"
The word "killed" triggered a false positive (threat category, confidence: 0.72).
Context
- "You killed it" is an idiom meaning "you did very well"
- The model appears to score violent words such as "killed" literally, without accounting for idiomatic context
- The customer has threatened to churn if this is not fixed
Analysis
- Current model: BERT fine-tuned on Civil Comments
- Training data may lack positive idioms with violent words
- Confidence calibration may need adjustment for idiomatic expressions
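One way to test whether calibration is off specifically for idiomatic expressions is to compute a binned calibration error (in the style of expected calibration error) separately on an idiom slice and on the general evaluation set. This is a minimal sketch: `calibration_error` is a hypothetical helper, `probs` are the model's threat confidences, and `labels` are human judgments (1 = actually toxic).

```python
from typing import Sequence


def calibration_error(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Weighted mean |mean confidence - fraction actually toxic| over
    equal-width confidence bins. probs: predicted P(toxic); labels: 0 or 1."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    error = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)
        frac_toxic = sum(y for _, y in b) / len(b)
        error += len(b) / total * abs(mean_conf - frac_toxic)
    return error
```

A noticeably higher error on the idiom slice than on the general set would support the calibration hypothesis. Confident threat scores on messages humans labeled non-toxic, such as the 0.72 in this report, are exactly what push that error up.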
Proposed Solution
- Add idiom detection preprocessing layer
- Augment training data with positive idioms
- Context-aware scoring (sentiment + toxicity)
- Raise the flagging threshold (require higher threat confidence) when strong positive sentiment co-occurs with a violent word
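The context-aware scoring and threshold-adjustment ideas can be sketched as follows, assuming the toxicity classifier and a separate sentiment model each already return a score per message. The threshold adjustment is implemented here as requiring a higher threat confidence before flagging when the message is strongly positive and contains a violent word. The word list, threshold values, and the `Scores`/`should_flag` names are hypothetical and would need tuning on labeled data.

```python
from dataclasses import dataclass

# Hypothetical: words that often appear in positive idioms ("killed it").
VIOLENT_WORDS = {"kill", "killed", "killing", "crushed", "slayed"}

DEFAULT_THRESHOLD = 0.5           # assumed normal flagging threshold
POSITIVE_CONTEXT_THRESHOLD = 0.9  # assumed stricter bar for positive context
POSITIVE_SENTIMENT_CUTOFF = 0.6   # assumed "strongly positive" sentiment


@dataclass
class Scores:
    threat: float     # toxicity model confidence for "threat", 0..1
    sentiment: float  # sentiment polarity, -1 (negative) .. +1 (positive)


def contains_violent_word(text: str) -> bool:
    """True if any whitespace token (punctuation stripped) is in VIOLENT_WORDS."""
    tokens = {t.strip(".,!?;:\"'").lower() for t in text.split()}
    return not tokens.isdisjoint(VIOLENT_WORDS)


def should_flag(text: str, scores: Scores) -> bool:
    """Flag as a threat, requiring higher confidence when the message is
    strongly positive and contains a violent word (a likely idiom)."""
    threshold = DEFAULT_THRESHOLD
    if scores.sentiment >= POSITIVE_SENTIMENT_CUTOFF and contains_violent_word(text):
        threshold = POSITIVE_CONTEXT_THRESHOLD
    return scores.threat >= threshold


if __name__ == "__main__":
    msg = "This is absolutely fantastic work! You killed it with this presentation!"
    print(should_flag(msg, Scores(threat=0.72, sentiment=0.95)))  # False
    print(should_flag("I will kill you", Scores(threat=0.72, sentiment=-0.8)))  # True
```

With the reported threat confidence of 0.72 and an assumed strongly positive sentiment score, the customer's message falls below the stricter 0.9 bar and is not flagged, while the same confidence on a negative message still is. A keyword gate like this is crude (it also relaxes the bar for threats phrased in positive language), which is why retraining remains the longer-term fix.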
Action Items
- Collect more examples of false positives on idioms
- Create idiom whitelist (temporary fix)
- Retrain model with idiom-aware data augmentation
- Add sentiment analysis as an additional signal
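For the temporary whitelist, one option is to mask known positive idioms before classification rather than skipping classification for the whole message. That way a genuine threat elsewhere in the same message is still scored. This is a minimal sketch: the patterns, the placeholder token, and the function names are hypothetical, and the list would grow from the collected false-positive examples.

```python
import re

# Hypothetical temporary whitelist: positive idioms that contain violent words.
IDIOM_PATTERNS = [
    r"\bkill(?:ed|ing)? it\b",
    r"\bcrush(?:ed|ing)? it\b",
    r"\bslay(?:ed|ing)? it\b",
]
IDIOM_RE = re.compile("|".join(IDIOM_PATTERNS), re.IGNORECASE)


def mask_idioms(text: str, placeholder: str = "[IDIOM]") -> str:
    """Replace whitelisted idioms before the text is sent to the classifier."""
    return IDIOM_RE.sub(placeholder, text)


def matched_idioms(text: str) -> list[str]:
    """Return the whitelisted idioms found, e.g. for logging false-positive candidates."""
    return [m.group(0) for m in IDIOM_RE.finditer(text)]


if __name__ == "__main__":
    print(mask_idioms("You killed it with this presentation!"))
    # You [IDIOM] with this presentation!
    print(mask_idioms("You killed it, but I will kill you"))
    # You [IDIOM], but I will kill you
```

In the second example only the idiom is masked, so "kill you" still reaches the model.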
Priority: High (customer impact)
cc: @bob @sabrina_farmer