FALSE POSITIVE: Customer complaint about legitimate content flagged as toxic

Customer Report

Customer ID: enterprise_customer_001
Date: 2025-01-18
Severity: High - Customer escalation

Issue

Customer reported that the following message was incorrectly flagged as toxic:

"This is absolutely fantastic work! You killed it with this presentation!"

The word "killed" triggered a false positive (threat category, confidence: 0.72).

Context

This is an idiom meaning "did very well"
Model is being too literal with certain words
Customer has threatened to churn if not fixed

Analysis

Current model: BERT fine-tuned on Civil Comments
Training data may lack positive idioms with violent words
Confidence calibration may need adjustment for idiomatic expressions

Proposed Solution

Add idiom detection preprocessing layer
Augment training data with positive idioms
Context-aware scoring (sentiment + toxicity)
Raise the flagging threshold for high-confidence "positive sentiment + violent word" cases, so they need a higher toxicity score before being flagged
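
A minimal sketch of how the context-aware scoring idea could work, assuming the toxicity model and a separate sentiment model each return a score in [0, 1]. Every name and number here (BASE_THRESHOLD, RELAXED_THRESHOLD, POSITIVE_SENTIMENT_MIN, VIOLENT_WORDS, should_flag) is hypothetical and not taken from the production system. The sentiment score in the usage example is also made up for illustration.

```python
import re

# Hypothetical values for illustration; none come from the production system.
BASE_THRESHOLD = 0.70          # assumed default flagging threshold
RELAXED_THRESHOLD = 0.90       # assumed stricter bar for positive idiomatic use
POSITIVE_SENTIMENT_MIN = 0.80  # assumed cut-off for "clearly positive"
VIOLENT_WORDS = {"killed", "crushed", "slayed", "destroyed", "nailed"}


def contains_violent_word(text: str) -> bool:
    """Return True if any token in the text is in the violent-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok in VIOLENT_WORDS for tok in tokens)


def should_flag(text: str, toxicity: float, sentiment: float) -> bool:
    """Decide whether to flag, combining toxicity with a sentiment signal.

    When the message is clearly positive and contains a violent word,
    require a higher toxicity score before flagging.
    """
    threshold = BASE_THRESHOLD
    if sentiment >= POSITIVE_SENTIMENT_MIN and contains_violent_word(text):
        threshold = RELAXED_THRESHOLD
    return toxicity >= threshold


if __name__ == "__main__":
    msg = "This is absolutely fantastic work! You killed it with this presentation!"
    # 0.72 is the reported toxicity confidence; 0.95 is an assumed sentiment score.
    print(should_flag(msg, toxicity=0.72, sentiment=0.95))
```

With these assumed values, the reported message (toxicity 0.72) would no longer be flagged, while the same toxicity score paired with neutral or negative sentiment still would be. Genuine threats scoring above the relaxed threshold are flagged regardless of sentiment.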

Action Items

Collect more examples of false positives on idioms
Create idiom whitelist (temporary fix)
Retrain model with idiom-aware data augmentation
Add sentiment analysis as additional signal
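
The temporary idiom whitelist could be sketched as a post-processing step on the model's decision. This is an assumption about how it might be wired in, not the existing pipeline: IDIOM_PATTERNS, matches_idiom_whitelist, and apply_whitelist are hypothetical names, and the pattern list is a small illustrative sample that would need to be built from the collected false positives.

```python
import re

# Hypothetical sample patterns; the real list should come from collected
# false positives, not from this sketch.
IDIOM_PATTERNS = [
    re.compile(r"\b(?:killed|crushed|nailed|slayed)\s+it\b", re.IGNORECASE),
    re.compile(r"\bknocked\s+it\s+out\s+of\s+the\s+park\b", re.IGNORECASE),
]


def matches_idiom_whitelist(text: str) -> bool:
    """Return True if the text contains a known positive idiom."""
    return any(pattern.search(text) for pattern in IDIOM_PATTERNS)


def apply_whitelist(text: str, category: str, flagged: bool) -> bool:
    """Suppress a 'threat' flag when the text matches a whitelisted idiom.

    Flags in other categories, and texts with no idiom match, pass through
    unchanged.
    """
    if flagged and category == "threat" and matches_idiom_whitelist(text):
        return False
    return flagged


if __name__ == "__main__":
    msg = "This is absolutely fantastic work! You killed it with this presentation!"
    print(apply_whitelist(msg, category="threat", flagged=True))
```

A pattern match alone cannot tell an idiom from a literal statement ("he killed it" about an animal, for example), which is one reason to treat this as a stopgap until the retrained model and the sentiment signal are in place.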

Priority: High - customer impact
cc: @bob @sabrina_farmer