Implement text toxicity detection model

Overview

Implement an ML model to detect toxic, abusive, and harmful text content.

Model Architecture

  • BERT-based transformer model
  • Fine-tuned on toxicity datasets (Civil Comments, Jigsaw)
  • Multi-label classification across six categories: toxic, severe_toxic, obscene, threat, insult, identity_hate (see the loading sketch below)
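
A minimal loading sketch for the head described above, assuming HuggingFace Transformers with `bert-base-uncased` as the base checkpoint (the issue only specifies "BERT-based"); the label set is taken from the list above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Base checkpoint is an assumption; the issue only says "BERT-based".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid per label, BCE loss
)

def predict(texts: list[str]) -> list[dict[str, float]]:
    """Per-label probabilities for a batch of texts (scores are independent, not softmaxed)."""
    inputs = tokenizer(texts, truncation=True, padding=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)
    return [dict(zip(LABELS, row.tolist())) for row in probs]
```

Each text gets an independent score per category, matching the multi-label requirement; per-label decision thresholds remain a tuning decision.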

Performance Requirements

  • Accuracy: >92% on the held-out test set
  • Inference latency: <100 ms (p95)
  • False positive rate: <5% (see the evaluation sketch below)
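
The issue does not pin down how accuracy and false positive rate are computed for a multi-label model; the sketch below assumes per-label accuracy averaged over the six categories and FPR measured over all negative (text, label) pairs, both at a fixed 0.5 threshold.

```python
import numpy as np

def evaluate(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict[str, float]:
    """probs and labels are (n_samples, 6) arrays: sigmoid outputs and 0/1 ground truth."""
    preds = (probs >= threshold).astype(int)
    accuracy = float((preds == labels).mean())      # per-label accuracy, averaged over categories
    negatives = labels == 0
    fpr = float((preds[negatives] == 1).mean())     # share of clean (text, label) pairs flagged
    return {"accuracy": accuracy, "false_positive_rate": fpr}
```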

Tech Stack

  • Python 3.11+
  • PyTorch / HuggingFace Transformers
  • ONNX for optimized inference (see the export sketch below)
  • TensorRT for GPU acceleration
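
One possible export path for the ONNX step, assuming the fine-tuned checkpoint has been saved locally under the hypothetical name `toxicity-bert`; TensorRT conversion or onnxruntime serving would follow this export.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("toxicity-bert")  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained("toxicity-bert")
model.config.return_dict = False  # export a plain tuple of outputs instead of a ModelOutput
model.eval()

sample = tokenizer("example input", return_tensors="pt")
torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "toxicity_model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=17,
)
```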

Training Data

  • 500K+ labeled examples (see the fine-tuning sketch below)
  • Balanced across toxicity categories
  • Regular retraining with new data
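
A minimal fine-tuning loop over that data, reusing `model` from the architecture sketch and a hypothetical `train_dataset` whose items carry `input_ids`, `attention_mask`, and a 6-dimensional float `labels` vector built from the Civil Comments / Jigsaw exports; batch size, learning rate, and epoch count are placeholders.

```python
import torch
from torch.utils.data import DataLoader

loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # train_dataset: assumed prepared elsewhere
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.BCEWithLogitsLoss()  # independent sigmoid per toxicity category

model.train()
for epoch in range(3):  # epoch count is a placeholder
    for batch in loader:
        optimizer.zero_grad()
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        loss = loss_fn(logits, batch["labels"].float())
        loss.backward()
        optimizer.step()
```

The same loop can back the regular retraining runs, with the dataset refreshed from newly labeled data.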

Acceptance Criteria

  • Model accuracy >92%
  • API response time <100 ms at p95 (see the latency sketch below)
  • Support for 10+ languages
  • Bias audit completed
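
One way to sanity-check the p95 latency figure locally, assuming a `predict_fn` callable like the one in the architecture sketch; the acceptance check itself should be measured against the deployed API rather than an in-process call.

```python
import time
import numpy as np

def p95_latency_ms(predict_fn, texts: list[str], warmup: int = 10, runs: int = 200) -> float:
    """Rough p95 wall-clock latency (ms) for single-text inference."""
    for _ in range(warmup):
        predict_fn([texts[0]])
    timings = []
    for i in range(runs):
        start = time.perf_counter()
        predict_fn([texts[i % len(texts)]])
        timings.append((time.perf_counter() - start) * 1000.0)
    return float(np.percentile(timings, 95))
```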