Implement text toxicity detection model

Overview

Implement an ML model to detect toxic, abusive, and harmful text content.

Model Architecture

  • BERT-based transformer model
  • Fine-tuned on toxicity datasets (Civil Comments, Jigsaw)
  • Multi-label classification across six categories: toxic, severe_toxic, obscene, threat, insult, identity_hate (see the loading sketch below)
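
A minimal loading sketch for the head described above, assuming HuggingFace Transformers with `bert-base-uncased` as the base checkpoint (the issue only specifies "BERT-based"); the label set is taken from the list above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Base checkpoint is an assumption; the issue only says "BERT-based".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid per label, BCE loss
)

def predict(texts: list[str]) -> list[dict[str, float]]:
    """Per-label probabilities for a batch of texts (scores are independent, not softmaxed)."""
    inputs = tokenizer(texts, truncation=True, padding=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)
    return [dict(zip(LABELS, row.tolist())) for row in probs]
```

Each text gets an independent score per category, matching the multi-label requirement; per-label decision thresholds remain a tuning decision.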

Performance Requirements

  • Accuracy: >92% on the held-out test set
  • Inference latency: <100 ms (p95)
  • False positive rate: <5% (see the evaluation sketch below)
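
The issue does not pin down how accuracy and false positive rate are computed for a multi-label model; the sketch below assumes per-label accuracy averaged over the six categories and FPR measured over all negative (text, label) pairs, both at a fixed 0.5 threshold.

```python
import numpy as np

def evaluate(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict[str, float]:
    """probs and labels are (n_samples, 6) arrays: sigmoid outputs and 0/1 ground truth."""
    preds = (probs >= threshold).astype(int)
    accuracy = float((preds == labels).mean())      # per-label accuracy, averaged over categories
    negatives = labels == 0
    fpr = float((preds[negatives] == 1).mean())     # share of clean (text, label) pairs flagged
    return {"accuracy": accuracy, "false_positive_rate": fpr}
```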

Tech Stack

  • Python 3.11+
  • PyTorch / HuggingFace Transformers
  • ONNX for optimized inference (see the export sketch below)
  • TensorRT for GPU acceleration
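
One possible export path for the ONNX step, assuming the fine-tuned checkpoint has been saved locally under the hypothetical name `toxicity-bert`; TensorRT conversion or onnxruntime serving would follow this export.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("toxicity-bert")  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained("toxicity-bert")
model.config.return_dict = False  # export a plain tuple of outputs instead of a ModelOutput
model.eval()

sample = tokenizer("example input", return_tensors="pt")
torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "toxicity_model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=17,
)
```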

Training Data

  • 500K+ labeled examples (see the fine-tuning sketch below)
  • Balanced across toxicity categories
  • Regular retraining with new data
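
A minimal fine-tuning loop over that data, reusing `model` from the architecture sketch and a hypothetical `train_dataset` whose items carry `input_ids`, `attention_mask`, and a 6-dimensional float `labels` vector built from the Civil Comments / Jigsaw exports; batch size, learning rate, and epoch count are placeholders.

```python
import torch
from torch.utils.data import DataLoader

loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # train_dataset: assumed prepared elsewhere
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.BCEWithLogitsLoss()  # independent sigmoid per toxicity category

model.train()
for epoch in range(3):  # epoch count is a placeholder
    for batch in loader:
        optimizer.zero_grad()
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        loss = loss_fn(logits, batch["labels"].float())
        loss.backward()
        optimizer.step()
```

The same loop can back the regular retraining runs, with the dataset refreshed from newly labeled data.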

Acceptance Criteria

  • Model accuracy >92%
  • API response time <100 ms at p95 (see the latency sketch below)
  • Support for 10+ languages
  • Bias audit completed
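
One way to sanity-check the p95 latency figure locally, assuming a `predict_fn` callable like the one in the architecture sketch; the acceptance check itself should be measured against the deployed API rather than an in-process call.

```python
import time
import numpy as np

def p95_latency_ms(predict_fn, texts: list[str], warmup: int = 10, runs: int = 200) -> float:
    """Rough p95 wall-clock latency (ms) for single-text inference."""
    for _ in range(warmup):
        predict_fn([texts[0]])
    timings = []
    for i in range(runs):
        start = time.perf_counter()
        predict_fn([texts[i % len(texts)]])
        timings.append((time.perf_counter() - start) * 1000.0)
    return float(np.percentile(timings, 95))
```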