Scandi Fine Web Cleaner

Developed by davanstrien
This model is a demonstration classifier designed to identify problematic content (wrong language, garbled text) in Danish and Swedish web pages.
Downloads 42
Release Time: 1/10/2025

Model Overview

This model was fine-tuned from XLM-RoBERTa-base on the FineWeb-c dataset. It serves as a preliminary filter for web text to improve annotation efficiency.

Model Features

High precision: Achieves 95.2% precision, meaning fewer false positives
Bilingual support: Specifically optimized for Danish and Swedish content
Web text filtering: Designed as a preliminary filter to enhance web data annotation efficiency
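
To make the precision figure concrete, here is a minimal sketch of how precision is computed. It assumes the positive class is "flagged as problematic", which this page does not state explicitly. The counts in the example are hypothetical and chosen only to reproduce the stated 95.2%; they are not the model's actual evaluation numbers.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that were flagged correctly: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing was flagged")
    return true_positives / flagged


# Hypothetical counts (not from the model's evaluation):
# 952 correct flags and 48 incorrect flags out of 1000 flagged pages.
print(f"{precision(952, 48):.1%}")  # 95.2%
```

High precision says nothing about how many problematic pages are missed; that is measured by recall, which this page does not report.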

Model Capabilities

Identify wrong language content
Detect garbled text
Web text classification
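
The capabilities above could be exercised with the Hugging Face transformers text-classification pipeline. This is a rough sketch, not the author's documented usage: the repository id in MODEL_ID is an assumption (this page does not give it), and the label names the model returns should be checked against its configuration.

```python
from typing import Callable, Dict, List

# Assumption: the Hub repository id is not stated on this page.
MODEL_ID = "davanstrien/scandi-fineweb-cleaner"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Return one {'label': ..., 'score': ...} dict per input text."""
    # truncation=True keeps long web pages within the model's input limit.
    return classifier(texts, truncation=True)
```

A typical call would be classify(["Dette er en dansk tekst."], load_classifier()), which requires transformers and a backend such as PyTorch to be installed.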

Use Cases

Data cleaning
Web data preprocessing: Filter low-quality content before data annotation. Improves annotation efficiency and quality.
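
The preprocessing use case can be sketched as a function that splits documents into those passed on to annotators and those set aside. It accepts any callable that behaves like a transformers text-classification pipeline (one string in, a list with one label/score dict out). The label name "problematic" and the 0.5 threshold are hypothetical placeholders, not values taken from the model.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: hypothetical label name; check the model's config for the real one.
PROBLEM_LABEL = "problematic"


def split_for_annotation(
    docs: Iterable[str],
    classifier: Callable,
    problem_label: str = PROBLEM_LABEL,
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Return (keep, dropped): documents to annotate and documents filtered out."""
    keep: List[str] = []
    dropped: List[str] = []
    for doc in docs:
        # A pipeline called on a single string returns a one-element list.
        pred = classifier(doc)[0]
        is_problem = pred["label"] == problem_label and pred["score"] >= threshold
        (dropped if is_problem else keep).append(doc)
    return keep, dropped
```

Returning the dropped documents rather than discarding them makes it possible to spot-check what the filter removed, which matters because a demonstration classifier will make mistakes.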