Math Fasttext Classifier
A fastText-based classifier that labels text as mathematical or non-mathematical, suitable for organizing LLM pre-training data.
Downloads 124
Release Time: 2/25/2025
Model Overview
This model is an efficient fastText classifier designed to identify mathematical content. It was trained on a balanced dataset of 1.6 million records and achieves an F1 score of 0.99 on the test set, which makes it well suited to organizing LLM pre-training data with the aim of improving mathematical capabilities.
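For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation; the confusion counts are hypothetical and chosen only for illustration, not taken from this model's evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 for the positive class from true/false positive and false negative counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of predicted-math documents that are truly math
    recall = tp / (tp + fn)     # share of truly-math documents that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (an assumption for illustration, not reported results).
example_f1 = f1_score(tp=990, fp=10, fn=10)
print(f"F1 = {example_f1:.2f}")
```

With equal precision and recall of 0.99, the harmonic mean is also approximately 0.99.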
Model Features
High-Performance Classification
Achieves an F1 score of 0.99 on the test set, accurately distinguishing between mathematical and non-mathematical content
Ultra-Fast Processing
Capable of processing approximately 2000 documents per second on CPU
Designed for Data Organization
Built for organizing LLM pre-training data, in particular for selecting text that strengthens mathematical capabilities
Balanced Dataset
Trained on a balanced dataset with a 50:50 ratio of mathematical and non-mathematical content
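To put the stated throughput in perspective, the sketch below estimates how long a single CPU process would need for a corpus of a given size. It takes the figure of approximately 2000 documents per second at face value; the corpus size is a hypothetical example, and real throughput will vary with document length and hardware.

```python
def estimated_hours(num_docs: int, docs_per_second: float = 2000.0) -> float:
    """Estimate wall-clock hours to classify num_docs at a fixed throughput."""
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    seconds = num_docs / docs_per_second
    return seconds / 3600.0


# Hypothetical corpus size (an assumption for illustration).
corpus_size = 100_000_000
print(f"{corpus_size:,} documents: about {estimated_hours(corpus_size):.1f} hours")
```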
Model Capabilities
Text Classification
Mathematical Content Recognition
High-Speed Text Processing
Use Cases
LLM Pre-training
Mathematical Capability Enhancement
Used to filter LLM pre-training data for mathematical content and to increase its share in the training mix
Helps improve an LLM's mathematical reasoning capabilities, as demonstrated by Qwen2.5-Math
Content Filtering
Mathematical Content Screening
Quickly identifies mathematical content from large volumes of text
Efficiently separates mathematical and non-mathematical content
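The screening use case above could be sketched as follows. This is a minimal example under stated assumptions, not the model's documented usage: the label name `__label__math`, the 0.5 threshold, and the model file path are all hypothetical, so check the labels the released model actually emits. The only fastText calls used are `fasttext.load_model` and `model.predict`, and the import is deferred so the filtering logic can be tried with any predictor function.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A predictor maps one document to (top label, probability).
Predictor = Callable[[str], Tuple[str, float]]


def make_fasttext_predictor(model_path: str) -> Predictor:
    """Wrap a fastText model file as a Predictor (model_path is hypothetical)."""
    import fasttext  # imported lazily; requires the fasttext package

    model = fasttext.load_model(model_path)

    def predict(text: str) -> Tuple[str, float]:
        # fastText predicts one line at a time, so newlines are replaced first.
        labels, probs = model.predict(text.replace("\n", " "), k=1)
        return labels[0], float(probs[0])

    return predict


def filter_math(
    docs: Iterable[str],
    predict: Predictor,
    math_label: str = "__label__math",  # assumed label name
    threshold: float = 0.5,             # assumed confidence cut-off
) -> Iterator[str]:
    """Yield only the documents classified as mathematical with enough confidence."""
    for doc in docs:
        label, prob = predict(doc)
        if label == math_label and prob >= threshold:
            yield doc
```

A typical call would be `list(filter_math(documents, make_fasttext_predictor("model.bin")))`, where `model.bin` stands in for the downloaded model file. Raising the threshold trades recall for precision in the retained set.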