LLM Data Textbook Quality Fasttext Classifier V2

Developed by kenhktsui
This is an educational value classifier built on fasttext. It estimates whether web text has high educational value and is intended for filtering pre-training data for large language models (LLMs).
Downloads 3,651
Release Time: 5/19/2024

Model Overview

The classifier assigns text one of three educational value levels (high, medium, or low), which makes it particularly useful for quality filtering of LLM training data.

Model Features

Efficient CPU Inference
Built on fasttext, it classifies over 2,000 samples per second on a CPU, making it suitable for real-time use.
Three-Level Educational Value Assessment
Provides high, medium, and low educational value levels, a more granular evaluation than binary classification.
Quantized Model Support
Includes a quantized model version (model_quantized.bin) for more efficient inference.
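
The three-level output can be collapsed into a single number for downstream filtering. The following is a minimal sketch, not the author's published code. It assumes the model emits fasttext-style labels named `__label__Low`, `__label__Mid`, and `__label__High` (these names are an assumption; check them with `model.labels`) and weights them 0, 1, and 2. The helper `load_predictor` and its default file path are illustrative; `fasttext.load_model` and `model.predict` are the standard fasttext Python calls.

```python
from typing import Callable, Dict, Sequence, Tuple

# Assumed label names and weights; verify against model.labels before use.
LABEL_WEIGHTS: Dict[str, float] = {
    "__label__Low": 0.0,
    "__label__Mid": 1.0,
    "__label__High": 2.0,
}

Prediction = Tuple[Sequence[str], Sequence[float]]


def expected_score(labels: Sequence[str], probs: Sequence[float]) -> float:
    """Probability-weighted educational value in the range [0, 2]."""
    return float(
        sum(LABEL_WEIGHTS.get(label, 0.0) * float(p) for label, p in zip(labels, probs))
    )


def load_predictor(model_path: str = "model_quantized.bin") -> Callable[[str], Prediction]:
    """Wrap a local fasttext model file as a text -> (labels, probs) function."""
    import fasttext  # imported lazily so the scoring helper works without it

    model = fasttext.load_model(model_path)

    def predict(text: str) -> Prediction:
        # fasttext predicts one line at a time, so newlines must be removed.
        return model.predict(text.replace("\n", " "), k=-1)

    return predict


if __name__ == "__main__":
    # Hand-written probabilities standing in for a real model prediction.
    labels = ("__label__High", "__label__Mid", "__label__Low")
    probs = (0.7, 0.2, 0.1)
    print(expected_score(labels, probs))
```

With a downloaded model file, `expected_score(*load_predictor(path)(text))` would give the score for one document. A scalar score keeps more information than the top label alone, since a document split evenly between two levels lands between them.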

Model Capabilities

Text Classification
Educational Value Assessment
Data Quality Filtering

Use Cases

LLM Training Data Filtering
Pre-training Data Filtering
Selects high-quality educational text before LLM pre-training, which improves training data quality and can enhance model performance.
Educational Content Analysis
Textbook Content Evaluation
Assesses the educational value level of different educational materials, which helps identify high-quality educational content.
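
The filtering and evaluation use cases above can be sketched as follows. This is an illustration under stated assumptions: `score_fn` is any function that maps text to an educational value score (for example, one derived from the classifier's label probabilities), and the threshold of 1.0 and the keyword-based `toy_score` are placeholders for the demo, not values or logic recommended by the model author.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_score(
    docs: Iterable[str], score_fn: Callable[[str], float], threshold: float
) -> List[str]:
    """Keep documents whose score is at least `threshold`."""
    return [doc for doc in docs if score_fn(doc) >= threshold]


def rank_by_score(
    docs: Iterable[str], score_fn: Callable[[str], float]
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, highest score first."""
    scored = [(score_fn(doc), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(text: str) -> float:
    """Placeholder scorer for the demo only; swap in the real classifier."""
    keywords = ("theorem", "definition", "example")
    return float(sum(word in text.lower() for word in keywords))


if __name__ == "__main__":
    corpus = [
        "Definition: a prime number has exactly two divisors. Example: 7.",
        "Click here to win a free phone!!!",
        "Theorem 1 follows from the definition above.",
    ]
    kept = filter_by_score(corpus, toy_score, threshold=1.0)
    print(len(kept), "of", len(corpus), "documents kept")
    for score, doc in rank_by_score(corpus, toy_score):
        print(f"{score:.1f}  {doc}")
```

Passing the scorer as a parameter keeps the filtering logic independent of the model, so the threshold can be tuned on a labeled sample before it is applied to a full corpus.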