LLM Data Textbook Quality Fasttext Classifier V1

Developed by kenhktsui
A fastText-based text classifier that judges whether a text is of textbook-level quality. It is intended as a data filter for large language model training.
Downloads 35
Release Time: 4/28/2024

Model Overview

This model is an optimized version of llm-data-textbook-quality-classifier-v1. It achieves a higher F1 score than that model and can classify more than 2,000 samples per second on a CPU.
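The throughput figure above can be checked on your own hardware with a simple timing loop. The sketch below is only an illustration: `measure_throughput` and `stub_classify` are hypothetical names, and the stub stands in for the real classifier so the example runs without downloading anything. Swap in a call to the actual model to get a meaningful number.

```python
import time
from typing import Callable, Iterable


def measure_throughput(classify: Callable[[str], str], texts: Iterable[str]) -> float:
    """Return samples classified per second in a single-threaded loop."""
    batch = list(texts)
    start = time.perf_counter()
    for text in batch:
        classify(text)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(batch) / elapsed if elapsed > 0 else float("inf")


def stub_classify(text: str) -> str:
    """Hypothetical stand-in for the real model; NOT the actual classifier."""
    return "high" if len(text.split()) > 5 else "low"


samples = ["An example sentence about photosynthesis in green plants."] * 10_000
rate = measure_throughput(stub_classify, samples)
print(f"{rate:,.0f} samples/second")
```

Results depend on text length and CPU, so a measurement on short sentences will not be comparable to one on full documents.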

Model Features

High performance
Classifies more than 2,000 samples per second on a CPU, which makes it suitable for large-scale data processing.
High accuracy
Achieves an F1 score of 0.8695 on the training set and 0.8485 on the test set.
Textbook-level quality detection
Optimized specifically for detecting textbook-level quality, so it can be used to select high-quality training data.
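For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for a single class from confusion counts. The counts are made up for illustration, and this page does not say whether the reported scores are per-class, macro-averaged, or weighted, so the example does not reproduce the reported numbers.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only; not taken from this model's evaluation.
example = f1_score(tp=80, fp=10, fn=20)
print(round(example, 4))
```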

Model Capabilities

Text quality classification
Data filtering
Large-scale text processing

Use Cases

Data preprocessing
Large language model training data filtering
Use this model to select textbook-quality data before training large language models.
Intended to improve training effectiveness and generation quality
Content quality assessment
Educational content quality assessment
Evaluate whether educational texts meet textbook-level quality standards.
Helps identify high-quality educational content
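Both use cases reduce to scoring each text and keeping those above a cutoff. The sketch below shows that pattern under stated assumptions: `filter_by_quality`, `stub_score`, and the 0.5 threshold are hypothetical, and the stub is a toy heuristic standing in for the real model. With the fastText Python bindings, a scorer would typically wrap `model.predict(...)`; the label names this model emits are not given on this page, so that mapping is left out.

```python
from typing import Callable, Iterable, Iterator


def filter_by_quality(
    texts: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff; tune on your own data
) -> Iterator[str]:
    """Yield the texts whose quality score is at or above the threshold."""
    for text in texts:
        # Collapse newlines and repeated spaces into single spaces;
        # fastText's predict expects single-line input.
        cleaned = " ".join(text.split())
        if cleaned and score(cleaned) >= threshold:
            yield text


def stub_score(text: str) -> float:
    """Toy heuristic standing in for the model; NOT the real classifier."""
    return 1.0 if text.endswith(".") and len(text.split()) >= 8 else 0.0


corpus = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "click here!!! best deals",
    "",
]
kept = list(filter_by_quality(corpus, stub_score))
print(len(kept), "of", len(corpus), "texts kept")
```

Passing the scorer in as a function keeps the filtering loop independent of how the model is loaded, so the same code works for a stub in tests and for the real classifier in a pipeline.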