Fineweb Edu Classifier

Developed by HuggingFaceFW
A classifier that assesses the educational value of web pages, trained on annotated FineWeb samples and used to filter for high-quality educational content
Downloads 150.77k
Release Time: 5/6/2024

Model Overview

This model evaluates the educational value of web page content. Trained on 450,000 web page samples annotated by Llama3-70B-instruct, it scores each page on a 0-5 scale so that high-quality educational content can be selected.
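A minimal sketch of how such a scorer might be called. It assumes the checkpoint is available on the Hugging Face Hub under the ID `HuggingFaceFW/fineweb-edu-classifier` and loads through `transformers` as a sequence-classification model with a single regression output; neither detail is stated above. The helper names `to_int_score` and `score_text` are hypothetical.

```python
def to_int_score(raw_score: float) -> int:
    """Clamp a raw regression output to [0, 5] and round to the nearest integer."""
    return int(round(max(0.0, min(raw_score, 5.0))))


def score_text(text: str, model_id: str = "HuggingFaceFW/fineweb-edu-classifier") -> dict:
    """Score one text. Downloads the model on first use (needs network access).

    The default model_id is an assumption, not taken from this page.
    """
    # Imported lazily so to_int_score can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # assumed shape: (1, 1)
    raw = logits.squeeze(-1).float().item()
    return {"text": text, "score": raw, "int_score": to_int_score(raw)}
```

Calling `score_text("Photosynthesis converts light into chemical energy.")` would return the raw score together with its clamped integer value. The clamp is there because a regression output can fall slightly outside the 0-5 range.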

Model Features

High-Quality Training Data
Trained on 450,000 web page samples annotated by Llama3-70B-instruct
Precise Scoring System
Provides a detailed 0-5 scoring system, where 0 indicates no educational value and 5 indicates extremely high educational value
Optimized Training Strategy
The embedding and encoder layers are frozen so that training optimizes only the classification head; the model was trained for 20 epochs with a learning rate of 3e-4
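The freezing strategy described above can be sketched as follows. This is an illustration, not the authors' training code: it assumes a PyTorch-style model whose head parameters are named with the prefix `classifier` (as in many `transformers` sequence-classification models), and the function name `freeze_all_but_head` is hypothetical.

```python
def freeze_all_but_head(model, head_prefix: str = "classifier") -> list:
    """Disable gradients for every parameter outside the classification head.

    Works with any object exposing a PyTorch-style named_parameters() method.
    Returns the names of the parameters that remain trainable.
    head_prefix is an assumption about how the head is named.
    """
    trainable = []
    for name, param in model.named_parameters():
        is_head = name.startswith(head_prefix)
        # Frozen parameters receive no gradient updates during training.
        param.requires_grad = is_head
        if is_head:
            trainable.append(name)
    return trainable
```

After this call, only the parameters with `requires_grad` still set would be handed to the optimizer, with the learning rate set to 3e-4 as stated above. Training only the head keeps the number of updated weights small, which makes 20 epochs over the annotated samples comparatively cheap.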

Model Capabilities

Webpage Content Educational Value Assessment
Educational Content Quality Scoring
Educational Content Filtering

Use Cases

Educational Content Filtering
Building Educational Datasets
Used successfully to construct the FineWeb-Edu educational dataset
Webpage Content Quality Assessment
Evaluating the educational value of webpage content
Achieved an F1 score of 82% when scores are converted to a binary educational / non-educational label at a threshold of 3
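To make the thresholded metric concrete, the sketch below turns 0-5 scores into binary labels and computes F1 from them. It assumes that a score at or above the threshold counts as educational; the page does not say whether the comparison is inclusive. The function names and the toy scores are invented for illustration and are not evaluation data.

```python
def binarize(scores, threshold=3):
    """Map 0-5 scores to 1 (educational) or 0, treating score >= threshold as positive."""
    return [1 if s >= threshold else 0 for s in scores]


def binary_f1(y_true, y_pred):
    """F1 for the positive class, computed from true/false positives and negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positives, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up scores (annotator labels vs. classifier outputs).
annotated = [0, 1, 3, 4, 5, 2]
predicted = [1, 3, 3, 4, 2, 2]
f1 = binary_f1(binarize(annotated), binarize(predicted))
```

The same `binarize` step is what a filtering pipeline would apply when deciding which pages to keep.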