FormatClassifier

Developed by WebOrganizer
The FormatClassifier model categorizes web content into 24 classes based on URL and text content.
Downloads 2,429
Release Time: 2/10/2025

Model Overview

This model is fine-tuned from gte-base-en-v1.5 for 24-class format classification of web content, suitable for content organization and data preprocessing tasks.

Model Features

Multi-stage Training
Trained in two stages, on data annotated first by Llama-3.1-8B and then by Llama-3.1-405B-FP8
URL-aware Classification
Uses both the page URL and its text content as input for classification
Efficient Inference
Supports xformers memory-efficient attention for faster, lower-memory inference
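
The features above can be sketched as a short loading and inference routine with Hugging Face transformers. This is a minimal sketch under stated assumptions, not the authors' reference code: the Hub ID `WebOrganizer/FormatClassifier`, the input layout (URL, blank line, then page text), and the `use_memory_efficient_attention` keyword are assumptions that should be checked against the upstream model card. The helper names `build_input` and `classify` are hypothetical.

```python
def build_input(url: str, text: str) -> str:
    """Join URL and page text into one string: URL first, then a blank line.

    Assumption: the classifier expects this layout; verify upstream.
    """
    return f"{url.strip()}\n\n{text.strip()}"


def classify(pages, model_id="WebOrganizer/FormatClassifier"):
    """Return (predicted class ids, probability rows) for formatted pages.

    `model_id` is an assumed Hub ID. Heavy imports are done lazily so that
    build_input stays usable without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        trust_remote_code=True,  # the gte-base-en-v1.5 backbone ships custom code
        # Assumed config flag; turning it on is expected to need xformers and a GPU.
        use_memory_efficient_attention=False,
    )
    model.eval()

    inputs = tokenizer(pages, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    probs = logits.softmax(dim=-1)  # one row of 24 class probabilities per page
    return probs.argmax(dim=-1).tolist(), probs.tolist()
```

A call might look like `classify([build_input("http://www.example.com", "How to build a computer from scratch? ...")])`. The returned ids are indices into the 24 format categories; the mapping from index to category name has to come from the model's own configuration.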

Model Capabilities

Web Content Classification
Text Format Recognition
URL Analysis

Use Cases

Content Management
Web Content Archiving
Automatically classifies web content into 24 predefined format categories
Improves content organization efficiency
Data Preprocessing
Provides format labels for downstream tasks (e.g., search, recommendation)
May improve downstream task performance
Information Filtering
Spam Ad Detection
Identifies and filters spam and advertisement content
Relies on the spam/advertisement class among the 24 format categories
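
As a rough illustration of the preprocessing and filtering use cases above, the following sketch attaches a format label to each page and separates out pages whose spam-class probability exceeds a threshold. It works on plain lists of probabilities, so it does not depend on any particular model API. The function names, the `spam_index` parameter, and the 0.5 default threshold are hypothetical; the actual index of the spam/advertisement class must be looked up in the model's label mapping.

```python
def annotate(pages, probs):
    """Attach the most likely format id and its probability to each page."""
    records = []
    for page, row in zip(pages, probs):
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        records.append({"page": page, "format_id": best, "confidence": row[best]})
    return records


def filter_spam(pages, probs, spam_index, threshold=0.5):
    """Split pages into (kept, dropped) by the probability of the spam class.

    `spam_index` is supplied by the caller; it is not hard-coded here because
    the class ordering is an assumption that needs checking upstream.
    """
    kept, dropped = [], []
    for page, row in zip(pages, probs):
        (dropped if row[spam_index] >= threshold else kept).append(page)
    return kept, dropped
```

Thresholding on the class probability, rather than on the argmax alone, lets the caller trade recall against precision when discarding pages.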