HTML Pruner Phi 3.8B

Developed by zstanjj
An HTML pruning model designed for RAG systems where HTML is more suitable than plain text for modeling retrieval results
Downloads 319
Release Time: 10/16/2024

Model Overview

This model specializes in processing HTML-formatted retrieval results, optimizing knowledge retrieval efficiency in RAG systems through lossless HTML cleaning and two-step HTML pruning based on block trees.

Model Features

Lossless HTML Cleaning
Removes only completely irrelevant content and compresses redundant structures, preserving all semantic information in the original HTML.
Two-Step HTML Pruning Based on Block Trees
The first step uses an embedding model to calculate block scores; the second step uses a path generation model to prune the HTML efficiently.
HTML Format Optimization
Specifically optimizes HTML-formatted retrieval results for RAG systems to improve knowledge retrieval efficiency.
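
To make the cleaning idea concrete, here is a minimal sketch using only Python's standard library. It is not the model's actual cleaning code: the set of dropped tags, the `HtmlCleaner` class, and the `clean_html` function are assumptions made for illustration. It drops script and style content, comments, and tag attributes, and keeps the tag structure and visible text.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumption: tags whose content carries no meaning for a reader or an LLM.
DROP_TAGS = {"script", "style", "noscript"}


class HtmlCleaner(HTMLParser):
    """Hypothetical cleaner: keeps tags and text, drops attributes and noise."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped tag

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse whitespace runs; whitespace-only text is dropped.
        if self.skip_depth == 0 and data.strip():
            self.out.append(escape(re.sub(r"\s+", " ", data), quote=False))

    # Comments are dropped because handle_comment is left as a no-op.


def clean_html(html: str) -> str:
    parser = HtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A real cleaner would need to treat void elements such as `br` and meaningful attributes such as `alt` or `href` more carefully; this sketch ignores both.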

Model Capabilities

HTML Document Cleaning
HTML Content Pruning
Semantic Information Preservation
RAG System Optimization

Use Cases

Information Retrieval
Web Content Simplification
Extracts key information from complex HTML web pages while removing redundant content.
Obtains more concise HTML content that retains semantic meaning.
RAG System Knowledge Formatting
Prepares HTML-formatted external knowledge sources for RAG systems.
Improves retrieval efficiency and accuracy of RAG systems.
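
The score-then-prune idea behind these use cases can be sketched as follows. This is a toy illustration, not the model's method: a bag-of-words cosine similarity stands in for the embedding model, a greedy word budget stands in for the path generation step, and the names `bow`, `cosine`, `prune_blocks`, and `max_words` are all hypothetical. Each block is a `(path, text)` pair.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def prune_blocks(blocks, query, max_words):
    """Keep the best-scoring blocks that fit in max_words, in document order."""
    q = bow(query)
    scores = [cosine(bow(text), q) for _, text in blocks]
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    kept, used = set(), 0
    for i in ranked:
        n_words = len(blocks[i][1].split())
        # Skip blocks with no overlap with the query or that exceed the budget.
        if scores[i] > 0 and used + n_words <= max_words:
            kept.add(i)
            used += n_words
    return [blocks[i] for i in sorted(kept)]
```

Given blocks for a navigation bar, two content paragraphs, and a footer, a query about the page's topic would keep only the paragraphs that share vocabulary with it, provided they fit the budget.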