Msmarco T5 Base V1

Developed by doc2query
T5-based doc2query model for document expansion and training data generation
Downloads 112
Release Time: 3/2/2022

Model Overview

This is a doc2query model based on the T5 architecture. It is used primarily for document expansion and for generating domain-specific training data: given an input text, it generates multiple relevant queries, which can then be used to improve retrieval system performance.

Model Features

Document Expansion
Generates 20-40 queries per paragraph; the paragraph is then indexed together with its generated queries to improve retrieval effectiveness
Training Data Generation
Generates (query, text) pairs from unlabeled text, which can serve as training data for embedding models
Bridging Semantic Gaps
The generated queries contain synonyms of terms in the text, which helps bridge the vocabulary mismatch between queries and documents in lexical retrieval
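
The document-expansion feature above can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not the developers' reference code: the model identifier doc2query/msmarco-t5-base-v1 is inferred from the developer and model name on this page, the generation settings (max_length, top_p) are illustrative values, and the helper names generate_queries and expand_document are hypothetical.

```python
from typing import List

# Assumed Hugging Face model id, inferred from the developer and model name on this page.
MODEL_NAME = "doc2query/msmarco-t5-base-v1"


def generate_queries(paragraph: str, num_queries: int = 20) -> List[str]:
    """Sample `num_queries` queries for one paragraph (downloads the model on first use)."""
    # Imported lazily so expand_document() below works without these packages installed.
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
    model.eval()

    input_ids = tokenizer.encode(
        paragraph, max_length=320, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            max_length=64,      # illustrative cap on generated query length
            do_sample=True,     # sampling tends to give more varied queries
            top_p=0.95,         # illustrative nucleus-sampling value
            num_return_sequences=num_queries,
        )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]


def expand_document(paragraph: str, queries: List[str]) -> str:
    """Append de-duplicated, non-empty queries to the paragraph for co-indexing."""
    seen = set()
    unique = []
    for q in queries:
        key = q.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(q.strip())
    return paragraph if not unique else paragraph + "\n" + "\n".join(unique)
```

Calling expand_document(text, generate_queries(text)) would yield the text that goes into the index in place of the bare paragraph. Because the queries are sampled, repeated runs give different results, which is why the helper removes duplicates before appending.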

Model Capabilities

Text Generation
Query Generation
Document Expansion

Use Cases

Information Retrieval
Search Engine Optimization
Index the generated queries together with the original documents to improve BM25 retrieval performance
BM25 combined with this kind of query-based document expansion has been validated as an effective retrieval setup on the BEIR benchmark
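
To illustrate why co-indexing generated queries can help BM25, the sketch below scores a query against documents with and without appended queries. It is a simplified, self-contained BM25 written for illustration only, not the implementation used by any search engine; the parameter defaults (k1, b) are common choices assumed here, and the appended queries are hand-written stand-ins, not actual model output.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "The heart pumps blood through the body.",
    "Stock markets fell sharply on Monday.",
]
# Hand-written stand-ins for generated queries (hypothetical, not model output).
expanded_docs = [
    docs[0] + " what is the cardiac function of the heart",
    docs[1] + " why did stock markets fall on monday",
]

query = "cardiac function"
plain = bm25_scores(query, docs)              # no shared terms with either document
expanded = bm25_scores(query, expanded_docs)  # first document now shares query terms
```

In this toy setup the query shares no terms with the original documents, so both plain scores are zero, while the expanded first document receives a positive score because the appended query introduces the missing vocabulary.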
Machine Learning
Training Data Generation
Generate (query, text) pairs from unlabeled text to train dense embedding models
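
The training-data use case can be sketched as a small helper that turns unlabeled texts into (query, text) pairs. This is an illustrative sketch only: make_training_pairs and its generate_fn parameter are hypothetical names, and generate_fn stands for any callable that returns candidate queries for a text, such as a wrapper around this model.

```python
from typing import Callable, Iterable, List, Tuple


def make_training_pairs(
    texts: Iterable[str],
    generate_fn: Callable[[str], List[str]],
    queries_per_text: int = 3,
) -> List[Tuple[str, str]]:
    """Build (query, text) pairs from unlabeled texts.

    `generate_fn` is a hypothetical callable mapping a text to candidate queries.
    At most `queries_per_text` candidates are considered per text; empty ones are dropped.
    """
    pairs = []
    for text in texts:
        for query in generate_fn(text)[:queries_per_text]:
            query = query.strip()
            if query:
                pairs.append((query, text))
    return pairs
```

The resulting pairs can be passed to whatever training routine the embedding model uses; keeping the generator as a parameter makes the helper easy to test with a stub before running the real model.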