
Msmarco Portuguese Mt5 Base V1

Developed by doc2query
mT5-based doc2query model for document expansion and domain-specific training data generation
Downloads: 44
Release Time: 4/29/2022

Model Overview

This model, based on the multilingual mT5 architecture, generates search queries for a given passage. Producing 20-40 queries per passage supports document expansion and the creation of training data for embedding models.
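A minimal sketch of generating queries for one passage with Hugging Face `transformers`. It assumes the checkpoint is published on the Hub as `doc2query/msmarco-portuguese-mt5-base-v1` (inferred from the model name and developer above). The input truncation length (320 tokens) and the sampling settings (`top_p=0.95`, `top_k=10`, `max_length=64`) are illustrative assumptions, not values given on this page. Model loading is deferred and cached, so the helper `dedupe_queries`, which drops repeated samples, works without `torch` or `transformers` installed.

```python
from functools import lru_cache
from typing import Iterable, List

# Assumed Hub ID, inferred from the model name and developer listed on this page.
MODEL_NAME = "doc2query/msmarco-portuguese-mt5-base-v1"


def dedupe_queries(queries: Iterable[str]) -> List[str]:
    """Drop empty and repeated queries (case- and whitespace-insensitive), keeping first-seen order."""
    seen = set()
    result = []
    for q in queries:
        normalized = " ".join(q.lower().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(q.strip())
    return result


@lru_cache(maxsize=1)
def _load_model():
    """Load tokenizer and model once; heavy imports are deferred until first use."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    model.eval()
    return tokenizer, model


def generate_queries(passage: str, num_queries: int = 20) -> List[str]:
    """Sample up to `num_queries` distinct queries for one passage."""
    import torch

    tokenizer, model = _load_model()
    # Assumed input cap of 320 tokens; longer passages are truncated.
    inputs = tokenizer(passage, max_length=320, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=64,      # assumed cap on generated query length
            do_sample=True,     # sampling yields more diverse queries than greedy decoding
            top_p=0.95,
            top_k=10,
            num_return_sequences=num_queries,
        )
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return dedupe_queries(decoded)
```

Because sampled outputs can repeat, the number of distinct queries returned may be lower than `num_queries`; raising it to 40 targets the upper end of the 20-40 range mentioned above.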

Model Features

Document Expansion
Generates 20-40 queries per passage; indexing them alongside the passage improves search engine retrieval
Training Data Generation
Generates (query, text) pairs for training high-performance dense embedding models
Multilingual Support
Based on the multilingual mT5 architecture, with support for Portuguese text

Model Capabilities

Text Generation
Query Generation
Document Expansion

Use Cases

Information Retrieval
Search Engine Enhancement
Index generated queries alongside original passages to improve BM25 retrieval performance
The BEIR paper showed BM25 + docT5query to be a strong search engine
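To illustrate why indexing generated queries helps BM25, here is a self-contained Okapi BM25 scorer applied to a passage before and after expansion. It assumes the common defaults `k1=1.5` and `b=0.75` and the non-negative idf variant `log(1 + (N - df + 0.5) / (df + 0.5))`. The generated queries are hard-coded placeholders standing in for model output, and the tokenizer is a naive lowercase regex split.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: lowercase, keep runs of word characters (accented letters included).
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of `query` against each document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(term for d in tokenized for term in set(d))  # document frequency per term
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def expand(passage: str, generated_queries: List[str]) -> str:
    """Document expansion: append generated queries to the passage text before indexing."""
    return passage + " " + " ".join(generated_queries)


passage = "A capital do Brasil foi transferida do Rio de Janeiro em 1960."
other = "O Rio de Janeiro é conhecido pelas praias e pelo carnaval."
# Placeholder queries standing in for doc2query output (not real model output).
queries = ["qual é a capital brasileira", "quando brasília virou capital"]

docs_plain = [passage, other]
docs_expanded = [expand(passage, queries), other]
```

For a query such as `"brasília capital brasileira"`, the expanded passage scores higher than the original, because terms that never appear in the passage itself ("brasília", "brasileira") now match through the appended queries.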
Machine Learning Training
Embedding Model Training
Generates (query, text) pairs for unlabeled text collections to train dense embedding models
Its effectiveness is demonstrated in the GPL paper and in examples on SBERT.net
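A minimal sketch of turning generated queries into (query, text) training pairs for a dense embedding model. `build_training_pairs` and the example corpus are hypothetical: the corpus maps each unlabeled passage to placeholder queries standing in for the model's output. Pairs are shuffled with a fixed seed for reproducibility. How the pairs are then consumed (for example, as positives with in-batch negatives) is an assumed training setup, not something this page specifies.

```python
import random
from typing import Dict, List, Tuple


def build_training_pairs(
    passages_to_queries: Dict[str, List[str]], seed: int = 42
) -> List[Tuple[str, str]]:
    """Flatten {passage: [generated queries]} into shuffled (query, passage) pairs."""
    pairs = [
        (query, passage)
        for passage, queries in passages_to_queries.items()
        for query in queries
        if query.strip()  # skip empty generations
    ]
    random.Random(seed).shuffle(pairs)  # fixed seed gives a reproducible order
    return pairs


# Hypothetical example data: placeholder queries, not real model output.
corpus = {
    "Lisboa é a capital de Portugal.": ["qual é a capital de portugal", ""],
    "O Porto fica no norte de Portugal.": ["onde fica o porto", "cidade do porto localização"],
}
pairs = build_training_pairs(corpus)
```

Each resulting tuple pairs a synthetic query with the passage it was generated from, which is the (query, text) format described above.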