🚀 DRAMA-1B: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers
DRAMA-1B is a dense retrieval model built on a pruned large language model backbone: it is created by pruning a large language model and then fine-tuning it for efficient and generalizable multilingual text retrieval. Despite its compact size, DRAMA-1B achieves strong performance in both English and multilingual retrieval tasks by leveraging large language models for high-quality data augmentation.
The default embedding size of drama-1b is 2048. Thanks to the adoption of Matryoshka Representation Learning, the dimensionality can be flexibly truncated to values such as 768 or 256.
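As a rough sketch of what Matryoshka-style truncation amounts to (the DRAMA encode methods accept a `dim` argument that handles this for you; the re-normalization step here is our assumption for dot-product scoring, not a confirmed implementation detail):

```python
import torch
import torch.nn.functional as F

def truncate_embedding(emb: torch.Tensor, dim: int) -> torch.Tensor:
    # Keep only the leading `dim` coordinates of each embedding vector.
    truncated = emb[..., :dim]
    # Re-normalize so dot products stay on the cosine-similarity scale
    # (assumption: embeddings are compared via normalized dot products).
    return F.normalize(truncated, dim=-1)
```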
For more details, please check our paper.
🚀 Quick Start
This section provides a quick guide on how to use the DRAMA-1B model for encoding queries and documents.
✨ Features
- Multilingual Support: Supports 20 languages, including Arabic, Bengali, and Chinese, enabling effective retrieval across different languages.
- Flexible Dimensionality: Utilizes Matryoshka Representation Learning, allowing the embedding dimensionality to be flexibly adjusted.
- Strong Performance: Demonstrates excellent performance in both English and multilingual retrieval tasks.
📦 Installation
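No model-specific installation steps are required; the examples below only need standard libraries. Assuming a typical Python environment, something like `pip install torch transformers` (plus `pip install sentence-transformers` for the Sentence Transformers examples) should cover everything the snippets import.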
💻 Usage Examples
Basic Usage
Below are examples of using drama-1b to encode query and document examples from the MIRACL dataset, using either Transformers or Sentence Transformers.
Transformers
```python
import torch
from transformers import AutoTokenizer, AutoModel

queries = [
    'What percentage of the Earth\'s atmosphere is oxygen?',
    '意大利首都是哪里?',  # "What is the capital of Italy?"
]
documents = [
    "The amount of oxygen in the atmosphere has fluctuated over the last 600 million years, reaching a peak of 35% during the Carboniferous period, significantly higher than today's 21%.",
    # A Chinese passage describing Rome, the capital of Italy.
    "羅馬是欧洲国家意大利首都和罗马首都广域市的首府及意大利全国的政治、经济、文化和交通中心,位于意大利半島中部的台伯河下游平原地,建城初期在七座小山丘上,故又名“七丘之城”。按城市范围内的人口计算,罗马是意大利人口最多的城市,也是欧盟人口第三多的城市。",
]

model_name = "facebook/drama-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device)

query_embs = model.encode_queries(tokenizer, queries)
doc_embs = model.encode_documents(tokenizer, documents)

scores = query_embs @ doc_embs.T
print(scores.tolist())
```
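The score matrix has one row per query and one column per document, so it can be turned directly into a per-query ranking. A small illustrative helper (not part of the DRAMA API):

```python
# Rank documents for each query by descending similarity score.
for i, query in enumerate(queries):
    order = scores[i].argsort(descending=True)
    print(query, "->", [documents[j][:40] for j in order.tolist()])
```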
⚠️ Important Note

Setting `trust_remote_code=True` will use our customized `drama_modeling.py`, which differs in two details:
- We use bi-directional attention instead of uni-directional attention.
- We add `"Query: "` as a prefix to query text (no prefix is added to documents); see the sketch below.
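For illustration only, the query-side convention from the second bullet amounts to prepending the prefix before encoding; `encode_queries` does this for you, so you never need to add it manually:

```python
# Equivalent raw inputs to what encode_queries builds internally
# (per the note above; shown only to make the convention concrete).
prefixed_queries = ["Query: " + q for q in queries]
```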
DRAMA models are trained using Matryoshka Representation Learning (MRL) to support flexible dimensionality. Both queries and documents can be encoded into smaller dimensions, such as 256, using the following:
```python
query_embs = model.encode_queries(tokenizer, queries, dim=256)
doc_embs = model.encode_documents(tokenizer, documents, dim=256)

scores = query_embs @ doc_embs.T
print(scores.tolist())
```
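To see how much truncation affects results, you can compare against the full 2048-dimensional embeddings; with MRL training, the truncated ranking should track the full-dimension one closely. An illustrative check:

```python
# Best-scoring document per query, at full dimensionality vs. 256 dims.
full_scores = model.encode_queries(tokenizer, queries) @ model.encode_documents(tokenizer, documents).T
print(full_scores.argmax(-1).tolist())  # full 2048-dim embeddings
print(scores.argmax(-1).tolist())       # 256-dim embeddings from above
```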
Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

queries = [
    'What percentage of the Earth\'s atmosphere is oxygen?',
    '意大利首都是哪里?',  # "What is the capital of Italy?"
]
documents = [
    "The amount of oxygen in the atmosphere has fluctuated over the last 600 million years, reaching a peak of 35% during the Carboniferous period, significantly higher than today's 21%.",
    # A Chinese passage describing Rome, the capital of Italy.
    "羅馬是欧洲国家意大利首都和罗马首都广域市的首府及意大利全国的政治、经济、文化和交通中心,位于意大利半島中部的台伯河下游平原地,建城初期在七座小山丘上,故又名“七丘之城”。按城市范围内的人口计算,罗马是意大利人口最多的城市,也是欧盟人口第三多的城市。",
]

model = SentenceTransformer("facebook/drama-1b", trust_remote_code=True)
query_embs = model.encode(queries, prompt_name="query")
doc_embs = model.encode(documents)

scores = model.similarity(query_embs, doc_embs)
print(scores.tolist())
```
⚠️ Important Note

- `trust_remote_code=True` will use our customized `drama_modeling.py`, which uses bi-directional attention instead of uni-directional attention.
- For queries, you have to use `prompt_name="query"` to select the prompt called "query", or `prompt="Query: "` to specify the prompt string manually; both options are sketched below.
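The two ways of applying the query prompt mentioned above produce the same effect; as a brief illustration:

```python
# Option 1: select the prompt named "query" from the model configuration.
q1 = model.encode(queries, prompt_name="query")
# Option 2: pass the prompt string explicitly.
q2 = model.encode(queries, prompt="Query: ")
```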
DRAMA models are trained using Matryoshka Representation Learning (MRL) to support flexible dimensionality. Both queries and documents can be encoded into smaller dimensions, such as 256, using the following:
```python
from sentence_transformers import SentenceTransformer

queries = [
    'What percentage of the Earth\'s atmosphere is oxygen?',
    '意大利首都是哪里?',
]
documents = [
    "The amount of oxygen in the atmosphere has fluctuated over the last 600 million years, reaching a peak of 35% during the Carboniferous period, significantly higher than today's 21%.",
    "羅馬是欧洲国家意大利首都和罗马首都广域市的首府及意大利全国的政治、经济、文化和交通中心,位于意大利半島中部的台伯河下游平原地,建城初期在七座小山丘上,故又名“七丘之城”。按城市范围内的人口计算,罗马是意大利人口最多的城市,也是欧盟人口第三多的城市。",
]

model = SentenceTransformer("facebook/drama-1b", truncate_dim=256, trust_remote_code=True)
query_embs = model.encode(queries, prompt_name="query")
doc_embs = model.encode(documents)

scores = model.similarity(query_embs, doc_embs)
print(scores.tolist())
```
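With `truncate_dim=256` set on the model, every call to `encode` returns 256-dimensional vectors, so no per-call `dim` argument is needed. A quick illustrative sanity check:

```python
# Both matrices should report a width of 256 after Matryoshka truncation.
print(query_embs.shape, doc_embs.shape)
```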
📚 Documentation
Evaluation
The model has been evaluated on multiple retrieval benchmarks, including [BEIR](https://github.com/beir-cellar/beir), [MIRACL](https://github.com/project-miracl/miracl), MLDR, and several multilingual retrieval tasks in [MTEB](https://github.com/embeddings-benchmark/mteb). It shows strong performance in both English and multilingual retrieval tasks.
The `drama-1b` model released on this page corresponds to the DRAMA-1B line in the paper's results, with 1B non-embedding parameters.
Supported Languages
DRAMA-1B was initialized from [Llama3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) (originally pruned from [Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)). During retriever training, the training data covered the following 20 languages (sorted alphabetically):
Arabic, Bengali, Chinese, English, Finnish, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Portuguese, Russian, Spanish, Swahili, Telugu, Thai, Yoruba
Other languages may have degraded performance.
📄 License
The model uses the CC-BY-NC-4.0 license.
📚 Citation
If you find our paper or models helpful, please consider citing as follows:
```bibtex
@article{drama,
  title={{Drama}: Diverse Augmentation from Large Language Models To Smaller Dense Retrievers},
  author={Ma, Xueguang and Lin, Victoria Xi and Oguz, Barlas and Lin, Jimmy and Yih, Wen-tau and Chen, Xilun},
  journal={arXiv:2502.18460},
  year={2025}
}
```