# 🚀 PyLate model based on EuroBERT/EuroBERT-210m
This fine-tuned model, `fjmgAI/col1-210M-EuroBERT`, is based on `EuroBERT/EuroBERT-210m`. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity. It is especially suitable for Spanish question-answering and document-retrieval applications.

## ✨ Features
- Based on the `EuroBERT/EuroBERT-210m` base model.
- Fine-tuned with PyLate using contrastive training.
- Maps sentences and paragraphs to sequences of 128-dimensional dense vectors.
- Supports semantic textual similarity via the MaxSim operator.
- Designed for Spanish question-answering and document-retrieval applications.
## 📦 Installation
First, install the PyLate library:
```bash
pip install -U pylate
```
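To confirm which version was installed, standard pip metadata inspection works (nothing here is PyLate-specific):
```bash
pip show pylate
```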
## 💻 Usage Examples
### Basic Usage
```python
import torch

from pylate import models

# Load the ColBERT-style model from the Hugging Face Hub.
model = models.ColBERT("fjmgAI/col1-210M-EuroBERT", trust_remote_code=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

query = "¿Cuál es la capital de España?"
positive_doc = "La capital de España es Madrid."
negative_doc = "Florida es un estado en los Estados Unidos."

sentences = [query, positive_doc, negative_doc]

# Tokenize and move the batch to the target device.
inputs = model.tokenize(sentences)
inputs = {key: value.to(device) for key, value in inputs.items()}

with torch.no_grad():
    embeddings_dict = model(inputs)
embeddings = embeddings_dict["token_embeddings"]

def colbert_similarity(query_emb, doc_emb):
    """
    Computes ColBERT-style similarity between query and document embeddings.
    Uses the maximum similarity (MaxSim) between individual tokens.

    Args:
        query_emb: [query_tokens, embedding_dim]
        doc_emb: [doc_tokens, embedding_dim]

    Returns:
        Similarity score normalized by the number of query tokens.
    """
    similarity_matrix = torch.matmul(query_emb, doc_emb.T)
    max_similarities = similarity_matrix.max(dim=1)[0]
    return max_similarities.sum() / query_emb.shape[0]

query_emb = embeddings[0]
positive_emb = embeddings[1]
negative_emb = embeddings[2]

positive_score = colbert_similarity(query_emb, positive_emb)
negative_score = colbert_similarity(query_emb, negative_emb)

print(f"Similarity with positive document: {positive_score.item():.4f}")
print(f"Similarity with negative document: {negative_score.item():.4f}")
```
## 📚 Documentation
### Base Model
[EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m)
### Fine-Tuning Method
Fine-tuning was performed with PyLate, using contrastive training on the [rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets) dataset. The resulting model maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity via the MaxSim operator.
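Concretely, the MaxSim operator scores a query $q$ against a document $d$ by matching each query token embedding to its most similar document token embedding and aggregating the maxima; with the query-length normalization used in the usage example above, this is:

$$
s(q, d) = \frac{1}{|q|} \sum_{i=1}^{|q|} \max_{1 \le j \le |d|} \mathbf{q}_i \cdot \mathbf{d}_j
$$

where $\mathbf{q}_i$ and $\mathbf{d}_j$ are the 128-dimensional token embeddings of the query and the document.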
### Dataset
[baconnier/rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets)

**Description**: the dataset has been filtered for the Spanish language, yielding 303,000 triplet examples for comprehensive RAG training.
### Fine-Tuning Details
- The model was trained using contrastive training (a workflow sketch follows the table below).
- Evaluated with `pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator`.
| Property | Details |
|----------|---------|
| Model Type | PyLate model based on EuroBERT/EuroBERT-210m |
| Training Data | baconnier/rag-comprehensive-triplets |
| Metric | Accuracy: 0.9848384857177734 |
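For context, here is a minimal sketch of a contrastive-training and evaluation run following PyLate's documented training API; the dataset split, output directory, and the evaluator's triplet lists are illustrative assumptions, not the exact configuration used for this model:
```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

from pylate import evaluation, losses, models, utils

model = models.ColBERT("EuroBERT/EuroBERT-210m", trust_remote_code=True)

# The Contrastive loss expects triplet-style columns; check the dataset card
# for the actual schema and filter to Spanish as described above.
dataset = load_dataset("baconnier/rag-comprehensive-triplets", split="train")

train_loss = losses.Contrastive(model=model)

# Triplet evaluator referenced above; takes parallel lists of texts.
dev_evaluator = evaluation.ColBERTTripletEvaluator(
    anchors=["¿Cuál es la capital de España?"],
    positives=["La capital de España es Madrid."],
    negatives=["Florida es un estado en los Estados Unidos."],
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=SentenceTransformerTrainingArguments(output_dir="col1-210M-EuroBERT"),
    train_dataset=dataset,
    loss=train_loss,
    evaluator=dev_evaluator,
    data_collator=utils.ColBERTCollator(model.tokenize),
)
trainer.train()
```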
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.4.1
- PyLate: 1.1.7
- Transformers: 4.48.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0
### Purpose
This fine-tuned model is designed for Spanish applications that require efficient semantic search, comparing embeddings at the token level with the MaxSim operation, which makes it well suited for question answering and document retrieval.
## 📄 License
- Developed by: fjmgAI
- License: apache-2.0
