🚀 Sentence-CamemBERT-Large
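Sentence-CamemBERT-Large is a sentence-embedding model for French. It encodes each sentence as a dense vector that represents its meaning, so that semantically similar sentences receive nearby vectors. This makes it suitable for sentence similarity and semantic search.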
📦 Model Information
| Property | Details |
|---|---|
| Pipeline Tag | sentence-similarity |
| Language | fr |
| Datasets | stsb_multi_mt |
| Tags | Text, Sentence Similarity, Sentence-Embedding, camembert-large |
| License | apache-2.0 |
| Library Name | sentence-transformers |
| Model Name | sentence-camembert-large by Van Tuan DANG |
| Task | Sentence-Embedding (Text Similarity) |
| Dataset for Evaluation | Text Similarity fr (stsb_multi_mt, args: fr) |
| Metric (Test Pearson correlation coefficient) | 88.63 |
💻 Usage Examples
Basic Usage
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Lajavaness/sentence-camembert-large")
sentences = ["Un avion est en train de décoller.",
"Un homme joue d'une grande flûte.",
"Un homme étale du fromage râpé sur une pizza.",
"Une personne jette un chat au plafond.",
"Une personne est en train de plier un morceau de papier.",
]
embeddings = model.encode(sentences)
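`model.encode` returns one vector per sentence (a NumPy array by default). A common way to use these vectors for semantic search is to rank corpus sentences by their cosine similarity to a query embedding. The sketch below shows only that ranking step with plain NumPy; the helper name `rank_by_cosine` is hypothetical, and the toy 3-dimensional vectors stand in for real embeddings. `sentence-transformers` also ships its own helper for this, `util.semantic_search`.

```python
import numpy as np


def rank_by_cosine(query_emb: np.ndarray, corpus_embs: np.ndarray, top_k: int = 3):
    """Hypothetical helper: return (index, cosine score) pairs for the top_k
    corpus rows closest to the query.

    query_emb has shape (dim,) and corpus_embs has shape (n, dim), e.g. the
    output of model.encode(...) for a query and for a list of sentences.
    """
    # L2-normalise both sides so that a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the top_k indices
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real sentence embeddings (assumption)
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.7, 0.7, 0.0]])
query = np.array([1.0, 0.1, 0.0])

# The query is closest to row 0, then to row 2
print(rank_by_cosine(query, corpus, top_k=2))
```

With the model loaded, `rank_by_cosine(model.encode("Un avion décolle."), embeddings)` would rank the example sentences above against a new French query. Normalising once and using a matrix product keeps the ranking to a single vectorised operation.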
Advanced Usage (Evaluation)
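from sentence_transformers import SentenceTransformer
from sentence_transformers.readers import InputExample
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from datasets import load_dataset

# Load the model to be evaluated
model = SentenceTransformer("Lajavaness/sentence-camembert-large")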
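def convert_dataset(dataset):
    # Wrap each (sentence1, sentence2, score) row in an InputExample.
    # stsb_multi_mt scores range from 0 to 5, so dividing by 5.0 maps them to [0, 1].
    dataset_samples = []
    for df in dataset:
        score = float(df['similarity_score']) / 5.0
        inp_example = InputExample(texts=[df['sentence1'], df['sentence2']], label=score)
        dataset_samples.append(inp_example)
    return dataset_samples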
# Load the French dev and test splits of stsb_multi_mt
df_dev = load_dataset("stsb_multi_mt", name="fr", split="dev")
df_test = load_dataset("stsb_multi_mt", name="fr", split="test")

# Evaluate on the dev split
dev_samples = convert_dataset(df_dev)
val_evaluator = EmbeddingSimilarityEvaluator.from_input_examples(dev_samples, name='sts-dev')
val_evaluator(model, output_path="./")

# Evaluate on the test split
test_samples = convert_dataset(df_test)
test_evaluator = EmbeddingSimilarityEvaluator.from_input_examples(test_samples, name='sts-test')
test_evaluator(model, output_path="./")
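`EmbeddingSimilarityEvaluator` scores each sentence pair by the similarity of its two embeddings (cosine similarity among other measures) and correlates those scores with the gold labels using Pearson and Spearman coefficients. Since a correlation coefficient lies in [-1, 1], the headline 88.63 is presumably the test Pearson coefficient multiplied by 100. The sketch below is a simplified, NumPy-only illustration of the cosine-similarity part of that computation, not the library's implementation: `pearson`, `average_ranks`, `spearman` and `evaluate_pairs` are hypothetical names, ties are given average ranks for Spearman, and the toy arrays replace real embeddings and STS-B labels.

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # Sorted positions i..j (0-based) all get the average 1-based rank
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate_pairs(emb1: np.ndarray, emb2: np.ndarray, gold: np.ndarray) -> dict:
    """Cosine similarity per pair, then correlation with gold scores (x100)."""
    e1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    e2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    cos = np.sum(e1 * e2, axis=1)
    return {"pearson": 100 * pearson(cos, gold), "spearman": 100 * spearman(cos, gold)}


# Toy data (assumption): 4 sentence pairs with 2-D "embeddings" and gold
# scores already normalised to [0, 1], as convert_dataset does
emb1 = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb2 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.2]])
gold = np.array([1.0, 0.6, 0.2, 0.0])
print(evaluate_pairs(emb1, emb2, gold))
```

With real data, `emb1` and `emb2` would come from `model.encode` on the `sentence1` and `sentence2` columns, and `gold` from the normalised labels. If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same two coefficients.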
📚 Evaluation Results
Dev Set
Test Set - Pearson Score
Test Set - Spearman Score
📄 License
This project is licensed under the Apache 2.0 License.
📚 Citation
@article{reimers2019sentence,
title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
author={Reimers, Nils and Gurevych, Iryna},
journal={arXiv preprint arXiv:1908.10084},
url={https://arxiv.org/abs/1908.10084},
year={2019}
}
@inproceedings{martin2020camembert,
title={CamemBERT: a Tasty French Language Model},
author={Martin, Louis and Muller, Benjamin and Suárez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, Éric Villemonte and Seddah, Djamé and Sagot, Benoît},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}
}