# 🚀 SentenceTransformer based on intfloat/multilingual-e5-large

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-large. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## 📚 Documentation

## ✨ Features
- Model Type: Sentence Transformer
- Base model: intfloat/multilingual-e5-large
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
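
These properties can be checked directly on the loaded model. A minimal sketch using standard Sentence Transformers attributes (requires the installation step below):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("meandyou200175/e5_large_finetune_word")

# Both values should match the configuration listed above.
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
```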
## 📦 Installation
First, install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
## 💻 Usage Examples

### Basic Usage
```python
from sentence_transformers import SentenceTransformer

# Download the model from the 🤗 Hub
model = SentenceTransformer("meandyou200175/e5_large_finetune_word")

# Run inference
sentences = [
    'A long appendage protruding from the lower back. Often covered in fur or scales. A common feature of animal girls.',
    'tail',
    'stomach day',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 1024)

# Get the similarity scores between all pairs of embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # (3, 3)
```
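
The same API supports a simple semantic-search flow. The following is a minimal sketch; the corpus and query strings are illustrative stand-ins:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("meandyou200175/e5_large_finetune_word")

# Illustrative corpus of short tags, with a free-text description as the query.
corpus = ["tail", "stomach day", "computer"]
query = "A machine that manipulates data according to a list of instructions."

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Cosine similarity between the query and every corpus entry.
scores = model.similarity(query_embedding, corpus_embeddings)  # shape: (1, 3)
best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))  # expected best match: "computer"
```

Note that the base E5 models were trained with `query: ` / `passage: ` prefixes; this card's own example encodes raw text without them, so the sketch above does the same.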
## 📊 Evaluation

### Information Retrieval

Evaluated with the `InformationRetrievalEvaluator`.
| Metric               | Value  |
|:---------------------|:-------|
| cosine_accuracy@1    | 0.9073 |
| cosine_accuracy@2    | 0.9739 |
| cosine_accuracy@5    | 0.9942 |
| cosine_accuracy@10   | 0.999  |
| cosine_accuracy@100  | 1.0    |
| cosine_precision@1   | 0.9073 |
| cosine_precision@2   | 0.487  |
| cosine_precision@5   | 0.1988 |
| cosine_precision@10  | 0.0999 |
| cosine_precision@100 | 0.01   |
| cosine_recall@1      | 0.9073 |
| cosine_recall@2      | 0.9739 |
| cosine_recall@5      | 0.9942 |
| cosine_recall@10     | 0.999  |
| cosine_recall@100    | 1.0    |
| cosine_ndcg@10       | 0.9602 |
| cosine_mrr@1         | 0.9073 |
| cosine_mrr@2         | 0.9406 |
| cosine_mrr@5         | 0.9463 |
| cosine_mrr@10        | 0.947  |
| cosine_mrr@100       | 0.9471 |
| cosine_map@100       | 0.9471 |
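
As a hedged sketch of how such an evaluation is wired up, the toy queries and corpus below are stand-ins for the real word_embedding evaluation split described in the next section:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("meandyou200175/e5_large_finetune_word")

# Toy stand-ins; the real split maps each query description to its tag.
queries = {"q1": "A long appendage protruding from the lower back."}
corpus = {"d1": "tail", "d2": "stomach day", "d3": "computer"}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="word_embedding-dev",  # hypothetical name
)
results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, NDCG, MRR, MAP
print(results)
```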
## 🔧 Technical Details

### Training Dataset

#### Unnamed Dataset

### Evaluation Dataset

#### word_embedding
- Dataset: word_embedding at af76b11
- Size: 1,036 evaluation samples
- Columns: `query` and `positive`
- Approximate statistics based on the first 1000 samples:

  |         | query                                              | positive                                         |
  |:--------|:---------------------------------------------------|:--------------------------------------------------|
  | type    | string                                              | string                                             |
  | details | min: 4 tokens, mean: 35.89 tokens, max: 164 tokens  | min: 3 tokens, mean: 5.38 tokens, max: 14 tokens   |
- Samples:

  | query | positive |
  |:------|:---------|
  | A machine that manipulates data according to a list of instructions. The ability to store and execute lists of instructions called programs make computers extremely versatile. On Danbooru's images they are most often used for drawing, playing games and accessing the internet. | computer |
  | A playing card with two clubs. | two of clubs |
  | Yebisu (ヱビス, Ebisu) is a beer produced by Sapporo Breweries. It is one of the most popular beers in Japan. | Yebisu beer |
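
The card does not state the training objective. For (query, positive) pairs like these, `MultipleNegativesRankingLoss` with the `SentenceTransformerTrainer` is a common setup; the sketch below is an assumption-labeled illustration, not the author's recorded recipe:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Start from the base model named at the top of this card.
model = SentenceTransformer("intfloat/multilingual-e5-large")

# Illustrative rows mirroring the (query, positive) samples above.
train_dataset = Dataset.from_dict({
    "query": ["A playing card with two clubs."],
    "positive": ["two of clubs"],
})

# ASSUMPTION: the actual loss is not documented; MNRL is typical for pair data.
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```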
### Model Sources

### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
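
The three modules above mean `encode` runs the XLM-R encoder, mean-pools token states under the attention mask, and L2-normalizes the result. A sketch reproducing this by hand with plain transformers (assuming the repo stores standard Hugging Face weights, as sentence-transformers repos do):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meandyou200175/e5_large_finetune_word")
encoder = AutoModel.from_pretrained("meandyou200175/e5_large_finetune_word")

batch = tokenizer(["tail"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state      # (batch, seq_len, 1024)

# (1) Pooling: attention-masked mean over token embeddings.
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity.
embedding = F.normalize(pooled, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 1024])
```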