Bilingual Embedding Base
Developed by Lajavaness
A bilingual sentence embedding model based on sentence-transformers, supporting feature extraction and sentence similarity calculation
Downloads: 1,269
Release date: 6/26/2024
Model Overview
This model generates sentence embeddings for downstream tasks such as sentence similarity calculation, clustering, retrieval, and reranking.
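As a sentence-transformers model, it can be loaded through the standard SentenceTransformer API. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as Lajavaness/bilingual-embedding-base (the hub id is inferred from the developer and model name above; trust_remote_code=True is a precaution in case the repository ships custom modeling code):

```python
from sentence_transformers import SentenceTransformer, util

# Hub id assumed from the model name above; trust_remote_code=True is a
# precaution in case the repository ships custom modeling code.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "Il fait très beau aujourd'hui.",  # French paraphrase of the line above
]
embeddings = model.encode(sentences)

# Cross-language cosine similarity between the two sentences.
print(util.cos_sim(embeddings[0], embeddings[1]))
```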
Model Features
Bilingual Support
Embeds two languages in a shared vector space, enabling cross-language sentence similarity calculation
Multifunctional Applications
Supports various downstream tasks including clustering, retrieval, and reranking
High Performance
Performs strongly across the MTEB evaluation datasets below, with notably good reranking results (e.g., map 70.46 on AlloprofReranking)
Model Capabilities
Sentence embedding generation
Sentence similarity calculation
Text clustering
Information retrieval (see the retrieval sketch after this list)
Result reranking
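A minimal retrieval sketch using sentence-transformers' built-in semantic_search utility; the corpus, query, and hub id are illustrative assumptions, not part of the model card:

```python
from sentence_transformers import SentenceTransformer, util

# Hub id assumed as above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

# Toy corpus and query; any document collection works the same way.
corpus = [
    "Paris est la capitale de la France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]
query = "Quelle est la capitale de la France ?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```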
Use Cases
Education
Educational Content Clustering
Automatically clusters education-related texts
Achieved a v_measure of 64.71 on the AlloProfClusteringP2P dataset (a clustering sketch follows this section)
Information Retrieval
Legal Document Retrieval
Retrieves relevant documents from legal texts
Achieved a map@100 of 10.852 on the BSARDRetrieval dataset
E-commerce
Product Review Classification
Performs sentiment classification on product reviews
Achieved an accuracy of 40.354% on the AmazonReviewsClassification (fr) dataset
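Scores like the v_measure above come from embedding texts and clustering them against gold topic labels. A minimal sketch of that style of evaluation (toy data and labels; this is not the MTEB harness itself):

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

# Hub id assumed as above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

# Toy documents with gold topic labels (0 = geography, 1 = biology).
texts = [
    "Les volcans se forment aux frontières des plaques tectoniques.",
    "Rivers carve valleys over thousands of years.",
    "La photosynthèse convertit la lumière en énergie chimique.",
    "Cells divide through a process called mitosis.",
]
labels = [0, 0, 1, 1]

embeddings = model.encode(texts)
predicted = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# V-measure compares predicted clusters against the gold labels.
print(v_measure_score(labels, predicted))
```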
🚀 Sentence-Transformers
Sentence-Transformers is a library for state-of-the-art sentence, text and image embeddings. This README presents the performance results of the bilingual-embedding-base model on various tasks.
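The numbers below follow the MTEB result format. A sketch of how results like these can be regenerated with the mteb package, assuming its classic MTEB(tasks=[...]) entry point and the task names listed in this section:

```python
import mteb  # pip install mteb
from sentence_transformers import SentenceTransformer

# Hub id assumed as above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

# Re-run one of the French tasks reported below; scores are written as JSON.
evaluation = mteb.MTEB(tasks=["AlloProfClusteringP2P"])
evaluation.run(model, output_folder="results/bilingual-embedding-base")
```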
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Library Name | sentence-transformers |
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers, sentence-embedding, mteb |
| Model Name | bilingual-embedding-base |
Performance Results
Clustering on MTEB AlloProfClusteringP2P
- Task Type: Clustering
- Dataset: lyon-nlp/alloprof (MTEB AlloProfClusteringP2P, default config, test split, revision 392ba3f5bcc8c51f578786c1fc3dae648662cb9b)
- Metrics:
- v_measure: 64.71347977413274
- v_measures: [0.6276406100583347, 0.6380984027637511, 0.6322659733117469, 0.652473343931587, 0.6411502615838207]
Clustering on MTEB AlloProfClusteringS2S
- Task Type: Clustering
- Dataset: lyon-nlp/alloprof (MTEB AlloProfClusteringS2S, default config, test split, revision 392ba3f5bcc8c51f578786c1fc3dae648662cb9b)
- Metrics:
- v_measure: 45.56886694296516
- v_measures: [0.4359417286866465, 0.4218168523895086, 0.419693469863105, 0.4981808644314091, 0.4546120704986696]
Reranking on MTEB AlloprofReranking
- Task Type: Reranking
- Dataset: lyon-nlp/mteb-fr-reranking-alloprof-s2p (MTEB AlloprofReranking, default config, test split, revision 65393d0d7a08a10b4e348135e824f385d420b0fd)
- Metrics:
- map: 70.45979049191963
- mrr: 71.61250582558874
- nAUC_map_diff1: 53.81172404763562
- nAUC_map_max: 11.344605627946006
- nAUC_mrr_diff1: 53.73572411920392
- nAUC_mrr_max: 11.900785644756448
Retrieval on MTEB AlloprofRetrieval
- Task Type: Retrieval
- Dataset: lyon-nlp/alloprof (MTEB AlloprofRetrieval, default config, test split, revision fcf295ea64c750f41fadbaa37b9b861558e1bfbd)
- Metrics:
- map_at_1: 24.956999999999997
- map_at_10: 35.519
- map_at_100: 36.602000000000004
- map_at_1000: 36.667
- map_at_20: 36.199
- map_at_3: 32.397999999999996
- map_at_5: 34.121
- mrr_at_1: 24.956822107081177
- mrr_at_10: 35.5187515420676
- mrr_at_100: 36.60233410773095
- mrr_at_1000: 36.6673425266803
- mrr_at_20: 36.19886807640962
- mrr_at_3: 32.39781232009212
- mrr_at_5: 34.120610247553294
- nauc_map_at_1000_diff1: 35.47549954610535
- nauc_map_at_1000_max: 32.90078190148035
- nauc_map_at_100_diff1: 35.450977423889455
- nauc_map_at_100_max: 32.94163322687819
- nauc_map_at_10_diff1: 35.33565787959566
- nauc_map_at_10_max: 32.64293321320758
- nauc_map_at_1_diff1: 41.36063963274824
- nauc_map_at_1_max: 29.231198539339754
- nauc_map_at_20_diff1: 35.43536321023612
- nauc_map_at_20_max: 32.88345868813372
- nauc_map_at_3_diff1: 35.686505961183194
- nauc_map_at_3_max: 32.00806639149525
- nauc_map_at_5_diff1: 35.34083004527267
- nauc_map_at_5_max: 32.34459863266794
- nauc_mrr_at_1000_diff1: 35.47549954610535
- nauc_mrr_at_1000_max: 32.90078190148035
- nauc_mrr_at_100_diff1: 35.450977423889455
- nauc_mrr_at_100_max: 32.94163322687819
- nauc_mrr_at_10_diff1: 35.33565787959566
- nauc_mrr_at_10_max: 32.64293321320758
- nauc_mrr_at_1_diff1: 41.36063963274824
- nauc_mrr_at_1_max: 29.231198539339754
- nauc_mrr_at_20_diff1: 35.43536321023612
- nauc_mrr_at_20_max: 32.88345868813372
- nauc_mrr_at_3_diff1: 35.686505961183194
- nauc_mrr_at_3_max: 32.00806639149525
- nauc_mrr_at_5_diff1: 35.34083004527267
- nauc_mrr_at_5_max: 32.34459863266794
- nauc_ndcg_at_1000_diff1: 34.4325639393338
- nauc_ndcg_at_1000_max: 34.44147157978809
- nauc_ndcg_at_100_diff1: 33.88653913244061
- nauc_ndcg_at_100_max: 35.59739772150559
- nauc_ndcg_at_10_diff1: 33.51057326557921
- nauc_ndcg_at_10_max: 34.18300339583124
- nauc_ndcg_at_1_diff1: 41.36063963274824
- nauc_ndcg_at_1_max: 29.231198539339754
- nauc_ndcg_at_20_diff1: 33.804940889687714
- nauc_ndcg_at_20_max: 35.13066047975507
- nauc_ndcg_at_3_diff1: 34.167179830350406
- nauc_ndcg_at_3_max: 32.94370726673571
- nauc_ndcg_at_5_diff1: 33.52138730976869
- nauc_ndcg_at_5_max: 33.48723912009657
- nauc_precision_at_1000_diff1: 22.67827963096167
- nauc_precision_at_1000_max: 67.17958434804059
- nauc_precision_at_100_diff1: 24.83989747177019
- nauc_precision_at_100_max: 56.86858468981111
- nauc_precision_at_10_diff1: 27.973507280697774
- nauc_precision_at_10_max: 39.14247264250278
- nauc_precision_at_1_diff1: 41.36063963274824
- nauc_precision_at_1_max: 29.231198539339754
- nauc_precision_at_20_diff1: 28.244479044808852
- nauc_precision_at_20_max: 44.5538108461975
- nauc_precision_at_3_diff1: 30.09786756288918
- nauc_precision_at_3_max: 35.51252285506707
- nauc_precision_at_5_diff1: 28.444019525872083
- nauc_precision_at_5_max: 36.71010320598834
- nauc_recall_at_1000_diff1: 22.67827963096142
- nauc_recall_at_1000_max: 67.17958434804105
- nauc_recall_at_100_diff1: 24.83989747177035
- nauc_recall_at_100_max: 56.868584689811186
- nauc_recall_at_10_diff1: 27.973507280697774
- nauc_recall_at_10_max: 39.14247264250283
- nauc_recall_at_1_diff1: 41.36063963274824
- nauc_recall_at_1_max: 29.231198539339754
- nauc_recall_at_20_diff1: 28.244479044808823
- nauc_recall_at_20_max: 44.553810846197486
- nauc_recall_at_3_diff1: 30.097867562889213
- nauc_recall_at_3_max: 35.51252285506708
- nauc_recall_at_5_diff1: 28.444019525872054
- nauc_recall_at_5_max: 36.71010320598833
- ndcg_at_1: 24.956999999999997
- ndcg_at_10: 41.260999999999996
- ndcg_at_100: 46.628
- ndcg_at_1000: 48.339999999999996
- ndcg_at_20: 43.71
- ndcg_at_3: 34.771
- ndcg_at_5: 37.862
- precision_at_1: 24.956999999999997
- precision_at_10: 5.963
- precision_at_100: 0.8500000000000001
- precision_at_1000: 0.098
- precision_at_20: 3.463
- precision_at_3: 13.874
- precision_at_5: 9.818999999999999
- recall_at_1: 24.956999999999997
- recall_at_10: 59.629
- recall_at_100: 84.974
- recall_at_1000: 98.402
- recall_at_20: 69.257
- recall_at_3: 41.623
- recall_at_5: 49.092999999999996
Classification on MTEB AmazonReviewsClassification (fr)
- Task Type: Classification
- Dataset: mteb/amazon_reviews_multi (MTEB AmazonReviewsClassification (fr), fr config, test split, revision 1399c76144fd37290681b995c656ef9b2e06e26d)
- Metrics:
- accuracy: 40.354
- f1: 38.96127209677864
- f1_weighted: 38.96127209677863
Retrieval on MTEB BSARDRetrieval
- Task Type: Retrieval
- Dataset: maastrichtlawtech/bsard (MTEB BSARDRetrieval, default config, test split, revision 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59)
- Metrics:
- map_at_1: 5.405
- map_at_10: 9.715
- map_at_100: 10.852
- map_at_1000: 10.986
- map_at_20: 10.324
- map_at_3: 8.183
- map_at_5: 9.129
- mrr_at_1: 5.405405405405405
- mrr_at_10: 9.715250965250963
- mrr_at_100: 10.852488713991455
- mrr_at_1000: 10.986128810251913
- mrr_at_20: 10.324372731493472
- mrr_at_3: 8.183183183183184
- mrr_at_5: 9.12912912912913
- nauc_map_at_1000_diff1: 27.989147369852507
- nauc_map_at_1000_max: 13.956676274011212
- nauc_map_at_100_diff1: 27.97931691839789
- nauc_map_at_100_max: 13.885054931557967
- nauc_map_at_10_diff1: 28.39484917161119
- nauc_map_at_10_max: 12.859153645903026
- nauc_map_at_1_diff1: 40.37326707123
- nauc_map_at_1_max: 19.53225223707901
- nauc_map_at_20_diff1: 27.969553431914463
- nauc_map_at_20_max: 13.993555023169318
- nauc_map_at_3_diff1: 31.130088078198526
- nauc_map_at_3_max: 11.59332560179606
- nauc_map_at_5_diff1: 28.51357198186801
- nauc_map_at_5_max: 12.629395513278464
- nauc_mrr_at_1000_diff1: 27.989147369852507
- nauc_mrr_at_1000_max: 13.956676274011212
- nauc_mrr_at_100_diff1: 27.97931691839789
- nauc_mrr_at_100_max: 13.885054931557967
- nauc_mrr_at_10_diff1: 28.39484917161119
- nauc_mrr_at_10_max: 12.859153645903026
- nauc_mrr_at_1_diff1: 40.37326707123
- nauc_mrr_at_1_max: 19.53225223707901
- nauc_mrr_at_20_diff1: 27.969553431914463
- nauc_mrr_at_20_max: 13.993555023169318
- nauc_mrr_at_3_diff1: 31.130088078198526
- nauc_mrr_at_3_max: 11.59332560179606
- nauc_mrr_at_5_diff1: 28.51357198186801
- nauc_mrr_at_5_max: 12.629395513278464
- nauc_ndcg_at_1000_diff1: 23.808642111518818
- nauc_ndcg_at_1000_max: 16.627566094152723
- nauc_ndcg_at_100_diff1: 24.15753828838301
- nauc_ndcg_at_100_max: 15.687219543962943
- nauc_ndcg_at_10_diff1: 25.37534871155007
- nauc_ndcg_at_10_max: 12.129150838266701
- nauc_ndcg_at_1_diff1: 40.37326707123
- nauc_ndcg_at_1_max: 19.53225223707901
- nauc_ndcg_at_20_diff1: 24.446964236295017
- nauc_ndcg_at_20_max: 15.718345074121808
- nauc_ndcg_at_3_diff1: 29.121862169292296
- nauc_ndcg_at_3_max: 9.481317109332187
- nauc_ndcg_at_5_diff1: 25.17815567035254
- nauc_ndcg_at_5_max: 11.187689974665869
- nauc_precision_at_1000_diff1: 10.131813130975075
- nauc_precision_at_1000_max: 30.42776840947068
- nauc_precision_at_100_diff1: 17.8621079715631
- nauc_precision_at_100_max: 20.81364427037172
- nauc_precision_at_10_diff1: 20.451314767316635
- nauc_precision_at_10_max: 11.400840293532708
- nauc_precision_at_1_diff1: 40.37326707123
- nauc_precision_at_1_max: 19.53225223707901
- nauc_precision_at_20_diff1: 19.17170129809007
- nauc_precision_at_20_max: 20.190489899791007
- nauc_precision_at_3_diff1: 24.905337103765735
- nauc_precision_at_3_max: 4.960457155777402
- nauc_precision_at_5_diff1: … (remaining BSARDRetrieval metrics truncated in the source)
Featured Recommended AI Models

Jina Embeddings V3 (jinaai) · 3.7M downloads · 911 likes
A multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks. Tags: text embedding, Transformers, multilingual.

Ms Marco MiniLM L6 V2 (cross-encoder, Apache-2.0) · 2.5M downloads · 86 likes
A cross-encoder model trained on the MS MARCO passage ranking task for query-passage relevance scoring in information retrieval. Tags: text embedding, English.

Opensearch Neural Sparse Encoding Doc V2 Distill (opensearch-project, Apache-2.0) · 1.8M downloads · 7 likes
A distillation-based sparse retrieval model optimized for OpenSearch, supporting inference-free document encoding with improved search relevance and efficiency over V1. Tags: text embedding, Transformers, English.

Sapbert From PubMedBERT Fulltext (cambridgeltl, Apache-2.0) · 1.7M downloads · 49 likes
A biomedical entity representation model based on PubMedBERT, optimized for capturing semantic relations through self-aligned pre-training. Tags: text embedding, English.

Gte Large (thenlper, MIT) · 1.5M downloads · 278 likes
GTE-Large is a sentence transformer model focused on sentence similarity and text embedding tasks, performing strongly on multiple benchmarks. Tags: text embedding, English.

Gte Base En V1.5 (Alibaba-NLP, Apache-2.0) · 1.5M downloads · 63 likes
GTE-base-en-v1.5 is an English sentence transformer model focused on sentence similarity tasks, performing strongly on multiple text embedding benchmarks. Tags: text embedding, Transformers, multilingual.

Gte Multilingual Base (Alibaba-NLP, Apache-2.0) · 1.2M downloads · 246 likes
A multilingual sentence embedding model supporting over 50 languages, suited to tasks such as sentence similarity calculation. Tags: text embedding, Transformers, multilingual.

Polybert (kuelumbus) · 1.0M downloads · 5 likes
polyBERT is a chemical language model for fully machine-driven, ultrafast polymer informatics; it maps PSMILES strings to 600-dimensional dense fingerprints that numerically represent polymer chemical structures. Tags: text embedding, Transformers.

Bert Base Turkish Cased Mean Nli Stsb Tr (emrecan, Apache-2.0) · 1.0M downloads · 40 likes
A sentence embedding model based on Turkish BERT, optimized for semantic similarity tasks. Tags: text embedding, Transformers, other.

GIST Small Embedding V0 (avsolatorio, MIT) · 945.68k downloads · 29 likes
A text embedding model fine-tuned from BAAI/bge-small-en-v1.5, trained on the MEDI dataset and MTEB classification task datasets, optimized for query encoding in retrieval tasks. Tags: text embedding, Safetensors, English.