Bilingual Embedding Base
Developed by Lajavaness
A bilingual sentence embedding model based on sentence-transformers, supporting feature extraction and sentence similarity calculation
Downloads: 1,269
Release date: 6/26/2024
Model Overview
This model generates sentence embeddings for downstream tasks such as sentence similarity calculation, clustering, retrieval, and reranking.
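As a sentence-transformers model, it can be loaded through the standard SentenceTransformer API. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as Lajavaness/bilingual-embedding-base (the hub id is inferred from the developer and model name above; trust_remote_code=True is a precaution in case the repository ships custom modeling code):

```python
from sentence_transformers import SentenceTransformer, util

# Hub id assumed from the model name above; trust_remote_code=True is a
# precaution in case the repository ships custom modeling code.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "Il fait très beau aujourd'hui.",  # French paraphrase of the line above
]
embeddings = model.encode(sentences)

# Cross-language cosine similarity between the two sentences.
print(util.cos_sim(embeddings[0], embeddings[1]))
```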
Model Features
Bilingual Support
Embeds two languages in a shared vector space, enabling cross-language sentence similarity calculation
Multifunctional Applications
Supports various downstream tasks including clustering, retrieval, and reranking
High Performance
Performs strongly across the MTEB evaluation datasets below, with notably good reranking results (e.g., map 70.46 on AlloprofReranking)
Model Capabilities
Sentence embedding generation
Sentence similarity calculation
Text clustering
Information retrieval (see the retrieval sketch after this list)
Result reranking
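A minimal retrieval sketch using sentence-transformers' built-in semantic_search utility; the corpus, query, and hub id are illustrative assumptions, not part of the model card:

```python
from sentence_transformers import SentenceTransformer, util

# Hub id assumed as above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

# Toy corpus and query; any document collection works the same way.
corpus = [
    "Paris est la capitale de la France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]
query = "Quelle est la capitale de la France ?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```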
Use Cases
Education
Educational Content Clustering
Automatically clusters education-related texts
Achieved a v_measure of 64.71 on the AlloProfClusteringP2P dataset (a clustering sketch follows this section)
Information Retrieval
Legal Document Retrieval
Retrieves relevant documents from legal texts
Achieved a map@100 of 10.852 on the BSARDRetrieval dataset
E-commerce
Product Review Classification
Performs sentiment classification on product reviews
Achieved an accuracy of 40.354% on the AmazonReviewsClassification (fr) dataset
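Scores like the v_measure above come from embedding texts and clustering them against gold topic labels. A minimal sketch of that style of evaluation (toy data and labels; this is not the MTEB harness itself):

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

# Hub id assumed as above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

# Toy documents with gold topic labels (0 = geography, 1 = biology).
texts = [
    "Les volcans se forment aux frontières des plaques tectoniques.",
    "Rivers carve valleys over thousands of years.",
    "La photosynthèse convertit la lumière en énergie chimique.",
    "Cells divide through a process called mitosis.",
]
labels = [0, 0, 1, 1]

embeddings = model.encode(texts)
predicted = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# V-measure compares predicted clusters against the gold labels.
print(v_measure_score(labels, predicted))
```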
🚀 Sentence-Transformers
Sentence-Transformers is a library for state-of-the-art sentence, text and image embeddings. This README presents the performance results of the bilingual-embedding-base model on various tasks.
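The numbers below follow the MTEB result format. A sketch of how results like these can be regenerated with the mteb package, assuming its classic MTEB(tasks=[...]) entry point and the task names listed in this section:

```python
import mteb  # pip install mteb
from sentence_transformers import SentenceTransformer

# Hub id assumed as above.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base",
                            trust_remote_code=True)

# Re-run one of the French tasks reported below; scores are written as JSON.
evaluation = mteb.MTEB(tasks=["AlloProfClusteringP2P"])
evaluation.run(model, output_folder="results/bilingual-embedding-base")
```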
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Library Name | sentence-transformers |
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers, sentence-embedding, mteb |
| Model Name | bilingual-embedding-base |
Performance Results
Clustering on MTEB AlloProfClusteringP2P
- Task Type: Clustering
- Dataset: lyon-nlp/alloprof (MTEB AlloProfClusteringP2P, default config, test split, revision 392ba3f5bcc8c51f578786c1fc3dae648662cb9b)
- Metrics:
- v_measure: 64.71347977413274
- v_measures: [0.6276406100583347, 0.6380984027637511, 0.6322659733117469, 0.652473343931587, 0.6411502615838207]
Clustering on MTEB AlloProfClusteringS2S
- Task Type: Clustering
- Dataset: lyon-nlp/alloprof (MTEB AlloProfClusteringS2S, default config, test split, revision 392ba3f5bcc8c51f578786c1fc3dae648662cb9b)
- Metrics:
- v_measure: 45.56886694296516
- v_measures: [0.4359417286866465, 0.4218168523895086, 0.419693469863105, 0.4981808644314091, 0.4546120704986696]
Reranking on MTEB AlloprofReranking
- Task Type: Reranking
- Dataset: lyon-nlp/mteb-fr-reranking-alloprof-s2p (MTEB AlloprofReranking, default config, test split, revision 65393d0d7a08a10b4e348135e824f385d420b0fd)
- Metrics:
- map: 70.45979049191963
- mrr: 71.61250582558874
- nAUC_map_diff1: 53.81172404763562
- nAUC_map_max: 11.344605627946006
- nAUC_mrr_diff1: 53.73572411920392
- nAUC_mrr_max: 11.900785644756448
Retrieval on MTEB AlloprofRetrieval
- Task Type: Retrieval
- Dataset: lyon-nlp/alloprof (MTEB AlloprofRetrieval, default config, test split, revision fcf295ea64c750f41fadbaa37b9b861558e1bfbd)
- Metrics:
- map_at_1: 24.956999999999997
- map_at_10: 35.519
- map_at_100: 36.602000000000004
- map_at_1000: 36.667
- map_at_20: 36.199
- map_at_3: 32.397999999999996
- map_at_5: 34.121
- mrr_at_1: 24.956822107081177
- mrr_at_10: 35.5187515420676
- mrr_at_100: 36.60233410773095
- mrr_at_1000: 36.6673425266803
- mrr_at_20: 36.19886807640962
- mrr_at_3: 32.39781232009212
- mrr_at_5: 34.120610247553294
- nauc_map_at_1000_diff1: 35.47549954610535
- nauc_map_at_1000_max: 32.90078190148035
- nauc_map_at_100_diff1: 35.450977423889455
- nauc_map_at_100_max: 32.94163322687819
- nauc_map_at_10_diff1: 35.33565787959566
- nauc_map_at_10_max: 32.64293321320758
- nauc_map_at_1_diff1: 41.36063963274824
- nauc_map_at_1_max: 29.231198539339754
- nauc_map_at_20_diff1: 35.43536321023612
- nauc_map_at_20_max: 32.88345868813372
- nauc_map_at_3_diff1: 35.686505961183194
- nauc_map_at_3_max: 32.00806639149525
- nauc_map_at_5_diff1: 35.34083004527267
- nauc_map_at_5_max: 32.34459863266794
- nauc_mrr_at_1000_diff1: 35.47549954610535
- nauc_mrr_at_1000_max: 32.90078190148035
- nauc_mrr_at_100_diff1: 35.450977423889455
- nauc_mrr_at_100_max: 32.94163322687819
- nauc_mrr_at_10_diff1: 35.33565787959566
- nauc_mrr_at_10_max: 32.64293321320758
- nauc_mrr_at_1_diff1: 41.36063963274824
- nauc_mrr_at_1_max: 29.231198539339754
- nauc_mrr_at_20_diff1: 35.43536321023612
- nauc_mrr_at_20_max: 32.88345868813372
- nauc_mrr_at_3_diff1: 35.686505961183194
- nauc_mrr_at_3_max: 32.00806639149525
- nauc_mrr_at_5_diff1: 35.34083004527267
- nauc_mrr_at_5_max: 32.34459863266794
- nauc_ndcg_at_1000_diff1: 34.4325639393338
- nauc_ndcg_at_1000_max: 34.44147157978809
- nauc_ndcg_at_100_diff1: 33.88653913244061
- nauc_ndcg_at_100_max: 35.59739772150559
- nauc_ndcg_at_10_diff1: 33.51057326557921
- nauc_ndcg_at_10_max: 34.18300339583124
- nauc_ndcg_at_1_diff1: 41.36063963274824
- nauc_ndcg_at_1_max: 29.231198539339754
- nauc_ndcg_at_20_diff1: 33.804940889687714
- nauc_ndcg_at_20_max: 35.13066047975507
- nauc_ndcg_at_3_diff1: 34.167179830350406
- nauc_ndcg_at_3_max: 32.94370726673571
- nauc_ndcg_at_5_diff1: 33.52138730976869
- nauc_ndcg_at_5_max: 33.48723912009657
- nauc_precision_at_1000_diff1: 22.67827963096167
- nauc_precision_at_1000_max: 67.17958434804059
- nauc_precision_at_100_diff1: 24.83989747177019
- nauc_precision_at_100_max: 56.86858468981111
- nauc_precision_at_10_diff1: 27.973507280697774
- nauc_precision_at_10_max: 39.14247264250278
- nauc_precision_at_1_diff1: 41.36063963274824
- nauc_precision_at_1_max: 29.231198539339754
- nauc_precision_at_20_diff1: 28.244479044808852
- nauc_precision_at_20_max: 44.5538108461975
- nauc_precision_at_3_diff1: 30.09786756288918
- nauc_precision_at_3_max: 35.51252285506707
- nauc_precision_at_5_diff1: 28.444019525872083
- nauc_precision_at_5_max: 36.71010320598834
- nauc_recall_at_1000_diff1: 22.67827963096142
- nauc_recall_at_1000_max: 67.17958434804105
- nauc_recall_at_100_diff1: 24.83989747177035
- nauc_recall_at_100_max: 56.868584689811186
- nauc_recall_at_10_diff1: 27.973507280697774
- nauc_recall_at_10_max: 39.14247264250283
- nauc_recall_at_1_diff1: 41.36063963274824
- nauc_recall_at_1_max: 29.231198539339754
- nauc_recall_at_20_diff1: 28.244479044808823
- nauc_recall_at_20_max: 44.553810846197486
- nauc_recall_at_3_diff1: 30.097867562889213
- nauc_recall_at_3_max: 35.51252285506708
- nauc_recall_at_5_diff1: 28.444019525872054
- nauc_recall_at_5_max: 36.71010320598833
- ndcg_at_1: 24.956999999999997
- ndcg_at_10: 41.260999999999996
- ndcg_at_100: 46.628
- ndcg_at_1000: 48.339999999999996
- ndcg_at_20: 43.71
- ndcg_at_3: 34.771
- ndcg_at_5: 37.862
- precision_at_1: 24.956999999999997
- precision_at_10: 5.963
- precision_at_100: 0.8500000000000001
- precision_at_1000: 0.098
- precision_at_20: 3.463
- precision_at_3: 13.874
- precision_at_5: 9.818999999999999
- recall_at_1: 24.956999999999997
- recall_at_10: 59.629
- recall_at_100: 84.974
- recall_at_1000: 98.402
- recall_at_20: 69.257
- recall_at_3: 41.623
- recall_at_5: 49.092999999999996
Classification on MTEB AmazonReviewsClassification (fr)
- Task Type: Classification
- Dataset: mteb/amazon_reviews_multi (MTEB AmazonReviewsClassification (fr), fr config, test split, revision 1399c76144fd37290681b995c656ef9b2e06e26d)
- Metrics:
- accuracy: 40.354
- f1: 38.96127209677864
- f1_weighted: 38.96127209677863
Retrieval on MTEB BSARDRetrieval
- Task Type: Retrieval
- Dataset: maastrichtlawtech/bsard (MTEB BSARDRetrieval, default config, test split, revision 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59)
- Metrics:
- map_at_1: 5.405
- map_at_10: 9.715
- map_at_100: 10.852
- map_at_1000: 10.986
- map_at_20: 10.324
- map_at_3: 8.183
- map_at_5: 9.129
- mrr_at_1: 5.405405405405405
- mrr_at_10: 9.715250965250963
- mrr_at_100: 10.852488713991455
- mrr_at_1000: 10.986128810251913
- mrr_at_20: 10.324372731493472
- mrr_at_3: 8.183183183183184
- mrr_at_5: 9.12912912912913
- nauc_map_at_1000_diff1: 27.989147369852507
- nauc_map_at_1000_max: 13.956676274011212
- nauc_map_at_100_diff1: 27.97931691839789
- nauc_map_at_100_max: 13.885054931557967
- nauc_map_at_10_diff1: 28.39484917161119
- nauc_map_at_10_max: 12.859153645903026
- nauc_map_at_1_diff1: 40.37326707123
- nauc_map_at_1_max: 19.53225223707901
- nauc_map_at_20_diff1: 27.969553431914463
- nauc_map_at_20_max: 13.993555023169318
- nauc_map_at_3_diff1: 31.130088078198526
- nauc_map_at_3_max: 11.59332560179606
- nauc_map_at_5_diff1: 28.51357198186801
- nauc_map_at_5_max: 12.629395513278464
- nauc_mrr_at_1000_diff1: 27.989147369852507
- nauc_mrr_at_1000_max: 13.956676274011212
- nauc_mrr_at_100_diff1: 27.97931691839789
- nauc_mrr_at_100_max: 13.885054931557967
- nauc_mrr_at_10_diff1: 28.39484917161119
- nauc_mrr_at_10_max: 12.859153645903026
- nauc_mrr_at_1_diff1: 40.37326707123
- nauc_mrr_at_1_max: 19.53225223707901
- nauc_mrr_at_20_diff1: 27.969553431914463
- nauc_mrr_at_20_max: 13.993555023169318
- nauc_mrr_at_3_diff1: 31.130088078198526
- nauc_mrr_at_3_max: 11.59332560179606
- nauc_mrr_at_5_diff1: 28.51357198186801
- nauc_mrr_at_5_max: 12.629395513278464
- nauc_ndcg_at_1000_diff1: 23.808642111518818
- nauc_ndcg_at_1000_max: 16.627566094152723
- nauc_ndcg_at_100_diff1: 24.15753828838301
- nauc_ndcg_at_100_max: 15.687219543962943
- nauc_ndcg_at_10_diff1: 25.37534871155007
- nauc_ndcg_at_10_max: 12.129150838266701
- nauc_ndcg_at_1_diff1: 40.37326707123
- nauc_ndcg_at_1_max: 19.53225223707901
- nauc_ndcg_at_20_diff1: 24.446964236295017
- nauc_ndcg_at_20_max: 15.718345074121808
- nauc_ndcg_at_3_diff1: 29.121862169292296
- nauc_ndcg_at_3_max: 9.481317109332187
- nauc_ndcg_at_5_diff1: 25.17815567035254
- nauc_ndcg_at_5_max: 11.187689974665869
- nauc_precision_at_1000_diff1: 10.131813130975075
- nauc_precision_at_1000_max: 30.42776840947068
- nauc_precision_at_100_diff1: 17.8621079715631
- nauc_precision_at_100_max: 20.81364427037172
- nauc_precision_at_10_diff1: 20.451314767316635
- nauc_precision_at_10_max: 11.400840293532708
- nauc_precision_at_1_diff1: 40.37326707123
- nauc_precision_at_1_max: 19.53225223707901
- nauc_precision_at_20_diff1: 19.17170129809007
- nauc_precision_at_20_max: 20.190489899791007
- nauc_precision_at_3_diff1: 24.905337103765735
- nauc_precision_at_3_max: 4.960457155777402
- nauc_precision_at_5_diff1: … (remaining BSARDRetrieval metrics truncated in the source)
Featured Recommended AI Models

Jina Embeddings V3 (jinaai) · 3.7M downloads · 911 likes
A multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks. Tags: text embedding, Transformers, multilingual.

Ms Marco MiniLM L6 V2 (cross-encoder, Apache-2.0) · 2.5M downloads · 86 likes
A cross-encoder model trained on the MS MARCO passage ranking task for query-passage relevance scoring in information retrieval. Tags: text embedding, English.

Opensearch Neural Sparse Encoding Doc V2 Distill (opensearch-project, Apache-2.0) · 1.8M downloads · 7 likes
A distillation-based sparse retrieval model optimized for OpenSearch, supporting inference-free document encoding with improved search relevance and efficiency over V1. Tags: text embedding, Transformers, English.

Sapbert From PubMedBERT Fulltext (cambridgeltl, Apache-2.0) · 1.7M downloads · 49 likes
A biomedical entity representation model based on PubMedBERT, optimized for capturing semantic relations through self-aligned pre-training. Tags: text embedding, English.

Gte Large (thenlper, MIT) · 1.5M downloads · 278 likes
GTE-Large is a sentence transformer model focused on sentence similarity and text embedding tasks, performing strongly on multiple benchmarks. Tags: text embedding, English.

Gte Base En V1.5 (Alibaba-NLP, Apache-2.0) · 1.5M downloads · 63 likes
GTE-base-en-v1.5 is an English sentence transformer model focused on sentence similarity tasks, performing strongly on multiple text embedding benchmarks. Tags: text embedding, Transformers, multilingual.

Gte Multilingual Base (Alibaba-NLP, Apache-2.0) · 1.2M downloads · 246 likes
A multilingual sentence embedding model supporting over 50 languages, suited to tasks such as sentence similarity calculation. Tags: text embedding, Transformers, multilingual.

Polybert (kuelumbus) · 1.0M downloads · 5 likes
polyBERT is a chemical language model for fully machine-driven, ultrafast polymer informatics; it maps PSMILES strings to 600-dimensional dense fingerprints that numerically represent polymer chemical structures. Tags: text embedding, Transformers.

Bert Base Turkish Cased Mean Nli Stsb Tr (emrecan, Apache-2.0) · 1.0M downloads · 40 likes
A sentence embedding model based on Turkish BERT, optimized for semantic similarity tasks. Tags: text embedding, Transformers, other.

GIST Small Embedding V0 (avsolatorio, MIT) · 945.68k downloads · 29 likes
A text embedding model fine-tuned from BAAI/bge-small-en-v1.5, trained on the MEDI dataset and MTEB classification task datasets, optimized for query encoding in retrieval tasks. Tags: text embedding, Safetensors, English.