Jina Embeddings V3: An Open-Source Multilingual Embedding Model - Supports Over 100 Languages and Extracts Sentence Features

Jina Embeddings V3

Developed by jinaai

Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.

Text Embedding

Transformers

Supports Multiple Languages #Multilingual Sentence Embeddings #Cross-lingual Similarity Computation #Low-resource Language Optimization

Downloads 3.7M

Release Time: 9/5/2024

Model Overview

Jina Embeddings V3 converts text into high-dimensional vector representations that can be compared to compute sentence similarity or used as features for downstream tasks. It supports a wide range of languages, which makes it suitable for cross-lingual information retrieval and semantic similarity computation.
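As a rough illustration of how similarity is computed once text has been turned into vectors, the sketch below implements cosine similarity in plain Python. The four-dimensional vectors and the sentences attached to them are made up for the example; real embeddings from the model have many more dimensions and would come from the model itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real sentence embeddings (assumption).
emb_en = [0.20, 0.70, 0.10, 0.40]        # hypothetical: an English sentence
emb_de = [0.25, 0.65, 0.05, 0.45]        # hypothetical: its German translation
emb_other = [0.90, -0.30, 0.60, -0.20]   # hypothetical: an unrelated sentence

sim_close = cosine_similarity(emb_en, emb_de)
sim_far = cosine_similarity(emb_en, emb_other)
print(f"translation pair: {sim_close:.3f}, unrelated pair: {sim_far:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction; scores near 0 or below indicate little semantic overlap under this measure.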

Model Features

Multilingual Support

Supports over 100 languages, including major languages and various minority languages

Sentence Embeddings

Converts sentences into high-dimensional vector representations for easy computation of semantic similarity

Feature Extraction

Capable of extracting meaningful feature representations from text

Model Capabilities

Sentence similarity computation

Multilingual text embedding

Semantic feature extraction

Cross-lingual information retrieval
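Cross-lingual retrieval with embeddings typically comes down to ranking documents by their similarity to a query vector, regardless of the language each text was written in. The sketch below shows that ranking step only. The function names, document IDs, and three-dimensional vectors are hypothetical placeholders, not output from the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs for the k highest-scoring documents.

    Scores are dot products of unit-length vectors, i.e. cosine similarities.
    """
    q = l2_normalize(query_vec)
    scored = []
    for doc_id, vec in doc_vecs.items():
        d = l2_normalize(vec)
        scored.append((doc_id, sum(x * y for x, y in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical corpus: IDs and vectors are invented for illustration.
corpus = {
    "doc_en": [0.9, 0.1, 0.0],
    "doc_pl": [0.8, 0.2, 0.1],
    "doc_zh": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, corpus, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

For large collections the linear scan would normally be replaced by a vector index, but the scoring logic stays the same.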

Use Cases

Information Retrieval

Cross-lingual Document Retrieval

Find semantically similar content in document collections across different languages

Achieved a main score (nDCG@10) of 50.12 on the MTEB ArguAna-PL test set
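The retrieval tables on this page report metrics such as `ndcg_at_10`, `recall_at_10`, and `precision_at_10`. The sketch below gives textbook binary-relevance definitions of these metrics for a single query. It is not MTEB's own evaluation code, and the document IDs are invented. Benchmark figures are averages over all queries, shown here on what appears to be a 0 to 100 scale.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 has discount log2(2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k slots occupied by relevant documents."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


# Hypothetical query with one relevant document, ranked third.
ranked = ["d7", "d2", "d9", "d4"]
relevant = {"d9"}

print(ndcg_at_k(ranked, relevant, 10))
print(recall_at_k(ranked, relevant, 10))
print(precision_at_k(ranked, relevant, 10))
```

When every query has exactly one relevant document, precision at k equals recall at k divided by k. The ArguAna-PL figures below are consistent with that pattern: `precision_at_10` is 8.016 and `recall_at_10` is 80.156.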

Semantic Similarity

Sentence Similarity Computation

Compute semantic similarity between two sentences

Achieved a cosine Spearman correlation of 43.47 on the MTEB AFQMC validation set
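For readers unfamiliar with the metric, Spearman correlation measures how well the ordering of predicted similarity scores matches the ordering of gold labels. It is the Pearson correlation of the two rank sequences. The following is a minimal pure-Python sketch with made-up scores, not the benchmark's implementation. The reported 43.47 appears to be the coefficient multiplied by 100.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predicted cosine similarities and gold scores for five sentence pairs.
predicted = [0.9, 0.2, 0.6, 0.4, 0.8]
gold_same_order = [5, 1, 3, 2, 4]
gold_one_swap = [5, 1, 2, 3, 4]

print(spearman(predicted, gold_same_order))
print(spearman(predicted, gold_one_swap))
```

Because only ranks matter, the metric is unaffected by any monotonic rescaling of the similarity scores.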

## 🚀 Jina Embeddings V3

*Jina Embeddings V3 is a model designed for feature extraction and sentence similarity tasks. It supports multiple languages and has been evaluated on various MTEB datasets.*
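This page does not include usage code, so the following is only a sketch of how the model might be loaded. It assumes the Hugging Face repository ID is `jinaai/jina-embeddings-v3` (inferred from the developer name and model type listed here), that the `sentence-transformers` package named in the tags is installed, and that the repository requires `trust_remote_code=True`. The `embed` helper is a hypothetical name.

```python
# Assumed repository ID; it is not stated explicitly on this page.
MODEL_ID = "jinaai/jina-embeddings-v3"


def embed(texts, model_id=MODEL_ID):
    """Encode a list of strings into L2-normalized embedding vectors.

    Downloads the model weights on first use, so this needs network access.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from sentence_transformers import SentenceTransformer

    # trust_remote_code=True is an assumption: it is required when a
    # repository ships its own modeling code alongside the weights.
    model = SentenceTransformer(model_id, trust_remote_code=True)

    # normalize_embeddings=True returns unit-length rows, so a dot product
    # between two rows equals their cosine similarity.
    return model.encode(texts, normalize_embeddings=True)
```

Calling `embed(["How is the weather today?", "Wie ist das Wetter heute?"])` would return one vector per input sentence. Any task-specific options the model may offer are not described on this page and are left out of the sketch.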

## 📚 Documentation

### Model Information
| Property | Details |
|----------|---------|
| Model Type | jina-embeddings-v3 |
| Training Data | Not specified |
| Library Name | transformers |
| License | cc-by-nc-4.0 |
| Tags | feature-extraction, sentence-similarity, mteb, sentence-transformers |
| Supported Languages | multilingual, af, am, ar, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fr, fy, ga, gd, gl, gu, ha, he, hi, hr, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lo, lt, lv, mg, mk, ml, mn, mr, ms, my, ne, nl, no, om, or, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, so, sq, sr, su, sv, sw, ta, te, th, tl, tr, ug, uk, ur, uz, vi, xh, yi, zh |
| Inference | false |

### Evaluation Results

#### MTEB AFQMC (default)
- **Task Type**: STS
- **Split**: validation

| Metric | Value |
|--------|-------|
| cosine_pearson | 41.74237700998808 |
| cosine_spearman | 43.4726782647566 |
| euclidean_pearson | 42.244585459479964 |
| euclidean_spearman | 43.525070045169606 |
| main_score | 43.4726782647566 |
| manhattan_pearson | 42.04616728224863 |
| manhattan_spearman | 43.308828270754645 |
| pearson | 41.74237700998808 |
| spearman | 43.4726782647566 |

#### MTEB ArguAna-PL (default)
- **Task Type**: Retrieval
- **Split**: test

| Metric | Value |
|--------|-------|
| main_score | 50.117999999999995 |
| map_at_1 | 24.253 |
| map_at_10 | 40.725 |
| map_at_100 | 41.699999999999996 |
| map_at_1000 | 41.707 |
| map_at_20 | 41.467999999999996 |
| map_at_3 | 35.467 |
| map_at_5 | 38.291 |
| mrr_at_1 | 24.751066856330013 |
| mrr_at_10 | 40.91063808169072 |
| mrr_at_100 | 41.885497923928675 |
| mrr_at_1000 | 41.89301098419842 |
| mrr_at_20 | 41.653552355442514 |
| mrr_at_3 | 35.656709340919775 |
| mrr_at_5 | 38.466097676623946 |
| nauc_map_at_1000_diff1 | 7.503000359807567 |
| nauc_map_at_1000_max | -11.030405164830546 |
| nauc_map_at_1000_std | -8.902792782585117 |
| nauc_map_at_100_diff1 | 7.509899249593199 |
| nauc_map_at_100_max | -11.023581259404406 |
| nauc_map_at_100_std | -8.892241185067272 |
| nauc_map_at_10_diff1 | 7.24369711881512 |
| nauc_map_at_10_max | -10.810000200433278 |
| nauc_map_at_10_std | -8.987230542165776 |
| nauc_map_at_1_diff1 | 11.37175831832417 |
| nauc_map_at_1_max | -13.315221903223055 |
| nauc_map_at_1_std | -9.398199605510275 |
| nauc_map_at_20_diff1 | 7.477364530860648 |
| nauc_map_at_20_max | -10.901251218105566 |
| nauc_map_at_20_std | -8.868148116405925 |
| nauc_map_at_3_diff1 | 6.555548802174882 |
| nauc_map_at_3_max | -12.247274800542934 |
| nauc_map_at_3_std | -9.879475250984811 |
| nauc_map_at_5_diff1 | 7.426588563355882 |
| nauc_map_at_5_max | -11.347695686001805 |
| nauc_map_at_5_std | -9.34441892203972 |
| nauc_mrr_at_1000_diff1 | 5.99737552143614 |
| nauc_mrr_at_1000_max | -11.327205136505727 |
| nauc_mrr_at_1000_std | -8.791079115519503 |
| nauc_mrr_at_100_diff1 | 6.004622525255784 |
| nauc_mrr_at_100_max | -11.320336759899723 |
| nauc_mrr_at_100_std | -8.780602249831777 |
| nauc_mrr_at_10_diff1 | 5.783623516930227 |
| nauc_mrr_at_10_max | -11.095971693467078 |
| nauc_mrr_at_10_std | -8.877242032013582 |
| nauc_mrr_at_1_diff1 | 9.694937537703797 |
| nauc_mrr_at_1_max | -12.531905083727912 |
| nauc_mrr_at_1_std | -8.903992940100146 |
| nauc_mrr_at_20_diff1 | 5.984841206233873 |
| nauc_mrr_at_20_max | -11.195236951048969 |
| nauc_mrr_at_20_std | -8.757266039186018 |
| nauc_mrr_at_3_diff1 | 5.114333824261379 |
| nauc_mrr_at_3_max | -12.64809799843464 |
| nauc_mrr_at_3_std | -9.791146138025184 |
| nauc_mrr_at_5_diff1 | 5.88941606224512 |
| nauc_mrr_at_5_max | -11.763903418071918 |
| nauc_mrr_at_5_std | -9.279175712709446 |
| nauc_ndcg_at_1000_diff1 | 7.076950652226086 |
| nauc_ndcg_at_1000_max | -10.386482092087371 |
| nauc_ndcg_at_1000_std | -8.309190917074046 |
| nauc_ndcg_at_100_diff1 | 7.2329220284865245 |
| nauc_ndcg_at_100_max | -10.208048403220337 |
| nauc_ndcg_at_100_std | -7.997975874274613 |
| nauc_ndcg_at_10_diff1 | 6.065391100006953 |
| nauc_ndcg_at_10_max | -9.046164377601153 |
| nauc_ndcg_at_10_std | -8.34724889697153 |
| nauc_ndcg_at_1_diff1 | 11.37175831832417 |
| nauc_ndcg_at_1_max | -13.315221903223055 |
| nauc_ndcg_at_1_std | -9.398199605510275 |
| nauc_ndcg_at_20_diff1 | 6.949389989202601 |
| nauc_ndcg_at_20_max | -9.35740451760307 |
| nauc_ndcg_at_20_std | -7.761295171828212 |
| nauc_ndcg_at_3_diff1 | 5.051471796151364 |
| nauc_ndcg_at_3_max | -12.158763333711653 |
| nauc_ndcg_at_3_std | -10.078902544421926 |
| nauc_ndcg_at_5_diff1 | 6.527454512611454 |
| nauc_ndcg_at_5_max | -10.525118233848586 |
| nauc_ndcg_at_5_std | -9.120055125584031 |
| nauc_precision_at_1000_diff1 | -10.6495668199151 |
| nauc_precision_at_1000_max | 12.070656425217841 |
| nauc_precision_at_1000_std | 55.844551709649004 |
| nauc_precision_at_100_diff1 | 19.206967129266285 |
| nauc_precision_at_100_max | 16.296851020813456 |
| nauc_precision_at_100_std | 45.60378984257811 |
| nauc_precision_at_10_diff1 | 0.6490335354304879 |
| nauc_precision_at_10_max | 0.5757198255366447 |
| nauc_precision_at_10_std | -4.875847131691451 |
| nauc_precision_at_1_diff1 | 11.37175831832417 |
| nauc_precision_at_1_max | -13.315221903223055 |
| nauc_precision_at_1_std | -9.398199605510275 |
| nauc_precision_at_20_diff1 | 4.899369866929203 |
| nauc_precision_at_20_max | 5.988537297189552 |
| nauc_precision_at_20_std | 4.830900387582837 |
| nauc_precision_at_3_diff1 | 0.8791156910997744 |
| nauc_precision_at_3_max | -11.983373635905993 |
| nauc_precision_at_3_std | -10.646185111581257 |
| nauc_precision_at_5_diff1 | 3.9314486166548432 |
| nauc_precision_at_5_max | -7.798591396895839 |
| nauc_precision_at_5_std | -8.293043407234125 |
| nauc_recall_at_1000_diff1 | -10.649566819918673 |
| nauc_recall_at_1000_max | 12.070656425214647 |
| nauc_recall_at_1000_std | 55.84455170965023 |
| nauc_recall_at_100_diff1 | 19.206967129265127 |
| nauc_recall_at_100_max | 16.296851020813722 |
| nauc_recall_at_100_std | 45.60378984257728 |
| nauc_recall_at_10_diff1 | 0.6490335354304176 |
| nauc_recall_at_10_max | 0.5757198255366095 |
| nauc_recall_at_10_std | -4.875847131691468 |
| nauc_recall_at_1_diff1 | 11.37175831832417 |
| nauc_recall_at_1_max | -13.315221903223055 |
| nauc_recall_at_1_std | -9.398199605510275 |
| nauc_recall_at_20_diff1 | 4.899369866929402 |
| nauc_recall_at_20_max | 5.98853729718968 |
| nauc_recall_at_20_std | 4.830900387582967 |
| nauc_recall_at_3_diff1 | 0.8791156910997652 |
| nauc_recall_at_3_max | -11.983373635905997 |
| nauc_recall_at_3_std | -10.64618511158124 |
| nauc_recall_at_5_diff1 | 3.9314486166548472 |
| nauc_recall_at_5_max | -7.7985913968958585 |
| nauc_recall_at_5_std | -8.293043407234132 |
| ndcg_at_1 | 24.253 |
| ndcg_at_10 | 50.117999999999995 |
| ndcg_at_100 | 54.291999999999994 |
| ndcg_at_1000 | 54.44799999999999 |
| ndcg_at_20 | 52.771 |
| ndcg_at_3 | 39.296 |
| ndcg_at_5 | 44.373000000000005 |
| precision_at_1 | 24.253 |
| precision_at_10 | 8.016 |
| precision_at_100 | 0.984 |
| precision_at_1000 | 0.1 |
| precision_at_20 | 4.527 |
| precision_at_3 | 16.808999999999997 |
| precision_at_5 | 12.546 |
| recall_at_1 | 24.253 |
| recall_at_10 | 80.156 |
| recall_at_100 | 98.43499999999999 |
| recall_at_1000 | 99.57300000000001 |
| recall_at_20 | 90.54100000000001 |
| recall_at_3 | 50.427 |
| recall_at_5 | 62.731 |

#### MTEB DBPedia-PL (default)
- **Task Type**: Retrieval
- **Split**: test

| Metric | Value |
|--------|-------|
| main_score | 34.827000000000005 |
| map_at_1 | 7.049999999999999 |
| map_at_10 | 14.982999999999999 |
| map_at_100 | 20.816000000000003 |
| map_at_1000 | 22.33 |
| map_at_20 | 17.272000000000002 |
| map_at_3 | 10.661 |
| map_at_5 | 12.498 |
| mrr_at_1 | 57.25 |
| mrr_at_10 | 65.81934523809524 |
| mrr_at_100 | 66.2564203928212 |
| mrr_at_1000 | 66.27993662923856 |
| mrr_at_20 | 66.0732139130649 |
| mrr_at_3 | 64.08333333333333 |
| mrr_at_5 | 65.27083333333333 |
| nauc_map_at_1000_diff1 | 16.41780871174038 |
| nauc_map_at_1000_max | 30.193946325654654 |
| nauc_map_at_1000_std | 31.46095497039037 |
| nauc_map_at_100_diff1 | 18.57903165498531 |
| nauc_map_at_100_max | 29.541476938623262 |
| nauc_map_at_100_std | 28.228604103301052 |
| nauc_map_at_10_diff1 | 24.109434489748946 |
| nauc_map_at_10_max | 21.475954208048968 |
| nauc_map_at_10_std | 9.964464537806988 |
| nauc_map_at_1_diff1 | 38.67437644802124 |
| nauc_map_at_1_max | 14.52136658726491 |
| nauc_map_at_1_std | -2.8981666782088755 |
| nauc_map_at_20_diff1 | 21.42547228801935 |
| nauc_map_at_20_max | 25.04510402960458 |
| nauc_map_at_20_std | 16.533079346431155 |
| nauc_map_at_3_diff1 | 26.63648858245477 |
| nauc_map_at_3_max | 13.632235789780415 |
| nauc_map_at_3_std | -0.40129174577700716 |
| nauc_map_at_5_diff1 | 24.513861031197933 |
| nauc_map_at_5_max | 16.599888813946688 |
| nauc_map_at_5_std | 3.4448514739556346 |
| nauc_mrr_at_1000_diff1 | 36.57353464537154 |
| nauc_mrr_at_1000_max | 55.34763483979515 |
| nauc_mrr_at_1000_std | 40.3722796438533 |
| ... | ... |

## 📄 License
This project is licensed under the CC BY-NC 4.0 license.

