distiluse-base-multilingual-cased-v2 Open-source Model - A Practical Tool for Multilingual Semantic Search and Clustering

Distiluse Base Multilingual Cased V2

Developed by lorenpe2

This is a multilingual sentence embedding model that maps text to a 512-dimensional vector space, suitable for semantic search and clustering tasks.

Text Embedding

Transformers

Other

Open Source License: Apache-2.0

Tags: #Multilingual sentence embeddings #512-dimensional vector space #Semantic similarity calculation

Downloads: 32

Release Time: 7/27/2023

Model Overview

An ONNX conversion of the original distiluse-base-multilingual-cased-v2 model. It retains the original model's sentence embedding capabilities and its support for multilingual text.

Model Features

Multilingual support

Capable of processing text inputs in multiple languages

Efficient inference

The ONNX format allows optimized inference with runtimes such as ONNX Runtime

Semantic encoding

Converts sentences into 512-dimensional semantic vectors

Model Capabilities

Sentence embedding

Semantic similarity calculation

Multilingual text processing

Feature extraction
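Semantic similarity between two sentences is commonly measured as the cosine similarity of their embedding vectors. The sketch below assumes the vectors have already been computed (for example, the 512-dimensional embeddings this model produces); the short toy vectors are made up purely for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors stand in for real 512-dimensional sentence embeddings.
print(cosine_similarity([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]))
print(cosine_similarity([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]))
```

Cosine similarity ignores vector length and compares only direction, which is why it is the usual choice for comparing sentence embeddings.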

Use Cases

Information retrieval

Semantic search

Document retrieval based on semantics rather than keyword matching

Can improve search relevance and recall compared with keyword-only matching

Text clustering

Document grouping

Automatically organizes documents based on semantic similarity

Enables automatic grouping without predefined categories
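Grouping without predefined categories can be done by clustering the embeddings directly. The sketch below uses a simple greedy "leader" clustering with a similarity threshold, chosen here only for brevity; it is not a method prescribed by this model, and the 2-dimensional vectors and the 0.9 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def leader_clustering(vectors: Sequence[Sequence[float]],
                      threshold: float = 0.8) -> List[List[int]]:
    """Single pass: each vector joins the first cluster whose leader (first member)
    has cosine similarity >= threshold; otherwise it starts a new cluster.
    Returns clusters as lists of input indices."""
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if _cosine(vectors[cluster[0]], vec) >= threshold:
                cluster.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


# Hypothetical embeddings for five short texts.
embeddings = [
    [1.0, 0.1],   # food-related
    [0.9, 0.2],   # food-related
    [0.1, 1.0],   # finance-related
    [0.95, 0.0],  # food-related
    [0.2, 0.9],   # finance-related
]
print(leader_clustering(embeddings, threshold=0.9))  # [[0, 1, 3], [2, 4]]
```

The number of clusters is not fixed in advance; it follows from the threshold, which trades off tighter groups (higher values) against fewer, broader ones (lower values).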

🚀 ONNX conversion of distiluse-base-multilingual-cased-v2

This project is an ONNX conversion of the sentence-transformers/distiluse-base-multilingual-cased-v2 model. It maps sentences and paragraphs to a 512-dimensional dense vector space, which can be used for tasks such as clustering or semantic search.

🚀 Quick Start

This custom ONNX model outputs last_hidden_state, similar to the original sentence-transformers implementation.

✨ Features

Multilingual Support: The model supports multiple languages, making it suitable for a wide range of multilingual tasks.
Dense Vector Representation: It maps sentences and paragraphs to a 512-dimensional dense vector space, enabling effective clustering and semantic search.
ONNX Compatibility: The ONNX conversion allows for efficient inference and deployment.

📦 Installation

This model is easiest to use with optimum installed:

python -m pip install optimum

You may also need the following:

python -m pip install onnxruntime
python -m pip install onnx

💻 Usage Examples

Basic Usage

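from optimum.onnxruntime.modeling_ort import ORTModelForCustomTasks
from transformers import AutoTokenizer

# Load the ONNX model and its tokenizer from the Hugging Face Hub
model = ORTModelForCustomTasks.from_pretrained("lorenpe2/distiluse-base-multilingual-cased-v2")
tokenizer = AutoTokenizer.from_pretrained("lorenpe2/distiluse-base-multilingual-cased-v2")

# Tokenize the input sentence and return PyTorch tensors
inputs = tokenizer("I love burritos!", return_tensors="pt")

# Run inference; pred contains the outputs defined by the ONNX graph
pred = model(**inputs)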

Advanced Usage

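You can also use the pipeline API from transformers:

from optimum.onnxruntime.modeling_ort import ORTModelForCustomTasks
from transformers import AutoTokenizer, pipeline

model = ORTModelForCustomTasks.from_pretrained("lorenpe2/distiluse-base-multilingual-cased-v2")
tokenizer = AutoTokenizer.from_pretrained("lorenpe2/distiluse-base-multilingual-cased-v2")

# Wrap the ONNX model in a standard transformers feature-extraction pipeline
onnx_extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
pred = onnx_extractor("I love burritos!")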

📚 Documentation

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

🔧 Technical Details

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
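The module list above describes mean pooling over the 768-dimensional DistilBERT token embeddings (padding excluded via the attention mask), followed by a Dense layer mapping 768 to 512 features with a Tanh activation. The pure-Python sketch below mirrors that arithmetic to make the shapes concrete. The weights are random placeholders, not the model's trained parameters, so the resulting vector is only illustrative; the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention_mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, x in enumerate(vec):
                total[j] += x
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def dense_tanh(x: Sequence[float],
               weight: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """y = tanh(W x + b), with W stored as out_features rows of in_features values."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight, bias)]


random.seed(0)
IN_FEATURES, OUT_FEATURES, SEQ_LEN = 768, 512, 6  # dimensions from the module list
tokens = [[random.gauss(0, 1) for _ in range(IN_FEATURES)] for _ in range(SEQ_LEN)]
mask = [1, 1, 1, 1, 0, 0]  # last two positions are padding
W = [[random.gauss(0, 0.02) for _ in range(IN_FEATURES)] for _ in range(OUT_FEATURES)]
b = [0.0] * OUT_FEATURES  # placeholder bias

sentence_embedding = dense_tanh(mean_pool(tokens, mask), W, b)
print(len(sentence_embedding))  # 512
```

Because of the Tanh activation, every component of the final 512-dimensional embedding lies between -1 and 1.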

📄 License

This project is licensed under the Apache-2.0 license.

📖 Citing & Authors

This model was trained by sentence-transformers.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}

Information Table

Property	Details
Pipeline Tag	sentence-similarity
Language	multilingual
License	apache-2.0
Tags	sentence-transformers, feature-extraction, sentence-similarity, transformers
