stsb-m-mt-es-distilbert-base-uncased Open Source Model - Free Deployment for Spanish Semantic Text Similarity Testing

stsb-m-mt-es-distilbert-base-uncased

Developed by eduardofv

This is a test model fine-tuned on the Spanish portion of the stsb_multi_mt dataset for semantic textual similarity (STS).

Tags: Text Embedding, Spanish, Spanish STS, Sentence Similarity, Fine-tuned Model

Downloads: 37

Release time: 3/2/2022

Model Overview

This model takes distilbert-base-uncased and fine-tunes it for Spanish semantic textual similarity. It is primarily meant for understanding and benchmarking STS models.

Model Features

Spanish Language Support

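Fine-tuned on Spanish STS data (machine-translated STSbenchmark sentence pairs), so it targets semantic similarity between Spanish sentences, although the underlying distilbert-base-uncased was pre-trained on English text.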

Efficient Architecture

Based on distilbert-base-uncased, a distilled version of BERT that is smaller and faster than BERT-base, which keeps model size and computational requirements low.

Proof of Concept

Primarily used to demonstrate how to fine-tune STS models with Spanish data, serving as a foundation for further research.

Model Capabilities

Spanish Text Embedding Extraction

Semantic Similarity Calculation

Sentence Vector Representation
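One common way to turn the transformer's token embeddings into a single sentence vector is masked mean pooling. The upstream Sentence Transformers STS training example uses this approach. The card does not say which pooling this model uses, so treat this as an assumption. The following is a minimal pure-Python sketch. The function name and the vectors are hypothetical and chosen only for illustration.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is ignored)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need one mask entry per token")
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Avoid dividing by zero for an all-padding input.
    count = max(count, 1)
    return [value / count for value in total]


# Hypothetical 3-token input: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The attention mask excludes padding positions. As a result, a sentence padded inside a batch gets the same vector it would get on its own.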

Use Cases

Natural Language Processing

Text Similarity Evaluation

Calculate the semantic similarity score between two Spanish sentences

Achieved a Pearson correlation of 0.7451 between cosine similarity and gold scores on the Spanish stsb_multi_mt test set

Information Retrieval

Could be used to rank results by semantic relevance in Spanish search, although the original model card presents the model only as a proof of concept
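Below is a minimal sketch of how such relevance ranking could work. It assumes that query and document embeddings have already been produced by the model. The vectors below are made-up placeholders, not model outputs, and the function names are hypothetical. Normalizing every vector to unit length lets ranking use a plain dot product, which equals cosine similarity for unit vectors.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def rank_documents(
    query: Sequence[float], documents: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, cosine score) pairs, most similar first."""
    q = normalize(query)
    # In a real index the document vectors would be normalized once, ahead of time.
    unit_docs = [normalize(d) for d in documents]
    scores = [(i, sum(a * b for a, b in zip(q, d))) for i, d in enumerate(unit_docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings standing in for a query and three Spanish documents.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
for index, score in rank_documents(query_vec, doc_vecs):
    print(index, round(score, 3))
```

Pre-normalizing the corpus means each query costs only one normalization plus one dot product per document.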

🚀 distilbert-base-uncased Trained for Semantic Textual Similarity in Spanish

This is a test model fine-tuned using Spanish datasets to understand and benchmark STS models.

🚀 Quick Start

You can use it like any other STS-trained model to extract sentence embeddings; see the Sentence Transformers documentation.

✨ Features

This model is a proof-of-concept for STS fine-tuning using Spanish data.
It can be used to get a sense of how STS training with Spanish data works.

📦 Installation

The original model card gives no installation steps. Loading the model with Sentence Transformers requires that library, which can be installed with pip install -U sentence-transformers.

💻 Usage Examples

Basic Usage

You can use it like any other STS-trained model to extract sentence embeddings. For detailed usage, see the Sentence Transformers documentation.
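The following is a minimal usage sketch. It assumes the model is published on the Hugging Face Hub as eduardofv/stsb-m-mt-es-distilbert-base-uncased, which is inferred from the author and model names above. It also assumes sentence-transformers is installed. SentenceTransformer(...) and its encode(...) method are standard Sentence Transformers calls. The helper names embed, cosine and similarity are hypothetical. The library is imported inside embed() so the cosine helper can be used on its own.

```python
import math
from typing import List, Sequence

# Assumed Hub ID, inferred from the author (eduardofv) and the model name.
MODEL_ID = "eduardofv/stsb-m-mt-es-distilbert-base-uncased"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(sentences: List[str], model_id: str = MODEL_ID) -> List[List[float]]:
    """Encode sentences into embeddings (downloads the model on first use)."""
    # Lazy import: sentence-transformers is only needed when this is called.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(sentences).tolist()


def similarity(sentence_a: str, sentence_b: str) -> float:
    """Semantic similarity of two sentences as the cosine of their embeddings."""
    vec_a, vec_b = embed([sentence_a, sentence_b])
    return cosine(vec_a, vec_b)
```

For example, similarity("El gato duerme en el sofá.", "Un gato está durmiendo en el sillón.") returns a cosine score in [-1, 1]. Higher values indicate closer meaning. Reloading the model on every call is wasteful, so keep one SentenceTransformer instance when encoding many sentences.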

📚 Documentation

Model and training data description

This model was built by taking distilbert-base-uncased and training it on a Semantic Textual Similarity task, using a modified version of the STS training script from Sentence Transformers (the modified script is included in the repo). It was trained on the Spanish datasets from stsb_multi_mt, which are the STSBenchmark datasets automatically translated into other languages using deepl.com. Refer to the dataset repository for more details.

Intended uses & limitations

This model was built only as a proof-of-concept for STS fine-tuning using Spanish data. It has no intended use beyond getting a sense of how this training works.

Training procedure

Use the included script to train the base model on Spanish data. You can also train a different model by passing its reference as the first argument, or train in any other language included in the training dataset.
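The repository's modified script is not reproduced here. The sketch below shows the same kind of training using the classic model.fit API. It assumes the approach of the upstream Sentence Transformers STS example: mean pooling, CosineSimilarityLoss, and gold scores rescaled from the 0 to 5 range to 0 to 1. The hyperparameters (batch size 16, 4 epochs, 10% warmup) follow that upstream example and may not match this repo's defaults. The field names sentence1, sentence2 and similarity_score are assumed from the stsb_multi_mt dataset card. With recent versions of the datasets library, the dataset may need a namespaced Hub ID. The function names to_pairs and train are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def to_pairs(rows: List[Dict]) -> List[Tuple[str, str, float]]:
    """Turn stsb_multi_mt rows into (sentence1, sentence2, label) with labels scaled to [0, 1]."""
    return [(r["sentence1"], r["sentence2"], r["similarity_score"] / 5.0) for r in rows]


def train(base_model: str = "distilbert-base-uncased", language: str = "es",
          batch_size: int = 16, epochs: int = 4,
          output_path: str = "output/sts-model") -> None:
    """Fine-tune base_model on the stsb_multi_mt training split for `language`."""
    # Heavy dependencies are imported lazily so to_pairs() works without them.
    from datasets import load_dataset
    from sentence_transformers import InputExample, SentenceTransformer, losses, models
    from torch.utils.data import DataLoader

    # Base transformer followed by mean pooling over token embeddings.
    word_embeddings = models.Transformer(base_model)
    pooling = models.Pooling(word_embeddings.get_word_embedding_dimension(),
                             pooling_mode_mean_tokens=True)
    model = SentenceTransformer(modules=[word_embeddings, pooling])

    train_rows = load_dataset("stsb_multi_mt", name=language, split="train")
    examples = [InputExample(texts=[s1, s2], label=label)
                for s1, s2, label in to_pairs(list(train_rows))]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    loss = losses.CosineSimilarityLoss(model)

    warmup_steps = math.ceil(len(loader) * epochs * 0.1)  # 10% of all training steps
    model.fit(train_objectives=[(loader, loss)], epochs=epochs,
              warmup_steps=warmup_steps, output_path=output_path)
```

Calling train() with another Hub model reference as base_model mirrors the script's first-argument option. Passing another stsb_multi_mt language code as language mirrors the option to train in other languages.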

Evaluation results

Evaluating the base distilbert-base-uncased model on the Spanish test dataset before training gives:

Cosine-Similarity :	Pearson: 0.2980	Spearman: 0.4008

The fine-tuned version, trained with the training script's default settings on the Spanish training dataset, gives:

Cosine-Similarity :	Pearson: 0.7451	Spearman: 0.7364
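Both result lines above appear to be log output from Sentence Transformers' EmbeddingSimilarityEvaluator. That evaluator reports the Pearson and Spearman correlations between the model's cosine similarity for each test pair and the gold similarity score. The sketch below computes both statistics in pure Python. Spearman is computed as Pearson on average ranks, so ties are handled. The function names are hypothetical, and the input numbers are toy values, not this model's predictions.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: cosine scores for four pairs vs. gold scores on the 0-5 scale.
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [4.8, 1.5, 3.0, 0.4]
print(f"Cosine-Similarity :\tPearson: {pearson(predicted, gold):.4f}"
      f"\tSpearman: {spearman(predicted, gold):.4f}")
```

Pearson measures how linearly the cosine scores track the gold scores. Spearman only checks whether the ordering of pairs agrees. This is why the two numbers can differ, as in the pre-training result above.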

In our STS Evaluation repository we compare the performance of this model with other models from Sentence Transformers and Tensorflow Hub using the standard STSBenchmark and the 2017 STSBenchmark Task 3 for Spanish.

Resources

Training dataset stsb_multi_mt
Sentence Transformers Semantic Textual Similarity
Check sts_eval for a comparison with TensorFlow and Sentence Transformers models
Check the development environment to run the scripts and evaluation

🔧 Technical Details

The model is based on distilbert-base-uncased and is fine-tuned for Semantic Textual Similarity in Spanish. It uses a modified training script from Sentence Transformers. The training data comes from the Spanish datasets of stsb_multi_mt.

📄 License

The original model card does not specify a license.
