distilbert-base-uncased Trained for Semantic Textual Similarity in Spanish
This is a test model fine-tuned on Spanish datasets to understand and benchmark STS models.
Quick Start
You can use it like any other STS-trained model to extract sentence embeddings. See the Sentence Transformers documentation.
Features
- This model is a proof-of-concept for STS fine-tuning on Spanish data.
- It can be used to get a sense of how STS training with Spanish data works.
Usage Examples
Basic Usage
You can use it like any other STS-trained model to extract sentence embeddings. For detailed usage, refer to the Sentence Transformers documentation.
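A minimal sketch of extracting embeddings and scoring a sentence pair with the `sentence-transformers` library. `MODEL_ID` is a hypothetical placeholder, since this card does not give the published repository name; replace it with the real Hub ID or a local directory. The helper names `cosine_similarity`, `pair_similarity` and `load_encoder` are illustrative, and the encoder is passed in as a callable so the similarity logic does not depend on a particular model.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical placeholder: substitute the real Hub ID or a local model directory.
MODEL_ID = "path/to/this-model"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pair_similarity(encode: Callable[[List[str]], Sequence[Sequence[float]]],
                    sentence1: str, sentence2: str) -> float:
    """Embed both sentences with `encode` and return their cosine similarity."""
    emb1, emb2 = encode([sentence1, sentence2])
    return cosine_similarity(emb1, emb2)


def load_encoder(model_id: str = MODEL_ID):
    """Load the model with sentence-transformers (imported lazily) and return its encode method."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode
```

With the real model ID filled in, `pair_similarity(load_encoder(), "El gato duerme en el sofá.", "Un gato está durmiendo en el sillón.")` returns a score where higher values mean more similar sentences.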
Documentation
Model and training data description
This model was built by taking distilbert-base-uncased and training it on a Semantic Textual Similarity task using a modified version of the STS training script from Sentence Transformers (the modified script is included in the repo). It was trained on the Spanish datasets from stsb_multi_mt, which are the STSBenchmark datasets automatically translated into other languages using deepl.com. Refer to the dataset repository for more details.
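A minimal sketch of preparing the Spanish split of stsb_multi_mt for training. It assumes the Hugging Face `datasets` library, the Hub path `stsb_multi_mt` with language config `"es"`, and the columns `sentence1`, `sentence2` and `similarity_score` (gold scores from 0 to 5). Rescaling the gold score to [0, 1] follows the standard Sentence Transformers STS example; whether the modified script in the repo does exactly the same is an assumption, and the function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

MAX_SCORE = 5.0  # STSBenchmark gold scores range from 0 to 5


def to_training_pairs(rows: Iterable[Dict]) -> List[Tuple[str, str, float]]:
    """Turn dataset rows into (sentence1, sentence2, label) with label in [0, 1]."""
    pairs = []
    for row in rows:
        label = float(row["similarity_score"]) / MAX_SCORE
        pairs.append((row["sentence1"], row["sentence2"], label))
    return pairs


def load_spanish_split(split: str = "train") -> List[Tuple[str, str, float]]:
    """Download one split of the Spanish STSB translation (requires `datasets`)."""
    from datasets import load_dataset

    # Assumed Hub path and config name; check the dataset repository.
    rows = load_dataset("stsb_multi_mt", name="es", split=split)
    return to_training_pairs(rows)
```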
Intended uses & limitations
This model was built only as a proof-of-concept for STS fine-tuning on Spanish data; it has no intended use beyond getting a sense of how this training works.
Training procedure
Use the included script to train the base model on Spanish data. You can also train a different model by passing its reference as the first argument, or train in any other language included in the training dataset.
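The command-line behaviour described above might look roughly like the sketch below: the first argument selects the base model (defaulting to distilbert-base-uncased), and a hypothetical second argument selects an stsb_multi_mt language code. The argument layout, the `parse_args`, `train` and `main` functions, the output path and the hyperparameters (batch size 16, 4 epochs, 10% warm-up, default mean pooling, `CosineSimilarityLoss`) are assumptions modelled on the classic Sentence Transformers STS example, not the repo's actual script.

```python
import math
import sys
from typing import List, Tuple

DEFAULT_MODEL = "distilbert-base-uncased"
DEFAULT_LANG = "es"


def parse_args(argv: List[str]) -> Tuple[str, str]:
    """Return (model_name, language); argv excludes the program name."""
    model_name = argv[0] if len(argv) > 0 else DEFAULT_MODEL
    # Hypothetical second argument: an stsb_multi_mt language code.
    lang = argv[1] if len(argv) > 1 else DEFAULT_LANG
    return model_name, lang


def train(model_name: str, lang: str, output_path: str,
          batch_size: int = 16, epochs: int = 4) -> None:
    """Fine-tune `model_name` on STS pairs with a cosine-similarity loss."""
    from datasets import load_dataset
    from sentence_transformers import InputExample, SentenceTransformer, losses, models
    from torch.utils.data import DataLoader

    rows = load_dataset("stsb_multi_mt", name=lang, split="train")
    examples = [
        InputExample(texts=[r["sentence1"], r["sentence2"]],
                     label=float(r["similarity_score"]) / 5.0)
        for r in rows
    ]

    # Transformer encoder followed by pooling (mean over tokens by default).
    word_model = models.Transformer(model_name)
    pooling = models.Pooling(word_model.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_model, pooling])

    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    loss = losses.CosineSimilarityLoss(model)
    warmup_steps = math.ceil(len(loader) * epochs * 0.1)  # 10% of training steps
    model.fit(train_objectives=[(loader, loss)], epochs=epochs,
              warmup_steps=warmup_steps, output_path=output_path)


def main() -> None:
    model_name, lang = parse_args(sys.argv[1:])
    # Hypothetical output directory naming scheme.
    safe_name = model_name.replace("/", "_")
    train(model_name, lang, output_path=f"output/sts-{lang}-{safe_name}")
```

Calling `main()` from a script entry point would then train the default model on Spanish, while passing, for example, a model name and `de` would train that model on the German translation instead.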
Evaluation results
Evaluating distilbert-base-uncased on the Spanish test dataset before training results in:
Cosine-Similarity : Pearson: 0.2980 Spearman: 0.4008
The fine-tuned version, trained with the training script's defaults on the Spanish training dataset, results in:
Cosine-Similarity : Pearson: 0.7451 Spearman: 0.7364
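These figures are Pearson and Spearman correlations between the cosine similarity of each sentence pair's embeddings and the gold similarity scores. A minimal, dependency-free sketch of both metrics follows, with Spearman computed as the Pearson correlation of ranks and tied values given their average rank. The function names are illustrative; Sentence Transformers provides `EmbeddingSimilarityEvaluator` for this kind of evaluation.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given the cosine score of every test pair and the corresponding gold scores, `pearson(cos_scores, gold)` and `spearman(cos_scores, gold)` compute the two kinds of values reported above.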
In our STS Evaluation repository, we compare the performance of this model with other models from Sentence Transformers and TensorFlow Hub using the standard STSBenchmark and the 2017 STSBenchmark Task 3 for Spanish.
Technical Details
The model is based on distilbert-base-uncased and is fine-tuned for Semantic Textual Similarity in Spanish. It uses a modified training script from Sentence Transformers. The training data comes from the Spanish datasets of stsb_multi_mt.