XL-LEXEME: An Open-Source Model for Word Similarity Calculation and Semantic Search

Xl Lexeme

Developed by pierluigic

A model based on sentence-transformers for mapping target words in sentences to a 1024-dimensional vector space, supporting word similarity calculation and semantic search tasks.

Text Embedding

Transformers

#Word vector embedding #Multilingual word sense analysis #Context-sensitive word representation

Downloads 1,350

Release date: 5/14/2023

Model Overview

The model takes a target word in a sentence and converts it into a high-dimensional vector representation. These vectors are suitable for natural language processing tasks such as word similarity calculation, clustering, and semantic search.

Model Features

Target word vectorization

Takes the word at a specified position in a sentence and generates a vector representation of that word in its context.

Multilingual support

The model's examples show it processing words in more than one language, here English and Italian.

High-dimensional semantic space

Maps words to a 1024-dimensional dense vector space, preserving rich semantic information.

Model Capabilities

Word vectorization

Semantic similarity calculation

Word clustering analysis

Cross-language word matching

Use Cases

Semantic analysis

Word sense disambiguation

Distinguishes the senses a word takes in different contexts. In the 'plane' example, the occurrences fall into two senses: 'airplane' and 'flat surface'.

Cross-language applications

Multilingual word alignment

Identifies words with similar semantics across different languages (e.g., English and Italian in the example).
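Both use cases come down to comparing two word vectors. A minimal sketch of that comparison with cosine similarity follows. The three short vectors are made-up stand-ins for the model's 1024-dimensional outputs, not real embeddings, and the variable names are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional placeholders (assumption: real vectors have 1024 entries).
plane_aircraft_1 = [0.9, 0.1, 0.0, 0.2]
plane_aircraft_2 = [0.8, 0.2, 0.1, 0.1]
plane_geometry = [0.1, 0.9, 0.3, 0.0]

same_sense = cosine_similarity(plane_aircraft_1, plane_aircraft_2)
other_sense = cosine_similarity(plane_aircraft_1, plane_geometry)

# With these toy values the same-sense pair scores higher than the cross-sense pair.
print(same_sense > other_sense)
```

With real embeddings, a higher score would suggest that the two occurrences share a sense. Any decision threshold would need to be chosen on labelled data.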

🚀 pierluigic/xl-lexeme

This model is based on sentence-transformers. It maps a target word in a sentence to a 1024-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

🚀 Quick Start

✨ Features

Based on sentence-transformers.
Maps target words in sentences to a 1024-dimensional dense vector space.
Suitable for tasks like clustering or semantic search.

📦 Installation

Install the library:

git clone git@github.com:pierluigic/xl-lexeme.git
cd xl-lexeme
pip3 install .

💻 Usage Examples

Basic Usage

from WordTransformer import WordTransformer, InputExample

# Load the pretrained model by its model ID.
model = WordTransformer('pierluigic/xl-lexeme')

# positions holds the character offsets of the target word: texts[10:13] == "fox".
example = InputExample(texts="the quick fox jumps over the lazy dog", positions=[10, 13])
fox_embedding = model.encode(example)  # Embedding of the target word "fox"
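In the example above, positions=[10,13] matches the character offsets of "fox" in the sentence, with the end offset exclusive. Counting characters by hand is error-prone, so a small helper can compute the span. The function below is a hypothetical convenience, not part of the WordTransformer library, and it assumes the offsets are plain Python string indices.

```python
def target_span(sentence, word, occurrence=0):
    """Return [start, end] character offsets of `word` in `sentence` (end exclusive).

    `occurrence` selects which match to use (0 = first). Hypothetical helper,
    not part of the WordTransformer package.
    """
    start = -1
    for _ in range(occurrence + 1):
        start = sentence.find(word, start + 1)
        if start == -1:
            raise ValueError(f"{word!r} does not occur {occurrence + 1} time(s)")
    return [start, start + len(word)]


sentence = "the quick fox jumps over the lazy dog"
span = target_span(sentence, "fox")

# The slice recovers the target word, confirming the offsets.
print(span, sentence[span[0]:span[1]])
```

The helper matches raw substrings, so a short word can also match inside a longer one (for example "the" inside "other"). A tokenizer-aware search is safer for real data.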

🔧 Technical Details

Training

The model was trained with the following parameters:

DataLoader: torch.utils.data.dataloader.DataLoader of length 16531 with parameters:

{'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss: sentence_transformers.losses.ContrastiveLoss.ContrastiveLoss with parameters:

{'distance_metric': 'SiameseDistanceMetric.COSINE_DISTANCE', 'margin': 0.5, 'size_average': True}
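To make the loss parameters concrete, here is a minimal pure-Python sketch of the contrastive loss as it is commonly defined, which is also how the sentence-transformers version is understood to work. It assumes the distance is 1 minus cosine similarity, that label 1 marks a same-sense pair and label 0 a different-sense pair, and that size_average means taking the mean over the batch. It is an illustration, not the training code.

```python
def contrastive_loss(distances, labels, margin=0.5, size_average=True):
    """Contrastive loss over precomputed pair distances.

    distances: cosine distances (1 - cosine similarity), one per pair.
    labels:    1 for pairs that should be close, 0 for pairs that should be apart.
    """
    losses = []
    for d, y in zip(distances, labels):
        positive_term = y * d ** 2                          # pull matching pairs together
        negative_term = (1 - y) * max(0.0, margin - d) ** 2  # push others past the margin
        losses.append(0.5 * (positive_term + negative_term))
    total = sum(losses)
    return total / len(losses) if size_average else total


# Three illustrative pairs: one matching pair and two non-matching pairs.
batch_loss = contrastive_loss([0.1, 0.2, 0.7], [1, 0, 0])
print(batch_loss)
```

A non-matching pair whose distance already exceeds the margin of 0.5 contributes nothing, so training effort goes to pairs that are still too close.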

Parameters of the fit()-Method:

{
    "epochs": 10,
    "evaluation_steps": 4132,
    "evaluator": "sentence_transformers.evaluation.SequentialEvaluator.SequentialEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'transformers.optimization.AdamW'>",
    "optimizer_params": {
        "lr": 1e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 16531.0,
    "weight_decay": 0.0
}
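The "WarmupLinear" scheduler raises the learning rate linearly from zero during warmup and then lowers it linearly back to zero. The sketch below reproduces that shape in plain Python. It assumes the total number of steps is the DataLoader length times the number of epochs (16531 × 10), since steps_per_epoch is null. The function name is illustrative, not a library API.

```python
def warmup_linear_lr(step, base_lr=1e-5, warmup_steps=16531, total_steps=16531 * 10):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Start of training, end of warmup, and final step.
for step in (0, 16531, 165310):
    print(step, warmup_linear_lr(step))
```

Under these assumptions warmup covers the first of the ten epochs, and the peak rate of 1e-05 is reached only at the end of that epoch.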

Full Model Architecture

SentenceTransformerTarget(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
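The Pooling layer is configured for mean pooling (pooling_mode_mean_tokens: True), which averages token vectors into a single vector. The sketch below shows generic masked mean pooling in plain Python. Which tokens the custom SentenceTransformerTarget class includes in the mask is not stated here, so the mask is treated as a given input, and the tiny vectors are placeholders for the 1024-dimensional token embeddings.

```python
def masked_mean_pool(token_vectors, mask):
    """Average the token vectors whose mask entry is truthy."""
    if len(token_vectors) != len(mask):
        raise ValueError("token_vectors and mask must have the same length")
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three toy 2-dimensional token vectors; the last one is masked out (e.g. padding).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = masked_mean_pool(tokens, [1, 1, 0])
print(pooled)
```

Masked-out positions do not affect the result, which is why padding tokens leave the pooled vector unchanged.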

📄 License

The source model card does not state a license.

📚 Documentation

Citing & Authors

@inproceedings{cassotti-etal-2023-xl,
    title = "{XL}-{LEXEME}: {W}i{C} Pretrained Model for Cross-Lingual {LEX}ical s{EM}antic chang{E}",
    author = "Cassotti, Pierluigi  and
      Siciliani, Lucia  and
      DeGemmis, Marco  and
      Semeraro, Giovanni  and
      Basile, Pierpaolo",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-short.135",
    pages = "1577--1585"
}

Additional Information

pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- word-similarity
- transformers
widget:
- example_title: "plane (en)"
  source_sentence: "Provide a large table; this is a horizontal plane, and will represent the ground plane, viz."
  sentences:
  - "The President's plane landed at Goose Bay at 9:03 p.m."
  - "any line joining two points on a plane lies wholly on that plane"
  - "the flight was delayed due to trouble with the plane"
- example_title: "radice (it)"
  source_sentence: "La radice del problema non è nota"
  sentences:
  - "il liquore è fatto dalle radici di liquirizia"
  - "La radice di 2 è 4."
  - "occorre pertanto trasformare la società alla radice"
