Khmer-xlm-roberta-base Open-source Model - Khmer Fill-Mask, Fine-tuned on 26,000+ Sentences

Khmer Xlm Roberta Base

Developed by channudam

A Khmer fill-mask model built on the FacebookAI/xlm-roberta-base pre-trained model and fine-tuned on 26,000+ Khmer sentences.

Large Language Model

Transformers

Other

Open Source License: MIT

#Khmer language fill-in #XLM-RoBERTa fine-tuning #Few-shot optimization

Downloads 55

Release Time: 6/1/2024

Model Overview

This model is designed for Khmer fill-mask tasks: given a sentence with a masked position, it predicts the missing word.

Model Features

Khmer-specific

A fill-mask model optimized specifically for the Khmer language.

Based on XLM-RoBERTa

Uses XLM-RoBERTa, a multilingual pre-trained model, as its foundation.

High-quality training data

Fine-tuned with 26,000+ Khmer sentences.

Model Capabilities

Khmer text understanding

Fill-mask prediction

Use Cases

Text processing

Sentence completion

Predicts masked words in sentences.

In the usage example, the model predicts 'water' for the Khmer equivalent of 'The weather is very hot, please drink more <mask>.' with a score of about 0.98.

Language learning aid

Helps Khmer language learners understand word usage.
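For the sentence-completion and language-learning use cases above, a cloze-style prompt can be prepared by replacing one word with the mask token. The following is a minimal sketch, not part of the model card: the helper name make_cloze is hypothetical, and it assumes the mask token is `<mask>`, as in the usage example on this page.

```python
MASK_TOKEN = "<mask>"  # mask token used in this page's usage example


def make_cloze(sentence: str, target: str, mask_token: str = MASK_TOKEN) -> str:
    """Replace the first occurrence of `target` with the mask token.

    Hypothetical helper for illustration; it works on raw substrings and
    does not do any Khmer word segmentation.
    """
    if mask_token in sentence:
        raise ValueError("sentence already contains a mask token")
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return sentence.replace(target, mask_token, 1)


masked = make_cloze("The weather is very hot, please drink more water.", "water")
print(masked)  # The weather is very hot, please drink more <mask>.
```

The resulting string can be passed to a fill-mask pipeline; keeping exactly one mask per sentence keeps the pipeline output a flat list of candidates.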

🚀 Khmer Language Fill-Mask Model

This is a Khmer fill-mask model built on the pre-trained FacebookAI/xlm-roberta-base model. It predicts masked tokens in Khmer text.

🚀 Quick Start

This Khmer fill-mask model is built on the pre-trained FacebookAI/xlm-roberta-base model. It has been fine-tuned on more than 26,000 Khmer sentences and clauses, with 80% used for the training set and 20% for the validation set. The model performs well only on Khmer text.
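The 80/20 split described above can be illustrated with a short sketch. The model card does not say how the split was actually made, so the shuffling, the seed, the function name split_train_validation, and the placeholder corpus below are all assumptions for illustration only.

```python
import random


def split_train_validation(sentences, train_fraction=0.8, seed=0):
    """Shuffle a copy of the corpus and cut it into train/validation parts.

    Illustrative only: the actual split procedure used for this model
    is not documented.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder corpus standing in for the roughly 26,000 Khmer sentences.
corpus = [f"sentence {i}" for i in range(26000)]
train, validation = split_train_validation(corpus)
print(len(train), len(validation))  # 20800 5200
```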

💻 Usage Examples

Basic Usage

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='channudam/khmer-xlm-roberta-base')
>>> unmasker("អាកាសធាតុក្ដៅខ្លាំង ចូរអ្នកផឹក<mask>ឲ្យបានច្រើន។")

[
  {
    'score': 0.9788032174110413,
    'token': 41440,
    'token_str': 'ទឹក',
    'sequence': 'អាកាសធាតុក្ដៅខ្លាំង ចូរអ្នកផឹកទឹក ឲ្យបានច្រើន។'
  },
  {
    'score': 0.012485685758292675,
    'token': 191670,
    'token_str': 'ស្រា',
    'sequence': 'អាកាសធាតុក្ដៅខ្លាំង ចូរអ្នកផឹកស្រា ឲ្យបានច្រើន។'
  },
  {
    'score': 0.0014946138253435493,
    'token': 162483,
    'token_str': 'បាយ',
    'sequence': 'អាកាសធាតុក្ដៅខ្លាំង ចូរអ្នកផឹកបាយ ឲ្យបានច្រើន។'
  },
  {
    'score': 0.001305083278566599,
    'token': 49245,
    'token_str': 'ស៊ី',
    'sequence': 'អាកាសធាតុក្ដៅខ្លាំង ចូរអ្នកផឹកស៊ី ឲ្យបានច្រើន។'
  },
  {
    'score': 0.0007108347490429878,
    'token': 51863,
    'token_str': 'ទឹក',
    'sequence': 'អាកាសធាតុក្ដៅខ្លាំង ចូរអ្នកផឹក ទឹក ឲ្យបានច្រើន។'
  }
]
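In the output above, 'ទឹក' appears twice with different token IDs (41440 and 51863), and the two completed sequences differ only in where the space falls. One way to read such results is to merge candidates that share the same surface form. The sketch below is an assumption about how one might post-process the list, not something the model card prescribes; merge_by_surface_form is a hypothetical helper, and the sample data is copied from three of the entries above.

```python
from collections import defaultdict


def merge_by_surface_form(predictions):
    """Sum scores of candidates whose token_str is identical after stripping.

    Hypothetical post-processing step; returns (token_str, score) pairs
    sorted from most to least likely.
    """
    totals = defaultdict(float)
    for prediction in predictions:
        totals[prediction["token_str"].strip()] += prediction["score"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Three entries taken from the pipeline output shown above.
sample = [
    {"score": 0.9788032174110413, "token": 41440, "token_str": "ទឹក"},
    {"score": 0.012485685758292675, "token": 191670, "token_str": "ស្រា"},
    {"score": 0.0007108347490429878, "token": 51863, "token_str": "ទឹក"},
]

merged = merge_by_surface_form(sample)
for token_str, score in merged:
    print(token_str, round(score, 4))
```

With the real pipeline, the same function can be applied directly to the list returned by unmasker(...), since each entry carries the 'token_str' and 'score' keys used here.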

📄 License

This project is licensed under the MIT license.
