Khmer Language Fill-Mask Model
This is a Khmer fill-mask model fine-tuned from the pre-trained FacebookAI/xlm-roberta-base model. It predicts masked tokens in Khmer text.
Quick Start
The model is fine-tuned from FacebookAI/xlm-roberta-base on more than 26K Khmer sentences and clauses, with 80% used for training and 20% for validation. It performs well only on Khmer text.
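The 80/20 split described above could be reproduced along the following lines. This is a minimal sketch, not the procedure actually used for this model: the helper name `split_train_validation`, the random shuffle, the fixed seed, and the placeholder corpus are all assumptions made for illustration.

```python
import random


def split_train_validation(sentences, train_fraction=0.8, seed=0):
    """Shuffle a copy of `sentences` and split it into (train, validation).

    Hypothetical helper: the model card does not say how the split was made.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder corpus; real data would be Khmer sentences and clauses.
corpus = [f"sentence {i}" for i in range(26000)]
train, validation = split_train_validation(corpus)
print(len(train), len(validation))
```

Seeding the shuffle keeps the validation set stable across runs, which makes validation scores comparable between fine-tuning experiments.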
Usage Examples
Basic Usage
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='channudam/khmer-xlm-roberta-base')
>>> unmasker("แขแถแแถแแแถแแปแแแแ
แแแแถแแ แ
แผแแขแแแแแนแ<mask>แฒแแแแถแแ
แแแพแแ")
[
{
'score': 0.9788032174110413,
'token': 41440,
'token_str': 'แแนแ',
'sequence': 'แขแถแแถแแแถแแปแแแแ
แแแแถแแ แ
แผแแขแแแแแนแแแนแ แฒแแแแถแแ
แแแพแแ'
},
{
'score': 0.012485685758292675,
'token': 191670,
'token_str': 'แแแแถ',
'sequence': 'แขแถแแถแแแถแแปแแแแ
แแแแถแแ แ
แผแแขแแแแแนแแแแแถ แฒแแแแถแแ
แแแพแแ'
},
{
'score': 0.0014946138253435493,
'token': 162483,
'token_str': 'แแถแ',
'sequence': 'แขแถแแถแแแถแแปแแแแ
แแแแถแแ แ
แผแแขแแแแแนแแแถแ แฒแแแแถแแ
แแแพแแ'
},
{
'score': 0.001305083278566599,
'token': 49245,
'token_str': 'แแแธ',
'sequence': 'แขแถแแถแแแถแแปแแแแ
แแแแถแแ แ
แผแแขแแแแแนแแแแธ แฒแแแแถแแ
แแแพแแ'
},
{
'score': 0.0007108347490429878,
'token': 51863,
'token_str': 'แแนแ',
'sequence': 'แขแถแแถแแแถแแปแแแแ
แแแแถแแ แ
แผแแขแแแแแนแ แแนแ แฒแแแแถแแ
แแแพแแ'
}
]
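The pipeline returns a list of candidate dictionaries like the one above, each with a `score`, a `token` id, a `token_str`, and the filled `sequence`. A small helper can pick the most likely candidate and reject low-confidence predictions. This is only a sketch: `best_prediction` and its `min_score` threshold are hypothetical names, and the sample data reuses the scores and token ids from the output above, with the string fields omitted.

```python
def best_prediction(predictions, min_score=0.5):
    """Return the highest-scoring candidate, or None if none reaches min_score.

    `min_score` is an illustrative threshold, not a value from the model card.
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best if best["score"] >= min_score else None


# Scores and token ids copied from the sample output (string fields omitted).
sample = [
    {"score": 0.9788032174110413, "token": 41440},
    {"score": 0.012485685758292675, "token": 191670},
    {"score": 0.0014946138253435493, "token": 162483},
    {"score": 0.001305083278566599, "token": 49245},
    {"score": 0.0007108347490429878, "token": 51863},
]

best = best_prediction(sample)
print(best["token"], round(best["score"], 4))
```

In real use, the list returned by `unmasker(...)` would be passed in place of `sample`, and `best["sequence"]` would hold the completed sentence.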
License
This project is licensed under the MIT license.