# Khmer Financial Sentiment Analysis with XLM-RoBERTa
This project offers a fine-tuned [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) model tailored for sentiment analysis of Khmer financial texts. It was trained on roughly 4,000 financial text samples, with a further 400 held out for testing, and aims to accurately classify sentiment in the Khmer-language financial domain.
## Quick Start
Financial texts like reports, news, and earnings statements are rich in information for market analysis. However, Khmer-language financial texts have been under-explored in NLP research. This project adapts the XLM-RoBERTa-base model for Khmer financial sentiment analysis. The model classifies financial text sentiment into two categories: Positive (indicating growth, profitability, or a positive outlook) and Negative (indicating loss, risk, or financial downturns).
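For a quick smoke test, the checkpoint can also be loaded through the `transformers` pipeline API, as in the minimal sketch below. Note that the pipeline reports whatever labels are stored in the model config, which may be generic names such as `LABEL_0`/`LABEL_1` rather than `Negative`/`Positive` (the mapping used in the usage example further down is 0 → Negative, 1 → Positive).

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="songhieng/khmer-sentiment-xlm-roberta-base",
)

# Replace "..." with a Khmer financial sentence
print(classifier("..."))  # e.g. [{'label': ..., 'score': ...}]
```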
## Features
- Domain-Specific Adaptation: Fine-tuned for Khmer financial sentiment analysis.
- Binary Classification: Clearly distinguishes between positive and negative financial sentiments.
- Good Performance: Achieves approximately 96% accuracy on the validation set.
## Installation
No special setup is required beyond the Hugging Face `transformers` library and PyTorch, which the usage examples below depend on; a typical installation is:
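```bash
pip install transformers torch
```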
## Usage Examples

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "songhieng/khmer-sentiment-xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Replace with the Khmer financial sentence you want to classify
text = "..."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=1).item()
labels_mapping = {0: "Negative", 1: "Positive"}
print(f"Predicted Sentiment: {labels_mapping[predicted_class]}")
```
## Documentation

### Model Details
| Property | Details |
|----------|---------|
| Model Type | [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) |
| Task | Sentiment Analysis (Binary Classification: Positive / Negative) |
| Domain | Financial Data (Khmer Language) |
| Dataset Size | ~4,000 training samples, 400 test samples |
| Architecture | Transformer-based sequence classification model |
### Training Data
The model was fine-tuned on a dataset of Khmer-language financial texts, including bank reports, financial news articles, economic forecasts, and investment analysis. The dataset has 4,000 labeled examples for training and 400 samples for testing.
### Training Details
The model was fine-tuned over 3 epochs, using XLM-RoBERTa-base as the pretrained starting point.
| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.163500 | 0.511470 | XX% |
| 2 | 0.517700 | 0.581499 | XX% |
| 3 | 0.312900 | 0.526096 | XX% |
Training Configuration (see the sketch below):
- Learning Rate: 2e-5
- Batch Size: 8
- Optimizer: AdamW
- Evaluation Strategy: Per epoch
- Loss Function: CrossEntropyLoss
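As a rough, non-authoritative sketch, this configuration could be reproduced with the Hugging Face `Trainer`. The training dataset is not published, so `train_dataset`/`eval_dataset` below are placeholders; AdamW and cross-entropy loss are the `Trainer` defaults for sequence classification, matching the settings above.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

base_model = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

training_args = TrainingArguments(
    output_dir="khmer-sentiment-xlm-roberta-base",  # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
)

# train_dataset / eval_dataset stand in for the (unreleased) tokenized
# Khmer financial sentiment dataset.
# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     tokenizer=tokenizer,
# )
# trainer.train()
```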
### Results
- Accuracy: ~96% on the validation set.
- Strong Performance: The model effectively classifies Khmer financial sentiment.
- Domain-Specific Optimization: The fine-tuning process gives the model a better grasp of financial terminology in Khmer.
## License
The model is released under the MIT license.