mt5-summarize-nepali Open Source Model - Free Deployment for Nepali Text Summary Generation

Mt5 Summarize Nepali

Developed by GenzNepal

A Nepali text summarization model fine-tuned from google/mt5-small

Text Generation

Transformers

Other

Open Source License: Apache-2.0

#Nepali Text Summarization #Multilingual MT5 #News Condensation

Downloads 21

Release Time: 7/19/2023

Model Overview

This model is specifically designed for automatic text summarization in Nepali, fine-tuned on news datasets using the MT5 architecture.

Model Features

Nepali Language Optimization

Fine-tuned on Nepali news text (Someman/news_nepali), starting from the multilingual google/mt5-small checkpoint.

News Summarization

Trained on Nepali news datasets, particularly suitable for generating summaries of news articles.

Lightweight Model

Based on the MT5-small architecture, the smallest MT5 variant, so it needs fewer computational resources than the larger variants.

Model Capabilities

Nepali Text Comprehension

Text Summarization

Long Text Compression

Use Cases

News Media

Automatic News Summarization

Generates concise summaries for Nepali news articles.

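Can compress lengthy news articles into summaries of roughly 100-250 tokens (the min_length and max_length values in the usage example; these limits count tokens, not words).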

Content Analysis

Document Key Information Extraction

Extracts core content from Nepali documents.

🚀 mt5-summarize-nepali

This model is a fine-tuned version of google/mt5-small on Someman/news_nepali. It generates summaries of Nepali text, so readers can pick out the key information quickly.

🚀 Quick Start

On the evaluation set, the fine-tuned model achieves the following result:

Loss: 0.6748

💻 Usage Examples

Basic Usage

>>> import torch

>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned model
>>> model_ckpt = "GenzNepal/mt5-summarize-nepali"

>>> device = "cuda" if torch.cuda.is_available() else "cpu"

>>> t5_tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

>>> model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt).to(device)


>>> text = "काठमाडौँ । हाल देशको पूर्वी तथा मध्य भू–भागमा मनसुनी प्रणालीको प्रभाव रहेको छ भने बाँकी भू–भागमा स्थानीय वायु र पश्चिमी वायुको आंशिक प्रभाव रहेको छ । यसका कारण हाल गण्डकी प्रदेशका थोरै स्थानमा र कर्णाली प्रदेशका एक–दुई स्थानमा मेघगर्जनरचट्याङसहित हल्कादेखि मध्यम वर्षा भइरहेको जल तथा मौसम विज्ञान विभाग, मौसम पूर्वानुमान महाशाखाले जनाएको छ । \
महाशाखका मौमसविद् रोजल लामिछानेका अनुसार पछिल्लो तीन घन्टामा गण्डकी प्रदेशका थोरै स्थान, बागमती प्रदेशका एक–दुई स्थानमा हल्कादेखि मध्यम वर्षा भइरहेको छ । काठमाडौँ उपत्यकासहित बागमती प्रदेशमा रातिको समयमा वर्षाको सम्भावना रहेको छ । यस्तै कोशी प्रदेश, मधेश प्रदेश र देशका पहाडी भू–भागमा बदली रहनुका साथै हल्का वर्षाको सम्भावना रहेको महाशाखाले उल्लेख गरेको छ । \
मौसमविद् लामिछानेले मनसुन प्रणाली क्रमिकरूपमा देशभर फैलिने क्रममा रहेको र यो देशभर विस्तार हुन अझै एक साता लाग्ने बताए । गत जेठ ३१ गते बुधबार नेपालको पूर्वी भेग भएर मनसुन प्रणाली भित्रिएको थियो । मनसुन सुस्तगतिमा रहेकाले देशको पश्चिम क्षेत्रमा फैलिन केही दिन लाग्ने जनाइएको छ ।"

>>> inputs = t5_tokenizer(text, return_tensors="pt", max_length=1024, padding="max_length", truncation=True, add_special_tokens=True)

>>> generation = model.generate(
...     input_ids=inputs["input_ids"].to(device),
...     attention_mask=inputs["attention_mask"].to(device),
...     num_beams=6,
...     num_return_sequences=1,
...     no_repeat_ngram_size=2,
...     repetition_penalty=1.0,
...     min_length=100,
...     max_length=250,
...     length_penalty=2.0,
...     early_stopping=True,
... )

# Convert the generated token ids back to text

>>> output = t5_tokenizer.decode(generation[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)


>>> print(output)

"हाल देशको पूर्वी तथा मध्य भू–भागमा मनसुनी प्रणालीको प्रभाव रहेको छ । बाँकी भूभागहरूमा स्थानीय वायु र पश्चिमी वायुको आंशिक सङ्क्रमण छ। गत वैशाख ३१ गते बुधबार नेपालको भेग भएर मनसुन प्रणाली भित्रिएको थियो भने हल्कादेखि मध्यम वर्षा भइरहेको जनाइएको छ भने मौसमविद् लामिछानेले उल्लेख गरेका छन् भने यो देशभर विस्तार हुन अझै एक साता लाग्नेछ।"
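The steps in the example above can be wrapped in a reusable function. The following is a minimal sketch, not part of the original model card: the names GENERATION_DEFAULTS, build_generation_kwargs and summarize are hypothetical, the generation values are copied from the example, and the sketch assumes the tokenizer and model have already been loaded as shown above. It also drops padding="max_length", on the assumption that a single input needs no padding.

```python
# Generation settings copied from the usage example above.
GENERATION_DEFAULTS = {
    "num_beams": 6,
    "num_return_sequences": 1,
    "no_repeat_ngram_size": 2,
    "repetition_penalty": 1.0,
    "min_length": 100,  # counted in tokens, not words
    "max_length": 250,  # counted in tokens, not words
    "length_penalty": 2.0,
    "early_stopping": True,
}


def build_generation_kwargs(**overrides):
    """Return the default generation settings with any overrides applied."""
    kwargs = dict(GENERATION_DEFAULTS)
    kwargs.update(overrides)
    if kwargs["min_length"] > kwargs["max_length"]:
        raise ValueError("min_length must not exceed max_length")
    return kwargs


def summarize(text, tokenizer, model, device="cpu", **overrides):
    """Summarize one text with an already-loaded tokenizer and model.

    Hypothetical helper: `tokenizer` and `model` are the objects created
    with AutoTokenizer / AutoModelForSeq2SeqLM in the example above.
    """
    # Assumption: no padding, because only one sequence is encoded.
    inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    generation = model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        **build_generation_kwargs(**overrides),
    )
    return tokenizer.decode(generation[0], skip_special_tokens=True)
```

With the objects from the example, a call might look like summarize(text, t5_tokenizer, model, device=device, num_beams=4); a smaller beam count is one way to trade summary quality for speed.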

📚 Documentation

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 90
num_epochs: 10
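
To make the scheduler settings above concrete, here is a small sketch of a linear schedule with warmup: the learning rate rises linearly to learning_rate over lr_scheduler_warmup_steps, then decays linearly to zero. The function name linear_warmup_lr is hypothetical, and the total number of optimizer steps is not stated in the model card; the value 9190 used below is an assumption, estimated from the training results (step 2500 at epoch 2.72, over 10 epochs).

```python
def linear_warmup_lr(step, total_steps, base_lr=5e-4, warmup_steps=90):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size: per-device batch size times accumulation steps.
train_batch_size = 2
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: about 9190 optimizer steps in total (not given in the source).
ASSUMED_TOTAL_STEPS = 9190

for step in (0, 45, 90, 5000, ASSUMED_TOTAL_STEPS):
    print(step, linear_warmup_lr(step, ASSUMED_TOTAL_STEPS))
```

The product of train_batch_size and gradient_accumulation_steps matches the listed total_train_batch_size of 16.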

Training results

Training Loss	Epoch	Step	Validation Loss
0.7762	2.72	2500	0.7255
0.6377	5.44	5000	0.6947
0.5674	8.15	7500	0.6748
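
The table also allows a rough estimate of the training set size, which the model card does not state. The sketch below divides each logged step by its epoch value to get optimizer steps per epoch, then multiplies by the total_train_batch_size of 16 listed in the hyperparameters. The variable names are hypothetical, and because the epoch values are rounded to two decimals the result is only approximate.

```python
# (epoch, step) pairs copied from the training results table.
LOGGED = [(2.72, 2500), (5.44, 5000), (8.15, 7500)]

# From the hyperparameters: total_train_batch_size.
TOTAL_TRAIN_BATCH_SIZE = 16


def estimate_steps_per_epoch(logged):
    """Average the step/epoch ratio over all logged rows."""
    ratios = [step / epoch for epoch, step in logged]
    return sum(ratios) / len(ratios)


steps_per_epoch = estimate_steps_per_epoch(LOGGED)
# Each optimizer step consumes one effective batch of examples.
examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"approx. optimizer steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. training examples per epoch: {examples_per_epoch:.0f}")
```

Under these assumptions the estimate comes to about 919 optimizer steps per epoch, or roughly 14,700 training examples.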

Framework versions

Transformers 4.30.1
Pytorch 2.0.0
Datasets 2.1.0
Tokenizers 0.13.3

📄 License

This project is licensed under the Apache-2.0 license.
