t5-large-korean-text-summary
This model is a fine-tuned version of the paust/pko-t5-large model, leveraging AIHUB's "summary and report generation data". It offers concise summaries of long Korean texts.
Quick Start
Fine-tuned from the paust/pko-t5-large checkpoint on AIHUB's "summary and report generation data", the model produces a short summary from long Korean text.
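The snippet below only loads the tokenizer and model from the Hugging Face Hub; see Usage Examples for an end-to-end generation call.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# download the fine-tuned summarizer and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("lcw99/t5-large-korean-text-summary")
model = AutoModelForSeq2SeqLM.from_pretrained("lcw99/t5-large-korean-text-summary")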
Features
- Fine-tuned from the paust/pko-t5-large model.
- Utilizes AIHUB "summary and report generation data".
- Capable of summarizing long Korean texts.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk
nltk.download('punkt')  # needed for nltk.sent_tokenize below
model_dir = "lcw99/t5-large-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
max_input_length = 512 + 256  # inputs longer than 768 tokens are truncated
text = """
주인공 강인구(하정우)는 "수리남에서 홍어가 많이 나는데 다 갖다버린다"는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 "실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다"며
"수리남 연안에도 홍어가 많이 서식할 것"이라고 설명했다.
그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에서는 "돈을 벌기 위해 수리남산 홍어를 구하러 간 설정의 개연성이 떨어진다"는 지적도 했다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 '협력자 K씨'는 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""
inputs = ["summarize: " + text]  # prepend the T5 summarization task prefix
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]  # keep only the first sentence of the summary
print(predicted_title)
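The same model can also be driven through the transformers summarization pipeline. This is a minimal sketch, reusing the text variable from the example above and assuming the "summarize: " prefix still has to be prepended manually (the card does not state that the prefix is added automatically).

from transformers import pipeline

# wrap the model in a summarization pipeline; generation arguments are forwarded to model.generate
summarizer = pipeline("summarization", model="lcw99/t5-large-korean-text-summary")
result = summarizer("summarize: " + text, num_beams=8, min_length=10, max_length=100)
print(result[0]["summary_text"])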
Documentation
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
Training results
More information needed
Framework versions
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1