t5-small-korean-summarization Open-source Model - Free Deployment for Concise and Accurate Summarization of Korean Texts

T5 Small Korean Summarization

Developed by eenzeenee

A Korean text summarization model based on the T5 architecture, specifically optimized for Korean text to generate concise and accurate summaries.

Text Generation

Transformers

Korean #Korean Summarization #T5 Fine-tuning #Multi-dataset Training

Downloads 123

Release Time: 1/23/2023

Model Overview

This is a Korean text summarization model based on the T5 architecture, fine-tuned from the paust/pko-t5-small base model. It is designed for automatic summarization of Korean text: it extracts the key information from long passages and generates concise summaries.

Model Features

Korean Optimization

Fine-tuned specifically on Korean text so that it better handles Korean grammar and expression.

Multi-dataset Training

Trained on three different Korean summarization datasets to improve the model's generalization.

Lightweight

Based on the T5-small architecture, the model is relatively lightweight and suitable for resource-limited environments.
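
To make "lightweight" concrete for a given environment, the weight footprint can be estimated from the parameter count. The following is a minimal sketch, not part of the original model card: the helper names `count_parameters` and `estimate_weight_memory_mb` are hypothetical, and the estimate covers the weights only (activations, the tokenizer and framework overhead are not included).

```python
def count_parameters(model) -> int:
    """Sum the element counts of all parameter tensors of a PyTorch-style model."""
    return sum(p.numel() for p in model.parameters())


def estimate_weight_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in MiB.

    bytes_per_param is an assumption about the dtype:
    4 for float32, 2 for float16 or bfloat16.
    """
    return num_params * bytes_per_param / (1024 ** 2)


# Example usage (downloads the checkpoint, so it is left commented out):
# from transformers import AutoModelForSeq2SeqLM
# model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-small-korean-summarization')
# print(estimate_weight_memory_mb(count_parameters(model)), "MiB in float32")
```

The model is passed in as an argument so the helpers work with any loaded checkpoint.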

Model Capabilities

Korean Text Summarization

Long Text Compression

Key Information Extraction

Use Cases

Academic Research

Thesis Abstract Generation

Automatically generate abstracts for Korean academic papers

Helps researchers quickly grasp the main content of papers

Content Management

News Summarization

Generate brief summaries for Korean news articles

Improves readers' browsing efficiency

Business Applications

Report Summarization

Automatically generate summaries for business reports

Helps decision-makers quickly obtain key information

🚀 t5-small-korean-summarization

This project provides a T5-based model for Korean text summarization. It condenses Korean text into concise summaries, which makes it useful for information extraction and content understanding.

🚀 Quick Start

The model is based on the 'paust/pko-t5-small' model and has been fine-tuned on three Korean summarization datasets.

💻 Usage Examples

Basic Usage

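import nltk
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Sentence-tokenizer data, used below to post-process the generated summary.
# Newer NLTK releases may require nltk.download('punkt_tab') instead.
nltk.download('punkt')

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-small-korean-summarization')
tokenizer = AutoTokenizer.from_pretrained('eenzeenee/t5-small-korean-summarization')

# Task prefix prepended to every input
prefix = "summarize: "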
sample = """
    안녕하세요? 우리 (2학년)/(이 학년) 친구들 우리 친구들 학교에 가서 진짜 (2학년)/(이 학년) 이 되고 싶었는데 학교에 못 가고 있어서 답답하죠? 
    그래도 우리 친구들의 안전과 건강이 최우선이니까요 오늘부터 선생님이랑 매일 매일 국어 여행을 떠나보도록 해요. 
    어/ 시간이 벌써 이렇게 됐나요? 늦었어요. 늦었어요. 빨리 국어 여행을 떠나야 돼요. 
    그런데 어/ 국어여행을 떠나기 전에 우리가 준비물을 챙겨야 되겠죠? 국어 여행을 떠날 준비물, 교안을 어떻게 받을 수 있는지 선생님이 설명을 해줄게요. 
    (EBS)/(이비에스) 초등을 검색해서 들어가면요 첫화면이 이렇게 나와요. 
    자/ 그러면요 여기 (X)/(엑스) 눌러주(고요)/(구요). 저기 (동그라미)/(똥그라미) (EBS)/(이비에스) (2주)/(이 주) 라이브특강이라고 되어있죠? 
    거기를 바로 가기를 누릅니다. 자/ (누르면요)/(눌르면요). 어떻게 되냐? b/ 밑으로 내려요 내려요 내려요 쭉 내려요. 
    우리 몇 학년이죠? 아/ (2학년)/(이 학년) 이죠 (2학년)/(이 학년)의 무슨 과목? 국어. 
    이번주는 (1주)/(일 주) 차니까요 여기 교안. 다음주는 여기서 다운을 받으면 돼요. 
    이 교안을 클릭을 하면, 짜잔/. 이렇게 교재가 나옵니다 .이 교안을 (다운)/(따운)받아서 우리 국어여행을 떠날 수가 있어요. 
    그럼 우리 진짜로 국어 여행을 한번 떠나보도록 해요? 국어여행 출발. 자/ (1단원)/(일 단원) 제목이 뭔가요? 한번 찾아봐요. 
    시를 즐겨요 에요. 그냥 시를 읽어요 가 아니에요. 시를 즐겨야 돼요 즐겨야 돼. 어떻게 즐길까? 일단은 내내 시를 즐기는 방법에 대해서 공부를 할 건데요. 
    그럼 오늘은요 어떻게 즐길까요? 오늘 공부할 내용은요 시를 여러 가지 방법으로 읽기를 공부할겁니다. 
    어떻게 여러가지 방법으로 읽을까 우리 공부해 보도록 해요. 오늘의 시 나와라 짜잔/! 시가 나왔습니다 시의 제목이 뭔가요? 다툰 날이에요 다툰 날. 
    누구랑 다퉜나 동생이랑 다퉜나 언니랑 친구랑? 누구랑 다퉜는지 선생님이 시를 읽어 줄 테니까 한번 생각을 해보도록 해요."""

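# Prepend the task prefix; the tokenizer receives a batch of one string
inputs = [prefix + sample]

# Tokenize; anything beyond 512 tokens is truncated
inputs = tokenizer(inputs, max_length=512, truncation=True, return_tensors="pt")
# Beam search with sampling (3 beams); do_sample=True means the output can vary between runs
output = model.generate(**inputs, num_beams=3, do_sample=True, min_length=10, max_length=64)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
# Keep only the first sentence of the generated summary
result = nltk.sent_tokenize(decoded_output.strip())[0]

print('RESULT >>', result)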
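
The example above truncates the input at 512 tokens, so any text past that point is silently dropped. One possible workaround is to split a long document into chunks that each fit the limit and summarize them separately. The sketch below is not from the original model card: `split_sentences` and `chunk_text` are hypothetical helpers, the sentence splitter is a naive regex, and how well chunk-wise summaries hold up for this model has not been evaluated here.

```python
import re
from typing import Callable, List

PREFIX = "summarize: "  # same task prefix as in the usage example


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '?' or '!' when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text: str,
               count_tokens: Callable[[str], int],
               max_tokens: int = 512) -> List[str]:
    """Greedily pack sentences into prefixed chunks of at most max_tokens tokens.

    The prefix is counted as part of each chunk. A single sentence that
    exceeds the budget on its own still becomes one (oversized) chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(PREFIX + candidate) > max_tokens:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(PREFIX + current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(PREFIX + current)
    return chunks


# With the tokenizer loaded above, the counter could be (assumption):
# count_tokens = lambda s: len(tokenizer(s)["input_ids"])
# batch = chunk_text(sample, count_tokens, max_tokens=512)
```

Each returned string already carries the prefix, so the list can be passed to the tokenizer as a batch (with `padding=True`) in place of `[prefix + sample]`.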
