t5-large-korean-text-summary
This model is a fine-tuned version of the paust/pko-t5-large model, leveraging AIHUB's "summary and report generation data". It offers concise summaries of long Korean texts.
Quick Start
Fine-tuned from the paust/pko-t5-large checkpoint on AIHUB's "summary and report generation data", the model produces a short summary from long Korean text.
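The snippet below only loads the tokenizer and model from the Hugging Face Hub; see Usage Examples for an end-to-end generation call.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# download the fine-tuned summarizer and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("lcw99/t5-large-korean-text-summary")
model = AutoModelForSeq2SeqLM.from_pretrained("lcw99/t5-large-korean-text-summary")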
Features
- Fine-tuned from the paust/pko-t5-large model.
- Utilizes AIHUB "summary and report generation data".
- Capable of summarizing long Korean texts.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk
nltk.download('punkt')  # needed for nltk.sent_tokenize below
model_dir = "lcw99/t5-large-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
max_input_length = 512 + 256  # inputs longer than 768 tokens are truncated
text = """
주인공 강인구(하정우)는 "수리남에서 홍어가 많이 나는데 다 갖다버린다"는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 "실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다"며
"수리남 연안에도 홍어가 많이 서식할 것"이라고 설명했다.
그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에서는 "돈을 벌기 위해 수리남산 홍어를 구하러 간 설정의 개연성이 떨어진다"는 지적도 했다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 '협력자 K씨'는 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""
inputs = ["summarize: " + text]  # prepend the T5 summarization task prefix
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]  # keep only the first sentence of the summary
print(predicted_title)
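The same model can also be driven through the transformers summarization pipeline. This is a minimal sketch, reusing the text variable from the example above and assuming the "summarize: " prefix still has to be prepended manually (the card does not state that the prefix is added automatically).

from transformers import pipeline

# wrap the model in a summarization pipeline; generation arguments are forwarded to model.generate
summarizer = pipeline("summarization", model="lcw99/t5-large-korean-text-summary")
result = summarizer("summarize: " + text, num_beams=8, min_length=10, max_length=100)
print(result[0]["summary_text"])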
Documentation
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
Training results
More information needed
Framework versions
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1