# t5-base-korean-text-summary
This model is paust/pko-t5-base fine-tuned on AIHUB's "summary and report generation data". It produces concise summaries of long Korean texts.
## Quick Start
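For the fastest start, the `transformers` summarization pipeline can wrap this checkpoint. This is a minimal sketch, not part of the original card: the `"summarize: "` prefix is added manually, since the checkpoint may not declare a task prefix in its config, and the sample sentence is a stand-in for your own text.

```python
from transformers import pipeline

# Load the checkpoint behind the high-level summarization pipeline
summarizer = pipeline("summarization", model="lcw99/t5-base-korean-text-summary")

# Prepend the same task prefix used in the full example below
result = summarizer("summarize: " + "요약할 긴 한국어 본문을 여기에 넣습니다.",
                    min_length=10, max_length=100)
print(result[0]["summary_text"])
```

The Usage Examples section below shows the same generation settings with explicit tokenizer and model calls.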
## Usage Examples
### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk

# Sentence tokenizer used below to keep only the first sentence of the summary
nltk.download('punkt')

model_dir = "lcw99/t5-base-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

max_input_length = 512
text = """
주인공 강인구(하정우)는 “수리남에서 홍어가 많이 나는데 다 갖다버린다”는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며
“수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.
그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 “돈을 벌기 위해 수리남산 홍어를 구하러 간 설정의 개연성이 떨어진다”는 지적도 했다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 ‘협력자 K씨’는 홍어 사업이 아니라
수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""
# Prefix the input with the "summarize: " task tag, as T5-style models expect
inputs = ["summarize: " + text]
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")

output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]

# Keep only the first sentence of the generated summary
predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]
print(predicted_title)
```
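Inputs longer than `max_input_length` (512 tokens) are truncated, so only the beginning of a very long document is summarized. Below is a minimal sketch of one workaround, assuming the `tokenizer`, `model`, and `text` objects from the example above; the `summarize_long` helper, the chunk size, and the join-the-partial-summaries strategy are illustrative choices, not part of the original recipe.

```python
# Assumes `tokenizer` and `model` from the example above are already loaded.
def summarize_long(text, chunk_tokens=512):
    """Summarize a document longer than the model's input window, chunk by chunk."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    partial_summaries = []
    for start in range(0, len(token_ids), chunk_tokens):
        # Decode one window of tokens back to text, then re-tokenize with the prefix;
        # truncation=True absorbs the few extra tokens the prefix adds.
        chunk = tokenizer.decode(token_ids[start:start + chunk_tokens])
        inputs = tokenizer(["summarize: " + chunk], max_length=chunk_tokens,
                           truncation=True, return_tensors="pt")
        # Beam search as in the example above, without sampling, for repeatability
        output = model.generate(**inputs, num_beams=8, min_length=10, max_length=100)
        partial_summaries.append(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
    return " ".join(partial_summaries)

print(summarize_long(text))
```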
## Documentation

### Intended uses & limitations

More information needed

### Training and evaluation data

More information needed

### Training procedure

#### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16

#### Training results

No specific results are provided.

### Framework versions

- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1
| Property | Details |
| --- | --- |
| Model Type | Fine-tuning of paust/pko-t5-base |
| Training Data | AIHUB "summary and report generation data" |