🚀 KoMiniLM
🐣 A lightweight Korean language model designed to overcome the limitations of large-scale models in real-world applications.
🚀 Quick Start
Current language models often contain hundreds of millions of parameters, which pose challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this project, we introduce a lightweight Korean language model to address these shortcomings.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModel

# Load the pre-trained KoMiniLM tokenizer and encoder from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModel.from_pretrained("BM-K/KoMiniLM")
# Encode a Korean sentence ("Hello, world!") and run a forward pass.
inputs = tokenizer("안녕 세상아!", return_tensors="pt")
outputs = model(**inputs)
```
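The forward pass returns token-level hidden states; a common next step is to pool them into a single sentence vector. The mean pooling below is an illustrative choice, not an official KoMiniLM recipe.

```python
# Continuing from the snippet above: the published configuration (see
# Configuration below) sets return_dict to false, so the model returns a
# tuple whose first element is the last hidden state.
last_hidden = outputs[0]                           # (batch, seq_len, hidden_size)

# Mean-pool over non-padding tokens (an illustrative pooling choice, not an
# official KoMiniLM recipe) to get one sentence embedding.
mask = inputs["attention_mask"].unsqueeze(-1)      # (batch, seq_len, 1)
sentence_vec = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_vec.shape)                          # (batch, hidden_size)
```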
📈 Update History
**Updates on 2022.06.20**
- Released KoMiniLM-bert-68M
**Updates on 2022.05.24**
- Released KoMiniLM-bert-23M
📚 Documentation
Pre-training
Teacher Model: KLUE-BERT(base)
Objective
Self-Attention Distribution and Self-Attention Value-Relation [Wang et al., 2020] were distilled from each discrete layer of the teacher model to the student model. Unlike [Wang et al., 2020], which distills only from the last transformer layer of the teacher, this project distills from every layer.
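As a rough illustration of that objective, the sketch below computes the two KL terms per layer pair. This is a minimal sketch, not the project's training code: the tensor names, the 1:1 teacher-to-student layer pairing, and the exact loss reduction are assumptions following [Wang et al., 2020].

```python
import torch
import torch.nn.functional as F

def attn_kl(teacher_probs, student_probs, eps=1e-12):
    """KL divergence between teacher and student attention distributions.
    Both tensors: (batch, heads, seq_len, seq_len), rows already softmax-normalized."""
    return F.kl_div(torch.log(student_probs + eps), teacher_probs, reduction="batchmean")

def value_relation(values):
    """Self-attention value-relation: scaled dot-product of the value vectors
    with themselves, softmax-normalized. values: (batch, heads, seq_len, head_dim)."""
    scores = values @ values.transpose(-1, -2) / values.size(-1) ** 0.5
    return F.softmax(scores, dim=-1)

def layerwise_distillation_loss(teacher_attn, teacher_vals, student_attn, student_vals):
    """Sum attention-distribution and value-relation KL terms over paired layers
    (a 1:1 layer pairing is assumed here purely for illustration)."""
    loss = 0.0
    for t_a, t_v, s_a, s_v in zip(teacher_attn, teacher_vals, student_attn, student_vals):
        loss = loss + attn_kl(t_a, s_a)
        loss = loss + attn_kl(value_relation(t_v), value_relation(s_v))
    return loss
```

In practice the per-layer attention probabilities are available through `output_attentions=True` (which the configuration below enables), while the value vectors would have to be captured with hooks on the attention modules.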
Data Sets
| Property | Details |
|---|---|
| News comments | 10 GB |
| News article | 10 GB |
Configuration
```json
{
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "output_attentions": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "torch_dtype": "float32",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}
```
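As a quick sanity check (assuming the hosted checkpoint ships the configuration listed above), the key sizes can be inspected at load time:

```python
from transformers import AutoConfig

# Load the published configuration and print the main size parameters.
config = AutoConfig.from_pretrained("BM-K/KoMiniLM")
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
# Expected from the listing above: 384 6 12
```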
Performance on Subtasks
- The results of our fine-tuning experiments are an average of 3 runs for each task.
```bash
cd KoMiniLM-Finetune
bash scripts/run_all_kominilm.sh
```
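For readers who prefer a Python entry point over the shell script, the sketch below fine-tunes a classification head on a two-example toy dataset standing in for NSMC. The tiny dataset, the hyperparameters, and the use of the generic `Trainer` API are illustrative assumptions, not the settings behind the numbers reported below.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Toy binary-sentiment examples standing in for NSMC (1 = positive, 0 = negative):
# "the movie was really fun" / "it was a total waste of time".
data = {"text": ["영화 정말 재미있어요", "완전 시간 낭비였다"], "label": [1, 0]}

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModelForSequenceClassification.from_pretrained("BM-K/KoMiniLM", num_labels=2)

def encode(batch):
    # Tokenize the raw sentences into fixed-length input IDs and attention masks.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

train_ds = Dataset.from_dict(data).map(encode, batched=True)

args = TrainingArguments(output_dir="kominilm-nsmc-demo",
                         num_train_epochs=1,
                         per_device_train_batch_size=2,
                         logging_steps=1)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```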
| Model | #Param | Average | NSMC (Acc) | Naver NER (F1) | PAWS (Acc) | KorNLI (Acc) | KorSTS (Spearman) | Question Pair (Acc) | KorQuAD (Dev) (EM/F1) |
|---|---|---|---|---|---|---|---|---|---|
| KoBERT(KLUE) | 110M | 86.84 | 90.20±0.07 | 87.11±0.05 | 81.36±0.21 | 81.06±0.33 | 82.47±0.14 | 95.03±0.44 | 84.43±0.18 / 93.05±0.04 |
| KcBERT | 108M | 78.94 | 89.60±0.10 | 84.34±0.13 | 67.02±0.42 | 74.17±0.52 | 76.57±0.51 | 93.97±0.27 | 60.87±0.27 / 85.01±0.14 |
| KoBERT(SKT) | 92M | 79.73 | 89.28±0.42 | 87.54±0.04 | 80.93±0.91 | 78.18±0.45 | 75.98±2.81 | 94.37±0.31 | 51.94±0.60 / 79.69±0.66 |
| DistilKoBERT | 28M | 74.73 | 88.39±0.08 | 84.22±0.01 | 61.74±0.45 | 70.22±0.14 | 72.11±0.27 | 92.65±0.16 | 52.52±0.48 / 76.00±0.71 |
| KoMiniLM† | 68M | 85.90 | 89.84±0.02 | 85.98±0.09 | 80.78±0.30 | 79.28±0.17 | 81.00±0.07 | 94.89±0.37 | 83.27±0.08 / 92.08±0.06 |
| KoMiniLM† | 23M | 84.79 | 89.67±0.03 | 84.79±0.09 | 78.67±0.45 | 78.10±0.07 | 78.90±0.11 | 94.81±0.12 | 82.11±0.42 / 91.21±0.29 |
- NSMC (Naver Sentiment Movie Corpus)
- Naver NER (NER task on Naver NLP Challenge 2018)
- PAWS (Korean Paraphrase Adversaries from Word Scrambling)
- KorNLI/KorSTS (Korean Natural Language Understanding)
- Question Pair (Paired Question)
- KorQuAD (The Korean Question Answering Dataset)
📖 Reference