Ko Core News Md
CPU-optimized Korean processing pipeline with complete NLP functions including tokenization, part-of-speech tagging, dependency parsing, named entity recognition, etc.
Downloads 16
Release Time: 5/2/2022
Model Overview
Medium-sized Korean processing model for spaCy, trained on UD Korean Kaist and KLUE datasets, supporting multi-task processing for Korean text
Model Features
Multi-task Processing
Single pipeline handles tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and more simultaneously
CPU Optimization
Specially optimized for CPU environments, suitable for resource-constrained production environments
High-quality Word Vectors
Includes floret word vectors (50,000 vector rows, 300 dimensions) for better semantic understanding
Comprehensive Korean Support
Covers Korean-specific grammatical structures and morphological changes, including complex particles and endings
Model Capabilities
Tokenization
Part-of-speech Tagging (XPOS/UPOS)
Lemmatization
Dependency Parsing
Named Entity Recognition
Sentence Segmentation
Use Cases
Text Processing
Korean Text Analysis
Perform grammatical analysis and structural parsing on Korean news and social media content
Identifies sentence components and grammatical relationships; dependency parsing reaches 83.89 UAS and 80.87 LAS
Information Extraction
Extract named entities like person names, locations, and organizations from Korean documents
NER F-score reaches 82.86%
Language Learning
Korean Grammar Analysis
Helps learners understand Korean sentence structure and morphological changes
Fine-grained tag accuracy (TAG_ACC) is 83.52%; coarse part-of-speech accuracy (POS_ACC) is 94.58%
ko_core_news_md
ko_core_news_md is a Korean language processing pipeline optimized for CPU. It performs tasks such as part-of-speech tagging, morphological analysis, dependency parsing, and named entity recognition.
Quick Start
For more details, please visit: https://spacy.io/models/ko#ko_core_news_md
This Korean pipeline is optimized for CPU. Components: tok2vec, tagger, morphologizer, parser, lemmatizer (trainable_lemmatizer), senter, ner.
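A minimal usage sketch, assuming spaCy is installed and the model package has been downloaded (for example with `python -m spacy download ko_core_news_md`). The helper names `model_installed` and `token_rows` and the sample sentence are illustrative and do not come from the model card.

```python
import importlib.util

MODEL_NAME = "ko_core_news_md"


def model_installed(name: str = MODEL_NAME) -> bool:
    """Return True if the model is importable as a Python package."""
    return importlib.util.find_spec(name) is not None


def token_rows(doc):
    """Collect (text, UPOS, XPOS, lemma, dependency) for each token-like object."""
    return [(t.text, t.pos_, t.tag_, t.lemma_, t.dep_) for t in doc]


def main() -> None:
    if not model_installed():
        print(f"{MODEL_NAME} is not installed; try: python -m spacy download {MODEL_NAME}")
        return
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load(MODEL_NAME)
    doc = nlp("삼성전자는 서울에 본사를 두고 있다.")  # illustrative sentence
    for row in token_rows(doc):
        print(row)
    for ent in doc.ents:
        print(ent.text, ent.label_)


if __name__ == "__main__":
    main()
```

The lazy import keeps the script usable as a quick installation check: without the model it prints the download hint instead of raising.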
Documentation
Model Information
Property | Details |
---|---|
Model Name | ko_core_news_md |
Version | 3.7.0 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | tok2vec , tagger , morphologizer , parser , lemmatizer , attribute_ruler , ner |
Components | tok2vec , tagger , morphologizer , parser , lemmatizer , senter , attribute_ruler , ner |
Vectors | floret (50000, 300) |
Sources | UD Korean Kaist v2.8 (Choi, Jinho; Han, Na-Rae; Hwang, Jena; Chun, Jayeol); KLUE v1.1.0 (Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Youngsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Ryu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho); Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl) (Explosion) |
License | CC BY-SA 4.0 |
Author | Explosion |
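In the table, `senter` appears under Components but not under Default Pipeline, which suggests it ships with the package but does not run unless enabled. The sketch below derives that difference from the two rows. The `load_with_senter` function is an assumption-laden illustration: it requires spaCy and the model to be installed, and it follows the `exclude` / `enable_pipe` pattern that spaCy documents for swapping the parser for the sentence recognizer. The function and constant names are hypothetical.

```python
# Values copied from the Model Information table.
DEFAULT_PIPELINE = ["tok2vec", "tagger", "morphologizer", "parser",
                    "lemmatizer", "attribute_ruler", "ner"]
COMPONENTS = ["tok2vec", "tagger", "morphologizer", "parser",
              "lemmatizer", "senter", "attribute_ruler", "ner"]


def disabled_by_default(components, default_pipeline):
    """Components that ship with the package but are not in the default pipeline."""
    active = set(default_pipeline)
    return [name for name in components if name not in active]


print(disabled_by_default(COMPONENTS, DEFAULT_PIPELINE))  # ['senter']


def load_with_senter(model_name="ko_core_news_md"):
    """Assumes spaCy and the model are installed: drop the parser, enable senter."""
    import spacy

    nlp = spacy.load(model_name, exclude=["parser"])
    nlp.enable_pipe("senter")
    return nlp
```

This variant may be useful when only sentence boundaries are needed and full dependency parses are not.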
Label Scheme
2028 labels for 4 components
Component | Labels |
---|---|
tagger |
_SP , ecs , etm , f , f+f+jcj , f+f+jcs , f+f+jct , f+f+jxt , f+jca , f+jca+jp+ecc , f+jca+jp+ep+ef , f+jca+jxc , f+jca+jxc+jcm , f+jca+jxt , f+jcj , f+jcm , f+jco , f+jcs , f+jct , f+jct+jcm , f+jp+ef , f+jp+ep+ef , f+jp+etm , f+jxc , f+jxt , f+ncn , f+ncn+jcm , f+ncn+jcs , f+ncn+jp+ecc , f+ncn+jxt , f+ncpa+jcm , f+npp+jcs , f+nq , f+xsn , f+xsn+jco , f+xsn+jxt , ii , jca , jca+jcm , jca+jxc , jca+jxt , jcc , jcj , jcm , jco , jcr , jcr+jxc , jcs , jct , jct+jxt , jp+ecc , jp+ecs , jp+ef , jp+ef+jcr , jp+ef+jcr+jxc , jp+ep+ecs , jp+ep+ef , jp+ep+etm , jp+ep+etn , jp+etm , jp+etn , jp+etn+jco , jp+etn+jxc , jxc , jxc+jca , jxc+jco , jxc+jcs , jxt , mad , mad+jxc , mad+jxt , mag , mag+jca , mag+jcm , mag+jcs , mag+jp+ef+jcr , mag+jxc , mag+jxc+jxc , mag+jxt , mag+xsn , maj , maj+jxc , maj+jxt , mma , mmd , nbn , nbn+jca , nbn+jca+jcj , nbn+jca+jcm , nbn+jca+jp+ef , nbn+jca+jxc , nbn+jca+jxt , nbn+jcc , nbn+jcj , nbn+jcm , nbn+jco , nbn+jcr , nbn+jcs , nbn+jct , nbn+jct+jcm , nbn+jct+jxt , nbn+jp+ecc , nbn+jp+ecs , nbn+jp+ecs+jca , nbn+jp+ecs+jcm , nbn+jp+ecs+jco , nbn+jp+ecs+jxc , nbn+jp+ecs+jxt , nbn+jp+ecx , nbn+jp+ef , nbn+jp+ef+jca , nbn+jp+ef+jco , nbn+jp+ef+jcr , nbn+jp+ef+jcr+jxc , nbn+jp+ef+jcr+jxt , nbn+jp+ef+jcs , nbn+jp+ef+jxc , nbn+jp+ef+jxc+jco , nbn+jp+ef+jxf , nbn+jp+ef+jxt , nbn+jp+ep+ecc , nbn+jp+ep+ecs , nbn+jp+ep+ecs+jxc , nbn+jp+ep+ef , nbn+jp+ep+ef+jcr , nbn+jp+ep+etm , nbn+jp+ep+etn , nbn+jp+ep+etn+jco , nbn+jp+ep+etn+jcs , nbn+jp+etm , nbn+jp+etn , nbn+jp+etn+jca , nbn+jp+etn+jca+jxt , nbn+jp+etn+jco , nbn+jp+etn+jcs , nbn+jp+etn+jxc , nbn+jp+etn+jxt , nbn+jxc , nbn+jxc+jca , nbn+jxc+jca+jxc , nbn+jxc+jca+jxt , nbn+jxc+jcc , nbn+jxc+jcm , nbn+jxc+jco , nbn+jxc+jcs , nbn+jxc+jp+ef , nbn+jxc+jxc , nbn+jxc+jxt , nbn+jxt , nbn+nbn , nbn+nbn+jp+ef , nbn+xsm+ecs , nbn+xsm+ef , nbn+xsm+ep+ef , nbn+xsm+ep+ef+jcr , nbn+xsm+etm , nbn+xsn , nbn+xsn+jca , nbn+xsn+jca+jp+ef+jcr , nbn+xsn+jca+jxc , nbn+xsn+jca+jxt , nbn+xsn+jcm , nbn+xsn+jco , nbn+xsn+jcs , 
nbn+xsn+jct , nbn+xsn+jp+ecc , nbn+xsn+jp+ecs , nbn+xsn+jp+ef , nbn+xsn+jp+ef+jcr , nbn+xsn+jp+ep+ef , nbn+xsn+jxc , nbn+xsn+jxt , nbn+xsv+etm , nbu , nbu+jca , nbu+jca+jxc , nbu+jca+jxt , nbu+jcc , nbu+jcc+jxc , nbu+jcj , nbu+jcm , nbu+jco , nbu+jcs , nbu+jct , nbu+jct+jxc , nbu+jp+ecc , nbu+jp+ecs , nbu+jp+ef , nbu+jp+ef+jcr , nbu+jp+ef+jxc , nbu+jp+ep+ecc , nbu+jp+ep+ecs , nbu+jp+ep+ef , nbu+jp+ep+ef+jcr , nbu+jp+ep+etm , nbu+jp+ep+etn+jco , nbu+jp+etm , nbu+jxc , nbu+jxc+jca , nbu+jxc+jcs , nbu+jxc+jp+ef , nbu+jxc+jp+ep+ef , nbu+jxc+jxt , nbu+jxt , nbu+ncn , nbu+ncn+jca , nbu+ncn+jcm , nbu+xsn , nbu+xsn+jca , nbu+xsn+jca+jxc , nbu+xsn+jca+jxt , nbu+xsn+jcm , nbu+xsn+jco , nbu+xsn+jcs , nbu+xsn+jp+ecs , nbu+xsn+jp+ep+ef , nbu+xsn+jxc , nbu+xsn+jxc+jxt , nbu+xsn+jxt , nbu+xsv+ecc , nbu+xsv+etm , ncn , ncn+f+ncpa+jco , ncn+jca , ncn+jca+jca , ncn+jca+jcc , ncn+jca+jcj , ncn+jca+jcm , ncn+jca+jcs , ncn+jca+jct , ncn+jca+jp+ecc , ncn+jca+jp+ecs , ncn+jca+jp+ef , ncn+jca+jp+ep+ef , ncn+jca+jp+etm , ncn+jca+jp+etn+jxt , ncn+jca+jxc , ncn+jca+jxc+jcc , ncn+jca+jxc+jcm , ncn+jca+jxc+jxc , ncn+jca+jxc+jxt , ncn+jca+jxt , ncn+jcc , ncn+jcc+jxc , ncn+jcj , ncn+jcj+jxt , ncn+jcm , ncn+jco , ncn+jcr , ncn+jcr+jxc , ncn+jcs , ncn+jcs+jxt , ncn+jct , ncn+jct+jcm , ncn+jct+jxc , ncn+jct+jxt , ncn+jcv , ncn+jp+ecc , ncn+jp+ecc+jct , ncn+jp+ecc+jxc , ncn+jp+ecs , ncn+jp+ecs+jcm , ncn+jp+ecs+jco , ncn+jp+ecs+jxc , ncn+jp+ecs+jxt , ncn+jp+ecx , ncn+jp+ef , ncn+jp+ef+jca , ncn+jp+ef+jcm , ncn+jp+ef+jco , ncn+jp+ef+jcr , ncn+jp+ef+jcr+jxc , ncn+jp+ef+jcr+jxt , ncn+jp+ef+jp+etm , ncn+jp+ef+jxc , ncn+jp+ef+jxf , ncn+jp+ef+jxt , ncn+jp+ep+ecc , ncn+jp+ep+ecs , ncn+jp+ep+ecs+jxc , ncn+jp+ep+ecx , ncn+jp+ep+ef , ncn+jp+ep+ef+jcr , ncn+jp+ep+ef+jcr+jxc , ncn+jp+ep+ef+jxc , ncn+jp+ep+ef+jxf , ncn+jp+ep+ef+jxt , ncn+jp+ep+ep+etm , ncn+jp+ep+etm , ncn+jp+ep+etn , ncn+jp+ep+etn+jca , ncn+jp+ep+etn+jca+jxc , ncn+jp+ep+etn+jco , ncn+jp+ep+etn+jcs , ncn+jp+ep+etn+jxt , ncn+jp+etm , ncn+jp+etn , 
ncn+jp+etn+jca , ncn+jp+etn+jca+jxc , ncn+jp+etn+jca+jxt , ncn+jp+etn+jco , ncn+jp+etn+jcs , ncn+jp+etn+jct , ncn+jp+etn+jxc , ncn+jp+etn+jxt , ncn+jxc , ncn+jxc+jca , ncn+jxc+jca+jxc , ncn+jxc+jca+jxt , ncn+jxc+jcc , ncn+jxc+jcm , ncn+jxc+jco , ncn+jxc+jcs , ncn+jxc+jp+ef , ncn+jxc+jxc , ncn+jxc+jxt , ncn+jxt , ncn+jxt+jcm , ncn+jxt+jxc , ncn+nbn , ncn+nbn+jca , ncn+nbn+jcm , ncn+nbn+jcs , ncn+nbn+jp+ecc , ncn+nbn+jp+ep+ef , ncn+nbn+jxc , ncn+nbn+jxt , ncn+nbu , ncn+nbu+jca , ncn+nbu+jcm , ncn+nbu+jco , ncn+nbu+jp+ef , ncn+nbu+jxc , ncn+nbu+ncn , ncn+ncn , ncn+ncn+jca , ncn+ncn+jca+jcc , ncn+ncn+jca+jcm , ncn+ncn+jca+jxc , ncn+ncn+jca+jxc+jcm , ncn+ncn+jca+jxc+jxc , ncn+ncn+jca+jxt , ncn+ncn+jcc , ncn+ncn+jcj , ncn+ncn+jcm , ncn+ncn+jco , ncn+ncn+jcr , ncn+ncn+jcs , ncn+ncn+jct , ncn+ncn+jct+jcm , ncn+ncn+jct+jxc , ncn+ncn+jct+jxt , ncn+ncn+jp+ecc , ncn+ncn+jp+ecs , ncn+ncn+jp+ef , ncn+ncn+jp+ef+jcm , ncn+ncn+jp+ef+jcr , ncn+ncn+jp+ef+jcs , ncn+ncn+jp+ep+ecc , ncn+ncn+jp+ep+ecs , ncn+ncn+jp+ep+ef , ncn+ncn+jp+ep+ef+jcr , ncn+ncn+jp+ep+ep+etm , ncn+ncn+jp+ep+etm , ncn+ncn+jp+ep+etn , ncn+ncn+jp+etm , ncn+ncn+jp+etn , ncn+ncn+jp+etn+jca , ncn+ncn+jp+etn+jco , ncn+ncn+jp+etn+jxc , ncn+ncn+jxc , ncn+ncn+jxc+jca , ncn+ncn+jxc+jcc , ncn+ncn+jxc+jcm , ncn+ncn+jxc+jco , ncn+ncn+jxc+jcs , ncn+ncn+jxc+jxc , ncn+ncn+jxt , ncn+ncn+nbn , ncn+ncn+ncn , ncn+ncn+ncn+jca , ncn+ncn+ncn+jca+jcm , ncn+ncn+ncn+jca+jxt , ncn+ncn+ncn+jcj , ncn+ncn+ncn+jcm , ncn+ncn+ncn+jco , ncn+ncn+ncn+jcs , ncn+ncn+ncn+jct+jxt , ncn+ncn+ncn+jp+etn+jxc , ncn+ncn+ncn+jxt , ncn+ncn+ncn+ncn+jca , ncn+ncn+ncn+ncn+jca+jxt , ncn+ncn+ncn+ncn+jco , ncn+ncn+ncn+xsn+jp+etm , ncn+ncn+ncpa , ncn+ncn+ncpa+jca , ncn+ncn+ncpa+jcm , ncn+ncn+ncpa+jco , ncn+ncn+ncpa+jcs , ncn+ncn+ncpa+jxc , ncn+ncn+ncpa+jxt , ncn+ncn+ncpa+ncn , ncn+ncn+ncpa+ncn+jca , ncn+ncn+ncpa+ncn+jcj , ncn+ncn+ncpa+ncn+jcm , ncn+ncn+ncpa+ncn+jxt , ncn+ncn+xsn , ncn+ncn+xsn+jca , ncn+ncn+xsn+jca+jxt , ncn+ncn+xsn+jcj , ncn+ncn+xsn+jcm , 
ncn+ncn+xsn+jco , ncn+ncn+xsn+jcs , ncn+ncn+xsn+jct , ncn+ncn+xsn+jp+ecs , ncn+ncn+xsn+jp+ep+ef , ncn+ncn+xsn+jp+etm , ncn+ncn+xsn+jxc , ncn+ncn+xsn+jxc+jcs , ncn+ncn+xsn+jxt , ncn+ncn+xsv+ecc , ncn+ncn+xsv+etm , ncn+ncpa , ncn+ncpa+jca , ncn+ncpa+jca+jcm , ncn+ncpa+jca+jxc , ncn+ncpa+jca+jxt , ncn+ncpa+jcc , ncn+ncpa+jcj , ncn+ncpa+jcm , ncn+ncpa+jco , ncn+ncpa+jcr , ncn+ncpa+jcs , ncn+ncpa+jct , ncn+ncpa+jct+jcm , ncn+ncpa+jct+jxt , ncn+ncpa+jp+ecc , ncn+ncpa+jp+ecc+jxc , ncn+ncpa+jp+ecs , ncn+ncpa+jp+ecs+jxc , ncn+ncpa+jp+ef , ncn+ncpa+jp+ef+jcr , ncn+ncpa+jp+ef+jcr+jxc , ncn+ncpa+jp+ep+ef , ncn+ncpa+jp+ep+etm , ncn+ncpa+jp+ep+etn , ncn+ncpa+jp+etm , ncn+ncpa+jxc , ncn+ncpa+jxc+jca+jxc , ncn+ncpa+jxc+jco , ncn+ncpa+jxc+jcs , ncn+ncpa+jxt , ncn+ncpa+nbn+jcs , ncn+ncpa+ncn , ncn+ncpa+ncn+jca , ncn+ncpa+ncn+jca+jcm , ncn+ncpa+ncn+jca+jxc , ncn+ncpa+ncn+jca+jxt , ncn+ncpa+ncn+jcj , ncn+ncpa+ncn+jcm , ncn+ncpa+ncn+jco , ncn+ncpa+ncn+jcs , ncn+ncpa+ncn+jct , ncn+ncpa+ncn+jct+jcm , ncn+ncpa+ncn+jp+ef+jcr , ncn+ncpa+ncn+jp+ep+etm , ncn+ncpa+ncn+jxc , ncn+ncpa+ncn+jxt , ncn+ncpa+ncn+xsn+jcm , ncn+ncpa+ncn+xsn+jxt , ncn+ncpa+ncpa , ncn+ncpa+ncpa+jca , ncn+ncpa+ncpa+jcj , ncn+ncpa+ncpa+jcm , ncn+ncpa+ncpa+jco , ncn+ncpa+ncpa+jcs , ncn+ncpa+ncpa+jp+ep+ef , ncn+ncpa+ncpa+jxt , ncn+ncpa+ncpa+ncn , ncn+ncpa+xsn , ncn+ncpa+xsn+jcm , ncn+ncpa+xsn+jco , ncn+ncpa+xsn+jcs , ncn+ncpa+xsn+jp+ecc , ncn+ncpa+xsn+jp+etm , ncn+ncpa+xsn+jxt , ncn+ncpa+xsv+ecc , ncn+ncpa+xsv+ecs , ncn+ncpa+xsv+ecx , ncn+ncpa+xsv+ecx+px+etm , ncn+ncpa+xsv+ef , ncn+ncpa+xsv+ef+jcm , ncn+ncpa+xsv+ef+jcr , ncn+ncpa+xsv+etm , (truncated: full list in pipeline meta) |
morphologizer | POS=CCONJ, POS=ADV, POS=SCONJ, POS=DET, POS=NOUN, POS=VERB, POS=ADJ, POS=PUNCT, POS=SPACE, POS=AUX, POS=PRON, POS=PROPN, POS=NUM, POS=INTJ, POS=PART, POS=X, POS=ADP, POS=SYM |
parser | ROOT, acl, advcl, advmod, amod, appos, aux, case, cc, ccomp, compound, conj, cop, csubj, dep, det, dislocated, fixed, flat, iobj, mark, nmod, nsubj, nummod, obj, obl, punct, xcomp |
ner | DT, LC, OG, PS, QT, TI |
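The six `ner` labels are two-letter codes. The sketch below maps them to readable names and tallies entities by type. The expansions are an assumption based on the KLUE-NER tag set and are not stated in the model card. The sample pairs are hand-written for illustration and are not model output.

```python
from collections import Counter

# Assumed expansions, following the KLUE-NER tag set (not given in the model card).
NER_LABELS = {
    "DT": "date",
    "LC": "location",
    "OG": "organization",
    "PS": "person",
    "QT": "quantity",
    "TI": "time",
}


def count_entities(ents):
    """Tally (text, label) pairs by readable label; unknown labels are kept as-is."""
    return Counter(NER_LABELS.get(label, label) for _, label in ents)


# With spaCy, this list would come from [(e.text, e.label_) for e in doc.ents].
sample = [("서울", "LC"), ("삼성전자", "OG"), ("2022년", "DT"), ("부산", "LC")]
print(count_entities(sample))
```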
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 100.00 |
TOKEN_P | 100.00 |
TOKEN_R | 100.00 |
TOKEN_F | 100.00 |
TAG_ACC | 83.52 |
POS_ACC | 94.58 |
SENTS_P | 100.00 |
SENTS_R | 100.00 |
SENTS_F | 100.00 |
DEP_UAS | 83.89 |
DEP_LAS | 80.87 |
LEMMA_ACC | 89.94 |
ENTS_P | 84.97 |
ENTS_R | 80.85 |
ENTS_F | 82.86 |
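As a quick consistency check, the reported ENTS_F can be recomputed from ENTS_P and ENTS_R, assuming it is the usual F1 score (the harmonic mean of precision and recall). The function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R as reported in the accuracy table.
ents_f = f_score(84.97, 80.85)
print(round(ents_f, 2))  # 82.86
```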
License
This project is licensed under the CC BY-SA 4.0 license.