ja_core_news_lg: Free, Open-Source Japanese Processing Model for Word Segmentation, Part-of-Speech Tagging, and More

Ja Core News Lg

Developed by spaCy

spaCy's CPU-optimized Japanese processing pipeline, including tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and more

Sequence Labeling · Japanese · #Japanese text processing #High-precision part-of-speech tagging #Dependency parsing

Downloads 53

Release Time: 3/2/2022

Model Overview

This is a Japanese natural language processing pipeline trained on the UD Japanese GSD v2.8 corpus, with named entity annotations from Megagon Labs. It supports part-of-speech tagging, dependency parsing, and named entity recognition. Because it is optimized for CPU inference, it can be used for Japanese text analysis without a GPU.

Model Features

CPU Optimization

The model is specifically optimized for CPU usage, making it suitable for running in environments without GPUs

Comprehensive NLP Features

Provides a complete set of natural language processing functions, from basic tokenization to advanced named entity recognition

High-Quality Word Vectors

Includes 480,443 300-dimensional word vectors based on the chiVe word embedding model
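Word vectors like these are usually compared with cosine similarity, which is also what spaCy's `Token.similarity` and `Doc.similarity` use by default. The sketch below shows the computation without any dependencies, assuming the vectors are plain sequences of floats; with the model installed, `token.vector` returns a 300-dimensional array that can be passed in directly. The helper name `cosine_similarity` is illustrative, not part of spaCy.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat a zero vector (e.g. a token with no stored vector) as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Usage with the real pipeline (requires spaCy and ja_core_news_lg to be installed):
#   import spacy
#   nlp = spacy.load("ja_core_news_lg")
#   print(cosine_similarity(nlp("犬")[0].vector, nlp("猫")[0].vector))
```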

Model Capabilities

Japanese tokenization

Part-of-speech tagging

Named entity recognition

Dependency parsing

Lemmatization

Sentence segmentation

Use Cases

Text Analysis

Japanese News Analysis

Analyze Japanese news texts to extract named entities (such as people, organizations, dates, and events) and the dependency relations between words

NER F-score reached 71.19%
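For news analysis, recognized entities are exposed as `doc.ents`, whose spans carry `.text` and `.label_`. The sketch below groups mentions by label; `entity_pairs` and `group_entities` are illustrative helper names, and dropping repeated mentions is an assumption about what a summary should contain.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def entity_pairs(doc) -> List[Tuple[str, str]]:
    """Read (entity text, label) pairs from a spaCy Doc's entity spans."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def group_entities(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, label) pairs by label, keeping each mention once in first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Usage (requires spaCy and ja_core_news_lg; output depends on the model):
#   import spacy
#   nlp = spacy.load("ja_core_news_lg")
#   doc = nlp("東京都は2022年3月に新しい政策を発表した。")
#   print(group_entities(entity_pairs(doc)))
```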

Japanese Text Preprocessing

Prepare Japanese text data for machine learning tasks

Part-of-speech tagging accuracy reached 97.42%
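A common preprocessing step is to keep only the lemmas of content words. The sketch assumes "content words" means the Universal POS tags NOUN, PROPN, VERB, ADJ, and ADV; that set is an illustrative choice, not something the model defines. `lemma_pos_pairs` reads spaCy's standard `Token.lemma_` and `Token.pos_` attributes, and both helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed definition of a "content word" (Universal POS tags); adjust for your task.
CONTENT_POS = frozenset({"NOUN", "PROPN", "VERB", "ADJ", "ADV"})


def lemma_pos_pairs(doc) -> List[Tuple[str, str]]:
    """Read (lemma, coarse POS) for every token of a spaCy Doc."""
    return [(token.lemma_, token.pos_) for token in doc]


def content_lemmas(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep lemmas whose POS is in CONTENT_POS, dropping particles, auxiliaries, punctuation, etc."""
    return [lemma for lemma, pos in pairs if pos in CONTENT_POS]


# Usage (requires spaCy and ja_core_news_lg):
#   import spacy
#   nlp = spacy.load("ja_core_news_lg")
#   print(content_lemmas(lemma_pos_pairs(nlp("猫が走っていた。"))))
```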

Language Learning

Japanese Grammar Analysis

Help learners analyze Japanese sentence structures

Dependency parsing LAS score 90.90
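For grammar study, the parser output can be printed as an indented tree. The sketch below relies on spaCy's convention that a sentence root is its own head (`token.head.i == token.i`); the helper names are hypothetical, and the example edges are hand-built for illustration rather than actual model output.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, int]  # (token text, dependency label, head index)


def dependency_edges(doc) -> List[Edge]:
    """Read (text, dep label, head index) for every token of a spaCy Doc."""
    return [(token.text, token.dep_, token.head.i) for token in doc]


def children_by_head(edges: Sequence[Edge]) -> Dict[int, List[int]]:
    """Map each token index to the indices of its direct dependents."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(edges))}
    for i, (_, _, head) in enumerate(edges):
        if head != i:  # a root is its own head, so it is nobody's dependent
            children[head].append(i)
    return children


def render_tree(edges: Sequence[Edge]) -> List[str]:
    """Indented rendering of each sentence's dependency tree, one token per line."""
    children = children_by_head(edges)
    lines: List[str] = []

    def visit(i: int, depth: int) -> None:
        text, dep, _ = edges[i]
        lines.append("  " * depth + f"{text} ({dep})")
        for child in children[i]:
            visit(child, depth + 1)

    for i, (_, _, head) in enumerate(edges):
        if head == i:  # each root starts a new tree
            visit(i, 0)
    return lines


# Hand-built edges for 猫が走る。 (illustrative, not actual model output)
EXAMPLE_EDGES: List[Edge] = [("猫", "nsubj", 2), ("が", "case", 0), ("走る", "ROOT", 2), ("。", "punct", 2)]
print("\n".join(render_tree(EXAMPLE_EDGES)))
# 走る (ROOT)
#   猫 (nsubj)
#     が (case)
#   。 (punct)
```

With the model installed, `render_tree(dependency_edges(nlp(text)))` prints the parser's own analysis instead.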

🚀 ja_core_news_lg

A Japanese language processing pipeline optimized for CPU, suitable for various token-classification tasks.

🚀 Quick Start

For detailed information about this model, please visit: Details
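A minimal way to try the pipeline is sketched below, assuming spaCy 3.7.x and the model package are installed (for example with `pip install spacy` followed by `python -m spacy download ja_core_news_lg`). The helper names `load_pipeline`, `token_table`, and `format_rows` are illustrative, not part of spaCy; the only spaCy calls used are `spacy.load` and standard `Token` attributes.

```python
from typing import List, Sequence, Tuple


def load_pipeline(name: str = "ja_core_news_lg"):
    """Load an installed spaCy pipeline package by name."""
    import spacy  # imported lazily so the helpers below work without spaCy installed

    return spacy.load(name)


def token_table(doc) -> List[Tuple[str, str, str, str, str]]:
    """One row per token: surface form, lemma, coarse POS, dependency label, head text."""
    return [(t.text, t.lemma_, t.pos_, t.dep_, t.head.text) for t in doc]


def format_rows(rows: Sequence[Sequence[str]]) -> str:
    """Render rows as tab-separated lines for quick inspection."""
    return "\n".join("\t".join(row) for row in rows)


# Usage (output depends on the installed model):
#   nlp = load_pipeline()
#   doc = nlp("東京タワーは港区にあります。")
#   print(format_rows(token_table(doc)))
#   print([(ent.text, ent.label_) for ent in doc.ents])
```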

✨ Features

Optimized for CPU, enabling efficient processing of Japanese language tasks.
Equipped with multiple components including tok2vec, morphologizer, parser, senter, ner, and attribute_ruler.

📚 Documentation

Model Information

Property	Details
Model Type	`ja_core_news_lg`
Version	`3.7.0`
spaCy Compatibility	`>=3.7.0,<3.8.0`
Default Pipeline	`tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `ner`
Components	`tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `ner`
Vectors	480443 keys, 480443 unique vectors (300 dimensions)
Sources	UD Japanese GSD v2.8 (Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel) · UD Japanese GSD v2.8 NER (Megagon Labs Tokyo) · chiVe: Japanese Word Embedding with Sudachi & NWJC (chive-1.1-mc90-500k) (Works Applications)
License	`CC BY-SA 4.0`
Author	Explosion
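The Components row lists `senter`, but the Default Pipeline row does not, so the statistical sentence segmenter ships disabled. spaCy's documented pattern for using it as a faster alternative to the parser is to load with `exclude=["parser"]` and then call `nlp.enable_pipe("senter")`. The sketch below wraps that; the helper names are illustrative.

```python
from typing import List

# Values copied from the Model Information table above.
COMPONENTS = ["tok2vec", "morphologizer", "parser", "senter", "attribute_ruler", "ner"]
DEFAULT_PIPELINE = ["tok2vec", "morphologizer", "parser", "attribute_ruler", "ner"]


def disabled_by_default(components: List[str], default: List[str]) -> List[str]:
    """Components that ship with the package but do not run unless enabled."""
    return [name for name in components if name not in default]


def load_sentence_splitter(name: str = "ja_core_news_lg"):
    """Load the pipeline with senter providing sentence boundaries instead of the parser."""
    import spacy  # lazy import: only needed when actually loading the model

    nlp = spacy.load(name, exclude=["parser"])  # skip dependency parsing entirely
    nlp.enable_pipe("senter")  # turn on the disabled sentence segmenter
    return nlp


# Usage (requires spaCy and ja_core_news_lg):
#   nlp = load_sentence_splitter()
#   for sent in nlp("今日は晴れです。明日は雨でしょう。").sents:
#       print(sent.text)
```

Excluding the parser means `token.dep_` and `token.head` are no longer populated, so this setup only suits tasks that need sentence boundaries, tags, and entities.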

Label Scheme

65 labels across 3 components:

Component	Labels
`morphologizer`	`POS=NOUN`, `POS=ADP`, `POS=VERB`, `POS=SCONJ`, `POS=AUX`, `POS=PUNCT`, `POS=PART`, `POS=DET`, `POS=NUM`, `POS=ADV`, `POS=PRON`, `POS=ADJ`, `POS=PROPN`, `POS=CCONJ`, `POS=SYM`, `POS=NOUN|Polarity=Neg`, `POS=AUX|Polarity=Neg`, `POS=SPACE`, `POS=INTJ`, `POS=SCONJ|Polarity=Neg`
`parser`	`ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART`
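Each morphologizer label bundles a coarse POS tag with optional morphological features, written as `|`-separated `Field=Value` pairs (e.g. `POS=AUX|Polarity=Neg` for a negative auxiliary). At runtime spaCy exposes the two parts separately as `token.pos_` and `token.morph`; the hypothetical helper below only shows how a label string decomposes.

```python
from typing import Dict, Tuple


def parse_morph_label(label: str) -> Tuple[str, Dict[str, str]]:
    """Split a label like "POS=NOUN|Polarity=Neg" into ("NOUN", {"Polarity": "Neg"})."""
    pos = ""
    feats: Dict[str, str] = {}
    for part in label.split("|"):
        field, _, value = part.partition("=")
        if field == "POS":
            pos = value
        elif field:  # ignore empty segments
            feats[field] = value
    return pos, feats


print(parse_morph_label("POS=AUX|Polarity=Neg"))  # ('AUX', {'Polarity': 'Neg'})
```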

Accuracy

Type	Score
`TOKEN_ACC`	99.37
`TOKEN_P`	97.64
`TOKEN_R`	97.88
`TOKEN_F`	97.76
`POS_ACC`	97.42
`MORPH_ACC`	0.00
`MORPH_MICRO_P`	34.01
`MORPH_MICRO_R`	98.04
`MORPH_MICRO_F`	50.51
`SENTS_P`	95.56
`SENTS_R`	97.63
`SENTS_F`	96.59
`DEP_UAS`	92.12
`DEP_LAS`	90.90
`TAG_ACC`	97.13
`LEMMA_ACC`	96.70
`ENTS_P`	73.88
`ENTS_R`	68.68
`ENTS_F`	71.19
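Each `*_F` row is the harmonic mean of the matching `*_P` and `*_R` rows, F = 2PR / (P + R). The short check below recomputes it from the rounded table values; the names `f_score` and `REPORTED` are illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) copied from the accuracy table above
REPORTED = {
    "TOKEN": (97.64, 97.88, 97.76),
    "SENTS": (95.56, 97.63, 96.59),
    "ENTS": (73.88, 68.68, 71.19),
    "MORPH_MICRO": (34.01, 98.04, 50.51),
}

for name, (p, r, reported) in REPORTED.items():
    # P and R are already rounded to two decimals, so last-digit differences are expected.
    print(f"{name}_F: recomputed {f_score(p, r):.2f}, reported {reported:.2f}")
```

Recomputing from the rounded values reproduces the table to within 0.01, and shows that the comparatively low `MORPH_MICRO_F` comes from its low precision (34.01) rather than its recall (98.04).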

📄 License

This model is licensed under CC BY-SA 4.0.
