BERT Base Japanese V2

Developed by tohoku-nlp
A BERT model pretrained on Japanese Wikipedia, using the Unidic dictionary for word-level tokenization and whole word masking
Downloads 12.59k
Release Time: 3/2/2022

Model Overview

This is a BERT base model optimized for Japanese text, primarily used for natural language processing tasks such as text classification and named entity recognition.

Model Features

Whole Word Masking Training
Adopts a whole-word-masking strategy in which all subword tokens belonging to the same word are masked simultaneously, improving the model's comprehension
Unidic Dictionary Tokenization
Uses the Unidic 2.1.2 dictionary for word-level tokenization, followed by WordPiece subword segmentation of the input text (see the tokenization sketch after this list)
Large-scale Pretraining Data
Based on the Japanese Wikipedia dump of August 31, 2020, containing approximately 30 million sentences
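
The two-stage tokenization can be inspected directly. The sketch below is a minimal example assuming the model is published on Hugging Face as tohoku-nlp/bert-base-japanese-v2 and that the transformers, fugashi, and unidic-lite packages are installed; the sample sentence is illustrative only.

```python
# Minimal sketch: inspect the word-level (Unidic/MeCab) split followed by
# WordPiece subword segmentation. The model ID and sample text are assumptions,
# not taken from this page.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-v2")

text = "東北大学で自然言語処理を研究しています。"
tokens = tokenizer.tokenize(text)
print(tokens)                        # subword tokens; pieces inside a word carry the "##" prefix
print(tokenizer(text)["input_ids"])  # vocabulary IDs with [CLS]/[SEP] added
```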

Model Capabilities

Japanese text comprehension
Masked language modeling (see the fill-mask sketch below)
Text feature extraction
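
A masked-language-modeling call is the most direct way to exercise the pretrained checkpoint. The sketch below uses the transformers fill-mask pipeline and assumes the same tohoku-nlp/bert-base-japanese-v2 repository ID; the example sentence is illustrative.

```python
# Minimal sketch: masked language modeling via the fill-mask pipeline.
# The model ID is an assumption; fugashi and unidic-lite must be installed
# for the Japanese tokenizer to load.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="tohoku-nlp/bert-base-japanese-v2")

for prediction in fill_mask("東京は日本の[MASK]です。"):
    print(prediction["token_str"], round(prediction["score"], 3))
```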

Use Cases

Natural Language Processing
Text Classification
Classifying Japanese text into task-specific categories after fine-tuning (see the sketch at the end of this section)
Named Entity Recognition
Identifying entities such as person names and locations in Japanese text
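
For downstream use cases such as text classification, the pretrained encoder is typically wrapped with a task-specific head and fine-tuned. The sketch below is a hypothetical starting point: the label count, sample sentence, and model ID are assumptions, and the classification head is randomly initialized until fine-tuning is performed.

```python
# Hypothetical sketch: attach a sequence-classification head to the pretrained
# encoder. The head is randomly initialized, so outputs are not meaningful
# until the model is fine-tuned on labeled Japanese data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tohoku-nlp/bert-base-japanese-v2"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("この映画は素晴らしかった。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (untrained head, illustrative only)
```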