
Bert Base Japanese Char V2

Developed by tohoku-nlp
A BERT model pre-trained on Japanese text with character-level tokenization and whole word masking, trained on a snapshot of Japanese Wikipedia as of August 31, 2020
Downloads 134.28k
Release Time: 3/2/2022

Model Overview

This is a BERT model pre-trained specifically for Japanese text. It uses character-level tokenization together with whole word masking and is suitable for a wide range of Japanese natural language processing tasks.
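A minimal usage sketch follows, assuming the model is published on the Hugging Face Hub under the id tohoku-nlp/bert-base-japanese-char-v2 (an assumption based on the developer name above) and that the fugashi and unidic-lite packages required by the Japanese tokenizer are installed:

```python
# Minimal usage sketch (assumption: the Hub id is tohoku-nlp/bert-base-japanese-char-v2;
# the Japanese tokenizer additionally requires the fugashi and unidic-lite packages).
from transformers import AutoTokenizer, AutoModel

model_name = "tohoku-nlp/bert-base-japanese-char-v2"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a Japanese sentence and run a forward pass.
inputs = tokenizer("日本語のテキストを解析する。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence length, hidden size = 768)
```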

Model Features

Character-level Tokenization
Text is first segmented into words with MeCab using the Unidic dictionary and then split into individual characters; the vocabulary size is 6144 (see the tokenization sketch after this list)
Whole Word Masking Mechanism
During masked language modeling (MLM), all character tokens belonging to the same word are masked together
Professional Japanese Processing
The training corpus was split into sentences using MeCab with the mecab-ipadic-NEologd dictionary
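The character-level tokenization can be inspected directly through the tokenizer. This is a brief sketch, again assuming the Hub id tohoku-nlp/bert-base-japanese-char-v2 and the fugashi/unidic-lite dependencies:

```python
# Character-level tokenization sketch (assumption: Hub id tohoku-nlp/bert-base-japanese-char-v2;
# MeCab-based word segmentation requires the fugashi and unidic-lite packages).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-char-v2")

# Words are first segmented with MeCab, then each word is split into characters.
tokens = tokenizer.tokenize("自然言語処理")
print(tokens)                # e.g. ['自', '然', '言', '語', '処', '理']
print(tokenizer.vocab_size)  # 6144
```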

Model Capabilities

Japanese Text Understanding
Japanese Text Feature Extraction (see the feature-extraction sketch after this list)
Japanese Language Model Fine-tuning
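For feature extraction, one common approach is to mean-pool the model's last hidden states into a fixed-size sentence vector. The sketch below is illustrative, with the same assumed Hub id and dependencies as above:

```python
# Feature-extraction sketch: mean-pool last hidden states into sentence vectors
# (assumption: Hub id tohoku-nlp/bert-base-japanese-char-v2; fugashi/unidic-lite installed).
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "tohoku-nlp/bert-base-japanese-char-v2"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["日本語の文をベクトルに変換する。", "東京でラーメンを食べた。"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# Average over real tokens only, using the attention mask to ignore padding.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, 768)
```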

Use Cases

Natural Language Processing
Japanese Text Classification
Can be used for tasks such as Japanese news classification and sentiment analysis (a fine-tuning sketch follows this list)
Japanese Question Answering System
Serves as a base model for building Japanese question answering systems
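The classification use case typically involves fine-tuning with a task-specific head. The following is a minimal sketch only, assuming the Hub id tohoku-nlp/bert-base-japanese-char-v2; the example texts and labels are hypothetical placeholders for a real labeled dataset:

```python
# Fine-tuning sketch for Japanese text classification (assumption: Hub id
# tohoku-nlp/bert-base-japanese-char-v2; the labeled examples are illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "tohoku-nlp/bert-base-japanese-char-v2"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy sentiment examples (hypothetical): 1 = positive, 0 = negative.
texts = ["この映画は素晴らしかった。", "この映画は退屈だった。"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)

# One optimization step; in practice this loop runs over a full training set.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```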