
BERT Base Japanese

Developed by tohoku-nlp
A BERT model pretrained on Japanese Wikipedia text, using the IPA dictionary for word-level tokenization, suitable for Japanese natural language processing tasks.
Downloads: 153.44k
Release date: 3/2/2022

Model Overview

This is a BERT model pretrained on Japanese text. It uses the MeCab morphological analyzer with the IPA dictionary for word-level tokenization, followed by WordPiece subword tokenization, and is suitable for a wide range of Japanese natural language understanding tasks.
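As a quick orientation, the sketch below loads the pretrained encoder with Hugging Face transformers and extracts contextual embeddings for a Japanese sentence. The hub ID tohoku-nlp/bert-base-japanese and the example sentence are assumptions, and the MeCab-based tokenizer additionally requires the fugashi and ipadic packages.

```python
# Minimal sketch: load the pretrained Japanese BERT and get contextual embeddings.
# Assumed hub ID: "tohoku-nlp/bert-base-japanese"; the MeCab-based tokenizer
# typically needs `pip install transformers fugashi ipadic`.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese")
model = AutoModel.from_pretrained("tohoku-nlp/bert-base-japanese")

text = "東北大学で自然言語処理を研究しています。"  # illustrative example sentence
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one 768-dimensional vector per input token
print(outputs.last_hidden_state.shape)
```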

Model Features

Japanese-Specific Tokenization
Uses the MeCab morphological analyzer with the IPA dictionary for word segmentation tailored to Japanese, ensuring efficient processing of Japanese text (a tokenization sketch follows this list).
Large-Scale Pretraining
Trained on a 2.6 GB Japanese Wikipedia corpus containing approximately 17 million sentences.
Standard BERT Architecture
Adopts the same architecture and training parameters as the original BERT, ensuring compatibility and reliability.
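The sketch below illustrates the two-stage tokenization described above: MeCab word segmentation with the IPA dictionary, followed by WordPiece subword splitting. The hub ID and the sentence are assumptions; the fugashi and ipadic packages are required for the MeCab step.

```python
# Hedged sketch of the two-stage tokenization: MeCab (IPA dictionary) word
# segmentation followed by WordPiece subwords. Hub ID and sentence are assumed.
from transformers import BertJapaneseTokenizer

tokenizer = BertJapaneseTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese")

text = "日本語の形態素解析を行います。"  # illustrative sentence
print(tokenizer.tokenize(text))   # WordPiece tokens over MeCab word boundaries
print(tokenizer.encode(text))     # token IDs with [CLS]/[SEP] added
```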

Model Capabilities

Japanese Text Understanding
Japanese Text Classification
Japanese Question Answering
Japanese Named Entity Recognition
Japanese Semantic Similarity Calculation (see the sketch after this list)
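One way to use the encoder for semantic similarity is sketched below: mean-pooling the token embeddings into a sentence vector and comparing with cosine similarity. Mean pooling is a common choice but not part of the model itself; the hub ID and the two sentences are assumptions.

```python
# Hedged sketch: sentence similarity via mean-pooled BERT embeddings and cosine
# similarity. Pooling strategy, hub ID, and sentences are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModel

name = "tohoku-nlp/bert-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)    # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

a = embed("今日は天気が良いです。")
b = embed("本日は晴天です。")
print(torch.nn.functional.cosine_similarity(a, b).item())
```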

Use Cases

Text Analysis
Japanese Sentiment Analysis
Analyze the sentiment polarity of Japanese text (a fine-tuning sketch follows this list)
Japanese Text Classification
Classify Japanese documents
Information Extraction
Japanese Named Entity Recognition
Extract entities such as person names and locations from Japanese text
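For downstream use cases such as sentiment analysis or text classification, the pretrained encoder is typically fine-tuned with a task-specific head. The sketch below attaches a sequence-classification head; the label count, the example sentence, and the hub ID are placeholders, and the head is randomly initialized until fine-tuned on labeled data.

```python
# Hedged sketch: adding a classification head for fine-tuning (e.g. sentiment
# analysis). num_labels and the input are placeholders; the head must be trained.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "tohoku-nlp/bert-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["この映画は素晴らしかった。"], return_tensors="pt")
logits = model(**batch).logits  # meaningless until the head is fine-tuned
print(logits.shape)             # (1, 2)
```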