BERT Base Japanese
Developed by tohoku-nlp
A BERT model pretrained on Japanese Wikipedia text that uses the MeCab morphological analyzer with the IPA dictionary for word-level tokenization, suitable for Japanese natural language processing tasks.
Downloads 153.44k
Release Time: 3/2/2022
Model Overview
This is a BERT model pretrained on Japanese text. Input is first segmented into words with the MeCab morphological analyzer and the IPA dictionary, then split into WordPiece subwords, making the model suitable for a wide range of Japanese natural language understanding tasks.
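As a quick orientation, the following minimal sketch loads the model with the Hugging Face transformers library and runs a fill-mask prediction. It assumes the model is hosted on the Hub under the ID tohoku-nlp/bert-base-japanese (formerly cl-tohoku/bert-base-japanese) and that the MeCab bindings (fugashi) and the IPA dictionary package (ipadic) are installed.

```python
# pip install transformers fugashi ipadic  (MeCab bindings and the IPA dictionary are required)
from transformers import BertJapaneseTokenizer, BertForMaskedLM, pipeline

MODEL_ID = "tohoku-nlp/bert-base-japanese"  # assumed Hub ID (formerly cl-tohoku/bert-base-japanese)

tokenizer = BertJapaneseTokenizer.from_pretrained(MODEL_ID)
model = BertForMaskedLM.from_pretrained(MODEL_ID)

# Predict the masked token in a Japanese sentence.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for candidate in fill_mask("東京は日本の[MASK]です。"):
    print(candidate["token_str"], candidate["score"])
```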
Model Features
Japanese-Specific Tokenization
Uses the MeCab morphological analyzer with the IPA dictionary to segment text into words before WordPiece subword splitting, so tokenization respects Japanese word boundaries (a short sketch follows after this feature list).
Large-Scale Pretraining
Trained on a 2.6 GB Japanese Wikipedia corpus containing approximately 17 million sentences.
Standard BERT Architecture
Adopts the same architecture and training hyperparameters as the original BERT base model (12 layers, 768 hidden units, 12 attention heads), so existing BERT tooling and workflows apply unchanged.
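A minimal sketch of what the MeCab-plus-WordPiece pipeline produces, assuming the Hub ID tohoku-nlp/bert-base-japanese and that fugashi and ipadic are installed:

```python
from transformers import BertJapaneseTokenizer

# Assumed Hub ID; fugashi and ipadic are needed for the MeCab step.
tokenizer = BertJapaneseTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese")

# Text is first segmented into words by MeCab (IPA dictionary),
# then each word is split into WordPiece subwords.
print(tokenizer.tokenize("自然言語処理はとても面白いです。"))
# Example output (actual result may differ): ['自然', '言語', '処理', 'は', 'とても', '面白い', 'です', '。']
```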
Model Capabilities
Japanese Text Understanding
Japanese Text Classification
Japanese Question Answering
Japanese Named Entity Recognition
Japanese Semantic Similarity Calculation
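For the semantic similarity capability, a common approach (not prescribed by the model card) is to mean-pool the final hidden states into sentence embeddings and compare them with cosine similarity. A minimal sketch, assuming the Hub ID tohoku-nlp/bert-base-japanese:

```python
import torch
from transformers import BertJapaneseTokenizer, BertModel

MODEL_ID = "tohoku-nlp/bert-base-japanese"  # assumed Hub ID
tokenizer = BertJapaneseTokenizer.from_pretrained(MODEL_ID)
model = BertModel.from_pretrained(MODEL_ID)
model.eval()

def embed(text: str) -> torch.Tensor:
    # Mean-pool the last hidden states over non-padding tokens.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)   # shape (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

a = embed("今日は天気が良いです。")
b = embed("本日は晴天です。")
print(torch.nn.functional.cosine_similarity(a, b).item())
```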
Use Cases
Text Analysis
Japanese Sentiment Analysis
Analyze the sentiment polarity of Japanese text
Japanese Text Classification
Classify Japanese documents
Information Extraction
Japanese Named Entity Recognition
Extract entities such as person names and locations from Japanese text
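The pretrained checkpoint carries no entity labels of its own, so named entity recognition requires fine-tuning with a token-classification head. A minimal setup sketch, using an illustrative tag set and the assumed Hub ID tohoku-nlp/bert-base-japanese:

```python
from transformers import BertJapaneseTokenizer, AutoModelForTokenClassification

MODEL_ID = "tohoku-nlp/bert-base-japanese"  # assumed Hub ID
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # illustrative tag set, not from the model card

tokenizer = BertJapaneseTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# The classification head is randomly initialized here; it must be fine-tuned
# on a labeled Japanese NER corpus (e.g. with the transformers Trainer) before use.
```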