
Bert Base Japanese Char

Developed by tohoku-nlp
A BERT model pretrained on Japanese text using character-level tokenization, suitable for Japanese natural language processing tasks.
Downloads: 116.10k
Release Date: 3/2/2022

Model Overview

This is a BERT model pretrained on Japanese Wikipedia text. Input text is first segmented into words with MeCab using the IPA dictionary and then split into individual characters, making the model suitable for a wide range of Japanese natural language understanding tasks.
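
Below is a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID tohoku-nlp/bert-base-japanese-char and that the MeCab dependencies (fugashi, ipadic) needed for the word-level tokenization step are installed.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed Hub ID; the tokenizer resolves to the character-level Japanese BERT tokenizer.
model_id = "tohoku-nlp/bert-base-japanese-char"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

text = "東北大学で自然言語処理を研究しています。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One hidden vector per character token, plus [CLS] and [SEP].
print(outputs.last_hidden_state.shape)
```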

Model Features

Character-level Tokenization
Input is first segmented into words with MeCab (IPA dictionary) and then split into individual characters, a scheme well suited to Japanese, which lacks whitespace word boundaries (see the tokenization sketch after this list)
Large-scale Pretraining
Trained on about 2.6 GB of Japanese Wikipedia text, roughly 17 million sentences
Compatibility with Original BERT
The architecture and training configuration follow the original BERT base model, so existing BERT fine-tuning code and transfer learning recipes can be reused directly
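
The two-stage tokenization can be inspected directly, as in the sketch below; the Hub ID is the same assumption as above, and the example sentence is only illustrative.

```python
from transformers import AutoTokenizer

# Assumed Hub ID, as in the loading sketch above.
tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-char")

# The tokenizer first segments the sentence into words with MeCab (IPA dictionary),
# then splits each word into single characters.
tokens = tokenizer.tokenize("日本語の文章を処理します。")
print(tokens)
# Expected output: one token per character, e.g. ['日', '本', '語', 'の', ...]
```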

Model Capabilities

Japanese Text Understanding
Japanese Text Classification
Japanese Question Answering Systems
Japanese Named Entity Recognition
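
As a quick sanity check of the pretraining objective, masked-character prediction can be run through the fill-mask pipeline; this assumes the checkpoint ships masked-language-model head weights, which is typical for BERT pretraining releases.

```python
from transformers import pipeline

# Assumed Hub ID; [MASK] stands in for a single character in this vocabulary.
fill = pipeline("fill-mask", model="tohoku-nlp/bert-base-japanese-char")

for candidate in fill("東北大学で自然言語処理を研究してい[MASK]。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```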

Use Cases

Natural Language Processing
Japanese Text Classification
Sentiment analysis or topic classification of Japanese news articles, product reviews, and similar text (a fine-tuning sketch follows this list)
Japanese Question Answering System
Building question answering applications over Japanese documents
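
A fine-tuning setup for Japanese sentiment classification might look like the sketch below; the label scheme, example sentences, and num_labels value are hypothetical placeholders, not part of this model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tohoku-nlp/bert-base-japanese-char"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A fresh classification head is initialized on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

texts = ["この映画は最高でした。", "サービスが遅くてがっかりした。"]  # hypothetical examples
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative (hypothetical label scheme)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)

# outputs.loss can be backpropagated in an ordinary PyTorch training loop.
print(outputs.loss.item(), outputs.logits.shape)
```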