
Bert Base Ja

Developed by colorfulscoop
A BERT base model trained on the Japanese Wikipedia dataset, suitable for masked language modeling tasks on Japanese text
Downloads 16
Release Time: 3/2/2022

Model Overview

This is a BERT base model trained on the Japanese Wikipedia dataset, primarily intended for masked language modeling tasks in Japanese. The model adopts the standard BERT base architecture with a vocabulary size of 32,000.

Model Features

Japanese-specific vocabulary
The vocabulary size is set to 32,000 and is built specifically for Japanese text
SentencePiece tokenizer
Uses a SentencePiece model for tokenization, since Japanese text does not separate words with spaces
Stable tokenization behavior
Uses DebertaV2Tokenizer so that tokenization behaves consistently across environments (a loading sketch follows this list)
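
A minimal loading sketch, assuming the model is published on the Hugging Face Hub as colorfulscoop/bert-base-ja (the id is inferred from the developer and model names above and is not confirmed here):

from transformers import BertForMaskedLM, DebertaV2Tokenizer

model_id = "colorfulscoop/bert-base-ja"  # assumed Hub id

# DebertaV2Tokenizer wraps the SentencePiece model, providing the stable
# tokenization behavior described above.
tokenizer = DebertaV2Tokenizer.from_pretrained(model_id)
model = BertForMaskedLM.from_pretrained(model_id)

print(tokenizer.vocab_size)                        # expected: 32000
print(tokenizer.tokenize("日本語の文章を処理します"))  # SentencePiece subword pieces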

Model Capabilities

Japanese text understanding
Masked language modeling prediction

Use Cases

Education
Subject prediction
Predicting subjects students might be good at
Example: '得意な科目は[MASK]です' → '得意な科目は数学です' (My strong subject is [MASK] → My strong subject is mathematics)
Academic
Field of study prediction
Predicting academic fields of study
Example: '専門として[MASK]を専攻しています' → '専門として工学を専攻しています' (I'm majoring in [MASK] → I'm majoring in engineering)
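
As a sketch, the examples above could be reproduced with a fill-mask pipeline along the following lines (the model id is assumed as before, and the top prediction is not guaranteed to match the illustrative completions):

from transformers import pipeline, DebertaV2Tokenizer

model_id = "colorfulscoop/bert-base-ja"  # assumed Hub id
fill_mask = pipeline(
    "fill-mask",
    model=model_id,
    tokenizer=DebertaV2Tokenizer.from_pretrained(model_id),
)

prompts = [
    "得意な科目は[MASK]です",            # "My strong subject is [MASK]"
    "専門として[MASK]を専攻しています",  # "I'm majoring in [MASK]"
]
for prompt in prompts:
    top = fill_mask(prompt)[0]  # highest-scoring completion
    print(prompt, "->", top["sequence"])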