
RoBERTa Base Japanese with Auto Juman++

Developed by nlp-waseda
A pretrained Japanese model based on the RoBERTa architecture with automatic Juman++ tokenization, suitable for Japanese natural language processing tasks.
Downloads: 536
Released: 10/15/2022

Model Overview

This is a Japanese RoBERTa base model pretrained on Japanese Wikipedia and the Japanese portion of CC-100. It supports masked language modeling and can be fine-tuned on downstream tasks.
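
A minimal, illustrative sketch of masked language modeling with this model, assuming the Hugging Face transformers library, PyTorch, and a local Juman++ installation; the example sentence is arbitrary:

```python
from transformers import pipeline

# Assumes `transformers` is installed along with a local Juman++ binary
# and its Python bindings, which the tokenizer needs at load time.
fill_mask = pipeline(
    "fill-mask",
    model="nlp-waseda/roberta-base-japanese-with-auto-jumanpp",
)

# Insert the tokenizer's mask token and ask the model for completions.
mask = fill_mask.tokenizer.mask_token
for candidate in fill_mask(f"早稲田大学で自然言語処理を{mask}する。"):
    print(candidate["token_str"], candidate["score"])
```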

Model Features

Auto Juman++ Tokenization Support
BertJapaneseTokenizer supports automatic tokenization with Juman++, simplifying Japanese text preprocessing (see the tokenizer sketch after this list).
Large-scale Pretraining Data
The model was pretrained on Japanese Wikipedia and the Japanese portion of CC-100, covering a broad range of Japanese text.
Optimized Training Process
Trained for one week on eight NVIDIA A100 GPUs with tuned training settings and hyperparameters.
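
A rough sketch of the automatic Juman++ tokenization described above, assuming the Hugging Face transformers library and a locally installed Juman++ binary with its Python bindings; the example sentence is arbitrary:

```python
from transformers import AutoTokenizer

# Loads the BertJapaneseTokenizer configured for Juman++ word segmentation.
# Assumes a Juman++ binary and its Python bindings are installed locally.
tokenizer = AutoTokenizer.from_pretrained(
    "nlp-waseda/roberta-base-japanese-with-auto-jumanpp"
)

# Raw Japanese text is segmented by Juman++ before subword tokenization,
# so no manual pre-tokenization step is needed.
text = "早稲田大学で自然言語処理を勉強する。"
print(tokenizer.tokenize(text))

# The same applies when encoding inputs for the model.
encoded = tokenizer(text)
print(encoded["input_ids"])
```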

Model Capabilities

Japanese Text Understanding
Masked Language Modeling
Downstream Task Fine-tuning

Use Cases

Natural Language Processing
Text Completion
Fill in masked tokens in Japanese sentences with masked language modeling (see the fill-mask sketch under Model Overview).
Text Classification
Classify Japanese text by fine-tuning the model (a fine-tuning sketch follows below).
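
A minimal fine-tuning sketch for the text classification use case, assuming the Hugging Face transformers and datasets libraries plus PyTorch; the toy dataset, label count, and hyperparameters are placeholders rather than values from the original model card:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "nlp-waseda/roberta-base-japanese-with-auto-jumanpp"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# num_labels is a placeholder; set it to match your own task.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A tiny in-memory toy dataset purely for illustration; a real task would
# load a proper Japanese classification corpus instead.
raw = Dataset.from_dict({
    "text": ["この映画は素晴らしかった。", "この映画はつまらなかった。"],
    "label": [1, 0],
})

def tokenize(batch):
    # Juman++ segmentation runs inside the tokenizer, so raw text is fine.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = raw.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=dataset,
)
trainer.train()
```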