
RoBERTa Large Japanese with Auto Juman++

Developed by nlp-waseda
A large Japanese RoBERTa model pretrained on Japanese Wikipedia and the Japanese portion of CC-100, with automatic Juman++ tokenization built into its tokenizer.
Downloads: 139
Released: 10/15/2022

Model Overview

This is a large RoBERTa model pretrained for Japanese natural language processing. It can be used directly for masked language modeling or fine-tuned for downstream tasks.
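As a quick illustration, the snippet below is a minimal sketch of the standard Hugging Face masked-language-modeling workflow applied to this model. The repository id nlp-waseda/roberta-large-japanese-with-auto-jumanpp follows the developer/model-name convention, and the trust_remote_code=True flag is an assumption (the automatic Juman++ step is expected to live in custom tokenizer code); a Juman++ binary must also be installed on the system.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "nlp-waseda/roberta-large-japanese-with-auto-jumanpp"
# trust_remote_code=True is an assumption: the automatic Juman++ step is
# expected to be shipped as custom tokenizer code in the repository.
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(name)

# "I [MASK] natural language processing at Waseda University."
sentence = "早稲田大学で自然言語処理を[MASK]する。"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and print the five most likely replacements.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top5))
```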

Model Features

Automatic Juman++ Tokenization
The tokenizer runs Juman++ word segmentation automatically, so raw Japanese text can be passed to the model without a separate preprocessing step (see the sketch after this list)
Large-scale Pretraining
Pretrained on Japanese Wikipedia and the Japanese portion of CC-100, giving broad coverage of written Japanese
High-performance Tokenization
The 32,000-token vocabulary combines JumanDIC words with sentencepiece subwords
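To make the tokenization pipeline concrete, the following sketch shows what calling the tokenizer on raw text might look like. It assumes the custom tokenizer class is loaded via trust_remote_code=True and that Juman++ is available locally.

```python
from transformers import AutoTokenizer

# Sketch: the tokenizer is expected to run Juman++ word segmentation first,
# then apply sentencepiece over the 32,000-entry JumanDIC-based vocabulary.
tokenizer = AutoTokenizer.from_pretrained(
    "nlp-waseda/roberta-large-japanese-with-auto-jumanpp",
    trust_remote_code=True,  # assumed to be required for the custom tokenizer
)

text = "早稲田大学で自然言語処理を研究する。"
print(tokenizer.tokenize(text))      # subword pieces after segmentation
print(tokenizer(text)["input_ids"])  # ids into the 32,000-token vocabulary
```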

Model Capabilities

Japanese Text Understanding
Masked Language Modeling
Downstream Task Fine-tuning

Use Cases

Natural Language Processing
Text Completion
Predicts the words hidden behind [MASK] tokens in a sentence (see the masked-LM sketch in the overview above)
Text Classification
Can be fine-tuned for classification tasks such as sentiment analysis (see the fine-tuning sketch after this list)
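A minimal fine-tuning sketch for the sentiment-analysis use case is given below. The dataset id "your_sentiment_dataset" is a placeholder for any dataset with "text" and "label" columns, and the hyperparameters are illustrative rather than recommendations.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

name = "nlp-waseda/roberta-large-japanese-with-auto-jumanpp"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
# num_labels=2 assumes a binary (positive/negative) sentiment task.
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Placeholder dataset id; substitute any dataset with "text" and "label" columns.
dataset = load_dataset("your_sentiment_dataset")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```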