🚀 llm-jp-modernbert-base
This model is based on the ModernBERT-base architecture and uses the llm-jp-tokenizer. It was trained on the Japanese subset (3.4 TB) of llm-jp-corpus v4 and supports a maximum sequence length of 8,192 tokens.
🚀 Quick Start
For detailed information on the training methods, evaluation, and analysis, please refer to the paper llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length.
📦 Installation
Install the transformers library (version 4.48.0 or later):
pip install "transformers>=4.48.0"
If your GPU supports FlashAttention 2, installing flash-attn is also recommended:
pip install flash-attn --no-build-isolation
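Depending on your transformers version, flash-attn may be picked up automatically when available. If you want to request it explicitly, the standard transformers attn_implementation option can be used; the following is a minimal sketch (an assumption, not from the original card), assuming flash-attn is installed and a compatible GPU is available.
import torch
from transformers import AutoModelForMaskedLM

# Explicitly request FlashAttention 2. This requires the flash-attn package,
# a GPU that supports it, and running the model in fp16/bf16.
model = AutoModelForMaskedLM.from_pretrained(
    "llm-jp/llm-jp-modernbert-base",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
).to("cuda")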
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# <MASK|LLM-jp> is the mask token used by the llm-jp-tokenizer.
text = "日本の首都は<MASK|LLM-jp>です。"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Locate the mask position and take the highest-scoring prediction for it.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
🔧 Technical Details
Training
This model was trained with a max_seq_len of 1024 in stage 1 and then with a max_seq_len of 8192 in stage 2.
The training code is available at https://github.com/llm-jp/llm-jp-modernbert.
| Property | Stage 1 | Stage 2 |
|---|---|---|
| max_seq_len | 1024 | 8192 |
| max_steps | 500,000 | 200,000 |
| Total batch size | 3328 | 384 |
| Peak LR | 5e-4 | 5e-5 |
| Warmup steps | 24,000 | Same as stage 1 |
| LR schedule | Linear decay | Same as stage 1 |
| Adam beta 1 | 0.9 | Same as stage 1 |
| Adam beta 2 | 0.98 | Same as stage 1 |
| Adam eps | 1e-6 | Same as stage 1 |
| MLM prob | 0.30 | Same as stage 1 |
| Gradient clipping | 1.0 | Same as stage 1 |
| Weight decay | 1e-5 | Same as stage 1 |
| line_by_line | True | Same as stage 1 |
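For illustration, the masking setup in the table (MLM probability of 0.30) corresponds to the standard transformers masked-LM collator. The following is a minimal sketch of that configuration only, not the project's actual training script (see the repository linked above for that).
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-modernbert-base")

# 30% of tokens are selected for masking, matching the "MLM prob" row above.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.30,
)

batch = collator([tokenizer("日本の首都は東京です。")])
print(batch["input_ids"].shape, batch["labels"].shape)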
Evaluation
JSTS, JNLI, and JCoLA from JGLUE were used for evaluation.
The evaluation code is available at https://github.com/llm-jp/llm-jp-modernbert.
| Model | JSTS (Pearson) | JNLI (accuracy) | JCoLA (accuracy) | Avg |
|---|---|---|---|---|
| tohoku-nlp/bert-base-japanese-v3 | 0.920 | 0.912 | 0.880 | 0.904 |
| sbintuitions/modernbert-ja-130m | 0.916 | 0.927 | 0.868 | 0.904 |
| sbintuitions/modernbert-ja-310m | 0.932 | 0.933 | 0.883 | 0.916 |
| llm-jp/llm-jp-modernbert-base | 0.918 | 0.913 | 0.844 | 0.892 |
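The JGLUE scores above come from fine-tuning the encoder on each task. As a rough illustration only (the actual hyperparameters and pipeline live in the evaluation repository linked above), a JNLI-style sentence-pair classifier can be set up with the standard transformers sequence-classification head:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# JNLI is a 3-way natural language inference task
# (entailment / contradiction / neutral); the head is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

# Sentence pairs are encoded together; training itself (e.g. with Trainer)
# follows the usual transformers fine-tuning recipe.
premise = "日本の首都は東京です。"
hypothesis = "東京は日本にあります。"
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, 3)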
📄 License
Apache License, Version 2.0
📚 Citation
@misc{sugiura2025llmjpmodernbertmodernbertmodeltrained,
title={llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length},
author={Issa Sugiura and Kouta Nakayama and Yusuke Oda},
year={2025},
eprint={2504.15544},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.15544},
}