
Jp ModernBERT Large Preview

Developed by makiart
A Japanese ModernBERT model trained by the Algomatic team, supporting fill-mask tasks with a context length of up to 8,192 tokens.
Release Time: 2/11/2025

Model Overview

This is a Japanese language model based on the ModernBERT architecture, optimized for fill-mask tasks. It was trained on the Japanese portion of the fineweb2 dataset and handles long contexts well.

Model Features

Long context support
Supports context lengths of up to 8,192 tokens, making it well suited to long-text tasks.
Efficient inference
Supports FlashAttention for faster inference on compatible GPUs (see the loading sketch after this list).
Specialized Japanese tokenization
Uses BertJapaneseTokenizer, which is tailored to Japanese text processing.
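To make the long-context and FlashAttention features concrete, here is a minimal loading sketch with Hugging Face transformers. The repo id makiart/jp-ModernBERT-large-preview is an assumption inferred from the model name; the attn_implementation argument only takes effect when the flash-attn package is installed and a supported GPU is present, and can be dropped otherwise.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "makiart/jp-ModernBERT-large-preview"  # assumed repo id, inferred from the model name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # lighter memory footprint for 8k-token inputs
    attn_implementation="flash_attention_2",  # requires flash-attn and a compatible GPU
)
model.eval()
```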

Model Capabilities

Japanese text understanding
Fill-mask prediction
Long text processing

Use Cases

Text processing
Sentence completion
Predicts masked words in sentences.
The example shows candidate words for the [MASK] position in 'I believe our greatest suffering comes from dreaming of possible alternative [MASK].' A runnable sketch follows below.
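A minimal fill-mask sketch using the transformers pipeline. The repo id makiart/jp-ModernBERT-large-preview is the same assumption as above, and the Japanese input sentence is an illustrative placeholder rather than the exact example from this page; BertJapaneseTokenizer may additionally require the fugashi and unidic-lite packages.

```python
from transformers import pipeline

# Assumed repo id (inferred from the model name, not confirmed by this page).
fill_mask = pipeline("fill-mask", model="makiart/jp-ModernBERT-large-preview")

# Illustrative input: "Tokyo is the [MASK] of Japan."
# The pipeline returns the top-scoring candidates for the [MASK] slot.
for candidate in fill_mask("東京は日本の[MASK]です。"):
    print(f"{candidate['token_str']}\t{candidate['score']:.3f}")
```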