Multilingual ModernBert Base Preview
A multilingual BERT model developed by the Algomatic team, supporting mask-filling tasks with an 8,192-token context length and a 151,680-token vocabulary.
Released: 2/10/2025
Model Overview
This is a multilingual BERT model designed primarily for mask-filling tasks. It handles multiple languages and offers an extended context window, making it suitable for text understanding and infilling over long inputs.
Model Features
Long Context Support
Supports an 8,192-token context length, well suited to long-text processing tasks.
Multilingual Capability
Supports multiple languages including Korean, English, Chinese, and Japanese.
Efficient Inference
Supports FlashAttention for more efficient inference on compatible GPUs; see the loading sketch after this list.
Custom Tokenizer
Uses a tokenizer derived from the Qwen2.5 tokenizer, with a 151,680-token vocabulary, optimized to recognize code indentation.
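The sketch below shows one way to load the model with FlashAttention enabled. The repository id is a placeholder assumption (substitute the actual model id), and FlashAttention 2 additionally requires the flash-attn package and a supported GPU.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder repository id (assumption): replace with the actual model id.
model_id = "algomatic/multilingual-modernbert-base-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
print(len(tokenizer))  # vocabulary size; expected to be around 151,680

# FlashAttention 2 needs the flash-attn package and a supported GPU;
# drop the attn_implementation argument to use the default attention.
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```

Loaded this way, the model can process inputs up to its 8,192-token context length in a single pass.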
Model Capabilities
Mask Filling
Multilingual Text Understanding
Long Text Processing
Use Cases
Text Understanding and Generation
Korean Text Filling
Fills missing parts in Korean sentences.
Example result: {'score': 0.248046875, 'token': 128956, 'token_str': ' 하는', 'sequence': '우리의 대부분의 고뇌는 가능했을 또 다른 인생을 하는 데서 시작된다.'} (sequence, roughly: "Most of our anguish begins from the other life that could have been.")
English Text Filling
Fills missing parts in English sentences; the pipeline sketch after these examples reproduces this call.
Example result: {'score': 0.20703125, 'token': 5322, 'token_str': ' problems', 'sequence': 'Pinning our hopes on the unreliable notion of our potential is the root of all our problems.'}
Chinese Text Filling
Fills missing parts in Chinese sentences.
Example result: {'score': 0.177734375, 'token': 99392, 'token_str': '知道', 'sequence': '我们必须知道,我们只能成为此时此地的那个自己,而无法成为其他任何人。'} (sequence, roughly: "We must know that we can only become the self of this time and place, and cannot become anyone else.")
Japanese Text Filling
Fills missing parts in Japanese sentences.
Example result: {'score': 0.11865234375, 'token': 142732, 'token_str': 'ケーキ', 'sequence': '大きなケーキを一人で切り分けて食べるというのは孤独の極地ですからね'} (sequence, roughly: "Cutting up a large cake and eating it all by yourself is the height of loneliness, after all.")
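Outputs like the ones above can be produced with the standard transformers fill-mask pipeline, as in the sketch below. The repository id is the same placeholder assumption as in the loading sketch, and the mask token is queried from the tokenizer rather than hard-coded.

```python
from transformers import pipeline

# Placeholder repository id (assumption), as in the loading sketch above.
fill_mask = pipeline("fill-mask", model="algomatic/multilingual-modernbert-base-preview")

# Query the mask token from the tokenizer rather than assuming "[MASK]".
mask = fill_mask.tokenizer.mask_token

results = fill_mask(
    f"Pinning our hopes on the unreliable notion of our potential is the root of all our {mask}."
)
for r in results:  # top predictions, ordered by score
    print(r["score"], r["token_str"], r["sequence"])
```

Each returned dict has the same fields as the example results above: score, token, token_str, and sequence.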