🚀 ModernBERT
ModernBERT is a modernized bidirectional encoder-only Transformer model pre-trained on a large amount of English and code data, suitable for long-context tasks and a wide range of downstream applications.
🚀 Quick Start
You can use these models directly with the `transformers` library. Until the next `transformers` release, doing so requires installing `transformers` from main:

```bash
pip install git+https://github.com/huggingface/transformers.git
```
Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
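To make the fine-tuning path concrete, here is a minimal sketch of a classification fine-tune with the Hugging Face `Trainer`. The dataset (`imdb`), label count, and training arguments are illustrative placeholders, not a recommended recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Placeholder dataset; swap in your own task.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="modernbert-classifier", num_train_epochs=1),
    train_dataset=tokenized["train"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```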
✨ Features
- ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens.
- It leverages recent architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, Local-Global Alternating Attention for efficiency on long inputs, and Unpadding and Flash Attention for efficient inference.
- Available in two sizes: [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) with 22 layers and 149 million parameters, and [ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) with 28 layers and 395 million parameters (see the configuration check after this list).
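If you want to verify these figures for yourself, a quick sketch that reads them from each checkpoint's configuration (attribute names assumed to follow the standard `transformers` ModernBERT config):

```python
from transformers import AutoConfig

# Print the depth and native context length of each ModernBERT checkpoint.
for model_id in ("answerdotai/ModernBERT-base", "answerdotai/ModernBERT-large"):
    config = AutoConfig.from_pretrained(model_id)
    print(f"{model_id}: {config.num_hidden_layers} layers, "
          f"context length {config.max_position_embeddings}")
```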
📦 Installation
To use ModernBERT, you need to install the `transformers` library from the main branch:

```bash
pip install git+https://github.com/huggingface/transformers.git
```
If your GPU supports it, we recommend running ModernBERT with Flash Attention 2 for the highest efficiency. To do so, install Flash Attention:

```bash
pip install flash-attn
```
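Once `flash-attn` is installed, a minimal sketch of loading the model with Flash Attention 2 enabled (this assumes a supported CUDA GPU):

```python
import torch
from transformers import AutoModelForMaskedLM

# Load ModernBERT in bfloat16 with the Flash Attention 2 kernel.
model = AutoModelForMaskedLM.from_pretrained(
    "answerdotai/ModernBERT-base",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```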
💻 Usage Examples
Basic Usage
Using `AutoModelForMaskedLM`:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "DeepMount00/ModernBERT-base-ita"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "La capitale dell'Italia è [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] position and decode the highest-scoring prediction.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(axis=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
```
Advanced Usage
Using a pipeline:
```python
import torch
from transformers import pipeline
from pprint import pprint

# Build a fill-mask pipeline running the model in bfloat16.
pipe = pipeline(
    "fill-mask",
    model="answerdotai/ModernBERT-base",
    torch_dtype=torch.bfloat16,
)

input_text = "He walked to the [MASK]."
results = pipe(input_text)
pprint(results)
```
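The pipeline returns a list of candidate fills for the masked position, each with a confidence score, the predicted token, and the completed sentence.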
📚 Documentation
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv preprint for in-depth information.
🔧 Technical Details
We evaluate ModernBERT across a range of tasks, including natural language understanding (GLUE), general retrieval (BEIR), long-context retrieval (MLDR), and code retrieval (CodeSearchNet and StackQA).
- On GLUE, ModernBERT-base surpasses other similarly sized encoder models, and ModernBERT-large is second only to DeBERTa-v3-large.
- For general retrieval tasks, ModernBERT performs well on BEIR in both single-vector (DPR-style) and multi-vector (ColBERT-style) settings (a single-vector sketch follows this list).
- Thanks to the inclusion of code data in its training mixture, ModernBERT as a backbone also achieves new state-of-the-art code retrieval results on CodeSearchNet and StackQA.
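To illustrate the single-vector (DPR-style) setting, here is a minimal sketch that mean-pools ModernBERT's last hidden state into one embedding per text and compares a query and a passage by cosine similarity. The pooling and similarity choices are illustrative assumptions, not the exact setup used in the evaluation above.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def embed(texts):
    # Mean-pool the last hidden state over non-padding tokens.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query, passage = embed(["What is the capital of France?", "Paris is the capital of France."])
print("cosine similarity:", F.cosine_similarity(query, passage, dim=0).item())
```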
📄 License
We release the ModernBERT model architectures, model weights, and training codebase under the Apache 2.0 license.
📚 Citation
```bibtex
@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663},
}
```
⚠️ Important Note
ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except you can omit the `token_type_ids` parameter.
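For illustration, a minimal sketch of what this looks like in practice (assuming the tokenizer's encoding contains only `input_ids` and `attention_mask`, as described above):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("He walked to the [MASK].", return_tensors="pt")
print(inputs.keys())       # no token_type_ids in the encoding
outputs = model(**inputs)  # the model is called without token_type_ids
```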
⚠️ Important Note
ModernBERT’s training data is primarily English and code, so performance may be lower for other languages. While it can handle long sequences efficiently, using the full 8,192-token window may be slower than short-context inference. As with any large model, ModernBERT may produce representations that reflect biases present in its training data. Verify critical or sensitive outputs before relying on them.
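For long-context use, a small sketch of encoding a document up to the full 8,192-token window (the placeholder text and truncation settings are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_text = "word " * 10000  # placeholder long document
inputs = tokenizer(long_text, truncation=True, max_length=8192, return_tensors="pt")
print(inputs["input_ids"].shape)  # sequence length capped at 8,192 tokens

with torch.no_grad():
    outputs = model(**inputs)
```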
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Model Type | ModernBERT (a modernized bidirectional encoder-only Transformer model) |
| Training Data | 2 trillion tokens of English and code data |
| License | Apache 2.0 |
| Tags | fill-mask, masked-lm, long-context, modernbert |
| Pipeline Tag | fill-mask |
| Languages Supported | English, Italian |