ModernCamemBERT Base

Developed by almanach
ModernCamemBERT is a French language model pre-trained on a high-quality French text corpus of 1T tokens. It is the French version of ModernBERT, designed for long contexts and efficient inference.
Downloads 213
Release Time: 4/11/2025

Model Overview

ModernCamemBERT is a French language model trained with the masked language modeling (MLM) objective, suitable for tasks that require long contexts or efficient inference speed.
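
To make the masked language modeling objective concrete, here is a minimal sketch of how a training example can be built: some tokens are hidden and the model must predict the originals. The mask token string, the masking probability, and the whitespace tokenization are illustrative assumptions, not the settings used to train ModernCamemBERT.

```python
import random

MASK = "<mask>"  # assumed mask token string; the real tokenizer defines its own


def mask_tokens(tokens, mask_prob=0.3, seed=0):
    """Hide a random subset of tokens, as in masked language modeling.

    Returns (corrupted, labels). labels[i] holds the original token where
    the input was masked and None where no prediction is required.
    mask_prob is an illustrative value, not the model's training setting.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)   # the model sees the mask token...
            labels.append(tok)       # ...and is trained to recover this token
        else:
            corrupted.append(tok)
            labels.append(None)      # unmasked positions carry no loss
    return corrupted, labels


tokens = "le chat dort sur le canapé".split()
corrupted, labels = mask_tokens(tokens)
print(corrupted)
print(labels)
```

During training the loss is computed only at the masked positions, which is why the labels are empty everywhere else.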

Model Features

High-quality pre-training data
Trained on a corpus of 1T tokens of high-quality French text, including RedPajama-V2, French scientific literature, and French Wikipedia.
Long context support
Initially trained with a context length of 1024 tokens, then extended to 8192 tokens later in the pre-training phase.
Efficient inference
As a ModernBERT-based model, it trains and runs inference faster than traditional BERT-style architectures.
Semantic filtering
Semantic filtering is performed with a BERT classifier trained on a document quality dataset automatically annotated by Llama-3 70B.
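
Documents longer than the 8192-token context mentioned under long context support still have to be split before encoding. The sketch below shows one common approach, overlapping windows over a list of token IDs. The function name, the overlap size, and the decision to ignore special tokens are assumptions made for illustration.

```python
def split_into_windows(token_ids, max_len=8192, overlap=256):
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that text near a
    boundary appears with context on both sides. max_len here does not
    reserve room for special tokens; subtract them if the tokenizer
    adds any (an assumption to check against the real tokenizer).
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    if not token_ids:
        return []
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


ids = list(range(20000))  # stand-in for real token IDs
windows = split_into_windows(ids)
print(len(windows), [len(w) for w in windows])
```

Inputs that already fit in the context come back as a single window, so the same code path works for short and long documents.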

Model Capabilities

French text understanding
Masked language modeling
Long context processing
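
As a rough illustration of the masked language modeling capability, the sketch below queries the model through the Hugging Face `transformers` fill-mask pipeline. The Hub identifier `almanach/moderncamembert-base` is an assumption inferred from the model and developer names on this page, and `fill_in_mask` is a hypothetical helper. ModernBERT-style models also need a recent `transformers` release.

```python
MODEL_ID = "almanach/moderncamembert-base"  # assumed Hub identifier, not confirmed by this page


def fill_in_mask(template, model_id=MODEL_ID, top_k=3):
    """Return the top_k (token, score) predictions for the {mask} slot.

    `template` must contain exactly one "{mask}" placeholder. The model
    is downloaded on first use, so this needs network access and the
    `transformers` package with a PyTorch backend installed.
    """
    from transformers import pipeline  # imported lazily: heavy optional dependency

    fill = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token rather than hard-coding one.
    text = template.format(mask=fill.tokenizer.mask_token)
    return [(pred["token_str"], pred["score"]) for pred in fill(text, top_k=top_k)]
```

A call such as `fill_in_mask("Paris est la {mask} de la France.")` would return the three most likely fillers with their scores.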

Use Cases

Natural language processing
Named entity recognition
Named entity recognition tasks in French text
Achieved an F1 score of 91.33 on the FTB-NER dataset
Text classification
French text classification tasks
Achieved an accuracy of 94.92 on the CLS dataset
Semantic similarity
Semantic similarity judgment of French text
Achieved an accuracy of 92.52 on the PAWS-X dataset
Question answering system
French question answering
French reading comprehension question answering tasks
Achieved an F1 score of 82.19 and an EM score of 62.66 on the FQuAD dataset
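
The gap between the F1 and EM scores above comes from how the two metrics treat partially correct answers. The following is a simplified sketch of SQuAD-style exact match and token-level F1. The normalization is an assumption and is not the official FQuAD evaluation script.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Simplified on purpose: the official evaluation may also handle
    French articles and non-ASCII punctuation such as guillemets.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall against the gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("à Paris", "Paris"), token_f1("à Paris", "Paris"))
```

An answer with one extra word scores zero on exact match but still earns partial credit on F1, so F1 is always at least as high as EM.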