ModernCamemBERT CV2 Base

Developed by almanach
A French language model pre-trained on 1 trillion tokens of high-quality French text; it is the French counterpart of ModernBERT
Downloads: 232
Release date: 4/11/2025

Model Overview

ModernCamemBERT is a French Transformer model pre-trained with a masked language modeling (MLM) objective on 48 H100 GPUs. It supports long-context processing.
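As a masked language model, it can be queried for masked-token predictions out of the box. Below is a minimal sketch using the Hugging Face transformers pipeline, assuming the model ID almanach/moderncamembert-cv2-base (the ID is not stated on this page):

```python
from transformers import pipeline

MODEL_ID = "almanach/moderncamembert-cv2-base"  # assumed Hugging Face model ID
fill_mask = pipeline("fill-mask", model=MODEL_ID)

# Use the tokenizer's own mask token rather than hard-coding it.
mask = fill_mask.tokenizer.mask_token
for pred in fill_mask(f"Paris est la {mask} de la France."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```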

Model Features

Large-scale Pre-training
Trained on 1 trillion tokens of high-quality French text, including RedPajama-V2, HALvest scientific literature, and the French Wikipedia
Efficient Architecture
Faster training and inference than the traditional BERT architecture
Long Context Support
Initially pre-trained with a 1024-token context length, later extended to 8192 tokens (see the encoding sketch after this list)
Strict Data Filtering
Semantic filtering via a Llama-3 70B-based BERT classifier to ensure data quality
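A minimal sketch of encoding a long French document up to the 8192-token limit mentioned above, again assuming the model ID almanach/moderncamembert-cv2-base; the placeholder document and truncation settings are illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "almanach/moderncamembert-cv2-base"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

# Placeholder long document; in practice this would be a multi-page French text.
long_document = "Le Conseil d'État a publié un rapport détaillé sur la question. " * 500

# Truncate at the model's extended 8192-token context length.
inputs = tokenizer(long_document, truncation=True, max_length=8192, return_tensors="pt")
print("input length:", inputs["input_ids"].shape[1])

with torch.no_grad():
    outputs = model(**inputs)
print("hidden states:", outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```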

Model Capabilities

French text understanding (see the sentence-embedding sketch after this list)
Masked language modeling
Long text processing
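For general French text understanding, the base model can serve as a sentence encoder. The sketch below mean-pools the final hidden states into sentence embeddings and compares two sentences; the pooling strategy and example sentences are illustrative choices, not part of the model card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "almanach/moderncamembert-cv2-base"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = [
    "Le tribunal a rendu sa décision hier.",
    "La cour a annoncé son jugement la veille.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```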

Use Cases

Natural Language Processing
Named Entity Recognition
Named entity recognition in French text
Achieved a 92.17 F1 score on the FTB-NER dataset
Text Classification
French text classification tasks (see the fine-tuning sketch after this list)
Achieved an accuracy of 94.86 on the CLS dataset
Question Answering
Development of French question answering systems
Achieved an 81.68 F1 score on the FQuAD dataset
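The downstream tasks above require fine-tuning the base model with a task-specific head. Below is a minimal fine-tuning sketch for French text classification, assuming the model ID almanach/moderncamembert-cv2-base and hypothetical CSV files with "text" and "label" columns; dataset paths and hyperparameters are illustrative, not taken from the reported results:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_ID = "almanach/moderncamembert-cv2-base"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Hypothetical CSV files with "text" and "label" columns; replace with a real dataset.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="moderncamembert-cls",   # illustrative output directory
    per_device_train_batch_size=16,     # illustrative hyperparameters
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```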