CamemBERTav2 Base
CamemBERTav2 is a French language model built on the DeBERTaV2 architecture and pretrained on 275 billion tokens of French text. It performs strongly across a range of French NLP tasks.
Release date: 11/14/2024
Model Overview
The second-generation CamemBERTa model, optimized for French, supports various natural language processing tasks.
Model Features
Large-scale Pretraining
Trained on 275 billion tokens of French text, roughly 8.6 times the 32 billion tokens used to train the original CamemBERTa.
Improved Tokenizer
A new WordPiece tokenizer with a vocabulary of 32,768 tokens, improved handling of numbers, and better support for special characters.
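The card does not say exactly what "improved handling of numbers" involves. One common approach, assumed here purely for illustration, is to split long digit runs into short fixed-size chunks before subword tokenization, so the vocabulary does not need an entry for every multi-digit number. The function `split_digits` and the chunk size of 2 are hypothetical; this is not the actual CamemBERTav2 tokenizer code.

```python
import re

# Hypothetical illustration: split runs of digits into chunks of at most
# `group` digits before subword tokenization. This is an assumed scheme,
# not the actual CamemBERTav2 tokenizer code.
def split_digits(text: str, group: int = 2) -> list:
    pieces = []
    for match in re.finditer(r"(\d+)|(\D+)", text):
        digits = match.group(1)
        if digits is not None:
            # Break a number into fixed-size chunks, left to right.
            pieces.extend(digits[i:i + group] for i in range(0, len(digits), group))
        else:
            # Keep non-digit runs unchanged.
            pieces.append(match.group(2))
    return pieces

print(split_digits("En 2024, 12345 habitants"))
# ['En ', '20', '24', ', ', '12', '34', '5', ' habitants']
```

Fixed-size chunking keeps the number of distinct numeric pieces small (at most 110 for two-digit chunks: "0" to "99" plus "00" to "09"), at the cost of longer token sequences for long numbers.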
Extended Context Window
Context window extended to 1,024 tokens, enabling processing of longer texts.
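Texts longer than the 1,024-token window still have to be split. A minimal sketch, assuming a simple overlapping sliding window over token IDs (the function name and the `stride` of 128 are illustrative choices, not part of the model card), looks like this:

```python
# Hypothetical helper: cover a long token sequence with windows of at most
# `max_len` tokens, where consecutive windows overlap by `stride` tokens.
# Special tokens such as [CLS]/[SEP] are ignored; a real pipeline would
# reserve room for them inside each window.
def sliding_windows(token_ids, max_len=1024, stride=128):
    if max_len <= stride:
        raise ValueError("max_len must be larger than stride")
    step = max_len - stride
    start = 0
    while True:
        yield start, token_ids[start:start + max_len]
        if start + max_len >= len(token_ids):
            break  # the last window reaches the end of the sequence
        start += step

ids = list(range(2500))  # stand-in for real token IDs
for start, window in sliding_windows(ids):
    print(start, len(window))
# 0 1024
# 896 1024
# 1792 708
```

The overlap gives tokens near a window boundary some context on both sides. Hugging Face fast tokenizers offer comparable behavior through the `stride` and `return_overflowing_tokens=True` arguments.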
Multi-task Performance Enhancement
Outperforms previous models on tasks such as POS tagging, named entity recognition, and question answering.
Model Capabilities
French text understanding
Feature extraction
Masked language modeling
POS tagging
Named entity recognition
Text classification
Question answering
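For feature extraction, an encoder like this produces one vector per token. A common way to get a single sentence vector, assumed here rather than prescribed by the card, is mean pooling over the non-padding tokens. The sketch below uses plain Python lists as stand-ins for the encoder's last hidden states (for example, the `last_hidden_state` of a Hugging Face `AutoModel` output); `mean_pool` is a hypothetical helper name.

```python
# Hypothetical helper: masked mean pooling of per-token vectors.
# hidden_states: seq_len x hidden_size list of token vectors
# attention_mask: seq_len list of 1 (real token) or 0 (padding)
def mean_pool(hidden_states, attention_mask):
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:  # skip padding positions
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]

# The third position is padding, so its large values are ignored.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
# [2.0, 3.0]
```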
Use Cases
Natural Language Processing
French Text Analysis
Used for POS tagging and dependency parsing of French texts.
Achieves 97.71% UPOS accuracy on GSD/Rhapsodie/Sequoia/FSMB datasets.
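UPOS accuracy is the share of tokens whose predicted universal part-of-speech tag matches the gold tag, pooled over all tokens in the test set. A minimal sketch, assuming gold and predicted tags are already aligned token by token (the helper name is hypothetical):

```python
# Hypothetical helper: token-level UPOS accuracy, in percent.
# gold, predicted: lists of sentences, each a list of UPOS tag strings.
def upos_accuracy(gold, predicted):
    if len(gold) != len(predicted):
        raise ValueError("different number of sentences")
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    return 100.0 * correct / total

gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred = [["DET", "NOUN", "NOUN"], ["PRON", "VERB"]]
print(upos_accuracy(gold, pred))  # 80.0 (4 of 5 tokens correct)
```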
Named Entity Recognition
Identifies named entities in French texts.
Achieves 93.40% F1 score on the FTB-NER dataset.
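NER F1 is usually computed at the entity level: a predicted entity counts only if its type and exact span match a gold entity. The card does not name the scorer used for FTB-NER, so the sketch below assumes BIO tags and conlleval-style exact-span matching, micro-averaged over the corpus. `bio_spans` and `entity_f1` are hypothetical names.

```python
# Hypothetical helpers: entity-level F1 from BIO tag sequences.
def bio_spans(tags):
    """Return a set of (type, start, end) spans; end is exclusive."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # I- of the same type extends the current entity
        if etype is not None:
            spans.add((etype, start, i))
        if tag == "O":
            start, etype = None, None
        else:
            # B-X, or an I-X that does not continue an X entity, opens a new one
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold_sents, pred_sents):
    gold, pred = set(), set()
    for k, (g, p) in enumerate(zip(gold_sents, pred_sents)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold |= {(k,) + span for span in bio_spans(g)}
        pred |= {(k,) + span for span in bio_spans(p)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))  # 0.5: the PER span matches, LOC vs ORG does not
```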
Question Answering System
Used to build French question answering systems.
Achieves an F1 score of 83.04% and an exact match (EM) score of 64.29% on the FQuAD dataset.
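For extractive question answering, EM and F1 are typically computed SQuAD-style: EM checks whether the normalized prediction equals a reference answer, and F1 measures token overlap, taking the best score over all references for each question. The sketch below follows that convention with a simplified normalization (lowercasing, removing ASCII punctuation, collapsing whitespace). The official FQuAD evaluation script may normalize differently, for example in how it treats French articles and apostrophes, so treat these helpers (all hypothetical names) as an approximation.

```python
import string
from collections import Counter

def normalize(text):
    """Simplified normalization: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())

def exact_match(prediction, reference):
    return normalize(prediction) == normalize(reference)

def token_f1(prediction, reference):
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def evaluate(predictions, references):
    """predictions: list of strings; references: list of lists of gold answers."""
    em = f1 = 0.0
    for pred, refs in zip(predictions, references):
        em += max(exact_match(pred, r) for r in refs)
        f1 += max(token_f1(pred, r) for r in refs)
    n = len(predictions)
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}

print(evaluate(["Paris.", "la tour Eiffel"], [["paris"], ["Tour Eiffel"]]))
```

Here the first answer matches exactly after normalization, while the second only overlaps partially (F1 of 0.8), which is why F1 scores on QA benchmarks are generally higher than EM scores.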
Academic Research
Scientific Literature Processing
Processes and analyzes French scientific literature.