
Camembertv2 Base

Developed by almanach
CamemBERTv2 is a French language model pre-trained on a corpus of 275 billion tokens of French text. It is the second generation of CamemBERT and keeps the RoBERTa architecture, with a new tokenizer and updated training data.
Downloads 1,512
Release Time: 11/14/2024

Model Overview

CamemBERTv2 is an updated French language model suited to natural language processing tasks such as text infilling, part-of-speech tagging, and named entity recognition.

Model Features

Large-scale Pre-training Data
Pre-trained on 275 billion unique tokens, compared with 32 billion for the original CamemBERT.
New Tokenizer
Uses a WordPiece tokenizer that supports emojis and handles numbers by splitting them into two-digit tokens.
Extended Context Window
Context window extended to 1024 tokens, enhancing long-text processing capabilities.
High-performance Fine-tuning
Fine-tunes well on French NLP tasks, with reported scores of 97.66 UPOS accuracy for part-of-speech tagging and 91.99 F1 on FTB-NER for named entity recognition.
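
The tokenizer and context-window features above can be illustrated with a small sketch. It is not the actual CamemBERTv2 tokenizer: the left-to-right grouping of digits is an assumption, and the helper names (split_number_two_digits, window_token_ids) and the stride default are hypothetical. The first function shows what "splitting into two-digit tokens" could look like; the second shows one way to cut a long sequence of token IDs into windows that fit a 1024-token context.

```python
import re


def split_number_two_digits(text: str) -> list[str]:
    """Split every run of ASCII digits into chunks of at most two digits.

    Assumption: digits are grouped left to right; the real tokenizer
    may group them differently.
    """
    pieces = []
    for match in re.finditer(r"([0-9]+)|([^0-9]+)", text):
        digits, other = match.group(1), match.group(2)
        if digits is not None:
            pieces.extend(digits[i:i + 2] for i in range(0, len(digits), 2))
        else:
            pieces.append(other)
    return pieces


def window_token_ids(token_ids: list[int], max_len: int = 1024,
                     stride: int = 128) -> list[list[int]]:
    """Cut token IDs into overlapping windows of at most max_len tokens.

    stride is the number of tokens shared by consecutive windows.
    Pass a smaller max_len to leave room for special tokens.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window already reaches the end of the input
    return windows
```

For example, split_number_two_digits("En 2024") separates "2024" into "20" and "24", and a 2,500-token input is covered by three overlapping windows with the default settings.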

Model Capabilities

Text Infilling
Part-of-speech Tagging
Dependency Parsing
Named Entity Recognition
Question Answering
Text Classification
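
Text infilling, the first capability listed above, corresponds to masked-token prediction. The sketch below assumes the model is available through the Hugging Face transformers fill-mask pipeline under the hub ID almanach/camembertv2-base (inferred from the model name and developer shown here, not confirmed by this page). The helper names and the {mask} placeholder are hypothetical. The mask token is read from the tokenizer instead of being hard-coded.

```python
def load_fill_mask(model_id: str = "almanach/camembertv2-base"):
    """Create a fill-mask pipeline. The hub ID is an assumption."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id)


def top_fillings(fill_mask, template: str, k: int = 3) -> list[tuple[str, float]]:
    """Return the k most likely fillings for the {mask} slot in template.

    fill_mask is expected to behave like a transformers fill-mask
    pipeline: callable with top_k, exposing tokenizer.mask_token, and
    returning dicts with "token_str" and "score" keys.
    """
    text = template.replace("{mask}", fill_mask.tokenizer.mask_token)
    results = fill_mask(text, top_k=k)
    return [(r["token_str"].strip(), float(r["score"])) for r in results]


# Example usage (downloads the model on first run):
#   fill_mask = load_fill_mask()
#   for token, score in top_fillings(fill_mask, "Paris est la {mask} de la France."):
#       print(token, round(score, 3))
```

Fine-tuned uses such as tagging or question answering require a task-specific head trained on labeled data; the base checkpoint only supports masked-token prediction directly.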

Use Cases

Natural Language Processing
French Text Infilling
Used to fill in missing parts of French texts.
Part-of-speech Tagging
Performs part-of-speech tagging on French texts.
UPOS accuracy: 97.66
Named Entity Recognition
Identifies named entities in French texts.
FTB-NER F1 score: 91.99
Question Answering
French Question Answering
Used to build French question-answering systems.
FQuAD F1 score: 80.98
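
To make the question-answering score above easier to interpret, here is a minimal sketch of a token-overlap F1 between a predicted answer and a reference answer. It assumes the FQuAD F1 follows the SQuAD-style definition; official evaluation scripts also normalize punctuation and articles and take the best score over several references, which this simplified version omits. The function name token_f1 is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answers (lowercased, whitespace split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match exactly; otherwise there is no overlap.
        return float(pred_tokens == ref_tokens)
    # The multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With the prediction "la tour Eiffel" and the reference "tour Eiffel", precision is 2/3 and recall is 1, giving an F1 of 0.8. A dataset-level score such as the one reported above would be the average over all questions, expressed as a percentage.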