
# WordPiece Tokenization

**Camembertv2 Base** · almanach · MIT
CamemBERTv2 is a French language model pre-trained on 275 billion tokens of French text, the second-generation version of CamemBERT. It adopts the RoBERTa architecture with an updated tokenizer and training data.
Tags: Large Language Model, Transformers, French
Downloads: 1,512 · Likes: 11
**Luke Japanese Wordpiece Base** · uzabase · Apache-2.0
A LUKE model derived from Japanese BERT, optimized for Japanese named entity recognition tasks.
Tags: Sequence Labeling, Transformers, Japanese
Downloads: 16 · Likes: 4
**Bert Base Indonesian 522M** · cahya · MIT
An uncased BERT base model pretrained on Indonesian Wikipedia with the masked language modeling (MLM) objective.
Tags: Large Language Model, Other
Downloads: 2,799 · Likes: 25
**Bert Base Indonesian 1.5G** · cahya · MIT
A BERT-based Indonesian pretrained model trained on Wikipedia and newspaper data, suitable for a variety of natural language processing tasks.
Tags: Large Language Model, Other
Downloads: 40.08k · Likes: 5
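
To see WordPiece tokenization in action with one of the models above, the sketch below loads a tokenizer via the Hugging Face `transformers` library. The model ID `cahya/bert-base-indonesian-522M` and the sample sentence are assumptions inferred from this listing, not prescribed by it.

```python
# Minimal sketch: inspect WordPiece subword splits for a listed model.
# Assumes the Hugging Face model ID "cahya/bert-base-indonesian-522M"
# (inferred from the listing above) and network access to download it.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cahya/bert-base-indonesian-522M")

text = "Tokenisasi WordPiece memecah kata menjadi subkata."
tokens = tokenizer.tokenize(text)
print(tokens)  # rare words are split into '##'-prefixed subword pieces

# Encode to model-ready IDs; this adds the [CLS]/[SEP] special tokens.
ids = tokenizer.encode(text)
print(tokenizer.convert_ids_to_tokens(ids))
```

In WordPiece output, a leading `##` marks a piece that continues the preceding word rather than starting a new one, which is how these BERT-style models handle vocabulary they have not seen as whole words.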