# WordPiece Tokenization
## Camembertv2 Base
- License: MIT
- Description: CamemBERTv2 is a French language model pre-trained on a 275-billion-token French corpus, the second-generation version of CamemBERT. It retains the RoBERTa architecture with an updated tokenizer and training data (usage sketch below).
- Task: Large Language Model
- Tags: Transformers, French
- Organization: almanach
- Downloads: 1,512
- Likes: 11
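
Models in this listing are typically loaded through the Hugging Face `transformers` library. The sketch below is a minimal example, assuming the repository ID `almanach/camembertv2-base` (the exact ID is not stated in the listing); it uses the fill-mask pipeline since the model was pretrained as a masked language model.

```python
from transformers import pipeline

# Assumed Hugging Face repository ID; verify against the almanach organization page.
MODEL_ID = "almanach/camembertv2-base"

# Pretrained as a masked language model, so the fill-mask pipeline applies directly.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

# Build the prompt with the tokenizer's own mask token so the example works
# whatever mask symbol the tokenizer defines.
prompt = f"Paris est la {fill_mask.tokenizer.mask_token} de la France."
for prediction in fill_mask(prompt, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```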
## Luke Japanese Wordpiece Base
- License: Apache-2.0
- Description: A LUKE model built on a Japanese BERT backbone, optimized for Japanese named entity recognition tasks (loading sketch below).
- Task: Sequence Labeling
- Tags: Transformers, Japanese
- Organization: uzabase
- Downloads: 16
- Likes: 4
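
A hedged loading sketch for this entry: the repository ID `uzabase/luke-japanese-wordpiece-base` is an assumption, and Japanese BERT-style tokenizers usually require the `fugashi` and `unidic-lite` packages for word segmentation. The encoder's hidden states shown here are what a sequence-labeling (NER) head would be fine-tuned on.

```python
from transformers import AutoTokenizer, AutoModel

# Assumed repository ID; check the uzabase organization for the exact name.
# If the repository ships a custom tokenizer class, AutoTokenizer may also
# need trust_remote_code=True.
MODEL_ID = "uzabase/luke-japanese-wordpiece-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Encode a sentence and inspect the contextual token representations; for NER,
# a token- or span-classification head is trained on top of these.
inputs = tokenizer("山田太郎は東京大学に通っています。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```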
## Bert Base Indonesian 522M
- License: MIT
- Description: An uncased BERT base model pretrained on Indonesian Wikipedia with the masked language modeling (MLM) objective (usage sketch below).
- Task: Large Language Model
- Tags: Other
- Organization: cahya
- Downloads: 2,799
- Likes: 25
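
Since the entry notes the model is uncased and MLM-pretrained, the sketch below checks the tokenizer's case handling and runs the fill-mask pipeline. The repository ID `cahya/bert-base-indonesian-522M` is inferred from the entry name and should be verified.

```python
from transformers import AutoTokenizer, pipeline

# Assumed Hugging Face repository ID; verify against the cahya organization page.
MODEL_ID = "cahya/bert-base-indonesian-522M"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# The entry states the model is uncased, so cased and lowercased input should
# tokenize identically.
print(tokenizer.tokenize("Jakarta"), tokenizer.tokenize("jakarta"))

# Pretrained with the MLM objective, so the fill-mask pipeline applies directly.
fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=tokenizer)
prompt = f"ibu kota indonesia adalah {tokenizer.mask_token}."
for prediction in fill_mask(prompt, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```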
## Bert Base Indonesian 1.5G
- License: MIT
- Description: A BERT-based Indonesian model pretrained on Wikipedia and newspaper text, suitable for a range of downstream natural language processing tasks (feature-extraction sketch below).
- Task: Large Language Model
- Tags: Other
- Organization: cahya
- Downloads: 40.08k
- Likes: 5
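
For reuse on downstream tasks, one common pattern is to mean-pool the encoder's hidden states into fixed-size sentence features. The sketch assumes the repository ID `cahya/bert-base-indonesian-1.5G`, which is not stated in the listing.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Hugging Face repository ID; verify against the cahya organization page.
MODEL_ID = "cahya/bert-base-indonesian-1.5G"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Mean-pool the final hidden states into one vector per sentence, a common way
# to reuse a pretrained encoder as a feature extractor for downstream tasks.
sentences = ["Saya suka membaca buku.", "Harga minyak dunia naik hari ini."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (batch, seq_len, hidden)
mask = inputs["attention_mask"].unsqueeze(-1)    # ignore padding positions
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                          # (2, hidden_size)
```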