BERT Base Finnish Uncased v1
FinBERT is a Finnish pre-trained language model based on Google's BERT architecture, trained on over 3 billion Finnish word tokens and suitable for various Finnish NLP tasks.
Downloads 1,964
Release Time: 3/2/2022
Model Overview
FinBERT is a BERT model specifically optimized for Finnish. When fine-tuned, it achieves state-of-the-art performance on tasks such as document classification, named entity recognition, and part-of-speech tagging.
Model Features
Specialized Finnish Vocabulary
Custom vocabulary of 50,000 WordPiece tokens, with far better coverage of Finnish than multilingual BERT
Large-Scale Finnish Training
Trained on over 3 billion word tokens (24 billion characters) of Finnish text, far more than the Finnish text available in Wikipedia
Multi-Domain Applicability
Training data includes news, online discussions, and web-crawled content, so the model adapts to a range of text types
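To illustrate why a language-specific vocabulary matters, the sketch below implements the greedy longest-match-first splitting used by WordPiece tokenizers and applies it to one inflected Finnish word. Both vocabularies are tiny, hypothetical examples invented for this illustration; they are not taken from FinBERT or multilingual BERT.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercase word into WordPiece tokens by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies, invented for this illustration only.
finnish_vocab = {"talo", "##ssa", "##mme", "##kin"}
generic_vocab = {"ta", "##lo", "##s", "##sa", "##m", "##me", "##k", "##in"}

word = "talossammekin"  # "in our house, too"
finnish_pieces = wordpiece(word, finnish_vocab)
generic_pieces = wordpiece(word, generic_vocab)

print(finnish_pieces)  # ['talo', '##ssa', '##mme', '##kin']
print(len(generic_pieces))  # 8
```

With the Finnish-aware toy vocabulary, the word splits into four pieces that line up with its stem and suffixes. The generic one needs eight short fragments for the same word, which leaves the model less meaningful units to work with.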
Model Capabilities
Finnish Text Understanding
Document Classification
Named Entity Recognition
Part-of-Speech Tagging
Transfer Learning
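As a rough sketch of how these capabilities are usually reached through transfer learning, the code below loads the pre-trained encoder with the Hugging Face `transformers` library and attaches a new classification head. The Hub identifier `TurkuNLP/bert-base-finnish-uncased-v1` is an assumption not stated on this page, and the helper names `load_finbert_classifier` and `predict_labels` are hypothetical.

```python
# Assumed Hugging Face Hub identifier; it is not given on this page.
MODEL_ID = "TurkuNLP/bert-base-finnish-uncased-v1"


def load_finbert_classifier(num_labels, model_id=MODEL_ID):
    """Return (tokenizer, model) with a fresh classification head on the encoder."""
    # Imported lazily so the module can be imported without the dependency.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model


def predict_labels(texts, tokenizer, model, max_length=128):
    """Return one predicted class index per input text."""
    import torch

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    model.eval()
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

The classification head created here is randomly initialized, so the predictions are meaningless until the model has been fine-tuned on labeled Finnish data for the target task.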
Use Cases
News Classification
Yle News Classification
Classifying news articles from the Finnish Broadcasting Company (Yle)
Outperforms multilingual BERT across different training set sizes
Social Media Analysis
Ylilauta Forum Classification
Classifying content from Finnish online forums
Significantly outperforms baseline models
Information Extraction
Named Entity Recognition
Identifying entities such as person names and locations in Finnish text
Achieves 92.40% accuracy on the FiNER corpus
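A fine-tuned NER model typically emits one tag per token, and those tags are then grouped into entity spans. The sketch below shows that grouping step under the assumption of a BIO tagging scheme. The tag names `PER` and `LOC`, the example sentence, and the helper `bio_to_spans` are illustrative and not taken from the FiNER corpus.

```python
def bio_to_spans(tokens, tags):
    """Group token-level BIO tags into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts (a stray I- tag is treated as a beginning).
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O": outside any entity, so close the open span if there is one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Illustrative sentence: "Matti Meikäläinen lives in Helsinki."
tokens = ["Matti", "Meikäläinen", "asuu", "Helsingissä", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
entities = bio_to_spans(tokens, tags)

print(entities)  # [('PER', 'Matti Meikäläinen'), ('LOC', 'Helsingissä')]
```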