
GottBERT Base Last

Developed by TUM
GottBERT is the first RoBERTa model specifically designed for German, pre-trained on the German portion of the OSCAR dataset, available in both base and large versions.
Downloads 6,842
Release Time: 3/2/2022

Model Overview

GottBERT is a monolingual German language model aimed at improving performance on German natural language processing tasks such as named entity recognition, text classification, and natural language inference.
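As a quick illustration of the pre-trained model in use, the sketch below runs masked-token prediction with the Hugging Face transformers library. The model id TUM/GottBERT_base_last is inferred from this page's title and publisher and should be treated as an assumption; substitute the actual repository id if it differs.

```python
# Minimal masked-token prediction sketch for GottBERT.
# NOTE: the model id below is assumed from this page, not confirmed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="TUM/GottBERT_base_last")

# RoBERTa-style models use "<mask>" as the mask token.
for prediction in fill_mask("Die Hauptstadt von Deutschland ist <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```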

Model Features

Monolingual German Optimization
Designed specifically for German and pre-trained on the German portion of the OSCAR dataset, enabling more accurate German language understanding.
Dual Version Options
Offers a base version (125M parameters) and a large version (355M parameters) to meet different computational needs.
Efficient Filtering
Improves model quality by filtering out noisy data using metrics such as stopword ratio, punctuation ratio, and uppercase-word ratio.
High-Performance Tokenizer
Uses a GPT-2-style Byte Pair Encoding (BPE) tokenizer with a vocabulary of 52k subword units (see the sketch after this list).
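The tokenizer itself can be inspected directly, as sketched below; again, the model id is an assumption taken from this page.

```python
# Sketch: inspecting the 52k-entry BPE vocabulary described above.
# NOTE: model id assumed from this page, not confirmed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TUM/GottBERT_base_last")

print(tokenizer.vocab_size)  # expected to be around 52,000 subword units

# Long German compounds are split into subword pieces by the BPE scheme.
print(tokenizer.tokenize("Donaudampfschifffahrtsgesellschaft"))
```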

Model Capabilities

German text understanding
Named entity recognition
Text classification
Natural language inference

Use Cases

Natural Language Processing
Named Entity Recognition
Identify named entities (e.g., person names, locations, organizations) in German text.
Achieves F1 scores of 86.14 (base) and 86.78 (large) on the German portion of the CoNLL 2003 dataset.
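A hedged sketch of how such an NER setup might be wired up is shown below: the pre-trained checkpoint is loaded with a token-classification head, which is randomly initialized and must be fine-tuned (e.g., on CoNLL 2003 German) before it yields the scores quoted above. The label set and model id are illustrative assumptions.

```python
# Sketch: attaching a token-classification head to GottBERT for German NER.
# The head is untrained; fine-tuning on labeled NER data is required.
from transformers import AutoModelForTokenClassification, AutoTokenizer

# CoNLL-style BIO labels (assumed here for illustration).
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC",
          "B-ORG", "I-ORG", "B-MISC", "I-MISC"]

model = AutoModelForTokenClassification.from_pretrained(
    "TUM/GottBERT_base_last",          # model id assumed from this page
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
)
tokenizer = AutoTokenizer.from_pretrained("TUM/GottBERT_base_last")
```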
Text Classification
Classify German texts (e.g., news classification, sentiment analysis).
Achieves F1 scores of 78.65 (base) and 79.40 (large) on GermEval 2018 (coarse-grained).
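As with NER, classification requires a fine-tuned head on top of the pre-trained encoder. The sketch below sets up a binary head using GermEval 2018's coarse-grained labels (OTHER vs. OFFENSE); the model id and label mapping are assumptions for illustration.

```python
# Sketch: binary sequence-classification head for GermEval 2018 (coarse).
# The head is untrained until fine-tuned on the task data.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "TUM/GottBERT_base_last",          # model id assumed from this page
    num_labels=2,
    id2label={0: "OTHER", 1: "OFFENSE"},
)
```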
Natural Language Inference
Determine the logical relationship between German text pairs (e.g., entailment, contradiction, neutral).
Achieves accuracy of 80.82 (base) and 82.46 (large) on the XNLI German subset.
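NLI is a sentence-pair task: premise and hypothesis are encoded together and scored against three labels. The sketch below shows the input format under the same assumptions as above; the three-way head is untrained until fine-tuned on NLI data such as the XNLI German subset.

```python
# Sketch: three-way NLI head (entailment / neutral / contradiction).
# Logits are meaningful only after fine-tuning on NLI data.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "TUM/GottBERT_base_last", num_labels=3   # model id assumed
)
tokenizer = AutoTokenizer.from_pretrained("TUM/GottBERT_base_last")

inputs = tokenizer(
    "Der Hund schläft im Garten.",      # premise
    "Ein Tier ruht sich draußen aus.",  # hypothesis
    return_tensors="pt",
)
logits = model(**inputs).logits  # shape (1, 3)
```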