
UmBERTo Wikipedia Uncased V1

Developed by Musixmatch
UmBERTo is an Italian language model based on the RoBERTa architecture, pre-trained with SentencePiece tokenization and whole word masking, and suitable for a range of natural language processing tasks.
Downloads 1,079
Release Time: 3/2/2022

Model Overview

This model is an Italian pre-trained language model based on the RoBERTa architecture. It was trained specifically on the Italian Wikipedia corpus and is suitable for downstream tasks such as named entity recognition and part-of-speech tagging.

Model Features

Whole Word Masking
Pre-trained with Whole Word Masking (WWM), which masks all sub-word tokens of a word together, improving the model's understanding of complete words.
SentencePiece Tokenization
Uses a SentencePiece tokenizer with a 32K vocabulary, well suited to sub-word segmentation of Italian text.
Wikipedia Corpus Training
Trained exclusively on the Italian Wikipedia corpus, giving strong coverage of encyclopedic Italian text.
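To make the whole word masking idea concrete, the following is a minimal, illustrative sketch (not UmBERTo's actual training code): SentencePiece marks the start of each word with "▁", so sub-token indices can be grouped back into words, and masking is then applied to every sub-token of a selected word at once.

```python
import random

def whole_word_masking(tokens, mask_rate=0.15, seed=0):
    """Toy sketch of whole word masking over SentencePiece-style tokens.

    SentencePiece prefixes word-initial tokens with '▁'. Instead of masking
    individual sub-tokens, WWM picks whole words and masks all of their
    sub-tokens together. Illustrative only; hyperparameters are assumptions.
    """
    # Group sub-token indices into words: a new word starts at a '▁' token.
    words, current = [], []
    for i, tok in enumerate(tokens):
        if tok.startswith("▁") and current:
            words.append(current)
            current = []
        current.append(i)
    if current:
        words.append(current)

    # Choose ~mask_rate of the words (at least one) and mask every
    # sub-token belonging to each chosen word.
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(words) * mask_rate))
    masked = set()
    for word in rng.sample(words, n_to_mask):
        masked.update(word)
    return ["<mask>" if i in masked else tok for i, tok in enumerate(tokens)]
```

For example, with tokens like `["▁umberto", "▁eco", "▁scritt", "ore"]`, the two sub-tokens of "scrittore" are either both masked or both kept, never split.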

Model Capabilities

Italian text understanding
Named entity recognition
Part-of-speech tagging
Masked word prediction

Use Cases

Natural Language Processing
Named Entity Recognition
Identify entities such as person names and locations in Italian text
Achieved F1 scores of 86.240 on ICAB-EvalITA07 and 90.483 on WikiNER-ITA datasets
Part-of-Speech Tagging
Tag parts of speech for words in Italian text
Achieved 98.717% accuracy on the UD_Italian-ISDT dataset
Text Completion
Predict masked words in sentences
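A masked-word prediction call can be sketched with the Hugging Face fill-mask pipeline; the model id `Musixmatch/umberto-wikipedia-uncased-v1` is assumed to be the hub name for this model, and RoBERTa-style models use `<mask>` as the mask token. The example sentence is illustrative.

```python
MASK_TOKEN = "<mask>"  # RoBERTa-style mask token

def build_masked_sentence(template: str) -> str:
    """Replace a '[MASK]' placeholder with the model's actual mask token."""
    return template.replace("[MASK]", MASK_TOKEN)

if __name__ == "__main__":
    # Requires `pip install transformers`; the model weights are
    # downloaded from the Hugging Face hub on first use.
    from transformers import pipeline

    fill_mask = pipeline(
        "fill-mask",
        model="Musixmatch/umberto-wikipedia-uncased-v1",
    )
    sentence = build_masked_sentence("Umberto Eco è [MASK] un grande scrittore")
    for prediction in fill_mask(sentence)[:3]:
        print(prediction["token_str"], round(prediction["score"], 3))
```

The pipeline returns candidate fillers for the masked position ranked by probability, which is the "Text Completion" use case described above.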