Open-source model bert-italian-cased-finetuned-pos - Accurate part-of-speech tagging for Italian

Bert Italian Cased Finetuned Pos

Developed by sachaarbonel

This model is Bert Base Italian fine-tuned on the Italian portion of the XTREME UD POS dataset for the downstream part-of-speech (POS) tagging task.

Tags: Sequence labeling, Italian POS tagging, High-accuracy POS, Token classification

Downloads: 88

Release time: 3/2/2022

Model Overview

A POS tagging model fine-tuned from the Italian BERT base model that identifies the part of speech of each word in Italian text.

Model Features

High-accuracy POS Tagging

Achieves an F1 score of 97.25 on the evaluation set of the XTREME UD POS Italian data.

Comprehensive POS Tag Coverage

Supports all 17 Universal Dependencies POS tags, including nouns, verbs, and adjectives.

Bert-based Fine-tuned Model

Built on Bert Base Italian and fine-tuned specifically for POS tagging.

Model Capabilities

Italian text analysis

POS tagging

Natural Language Processing

Use Cases

Text Analysis

Italian Text Processing

Tags Italian text with parts of speech as input to downstream NLP tasks.

Accurately identifies the parts of speech of words.
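As a sketch of feeding the tags into a downstream step, the hypothetical helper `content_words` below keeps only nouns, proper nouns, verbs, and adjectives from the pipeline output. The choice of tags and the `min_score` threshold are illustrative assumptions. The sample input is abridged (rounded scores, no `index` key) from the example output for "Roma è la Capitale d'Italia." in the usage section.

```python
from typing import Dict, List

# Hypothetical choice of "content word" tags (UPOS labels from the model's label set).
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ"}


def content_words(tokens: List[Dict], min_score: float = 0.9) -> List[str]:
    """Return words whose predicted tag is a content tag and whose score clears min_score.

    `tokens` is a list of dicts shaped like the "ner" pipeline output
    (keys 'entity', 'score', 'word').
    """
    return [
        t["word"]
        for t in tokens
        if t["entity"] in CONTENT_TAGS and t["score"] >= min_score
    ]


# Sample input abridged from the example output in this card (rounded scores).
sample = [
    {"entity": "PROPN", "score": 0.9995, "word": "roma"},
    {"entity": "AUX", "score": 0.9967, "word": "e"},
    {"entity": "DET", "score": 0.9995, "word": "la"},
    {"entity": "NOUN", "score": 0.9995, "word": "capitale"},
    {"entity": "ADP", "score": 0.9991, "word": "d"},
    {"entity": "PART", "score": 0.5716, "word": "'"},
    {"entity": "PROPN", "score": 0.9995, "word": "italia"},
    {"entity": "PUNCT", "score": 0.9773, "word": "."},
]
print(content_words(sample))  # → ['roma', 'capitale', 'italia']
```

Filtering on the score as well as the tag lets low-confidence predictions, such as the apostrophe tagged PART at about 0.57, drop out even if their tag would otherwise qualify.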

Education

Language Learning Assistance

Helps Italian learners understand sentence structure.

🚀 Italian-Bert (Italian Bert) + POS 🎃🏷

This model is a fine-tuned version of Bert Base Italian on the XTREME udpos Italian dataset for the POS downstream task. It provides high-quality part-of-speech tagging for Italian.

🚀 Quick Start

The model is ready to use for POS tagging in Italian. You can follow the usage examples below to start.

✨ Features

Fine-tuned on the XTREME udpos Italian dataset for the POS downstream task.
Covers the full Universal Dependencies POS label set (17 tags, including ADJ, ADP, and ADV).
Achieves an F1 of 97.25, a precision of 97.15, and a recall of 97.36 on the evaluation set.

📦 Installation

The usage example requires the Hugging Face transformers library and a deep-learning backend such as PyTorch, for example: pip install transformers torch

💻 Usage Examples

Basic Usage

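from transformers import pipeline

# Token-classification ("ner") pipeline that tags each token with a UPOS label.
nlp_pos = pipeline(
    "ner",
    model="sachaarbonel/bert-italian-cased-finetuned-pos",
    # Load the Italian tokenizer that ships with this model, using the slow
    # (pure-Python) implementation as in the original example.
    tokenizer=(
        "sachaarbonel/bert-italian-cased-finetuned-pos",
        {"use_fast": False},
    ),
)

text = "Roma è la Capitale d'Italia."

print(nlp_pos(text))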
'''
Output:
--------
[{'entity': 'PROPN', 'index': 1, 'score': 0.9995346665382385, 'word': 'roma'},
 {'entity': 'AUX', 'index': 2, 'score': 0.9966597557067871, 'word': 'e'},
 {'entity': 'DET', 'index': 3, 'score': 0.9994786977767944, 'word': 'la'},
 {'entity': 'NOUN',
  'index': 4,
  'score': 0.9995198249816895,
  'word': 'capitale'},
 {'entity': 'ADP', 'index': 5, 'score': 0.9990894198417664, 'word': 'd'},
 {'entity': 'PART', 'index': 6, 'score': 0.57159024477005, 'word': "'"},
 {'entity': 'PROPN',
  'index': 7,
  'score': 0.9994804263114929,
  'word': 'italia'},
 {'entity': 'PUNCT', 'index': 8, 'score': 0.9772886633872986, 'word': '.'}]
'''
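Without an aggregation option, the ner pipeline returns one entry per WordPiece sub-token. Every word in the sentence above maps to a single piece, but longer or rarer words can be split into pieces whose continuations start with "##". The hypothetical helper `merge_subwords` below is a minimal sketch that rejoins such pieces. It assumes each word should keep the tag and score of its first sub-token, which is a common convention but is not specified by this model card.

```python
from typing import Dict, List


def merge_subwords(tokens: List[Dict]) -> List[Dict]:
    """Merge WordPiece continuation pieces ("##...") into the preceding word.

    Assumption: each word keeps the tag and score of its first sub-token.
    """
    words: List[Dict] = []
    for tok in tokens:
        piece = tok["word"]
        if piece.startswith("##") and words:
            # Continuation piece: glue it onto the current word, keep the first tag.
            words[-1]["word"] += piece[2:]
        else:
            # New word: build a fresh dict so the caller's input is not mutated.
            words.append({"word": piece, "entity": tok["entity"], "score": tok["score"]})
    return words


# Hypothetical pipeline output in which one word was split into two pieces.
pieces = [
    {"entity": "DET", "score": 0.99, "word": "la"},
    {"entity": "NOUN", "score": 0.98, "word": "capit"},
    {"entity": "NOUN", "score": 0.71, "word": "##ale"},
]
print(merge_subwords(pieces))
# → [{'word': 'la', 'entity': 'DET', 'score': 0.99}, {'word': 'capitale', 'entity': 'NOUN', 'score': 0.98}]
```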

📚 Documentation

Details of the downstream task (POS) - Dataset

Dataset: xtreme udpos Italian 📚

Split	# Examples
Train	716K
Dev	85K

[Fine-tune on NER script provided by @stefan-it](https://raw.githubusercontent.com/stefan-it/fine-tuned-berts-seq/master/scripts/preprocess.py)
Labels covered:

ADJ
ADP
ADV
AUX
CCONJ
DET
INTJ
NOUN
NUM
PART
PRON
PROPN
PUNCT
SCONJ
SYM
VERB
X
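These labels are the Universal Dependencies UPOS tags. For the language-learning use case, a small lookup table can turn the raw tags into readable glosses. The sketch below assumes pipeline-style dicts with `word` and `entity` keys. The name `gloss` and its output format are illustrative, not part of the model or the transformers library.

```python
from typing import Dict, List

# Short English glosses for the 17 Universal Dependencies UPOS tags listed above.
UPOS_DESCRIPTIONS: Dict[str, str] = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def gloss(tokens: List[Dict]) -> List[str]:
    """Render pipeline-style dicts as 'word/TAG (description)' strings (hypothetical helper)."""
    return [
        f"{t['word']}/{t['entity']} ({UPOS_DESCRIPTIONS.get(t['entity'], 'unknown tag')})"
        for t in tokens
    ]


print(gloss([{"word": "roma", "entity": "PROPN"}, {"word": "e", "entity": "AUX"}]))
# → ['roma/PROPN (proper noun)', 'e/AUX (auxiliary)']
```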

Metrics on evaluation set 🧾

Metric	Score
F1	97.25
Precision	97.15
Recall	97.36
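F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). This means the three reported figures can be cross-checked against each other. The check below only confirms internal consistency to two decimals, because the published precision and recall are themselves rounded.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


score = f1(97.15, 97.36)
print(round(score, 2))  # → 97.25
```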

🔧 Technical Details

The model is a fine-tuned version of [Bert Base Italian](https://huggingface.co/dbmdz/bert-base-italian-cased) on the XTREME udpos Italian dataset for the POS downstream task. Inference runs through the ner (token-classification) pipeline of the transformers library.
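For each token, the token-classification pipeline applies a softmax to the model's classification logits and reports the highest-probability label as `entity` and its probability as `score`. The sketch below reproduces that decoding step in plain Python on toy numbers. It is an illustration, not the pipeline's actual code. When the model is loaded directly (for example with `AutoModelForTokenClassification`), the logits come from its output and the label names come from `model.config.id2label`. Unlike the pipeline, this sketch does not drop special tokens such as [CLS] and [SEP]. The helper names `softmax` and `decode` and the three-label mapping are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode(
    logits: Sequence[Sequence[float]], id2label: Dict[int, str]
) -> List[Tuple[str, float]]:
    """Map each token's logits to (label, probability) via softmax + argmax."""
    out = []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        out.append((id2label[best], probs[best]))
    return out


# Toy 3-label example; real logits and labels come from the fine-tuned model.
toy_id2label = {0: "DET", 1: "NOUN", 2: "VERB"}
toy_logits = [[4.0, 1.0, 0.5], [0.2, 3.5, 1.0]]
print([label for label, _ in decode(toy_logits, toy_id2label)])  # → ['DET', 'NOUN']
```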

Created by Sacha Arbonel/@sachaarbonel | [LinkedIn](https://www.linkedin.com/in/sacha-arbonel)

Made with ♥ in Paris
