Open-source French Part-of-Speech Tagging Model - french-camembert-postag-model: Precise Identification of French Word Part-of-Speech


French Camembert Postag Model

Developed by gilf

French POS tagging model based on Camembert-base, trained using the free-french-treebank dataset

Sequence Labeling

Transformers

French #French POS tagging #CamemBERT base #High-precision tagging

Downloads 950.03k

Release Time : 3/2/2022

Model Overview

This model tags each word in a French text with its part of speech, using a fine-grained set of French POS tags.

Model Features

Extensive POS tag support

Supports the 29 French POS tags listed under Supported Tags, including nouns, verbs, adjectives, and adverbs.

Based on high-quality dataset

Trained using the free-french-treebank dataset, a publicly available French treebank resource.

Based on Camembert model

Built upon Camembert-base, inheriting its excellent French language understanding capabilities.

Model Capabilities

French POS tagging

Text analysis

Natural language processing

Use Cases

Natural language processing

French text analysis

Performs POS tagging on French text for subsequent syntactic analysis or semantic understanding.

Identifies the POS category of each word in the text.

Linguistic research

Used for French linguistic research to analyze POS distribution patterns in texts.

Educational applications

French learning aid

Helps French learners understand the grammatical functions of words in sentences.
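
As a rough illustration of the linguistic-research use case above, the sketch below computes the relative frequency of each POS tag in a tagged text. It assumes the input is a list of dictionaries with an `entity_group` key, the same shape as the pipeline output shown in the usage example; the helper name `tag_distribution` and the sample data are hypothetical.

```python
from collections import Counter


def tag_distribution(tagged):
    """Return the relative frequency of each POS tag, most frequent first.

    `tagged` is assumed to be a list of dicts with an 'entity_group' key,
    shaped like the token-classification pipeline output.
    """
    counts = Counter(item["entity_group"] for item in tagged)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()}


# Hypothetical sample, hand-written in the shape of the pipeline output
sample = [
    {"entity_group": "DET", "word": "les"},
    {"entity_group": "NC", "word": "mesures"},
    {"entity_group": "VPP", "word": "mises"},
    {"entity_group": "P", "word": "en"},
    {"entity_group": "NC", "word": "place"},
]

distribution = tag_distribution(sample)
for tag, share in distribution.items():
    print(f"{tag}: {share:.0%}")
```

On a real corpus, the same function could be applied to the concatenated output of many sentences.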

🚀 French Camembert Part-of-Speech Tagging Model

The french-camembert-postag-model is a French part-of-speech tagging model trained on the free-french-treebank dataset. It assigns a grammatical category to each word of a French text, which downstream natural language processing applications can build on.

📚 Documentation

About

The french-camembert-postag-model is a part-of-speech tagging model for French. It was trained on the free-french-treebank dataset available on [github](https://github.com/nicolashernandez/free-french-treebank). The base tokenizer and model used for training is 'camembert-base'.

Supported Tags

It uses the following tags:

Tag	Description
ADJ	Adjective
ADJWH	Adjective (interrogative)
ADV	Adverb
ADVWH	Adverb (interrogative)
CC	Coordinating conjunction
CLO	Pronoun (object)
CLR	Pronoun (reflexive)
CLS	Pronoun (subject)
CS	Subordinating conjunction
DET	Determiner
DETWH	Determiner (interrogative)
ET	Foreign word
I	Interjection
NC	Common noun
NPP	Proper noun
P	Preposition
P+D	Preposition + Determiner
PONCT	Punctuation mark
PREF	Prefix
PRO	Other pronouns
PROREL	Other pronouns (relative)
PROWH	Other pronouns (interrogative)
U	Unknown
V	Verb
VIMP	Imperative verb
VINF	Infinitive verb
VPP	Past participle
VPR	Present participle
VS	Subjunctive verb

More information on the tags can be found here: http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf
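
For applications that only need broad word classes, the fine-grained tags above can be collapsed into coarser groups. The sketch below is one possible grouping, not part of the model: the group names (`VERB`, `PRON`, `NOUN`) and the helper `coarse_tag` are illustrative assumptions, while the tag names come from the table above.

```python
# Tag names are taken from the table above; the grouping is an illustrative assumption.
VERB_TAGS = {"V", "VIMP", "VINF", "VPP", "VPR", "VS"}
PRONOUN_TAGS = {"CLO", "CLR", "CLS", "PRO", "PROREL", "PROWH"}
NOUN_TAGS = {"NC", "NPP"}


def coarse_tag(tag):
    """Map a fine-grained tag to a coarser class (hypothetical helper)."""
    if tag in VERB_TAGS:
        return "VERB"
    if tag in PRONOUN_TAGS:
        return "PRON"
    if tag in NOUN_TAGS:
        return "NOUN"
    if tag.endswith("WH"):
        # ADJWH -> ADJ, ADVWH -> ADV, DETWH -> DET
        return tag[:-2]
    # Remaining tags (P, P+D, CC, PONCT, ...) are kept unchanged
    return tag


print([coarse_tag(t) for t in ["VPP", "PROWH", "ADJWH", "NC", "P+D"]])
```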

💻 Usage Examples

Basic Usage

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
model_name = "gilf/french-camembert-postag-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# The 'ner' (token-classification) pipeline returns one POS tag per grouped word.
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True.
nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")

nlp_token_class('Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages')

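In a Jupyter notebook, the lines above display output similar to the following. Depending on the transformers version, special tokens such as <s> and </s> may appear in the results, as in the first and last entries here: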

[{'entity_group': 'NC', 'score': 0.5760144591331482, 'word': '<s>'},
 {'entity_group': 'U', 'score': 0.9946700930595398, 'word': 'Face'},
 {'entity_group': 'P', 'score': 0.999615490436554, 'word': 'à'},
 {'entity_group': 'DET', 'score': 0.9995906352996826, 'word': 'un'},
 {'entity_group': 'NC', 'score': 0.9995531439781189, 'word': 'choc'},
 {'entity_group': 'ADJ', 'score': 0.999183714389801, 'word': 'inédit'},
 {'entity_group': 'P', 'score': 0.3710663616657257, 'word': ','},
 {'entity_group': 'DET', 'score': 0.9995903968811035, 'word': 'les'},
 {'entity_group': 'NC', 'score': 0.9995649456977844, 'word': 'mesures'},
 {'entity_group': 'VPP', 'score': 0.9988670349121094, 'word': 'mises'},
 {'entity_group': 'P', 'score': 0.9996246099472046, 'word': 'en'},
 {'entity_group': 'NC', 'score': 0.9995329976081848, 'word': 'place'},
 {'entity_group': 'P', 'score': 0.9996233582496643, 'word': 'par'},
 {'entity_group': 'DET', 'score': 0.9995935559272766, 'word': 'le'},
 {'entity_group': 'NC', 'score': 0.9995369911193848, 'word': 'gouvernement'},
 {'entity_group': 'V', 'score': 0.9993771314620972, 'word': 'ont'},
 {'entity_group': 'VPP', 'score': 0.9991101026535034, 'word': 'permis'},
 {'entity_group': 'DET', 'score': 0.9995885491371155, 'word': 'une'},
 {'entity_group': 'NC', 'score': 0.9995636343955994, 'word': 'protection'},
 {'entity_group': 'ADJ', 'score': 0.9991781711578369, 'word': 'forte'},
 {'entity_group': 'CC', 'score': 0.9991298317909241, 'word': 'et'},
 {'entity_group': 'ADJ', 'score': 0.9992275238037109, 'word': 'efficace'},
 {'entity_group': 'P+D', 'score': 0.9993300437927246, 'word': 'des'},
 {'entity_group': 'NC', 'score': 0.8353511393070221, 'word': 'ménages</s>'}]
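
The output above contains the special tokens `<s>` and `</s>`, either as a separate entry or attached to a word. A minimal sketch of one way to reduce such output to plain (word, tag) pairs follows; the helper name `clean_tagged_output` is hypothetical, and the sample reuses three entries from the output above.

```python
# Sentence-boundary markers as they appear in the output above
SPECIAL_TOKENS = ("<s>", "</s>")


def clean_tagged_output(tagged):
    """Strip special tokens and return a list of (word, tag) pairs."""
    pairs = []
    for item in tagged:
        word = item["word"]
        for token in SPECIAL_TOKENS:
            word = word.replace(token, "")
        word = word.strip()
        if word:  # skip entries that consisted only of a special token
            pairs.append((word, item["entity_group"]))
    return pairs


# Three entries copied from the output above
sample = [
    {"entity_group": "NC", "score": 0.5760144591331482, "word": "<s>"},
    {"entity_group": "U", "score": 0.9946700930595398, "word": "Face"},
    {"entity_group": "NC", "score": 0.8353511393070221, "word": "ménages</s>"},
]

pairs = clean_tagged_output(sample)
print(pairs)
```

Entries with a low score, such as the comma tagged P at about 0.37, could be filtered in the same loop by adding a score threshold chosen for the application.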
