Camembert2Camembert Open-Source French Text Summarization Model - Free Deployment for Precise Summarization of French News

Camembert2camembert Shared Finetuned French Summarization

Developed by mrm8488

This model is a French text summarization model based on the CamemBERT architecture, specifically fine-tuned for French news summarization tasks.

Text Generation

Transformers

French #French News Summarization #Multilingual Summarization #RoBERTa Architecture

Downloads 540

Release Time: 3/2/2022

Model Overview

A RoBERTa-based (CamemBERT) encoder-decoder model whose encoder and decoder share weights, built for automatic summarization of French news texts and fine-tuned on the French subset of the large-scale multilingual MLSUM summarization dataset.

Model Features

Dedicated French Summarization Model

Optimized specifically for French news summarization tasks, fine-tuned on the MLSUM French dataset

Shared Encoder-Decoder Architecture

Shares weights between the RoBERTa encoder and decoder, reducing the parameter count compared with keeping separate encoder and decoder weights

Multilingual Dataset Training

Fine-tuned on the French portion of MLSUM, a multilingual dataset of 1.5M+ news article/summary pairs

Model Capabilities

French Text Comprehension

News Summarization Generation

Long Text Compression

Use Cases

News Media

Automatic News Summarization

Generates concise, journalistic-style summaries of French news articles

Content Analysis

Key Information Extraction from Long Documents

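Extracts key information from long French documents to generate summaries; inputs longer than 512 tokens are truncated, so only the beginning of a very long document is summarized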

🚀 French RoBERTa2RoBERTa (shared) fine-tuned on MLSUM FR for summarization

This model is a French RoBERTa2RoBERTa encoder-decoder with shared weights, fine-tuned on the MLSUM FR dataset to summarize French news articles.

✨ Features

Fine-tuned on MLSUM FR: trained on the French portion of MLSUM, a large collection of French news article/summary pairs, so the model is adapted to French news summarization.
Based on RoBERTa: built on the [camembert-base](https://huggingface.co/camembert-base) checkpoint, a RoBERTa-style model pretrained on French text.

📦 Installation

The usage example is written in Python and requires the transformers and torch libraries, which you can install with the following command:

pip install transformers torch

💻 Usage Examples

Basic Usage

import torch
from transformers import RobertaTokenizerFast, EncoderDecoderModel

# Run on the GPU when one is available, otherwise on the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
ckpt = 'mrm8488/camembert2camembert_shared-finetuned-french-summarization'
tokenizer = RobertaTokenizerFast.from_pretrained(ckpt)
model = EncoderDecoderModel.from_pretrained(ckpt).to(device)

def generate_summary(text):
    # Tokenize, truncating to 512 tokens (the longest input CamemBERT accepts)
    # and padding to that same length.
    inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
    input_ids = inputs.input_ids.to(device)
    attention_mask = inputs.attention_mask.to(device)
    # Generate with the default generation settings saved with the checkpoint.
    output = model.generate(input_ids, attention_mask=attention_mask)
    return tokenizer.decode(output[0], skip_special_tokens=True)

text = "Un nuage de fumée juste après l’explosion, le 1er juin 2019. Une déflagration dans une importante usine d’explosifs du centre de la Russie a fait au moins 79 blessés samedi 1er juin. L’explosion a eu lieu dans l’usine Kristall à Dzerzhinsk, une ville située à environ 400 kilomètres à l’est de Moscou, dans la région de Nijni-Novgorod. « Il y a eu une explosion technique dans l’un des ateliers, suivie d’un incendie qui s’est propagé sur une centaine de mètres carrés », a expliqué un porte-parole des services d’urgence. Des images circulant sur les réseaux sociaux montraient un énorme nuage de fumée après l’explosion. Cinq bâtiments de l’usine et près de 180 bâtiments résidentiels ont été endommagés par l’explosion, selon les autorités municipales. Une enquête pour de potentielles violations des normes de sécurité a été ouverte. Fragments de shrapnel Les blessés ont été soignés après avoir été atteints par des fragments issus de l’explosion, a précisé une porte-parole des autorités sanitaires citée par Interfax. « Nous parlons de blessures par shrapnel d’une gravité moyenne et modérée », a-t-elle précisé. Selon des représentants de Kristall, cinq personnes travaillaient dans la zone où s’est produite l’explosion. Elles ont pu être évacuées en sécurité. Les pompiers locaux ont rapporté n’avoir aucune information sur des personnes qui se trouveraient encore dans l’usine."

print(generate_summary(text))

# Output: L’explosion a eu lieu dans l’usine Kristall à Dzerzhinsk, une ville située à environ 400 kilomètres à l’est de Moscou.

📚 Documentation

Model

The model is based on the [camembert-base](https://huggingface.co/camembert-base) checkpoint, a RoBERTa-style language model pretrained on French text (the French portion of the OSCAR corpus). Both the encoder and the decoder are initialized from this checkpoint and share their weights.
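The effect of weight sharing can be illustrated with a small sketch. It assumes the transformers 4.x API and uses a tiny, randomly initialized `CamembertConfig` with hypothetical toy sizes instead of the real camembert-base weights, so it runs without any download; it compares the number of unique parameters with and without `tie_encoder_decoder=True`. For the real checkpoint, a shared model would presumably be warm-started with `EncoderDecoderModel.from_encoder_decoder_pretrained("camembert-base", "camembert-base", tie_encoder_decoder=True)`, the pattern Hugging Face documents for shared BERT2BERT models; the author's actual training script is not given here.

```python
# Sketch (assumption: transformers 4.x): compare a shared and an unshared
# CamemBERT2CamemBERT built from tiny random configs, so nothing is downloaded.
from transformers import CamembertConfig, EncoderDecoderConfig, EncoderDecoderModel


def tiny_camembert_config() -> CamembertConfig:
    # Hypothetical toy dimensions, far smaller than camembert-base.
    return CamembertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        max_position_embeddings=64,
    )


def build_model(shared: bool) -> EncoderDecoderModel:
    # from_encoder_decoder_configs turns the second config into a decoder with
    # cross-attention; tie_encoder_decoder=True makes the model reuse encoder
    # weights in the decoder wherever the module structure matches.
    config = EncoderDecoderConfig.from_encoder_decoder_configs(
        tiny_camembert_config(), tiny_camembert_config(), tie_encoder_decoder=shared
    )
    return EncoderDecoderModel(config=config)


def count_unique_parameters(model) -> int:
    # model.parameters() yields each shared tensor only once.
    return sum(p.numel() for p in model.parameters())


shared_model = build_model(shared=True)
unshared_model = build_model(shared=False)
n_shared = count_unique_parameters(shared_model)
n_unshared = count_unique_parameters(unshared_model)
print(f"shared: {n_shared:,} parameters, unshared: {n_unshared:,} parameters")
```

Only modules present in both halves are tied: the decoder's cross-attention layers exist only in the decoder and stay separate, so sharing removes fewer than half of the parameters.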

Dataset

MLSUM is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community. This model was fine-tuned on the French subset, MLSUM fr.

Results

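ROUGE-2 scores on the test set, scaled by 100; "mid" is the median of bootstrap-resampled estimates.

| Metric | Score |
|---|---|
| ROUGE-2 mid precision | 14.47 |
| ROUGE-2 mid recall | 12.90 |
| ROUGE-2 mid F-measure | 13.30 |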
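To make the metric concrete, here is a simplified, pure-Python sketch of ROUGE-2 precision, recall and F-measure (clipped bigram overlap between a generated and a reference summary), plus a bootstrap "mid" estimate. Lowercasing and whitespace tokenization are assumptions, and the helper names `rouge2` and `bootstrap_mid` are hypothetical; the reported scores were presumably computed with a standard ROUGE implementation whose tokenizer and resampling differ, so this sketch will not reproduce the numbers above.

```python
import random
from collections import Counter
from statistics import median


def bigrams(text: str) -> Counter:
    # Assumed tokenization: lowercase, then split on whitespace.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(prediction: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of clipped bigram overlap."""
    pred, ref = bigrams(prediction), bigrams(reference)
    # Counter intersection keeps min(count) per bigram, i.e. clipped matches.
    overlap = sum((pred & ref).values())
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f_measure = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def bootstrap_mid(scores: list[float], n_resamples: int = 1000, seed: int = 0) -> float:
    """Median of the per-resample mean scores (the 'mid' bootstrap estimate)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample the per-example scores with replacement.
        sample = [rng.choice(scores) for _ in scores]
        means.append(sum(sample) / len(sample))
    return median(means)


pairs = [  # hypothetical (prediction, reference) pairs
    ("le chat dort sur le tapis", "le chat dort sous le tapis"),
    ("une explosion dans une usine russe", "une explosion a eu lieu dans une usine"),
]
f_scores = [rouge2(pred, ref)[2] for pred, ref in pairs]
print(f"ROUGE-2 mid F-measure (x100): {100 * bootstrap_mid(f_scores):.2f}")
```

Because precision, recall and F-measure are each averaged over examples before the median is taken, the reported F-measure (13.30) does not have to equal the harmonic mean of the reported precision and recall (about 13.6).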

📄 License

There is no license information provided in the original README.

⚠️ Important Note

The code example requires the transformers and torch libraries; install them before running it.

💡 Usage Tip

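You can lower the max_length argument passed to the tokenizer to truncate input texts more aggressively; do not raise it above 512, the longest input CamemBERT's position embeddings support. The length of the generated summary is controlled separately, through arguments to model.generate such as max_new_tokens.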
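The separate roles of input truncation and output length can be sketched as follows. So that it runs without downloading the checkpoint, the sketch assumes a tiny, randomly initialized CamemBERT2CamemBERT model and placeholder token ids (0, 1 and 2 for start, padding and end) for its toy vocabulary, so the generated tokens are meaningless; only the generation arguments matter. With the real model, the same `max_new_tokens` and `num_beams` arguments would be passed to `model.generate` inside `generate_summary`.

```python
# Sketch (assumption: transformers 4.x, tiny random model as a stand-in for
# the real checkpoint): capping summary length with generate() arguments.
import torch
from transformers import CamembertConfig, EncoderDecoderConfig, EncoderDecoderModel

torch.manual_seed(0)


def tiny_camembert_config() -> CamembertConfig:
    # Hypothetical toy sizes; the real model uses camembert-base dimensions.
    return CamembertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        max_position_embeddings=64,
    )


config = EncoderDecoderConfig.from_encoder_decoder_configs(
    tiny_camembert_config(), tiny_camembert_config()
)
model = EncoderDecoderModel(config=config).eval()

# Stand-in for tokenizer(..., truncation=True, max_length=20): 20 random ids.
input_ids = torch.randint(5, 100, (1, 20))
attention_mask = torch.ones_like(input_ids)

MAX_NEW_TOKENS = 8
output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=MAX_NEW_TOKENS,  # caps the summary length in tokens
    num_beams=2,                    # beam search instead of greedy decoding
    decoder_start_token_id=0,       # placeholder ids for the toy vocabulary
    pad_token_id=1,
    eos_token_id=2,
)
# One sequence: the start token plus at most MAX_NEW_TOKENS generated tokens.
print(output.shape)
```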

Created by Manuel Romero/@mrm8488 with the support of Narrativa. Made with ♥ in Spain
