# mbart-large-cc25-cnn-dailymail-nl
This is a fine-tuned version of mBART, designed for summarizing Dutch news articles.
## Quick Start

### How to Use
```python
import transformers

undisputed_best_model = transformers.MBartForConditionalGeneration.from_pretrained(
    "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune"
)
tokenizer = transformers.MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
summarization_pipeline = transformers.pipeline(
    task="summarization",
    model=undisputed_best_model,
    tokenizer=tokenizer,
)
# mBART uses the target-language code as the decoder start token; set it to Dutch ("nl_XX").
summarization_pipeline.model.config.decoder_start_token_id = tokenizer.lang_code_to_id[
    "nl_XX"
]
article = "Kan je dit even samenvatten alsjeblief."  # Dutch for: "Can you summarize this, please."
summarization_pipeline(
    article,
    do_sample=True,
    top_p=0.75,
    top_k=50,
    min_length=50,
    early_stopping=True,
    truncation=True,
)[0]["summary_text"]
```
## Features
This model is a fine-tuned version of mBART, intended specifically for summarizing Dutch news articles. We also wrote a blog post about this model here.
## Installation

The only dependency is the transformers library. Install it with `pip install transformers` if it is not already available.
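A quick way to confirm the dependency is in place is to import the library and print its version; this check is just an illustrative sanity test, not part of the original card.

```python
# Sanity check: confirm transformers is importable (illustrative, not from the original card).
import transformers

print(transformers.__version__)
```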
## Documentation

### Intended Uses & Limitations
It's meant for summarizing Dutch news articles.
### Training Data
We fine-tuned mBART on the ml6team/cnn_dailymail_nl dataset, together with another, smaller dataset that we can't open-source because we scraped it from the internet. For more information, check out our blog post here.
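To inspect the public part of the training data, you can load the ml6team/cnn_dailymail_nl dataset from the Hugging Face Hub; the sketch below assumes the `datasets` library is installed and that a `train` split is available.

```python
from datasets import load_dataset

# Load the Dutch CNN/DailyMail dataset listed on this card (assumes it is hosted on the Hub).
cnn_dailymail_nl = load_dataset("ml6team/cnn_dailymail_nl")
print(cnn_dailymail_nl)                     # overview of splits and sizes
print(cnn_dailymail_nl["train"][0].keys())  # column names of one training example
```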
## Information Table

| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of mBART |
| Training Data | ml6team/cnn_dailymail_nl, plus a smaller dataset scraped from the internet |
| Pipeline Tag | summarization |
| Datasets | ml6team/cnn_dailymail_nl |
| Language | nl |
| Tags | mbart, bart, summarization |