🚀 Teacher-Student Abstractive Summarizer for News Articles
This model fine-tunes BART-large with StableBeluga-7B as the teacher model. It is designed to produce high-quality abstractive summaries of news articles efficiently, with strong performance in both speed and computational resource usage.
🚀 Quick Start
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned summarizer and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer")
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

article_text = """
Los Angeles Lakers will have more time than anticipated. The four-time NBA Most Valuable Player (MVP) extended his contract for two years and $85 million, keeping him in California until 2023. In 2018, The King had already signed for 153 mdd and, in his second campaign in the quintet, led the championship in the Orlando bubble. With 35 years of life – he turns 36 on December 30 – and 17 campaigns of experience, LeBron is still considered one of the best (or the best) NBA players. You can read: "Mercedes found Lewis Hamilton\'s substitute" James just took the Lakers to his first NBA title since 2010 and was named MVP of the Finals; he led the League in assists per game (10.2) for the first time in his career, while adding 25.3 points and 7.8 rebounds per performance, during the last campaign. James has adapted to life in Hollywood, as he will be part of the sequel to Space Jam, to be released next year.
"""

# Tokenize the article, truncating to the model's maximum input length
inputs = tokenizer(article_text, return_tensors='pt', truncation=True)

# Generate the summary with beam search
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,
        max_length=250,
        early_stopping=True
    )

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
✨ Key Features
- Fine-tuned from BART-large with StableBeluga-7B as the teacher model, producing high-quality abstractive summaries of news articles.
- Faster inference and significantly lower GPU memory usage than the teacher model.
📦 Installation
The original documentation does not include installation instructions.
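As an assumption (not stated in the original card), the Quick Start code only needs the standard Hugging Face stack:

```shell
# Assumed dependencies: the usage examples import torch and transformers,
# so a typical environment setup would be:
pip install torch transformers
```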
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned summarizer and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer")
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

article_text = """
Los Angeles Lakers will have more time than anticipated. The four-time NBA Most Valuable Player (MVP) extended his contract for two years and $85 million, keeping him in California until 2023. In 2018, The King had already signed for 153 mdd and, in his second campaign in the quintet, led the championship in the Orlando bubble. With 35 years of life – he turns 36 on December 30 – and 17 campaigns of experience, LeBron is still considered one of the best (or the best) NBA players. You can read: "Mercedes found Lewis Hamilton\'s substitute" James just took the Lakers to his first NBA title since 2010 and was named MVP of the Finals; he led the League in assists per game (10.2) for the first time in his career, while adding 25.3 points and 7.8 rebounds per performance, during the last campaign. James has adapted to life in Hollywood, as he will be part of the sequel to Space Jam, to be released next year.
"""

# Tokenize the article, truncating to the model's maximum input length
inputs = tokenizer(article_text, return_tensors='pt', truncation=True)

# Generate the summary with beam search
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,
        max_length=250,
        early_stopping=True
    )

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
Advanced Usage
The original documentation does not include advanced usage code.
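One common need when working with BART-based summarizers is handling articles longer than the encoder's 1024-token input limit. The `chunk_article` helper below is an illustrative sketch (the function name, window size, and overlap are assumptions, not part of this model's API): it splits a long article into overlapping word-level chunks that can each be summarized separately.

```python
def chunk_article(text, max_words=700, overlap=50):
    """Split a long article into overlapping word-level chunks so each
    piece stays safely under BART's 1024-token encoder limit."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # step forward, keeping some overlap
    return chunks
```

Each chunk would then be summarized independently and the partial summaries concatenated (or summarized again in a second pass).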
📚 Documentation
Model Details

| Property | Details |
|----------|---------|
| Model type | Abstractive summarization |
| Base model | BART-large |
| Teacher model | StableBeluga-7B |
| Language | English |
Dataset
- Source: 295,174 news articles scraped from a Mexican newspaper.
- Translation: Articles were translated from Spanish to English with the Helsinki-NLP/opus-mt-es-en model.
- Teacher summaries: Generated by StableBeluga-7B.
Training
Fine-tuning trains the lightweight BART model on the teacher observations (summaries) generated by StableBeluga-7B. This approach aims to replicate the teacher model's summary quality while achieving faster inference and lower GPU memory usage.
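The training objective can be sketched numerically. In this sequence-level distillation setup, the teacher's generated summary tokens serve as hard labels for the student's token-level cross-entropy loss. The toy vocabulary and logits below are illustrative only, not the real BART configuration:

```python
import math

def cross_entropy(logits, target_idx):
    """Negative log-probability of the target token under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]

# Student decoder logits for a 3-token summary over a toy 4-token vocabulary
student_logits = [
    [2.0, 0.1, -1.0, 0.3],
    [0.0, 1.5, 0.2, -0.5],
    [-0.2, 0.0, 2.2, 0.1],
]
teacher_ids = [0, 1, 2]  # token ids the teacher model generated (hard labels)

# Average per-token cross-entropy: the quantity gradient descent minimizes
loss = sum(cross_entropy(l, t) for l, t in zip(student_logits, teacher_ids)) / len(teacher_ids)
```

In real training the same loss is computed over BART's ~50k-token vocabulary, which pushes the student's output distribution toward the teacher's summaries.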
Performance
- Evaluation metrics:
  - Inference speed: 3x faster than the teacher model (StableBeluga-7B).
  - Resource usage: Significantly lower GPU memory usage than StableBeluga-7B.
Objective
The main goal of this model is to provide a lightweight summarization solution that maintains output quality comparable to the teacher model (StableBeluga-7B) while improving efficiency, making it suitable for resource-constrained environments.
Use Cases
The model is well suited to applications that need to summarize large volumes of news articles quickly and efficiently, especially where computational resources are limited.
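In such high-volume settings, articles are usually processed in fixed-size batches rather than one at a time. A minimal batching helper (hypothetical, not part of the model card's API) might look like:

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list; the last may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

articles = [f"article {i}" for i in range(10)]
batches = list(batched(articles, 4))  # 3 batches of sizes 4, 4, 2
```

Each batch would then be passed through `tokenizer(batch, return_tensors='pt', padding=True, truncation=True)` and a single `model.generate` call, amortizing per-call overhead.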
Limitations
- Language translation: The initial Spanish-to-English translation step may introduce minor inaccuracies that affect summary quality.
- Domain specificity: The model is fine-tuned specifically on news articles; performance may vary on text from other domains.
Future Work
Possible improvements include:
- Fine-tuning the model on bilingual data to eliminate the translation step.
- Expanding the dataset to cover more diverse news sources and topics.
- Exploring further optimizations to reduce inference time and resource usage.
Conclusion
The teacher-student abstractive news summarizer demonstrates the potential to deliver high-quality summaries efficiently, making it a valuable tool for news content processing and similar applications.