🚀 Teacher-Student Abstractive News Article Summarizer
This model is a fine-tuned version of BART-large that uses StableBeluga-7B as its teacher model. It is designed to produce high-quality abstractive summaries of news articles efficiently, with strong performance in terms of inference speed and compute usage.
🚀 Quick Start
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned summarizer and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer")
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

article_text = """
Los Angeles Lakers will have more time than anticipated. The four-time NBA Most Valuable Player (MVP) extended his contract for two years and $85 million, keeping him in California until 2023. In 2018, The King had already signed for 153 mdd and, in his second campaign in the quintet, led the championship in the Orlando bubble. With 35 years of life – he turns 36 on December 30 – and 17 campaigns of experience, LeBron is still considered one of the best (or the best) NBA players. You can read: "Mercedes found Lewis Hamilton\'s substitute" James just took the Lakers to his first NBA title since 2010 and was named MVP of the Finals; he led the League in assists per game (10.2) for the first time in his career, while adding 25.3 points and 7.8 rebounds per performance, during the last campaign. James has adapted to life in Hollywood, as he will be part of the sequel to Space Jam, to be released next year.
"""

# Tokenize the article and generate a summary with beam search
inputs = tokenizer(article_text, return_tensors='pt')
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,
        max_length=250,
        early_stopping=True
    )

# Decode the generated token IDs back into text
summary = tokenizer.decode(
    summary_ids[0],
    skip_special_tokens=True
)
print(summary)
```
✨ Key Features
- Fine-tuned from BART-large with StableBeluga-7B as the teacher model, delivering high-quality abstractive summaries of news articles.
- Faster inference and significantly lower GPU memory usage than the teacher model.
📦 Installation
The original documentation provides no installation instructions, so this section is omitted.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned summarizer and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer")
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

article_text = """
Los Angeles Lakers will have more time than anticipated. The four-time NBA Most Valuable Player (MVP) extended his contract for two years and $85 million, keeping him in California until 2023. In 2018, The King had already signed for 153 mdd and, in his second campaign in the quintet, led the championship in the Orlando bubble. With 35 years of life – he turns 36 on December 30 – and 17 campaigns of experience, LeBron is still considered one of the best (or the best) NBA players. You can read: "Mercedes found Lewis Hamilton\'s substitute" James just took the Lakers to his first NBA title since 2010 and was named MVP of the Finals; he led the League in assists per game (10.2) for the first time in his career, while adding 25.3 points and 7.8 rebounds per performance, during the last campaign. James has adapted to life in Hollywood, as he will be part of the sequel to Space Jam, to be released next year.
"""

# Tokenize the article and generate a summary with beam search
inputs = tokenizer(article_text, return_tensors='pt')
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,
        max_length=250,
        early_stopping=True
    )

# Decode the generated token IDs back into text
summary = tokenizer.decode(
    summary_ids[0],
    skip_special_tokens=True
)
print(summary)
```
Advanced Usage
The original documentation does not provide an advanced usage example.
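As a rough illustration only (not part of the original card), the sketch below shows one possible way to batch several articles through the same checkpoint on a GPU; the `articles` list, padding/truncation settings, and generation parameters are assumptions rather than documented defaults.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer").to(device)
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

# Hypothetical list of article strings to summarize in one batch
articles = ["First news article ...", "Second news article ..."]

# Tokenize as a padded, truncated batch (BART accepts up to 1024 tokens)
inputs = tokenizer(
    articles,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1024,
).to(device)

with torch.no_grad():
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=250,
        early_stopping=True,
    )

summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
print(summaries)
```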
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model type | Abstractive summarization |
| Base model | BART-large |
| Teacher model | StableBeluga-7B |
| Language | English |
Dataset
- Source: 295,174 news articles scraped from a Mexican newspaper.
- Translation: the Spanish articles were translated into English with the Helsinki-NLP/opus-mt-es-en model (see the sketch after this list).
- Teacher summaries: generated by StableBeluga-7B.
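The translation code itself is not included in the card. Assuming the transformers `pipeline` API was used, a minimal sketch of the Spanish-to-English pass with the Helsinki-NLP/opus-mt-es-en checkpoint could look like this (the `spanish_articles` list is a hypothetical placeholder):

```python
from transformers import pipeline

# Spanish-to-English translation model named in the dataset description
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

# Hypothetical placeholder for the scraped Spanish articles
spanish_articles = [
    "Los Lakers de Los Ángeles tendrán más tiempo del previsto ...",
]

english_articles = [
    out["translation_text"]
    for out in translator(spanish_articles, max_length=512)
]
print(english_articles[0])
```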
Training
Fine-tuning uses the teacher observations (summaries) generated by StableBeluga-7B as training targets for the lightweight BART model. This approach aims to replicate the teacher model's summary quality while achieving faster inference and reduced GPU memory usage.
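The exact training script is not published in the card. The following is a minimal sketch, assuming a toy dataset with hypothetical `article` and `teacher_summary` columns, of how teacher-generated summaries can serve as labels when fine-tuning BART with `Seq2SeqTrainer`; the hyperparameters shown are illustrative, not the ones actually used.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

# Toy stand-in for the real corpus: translated articles paired with teacher-written summaries
dataset = Dataset.from_dict({
    "article": [
        "LeBron James extended his Lakers contract for two years and $85 million ...",
        "Mercedes announced a new reserve driver for the upcoming season ...",
    ],
    "teacher_summary": [
        "LeBron James signed a two-year, $85 million extension with the Lakers.",
        "Mercedes named a new reserve driver for next season.",
    ],
})

def preprocess(batch):
    # Inputs: the news articles; labels: the summaries produced by the teacher model
    model_inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["teacher_summary"], max_length=250, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="bart_news_summarizer",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=3e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```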
Performance
- Evaluation metrics:
  - Inference speed: 3x faster than the teacher model (StableBeluga-7B).
  - Resource usage: significantly lower GPU memory usage compared to StableBeluga-7B.
Objective
The main objective of this model is to provide a lightweight summarization solution that maintains output quality similar to the teacher model (StableBeluga-7B) while improving efficiency, making it suitable for resource-constrained environments.
Use Cases
The model is well suited to applications that need to summarize large volumes of news articles quickly and efficiently, especially where computational resources are limited.
Limitations
- Language translation: the initial Spanish-to-English translation may introduce minor inaccuracies that affect summary quality.
- Domain specificity: the model is fine-tuned specifically on news articles, so performance may vary on text from other domains.
Future Work
Future improvements may include:
- Fine-tuning the model on bilingual data to eliminate the translation step.
- Expanding the dataset to cover more diverse news sources and topics.
- Exploring further optimizations to reduce inference time and resource usage.
Conclusion
The Teacher-Student Abstractive News Article Summarizer demonstrates the potential to deliver high-quality summaries efficiently, making it a valuable tool for news content processing and similar applications.