T5 Summary En Ru Zh Base 2048
🚀 T5 model for multilingual text summarization in English, Russian, and Chinese
This model is designed to perform controlled generation of summary text content in a multi-task mode, with built-in translation between English, Russian, and Chinese.
🚀 Quick Start
This is a multi-task T5 model that performs conditionally controlled generation of summary content together with translation. In total it understands 12 instructions, depending on the prefix that is set (a small composition sketch follows the list below):
- "summary: " - 用於在源語言中生成簡單簡潔的內容
- "summary brief: " - 用於在源語言中生成簡短的摘要內容
- "summary big: " - 用於在源語言中生成詳細的摘要內容
The model can understand text in any of the listed languages (Russian, Chinese, or English) and can also translate the result into any of them.
To translate into a target language, the target language identifier is specified as part of the prefix, "... to <lang>: ".
The task prefixes are as follows:
4) "summary to en: " - generate an English summary from multilingual text
5) "summary brief to en: " - generate a short English summary from multilingual text
6) "summary big to en: " - generate a detailed English summary from multilingual text
7) "summary to ru: " - generate a Russian summary from multilingual text
8) "summary brief to ru: " - generate a short Russian summary from multilingual text
9) "summary big to ru: " - generate a detailed Russian summary from multilingual text
10) "summary to zh: " - generate a Chinese summary from multilingual text
11) "summary brief to zh: " - generate a short Chinese summary from multilingual text
12) "summary big to zh: " - generate a detailed Chinese summary from multilingual text
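To avoid typos when composing the 12 prefixes by hand, a small helper can build them from a summary style and an optional target language. This is a minimal sketch; the make_prefix name is purely illustrative and not part of the model's API.
def make_prefix(style: str = "", target_lang: str = "") -> str:
    # style: "" (regular), "brief", or "big"; target_lang: "", "en", "ru", or "zh"
    prefix = "summary"
    if style:
        prefix += f" {style}"
    if target_lang:
        prefix += f" to {target_lang}"
    return prefix + ": "
# For example, make_prefix("big", "zh") returns "summary big to zh: "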
The trained model handles a context of 2048 tokens and produces summaries of up to 200 tokens for the detailed task, up to 50 tokens for the regular summary task, and up to 20 tokens for the brief summary task.
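If an input document may exceed the 2048-token context, it can be clipped at tokenization time, and the output length can also be capped explicitly. The following is a minimal, self-contained sketch; the long_text placeholder is hypothetical, and the explicit max_new_tokens cap is optional, since output length is also shaped by the model's own generation settings.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_summary_en_ru_zh_base_2048'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

long_text = "..."  # any document, possibly longer than the context window
inputs = tokenizer('summary big: ' + long_text,
                   return_tensors="pt",
                   truncation=True,
                   max_length=2048)  # clip the input to the model's context window
generated_tokens = model.generate(**inputs, max_new_tokens=200)  # cap the detailed summary
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])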
💻 Usage Examples
Basic Usage
Example: summarizing English text
from transformers import T5ForConditionalGeneration, T5Tokenizer
device = 'cuda'  # or 'cpu' to run on CPU
model_name = 'utrobinmv/t5_summary_en_ru_zh_base_2048'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
text = """Videos that say approved vaccines are dangerous and cause autism, cancer or infertility are among those that will be taken down, the company said. The policy includes the termination of accounts of anti-vaccine influencers. Tech giants have been criticised for not doing more to counter false health information on their sites. In July, US President Joe Biden said social media platforms were largely responsible for people's scepticism in getting vaccinated by spreading misinformation, and appealed for them to address the issue. YouTube, which is owned by Google, said 130,000 videos were removed from its platform since last year, when it implemented a ban on content spreading misinformation about Covid vaccines. In a blog post, the company said it had seen false claims about Covid jabs "spill over into misinformation about vaccines in general". The new policy covers long-approved vaccines, such as those against measles or hepatitis B. "We're expanding our medical misinformation policies on YouTube with new guidelines on currently administered vaccines that are approved and confirmed to be safe and effective by local health authorities and the WHO," the post said, referring to the World Health Organization."""
# generate a regular summary
prefix = 'summary: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#YouTube is cracking down on videos that suggest Covid-19 vaccines are dangerous and harmful.
# generate a brief summary
prefix = 'summary brief: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#YouTube is cracking down on misleading information about Covid vaccines.
# generate a detailed summary
prefix = 'summary big: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#YouTube has said it will remove more than 1,500 videos of Covid vaccines from its platform in a bid to tackle the spread of misinformation about the jabs.
Example: summarizing Chinese text into an English summary
from transformers import T5ForConditionalGeneration, T5Tokenizer
device = 'cuda'  # or 'cpu' to run on CPU
model_name = 'utrobinmv/t5_summary_en_ru_zh_base_2048'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
text = """在北京冬奧會自由式滑雪女子坡面障礙技巧決賽中,中國選手谷愛凌奪得銀牌。祝賀谷愛凌!今天上午,自由式滑雪女子坡面障礙技巧決賽舉行。決賽分三輪進行,取選手最佳成績排名決出獎牌。第一跳,中國選手谷愛凌獲得69.90分。在12位選手中排名第三。完成動作後,谷愛凌又扮了個鬼臉,甚是可愛。第二輪中,谷愛凌在道具區第三個障礙處失誤,落地時摔倒。獲得16.98分。網友:摔倒了也沒關係,繼續加油!在第二跳失誤摔倒的情況下,谷愛凌頂住壓力,第三跳穩穩發揮,流暢落地!獲得86.23分!此輪比賽,共12位選手參賽,谷愛凌第10位出場。網友:看比賽時我比谷愛凌緊張,加油!"""
# generate a regular English summary
prefix = 'summary to en: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#In Beijing Winter Olympics Games, Chinese contestant Grulove凌 won the silver card. Celebrate.
# generate a brief English summary
prefix = 'summary brief to en: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#In Beijing Winter Olympics Games, Chinese contestant Gruelean won the silver card.
# generate a detailed English summary
prefix = 'summary big to en: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#In Beijing's Winter Olympics Games, the 12-year-old has won the silver card in a free-skating lady hillwalking contest. The first jump, Chinese contestant, 69.90.
Example: summarizing Russian text
from transformers import T5ForConditionalGeneration, T5Tokenizer
device = 'cuda'  # or 'cpu' to run on CPU
model_name = 'utrobinmv/t5_summary_en_ru_zh_base_2048'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
text = """Высота башни составляет 324 метра (1063 фута), примерно такая же высота, как у 81-этажного здания, и самое высокое сооружение в Париже. Его основание квадратно, размером 125 метров (410 футов) с любой стороны. Во время строительства Эйфелева башня превзошла монумент Вашингтона, став самым высоким искусственным сооружением в мире, и этот титул она удерживала в течение 41 года до завершения строительство здания Крайслер в Нью-Йорке в 1930 году. Это первое сооружение которое достигло высоты 300 метров. Из-за добавления вещательной антенны на вершине башни в 1957 году она сейчас выше здания Крайслер на 5,2 метра (17 футов). За исключением передатчиков, Эйфелева башня является второй самой высокой отдельно стоящей структурой во Франции после виадука Мийо."""
# generate a regular summary
prefix = 'summary: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Французская Эйфелева башня, ставшая самой высокой в мире, достигла высоты 300 метров (1063 фута).
# generate a brief summary
prefix = 'summary brief: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Французская Эйфелева башня стала самой высокой в мире.
# generate a detailed summary
prefix = 'summary big: '
src_text = prefix + text
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Французская Эйфелева башня, построенная в 1957 году, достигла высоты 300 метров (1063 фута) с любой стороны. Это самый высокий сооружения в мире после виадука Мийо.
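Since all three summary styles use the same weights, the three prefixed inputs can also be batched into a single generate call. This is a minimal sketch that reuses the model, tokenizer, device, and text variables from the example above; it assumes the tokenizer pads the shorter sequences in the batch.
# run the regular, brief and detailed summaries in a single batch
prefixes = ['summary: ', 'summary brief: ', 'summary big: ']
batch = tokenizer([p + text for p in prefixes],
                  return_tensors="pt",
                  padding=True)  # pad the shorter sequences in the batch
generated_tokens = model.generate(**batch.to(device))
for p, summary in zip(prefixes,
                      tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)):
    print(p, summary)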
📚 Documentation
Supported Languages
Russian (ru_RU), Chinese (zh_CN), English (en_US)
📄 License
This model is released under the apache-2.0 license.



