envit5-base Open-Source Model - Free Support for Vietnamese and English Summarization, Translation, and Question Answering

Envit5 Base

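Developed by VietAI

A pretrained Transformer-based encoder-decoder model designed for Vietnamese and English. It supports summarization, translation, and question answering.

Large language model | Other | License: MIT | #Vietnamese-English translation #multi-domain translation #pretrained encoder-decoder

Downloads: 47

Released: 6/20/2022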

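Model overview

EnViT5-base is a pretrained model designed for Vietnamese and English. It is suited to bilingual text generation and understanding tasks such as summarization, translation, and question answering.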

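Model features

Bilingual support

Supports translation in both directions between Vietnamese and English, as well as text generation in either language.

Pretrained model

Pretrained on a large-scale dataset, which gives it strong language understanding and generation capabilities.

Open-source license

Released under the MIT license, which permits free use and modification.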

Model capabilities

Text generation

Text summarization

Machine translation

Question answering

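Use cases

Natural language processing

Vietnamese-English translation

Translate Vietnamese text into English, or English text into Vietnamese.

Text summarization

Generate concise summaries of Vietnamese or English text.

Education

Language-learning aid

Help learners understand Vietnamese and English texts.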

🚀 EnViT5-base

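EnViT5-base is a pretrained Transformer-based encoder-decoder model for Vietnamese and English, described as state of the art, and it is the model used in the MTet paper. It handles a range of natural language processing tasks across Vietnamese and English, such as summarization, translation, and question answering, and supports research and applications in these areas.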

🚀 Quick start

For more details, see our GitHub repository.

A fine-tuning example is available here.

💻 Usage examples

Basic usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("VietAI/envit5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/envit5-base")
model.to(device)

# Each input sentence needs a language prefix: "en: " or "vi: ".
inputs = [
    "vi: VietAI là tổ chức phi lợi nhuận với sứ mệnh ươm mầm tài năng về trí tuệ nhân tạo và xây dựng một cộng đồng các chuyên gia trong lĩnh vực trí tuệ nhân tạo đẳng cấp quốc tế tại Việt Nam.",
    "vi: Theo báo cáo mới nhất của Linkedin về danh sách việc làm triển vọng với mức lương hấp dẫn năm 2020, các chức danh công việc liên quan đến AI như Chuyên gia AI (Artificial Intelligence Specialist), Kỹ sư ML (Machine Learning Engineer) đều xếp thứ hạng cao.",
    "en: Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.",
    "en: We're on a journey to advance and democratize artificial intelligence through open source and open science."
]

# Tokenize the sentences as one padded batch and move the tensors to the model's device.
batch = tokenizer(inputs, return_tensors="pt", padding=True).to(device)

# Passing the attention mask along with the input IDs lets generation ignore padding tokens.
outputs = model.generate(**batch, max_length=512)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
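The example above notes that every input needs an `en:` or `vi:` prefix, and it sends all sentences to the model as a single batch. A minimal sketch of two small helpers that may be convenient before calling the tokenizer is shown below. The names `add_language_prefix`, `batched`, and `SUPPORTED_LANGS` are hypothetical and not part of the model's repository or of `transformers`; the exact prefix spelling (`"en: "` with a trailing space) is assumed from the example inputs.

```python
from typing import Iterator, List

# Assumption: only the two prefixes that appear in the usage example are valid.
SUPPORTED_LANGS = ("en", "vi")


def add_language_prefix(sentences: List[str], lang: str) -> List[str]:
    """Prepend 'en: ' or 'vi: ' to each sentence unless it already starts with it."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"lang must be one of {SUPPORTED_LANGS}, got {lang!r}")
    prefix = f"{lang}: "
    return [s if s.startswith(prefix) else prefix + s.strip() for s in sentences]


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    sentences = add_language_prefix(["Hello world.", "en: Already tagged."], "en")
    for batch in batched(sentences, batch_size=1):
        # Each batch could be passed to the tokenizer in place of `inputs`.
        print(batch)
```

Splitting a long list into smaller batches keeps padding and memory use bounded; a suitable `batch_size` depends on the hardware and is not specified by this model card.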

📚 Detailed documentation

Dataset

Property	Details
Dataset	cc100

📄 License

This project is released under the MIT license.

📚 Citation

@misc{mtet,
  doi = {10.48550/ARXIV.2210.05610},
  url = {https://arxiv.org/abs/2210.05610},
  author = {Ngo, Chinh and Trinh, Trieu H. and Phan, Long and Tran, Hieu and Dang, Tai and Nguyen, Hieu and Nguyen, Minh and Luong, Minh-Thang},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {MTet: Multi-domain Translation for English and Vietnamese},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}