🚀 ptt5-v2-small
The ptt5-v2 models are pretrained T5 models tailored to the Portuguese language, obtained by continuing the pretraining of Google's original checkpoints, in sizes ranging from t5-small to t5-3B. These checkpoints were used to train MonoT5 rerankers for Portuguese, which you can find in their HuggingFace collection. For more details on the pretraining process, please refer to our paper, ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language.
🚀 Quick Start
Datasets
- allenai/c4
- legacy-datasets/mc4
Languages
- pt
Task Type
text2text-generation
Base Model
google-t5/t5-small
License
apache-2.0
✨ Key Features
- A pretrained T5 model tailored to the Portuguese language.
- Continued pretraining from Google's original checkpoints.
- Can be used to train MonoT5 rerankers for Portuguese (see the reranking sketch in the usage section below).
💻 Usage Examples
Basic usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and the continued-pretrained Portuguese checkpoint
tokenizer = T5Tokenizer.from_pretrained("unicamp-dl/ptt5-v2-small")
model = T5ForConditionalGeneration.from_pretrained("unicamp-dl/ptt5-v2-small")
```
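Since this checkpoint is pretrained with T5's span-corruption objective rather than fine-tuned on a downstream task, a quick sanity check is to ask it to fill in a masked span marked with a sentinel token. A minimal sketch (the example sentence is purely illustrative):

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("unicamp-dl/ptt5-v2-small")
model = T5ForConditionalGeneration.from_pretrained("unicamp-dl/ptt5-v2-small")
model.eval()

# Mask a span with the first sentinel token; the model is trained to
# generate the missing spans, delimited by further sentinel tokens.
text = "A <extra_id_0> é a maior cidade do Brasil."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=10)

# The output interleaves sentinels with the predicted spans,
# e.g. "<extra_id_0> cidade de São Paulo <extra_id_1>".
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```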
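For reranking, the MonoT5 models trained from these checkpoints are typically scored by formatting the query and document into a prompt and comparing the probabilities of the relevance labels at the first decoding step. The sketch below is an assumption-heavy illustration: the checkpoint name `unicamp-dl/monoptt5-small` is hypothetical (check the authors' HuggingFace collection for the actual rerankers), and it uses the original English MonoT5 template with the "true"/"false" labels, which the Portuguese rerankers may not follow; consult the reranker's model card for the exact template and label words.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Hypothetical checkpoint name; see the authors' HuggingFace collection.
reranker_name = "unicamp-dl/monoptt5-small"
tokenizer = T5Tokenizer.from_pretrained(reranker_name)
model = T5ForConditionalGeneration.from_pretrained(reranker_name)
model.eval()

def relevance_score(query: str, document: str) -> float:
    # Standard MonoT5 template; the Portuguese rerankers may use a
    # different template and/or label words.
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    true_id = tokenizer.encode("true", add_special_tokens=False)[0]
    false_id = tokenizer.encode("false", add_special_tokens=False)[0]
    # Score only the first decoded token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability mass on the "true" label

docs = ["Brasília é a capital do Brasil.", "O céu é azul."]
scores = [relevance_score("Qual é a capital do Brasil?", d) for d in docs]
print(sorted(zip(scores, docs), reverse=True))
```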
📄 License
This project is licensed under the apache-2.0 license.
📚 Detailed Documentation
For more details on the pretraining process, please refer to our paper, ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language.
📚 Citation
If you use our models, please cite:
```bibtex
@misc{piau2024ptt5v2,
      title={ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language},
      author={Marcos Piau and Roberto Lotufo and Rodrigo Nogueira},
      year={2024},
      eprint={2406.10806},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```