Taiyi-Roberta-124M-D-v2
A RoBERTa-base model serving as the text encoder of the English version of MAP (tentative name), with special pre-training on 1M image-text pairs.
Quick Start
This is a RoBERTa-base model used as the text encoder for the English version of MAP (temporary name), which has undergone special pre-training on 1M image-text pairs.
Features
- Built on pre-trained Roberta-base, it incorporates multimodal information through special pre-training tasks.
- The "D" in the name denotes a special training method: several training objectives are designed to learn multimodal representations.
- The pre-training datasets are MSCOCO, VG and SBU.
Installation
No dedicated installation steps are required; the model is loaded with the Hugging Face Transformers library (e.g., pip install transformers).
Usage Examples
Basic Usage
# Load the Taiyi text encoder and its tokenizer from the Hugging Face Hub
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("IDEA-CCNL/Taiyi-Roberta-124M-D-v2")
model = RobertaModel.from_pretrained("IDEA-CCNL/Taiyi-Roberta-124M-D-v2")
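Advanced Usage
The following is a minimal sketch of encoding text with the loaded model; the example sentence and the use of the final hidden states as sentence features are illustrative assumptions, not part of the original card.

import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("IDEA-CCNL/Taiyi-Roberta-124M-D-v2")
model = RobertaModel.from_pretrained("IDEA-CCNL/Taiyi-Roberta-124M-D-v2")

# Tokenize an example sentence (illustrative; any English text works)
inputs = tokenizer("a photo of a cat sitting on a sofa", return_tensors="pt")

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Token-level features: (batch_size, sequence_length, hidden_size=768)
token_embeddings = outputs.last_hidden_state

# A simple sentence-level feature: the representation of the <s> (CLS) token
sentence_embedding = token_embeddings[:, 0, :]
print(sentence_embedding.shape)  # torch.Size([1, 768])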
Documentation
Model Taxonomy
| Property | Details |
| --- | --- |
| Demand | Special |
| Task | Multimodal |
| Series | Taiyi |
| Model | TBD |
| Parameter | 124M |
| Extra | Special pre-training method - Second version - English (D-v2-English) |
Model Information
Based on pre-trained Roberta-base, we introduce multimodal information through special pre-training tasks. "D" denotes this special training method: several training objectives are designed in our paper to learn multimodal representations. The pre-training datasets are MSCOCO, VG and SBU. Our code and the details of the pre-training tasks will be made publicly available upon paper acceptance.
Performance on Downstream Tasks
GLUE
| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Roberta-base (official) | 87.6 | 91.9 | 92.8 | 94.8 | 63.6 | 91.2 | 90.2 | 78.7 |
| Roberta-base (local) | 87.0 | 91.3 | 92.5 | 94.2 | 62.8 | 90.6 | 92.9 | 78.0 |
| Taiyi-Roberta-124M-D (local) | 87.1 | 91.8 | 92.3 | 94.5 | 62.6 | 90.4 | 92.4 | 78.7 |
| Taiyi-Roberta-124M-D-v2 (local) | 87.1 | 91.9 | 92.4 | 94.5 | 65.5 | 91.0 | 93.0 | 79.8 |
The local test settings are:
Sequence length: 128, Batch size: 32, Learning rate: 3e-5
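The following is a minimal sketch of fine-tuning a GLUE task with these settings using the Hugging Face Trainer; the choice of SST-2, the number of epochs, and the output directory are assumptions for illustration, not the authors' actual evaluation script.

from datasets import load_dataset
from transformers import (
    RobertaTokenizer,
    RobertaForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# SST-2 is used here only as an example GLUE task (assumption)
dataset = load_dataset("glue", "sst2")
tokenizer = RobertaTokenizer.from_pretrained("IDEA-CCNL/Taiyi-Roberta-124M-D-v2")

def tokenize(batch):
    # Sequence length 128, as in the local test settings
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# A classification head is newly initialized on top of the pre-trained encoder
model = RobertaForSequenceClassification.from_pretrained(
    "IDEA-CCNL/Taiyi-Roberta-124M-D-v2", num_labels=2
)

args = TrainingArguments(
    output_dir="taiyi-sst2",          # assumed output directory
    per_device_train_batch_size=32,   # batch size 32
    learning_rate=3e-5,               # learning rate 3e-5
    num_train_epochs=3,               # assumed number of epochs
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)

trainer.train()
# Note: evaluate() reports only the loss unless a compute_metrics function is added
print(trainer.evaluate())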
Technical Details
Based on Roberta-base, we use special training tasks to introduce multimodal information. The "D" in the model name denotes this new pre-training method. To learn multimodal representations, we design several different training objectives in the paper. The pre-training datasets are MSCOCO, VG and SBU. Our code and the details of the pre-training tasks will be made publicly available after the paper is accepted.
License
This project is licensed under the Apache-2.0 license.
Citation
If you use this resource in your work, please cite our paper:
@article{fengshenbang,
author = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
journal = {CoRR},
volume = {abs/2209.02970},
year = {2022}
}
You can also cite our website:
@misc{Fengshenbang-LM,
title={Fengshenbang-LM},
author={IDEA-CCNL},
year={2021},
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}