
Taiyi-Roberta-124M-D-v2

Developed by IDEA-CCNL
An English multimodal text encoder based on the RoBERTa-base architecture, specially pretrained on 1 million image-text pairs.
Release Time : 6/13/2022

Model Overview

This model is a text encoder that incorporates multimodal information into RoBERTa-base through special pretraining tasks. It is mainly used for multimodal representation tasks.
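As a representation model, the encoder is typically used by pooling its per-token outputs into a single fixed-size sentence vector. The sketch below shows masked mean pooling, a common way to derive sentence embeddings from a RoBERTa-style encoder; the function and the toy vectors are illustrative, not part of this model's actual API.

```python
# Illustrative sketch: masked mean pooling over a text encoder's per-token
# outputs. Padding positions (mask == 0) are excluded from the average.

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:
            count += 1
            for i, v in enumerate(vec):
                total[i] += v
    return [t / count for t in total]

# Toy example: three 2-dimensional token vectors, the last one is padding.
emb = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(emb, mask))  # -> [2.0, 3.0]
```

The resulting sentence vector can then be fed to a downstream classifier or compared against other vectors for similarity and retrieval tasks.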

Model Features

Multimodal Pretraining
Incorporates visual and linguistic information through special training tasks to enhance multimodal representation capabilities
Improved Pretraining Method
The D-v2 version shows performance improvements on multiple NLP tasks compared to the initial version
Image-Text Pair Training
Pretrained using 1 million image-text pairs drawn from the MSCOCO, Visual Genome (VG), and SBU datasets

Model Capabilities

Text Encoding
Multimodal Representation
Natural Language Understanding

Use Cases

Multimodal Applications
Image-Text Matching
Maps text and images to the same semantic space for matching
Cross-Modal Retrieval
Enables text-to-image or image-to-text retrieval
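Once text and images are encoded into the same semantic space, both matching and retrieval reduce to ranking candidates by vector similarity. The sketch below is a generic illustration of that step using cosine similarity; the embeddings and file names are toy values, not outputs of this model.

```python
import math

# Illustrative cross-modal retrieval: rank candidate image embeddings by
# cosine similarity to a text query embedding in a shared semantic space.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, candidates):
    """Return candidate ids sorted from most to least similar."""
    scored = [(cosine(query_vec, v), cid) for cid, v in candidates.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]

text_query = [0.9, 0.1]                              # toy text embedding
images = {"cat.jpg": [0.8, 0.2], "car.jpg": [0.1, 0.9]}
print(retrieve(text_query, images))                  # -> ['cat.jpg', 'car.jpg']
```

The same ranking works in the other direction (image-to-text) by swapping which side supplies the query vector.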
Natural Language Processing
Text Classification
Used for various text classification tasks
Performs well on the GLUE benchmark
Semantic Similarity Calculation
Calculates semantic similarity between texts
Achieves 91.0 on the STS-B task
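STS-B scores like the one above are conventionally reported as a correlation between the model's predicted similarity scores and human-annotated gold ratings. The sketch below computes a Pearson correlation on toy data to show what that metric measures; the numbers are illustrative only.

```python
import math

# Illustrative STS-B-style evaluation: Pearson correlation between predicted
# similarity scores and human gold ratings (toy values, not real results).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

pred = [0.9, 0.2, 0.6, 0.4]   # model similarity scores (toy)
gold = [5.0, 1.0, 4.0, 2.0]   # human ratings on the 0-5 STS-B scale (toy)
print(round(pearson(pred, gold), 3))  # -> 0.978
```

A correlation near 1.0 means the model orders sentence pairs almost exactly as human annotators do.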