
Taiyi-Roberta-124M-D-v2

Developed by IDEA-CCNL
An English multimodal text encoder based on the RoBERTa-base architecture, specially pretrained on 1 million image-text pairs.
Release Time : 6/13/2022

Model Overview

This model is a text encoder that incorporates multimodal information into RoBERTa-base through special pretraining tasks. It is mainly used for multimodal representation tasks.
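As a representation model, the encoder is typically used by pooling its per-token outputs into a single fixed-size sentence vector. The sketch below shows masked mean pooling, a common way to derive sentence embeddings from a RoBERTa-style encoder; the function and the toy vectors are illustrative, not part of this model's actual API.

```python
# Illustrative sketch: masked mean pooling over a text encoder's per-token
# outputs. Padding positions (mask == 0) are excluded from the average.

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:
            count += 1
            for i, v in enumerate(vec):
                total[i] += v
    return [t / count for t in total]

# Toy example: three 2-dimensional token vectors, the last one is padding.
emb = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(emb, mask))  # -> [2.0, 3.0]
```

The resulting sentence vector can then be fed to a downstream classifier or compared against other vectors for similarity and retrieval tasks.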

Model Features

Multimodal Pretraining
Incorporates visual and linguistic information through special training tasks to enhance multimodal representation capabilities
Improved Pretraining Method
The D-v2 version shows performance improvements on multiple NLP tasks compared to the initial version
Image-Text Pair Training
Pretrained using 1 million image-text pairs drawn from the MSCOCO, Visual Genome (VG), and SBU datasets

Model Capabilities

Text Encoding
Multimodal Representation
Natural Language Understanding

Use Cases

Multimodal Applications
Image-Text Matching
Maps text and images to the same semantic space for matching
Cross-Modal Retrieval
Enables text-to-image or image-to-text retrieval
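Once text and images are encoded into the same semantic space, both matching and retrieval reduce to ranking candidates by vector similarity. The sketch below is a generic illustration of that step using cosine similarity; the embeddings and file names are toy values, not outputs of this model.

```python
import math

# Illustrative cross-modal retrieval: rank candidate image embeddings by
# cosine similarity to a text query embedding in a shared semantic space.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, candidates):
    """Return candidate ids sorted from most to least similar."""
    scored = [(cosine(query_vec, v), cid) for cid, v in candidates.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]

text_query = [0.9, 0.1]                              # toy text embedding
images = {"cat.jpg": [0.8, 0.2], "car.jpg": [0.1, 0.9]}
print(retrieve(text_query, images))                  # -> ['cat.jpg', 'car.jpg']
```

The same ranking works in the other direction (image-to-text) by swapping which side supplies the query vector.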
Natural Language Processing
Text Classification
Used for various text classification tasks
Performs well on the GLUE benchmark
Semantic Similarity Calculation
Calculates semantic similarity between texts
Achieves 91.0 on the STS-B task
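STS-B scores like the one above are conventionally reported as a correlation between the model's predicted similarity scores and human-annotated gold ratings. The sketch below computes a Pearson correlation on toy data to show what that metric measures; the numbers are illustrative only.

```python
import math

# Illustrative STS-B-style evaluation: Pearson correlation between predicted
# similarity scores and human gold ratings (toy values, not real results).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

pred = [0.9, 0.2, 0.6, 0.4]   # model similarity scores (toy)
gold = [5.0, 1.0, 4.0, 2.0]   # human ratings on the 0-5 STS-B scale (toy)
print(round(pearson(pred, gold), 3))  # -> 0.978
```

A correlation near 1.0 means the model orders sentence pairs almost exactly as human annotators do.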