# Multi-scene Description

## Vit Gpt2 Image Captioning

This image captioning model pairs a ViT image encoder with a GPT2 text decoder to generate natural-language descriptions of input images.
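As a rough illustration of how a model of this kind is typically used, the sketch below loads a ViT + GPT2 captioning checkpoint with the Hugging Face `transformers` library and generates a caption for one image. Several things here are assumptions rather than details from this page: the checkpoint id `nlpconnect/vit-gpt2-image-captioning`, the generation settings (`max_length`, `num_beams`), and the helper names `caption_image` and `normalize_caption`, which are hypothetical. It also assumes `transformers`, `torch`, and `Pillow` are installed.

```python
def normalize_caption(text: str) -> str:
    """Collapse runs of whitespace and trim the ends of a decoded caption."""
    return " ".join(text.split())


def caption_image(
    image_path: str,
    model_id: str = "nlpconnect/vit-gpt2-image-captioning",  # assumed checkpoint id
    max_length: int = 16,  # assumed generation setting
    num_beams: int = 4,    # assumed generation setting
) -> str:
    """Generate one caption for the image at image_path (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the module loads without them.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    # The encoder-decoder wrapper holds the ViT encoder and the GPT2 decoder.
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Convert to RGB so grayscale or RGBA inputs match what the encoder expects.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Beam search over the decoder, conditioned on the encoded image.
    output_ids = model.generate(
        pixel_values, max_length=max_length, num_beams=num_beams
    )
    caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return normalize_caption(caption)
```

Calling `caption_image("photo.jpg")` (a hypothetical local file) would download the checkpoint on first use and return a single caption string. Loading the model inside the function keeps the sketch short; for captioning many images, loading it once and reusing it would be the more sensible design.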
