
UForm-Gen

Developed by unum-cloud
UForm-Gen is a small generative vision-language model primarily used for image caption generation and visual question answering.
Downloads: 152
Release Time: 12/25/2023

Model Overview

UForm-Gen is a pocket-sized multimodal AI model that pairs a visual encoder with a language model to understand and generate content. It is designed primarily for image captioning and visual question answering.
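One common way to combine a visual encoder with a language model can be sketched as follows. This is a generic illustration, not UForm-Gen's documented internals: image patch features are projected into the language model's embedding width and placed in front of the text token embeddings, so the language model reads one combined sequence. The function names, toy dimensions, and the single linear projection are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fuse_image_and_text(image_features: Matrix, projection: Matrix,
                        text_embeddings: Matrix) -> Matrix:
    """Hypothetical prefix fusion: project vision features (N x d_v) through a
    (d_v x d_t) matrix, then prepend them to the text embeddings (T x d_t).
    The result has N + T rows of width d_t."""
    image_tokens = matmul(image_features, projection)
    if image_tokens and text_embeddings and len(image_tokens[0]) != len(text_embeddings[0]):
        raise ValueError("projected image tokens and text embeddings differ in width")
    return image_tokens + text_embeddings


# Toy example: 2 image patches with d_v = 3, and 2 text tokens with d_t = 2.
patches = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [0.25, 0.75]]
sequence = fuse_image_and_text(patches, proj, text)
print(len(sequence), len(sequence[0]))  # 4 2
```

Prepending image tokens keeps the language model unchanged: it simply sees a longer input sequence. A real model would use learned projection weights and far larger dimensions.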

Model Features

Lightweight and Efficient
A compact model with only 1.5B parameters and a reported inference speed of 140 tokens/sec, 3.5 times faster than 7B models
Multimodal Understanding
Combines visual and linguistic capabilities to process both image and text inputs simultaneously
Versatile Generation
Switches between tasks such as image captioning, content summarization, and visual question answering depending on the prompt
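The speed claim implies a baseline: if 140 tokens/sec is 3.5 times faster, a 7B model runs at about 140 / 3.5 = 40 tokens/sec under the same (unstated) hardware and settings. The sketch below works through that arithmetic. The 70-token caption length is a hypothetical value, and the calculation ignores image encoding and prompt processing time.

```python
UFORM_GEN_TOKENS_PER_SEC = 140.0  # figure stated on this card
SPEEDUP_VS_7B = 3.5               # figure stated on this card


def implied_baseline_rate(fast_rate: float, speedup: float) -> float:
    """Throughput a 7B model would have for the stated speedup to hold."""
    return fast_rate / speedup


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to decode num_tokens at a constant rate (decoding only)."""
    return num_tokens / tokens_per_sec


baseline = implied_baseline_rate(UFORM_GEN_TOKENS_PER_SEC, SPEEDUP_VS_7B)
caption_tokens = 70  # hypothetical caption length
print(baseline)                                                       # 40.0
print(generation_seconds(caption_tokens, UFORM_GEN_TOKENS_PER_SEC))   # 0.5
print(generation_seconds(caption_tokens, baseline))                   # 1.75
```

Under these assumptions a 70-token caption takes about 0.5 s instead of 1.75 s.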

Model Capabilities

Image caption generation
Visual question answering
Content summarization
Multimodal understanding
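Because the task is selected through the prompt, a client application can map each capability to a prompt template and send that text together with the image. The sketch below assumes hypothetical templates and a hypothetical `build_prompt` helper; this card does not give the exact prompt wording UForm-Gen expects, so check the official model documentation before relying on any of these strings.

```python
from typing import Dict, Optional

# Hypothetical prompt templates; these are NOT the official UForm-Gen prompts.
PROMPT_TEMPLATES: Dict[str, str] = {
    "caption_long": "Describe the image in detail.",
    "caption_short": "Describe the image in one sentence.",
    "summarize": "Summarize the visual content of the image.",
    "vqa": "Answer the question about the image: {question}",
}


def build_prompt(task: str, question: Optional[str] = None) -> str:
    """Return the text prompt to pair with the image for the given task."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    if task == "vqa":
        if not question:
            raise ValueError("the 'vqa' task needs a question")
        return PROMPT_TEMPLATES[task].format(question=question)
    return PROMPT_TEMPLATES[task]


print(build_prompt("summarize"))
print(build_prompt("vqa", question="How many people are in the photo?"))
```

Keeping the templates in one table means a single model can serve captioning, summarization, and question answering, with only the prompt changing between calls.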

Use Cases

Content Understanding
Image Captioning
Generate detailed or concise textual descriptions for images
CLIPScore of 0.847 for long captions and 0.842 for short captions
Visual Question Answering
Answer natural language questions about image content
Accuracy of 66.5 on the VQAv2 dataset
Content Creation
Social Media Content Generation
Automatically generate captions for social media images
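CLIPScore rates how well a caption matches an image using CLIP embeddings. In its original definition it is w · max(cos(c, v), 0) with w = 2.5, where c and v are the caption and image embeddings. The card does not say which CLIP model or scaling produced 0.847 and 0.842, so the sketch below computes the score from precomputed embedding vectors (hypothetical toy values) and leaves the weight w as a parameter.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def clip_score(text_emb: Sequence[float], image_emb: Sequence[float],
               w: float = 2.5) -> float:
    """CLIPScore: w * max(cos(text, image), 0); negative similarity clips to 0."""
    return w * max(cosine(text_emb, image_emb), 0.0)


caption_vec = [3.0, 4.0]  # hypothetical caption embedding
image_vec = [4.0, 3.0]    # hypothetical image embedding
print(clip_score(caption_vec, image_vec, w=1.0))  # 0.96
```

With w = 1.0 the score is simply the clipped cosine similarity, which is a useful way to read values below 1 such as those reported above.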
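VQAv2 scores each predicted answer against 10 human answers: an answer counts as fully correct when at least 3 annotators gave it, i.e. min(matches / 3, 1), averaged over the leave-one-out subsets of annotators. The sketch below implements that rule but omits the official answer normalization (punctuation, articles, number words). The card does not describe its evaluation protocol, so treat this as an illustration of the standard metric rather than the exact procedure behind 66.5; by convention, dataset accuracy is reported as a percentage.

```python
from typing import List, Tuple


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Per-question VQA accuracy in [0, 1], averaged over leave-one-out subsets."""
    if not human_answers:
        raise ValueError("need at least one human answer")
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]      # drop one annotator
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def dataset_accuracy(items: List[Tuple[str, List[str]]]) -> float:
    """Mean per-question accuracy, expressed as a percentage."""
    return 100.0 * sum(vqa_accuracy(p, h) for p, h in items) / len(items)


answers = ["red"] * 2 + ["blue"] * 8  # hypothetical annotations
print(vqa_accuracy("blue", answers))  # 1.0
print(vqa_accuracy("red", answers))   # about 0.6 (partial credit)
```

Partial credit means a plausible but minority answer still contributes to the score, which is why VQAv2 accuracies are not simple exact-match rates.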