UForm-Gen2-dpo

Developed by unum-cloud
UForm-Gen2-dpo is a small generative vision-language model aligned for image captioning and visual question answering through Direct Preference Optimization (DPO) on the VLFeedback and LLaVA-Human-Preference-10K preference datasets.
Downloads: 3,568
Release Time: 3/27/2024

Model Overview

This model is primarily used for image captioning, visual question answering, and multimodal dialogue. It pairs a CLIP-like ViT-H/14 visual encoder with the Qwen1.5-0.5B-Chat language model.

Model Features

Direct Preference Optimization Training
Trained with DPO on VLFeedback and LLaVA-Human-Preference-10K preference datasets to enhance output quality.
Efficient Training
Trained in less than a day on a DGX-H100 server with 8x H100 GPUs.
Multimodal Capabilities
Combines a visual encoder and a language model for image understanding and text generation.
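
To make the Direct Preference Optimization feature more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the training code used for this model: the function names, the example log-probabilities, and the value beta=0.1 are assumptions for illustration only. The inputs are the summed log-probabilities that the trained (policy) model and a frozen reference model assign to the preferred ("chosen") and dispreferred ("rejected") responses.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the source does not state beta
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair."""
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen response gains more than the rejected one.
    return -log_sigmoid(beta * (chosen_reward - rejected_reward))


# Illustrative numbers only (not measured on any dataset).
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)    # policy now favors the chosen response
print(untrained, improved)
```

When the policy equals the reference, the loss is log(2); it falls below that as the policy shifts probability toward the preferred response relative to the reference.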

Model Capabilities

Image caption generation
Visual question answering
Multimodal dialogue
Image understanding
Text generation

Use Cases

Content Generation
Detailed Image Description
Generate detailed descriptions for input images.
Example output: 'The image shows a well-lit, tranquil bedroom...'
Brief Image Description
Generate brief descriptions for input images.
Example output: 'A white and orange cat standing on its hind legs...'
Intelligent Q&A
Visual Question Answering
Answer questions about image content.
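
The three use cases above differ mainly in the text instruction that accompanies the image. The sketch below shows one way to map a use case to an instruction string before passing it, together with the image, to the model. The task names, the prompt wording, and the helper build_instruction are hypothetical; the source does not specify the prompts the model expects.

```python
from typing import Optional

# Hypothetical instruction wording for the two captioning use cases.
TASK_PROMPTS = {
    "detailed_caption": "Describe the image in great detail.",
    "brief_caption": "Describe the image briefly.",
}


def build_instruction(task: str, question: Optional[str] = None) -> str:
    """Return the text instruction for a use case (hypothetical helper)."""
    if task == "vqa":
        # Visual question answering passes the user's question through.
        if not question or not question.strip():
            raise ValueError("vqa requires a non-empty question")
        return question.strip()
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PROMPTS[task]


print(build_instruction("brief_caption"))
print(build_instruction("vqa", "What color is the cat?"))
```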