
HyperCLOVAX-SEED-Vision-Instruct-3B

Developed by naver-hyperclovax
HyperCLOVAX-SEED-Vision-Instruct-3B is a lightweight multimodal model developed by NAVER. It understands images and text, generates text, and is specifically optimized for Korean.
Downloads: 160.75k
Release date: 4/22/2025

Model Overview

Built on the LLaVA architecture, the model pairs a vision encoder with a language model to support tasks such as image question answering, chart parsing, and video content understanding. It is described as Korea's first open-source vision-language model.
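To make the LLaVA-style design concrete, here is a minimal sketch of its core idea: features from the vision encoder are projected into the language model's embedding space, and the resulting "visual tokens" are spliced into the text embedding sequence. This is an illustration under stated assumptions, not HyperCLOVAX's actual implementation. The function names `project` and `build_input_sequence` are hypothetical, and a single linear layer stands in for the projector (LLaVA variants often use a small MLP instead).

```python
from typing import List

Vector = List[float]


def project(features: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Hypothetical linear projector: maps each vision-encoder feature
    (dimension d_v) into the language model's embedding space (dimension d_t).
    `weight` has d_t rows of length d_v; `bias` has length d_t."""
    projected = []
    for feature in features:
        # One output coordinate per weight row: dot(row, feature) + bias.
        projected.append(
            [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weight, bias)]
        )
    return projected


def build_input_sequence(
    visual_tokens: List[Vector], text_embeddings: List[Vector], image_position: int
) -> List[Vector]:
    """Hypothetical fusion step: insert the projected visual tokens into the
    text embedding sequence at the image placeholder position."""
    return text_embeddings[:image_position] + visual_tokens + text_embeddings[image_position:]


if __name__ == "__main__":
    # Two image patches with d_v = 2, projected into d_t = 3.
    patches = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [0.0, 0.0, 0.0]
    visual = project(patches, w, b)
    text = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]
    sequence = build_input_sequence(visual, text, image_position=1)
    print(visual)         # projected visual tokens
    print(len(sequence))  # text tokens plus visual tokens
```

The language model then processes the combined sequence like ordinary token embeddings. This is why using fewer visual tokens per image directly lowers compute cost.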

Model Features

Lightweight Design
Optimized for computational efficiency: achieves competitive performance while using fewer visual tokens than models of similar scale
Korean Language Optimization
Described as Pareto-optimal for Korean; outperforms open-source models of similar scale on Korean benchmarks
Efficient Video Processing
Keeps video token consumption low through dynamic frame sampling, using up to 1,856 tokens across up to 108 frames per video
Multimodal Capabilities
Accepts text, image, and video inputs, including combinations of them, and generates text
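The video figures above imply an average of roughly 17 tokens per frame (1,856 / 108 ≈ 17.2). The sketch below illustrates this budget arithmetic. It uses plain uniform sampling capped at 108 frames as a stand-in, because the card does not describe how the model's dynamic frame sampling actually chooses frames. Both function names are hypothetical.

```python
from typing import List


def sample_frame_indices(num_frames: int, max_frames: int = 108) -> List[int]:
    """Hypothetical stand-in for frame sampling: pick at most `max_frames`
    frame indices, taking the center of each of k equal-length segments."""
    if num_frames <= 0:
        return []
    k = min(num_frames, max_frames)
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def tokens_per_frame(token_budget: int, num_sampled: int) -> int:
    """Split a total video token budget evenly across the sampled frames
    (floor division; any remainder is left unused in this sketch)."""
    return token_budget // num_sampled if num_sampled else 0


if __name__ == "__main__":
    # A 3-minute clip at 30 fps has 5,400 frames; only 108 are kept.
    indices = sample_frame_indices(5400)
    print(len(indices), indices[:3])
    print(tokens_per_frame(1856, len(indices)))  # average budget per frame
```

Capping the number of frames bounds the token cost no matter how long the video is, which is the property the feature description emphasizes.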

Model Capabilities

Visual question answering
Chart parsing
Video content understanding
Korean text generation
Multimodal reasoning

Use Cases

Content Understanding
Image Question Answering
Answer questions based on input images
Scores 79.2 on the TextVQA validation set
Video Content Analysis
Understand video content and answer related questions
Scores 48.2 on the VideoMME benchmark
Commercial Applications
Product Recognition
Identify products in images and provide relevant information
Can take OCR and entity-recognition results as auxiliary inputs