
EMOVA Qwen2.5 3B HF

Developed by Emova-ollm
EMOVA is an end-to-end omni-modal large language model that supports visual, auditory, and speech capabilities, with emotional speech dialogue abilities.
Downloads: 101
Release Time: 3/11/2025

Model Overview

EMOVA is a novel end-to-end omni-modal large language model that achieves visual, auditory, and speech functionalities without relying on external models. Given omni-modal inputs (i.e., text, visual, and speech), EMOVA uses a speech decoder and a style encoder to generate both text and speech responses with vivid emotional control.

Model Features

Omni-Modal Performance
Achieves results comparable to top-tier models on both vision-language and speech benchmarks, supporting text, visual, and speech input/output.
Emotional Speech Dialogue
Utilizes a semantic-acoustic decoupled speech tokenizer and a lightweight style control module, supporting bilingual (Chinese and English) speech dialogue and 24 controllable speech styles.
Diverse Configurations
Offers three configurations (3B/7B/72B) to support omni-modal usage under different computational budgets.

Model Capabilities

Vision-Language Understanding
Speech Recognition
Emotional Speech Generation
Multimodal Dialogue
Image Caption Generation
Document Understanding
Chart Understanding
Mathematical Problem Solving

Use Cases

Intelligent Assistant
Emotional Voice Assistant
Build intelligent assistants that understand user emotions and respond with appropriate speech.
Supports 24 speech style controls
Education
Multimodal Learning Aid
Helps students understand charts, mathematical problems, and scientific concepts.
Achieves 92.7% accuracy on ScienceQA-Img
Customer Service
Emotional Customer Service Bot
Provides customer service dialogues with emotional tones.
Supports bilingual service in Chinese and English