
EMOVA Qwen2.5 3B

Developed by Emova-ollm
EMOVA is an end-to-end omni-modal large language model with visual, auditory, and speech capabilities, able to generate text and speech responses with emotional control.
Downloads: 25
Release date: 4/25/2025

Model Overview

EMOVA is a novel end-to-end omni-modal large language model that achieves visual, auditory, and speech functions without relying on external models. It supports bilingual (Chinese and English) speech dialogue and provides 24 voice style controls.

Model Features

Omni-Modal Performance
Achieves results competitive with the state of the art on both vision-language and speech benchmarks simultaneously.
Emotional Speech Dialogue
Uses a semantic-acoustic decoupled speech tokenizer and a lightweight style-control module to achieve seamless omni-modal alignment and diverse, controllable voice styles.
Diversified Configurations
Offers three configurations (3B/7B/72B) to support omni-modal usage under different computational budgets.
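The semantic-acoustic decoupling mentioned above can be illustrated with a toy sketch: discrete semantic tokens carry the spoken content, while a separate style identifier controls the voice/emotion channel. This is not EMOVA's actual code; the tokenizer, token vocabulary, and style pairing here are hypothetical stand-ins chosen only to show why decoupling enables style control without changing content.

```python
# Conceptual sketch (NOT EMOVA's real implementation) of a
# semantic-acoustic decoupled speech representation: semantic tokens
# encode what is said; a style id encodes how it is said.
import hashlib

NUM_STYLES = 24  # the model card lists 24 voice style controls


def semantic_tokens(text: str) -> list[int]:
    """Stand-in tokenizer: hash each word to a discrete semantic token id."""
    return [int(hashlib.md5(w.encode()).hexdigest(), 16) % 4096
            for w in text.lower().split()]


def apply_style(tokens: list[int], style_id: int) -> list[tuple[int, int]]:
    """Lightweight style control: pair each semantic token with a style id.
    The content channel (tokens) is untouched; only the style channel varies."""
    assert 0 <= style_id < NUM_STYLES, "style id out of range"
    return [(t, style_id) for t in tokens]


happy = apply_style(semantic_tokens("hello there"), style_id=3)
sad = apply_style(semantic_tokens("hello there"), style_id=11)

# Same semantic content, different acoustic style:
assert [t for t, _ in happy] == [t for t, _ in sad]
assert happy != sad
```

Because content and style live in separate channels, a decoder can re-voice the same utterance in any of the 24 styles without re-generating the text, which is the property the decoupled tokenizer design is meant to provide.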

Model Capabilities

Visual Language Understanding
Speech Recognition
Emotional Speech Generation
Multimodal Dialogue
Structured Data Understanding

Use Cases

Intelligent Assistant
Emotional Voice Assistant
Generates speech responses with emotional tones to enhance user experience.
Supports 24 voice style controls.
Education
Multimodal Learning Assistant
Helps students understand complex visual and textual content.
Achieves 92.7% accuracy on the ScienceQA-image benchmark.