JoyHallo-v1 Open-source Model - Generate Lifelike Facial Animations for Free Based on Mandarin Speech

JoyHallo V1

Developed by jdh-algo

JoyHallo is a Mandarin-focused audio-driven facial animation generation model capable of producing realistic facial animations from Mandarin speech.

Category: Text-to-Video
License: MIT (open source)
Tags: #Mandarin digital human, #Audio-driven animation, #Cross-language generation

Downloads 26

Release Time: 9/18/2024

Model Overview

Optimized for Mandarin phonetic characteristics, this model employs a semi-decoupled architecture to process lip, expression, and pose features, significantly improving Chinese video generation quality while maintaining English generation capabilities.

Model Features

Mandarin Optimization

Specifically optimized for the complex lip movements of Mandarin, addressing technical challenges in Chinese speech-driven animation.

Semi-Decoupled Architecture

Uses a semi-decoupled architecture to model the relationships among lip, expression, and pose features, improving information utilization efficiency.

Cross-Language Capability

While optimized for Mandarin generation, it still maintains excellent English video generation capabilities.

Efficient Inference

Compared to traditional architectures, inference speed is improved by 14.3%.
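The 14.3% figure can be read in two ways, and the page does not say which one is meant. The sketch below is only illustrative arithmetic: the function names and the 100-second baseline are hypothetical, not measurements from JoyHallo.

```python
def time_after_speedup(baseline_seconds: float, speedup_pct: float) -> float:
    """Throughput reading: the model does speedup_pct% more work per second."""
    return baseline_seconds / (1.0 + speedup_pct / 100.0)


def time_after_reduction(baseline_seconds: float, reduction_pct: float) -> float:
    """Latency reading: each run takes reduction_pct% less wall-clock time."""
    return baseline_seconds * (1.0 - reduction_pct / 100.0)


baseline = 100.0  # hypothetical baseline inference time in seconds

# Same headline number, slightly different resulting run times.
print(round(time_after_speedup(baseline, 14.3), 1))    # 87.5
print(round(time_after_reduction(baseline, 14.3), 1))  # 85.7
```

Either way, the saving on a 100-second job is roughly 12 to 14 seconds, so the two readings differ by under two seconds at this scale.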

Model Capabilities

Mandarin speech-driven facial animation generation

English speech-driven facial animation generation

Lip synchronization

Facial expression generation

Head pose simulation

Use Cases

Digital Human Applications

Virtual Anchor

Generates realistic digital human videos for Mandarin news broadcasts or program hosting.

Achieves natural and smooth lip synchronization and expression changes.

Medical Consultation

Generates explanatory videos for professional medical content.

Accurately conveys the pronunciation and lip movements of medical terminology.

Education

Language Teaching

Generates demonstration videos for standard Mandarin pronunciation.

Clearly displays lip movements during pronunciation.

🚀 JoyHallo: Digital human model for Mandarin

JoyHallo is a digital human model designed for generating Mandarin videos. It addresses the challenges in Mandarin audio-driven video generation and also shows excellent cross-language generation capabilities.

✨ Features

Diverse Mandarin Dataset: 29 hours of Mandarin speech video were collected from JD Health International Inc. employees to form the jdh-Hallo dataset, which spans a wide range of ages and speaking styles and covers both conversational and specialized medical topics.
Audio Feature Embedding: The Chinese wav2vec2 model is used for audio feature embedding to adapt the model to Mandarin.
Semi-decoupled Structure: A semi-decoupled structure captures inter-feature relationships among lip, expression, and pose features, improving information utilization efficiency and accelerating inference by 14.3%.
Cross-language Generation: The model retains a strong ability to generate English videos, demonstrating excellent cross-language generation capabilities.

📚 Documentation

📖 Introduction

In audio-driven video generation, creating Mandarin videos presents significant challenges. Collecting comprehensive Mandarin datasets is difficult, and the complex lip movements in Mandarin further complicate model training compared to English.

In this study, we collected 29 hours of Mandarin speech video from JD Health International Inc. employees, resulting in the jdh-Hallo dataset. This dataset includes a diverse range of ages and speaking styles, encompassing both conversational and specialized medical topics.

To adapt the JoyHallo model for Mandarin, we employed the Chinese wav2vec2 model for audio feature embedding. We also propose a semi-decoupled structure to capture inter-feature relationships among lip, expression, and pose features. This integration not only improves information utilization efficiency but also accelerates inference speed by 14.3%.
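To give a feel for what audio feature embedding with a wav2vec2 model produces, the sketch below estimates how many feature frames the encoder emits for a clip of a given length. It assumes the standard wav2vec 2.0 base convolutional feature encoder (seven layers with the kernel sizes and strides listed) and 16 kHz input; whether the Chinese wav2vec2 checkpoint used here keeps exactly this layout is an assumption, and the 25 fps video rate is a hypothetical value chosen for illustration.

```python
# (kernel, stride) of each conv layer in the standard wav2vec 2.0 base
# feature encoder. Assumption: the Chinese checkpoint keeps this layout.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

SAMPLE_RATE = 16_000  # wav2vec 2.0 models expect 16 kHz mono audio
VIDEO_FPS = 25        # hypothetical video frame rate, for illustration only


def wav2vec2_frames(num_samples: int) -> int:
    """Number of feature frames emitted for `num_samples` audio samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


seconds = 4
audio_frames = wav2vec2_frames(seconds * SAMPLE_RATE)
video_frames = seconds * VIDEO_FPS

# Roughly two audio feature frames per video frame at these rates, so the
# audio features must be resampled or grouped to line up with the video.
print(audio_frames, video_frames, audio_frames / video_frames)
```

Under these assumptions the encoder yields about 50 feature frames per second of audio, which is why audio-driven pipelines typically align the feature sequence to the video frame count before conditioning the generator.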

Notably, JoyHallo maintains its strong ability to generate English videos, demonstrating excellent cross-language generation capabilities.

📄 License

The model is released under the MIT license.
