🎵 Chinese Rap LoRA for ACE-Step (Rap Machine)
This is a hybrid rap voice model. We carefully curated Chinese rap/hip-hop datasets for training, applying strict data cleaning and recaptioning. Compared with the base model, it delivers:
- Improved accuracy of Chinese pronunciation.
- Enhanced adherence to hip-hop and electronic styles.
- Greater diversity in hip-hop vocal expressions.
Check out audio examples here: https://ace-step.github.io/#RapMachine
🚀 Quick Start
✨ Features
This model can be used to:
- Generate higher-quality Chinese songs.
- Create superior hip-hop tracks.
- Blend with other genres to:
  - Produce music with better vocal quality and richer detail.
  - Add experimental flavors (e.g., underground, street culture).
- Fine-tune results using the following parameters:
Vocal Controls
vocal_timbre
- Describes inherent vocal qualities.
- Examples: Bright, dark, warm, cold, breathy, nasal, gritty, smooth, husky, metallic, whispery, resonant, airy, smoky, sultry, light, clear, high-pitched, raspy, powerful, ethereal, flute-like, hollow, velvety, shrill, hoarse, mellow, thin, thick, reedy, silvery, twangy.
techniques (list)
- Rap styles: mumble rap, chopper rap, melodic rap, lyrical rap, trap flow, double-time rap
- Vocal FX: auto-tune, reverb, delay, distortion
- Delivery: whispered, shouted, spoken word, narration, singing
- Other: ad-libs, call-and-response, harmonized
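As a minimal illustration of how the timbre and technique tags above compose, the sketch below joins them into a single comma-separated caption string for prompting. The helper name `build_vocal_prompt` is hypothetical and not part of the ACE-Step API; it only shows the tag-combination idea:

```python
# Hypothetical helper (not part of the ACE-Step API): combines a
# vocal_timbre tag with techniques tags into one comma-separated
# caption string suitable for a text prompt.
def build_vocal_prompt(timbre, techniques, extra_tags=()):
    """Join timbre, technique, and style tags into one prompt string."""
    tags = [timbre, *techniques, *extra_tags]
    # Drop empty entries and normalize surrounding whitespace.
    return ", ".join(t.strip() for t in tags if t and t.strip())

prompt = build_vocal_prompt(
    "gritty",
    ["melodic rap", "auto-tune", "ad-libs"],
    extra_tags=["hip-hop", "Chinese rap"],
)
print(prompt)  # gritty, melodic rap, auto-tune, ad-libs, hip-hop, Chinese rap
```

Mixing one timbre tag with two or three technique tags tends to keep prompts specific without over-constraining the style.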
Community Note
Although a Chinese rap LoRA may seem niche to non-Chinese communities, projects like this consistently demonstrate that ACE-Step, as a music generation foundation model, has broad potential: it not only improves pronunciation in a single language but also spawns new styles.
The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions drive the evolution of musical history forward.
📚 Documentation
ACE-Step: A Step Towards Music Generation Foundation Model

Model Description
ACE-Step is a novel open-source foundation model for music generation. It overcomes key limitations of existing approaches through a holistic architectural design: diffusion-based generation is integrated with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art generation speed, musical coherence, and controllability.
Key Features:
- 15× faster than LLM-based baselines (a 4-minute song in ~20 s on an A100).
- Superior musical coherence across melody, harmony, and rhythm.
- Full-song generation with duration control and natural-language descriptions.
Uses
Direct Use
ACE-Step can be used for:
- Generating original music from text descriptions.
- Music remixing and style transfer.
- Editing song lyrics.
Downstream Use
The model serves as a foundation for:
- Voice cloning applications.
- Specialized music generation (rap, jazz, etc.).
- Music production tools.
- Creative AI assistants.
Out-of-Scope Use
The model should not be used for:
- Generating copyrighted content without permission.
- Creating harmful or offensive content.
- Misrepresenting AI-generated music as human-created.
How to Get Started
See: https://github.com/ace-step/ACE-Step
Hardware Performance
| Device | 27 Steps | 60 Steps |
| --- | --- | --- |
| NVIDIA A100 | 27.27x | 12.27x |
| RTX 4090 | 34.48x | 15.63x |
| RTX 3090 | 12.76x | 6.48x |
| M2 Max | 2.27x | 1.03x |

Values are RTF (Real-Time Factor); higher values indicate faster generation.
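RTF is the ratio of generated audio duration to wall-clock generation time, so an RTF of 27.27x on an A100 means a 4-minute (240 s) track takes roughly 240 / 27.27 ≈ 8.8 s. A quick sketch of that arithmetic (helper names are illustrative, not part of any API):

```python
# Illustrative helpers for reading the RTF table above.
def rtf(audio_seconds: float, wall_seconds: float) -> float:
    """Real-Time Factor: audio duration divided by generation time."""
    return audio_seconds / wall_seconds

def generation_time(audio_seconds: float, rtf_value: float) -> float:
    """Invert RTF to estimate wall-clock time for a given audio length."""
    return audio_seconds / rtf_value

# A100, 27 steps: RTF 27.27x -> a 240 s song in about 8.8 s.
print(round(generation_time(240, 27.27), 1))  # 8.8
```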
Limitations
- Performance varies by language (top 10 languages perform best).
- Longer generations (>5 minutes) may lose structural coherence.
- Rare instruments may not render perfectly.
- Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
- Style-specific Weaknesses: Underperforms on certain genres (e.g., Chinese rap/zh_rap), with limited style adherence and a musicality ceiling.
- Continuity Artifacts: Unnatural transitions in repainting/extend operations.
- Vocal Quality: Coarse vocal synthesis lacking nuance.
- Control Granularity: Needs finer-grained musical parameter control.
Ethical Considerations
Users should:
- Verify the originality of generated works.
- Disclose AI involvement.
- Respect cultural elements and copyrights.
- Avoid generating harmful content.
Model Details
| Property | Details |
| --- | --- |
| Developed by | ACE Studio and StepFun |
| Model Type | Diffusion-based music generation with transformer conditioning |
| License | Apache 2.0 |
| Resources | Project Page, Demo Space, GitHub Repository |
Citation
@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}
Acknowledgements
This project is co-led by ACE Studio and StepFun.