Smart Turn V2
Smart Turn v2 is an open-source semantic voice activity detection (VAD) model that determines whether the speaker has finished speaking by analyzing the raw waveform.
Downloads 670
Release Time: 7/11/2025
Model Overview
The model is multilingual, compact, and fast, which makes it suitable for scenarios such as voice assistants and real-time transcription.
Model Features
Multilingual Support
Supports 14 languages, so end-of-speech detection works across different language environments.
Small Model Size
The model is about 6 times smaller than v1, at roughly 360 MB, which makes it easier to deploy and use.
Fast Speed
Audio analysis is about 3 times faster than in v1. Analyzing 8 seconds of audio takes only about 12 milliseconds on an NVIDIA L40S.
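To put the latency figure in perspective, the short Python sketch below converts the two numbers quoted above (about 12 ms for 8 seconds of audio) into a real-time factor. The function and variable names are illustrative only, and the result applies to the NVIDIA L40S figure, not to other hardware.

```python
# Figures quoted in the listing: ~12 ms to analyze 8 s of audio on an NVIDIA L40S.
INFERENCE_MS = 12.0
AUDIO_SECONDS = 8.0


def real_time_factor(inference_ms: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (lower means faster)."""
    return (inference_ms / 1000.0) / audio_seconds


rtf = real_time_factor(INFERENCE_MS, AUDIO_SECONDS)
speedup = 1.0 / rtf

print(f"real-time factor: {rtf:.4f}")                   # real-time factor: 0.0015
print(f"about {speedup:.0f}x faster than real time")    # about 667x faster than real time
```

A real-time factor this far below 1 suggests the check can run at every pause in a conversation without adding noticeable delay on comparable hardware.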
Model Capabilities
Semantic Voice Activity Detection
Multilingual Voice Analysis
Real-time Voice Processing
Use Cases
Voice Assistant/Chatbot
Avoid Interrupting Users
Reply only after the user has actually finished speaking, so the assistant does not interrupt them.
Improve the user experience
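The gating logic behind this use case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the model's actual API: `TurnGate`, `fake_predict`, the 0.5 threshold, and the 16 kHz sample rate are all hypothetical, and `fake_predict` is a crude energy-based stand-in so the example runs without the model. In a real pipeline it would be replaced by a call to the end-of-turn model.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

# Hypothetical interface: takes mono audio samples and returns the probability
# that the speaker has finished their turn. Not the model's real API.
EndOfTurnFn = Callable[[List[float]], float]


class TurnGate:
    """Buffers recent audio and decides whether the assistant may reply."""

    def __init__(self, predict: EndOfTurnFn, threshold: float = 0.5,
                 sample_rate: int = 16000, window_seconds: float = 8.0) -> None:
        self.predict = predict
        self.threshold = threshold
        # Keep only the most recent window of samples (assumed 8 s at 16 kHz).
        self.buffer: Deque[float] = deque(maxlen=int(sample_rate * window_seconds))

    def add_chunk(self, samples: Iterable[float]) -> None:
        self.buffer.extend(samples)

    def user_finished(self) -> bool:
        if not self.buffer:
            return False
        return self.predict(list(self.buffer)) >= self.threshold

    def reset(self) -> None:
        self.buffer.clear()


def fake_predict(samples: List[float]) -> float:
    """Stand-in for the model: treats a quiet tail as a finished turn."""
    tail = samples[-1600:]
    energy = sum(s * s for s in tail) / len(tail)
    return 1.0 if energy < 1e-4 else 0.0


gate = TurnGate(fake_predict)
gate.add_chunk([0.5] * 16000)       # one second of simulated speech
mid_speech = gate.user_finished()   # False: the user is still talking
gate.add_chunk([0.0] * 1600)        # 100 ms of simulated silence
after_pause = gate.user_finished()  # True: the stand-in sees a quiet tail
```

The assistant would generate its reply only when `user_finished()` returns `True`, then call `reset()` before listening for the next turn.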
Real-time Transcription + Text-to-Speech (TTS)
Trigger TTS
Trigger TTS only after the user finishes speaking, so that the system and the user do not talk over each other.
Improve transcription accuracy
Call Center Assistance and Analysis
Speaker Separation and Sentiment Analysis
Provide accurate turn segmentation for speaker separation and sentiment analysis pipelines.
Improve analysis efficiency