
Speaker Diarization 3.1

Developed by pyannote
An audio processing model for speaker diarization that automatically detects and segments the different speakers in an audio recording.
Downloads: 11.7M
Release date: 11/16/2023

Model Overview

This model accepts single-channel audio sampled at 16kHz and outputs speaker diarization results. Stereo or multi-channel audio is automatically downmixed to mono and other sampling rates are resampled on the fly, so no manual preprocessing, voice activity detection, or predefined number of speakers is required.
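A minimal usage sketch with the pyannote.audio library is shown below; the input file name and the Hugging Face access token are placeholders, and the exact API may differ slightly across pyannote.audio versions.

```python
# Minimal sketch, assuming pyannote.audio >= 3.1 is installed and the
# model's gated-access terms have been accepted on Hugging Face.
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder access token
)

# Optional: move the pipeline to a GPU when one is available.
pipeline.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

# "meeting.wav" is a placeholder file; stereo or non-16kHz audio is
# downmixed and resampled by the pipeline automatically.
diarization = pipeline("meeting.wav")

# Print speaker turns as start/end times with speaker labels.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```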

Model Features

Pure PyTorch implementation
Removes the problematic use of onnxruntime, simplifies deployment, and may accelerate inference.
Automatic processing
Automatically processes stereo/multi-channel audio and different sampling rates without manual preprocessing.
Speaker number control
Allows specifying the exact number of speakers, or lower and upper bounds on it, to improve segmentation accuracy (see the sketch after this list).
Progress monitoring
Supports monitoring processing progress through hooks (also shown in the sketch after this list).
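The speaker-count and progress-monitoring options can be sketched as follows, assuming the pipeline has been loaded as in the earlier example and "meeting.wav" is again a placeholder input file.

```python
# Sketch of the speaker-count and progress-hook options; `pipeline`
# is assumed to be the diarization pipeline loaded above.
from pyannote.audio.pipelines.utils.hook import ProgressHook

# Fix the number of speakers when it is known in advance.
diarization = pipeline("meeting.wav", num_speakers=3)

# Or bound it when only a plausible range is known.
diarization = pipeline("meeting.wav", min_speakers=2, max_speakers=5)

# Monitor processing progress through a hook.
with ProgressHook() as hook:
    diarization = pipeline("meeting.wav", hook=hook)
```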

Model Capabilities

Speaker segmentation
Speaker change detection
Voice activity detection
Overlapping speech detection
Automatic speech recognition assistance

Use Cases

Meeting minutes
Meeting minutes segmentation
Automatically identify when each participant speaks in a meeting recording
Achieved a diarization error rate (DER) of 12.2% on the AISHELL-4 dataset
Media analysis
Radio program analysis
Analyze the speaking-time distribution of hosts and guests in a radio program
Achieved a diarization error rate (DER) of 7.8% on the REPERE dataset
Speech transcription
Multi-speaker transcription assistance
Provide speaker segmentation information to an automatic speech recognition system (see the sketch after this list)
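As a sketch of how diarization output can assist transcription, the speaker turns can be iterated and handed to any ASR system; `transcribe_segment` below is a hypothetical stand-in for such a system, and the file name and token are placeholders.

```python
# Sketch of pairing diarization output with a separate ASR step.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder access token
)
diarization = pipeline("interview.wav")  # placeholder input file

def transcribe_segment(path: str, start: float, end: float) -> str:
    # Hypothetical stand-in: run your ASR system on path[start:end].
    return "..."

# Attribute each transcribed chunk to the speaker who produced it.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    text = transcribe_segment("interview.wav", turn.start, turn.end)
    print(f"[{speaker}] {turn.start:.1f}-{turn.end:.1f}s: {text}")
```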