
SpeechLLM 2B

Developed by skit-ai
SpeechLLM is a multimodal large language model trained to predict speaker turn metadata in conversations, including speech activity, transcribed text, speaker gender, age, accent, and emotion.
Downloads: 237
Release date: 6/4/2024

Model Overview

A multimodal model based on HubertX audio encoder and TinyLlama LLM, capable of extracting rich metadata information from audio signals.
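The overview above describes a speech encoder feeding its output into a small LLM. A minimal sketch of that wiring in NumPy, with made-up dimensions (the real model uses a HubertX-style encoder and TinyLlama; all sizes and names below are illustrative assumptions, not the model's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the real model's dimensions may differ.
T_AUDIO = 50    # audio frames produced by the encoder
D_AUDIO = 1024  # assumed encoder hidden size
D_LLM = 2048    # assumed LLM embedding size
N_TEXT = 8      # instruction/prompt tokens

def project_audio(audio_feats, w, b):
    """Map encoder frames into the LLM embedding space (one linear layer here)."""
    return audio_feats @ w + b

audio_feats = rng.standard_normal((T_AUDIO, D_AUDIO))   # encoder output
w = rng.standard_normal((D_AUDIO, D_LLM)) * 0.02        # projection weights
b = np.zeros(D_LLM)

audio_tokens = project_audio(audio_feats, w, b)         # (50, 2048)
text_tokens = rng.standard_normal((N_TEXT, D_LLM))      # prompt embeddings

# The LLM consumes audio and text as a single embedding sequence.
llm_input = np.concatenate([audio_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (58, 2048)
```

The key design point is that the projection puts audio frames and text tokens in the same embedding space, so the LLM can attend over both when predicting the transcript and metadata.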

Model Features

Multimodal processing capability
Processes both audio and text information simultaneously for speech understanding and metadata prediction
Rich metadata prediction
Can predict various information including speech activity, transcribed text, gender, age, accent, and emotion
High-performance ASR
Achieves a word error rate (WER) of 6.73 to 9.13 on the LibriSpeech test set

Model Capabilities

Voice activity detection
Automatic speech recognition
Speaker gender classification
Speaker age classification
Speaker accent classification
Speaker emotion recognition
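In practice the capabilities above are returned together as one metadata record per speaker turn. A minimal sketch of consuming such a record, assuming a JSON-style output string (the key names and value conventions here are illustrative assumptions, not the model's documented schema):

```python
import json

# Hypothetical raw model output; exact keys/format are assumptions.
raw = ('{"SpeechActivity": "True", "Transcript": "hello there", '
       '"Gender": "Female", "Age": "Young", '
       '"Accent": "American", "Emotion": "Neutral"}')

def parse_metadata(raw: str) -> dict:
    """Parse a metadata string into a dict, normalizing the activity flag to bool."""
    meta = json.loads(raw)
    meta["SpeechActivity"] = meta.get("SpeechActivity") == "True"
    return meta

meta = parse_metadata(raw)
print(meta["Transcript"], meta["Emotion"])  # hello there Neutral
```

Downstream systems can then branch on individual fields, e.g. routing turns with negative emotion to a human agent.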

Use Cases

Speech analysis
Customer service dialogue analysis
Analyze speaker characteristics and emotions in customer service conversations
Can identify customer emotional states and demographic information
Enhanced speech transcription
Add rich metadata to speech transcriptions
Provides more comprehensive dialogue analysis dimensions