SpeechLLM 1.5B
SpeechLLM is a multimodal large language model designed to predict speaker turn metadata in conversations, including speech activity, transcribed text, gender, age, accent, and emotion.
Downloads: 40
Release date: June 20, 2024
Model Overview
SpeechLLM combines the HubertX audio encoder with the TinyLlama LLM, processing speech signals to generate rich metadata about each speaker turn.
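A minimal sketch of querying the model follows. It assumes the checkpoint is published on Hugging Face (the repo id shown is an assumption) and loaded through the `transformers` library with `trust_remote_code=True`; the `generate_meta` call and the instruction format are likewise assumptions to be verified against the actual model card, so the inference step that needs the 1.5B checkpoint is left as a comment.

```python
# Sketch of requesting speaker-turn metadata from SpeechLLM.
# ASSUMPTIONS: the checkpoint id and the generate_meta() signature
# shown in the comments are taken on faith and should be checked
# against the published model card before use.

def build_instruction(fields):
    """Build a metadata-request instruction listing the desired fields,
    e.g. transcript, gender, age, accent, and emotion."""
    return ("Give me the following information about the audio "
            "[" + ", ".join(fields) + "]")

instruction = build_instruction(
    ["SpeechActivity", "Transcript", "Gender", "Age", "Accent", "Emotion"])
print(instruction)

# Loading and inference (requires the transformers library and the
# ~1.5B-parameter checkpoint, so it is only sketched here):
#
# from transformers import AutoModel
# model = AutoModel.from_pretrained("skit-ai/speechllm-1.5B",  # assumed repo id
#                                   trust_remote_code=True)
# output = model.generate_meta(
#     audio_path="conversation_turn.wav",  # hypothetical input file
#     instruction=instruction,
#     max_new_tokens=500,
# )
# print(output)  # metadata for the speaker turn
```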
Model Features
Multimodal Processing Capability
Combines audio signal processing with language model capabilities to understand speech content and generate metadata.
Rich Metadata Prediction
Can predict various information such as speech activity, transcribed text, speaker gender, age, accent, and emotion.
Diverse Dataset Training
Trained on multiple speech datasets including Common Voice and LibriSpeech, enhancing the model's generalization ability.
Model Capabilities
Speech Activity Detection
Automatic Speech Recognition
Speaker Gender Classification
Speaker Age Classification
Speaker Accent Classification
Emotion Recognition
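The model returns its predictions as text. Assuming a JSON-style reply keyed by the capability names above (the exact output format is an assumption and should be checked against the model card), a small parser can turn each reply into a typed record:

```python
import json

# Parse a SpeechLLM metadata reply into a Python dict.
# ASSUMPTION: the model answers with a JSON object keyed by the
# capability names listed above; adjust the keys to the real output.

EXPECTED_KEYS = {"SpeechActivity", "Transcript", "Gender",
                 "Age", "Accent", "Emotion"}

def parse_metadata(reply: str) -> dict:
    """Decode a JSON reply, keeping only known keys and
    normalising SpeechActivity to a boolean."""
    data = json.loads(reply)
    meta = {k: v for k, v in data.items() if k in EXPECTED_KEYS}
    if "SpeechActivity" in meta:
        meta["SpeechActivity"] = str(meta["SpeechActivity"]).lower() == "true"
    return meta

# Hypothetical reply used purely for illustration:
reply = ('{"SpeechActivity": "True", "Transcript": "how can I help you", '
         '"Gender": "Female", "Age": "Young Adult", '
         '"Accent": "United States", "Emotion": "Neutral"}')
meta = parse_metadata(reply)
print(meta["Transcript"], meta["SpeechActivity"])
```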
Use Cases
Speech Analysis
Customer Service Dialogue Analysis
Analyze speaker characteristics and emotional states in customer service conversations.
Identifies customer emotions and demographic information to help improve service quality.
Enhanced Speech Transcription
Add speaker metadata to speech transcriptions.
Provides richer transcription text information, including speaker characteristics.
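One way to realise this enrichment is to prefix each transcript line with the predicted speaker attributes. The field names and rendering below are a sketch with hypothetical labels, not the model's own output format:

```python
def annotate_transcript(meta: dict) -> str:
    """Render one speaker turn as '[Gender, Age, Emotion] transcript'.
    Field names are hypothetical and mirror the metadata listed above."""
    tags = [meta.get(k, "unknown") for k in ("Gender", "Age", "Emotion")]
    return "[{}] {}".format(", ".join(tags), meta.get("Transcript", ""))

# Illustrative speaker turn:
turn = {"Transcript": "my order never arrived",
        "Gender": "Male", "Age": "Middle Age", "Emotion": "Angry"}
print(annotate_transcript(turn))
# -> [Male, Middle Age, Angry] my order never arrived
```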
Conversational Systems
Intelligent Voice Assistant
Build conversational agents capable of understanding speaker characteristics.
Delivers personalized responses based on speaker features.
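As a sketch of how predicted attributes could drive personalisation, a simple rule table can map the emotion label to a response style. The emotion labels and styles here are illustrative assumptions, not part of SpeechLLM itself:

```python
# Map a predicted emotion label to a response style for the assistant.
# Labels and styles are illustrative assumptions only.
STYLE_BY_EMOTION = {
    "Angry":   "apologetic, de-escalating",
    "Sad":     "empathetic, reassuring",
    "Happy":   "upbeat, concise",
    "Neutral": "neutral, informative",
}

def pick_response_style(meta: dict) -> str:
    """Choose a response style from the predicted Emotion field,
    falling back to a neutral style for unknown labels."""
    return STYLE_BY_EMOTION.get(meta.get("Emotion", ""),
                                "neutral, informative")

print(pick_response_style({"Emotion": "Angry"}))  # apologetic, de-escalating
```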