
SpeechLLM 1.5B

Developed by skit-ai
SpeechLLM is a multimodal large language model designed to predict speaker turn metadata in conversations, including speech activity, transcribed text, gender, age, accent, and emotion.
Downloads: 40
Release Time: 6/20/2024

Model Overview

SpeechLLM combines a HubertX audio encoder with the TinyLlama LLM, processing speech signals directly and generating rich speaker and content metadata.
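A minimal usage sketch follows, assuming the checkpoint is published on Hugging Face under skit-ai/speechllm-1.5B and ships custom model code exposing a generate_meta helper; the repository name, flag, and method signature are assumptions for illustration, so consult the official model card for the exact interface.

```python
# Minimal sketch, assuming the checkpoint is hosted on Hugging Face as
# "skit-ai/speechllm-1.5B" with custom model code providing a generate_meta
# helper (both assumptions; see the official model card for the real API).
from transformers import AutoModel

model = AutoModel.from_pretrained("skit-ai/speechllm-1.5B", trust_remote_code=True)

metadata = model.generate_meta(
    audio_path="call_snippet.wav",  # assumed input: a 16 kHz mono speech clip
    instruction="Give me the following information about the audio "
                "[SpeechActivity, Transcript, Gender, Age, Accent, Emotion]",
    max_new_tokens=500,
)
print(metadata)
```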

Model Features

Multimodal Processing Capability
Combines audio signal processing with language model capabilities to understand speech content and generate metadata.
Rich Metadata Prediction
Predicts speech activity, transcribed text, and speaker gender, age, accent, and emotion.
Diverse Dataset Training
Trained on multiple speech datasets including Common Voice and LibriSpeech, enhancing the model's generalization ability.

Model Capabilities

Speech Activity Detection
Automatic Speech Recognition
Speaker Gender Classification
Speaker Age Classification
Speaker Accent Classification
Emotion Recognition
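To make the capability list concrete, the sketch below parses a predicted metadata record covering these fields; the field names and value vocabulary are illustrative assumptions, not a guaranteed output schema.

```python
import json

# Illustrative metadata record covering the capabilities listed above.
# Field names and values are assumptions for illustration only.
raw_output = (
    '{"SpeechActivity": "True",'
    ' "Transcript": "I would like to check the status of my order.",'
    ' "Gender": "Female", "Age": "Young Adult",'
    ' "Accent": "American", "Emotion": "Neutral"}'
)

record = json.loads(raw_output)
print(record["Transcript"])
print(record["Gender"], record["Age"], record["Emotion"])
```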

Use Cases

Speech Analysis
Customer Service Dialogue Analysis
Analyze speaker characteristics and emotional states in customer service conversations.
Identifies customer emotions and demographic information to help improve service quality.
Enhanced Speech Transcription
Add speaker metadata to speech transcriptions.
Produces richer transcripts that include speaker characteristics alongside the text.
Conversational Systems
Intelligent Voice Assistant
Build conversational agents capable of understanding speaker characteristics.
Delivers personalized responses based on speaker features.
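As a sketch of how predicted speaker metadata could drive such personalization (the field names and response styles below are hypothetical, not part of SpeechLLM itself):

```python
# Hypothetical sketch: adapt an assistant's reply style from predicted
# speaker metadata. Field names and style rules are illustrative only.
def choose_reply_style(metadata: dict) -> str:
    emotion = metadata.get("Emotion", "Neutral")
    if emotion in ("Angry", "Frustrated"):
        return "empathetic tone, acknowledge the issue, offer escalation"
    if metadata.get("Age") == "Senior":
        return "slower pacing, plain wording, confirm each step"
    return "concise, neutral tone"

print(choose_reply_style({"Emotion": "Angry", "Age": "Adult"}))
```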