I

Ichigo Llama3.1 S Instruct V0.4

Developed by homebrewltd
A multimodal language model based on Llama-3 architecture, supporting audio and text input understanding with noise robustness and multi-turn dialogue capabilities
Downloads 486
Release Time : 11/8/2024

Model Overview

This model is a speech-text multimodal model developed based on the Llama-3 architecture, enhanced with supervised fine-tuning for speech understanding, specifically optimized for performance in noisy environments and multi-turn dialogue capabilities

Model Features

Multimodal input support
Natively supports audio and text input, capable of understanding speech content and generating text responses
Noise robustness
Incorporated noise suppression capability during training, maintaining good performance even in noisy environments
Multi-turn dialogue optimization
Enhanced dialogue coherence through training with newly added multi-turn speech dialogue data
Efficient training
Utilized torchtune library for FSDP2 training, optimizing training efficiency

Model Capabilities

Speech-to-text
Text generation
Multi-turn dialogue
Noisy environment understanding

Use Cases

Voice assistant
Intelligent voice assistant
Build smart assistants capable of understanding voice commands and responding
Achieved a score of 3.5 (GPT-4-O rating) in AudioBench evaluation
Speech transcription
Meeting transcription
Real-time transcription of meeting speech content into text
Educational applications
Language learning assistant
Helps learners practice English listening and speaking
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase