Ichigo Llama3.1 S Base V0.3
Ichigo Llama3.1 S Base V0.3 is a multimodal language model in the Llama3-S series developed by Homebrew Research. It natively understands both audio and text input, extending the Llama-3 architecture with speech understanding.
Downloads 33
Release Time: 9/9/2024
Model Overview
This model underwent continued pre-training on a 900-million-token speech dataset with an extended vocabulary, with the aim of improving the speech comprehension of large language models.
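As a rough illustration of what an extended vocabulary can mean for speech input, the sketch below appends a block of sound tokens after an existing text vocabulary and maps discrete audio codes to the new token IDs. The vocabulary size, codebook size, token names, and both helper functions are hypothetical and chosen only for this example; they are not taken from the model's actual tokenizer.

```python
# Hypothetical sizes and token names, used only for illustration.
BASE_VOCAB_SIZE = 1000   # assumed size of the original text vocabulary
NUM_SOUND_CODES = 512    # assumed number of discrete audio codes


def build_sound_vocab(base_size, num_codes):
    """Map new sound-token strings to IDs appended after the text vocabulary."""
    vocab = {
        "<|sound_start|>": base_size,
        "<|sound_end|>": base_size + 1,
    }
    for code in range(num_codes):
        vocab[f"<|sound_{code:04d}|>"] = base_size + 2 + code
    return vocab


def audio_codes_to_ids(codes, vocab):
    """Wrap a sequence of audio codes in start/end markers and convert to token IDs."""
    tokens = ["<|sound_start|>"]
    tokens += [f"<|sound_{c:04d}|>" for c in codes]
    tokens.append("<|sound_end|>")
    return [vocab[t] for t in tokens]


sound_vocab = build_sound_vocab(BASE_VOCAB_SIZE, NUM_SOUND_CODES)
ids = audio_codes_to_ids([3, 511, 0], sound_vocab)
print(ids)
```

Because the new IDs sit after the original text vocabulary, the embedding table has to grow by the same number of rows, and continued pre-training on speech data is what gives those new rows useful values.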
Model Features
Multimodal Input Support
Natively understands both audio and text input, going beyond the text-only input of traditional language models.
Speech Comprehension Optimization
Improves speech comprehension through continued pre-training on a dedicated speech dataset.
Efficient Training
Trained with the torchtune library, using its latest FSDP2 training code to improve training efficiency.
Model Capabilities
Audio Comprehension
Text Generation
Multimodal Input Processing
Use Cases
Speech Research
Speech Command Comprehension
Parses and understands voice input commands
Reported result: an MMLU score of 63.79 on specific test sets
Educational Research
Language Learning Assistance
Helps learners comprehend English speech input