Ichigo Llama3.1 S Base V0.3

Developed by homebrewltd
The Llama3-S series is a family of multimodal language models developed by Homebrew Research. The models natively understand both audio and text input, extending the Llama-3 architecture with speech understanding.
Downloads 33
Release Time: 9/9/2024

Model Overview

This model underwent continued pre-training on a 900-million-token speech dataset with an extended vocabulary, with the aim of improving the speech comprehension of large language models.
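To make the idea of an extended vocabulary concrete, the following is a minimal sketch of how discrete audio codes could be mapped to new token IDs appended after a text vocabulary. It is an illustration only: the token names (`<|sound_start|>`, `<|sound_end|>`, `<|sound_0000|>`), the toy vocabulary, and both helper functions are assumptions made for this example and are not taken from the model's actual tokenizer or training code.

```python
# Hypothetical sketch: token names and helpers are illustrative assumptions,
# not the model's real tokenizer.

def extend_vocab(vocab: dict[str, int], num_sound_tokens: int) -> dict[str, int]:
    """Return a copy of `vocab` with sound tokens appended after the existing IDs."""
    extended = dict(vocab)  # leave the caller's vocabulary untouched
    next_id = max(extended.values()) + 1 if extended else 0
    new_tokens = ["<|sound_start|>", "<|sound_end|>"]
    new_tokens += [f"<|sound_{i:04d}|>" for i in range(num_sound_tokens)]
    for name in new_tokens:
        if name not in extended:
            extended[name] = next_id
            next_id += 1
    return extended


def encode_audio_codes(codes: list[int], vocab: dict[str, int]) -> list[int]:
    """Wrap a sequence of discrete audio codes in start/end markers as token IDs."""
    ids = [vocab["<|sound_start|>"]]
    ids += [vocab[f"<|sound_{code:04d}|>"] for code in codes]
    ids.append(vocab["<|sound_end|>"])
    return ids


# Toy text vocabulary standing in for the base model's vocabulary.
toy_vocab = {"<pad>": 0, "hello": 1, "world": 2}
extended = extend_vocab(toy_vocab, num_sound_tokens=4)

# Pretend an audio quantizer produced the codes 3, 0, 2 for a short clip.
ids = encode_audio_codes([3, 0, 2], extended)
print(ids)
```

In a real setup the model's embedding matrix would also need to grow to cover the new IDs, and the new rows would be learned during the continued pre-training described above.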

Model Features

Multimodal Input Support
Natively supports audio and text input comprehension, expanding the capability boundaries of traditional language models.
Speech Comprehension Optimization
Significantly improves speech comprehension through continued pre-training on a specialized speech dataset.
Efficient Training
Uses the FSDP2 training code from the torchtune library to improve training efficiency.

Model Capabilities

Audio Comprehension
Text Generation
Multimodal Input Processing

Use Cases

Speech Research
Speech Command Comprehension
Parses and understands voice input commands
Achieved an MMLU score of 63.79 on specific test sets
Educational Research
Language Learning Assistance
Helps learners comprehend English speech input