# 🚀 Llama3-S: Sound Instruction Language Model
A family of models that natively understand audio and text input, developed to improve sound understanding capabilities for research applications.
## 🚀 Quick Start
This README covers the Llama3-S family of models: model details, intended use, training process, citation information, and acknowledgements.
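As a starting point, the snippet below is a minimal, text-only sketch of loading the released checkpoint with Hugging Face `transformers`. It is illustrative rather than an official recipe: the repo id is taken from the citation below, and audio input additionally requires encoding sound into the model's WhisperVQ sound tokens, which is not covered here.

```python
# Minimal text-only sketch (assumed usage; audio input requires WhisperVQ
# sound-token preprocessing that is not shown here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "homebrewltd/llama3.1-s-2024-08-15"  # repo id from the citation below

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Describe the sound of rain.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```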
## ✨ Features
- Natively understands audio and text input.
- Continual pretraining on an expanded vocabulary.
- Primarily intended for research applications to improve sound understanding capabilities.
## 📚 Documentation
### 📖 Model Details
We have developed and released the Llama3-S family of models, which natively understands both audio and text input.
We continually pretrain the expanded-vocabulary checkpoint homebrewltd/llama3.1-s-whispervq-init on 900M tokens from the homebrewltd/raw-speech-whispervq-v1 dataset.
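For intuition, expanding a text LLM's vocabulary with discrete sound tokens typically looks like the sketch below. The base model id, token naming (`<|sound_0000|>` etc.), and codebook size are illustrative assumptions; the actual expanded vocabulary is the one defined in homebrewltd/llama3.1-s-whispervq-init.

```python
# Illustrative sketch of vocabulary expansion with discrete sound tokens.
# Base model, token names, and count are assumptions, not the official setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# One new token per WhisperVQ codebook entry (512 here, as an assumption).
sound_tokens = [f"<|sound_{i:04d}|>" for i in range(512)]
tokenizer.add_tokens(sound_tokens, special_tokens=True)

# Grow the embedding and LM-head matrices to cover the new token ids,
# then continually pretrain on interleaved speech-token and text data.
model.resize_token_embeddings(len(tokenizer))
```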
| Property | Details |
|---|---|
| Model Developers | Homebrew Research |
| Input | Text and sound |
| Output | Text |
| Model Architecture | Llama-3 |
| Language(s) | English |
### 🎯 Intended Use
Intended Use Cases: This family is primarily intended for research applications. This version aims to further improve the LLM's sound understanding capabilities.
### ⚠️ Important Note
The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.
### ⚙️ Training Process
Training Metrics: below is a snapshot of the training loss curve.

MMLU:
| Model | MMLU Score |
|---|---|
| llama3.1-instruct-8b | 69.40 |
| ichigo-llama3.1-s-v0.3: phase 3 | 63.79 |
| ichigo-llama3.1-s-v0.3: phase 2 | 63.08 |
| ichigo-llama3.1-s-base-v0.3 | 42.11 |
| llama3.1-s-instruct-v0.2 | 50.27 |
### 💻 Hardware
- GPU Configuration: Cluster of 10x NVIDIA A6000-48GB.
- GPU Usage:
  - Continual Training: 30 hours.
### ⚙️ Training Arguments
We use the torchtune library for its up-to-date FSDP2 distributed training implementation.
| Parameter | Continual Training |
|---|---|
| Epoch | 1 |
| Global batch size | 480 |
| Learning Rate | 2e-4 |
| Learning Scheduler | Cosine with warmup |
| Optimizer | AdamW fused |
| Warmup Steps | 50 |
| Weight Decay | 0.01 |
| Max Sequence Length | 512 |
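To make the schedule concrete, here is a minimal PyTorch sketch of the optimizer and learning-rate schedule implied by the table (fused AdamW, lr 2e-4, weight decay 0.01, 50 warmup steps, cosine decay). It illustrates the hyperparameters only; the actual run uses torchtune's FSDP2 recipe, and `total_steps` and the model are placeholders.

```python
# Illustrative PyTorch sketch of the optimizer/schedule from the table above;
# the real training uses torchtune's FSDP2 recipe, not this snippet.
import math
import torch

model = torch.nn.Linear(8, 8)          # placeholder for the Llama-3 model
total_steps, warmup_steps = 1000, 50   # total_steps is a placeholder

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,
    weight_decay=0.01,
    fused=torch.cuda.is_available(),   # fused AdamW requires a CUDA device
)

def lr_lambda(step: int) -> float:
    # Linear warmup over the first 50 steps, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```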
## 📝 Citation Information
BibTeX:
@article{llama3s2024,
  title={Llama3-S: Sound Instruction Language Model},
  author={Homebrew Research},
  year={2024},
  month={August},
  url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}
}
## 🙏 Acknowledgement
## 📄 License
This project is licensed under the Apache-2.0 license.