Arsh Llm
Developed by arshiaafshani
Arsh LLM is an open-source large language model designed for research. It was pretrained on the olmo-mix-1124 dataset using a T4 GPU, with a total training time of approximately 4-5 days.
Downloads 162
Release Time: 4/23/2025
Model Overview
This project aims to demonstrate that training large language models does not necessarily require top-tier hardware: optimized architectural design and phased training can make development efficient on modest resources. The current version is an initial iteration and requires further training.
Model Features
Hardware-Friendly Training
Training was completed on a T4 GPU. A phased training strategy (8 parts, each taking 1-2 days) lowers the hardware barrier.
Mixed Dataset Training
Combines initial pretraining on the Pile dataset, for stable model performance, with main training on the olmo-mix-1124 dataset.
Open-Source Architecture Design
Draws on the GPT-NeoX and Llama technical documentation and incorporates AI-assisted design to optimize the architecture (effectiveness pending verification).
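The phased training strategy described under Hardware-Friendly Training can be illustrated with a minimal sketch. It is not the author's training code: the function names (phase_bounds, run_phase), the JSON checkpoint format, and the step counts are all hypothetical, and the optimizer step is a placeholder. It only shows the general idea of splitting one long run into 8 resumable parts, so that each part fits into a separate session on limited hardware.

```python
import json
import os

NUM_PHASES = 8  # from the description above: training is split into 8 parts


def phase_bounds(total_steps, num_phases):
    """Split total_steps into num_phases contiguous (start, end) step ranges."""
    base, extra = divmod(total_steps, num_phases)
    bounds, start = [], 0
    for i in range(num_phases):
        # The first `extra` phases take one additional step each.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def run_phase(ckpt_path, total_steps, num_phases=NUM_PHASES):
    """Run the next unfinished phase, resuming from the checkpoint if present."""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"phase": 0, "step": 0}  # hypothetical checkpoint layout

    if state["phase"] >= num_phases:
        return state  # every phase has already been completed

    start, end = phase_bounds(total_steps, num_phases)[state["phase"]]
    for step in range(start, end):
        # Placeholder: a real run would perform one optimizer step here.
        state["step"] = step + 1

    state["phase"] += 1
    with open(ckpt_path, "w") as f:
        json.dump(state, f)  # persist progress for the next session
    return state
```

Under these assumptions, each training session would call run_phase once with the same checkpoint path. After eight calls, the recorded step count equals total_steps, and further calls leave the state unchanged.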
Model Capabilities
Text Generation
Research Assistance
Use Cases
Research Field
Assisted Literature Generation
Helps researchers quickly generate draft papers or technical documents