Arsh-llm Open-source Large Language Model - Free to Boost the Rapid Development of Scientific Research!

Arsh LLM

Developed by arshiaafshani

Arsh LLM is an open-source large language model designed for research. It was pretrained on the olmo-mix-1124 dataset using a T4 GPU, with a total training time of approximately 4-5 days.

Large Language Model

PyTorch

Open Source License: MIT
#Lightweight Pretraining #Research Assistance Tools #Mixed Dataset Optimization

Downloads 162

Release Time: 4/23/2025

Model Overview

This project aims to demonstrate that large models do not necessarily require top-tier hardware, and that efficient development is possible through optimized architectural design and phased training. The current version is an initial iteration and requires further training.

Model Features

Hardware-Friendly Training

Training was completed on a T4 GPU. A phased training strategy (8 parts, each taking 1-2 days) lowers the hardware barrier.
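The phased strategy can be illustrated with a minimal sketch. It assumes the data is split into 8 contiguous, near-equal parts and that progress is recorded in a small JSON file so a later session can resume; the source states neither detail. The names `phase_bounds`, `run_phases`, and the `train_phase` callback are hypothetical.

```python
import json
from pathlib import Path

NUM_PHASES = 8  # from the model card: training was split into 8 parts


def phase_bounds(num_examples, num_phases=NUM_PHASES):
    """Split [0, num_examples) into contiguous, near-equal (start, end) ranges."""
    base, extra = divmod(num_examples, num_phases)
    bounds, start = [], 0
    for i in range(num_phases):
        # The first `extra` phases take one additional example each.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def run_phases(num_examples, train_phase, state_path):
    """Run the remaining phases, saving progress after each one."""
    state_file = Path(state_path)
    # Assumed state format: {"done": <number of finished phases>}.
    done = json.loads(state_file.read_text())["done"] if state_file.exists() else 0
    for idx, (start, end) in enumerate(phase_bounds(num_examples)):
        if idx < done:
            continue  # this phase finished in an earlier session
        # Hypothetical callback: trains on examples [start, end) and saves a checkpoint.
        train_phase(idx, start, end)
        state_file.write_text(json.dumps({"done": idx + 1}))
```

Splitting the run this way means that if a session is interrupted, only the current part is lost rather than the whole run.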

Mixed Dataset Training

The model is first pretrained on part of the PILE dataset for stability, then receives its main training on the olmo-mix-1124 dataset.

Open-Source Architecture Design

The architecture draws on the GPT-NeoX and Llama technical documentation and was optimized with AI assistance (effectiveness pending verification).

Model Capabilities

Text Generation

Research Assistance

Use Cases

Research Field

Literature Drafting Assistance

Helps researchers quickly generate draft papers or technical documents.

🚀 Arsh LLM

Arsh LLM is a newly developed project designed to assist in research. It aims to demonstrate that large models don't necessarily require high-end hardware.

🚀 Quick Start

The original document provides no quick-start instructions.

✨ Features

Arsh LLM is a research project with the following features:

It is trained on the olmo-mix-1124 dataset using a T4 GPU.
The architecture was designed with reference to the GPT-NeoX and Llama documentation, with AI assistance for optimization.
Initial weights are calculated using phi-4.
It is first trained on a part of the PILE dataset for stability and then on the olmo-mix-1124 dataset.
The merged model is fine-tuned using small conversational open-source datasets.

📦 Installation

The original document provides no installation steps.

💻 Usage Examples

The original document provides no code examples.
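Although the model card includes no example, its information table lists `transformers` as the library and text-generation as the pipeline tag, so loading the model might look like the sketch below. The repository id `arshiaafshani/Arsh-llm` is an assumption based on the developer and model names, and the helpers `build_generator` and `generate` are hypothetical. Check the actual model page before relying on either.

```python
# Assumed Hub repository id; it is not stated in the model card, adjust as needed.
MODEL_ID = "arshiaafshani/Arsh-llm"


def build_generator(model_id=MODEL_ID):
    """Create a text-generation pipeline for the given checkpoint."""
    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def generate(prompt, max_new_tokens=64, generator=None):
    """Return the generated text for `prompt` (prompt plus continuation)."""
    if generator is None:
        generator = build_generator()  # downloads the weights on first use
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    # The pipeline returns a list of dicts with a "generated_text" key.
    return outputs[0]["generated_text"]
```

Calling `generate("Large language models are")` would download the checkpoint and return the prompt followed by the model's continuation. Since this is an early version that still needs training, output quality may be limited.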

📚 Documentation

Model Details

Model Description

Arsh LLM is my latest research project. It requires more training, and this is just one of its early versions. My objective is to show that large models don't need powerful hardware, at least given the tools now available to speed up the process.

First, I designed an architecture using the GPT-NeoX and Llama documentation, along with AI, to achieve the best optimization (I'm not certain it's fully optimized). Then I created a model and calculated its initial weights using phi-4. After that, I trained it on a portion of the PILE dataset to stabilize the model for use. Next, I trained a model named arshGpt on the olmo-mix-1124 dataset. The goal was to make it easier to convert data into my new large-scale model. This release is a merged model, fine-tuned on some small conversational open-source datasets. I believe its performance has improved.

📄 License

I've chosen the MIT license for this model, which is the best license for experiencing freedom 😅!

Information Table

Property	Details
Library Name	transformers
Pipeline Tag	text-generation
Training Datasets	allenai/olmo-mix-1124
License	MIT
