🚀 Arsh-llm: A Compact 500M-Parameter Powerhouse
Arsh-llm is a 500-million-parameter language model based on the Llama architecture. It excels at generating creative stories, coherent text, and functional code. Pretrained for roughly 35 hours on a single NVIDIA T4 GPU with a well-curated set of small but powerful datasets, then fine-tuned for about 20 hours on conversational data, it is an efficient text generator with plenty of headroom. With a training loss between 1.2 and 1.9, it has shown promising results and is ready for further improvement with more training. This is just the beginning!
🚀 Quick Start
To use Arsh-llm, you can load it directly from Hugging Face:
```python
import torch
from transformers import pipeline, set_seed

model_name = "arshiaafshani/Arsh-llm"

# Load the model as a text-generation pipeline (GPU if available, otherwise CPU).
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,
)
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

set_seed(42)

print("Arsh-llm is ready! Type 'exit' to end the conversation.")

conversation_history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exited from the chat. Bye!")
        break

    conversation_history.append({"role": "user", "content": user_input})

    # Render the conversation with the model's chat template and open a new
    # assistant turn for the reply.
    prompt = chatbot.tokenizer.apply_chat_template(
        conversation_history, tokenize=False, add_generation_prompt=True
    )

    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20,
    )

    # The pipeline returns the prompt plus the completion; keep only the new text.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the reply so the next turn has the full conversation as context.
    conversation_history.append({"role": "assistant", "content": bot_response})
```
✨ Features
- Creative Storytelling: Generate engaging short stories or narrative prompts (see the single-prompt example after this list).
- Code Generation: Produce functional code snippets for various programming tasks.
- Conversational AI: Power chatbots or assistants with natural dialogue.
- Educational Assistance: Assist with math problem-solving or explain concepts step by step.
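As a quick illustration of the storytelling and code-generation use cases, the same `pipeline` API from the Quick Start can be called on single prompts. This is only a minimal sketch: the prompts are made-up examples and the sampling settings simply mirror the ones shown above.

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="arshiaafshani/Arsh-llm")
set_seed(42)

# Example prompts covering two of the use cases above (story + code completion).
prompts = [
    "Write a short story about a lighthouse keeper who finds a message in a bottle.",
    "# Python function that checks whether a string is a palindrome\ndef is_palindrome(s):",
]

for prompt in prompts:
    out = generator(prompt, do_sample=True, max_new_tokens=100, temperature=0.6, top_k=50)
    print(out[0]["generated_text"])
    print("-" * 40)
```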
📦 Installation
Since Arsh-llm uses the Hugging Face Transformers library, you can install the necessary dependencies via pip:
```bash
pip install transformers torch
```
📚 Documentation
Model Overview
| Property | Details |
| --- | --- |
| Architecture | Llama-based causal language model |
| Parameters | 500M |
| Context Length | 128 tokens |
| Pretraining Duration | ~35 hours on NVIDIA T4 GPU |
| Fine-tuning Duration | ~20 hours on conversational datasets |
| Training Loss | 1.2–1.9 (with room to improve!) |
| Library | Transformers (Hugging Face) |
| License | MIT |
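As a quick sanity check on these numbers, the configuration and parameter count can be inspected directly from the published checkpoint. This is a minimal sketch: `max_position_embeddings` is where a Llama-style config usually stores the context window, which is an assumption here rather than something stated in this card.

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("arshiaafshani/Arsh-llm")
# Assumption: Llama-style configs expose the context window as max_position_embeddings.
print("Context length:", getattr(config, "max_position_embeddings", "n/a"))

model = AutoModelForCausalLM.from_pretrained("arshiaafshani/Arsh-llm")
print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.0f}M")
```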
Datasets
Arsh-llm was trained on a diverse set of datasets to ensure versatility in storytelling, text generation, and code-related tasks (a short loading example follows this list):
- roneneldan/TinyStories: Short, creative stories for narrative generation.
- Salesforce/wikitext: Wikipedia-based text for general knowledge and coherence.
- abhinand/alpaca-gpt4-sharegpt: Instruction-based conversational data for task-oriented responses.
- shibing624/sharegpt_gpt4: High-quality conversational data for chat-like interactions.
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions: Math problems with solutions to boost logical reasoning.
Fine-tuning was performed on a structured ShareGPT chat template to enhance conversational abilities, making Arsh-llm a great starting point for dialogue-based applications.
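All of the datasets above are hosted on the Hugging Face Hub and can be pulled with the `datasets` library. The snippet below is only a sketch; the split and config names are assumptions about how those repositories are published, not part of the original training recipe.

```python
from datasets import load_dataset

# Pull two of the pretraining sources listed above.
# Assumption: "train" split for TinyStories and the "wikitext-103-raw-v1" config for wikitext.
tiny_stories = load_dataset("roneneldan/TinyStories", split="train")
wikitext = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1", split="train")

print(tiny_stories[0]["text"][:200])
print(wikitext[0]["text"][:200])
```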
Training Details
- Pretraining: Conducted on a T4 GPU for ~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.
- Fine-tuning: 20 hours on ShareGPT-based conversational data with a structured chat template to enhance dialogue capabilities (a minimal setup sketch follows this list).
- Hardware: NVIDIA T4 GPU (15GB VRAM).
- Training Loss: Achieved 1.2–1.9, indicating solid performance with significant potential for improvement through extended training.
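Exact hyperparameters are not published in this card, so the following is only a rough sketch of how a ShareGPT-style fine-tuning run on the listed data could be set up with the standard Trainer API. Every hyperparameter value, the dataset split, and the field names (`conversations`, `from`, `value`) are assumptions, not the recipe actually used.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "arshiaafshani/Arsh-llm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during batching
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumption: ShareGPT schema with a "conversations" list of {"from", "value"} turns.
raw = load_dataset("shibing624/sharegpt_gpt4", split="train")

def to_text(example):
    messages = [
        {"role": "user" if turn["from"] == "human" else "assistant", "content": turn["value"]}
        for turn in example["conversations"]
    ]
    # Render each conversation through the model's structured chat template.
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

def tokenize(example):
    # Truncate to the 128-token context window described above.
    return tokenizer(example["text"], truncation=True, max_length=128)

dataset = raw.map(to_text).map(tokenize, remove_columns=raw.column_names + ["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="arsh-llm-sft",
        per_device_train_batch_size=8,   # assumption: fits the 15GB T4
        num_train_epochs=1,              # assumption
        learning_rate=2e-5,              # assumption
        fp16=True,                       # mixed precision on the T4
        logging_steps=100,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```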
Limitations
- Current Stage: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models.
- Dataset Size: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential.
- Context Length: Limited to 128 tokens, which may constrain performance on longer sequences (see the token-budget check after this list).
- Not Production-Ready: This model is best used as a base for further fine-tuning rather than as a standalone solution.
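Because of the 128-token window, it is worth checking how much of the context a prompt already consumes before asking for a long completion. A minimal sketch (the prompt is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

prompt = "Write a short story about a robot who learns to paint."
n_tokens = len(tokenizer(prompt)["input_ids"])

# Prompt tokens and generated tokens share the same 128-token window
# (see the model overview above), so budget max_new_tokens accordingly.
context_window = 128
print(f"Prompt uses {n_tokens} tokens; {context_window - n_tokens} remain for generation.")
```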
Future Plans
- Extended Pretraining: Leveraging larger datasets for broader knowledge and better generalization.
- Conversational Fine-tuning: Enhancing dialogue capabilities with advanced post-training techniques.
- Benchmarking: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on tasks like MMLU, HumanEval, and GSM8K.
- Community Feedback: Incorporating user insights to refine and improve the model.
🔧 Technical Details
The model is built on the Llama architecture as a causal language model. Pretraining on a T4 GPU for about 35 hours with a combination of small-scale yet powerful datasets teaches it basic text- and story-generation patterns, and the subsequent 20-hour fine-tuning on conversational datasets with a structured ShareGPT chat template further strengthens its dialogue capabilities. The training loss of 1.2–1.9 shows a good starting point, with significant room for improvement through additional training.
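To see how the structured chat template shapes the model's input, you can render a short conversation and print the exact string the model receives. This is a minimal sketch; the special tokens in the output depend on the chat template shipped with the tokenizer.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a causal language model is in one sentence."},
]

# Render the conversation exactly as the model sees it during chat fine-tuning
# and inference; add_generation_prompt opens the assistant turn to be completed.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```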
📄 License
This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it!
Acknowledgments
- Built with ❤️ by Arshia Afshani.
- Powered by the Hugging Face Transformers library.
- Thanks to the open-source community for providing the amazing datasets that made this model possible.
⚠️ Important Note
This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.
💡 Usage Tip
If you want to reproduce the results or continue training the model, use the same datasets and training setup described in the Training Details section; the exact hyperparameters are not listed here, so some experimentation may be needed.
Ready to take Arsh-llm for a spin? Clone it, train it, and let's make it a superstar together! For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo.