🚀 Arsh-llm: A Compact 500M-Parameter Powerhouse
Arsh-llm is a 500-million-parameter language model based on the Llama architecture. It excels at generating creative stories, coherent text, and functional code. Pretrained for roughly 35 hours on a single NVIDIA T4 GPU with a well-curated set of small but powerful datasets, then fine-tuned for about 20 hours on conversational data, it is an efficient text generator with plenty of headroom. With a training loss between 1.2 and 1.9, it has shown promising results and is ready for further improvement with more training. This is just the beginning!
🚀 Quick Start
To use Arsh-llm, you can load it directly from Hugging Face:
```python
import torch
from transformers import pipeline, set_seed

model_name = "arshiaafshani/Arsh-llm"

# Load the model as a text-generation pipeline (GPU if available, otherwise CPU).
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,
)
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

set_seed(42)

print("Arsh-llm is ready! Type 'exit' to end the conversation.")

conversation_history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exited from the chat. Bye!")
        break

    conversation_history.append({"role": "user", "content": user_input})

    # Render the conversation with the model's chat template and open a new
    # assistant turn for the reply.
    prompt = chatbot.tokenizer.apply_chat_template(
        conversation_history, tokenize=False, add_generation_prompt=True
    )

    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20,
    )

    # The pipeline returns the prompt plus the completion; keep only the new text.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the reply so the next turn has the full conversation as context.
    conversation_history.append({"role": "assistant", "content": bot_response})
```
✨ Features
- Creative Storytelling: Generate engaging short stories or narrative prompts (see the single-prompt example after this list).
- Code Generation: Produce functional code snippets for various programming tasks.
- Conversational AI: Power chatbots or assistants with natural dialogue.
- Educational Assistance: Assist with math problem-solving or explain concepts step by step.
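As a quick illustration of the storytelling and code-generation use cases, the same `pipeline` API from the Quick Start can be called on single prompts. This is only a minimal sketch: the prompts are made-up examples and the sampling settings simply mirror the ones shown above.

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="arshiaafshani/Arsh-llm")
set_seed(42)

# Example prompts covering two of the use cases above (story + code completion).
prompts = [
    "Write a short story about a lighthouse keeper who finds a message in a bottle.",
    "# Python function that checks whether a string is a palindrome\ndef is_palindrome(s):",
]

for prompt in prompts:
    out = generator(prompt, do_sample=True, max_new_tokens=100, temperature=0.6, top_k=50)
    print(out[0]["generated_text"])
    print("-" * 40)
```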
📦 Installation
Since Arsh-llm uses the Hugging Face Transformers library, you can install the necessary dependencies via pip:
```bash
pip install transformers torch
```
📚 Documentation
Model Overview
| Property | Details |
| --- | --- |
| Architecture | Llama-based causal language model |
| Parameters | 500M |
| Context Length | 128 tokens |
| Pretraining Duration | ~35 hours on NVIDIA T4 GPU |
| Fine-tuning Duration | ~20 hours on conversational datasets |
| Training Loss | 1.2–1.9 (with room to improve!) |
| Library | Transformers (Hugging Face) |
| License | MIT |
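As a quick sanity check on these numbers, the configuration and parameter count can be inspected directly from the published checkpoint. This is a minimal sketch: `max_position_embeddings` is where a Llama-style config usually stores the context window, which is an assumption here rather than something stated in this card.

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("arshiaafshani/Arsh-llm")
# Assumption: Llama-style configs expose the context window as max_position_embeddings.
print("Context length:", getattr(config, "max_position_embeddings", "n/a"))

model = AutoModelForCausalLM.from_pretrained("arshiaafshani/Arsh-llm")
print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.0f}M")
```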
Datasets
Arsh-llm was trained on a diverse set of datasets to ensure versatility in storytelling, text generation, and code-related tasks (a short loading example follows this list):
- roneneldan/TinyStories: Short, creative stories for narrative generation.
- Salesforce/wikitext: Wikipedia-based text for general knowledge and coherence.
- abhinand/alpaca-gpt4-sharegpt: Instruction-based conversational data for task-oriented responses.
- shibing624/sharegpt_gpt4: High-quality conversational data for chat-like interactions.
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions: Math problems with solutions to boost logical reasoning.
Fine-tuning was performed on a structured ShareGPT chat template to enhance conversational abilities, making Arsh-llm a great starting point for dialogue-based applications.
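All of the datasets above are hosted on the Hugging Face Hub and can be pulled with the `datasets` library. The snippet below is only a sketch; the split and config names are assumptions about how those repositories are published, not part of the original training recipe.

```python
from datasets import load_dataset

# Pull two of the pretraining sources listed above.
# Assumption: "train" split for TinyStories and the "wikitext-103-raw-v1" config for wikitext.
tiny_stories = load_dataset("roneneldan/TinyStories", split="train")
wikitext = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1", split="train")

print(tiny_stories[0]["text"][:200])
print(wikitext[0]["text"][:200])
```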
Training Details
- Pretraining: Conducted on a T4 GPU for ~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.
- Fine-tuning: 20 hours on ShareGPT-based conversational data with a structured chat template to enhance dialogue capabilities (a minimal setup sketch follows this list).
- Hardware: NVIDIA T4 GPU (15GB VRAM).
- Training Loss: Achieved 1.2–1.9, indicating solid performance with significant potential for improvement through extended training.
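Exact hyperparameters are not published in this card, so the following is only a rough sketch of how a ShareGPT-style fine-tuning run on the listed data could be set up with the standard Trainer API. Every hyperparameter value, the dataset split, and the field names (`conversations`, `from`, `value`) are assumptions, not the recipe actually used.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "arshiaafshani/Arsh-llm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during batching
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumption: ShareGPT schema with a "conversations" list of {"from", "value"} turns.
raw = load_dataset("shibing624/sharegpt_gpt4", split="train")

def to_text(example):
    messages = [
        {"role": "user" if turn["from"] == "human" else "assistant", "content": turn["value"]}
        for turn in example["conversations"]
    ]
    # Render each conversation through the model's structured chat template.
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

def tokenize(example):
    # Truncate to the 128-token context window described above.
    return tokenizer(example["text"], truncation=True, max_length=128)

dataset = raw.map(to_text).map(tokenize, remove_columns=raw.column_names + ["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="arsh-llm-sft",
        per_device_train_batch_size=8,   # assumption: fits the 15GB T4
        num_train_epochs=1,              # assumption
        learning_rate=2e-5,              # assumption
        fp16=True,                       # mixed precision on the T4
        logging_steps=100,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```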
Limitations
- Current Stage: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models.
- Dataset Size: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential.
- Context Length: Limited to 128 tokens, which may constrain performance on longer sequences (see the token-budget check after this list).
- Not Production-Ready: This model is best used as a base for further fine-tuning rather than as a standalone solution.
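Because of the 128-token window, it is worth checking how much of the context a prompt already consumes before asking for a long completion. A minimal sketch (the prompt is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

prompt = "Write a short story about a robot who learns to paint."
n_tokens = len(tokenizer(prompt)["input_ids"])

# Prompt tokens and generated tokens share the same 128-token window
# (see the model overview above), so budget max_new_tokens accordingly.
context_window = 128
print(f"Prompt uses {n_tokens} tokens; {context_window - n_tokens} remain for generation.")
```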
Future Plans
- Extended Pretraining: Leveraging larger datasets for broader knowledge and better generalization.
- Conversational Fine-tuning: Enhancing dialogue capabilities with advanced post-training techniques.
- Benchmarking: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on tasks like MMLU, HumanEval, and GSM8K.
- Community Feedback: Incorporating user insights to refine and improve the model.
🔧 Technical Details
The model is built on the Llama architecture as a causal language model. Pretraining on a T4 GPU for about 35 hours with a combination of small-scale yet powerful datasets teaches it basic text- and story-generation patterns, and the subsequent 20-hour fine-tuning on conversational datasets with a structured ShareGPT chat template further strengthens its dialogue capabilities. The training loss of 1.2–1.9 shows a good starting point, with significant room for improvement through additional training.
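To see how the structured chat template shapes the model's input, you can render a short conversation and print the exact string the model receives. This is a minimal sketch; the special tokens in the output depend on the chat template shipped with the tokenizer.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a causal language model is in one sentence."},
]

# Render the conversation exactly as the model sees it during chat fine-tuning
# and inference; add_generation_prompt opens the assistant turn to be completed.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```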
📄 License
This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it!
Acknowledgments
- Built with ❤️ by Arshia Afshani.
- Powered by the Hugging Face Transformers library.
- Thanks to the open-source community for providing the amazing datasets that made this model possible.
⚠️ Important Note
This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.
💡 Usage Tip
If you want to reproduce the results or continue training the model, use the same datasets and training setup described in the Training Details section; the exact hyperparameters are not listed here, so some experimentation may be needed.
Ready to take Arsh-llm for a spin? Clone it, train it, and let's make it a superstar together! For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo.