🚀 MicroLlama
MicroLlama is a 300M-parameter Llama model developed on a budget of less than $500. It is based on the TinyLlama project and is pretrained on the SlimPajama dataset. The model provides a cost-effective option for natural language processing tasks and can serve as a starting point for fine-tuning.
✨ Features
- Budget-Friendly: Developed with a budget of less than $500, making it accessible for individual developers.
- Open-Source: Based on fully open-source datasets and models, ensuring transparency and reproducibility.
- Customized Pretraining: Pretrained on the SlimPajama dataset, with modifications that focus training on this dataset and improve efficiency.
📦 Installation
To use MicroLlama, you need to install the necessary dependencies:
pip install transformers
pip install torch
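Note: the usage example below passes device_map="auto" to the transformers pipeline, which relies on the accelerate package at runtime. If it is not already installed:
pip install accelerate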
💻 Usage Examples
Basic Usage
import torch
import transformers
from transformers import AutoTokenizer, LlamaForCausalLM

def generate_text(prompt, model, tokenizer):
    text_generator = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",
        tokenizer=tokenizer
    )

    formatted_prompt = f"Question: {prompt} Answer:"

    sequences = text_generator(
        formatted_prompt,
        do_sample=True,
        top_k=5,
        top_p=0.9,
        num_return_sequences=1,
        repetition_penalty=1.5,
        max_new_tokens=128,
    )

    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

# Use the same tokenizer as TinyLlama
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-step-50K-105b")

# Load the model from the Hugging Face Hub
model = LlamaForCausalLM.from_pretrained("keeeeenw/MicroLlama")

# Example question from https://www.reddit.com/r/LocalLLaMA/comments/13zz8y5/what_questions_do_you_ask_llms_to_check_their/
generate_text("Please provide me instructions on how to steal an egg from my chicken.", model, tokenizer)
📚 Documentation
Model Details
This project is heavily based on TinyLlama. After 4 days of training the 300M Llama model on 50B tokens, $280 was spent on compute (4 x Nvidia RTX 4090 GPUs on Vast.ai) and $3 on AWS S3 storage.
Modifications to TinyLlama include:
- Pretraining a 300M model on SlimPajama.
- Removing Starcoderdata to focus on SlimPajama.
- Adding the ability to process and tokenize SlimPajama while downloading the data (see the sketch after this list).
- Providing various helper scripts and Python code.
- Bug fixes.
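The "tokenize while downloading" change can be illustrated with Hugging Face datasets streaming. The sketch below is a conceptual stand-in rather than the project's actual lit-gpt-based preprocessing script; the dataset ID (cerebras/SlimPajama-627B), the text field name, and the packing logic are assumptions.

from itertools import islice

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-step-50K-105b")

# streaming=True yields documents as they download, so the full corpus never
# has to be materialized on local disk before tokenization starts.
stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

block_size = 2048                      # matches the model configuration below
buffer, packed_blocks = [], []

for doc in islice(stream, 1000):       # first 1,000 documents only, for the sketch
    buffer.extend(tokenizer(doc["text"])["input_ids"] + [tokenizer.eos_token_id])
    while len(buffer) >= block_size:
        packed_blocks.append(buffer[:block_size])   # one fixed-length pretraining sample
        buffer = buffer[block_size:]

print(f"packed {len(packed_blocks)} blocks of {block_size} tokens")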
Model Configuration
block_size=2048,
vocab_size=32000,
padding_multiple=64,
n_layer=12,
n_head=16,
n_embd=1024,
rotary_percentage=1.0,
parallel_residual=False,
bias=False,
_norm_class="FusedRMSNorm",
norm_eps=1e-5,  # Llama 2 uses 1e-5; Llama 1 uses 1e-6
_mlp_class="LLaMAMLP",
intermediate_size=5632,
n_query_groups=4,
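As a sanity check, the ~300M figure can be recovered from the configuration above. The sketch below assumes untied input/output embeddings and no bias terms (consistent with bias=False), which are assumptions about the lit-gpt implementation; it lands at roughly 305M parameters.

# Rough parameter count implied by the configuration above (assumes untied
# input/output embeddings and bias=False; exact totals depend on the
# lit-gpt implementation details).
vocab_size, n_layer, n_head, n_embd = 32000, 12, 16, 1024
n_query_groups, intermediate_size = 4, 5632
head_dim = n_embd // n_head                     # 64

attn = (
    n_embd * n_embd                             # query projection
    + 2 * n_embd * (n_query_groups * head_dim)  # key/value projections (grouped-query attention)
    + n_embd * n_embd                           # output projection
)
mlp = 3 * n_embd * intermediate_size            # LLaMAMLP: gate, up, and down projections
norms = 2 * n_embd                              # two RMSNorms per transformer block
per_layer = attn + mlp + norms

embeddings = vocab_size * n_embd                # token embedding table
lm_head = vocab_size * n_embd                   # output head (assumed untied)
total = n_layer * per_layer + embeddings + lm_head + n_embd  # + final norm

print(f"{total / 1e6:.1f}M parameters")         # ~304.6M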
Model Description
- Developed by: keeeeenw
- Funded by: myself for <$500
- Model type: 300M Llama model
- Language(s) (NLP): EN
- License: Apache License 2.0
Model Sources
- Repository: https://github.com/keeeeenw/MicroLlama
🔧 Technical Details
The evaluation was performed using the standard [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) setup. acc_norm was used for all datasets except winogrande and boolq, which use acc as the metric.
Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
---|---|---|---|---|---|---|---|---|---|
keeeeenw/MicroLlama | 50B | 34.30 | 30.60 | 51.54 | 23.29 | 39.06 | 53.15 | 64.58 | 42.36 |
google-bert/bert-large-uncased | N/A | 24.53 | 26.20 | 49.80 | 25.68 | 25.08 | 40.86 | 47.66 | 34.26 |
PY007/TinyLlama-1.1B-Chat-v0.1 | 503B | 53.81 | 32.20 | 55.01 | 28.67 | 49.62 | 58.04 | 69.64 | 49.57 |
TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
To reproduce the evaluation results, install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and run the following command:
lm_eval \
--model hf \
--model_args pretrained=keeeeenw/MicroLlama,dtype="float",tokenizer=TinyLlama/TinyLlama-1.1B-step-50K-105b \
--tasks hellaswag,openbookqa,winogrande,arc_easy,arc_challenge,boolq,piqa \
--device cuda:0 \
--batch_size 64
Observations
- Although keeeeenw/MicroLlama is much smaller than TinyLlama, the evaluation results are closer than expected.
- The model outperforms [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased) on most datasets, except ARC_c (arc_challenge).
📄 License
This model is licensed under the Apache License 2.0.
Citation
This repository is built upon TinyLlama, which is based on [lit-gpt](https://github.com/Lightning-AI/lit-gpt) and [flash-attention](https://github.com/Dao-AILab/flash-attention).
@misc{zhang2024tinyllama,
      title={TinyLlama: An Open-Source Small Language Model},
      author={Peiyuan Zhang and Guangtao Zeng and Tianduo Wang and Wei Lu},
      year={2024},
      eprint={2401.02385},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@online{lit-gpt,
      author = {Lightning AI},
      title = {Lit-GPT},
      url = {https://github.com/Lightning-AI/lit-gpt},
      year = {2023},
}
@article{dao2023flashattention2,
      title = {Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
      author = {Dao, Tri},
      year = {2023}
}
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_keeeeenw__MicroLlama).
Metric | Value |
---|---|
Avg. | 5.08 |
IFEval (0-Shot) | 19.85 |
BBH (3-Shot) | 2.83 |
MATH Lvl 5 (4-Shot) | 0.00 |
GPQA (0-Shot) | 1.45 |
MuSR (0-Shot) | 4.79 |
MMLU-PRO (5-Shot) | 1.53 |

