# TinyLlama/TinyLlama-1.1B-Chat-v0.6-GGUF
Quantized GGUF model files for TinyLlama-1.1B-Chat-v0.6 from TinyLlama. This project provides quantized models to optimize resource usage and enhance performance.
## Quick Start

### Model Information
| Property | Details |
|---|---|
| Base Model | TinyLlama/TinyLlama-1.1B-Chat-v0.6 |
| Model Creator | TinyLlama |
| Model Name | TinyLlama-1.1B-Chat-v0.6 |
| Pipeline Tag | text-generation |
| Quantized By | afrideva |
| Tags | gguf, ggml, quantized, q2_k, q3_k_m, q4_k_m, q5_k_m, q6_k, q8_0 |
| License | apache-2.0 |
| Datasets | cerebras/SlimPajama-627B, bigcode/starcoderdata, OpenAssistant/oasst_top1_2023-08-25 |
| Inference | false |
| Language | en |
### Quantized Model Files
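To fetch a single quantized file rather than the whole repository, you can build the filename for the quantization you want and download just that file. The lowercase `<model>.<quant>.gguf` naming pattern below is an assumption; check the repo's file listing for the exact names.

```python
def quant_filename(model_name: str, quant: str) -> str:
    # Assumed naming convention for the GGUF files in this repo:
    # lowercase model name, then the quant tag, then the .gguf extension.
    return f"{model_name.lower()}.{quant}.gguf"

fname = quant_filename("TinyLlama-1.1B-Chat-v0.6", "q4_k_m")

# To fetch the file (requires `huggingface_hub` and network access):
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(
#     repo_id="afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF",
#     filename=fname,
# )
```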
## Features

### Original Model Goals
The TinyLlama project aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. With proper optimization, this can be achieved within a span of "just" 90 days using 16 A100-40G GPUs. Training started on 2023-09-01.
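A quick back-of-the-envelope check of the throughput implied by those numbers (a sketch, not a figure from this card):

```python
# Stated budget: 3 trillion tokens in 90 days on 16 A100-40G GPUs.
tokens = 3_000_000_000_000
seconds = 90 * 24 * 3600
gpus = 16

# Sustained per-GPU throughput required to hit the target.
per_gpu = tokens / seconds / gpus
print(f"~{per_gpu:,.0f} tokens/sec per GPU")
```

That works out to roughly 24k tokens per second per GPU, sustained for the full 90 days.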
### Model Advantages
- Compatibility: TinyLlama adopts exactly the same architecture and tokenizer as Llama 2, so it can be plugged into many open-source projects built on Llama.
- Compactness: With only 1.1B parameters, TinyLlama caters to a multitude of applications that demand a restricted computation and memory footprint.
### This Model's Training
## Usage Examples

### Basic Usage
```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v0.6", torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
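For intuition, `apply_chat_template` above turns the message list into a single tagged prompt string. A rough, dependency-free re-implementation is sketched below, assuming the model uses Zephyr-style `<|system|>` / `<|user|>` / `<|assistant|>` tags; in practice always prefer the tokenizer's own `apply_chat_template`, which is the source of truth.

```python
def zephyr_prompt(messages):
    # Each message becomes "<|role|>\n<content></s>"; a trailing
    # "<|assistant|>" mirrors add_generation_prompt=True.
    # The tag format is an assumption about this model's template.
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append("<|assistant|>")
    return "\n".join(parts) + "\n"

prompt = zephyr_prompt([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Ahoy?"},
])
print(prompt)
```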
## Documentation

### Prerequisites
You will need `transformers>=4.34`. Check the TinyLlama GitHub page for more information.
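If you want to verify the requirement at runtime rather than rely on pip constraints, a minimal version check can be sketched as follows (the tuple comparison below is a simplification of full PEP 440 version semantics):

```python
def version_tuple(v: str):
    # Parse "4.35.2" (or "4.35.0.dev0") into a comparable tuple of ints,
    # stopping at the first non-numeric component.
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

# Usage against the installed library (requires transformers):
# import transformers
# assert version_tuple(transformers.__version__) >= version_tuple("4.34")
```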
### Original Model Card
You can find more details about the original model at TinyLlama-1.1B.
### Model Training Details
This model is a chat-finetuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-955k-2T. The training process involves multiple steps and datasets, as described above.
## License

This project is licensed under the apache-2.0 license.