# Locutusque/TinyMistral-248M-GGUF
Quantized GGUF model files for TinyMistral-248M from Locutusque, aiming to provide efficient and accessible language model solutions.
## Quick Start
This README provides information about the quantized GGUF model files of TinyMistral-248M. You can use the links in the Quantized Model Files section below to access the different quantized versions of the model.
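If you prefer to fetch a file programmatically rather than through the browser, the sketch below uses `huggingface_hub`. The repository id and the GGUF filename are assumptions (the files are quantized by afrideva; check the Hub repository for the exact names) and should be adjusted to the file you actually want.

```python
# Minimal sketch, assuming the repo id and filename below; check the Hub for
# the exact names of the published GGUF files.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="afrideva/TinyMistral-248M-GGUF",  # assumed repo id (quantized by afrideva)
    filename="tinymistral-248m.q4_k_m.gguf",   # assumed filename; pick the quant level you need
)
print(model_path)  # local path to the downloaded .gguf file
```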
## Features
- Quantized Variants: offers multiple quantization methods such as fp16, q2_k, q3_k_m, etc., to meet different resource requirements (see the sketch after this list for one way to enumerate them).
- Small-Dataset Pretraining: demonstrates that language models can be pretrained without trillion-scale datasets, and can be pretrained on a single GPU (Titan V).
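Since the list of published files is not reproduced here, the following sketch shows one way to enumerate the available quantized variants directly from the Hub; the repository id is an assumption.

```python
# Minimal sketch, assuming the repo id below: list the .gguf files published
# in the quantized repository to see which variants (fp16, q2_k, ...) exist.
from huggingface_hub import list_repo_files

files = list_repo_files("afrideva/TinyMistral-248M-GGUF")  # assumed repo id
for name in sorted(f for f in files if f.endswith(".gguf")):
    print(name)
```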
## Installation
No specific installation steps are provided in the original document.
## Usage Examples
No code examples are provided in the original document.
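As a starting point, here is a minimal sketch of running a quantized file with `llama-cpp-python` (install it with `pip install llama-cpp-python`). The local filename is an assumption; use whichever GGUF file you downloaded, and note that this is a small base model rather than an instruction-tuned one.

```python
# Minimal sketch, assuming a locally downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="tinymistral-248m.q4_k_m.gguf",  # assumed local filename
    n_ctx=2048,                                 # context window for this session
)

output = llm(
    "Once upon a time,",  # plain-text prompt; the base model is not instruction-tuned
    max_tokens=64,
    temperature=0.8,
)
print(output["choices"][0]["text"])
```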
## Documentation

### Model Information
| Property | Details |
|----------|---------|
| Base Model | Locutusque/TinyMistral-248M |
| Datasets | Skylion007/openwebtext, JeanKaddour/minipile |
| Inference | false |
| Language | en |
| License | apache-2.0 |
| Model Creator | Locutusque |
| Model Name | TinyMistral-248M |
| Pipeline Tag | text-generation |
| Quantized By | afrideva |
| Tags | gguf, ggml, quantized, q2_k, q3_k_m, q4_k_m, q5_k_m, q6_k, q8_0 |
### Quantized Model Files
### Original Model Card
A pre-trained language model, based on the Mistral 7B model, scaled down to approximately 248 million parameters. This model has been trained on 7,488,000 examples. It is not intended for direct use, but for fine-tuning on a downstream task.
This model should have a context length of around 32,768 tokens. Safe serialization has been removed due to issues saving the model weights.
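Since the card positions the model as a base for fine-tuning, a minimal sketch of loading the original full-precision checkpoint with `transformers` follows; fine-tuning is normally done on that checkpoint rather than on the GGUF inference files.

```python
# Minimal sketch: load the original (non-GGUF) checkpoint as a starting point
# for fine-tuning on a downstream task.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plug `model` and `tokenizer` into your usual training loop or the
# transformers Trainer together with your downstream dataset.
```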
During evaluation on InstructMix, this model achieved an average perplexity score of 6.3. Further training epochs on different datasets are planned for this model.
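Perplexity here is the exponential of the mean token-level cross-entropy loss. The sketch below shows how such a score can be computed with `transformers` and `torch`; the `texts` list is a placeholder, not the actual InstructMix evaluation data.

```python
# Minimal sketch, assuming placeholder evaluation texts: perplexity as
# exp(mean cross-entropy loss) over a set of passages.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

texts = ["Example evaluation passage one.", "Example evaluation passage two."]  # placeholder data

losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        out = model(**enc, labels=enc["input_ids"])  # labels are shifted internally
        losses.append(out.loss.item())               # mean per-token NLL for this passage

perplexity = math.exp(sum(losses) / len(losses))     # simple average over passages
print(f"perplexity: {perplexity:.2f}")
```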
### Open LLM Leaderboard Evaluation Results (outdated)
Detailed results can be found here
| Metric | Value |
|--------|-------|
| Avg. | 24.18 |
| ARC (25-shot) | 20.82 |
| HellaSwag (10-shot) | 26.98 |
| MMLU (5-shot) | 23.11 |
| TruthfulQA (0-shot) | 46.89 |
| Winogrande (5-shot) | 50.75 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 0.74 |
### Model Purpose
The purpose of this model is to prove that trillion-scale datasets are not needed to pretrain a language model. Because only small datasets are required, this model was pretrained on a single GPU (Titan V).
## Technical Details
No specific technical implementation details are provided in the original document.
## License
This model is licensed under the Apache-2.0 license.