TinyMistral-248M: A Compact Language Model
This is a pre-trained language model derived from the Mistral 7B model and scaled down to approximately 248 million parameters. It was trained on 7,488,000 examples and is intended for fine-tuning on downstream tasks rather than direct use. The model is expected to support a context length of around 32,768 tokens. Due to issues with saving model weights, safe serialization has been removed.
Quick Start
This model is intended mainly for fine-tuning. The following inference parameters can be used as a starting point:
```json
{
  "parameters": {
    "do_sample": true,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 50,
    "max_new_tokens": 250,
    "repetition_penalty": 1.176
  }
}
```
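As a sketch of how these values map onto the `transformers` generation API (assuming that library is used), they correspond one-to-one to a `GenerationConfig`:

```python
# Minimal sketch: the suggested parameters expressed as a transformers GenerationConfig.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.5,
    top_p=0.5,
    top_k=50,
    max_new_tokens=250,
    repetition_penalty=1.176,
)
# This config can later be passed to model.generate(..., generation_config=generation_config).
```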
Features
- Compact Size: With approximately 248 million parameters, it is a scaled-down version of the Mistral 7B model.
- Small-scale Training: Trained on a relatively small number of examples (7,488,000).
- Fine-tuning Oriented: Intended for fine-tuning on downstream tasks rather than direct use.
- Long Context Length: Expected to support a context length of around 32,768 tokens.
Installation
No specific installation steps are provided in the original document.
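As an assumption, the model is used through the Hugging Face `transformers` library, which can be installed together with PyTorch:

```bash
pip install transformers torch
```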
Usage Examples
No code examples are provided in the original document.
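Since the original card contains no code, here is a minimal usage sketch, assuming a Hub id of `Locutusque/TinyMistral-248M` (an assumption) and the standard `transformers` causal-LM API. Because the model is intended for fine-tuning, raw generations may be of limited quality.

```python
# Minimal sketch: loading the model and generating text with the suggested parameters.
# The repository id is an assumption and may differ from the actual Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.5,
        top_p=0.5,
        top_k=50,
        max_new_tokens=250,
        repetition_penalty=1.176,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```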
Documentation
Evaluation Results
During evaluation on InstructMix, this model achieved an average perplexity score of 6.3. Further training epochs on different datasets are planned for this model.
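For reference, perplexity here is the exponential of the mean cross-entropy loss over held-out text. A minimal sketch of how such a score might be computed with `transformers` (the Hub id and example passage are assumptions):

```python
# Minimal sketch: computing perplexity as exp(mean cross-entropy loss).
# The repository id and the example passage are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "An example passage drawn from the evaluation dataset."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Supplying labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```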
The Open LLM Leaderboard evaluation results (now outdated) are shown below; detailed results are available here.
| Metric | Value |
|---|---|
| Avg. | 24.18 |
| ARC (25-shot) | 20.82 |
| HellaSwag (10-shot) | 26.98 |
| MMLU (5-shot) | 23.11 |
| TruthfulQA (0-shot) | 46.89 |
| Winogrande (5-shot) | 50.75 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 0.74 |
Purpose
The purpose of this model is to demonstrate that trillion-scale datasets are not needed to pretrain a language model. Because only a small dataset was required, the model could be pretrained on a single GPU (an NVIDIA Titan V).
Technical Details
- Training Datasets:
  - Skylion007/openwebtext
  - JeanKaddour/minipile
- Model Parameters: Scaled down to approximately 248 million parameters from the Mistral 7B model.
- Training Examples: 7,488,000 examples.
- Context Length: Around 32,768 tokens.
- Serialization: Safe serialization has been removed due to issues with saving model weights (see the sketch below).
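The serialization note likely refers to the `safe_serialization` option in `transformers`; under that assumption, a minimal sketch of saving the weights in the legacy PyTorch format looks like this:

```python
# Minimal sketch: saving weights with safetensors serialization disabled.
# The local path and the Hub id are illustrative assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Locutusque/TinyMistral-248M")  # assumed Hub id
# safe_serialization=False writes pytorch_model.bin instead of model.safetensors.
model.save_pretrained("./TinyMistral-248M-local", safe_serialization=False)
```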
License
This model is licensed under the Apache 2.0 license.