# Locutusque/TinyMistral-248M-GGUF
Quantized GGUF model files for TinyMistral-248M from Locutusque, aiming to provide efficient and accessible language model solutions.
## Quick Start
This README provides information about the quantized GGUF model files of TinyMistral-248M. You can use the links in the Quantized Model Files section below to access the different quantized versions of the model.
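If you prefer to fetch a file programmatically rather than through the browser, the sketch below uses `huggingface_hub`. The repository id and the GGUF filename are assumptions (the files are quantized by afrideva; check the Hub repository for the exact names) and should be adjusted to the file you actually want.

```python
# Minimal sketch, assuming the repo id and filename below; check the Hub for
# the exact names of the published GGUF files.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="afrideva/TinyMistral-248M-GGUF",  # assumed repo id (quantized by afrideva)
    filename="tinymistral-248m.q4_k_m.gguf",   # assumed filename; pick the quant level you need
)
print(model_path)  # local path to the downloaded .gguf file
```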
## Features
- Quantized Variants: offers multiple quantization methods such as fp16, q2_k, q3_k_m, etc., to meet different resource requirements (see the sketch after this list for one way to enumerate them).
- Small-Dataset Pretraining: demonstrates that language models can be pretrained without trillion-scale datasets, and can be pretrained on a single GPU (Titan V).
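Since the list of published files is not reproduced here, the following sketch shows one way to enumerate the available quantized variants directly from the Hub; the repository id is an assumption.

```python
# Minimal sketch, assuming the repo id below: list the .gguf files published
# in the quantized repository to see which variants (fp16, q2_k, ...) exist.
from huggingface_hub import list_repo_files

files = list_repo_files("afrideva/TinyMistral-248M-GGUF")  # assumed repo id
for name in sorted(f for f in files if f.endswith(".gguf")):
    print(name)
```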
## Installation
No specific installation steps are provided in the original document.
## Usage Examples
No code examples are provided in the original document.
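As a starting point, here is a minimal sketch of running a quantized file with `llama-cpp-python` (install it with `pip install llama-cpp-python`). The local filename is an assumption; use whichever GGUF file you downloaded, and note that this is a small base model rather than an instruction-tuned one.

```python
# Minimal sketch, assuming a locally downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="tinymistral-248m.q4_k_m.gguf",  # assumed local filename
    n_ctx=2048,                                 # context window for this session
)

output = llm(
    "Once upon a time,",  # plain-text prompt; the base model is not instruction-tuned
    max_tokens=64,
    temperature=0.8,
)
print(output["choices"][0]["text"])
```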
## Documentation

### Model Information
| Property | Details |
|----------|---------|
| Base Model | Locutusque/TinyMistral-248M |
| Datasets | Skylion007/openwebtext, JeanKaddour/minipile |
| Inference | false |
| Language | en |
| License | apache-2.0 |
| Model Creator | Locutusque |
| Model Name | TinyMistral-248M |
| Pipeline Tag | text-generation |
| Quantized By | afrideva |
| Tags | gguf, ggml, quantized, q2_k, q3_k_m, q4_k_m, q5_k_m, q6_k, q8_0 |
### Quantized Model Files
### Original Model Card
A pre-trained language model, based on the Mistral 7B model, scaled down to approximately 248 million parameters. This model has been trained on 7,488,000 examples. It is not intended for direct use, but for fine-tuning on a downstream task.
This model should have a context length of around 32,768 tokens. Safe serialization has been removed due to issues saving the model weights.
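Since the card positions the model as a base for fine-tuning, a minimal sketch of loading the original full-precision checkpoint with `transformers` follows; fine-tuning is normally done on that checkpoint rather than on the GGUF inference files.

```python
# Minimal sketch: load the original (non-GGUF) checkpoint as a starting point
# for fine-tuning on a downstream task.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plug `model` and `tokenizer` into your usual training loop or the
# transformers Trainer together with your downstream dataset.
```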
During evaluation on InstructMix, this model achieved an average perplexity score of 6.3. Further training epochs on different datasets are planned for this model.
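Perplexity here is the exponential of the mean token-level cross-entropy loss. The sketch below shows how such a score can be computed with `transformers` and `torch`; the `texts` list is a placeholder, not the actual InstructMix evaluation data.

```python
# Minimal sketch, assuming placeholder evaluation texts: perplexity as
# exp(mean cross-entropy loss) over a set of passages.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/TinyMistral-248M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

texts = ["Example evaluation passage one.", "Example evaluation passage two."]  # placeholder data

losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        out = model(**enc, labels=enc["input_ids"])  # labels are shifted internally
        losses.append(out.loss.item())               # mean per-token NLL for this passage

perplexity = math.exp(sum(losses) / len(losses))     # simple average over passages
print(f"perplexity: {perplexity:.2f}")
```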
### Open LLM Leaderboard Evaluation Results (outdated)
Detailed results can be found here
| Metric | Value |
|--------|-------|
| Avg. | 24.18 |
| ARC (25-shot) | 20.82 |
| HellaSwag (10-shot) | 26.98 |
| MMLU (5-shot) | 23.11 |
| TruthfulQA (0-shot) | 46.89 |
| Winogrande (5-shot) | 50.75 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 0.74 |
### Model Purpose
The purpose of this model is to prove that trillion-scale datasets are not needed to pretrain a language model. Because only small datasets are required, this model was pretrained on a single GPU (Titan V).
## Technical Details
No specific technical implementation details are provided in the original document.
## License
This model is licensed under the Apache-2.0 license.