TinyMistral-248M-v3
TinyMistral-248M-v3 is a small language model with 248M parameters. It is still in training and has processed approximately 21 billion tokens so far.
Downloads: 179
Release date: 2/5/2024
Model Overview
This model is a compact language model focused on text generation tasks, suitable for various natural language processing scenarios.
Model Features
Compact and efficient
248M parameter scale, suitable for deployment in resource-limited environments
Multi-dataset training
Trained on several public datasets, including TM-DATA-V2, TxT360, DCLM-baseline, OpenWebText, MiniPile, and English Project Gutenberg text (the full list appears in the metadata below; a loading sketch follows this section)
Continuous training
The model is still being trained, so the results reported below reflect an intermediate checkpoint (roughly 21 billion tokens seen so far)
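The training corpora named above are public Hugging Face dataset repositories. As a minimal, hedged sketch (assuming the `datasets` library; split names, configurations, and loading requirements vary per dataset and may need adjusting), they can be streamed for inspection like this:

```python
# Illustrative sketch only: stream a few records from two of the pretraining
# corpora named in this card. This is not the model's actual training code.
from datasets import load_dataset

for repo in ["Locutusque/TM-DATA-V2", "JeanKaddour/minipile"]:
    # streaming=True avoids downloading the full corpus; "train" split assumed
    ds = load_dataset(repo, split="train", streaming=True)
    sample = next(iter(ds))
    print(repo, "->", list(sample.keys()))
```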
Model Capabilities
Text generation
Natural language understanding
Instruction following
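Since the checkpoint is published on the Hugging Face Hub (it is referenced as M4-ai/TinyMistral-248M-v3 in the leaderboard links below), a minimal text-generation sketch with the `transformers` library might look as follows; the repository id and the generation settings are assumptions, not an official usage example:

```python
# Minimal sketch: load the checkpoint and sample a short completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "M4-ai/TinyMistral-248M-v3"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Question: Who wrote the Declaration of Independence?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,   # keep generations short for a 248M model
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```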
Use Cases
Education
History knowledge Q&A
Answering questions about high school world history and US history
Achieved 29.11% accuracy on the MMLU high school world history task (5-shot)
Legal
Legal Q&A
Answering professional legal and international law questions
Achieved 21.49% accuracy on the MMLU international law task (5-shot)
Medical
Medical knowledge Q&A
Answering clinical knowledge and medical genetics questions
Achieved 30.00% accuracy on the MMLU medical genetics task (5-shot)
Model card metadata (YAML frontmatter):

language:
- en
license: apache-2.0
datasets:
- Locutusque/TM-DATA-V2
- LLM360/TxT360
- mlfoundations/dclm-baseline-1.0
- Skylion007/openwebtext
- JeanKaddour/minipile
- eminorhan/gutenberg_en
model-index:
- name: TinyMistral-248M-v3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 16.39
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 1.78
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.0
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.0
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.15
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.47
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
The model is still in training; it has been trained on about 21 billion tokens so far.

Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
Open LLM Leaderboard | N/A | | | | | | | |
- arc_challenge | 1 | none | 25 | acc | ↑ | 0.2005 | ± | 0.0117 |
| | | none | 25 | acc_norm | ↑ | 0.2406 | ± | 0.0125 |
- gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.0083 | ± | 0.0025 |
| | | strict-match | 5 | exact_match | ↑ | 0.0000 | ± | 0.0000 |
- hellaswag | 1 | none | 10 | acc | ↑ | 0.2724 | ± | 0.0044 |
| | | none | 10 | acc_norm | ↑ | 0.2838 | ± | 0.0045 |
- mmlu | 2 | none | | acc | ↑ | 0.2290 | ± | 0.0035 |
- humanities | 2 | none | | acc | ↑ | 0.2380 | ± | 0.0062 |
- formal_logic | 1 | none | 5 | acc | ↑ | 0.2460 | ± | 0.0385 |
- high_school_european_history | 1 | none | 5 | acc | ↑ | 0.1818 | ± | 0.0301 |
- high_school_us_history | 1 | none | 5 | acc | ↑ | 0.2647 | ± | 0.0310 |
- high_school_world_history | 1 | none | 5 | acc | ↑ | 0.2911 | ± | 0.0296 |
- international_law | 1 | none | 5 | acc | ↑ | 0.2149 | ± | 0.0375 |
- jurisprudence | 1 | none | 5 | acc | ↑ | 0.2685 | ± | 0.0428 |
- logical_fallacies | 1 | none | 5 | acc | ↑ | 0.2209 | ± | 0.0326 |
- moral_disputes | 1 | none | 5 | acc | ↑ | 0.2457 | ± | 0.0232 |
- moral_scenarios | 1 | none | 5 | acc | ↑ | 0.2369 | ± | 0.0142 |
- philosophy | 1 | none | 5 | acc | ↑ | 0.1865 | ± | 0.0221 |
- prehistory | 1 | none | 5 | acc | ↑ | 0.1975 | ± | 0.0222 |
- professional_law | 1 | none | 5 | acc | ↑ | 0.2432 | ± | 0.0110 |
- world_religions | 1 | none | 5 | acc | ↑ | 0.3099 | ± | 0.0355 |
- other | 2 | none | | acc | ↑ | 0.2375 | ± | 0.0076 |
- business_ethics | 1 | none | 5 | acc | ↑ | 0.3200 | ± | 0.0469 |
- clinical_knowledge | 1 | none | 5 | acc | ↑ | 0.2226 | ± | 0.0256 |
- college_medicine | 1 | none | 5 | acc | ↑ | 0.1965 | ± | 0.0303 |
- global_facts | 1 | none | 5 | acc | ↑ | 0.1800 | ± | 0.0386 |
- human_aging | 1 | none | 5 | acc | ↑ | 0.3004 | ± | 0.0308 |
- management | 1 | none | 5 | acc | ↑ | 0.1942 | ± | 0.0392 |
- marketing | 1 | none | 5 | acc | ↑ | 0.2735 | ± | 0.0292 |
- medical_genetics | 1 | none | 5 | acc | ↑ | 0.3000 | ± | 0.0461 |
- miscellaneous | 1 | none | 5 | acc | ↑ | 0.2478 | ± | 0.0154 |
- nutrition | 1 | none | 5 | acc | ↑ | 0.2222 | ± | 0.0238 |
- professional_accounting | 1 | none | 5 | acc | ↑ | 0.2021 | ± | 0.0240 |
- professional_medicine | 1 | none | 5 | acc | ↑ | 0.1912 | ± | 0.0239 |
- virology | 1 | none | 5 | acc | ↑ | 0.2590 | ± | 0.0341 |
- social sciences | 2 | none | | acc | ↑ | 0.2203 | ± | 0.0075 |
- econometrics | 1 | none | 5 | acc | ↑ | 0.2368 | ± | 0.0400 |
- high_school_geography | 1 | none | 5 | acc | ↑ | 0.2020 | ± | 0.0286 |
- high_school_government_and_politics | 1 | none | 5 | acc | ↑ | 0.1865 | ± | 0.0281 |
- high_school_macroeconomics | 1 | none | 5 | acc | ↑ | 0.2205 | ± | 0.0210 |
- high_school_microeconomics | 1 | none | 5 | acc | ↑ | 0.2143 | ± | 0.0267 |
- high_school_psychology | 1 | none | 5 | acc | ↑ | 0.1908 | ± | 0.0168 |
- human_sexuality | 1 | none | 5 | acc | ↑ | 0.2672 | ± | 0.0388 |
- professional_psychology | 1 | none | 5 | acc | ↑ | 0.2386 | ± | 0.0172 |
- public_relations | 1 | none | 5 | acc | ↑ | 0.1727 | ± | 0.0362 |
- security_studies | 1 | none | 5 | acc | ↑ | 0.2367 | ± | 0.0272 |
- sociology | 1 | none | 5 | acc | ↑ | 0.2488 | ± | 0.0306 |
- us_foreign_policy | 1 | none | 5 | acc | ↑ | 0.2600 | ± | 0.0441 |
- stem | 2 | none | | acc | ↑ | 0.2157 | ± | 0.0073 |
- abstract_algebra | 1 | none | 5 | acc | ↑ | 0.2200 | ± | 0.0416 |
- anatomy | 1 | none | 5 | acc | ↑ | 0.1778 | ± | 0.0330 |
- astronomy | 1 | none | 5 | acc | ↑ | 0.1908 | ± | 0.0320 |
- college_biology | 1 | none | 5 | acc | ↑ | 0.2778 | ± | 0.0375 |
- college_chemistry | 1 | none | 5 | acc | ↑ | 0.2200 | ± | 0.0416 |
- college_computer_science | 1 | none | 5 | acc | ↑ | 0.2100 | ± | 0.0409 |
- college_mathematics | 1 | none | 5 | acc | ↑ | 0.2100 | ± | 0.0409 |
- college_physics | 1 | none | 5 | acc | ↑ | 0.2157 | ± | 0.0409 |
- computer_security | 1 | none | 5 | acc | ↑ | 0.2700 | ± | 0.0446 |
- conceptual_physics | 1 | none | 5 | acc | ↑ | 0.2638 | ± | 0.0288 |
- electrical_engineering | 1 | none | 5 | acc | ↑ | 0.2483 | ± | 0.0360 |
- elementary_mathematics | 1 | none | 5 | acc | ↑ | 0.2037 | ± | 0.0207 |
- high_school_biology | 1 | none | 5 | acc | ↑ | 0.1774 | ± | 0.0217 |
- high_school_chemistry | 1 | none | 5 | acc | ↑ | 0.2020 | ± | 0.0282 |
- high_school_computer_science | 1 | none | 5 | acc | ↑ | 0.2500 | ± | 0.0435 |
- high_school_mathematics | 1 | none | 5 | acc | ↑ | 0.2148 | ± | 0.0250 |
- high_school_physics | 1 | none | 5 | acc | ↑ | 0.2053 | ± | 0.0330 |
- high_school_statistics | 1 | none | 5 | acc | ↑ | 0.1481 | ± | 0.0242 |
- machine_learning | 1 | none | 5 | acc | ↑ | 0.3125 | ± | 0.0440 |
- truthfulqa_gen | 3 | none | 0 | bleu_acc | ↑ | 0.2362 | ± | 0.0149 |
| | | none | 0 | bleu_diff | ↑ | -1.0138 | ± | 0.2569 |
| | | none | 0 | bleu_max | ↑ | 7.9522 | ± | 0.4088 |
| | | none | 0 | rouge1_acc | ↑ | 0.2595 | ± | 0.0153 |
| | | none | 0 | rouge1_diff | ↑ | -1.9129 | ± | 0.4349 |
| | | none | 0 | rouge1_max | ↑ | 21.7885 | ± | 0.7307 |
| | | none | 0 | rouge2_acc | ↑ | 0.1200 | ± | 0.0114 |
| | | none | 0 | rouge2_diff | ↑ | -1.9771 | ± | 0.3475 |
| | | none | 0 | rouge2_max | ↑ | 9.0199 | ± | 0.5842 |
| | | none | 0 | rougeL_acc | ↑ | 0.2570 | ± | 0.0153 |
| | | none | 0 | rougeL_diff | ↑ | -1.8812 | ± | 0.4185 |
| | | none | 0 | rougeL_max | ↑ | 19.6284 | ± | 0.6850 |
- truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.1983 | ± | 0.0140 |
- truthfulqa_mc2 | 2 | none | 0 | acc | ↑ | 0.3861 | ± | 0.0147 |
- winogrande | 1 | none | 5 | acc | ↑ | 0.4972 | ± | 0.0141 |

Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
- mmlu | 2 | none | | acc | ↑ | 0.2290 | ± | 0.0035 |
- humanities | 2 | none | | acc | ↑ | 0.2380 | ± | 0.0062 |
- other | 2 | none | | acc | ↑ | 0.2375 | ± | 0.0076 |
- social sciences | 2 | none | | acc | ↑ | 0.2203 | ± | 0.0075 |
- stem | 2 | none | | acc | ↑ | 0.2157 | ± | 0.0073 |

Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
agieval_nous | 0 | none | | acc_norm | ↑ | 0.2133 | ± | 0.0081 |
- agieval_aqua_rat | 1 | none | 0 | acc | ↑ | 0.2047 | ± | 0.0254 |
| | | none | 0 | acc_norm | ↑ | 0.1969 | ± | 0.0250 |
- agieval_logiqa_en | 1 | none | 0 | acc | ↑ | 0.2043 | ± | 0.0158 |
| | | none | 0 | acc_norm | ↑ | 0.2304 | ± | 0.0165 |
- agieval_lsat_ar | 1 | none | 0 | acc | ↑ | 0.1739 | ± | 0.0250 |
| | | none | 0 | acc_norm | ↑ | 0.1957 | ± | 0.0262 |
- agieval_lsat_lr | 1 | none | 0 | acc | ↑ | 0.1549 | ± | 0.0160 |
| | | none | 0 | acc_norm | ↑ | 0.1608 | ± | 0.0163 |
- agieval_lsat_rc | 1 | none | 0 | acc | ↑ | 0.1636 | ± | 0.0226 |
| | | none | 0 | acc_norm | ↑ | 0.2119 | ± | 0.0250 |
- agieval_sat_en | 1 | none | 0 | acc | ↑ | 0.2670 | ± | 0.0309 |
| | | none | 0 | acc_norm | ↑ | 0.2621 | ± | 0.0307 |
- agieval_sat_en_without_passage | 1 | none | 0 | acc | ↑ | 0.2670 | ± | 0.0309 |
| | | none | 0 | acc_norm | ↑ | 0.2621 | ± | 0.0307 |
- agieval_sat_math | 1 | none | 0 | acc | ↑ | 0.2182 | ± | 0.0279 |
| | | none | 0 | acc_norm | ↑ | 0.2318 | ± | 0.0285 |
arc_challenge | 1 | none | 0 | acc | ↑ | 0.1945 | ± | 0.0116 |
| | | none | 0 | acc_norm | ↑ | 0.2372 | ± | 0.0124 |
truthfulqa_mc2 | 2 | none | 0 | acc | ↑ | 0.3861 | ± | 0.0147 |

Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
agieval_nous | 0 | none | | acc_norm | ↑ | 0.2133 | ± | 0.0081 |
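The tables above follow the output format of EleutherAI's lm-evaluation-harness. As a hedged sketch of how comparable numbers could be re-run with its Python API (the repository id, batch size, and harness version are assumptions, and exact scores may differ across harness versions):

```python
# Sketch: reproduce the 5-shot MMLU rows above with lm-evaluation-harness.
# Other tasks from the tables (arc_challenge, hellaswag, winogrande, ...) can be
# added to the task list with their corresponding few-shot settings.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=M4-ai/TinyMistral-248M-v3",  # assumed Hub repo id
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["mmlu"])
```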
Open LLM Leaderboard Evaluation Results
Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
Metric | Value |
---|---|
Avg. | 4.13 |
IFEval (0-Shot) | 16.39 |
BBH (3-Shot) | 1.78 |
MATH Lvl 5 (4-Shot) | 0.00 |
GPQA (0-shot) | 0.00 |
MuSR (0-shot) | 5.15 |
MMLU-PRO (5-shot) | 1.47 |