Model Overview

This is a small language model specialized for text generation and suited to a range of natural language processing scenarios.

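Model Features

Small and efficient

With about 248 million parameters, the model is suited to deployment in resource-constrained environments.

Trained on multiple datasets

Trained on several high-quality datasets, including TM-DATA-V2 and TxT360.

Ongoing training

The model is still being trained, and its performance continues to improve.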

Model Capabilities

Text generation

Natural language understanding

Instruction following
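
As a minimal sketch of the text generation capability, the snippet below loads the checkpoint with the Hugging Face `transformers` library. The repository ID is taken from the leaderboard URL in this card; that the checkpoint loads through `AutoModelForCausalLM` is an assumption, and the sampling settings are illustrative values rather than recommendations from the authors.

```python
# Repository ID as it appears in the Open LLM Leaderboard URL in this card.
MODEL_ID = "M4-ai/TinyMistral-248M-v3"


def generation_kwargs(max_new_tokens: int = 64) -> dict:
    """Illustrative sampling settings; these values are not from the model card."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return the prompt followed by its continuation."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(max_new_tokens))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The history of the Roman Empire")` downloads the weights on first use and returns sampled text, so the output differs between runs.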

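Use Cases

Education

History knowledge Q&A

Answers questions on high school world history and US history.

Scores 29.11% accuracy on the MMLU high school world history test (5-shot).

Law

Legal question answering

Answers questions on professional law and international law.

Scores 21.49% accuracy on the MMLU international law test (5-shot).

Medicine

Medical knowledge Q&A

Answers questions on clinical knowledge and medical genetics.

Scores 30% accuracy on the MMLU medical genetics test (5-shot).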
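
The accuracies quoted above can be put in context by comparing them with a random-guess baseline. The sketch below assumes four answer options per MMLU question, so chance is 25%, and uses the standard errors from the evaluation table in this card. The 1.96 cutoff is the usual two-sided 5% normal threshold, chosen here only for illustration.

```python
# Accuracy and standard error copied from the evaluation table in this card.
REPORTED = {
    "high_school_world_history": (0.2911, 0.0296),
    "international_law": (0.2149, 0.0375),
    "medical_genetics": (0.3000, 0.0461),
}

# Assumption: four answer options per question, so random guessing scores 0.25.
CHANCE = 0.25

# Assumption: conventional two-sided 5% threshold for a normal approximation.
Z_THRESHOLD = 1.96


def z_score(acc: float, stderr: float, baseline: float = CHANCE) -> float:
    """Number of standard errors separating an accuracy from the baseline."""
    return (acc - baseline) / stderr


def summarize(reported: dict = REPORTED) -> dict:
    """Map each task to (z-score, whether it clears the threshold)."""
    rows = {}
    for task, (acc, stderr) in reported.items():
        z = z_score(acc, stderr)
        rows[task] = (z, abs(z) >= Z_THRESHOLD)
    return rows


for task, (z, clears) in summarize().items():
    print(f"{task}: z = {z:+.2f}, distinguishable from chance: {clears}")
```

Under these assumptions, none of the three scores lies more than two standard errors from the chance baseline.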

language:
- en
license: apache-2.0
datasets:
- Locutusque/TM-DATA-V2
- LLM360/TxT360
- mlfoundations/dclm-baseline-1.0
- Skylion007/openwebtext
- JeanKaddour/minipile
- eminorhan/gutenberg_en
model-index:
- name: TinyMistral-248M-v3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 16.39
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 1.78
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.0
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.0
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.15
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.47
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=M4-ai/TinyMistral-248M-v3
      name: Open LLM Leaderboard
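
The six leaderboard metrics in the metadata above can be condensed into a single figure. The sketch below takes their unweighted mean; treating this as comparable to the leaderboard's own average is an assumption, since the card does not state how that average is computed.

```python
# Metric values copied from the model-index metadata in this card.
LEADERBOARD_SCORES = {
    "IFEval (0-Shot)": 16.39,
    "BBH (3-Shot)": 1.78,
    "MATH Lvl 5 (4-Shot)": 0.0,
    "GPQA (0-shot)": 0.0,
    "MuSR (0-shot)": 5.15,
    "MMLU-PRO (5-shot)": 1.47,
}


def mean_score(scores: dict) -> float:
    """Unweighted mean of the benchmark scores (an assumed aggregation)."""
    return sum(scores.values()) / len(scores)


print(f"Unweighted mean: {mean_score(LEADERBOARD_SCORES):.2f}")
```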

The model is still in training and has been trained on about 21 billion tokens so far.
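
A rough way to read the training budget is the ratio of training tokens to parameters. The sketch below uses the two figures stated in this card (about 21 billion tokens and about 248 million parameters); both are approximate, so the ratio is as well.

```python
# Both figures are the approximate values stated in this card.
TRAINING_TOKENS = 21e9
PARAMETERS = 248e6


def tokens_per_parameter(tokens: float, parameters: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / parameters


ratio = tokens_per_parameter(TRAINING_TOKENS, PARAMETERS)
print(f"About {ratio:.0f} training tokens per parameter so far")
```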

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
Open LLM Leaderboard	N/A
- arc_challenge	1	none	25	acc	↑	0.2005	±	0.0117
		none	25	acc_norm	↑	0.2406	±	0.0125
- gsm8k	3	flexible-extract	5	exact_match	↑	0.0083	±	0.0025
		strict-match	5	exact_match	↑	0.0000	±	0.0000
- hellaswag	1	none	10	acc	↑	0.2724	±	0.0044
		none	10	acc_norm	↑	0.2838	±	0.0045
- mmlu	2	none		acc	↑	0.2290	±	0.0035
- humanities	2	none		acc	↑	0.2380	±	0.0062
- formal_logic	1	none	5	acc	↑	0.2460	±	0.0385
- high_school_european_history	1	none	5	acc	↑	0.1818	±	0.0301
- high_school_us_history	1	none	5	acc	↑	0.2647	±	0.0310
- high_school_world_history	1	none	5	acc	↑	0.2911	±	0.0296
- international_law	1	none	5	acc	↑	0.2149	±	0.0375
- jurisprudence	1	none	5	acc	↑	0.2685	±	0.0428
- logical_fallacies	1	none	5	acc	↑	0.2209	±	0.0326
- moral_disputes	1	none	5	acc	↑	0.2457	±	0.0232
- moral_scenarios	1	none	5	acc	↑	0.2369	±	0.0142
- philosophy	1	none	5	acc	↑	0.1865	±	0.0221
- prehistory	1	none	5	acc	↑	0.1975	±	0.0222
- professional_law	1	none	5	acc	↑	0.2432	±	0.0110
- world_religions	1	none	5	acc	↑	0.3099	±	0.0355
- other	2	none		acc	↑	0.2375	±	0.0076
- business_ethics	1	none	5	acc	↑	0.3200	±	0.0469
- clinical_knowledge	1	none	5	acc	↑	0.2226	±	0.0256
- college_medicine	1	none	5	acc	↑	0.1965	±	0.0303
- global_facts	1	none	5	acc	↑	0.1800	±	0.0386
- human_aging	1	none	5	acc	↑	0.3004	±	0.0308
- management	1	none	5	acc	↑	0.1942	±	0.0392
- marketing	1	none	5	acc	↑	0.2735	±	0.0292
- medical_genetics	1	none	5	acc	↑	0.3000	±	0.0461
- miscellaneous	1	none	5	acc	↑	0.2478	±	0.0154
- nutrition	1	none	5	acc	↑	0.2222	±	0.0238
- professional_accounting	1	none	5	acc	↑	0.2021	±	0.0240
- professional_medicine	1	none	5	acc	↑	0.1912	±	0.0239
- virology	1	none	5	acc	↑	0.2590	±	0.0341
- social sciences	2	none		acc	↑	0.2203	±	0.0075
- econometrics	1	none	5	acc	↑	0.2368	±	0.0400
- high_school_geography	1	none	5	acc	↑	0.2020	±	0.0286
- high_school_government_and_politics	1	none	5	acc	↑	0.1865	±	0.0281
- high_school_macroeconomics	1	none	5	acc	↑	0.2205	±	0.0210
- high_school_microeconomics	1	none	5	acc	↑	0.2143	±	0.0267
- high_school_psychology	1	none	5	acc	↑	0.1908	±	0.0168
- human_sexuality	1	none	5	acc	↑	0.2672	±	0.0388
- professional_psychology	1	none	5	acc	↑	0.2386	±	0.0172
- public_relations	1	none	5	acc	↑	0.1727	±	0.0362
- security_studies	1	none	5	acc	↑	0.2367	±	0.0272
- sociology	1	none	5	acc	↑	0.2488	±	0.0306
- us_foreign_policy	1	none	5	acc	↑	0.2600	±	0.0441
- stem	2	none		acc	↑	0.2157	±	0.0073
- abstract_algebra	1	none	5	acc	↑	0.2200	±	0.0416
- anatomy	1	none	5	acc	↑	0.1778	±	0.0330
- astronomy	1	none	5	acc	↑	0.1908	±	0.0320
- college_biology	1	none	5	acc	↑	0.2778	±	0.0375
- college_chemistry	1	none	5	acc	↑	0.2200	±	0.0416
- college_computer_science	1	none	5	acc	↑	0.2100	±	0.0409
- college_mathematics	1	none	5	acc	↑	0.2100	±	0.0409
- college_physics	1	none	5	acc	↑	0.2157	±	0.0409
- computer_security	1	none	5	acc	↑	0.2700	±	0.0446
- conceptual_physics	1	none	5	acc	↑	0.2638	±	0.0288
- electrical_engineering	1	none	5	acc	↑	0.2483	±	0.0360
- elementary_mathematics	1	none	5	acc	↑	0.2037	±	0.0207
- high_school_biology	1	none	5	acc	↑	0.1774	±	0.0217
- high_school_chemistry	1	none	5	acc	↑	0.2020	±	0.0282
- high_school_computer_science	1	none	5	acc	↑	0.2500	±	0.0435
- high_school_mathematics	1	none	5	acc	↑	0.2148	±	0.0250
- high_school_physics	1	none	5	acc	↑	0.2053	±	0.0330
- high_school_statistics	1	none	5	acc	↑	0.1481	±	0.0242
- machine_learning	1	none	5	acc	↑	0.3125	±	0.0440
- truthfulqa_gen	3	none	0	bleu_acc	↑	0.2362	±	0.0149
		none	0	bleu_diff	↑	-1.0138	±	0.2569
		none	0	bleu_max	↑	7.9522	±	0.4088
		none	0	rouge1_acc	↑	0.2595	±	0.0153
		none	0	rouge1_diff	↑	-1.9129	±	0.4349
		none	0	rouge1_max	↑	21.7885	±	0.7307
		none	0	rouge2_acc	↑	0.1200	±	0.0114
		none	0	rouge2_diff	↑	-1.9771	±	0.3475
		none	0	rouge2_max	↑	9.0199	±	0.5842
		none	0	rougeL_acc	↑	0.2570	±	0.0153
		none	0	rougeL_diff	↑	-1.8812	±	0.4185
		none	0	rougeL_max	↑	19.6284	±	0.6850
- truthfulqa_mc1	2	none	0	acc	↑	0.1983	±	0.0140
- truthfulqa_mc2	2	none	0	acc	↑	0.3861	±	0.0147
- winogrande	1	none	5	acc	↑	0.4972	±	0.0141
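
The rows above can be loaded into records for further analysis. This is a minimal sketch that assumes the table is tab-separated with nine columns, as it appears in this card, and that a row with an empty task name continues the task on the previous row. The `Result` type and `parse_results` function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    task: str
    filter: str
    n_shot: Optional[int]
    metric: str
    value: float
    stderr: float


def parse_results(text: str) -> List[Result]:
    """Parse tab-separated evaluation rows, carrying task names onto continuation rows."""
    results = []
    current_task = ""
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip the header, group labels such as "Open LLM Leaderboard", and blank lines.
        if len(fields) != 9 or fields[0] == "Tasks":
            continue
        name = fields[0].lstrip("- ").strip()
        if name:
            current_task = name
        results.append(
            Result(
                task=current_task,
                filter=fields[2],
                n_shot=int(fields[3]) if fields[3] else None,
                metric=fields[4],
                value=float(fields[6]),
                stderr=float(fields[8]),
            )
        )
    return results


# A few rows copied from the table above.
SAMPLE = (
    "Tasks\tVersion\tFilter\tn-shot\tMetric\t\tValue\t\tStderr\n"
    "Open LLM Leaderboard\tN/A\n"
    "- arc_challenge\t1\tnone\t25\tacc\t↑\t0.2005\t±\t0.0117\n"
    "\t\tnone\t25\tacc_norm\t↑\t0.2406\t±\t0.0125\n"
    "- mmlu\t2\tnone\t\tacc\t↑\t0.2290\t±\t0.0035\n"
)

for row in parse_results(SAMPLE):
    print(row)
```

Rows without an n-shot value, such as the `mmlu` group row, are stored with `n_shot=None`.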

Groups	Version	Filter	Metric		Value		Stderr
- mmlu	2	none	acc	↑	0.2290	±	0.0035
- humanities	2	none	acc	↑	0.2380	±	0.0062
- other	2	none	acc	↑	0.2375	±	0.0076
- social sciences	2	none	acc	↑	0.2203	±	0.0075
- stem	2	none	acc	↑	0.2157	±	0.0073
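
The overall `mmlu` figure is not simply the mean of the four category scores. The sketch below computes the unweighted mean from the values in the table above and compares it with the reported aggregate. The likely explanation, stated here as an assumption, is that the evaluation harness weights each subtask by its number of questions.

```python
# Category accuracies and the reported aggregate, copied from the table above.
MMLU_GROUPS = {
    "humanities": 0.2380,
    "other": 0.2375,
    "social sciences": 0.2203,
    "stem": 0.2157,
}
REPORTED_MMLU = 0.2290


def unweighted_mean(groups: dict) -> float:
    """Plain average of the category accuracies, ignoring category size."""
    return sum(groups.values()) / len(groups)


mean = unweighted_mean(MMLU_GROUPS)
gap = REPORTED_MMLU - mean
print(f"Unweighted mean: {mean:.4f}, reported: {REPORTED_MMLU:.4f}, gap: {gap:.4f}")
```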

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
agieval_nous	0	none		acc_norm	↑	0.2133	±	0.0081
- agieval_aqua_rat	1	none	0	acc	↑	0.2047	±	0.0254
		none	0	acc_norm	↑	0.1969	±	0.0250
- agieval_logiqa_en	1	none	0	acc	↑	0.2043	±	0.0158
		none	0	acc_norm	↑	0.2304	±	0.0165
- agieval_lsat_ar	1	none	0	acc	↑	0.1739	±	0.0250
		none	0	acc_norm	↑	0.1957	±	0.0262
- agieval_lsat_lr	1	none	0	acc	↑	0.1549	±	0.0160
		none	0	acc_norm	↑	0.1608	±	0.0163
- agieval_lsat_rc	1	none	0	acc	↑	0.1636	±	0.0226
		none	0	acc_norm	↑	0.2119	±	0.0250
- agieval_sat_en	1	none	0	acc	↑	0.2670	±	0.0309
		none	0	acc_norm	↑	0.2621	±	0.0307
- agieval_sat_en_without_passage	1	none	0	acc	↑	0.2670	±	0.0309
		none	0	acc_norm	↑	0.2621	±	0.0307
- agieval_sat_math	1	none	0	acc	↑	0.2182	±	0.0279
		none	0	acc_norm	↑	0.2318	±	0.0285
arc_challenge	1	none	0	acc	↑	0.1945	±	0.0116
		none	0	acc_norm	↑	0.2372	±	0.0124
truthfulqa_mc2	2	none	0	acc	↑	0.3861	±	0.0147