🚀 Phi 3 Mini 4K Instruct GGUF
This repository contains GGUF format model files for Microsoft's Phi 3 Mini 4K Instruct, updated with the latest model changes as of July 21, 2024.
The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model. It was trained on the Phi-3 datasets, which include both synthetic data and filtered, publicly available website data, with a focus on high quality and reasoning-dense properties. Learn more on Microsoft's model page.
🚀 Quick Start
What is GGUF?
GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which is no longer supported by llama.cpp. This model was converted with llama.cpp build 3432 (revision 45f2c19) using autogguf.
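As a hedged sketch of working with GGUF files, the snippet below downloads one quantized file from this repository and loads it with llama-cpp-python (Python bindings for llama.cpp). The repo id and file name are assumptions; substitute the quantization you actually want:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Repo id and file name are assumed examples; pick whichever
# quantization level from this repository suits your hardware.
path = hf_hub_download(
    repo_id="brittlewis12/Phi-3-mini-4k-instruct-GGUF",
    filename="Phi-3-mini-4k-instruct-Q4_K_M.gguf",
)

# Load the GGUF file; n_ctx matches the model's 4K context window.
llm = Llama(model_path=path, n_ctx=4096)
```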
Prompt template
```
<|system|>
{{system_prompt}}<|end|>
<|user|>
{{prompt}}<|end|>
<|assistant|>
```
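A minimal sketch of filling in this template and generating a completion with llama-cpp-python (the model file name is an assumed example; use the file you downloaded):

```python
from llama_cpp import Llama

# Model file name is an assumed example.
llm = Llama(model_path="Phi-3-mini-4k-instruct-Q4_K_M.gguf", n_ctx=4096)

# Fill in the Phi-3 chat template shown above.
prompt = (
    "<|system|>\nYou are a helpful assistant.<|end|>\n"
    "<|user|>\nExplain GGUF in one sentence.<|end|>\n"
    "<|assistant|>\n"
)

# Generate a response, stopping at the end-of-turn marker.
out = llm(prompt, max_tokens=128, stop=["<|end|>"])
print(out["choices"][0]["text"].strip())
```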
Download & run with cnvrs on iPhone, iPad, and Mac!

cnvrs is the best app for private, local AI on your device:
- Create & save Characters with custom system prompts & temperature settings.
- Download and experiment with any GGUF model you can find on HuggingFace.
- Customize the look with your own theme colors.
- Powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming.
- Try it out yourself today on TestFlight!
- Follow cnvrs on Twitter to stay up to date.
📚 Documentation
Original Model Evaluation
Comparison of the June 2024 update vs. the original April release
| Benchmarks | Original | June 2024 Update |
|---|---|---|
| Instruction Extra Hard | 5.7 | 6.0 |
| Instruction Hard | 4.9 | 5.1 |
| Instructions Challenge | 24.6 | 42.3 |
| JSON Structure Output | 11.5 | 52.3 |
| XML Structure Output | 14.4 | 49.8 |
| GPQA | 23.7 | 30.6 |
| MMLU | 68.8 | 70.9 |
| Average | 21.9 | 36.7 |
Original April release
As is now standard, we use few-shot prompts to evaluate the models, at temperature 0. The prompts and number of shots are part of a Microsoft internal tool for evaluating language models; in particular, we did no optimization of the pipeline for Phi-3. More specifically, we did not change prompts, pick different few-shot examples, change the prompt format, or do any other form of optimization for the model. The number of k-shot examples is listed per benchmark.
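Microsoft's internal evaluation tool is not public; the sketch below only illustrates the described setup (fixed few-shot examples, greedy decoding at temperature 0, no prompt tuning) using llama-cpp-python with an assumed model file name:

```python
from llama_cpp import Llama

# Illustrative only: this is not Microsoft's internal tool.
# Model file name is an assumed example.
llm = Llama(model_path="Phi-3-mini-4k-instruct-Q4_K_M.gguf", n_ctx=4096)

# Fixed 2-shot prefix in the style described above; no per-model tuning.
few_shot = (
    "Q: What is 2 + 2?\nA: 4\n\n"
    "Q: What is the capital of France?\nA: Paris\n\n"
)
question = "Q: How many legs does a spider have?\nA:"

# Temperature 0 makes decoding deterministic, as in the evaluation setup.
out = llm(few_shot + question, max_tokens=16, temperature=0.0, stop=["\n"])
print(out["choices"][0]["text"].strip())
```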
| Benchmark | Phi-3-Mini-4K-In 3.8b | Phi-2 2.7b | Mistral 7b | Gemma 7b | Llama-3-In 8b | Mixtral 8x7b | GPT-3.5 version 1106 |
|---|---|---|---|---|---|---|---|
| MMLU 5-Shot | 68.8 | 56.3 | 61.7 | 63.6 | 66.5 | 68.4 | 71.4 |
| HellaSwag 5-Shot | 76.7 | 53.6 | 58.5 | 49.8 | 71.1 | 70.4 | 78.8 |
| ANLI 7-Shot | 52.8 | 42.5 | 47.1 | 48.7 | 57.3 | 55.2 | 58.1 |
| GSM-8K 0-Shot; CoT | 82.5 | 61.1 | 46.4 | 59.8 | 77.4 | 64.7 | 78.1 |
| MedQA 2-Shot | 53.8 | 40.9 | 49.6 | 50.0 | 60.5 | 62.2 | 63.4 |
| AGIEval 0-Shot | 37.5 | 29.8 | 35.1 | 42.1 | 42.0 | 45.2 | 48.4 |
| TriviaQA 5-Shot | 64.0 | 45.2 | 72.3 | 75.2 | 67.7 | 82.2 | 85.8 |
| Arc-C 10-Shot | 84.9 | 75.9 | 78.6 | 78.3 | 82.8 | 87.3 | 87.4 |
| Arc-E 10-Shot | 94.6 | 88.5 | 90.6 | 91.4 | 93.4 | 95.6 | 96.3 |
| PIQA 5-Shot | 84.2 | 60.2 | 77.7 | 78.1 | 75.7 | 86.0 | 86.6 |
| SociQA 5-Shot | 76.6 | 68.3 | 74.6 | 65.5 | 73.9 | 75.9 | 68.3 |
| BigBench-Hard 0-Shot | 71.7 | 59.4 | 57.3 | 59.6 | 51.5 | 69.7 | 68.32 |
| WinoGrande 5-Shot | 70.8 | 54.7 | 54.2 | 55.6 | 65 | 62.0 | 68.8 |
| OpenBookQA 10-Shot | 83.2 | 73.6 | 79.8 | 78.6 | 82.6 | 85.8 | 86.0 |
| BoolQ 0-Shot | 77.6 | -- | 72.2 | 66.0 | 80.9 | 77.6 | 79.1 |
| CommonSenseQA 10-Shot | 80.2 | 69.3 | 72.6 | 76.2 | 79 | 78.1 | 79.6 |
| TruthfulQA 10-Shot | 65.0 | -- | 52.1 | 53.0 | 63.2 | 60.1 | 85.8 |
| HumanEval 0-Shot | 59.1 | 47.0 | 28.0 | 34.1 | 60.4 | 37.8 | 62.2 |
| MBPP 3-Shot | 53.8 | 60.6 | 50.8 | 51.5 | 67.7 | 60.2 | 77.8 |
📄 License
This project is licensed under the MIT License. View the license.
📋 Information Table
| Property | Details |
|---|---|
| Base Model | microsoft/Phi-3-mini-4k-instruct |
| Inference | false |
| Model Creator | microsoft |
| Model Name | Phi-3-mini-4k-instruct |
| Model Type | phi3 |
| Quantized By | brittlewis12 |
| Pipeline Tag | text-generation |
| Tags | nlp, code |
| Language | en |