Phi-3-Mini-4K-Instruct GGUF Model
This repository offers the GGUF format for the Phi-3-Mini-4K-Instruct model. It's a lightweight, state-of-the-art open model with 3.8B parameters, trained on the Phi-3 datasets. These datasets combine synthetic data and filtered publicly available website data, emphasizing high-quality and reasoning-dense properties.
Quick Start
Download the GGUF Files
- Install Hugging Face CLI:
pip install "huggingface-hub>=0.17.1"
- Login to Hugging Face:
huggingface-cli login
- Download the GGUF model:
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir . --local-dir-use-symlinks False
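The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package from the install step is available; `hf_hub_download` is that library's single-file download function, while the `download_gguf` wrapper and its default directory are hypothetical names used only for illustration.

```python
from pathlib import Path

# Repository and file names are taken from the CLI command above.
REPO_ID = "microsoft/Phi-3-mini-4k-instruct-gguf"
FILENAME = "Phi-3-mini-4k-instruct-q4.gguf"


def download_gguf(local_dir: str = ".") -> Path:
    """Download the q4 GGUF file into local_dir and return its local path.

    Hypothetical helper for illustration; calling it needs network access.
    """
    # Imported lazily so this module still loads if huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    return Path(path)
```

Calling `download_gguf()` fetches the file into the current directory, which is where the later examples expect to find it.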
Use with Different Tools
Ollama
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
- Run the phi3 model:
ollama run phi3
Building from Modelfile
- Get the Modelfile:
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Modelfile_q4 --local-dir /path/to/your/local/dir
- Build the Ollama Model:
ollama create phi3 -f Modelfile_q4
- Run the phi3 model:
ollama run phi3 "Your prompt here"
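Once the model exists in Ollama, it can also be queried programmatically through Ollama's local REST API. The sketch below uses only the Python standard library and assumes the server is running on its default address (`http://localhost:11434`) and that the model is named `phi3` as created above; `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "phi3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "phi3") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Your prompt here")` returns the full reply as a single string, because `"stream": False` asks for one JSON object rather than a stream of chunks.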
Llamafile
- Download Llamafile-0.7.3:
wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.7.3/llamafile-0.7.3
- Run the model with chat format prompt:
<|user|>\nHow to explain Internet for a medieval knight?\n<|end|>\n<|assistant|>
./llamafile-0.7.3 -ngl 9999 -m Phi-3-mini-4k-instruct-q4.gguf --temp 0.6 -p "<|user|>\nHow to explain Internet for a medieval knight?\n<|end|>\n<|assistant|>"
- Run with a chat interface:
./llamafile-0.7.3 -ngl 9999 -m Phi-3-mini-4k-instruct-q4.gguf
Python
- Install llama-cpp-python:
! CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
- Run the model:
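from llama_cpp import Llama

# Load the quantized model; tune the values below to your hardware.
llm = Llama(
    model_path="./Phi-3-mini-4k-instruct-q4.gguf",  # path to the downloaded GGUF file
    n_ctx=4096,       # maximum sequence length (the model's 4K context)
    n_threads=8,      # CPU threads to use
    n_gpu_layers=35,  # layers offloaded to the GPU; set to 0 for CPU-only
)

prompt = "How to explain Internet to a medieval knight?"

# Wrap the prompt in the Phi-3 chat format and generate a completion.
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=256,    # cap on generated tokens
    stop=["<|end|>"],  # stop at the end-of-turn token
    echo=True,         # include the prompt in the returned text
)

print(output['choices'][0]['text'])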
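Because the call above sets `echo=True`, the printed text contains the prompt followed by the completion. If only the assistant's reply is wanted, the prompt can be stripped afterwards. This is a small sketch, assuming the echoed text begins with the exact prompt string that was sent; `extract_reply` is a hypothetical helper, and `fake_output` is a stand-in shaped like the result dictionary indexed above, not real model output.

```python
def extract_reply(output: dict, prompt_text: str) -> str:
    """Return only the assistant's reply from a (possibly echoed) completion."""
    text = output["choices"][0]["text"]
    # With echo=True the returned text starts with the prompt; drop it.
    if text.startswith(prompt_text):
        text = text[len(prompt_text):]
    return text.strip()


# Stand-in for a real llm(...) result, for illustration only.
full_prompt = "<|user|>\nHow to explain Internet to a medieval knight?<|end|>\n<|assistant|>"
fake_output = {"choices": [{"text": full_prompt + " Imagine a vast library..."}]}
print(extract_reply(fake_output, full_prompt))
```

Setting `echo=False` in the original call avoids the need for this step altogether.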
Features
- Lightweight: With only 3.8B parameters, it's suitable for memory- and compute-constrained environments.
- State-of-the-art: Performs well in benchmarks for common sense, language understanding, math, code, long context, and logical reasoning among models with fewer than 13 billion parameters.
- Two Variants: Available in 4K and 128K context lengths.
- Post-training: Incorporates supervised fine-tuning and direct preference optimization for precise instruction adherence and safety.
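To get a feel for what "lightweight" means in memory terms, weight storage can be approximated as parameters × bits per weight ÷ 8. The sketch below is back-of-the-envelope arithmetic only: the 3.8B parameter count comes from this document, while the bit widths are illustrative assumptions. Real GGUF files are somewhat larger because of quantization scales and metadata, and inference also needs memory for the KV cache and activations.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3.8e9  # parameter count stated above

# Bit widths are illustrative assumptions, not measured file sizes.
for label, bits in [("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Treat these figures as lower bounds when deciding whether the model fits on a given device.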
Installation
The installation steps are included in the "Quick Start" section above.
Usage Examples
Basic Usage
<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>
Advanced Usage
<|user|>
I am going to Paris, what should I see?<|end|>
<|assistant|>
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|end|>
<|user|>
What is so great about #1?<|end|>
<|assistant|>
Documentation
Model Summary
This repo provides the GGUF format for Phi-3-Mini-4K-Instruct. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) each supports. It has undergone post-training for better performance and safety.
Resources and Technical Documentation
Intended Uses
Primary use cases
The model is for commercial and research use in English, suitable for memory/compute-constrained environments, latency-bound scenarios, strong reasoning tasks, and long-context applications.
Use case considerations
Developers should consider limitations, adhere to laws and regulations, and evaluate the model's suitability for specific downstream use cases, especially high-risk scenarios.
Chat Format
The model works best with prompts in the chat format:
<|user|>\nQuestion <|end|>\n<|assistant|>
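The chat format can be generated from a list of messages rather than written by hand, which helps with multi-turn conversations like the one under "Advanced Usage". The following is a minimal sketch: the `<|user|>`, `<|assistant|>`, and `<|end|>` tokens come from this document, while `format_phi3_prompt` and the message dictionary layout are hypothetical choices for illustration.

```python
def format_phi3_prompt(messages: list[dict[str, str]]) -> str:
    """Render chat messages in the Phi-3 format, leaving the assistant turn open."""
    parts = []
    for msg in messages:
        # Each turn is a role tag, a newline, the content, then the end token.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    # Finish with an open assistant tag so the model writes the next reply.
    parts.append("<|assistant|>")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "I am going to Paris, what should I see?"},
    {"role": "assistant", "content": "Start with the Eiffel Tower."},
    {"role": "user", "content": "What is so great about #1?"},
]
print(format_phi3_prompt(conversation))
```

The returned string can be passed as the prompt to any of the tools in the Quick Start section, together with `<|end|>` as a stop sequence.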
Responsible AI Considerations
- Quality of Service: Performs better in English, especially standard American English.
- Representation of Harms & Perpetuation of Stereotypes: May over- or under-represent groups, or reinforce stereotypes.
- Inappropriate or Offensive Content: May produce inappropriate content, requiring additional mitigations for sensitive contexts.
- Information Reliability: Can generate inaccurate or outdated content.
- Limited Scope for Code: Majority of training data is Python-based. Manually verify API uses for other packages or languages.
Training
Model
| Property | Details |
|---|---|
| Architecture | Phi-3 Mini is a dense decoder-only Transformer model with 3.8B parameters, fine-tuned with SFT and DPO. |
| Inputs | Text, best with chat-format prompts. |
| Context length | 4K tokens |
| GPUs | 512 H100-80G |
| Training time | 7 days |
| Training data | 3.3T tokens |
| Outputs | Generated text |
| Dates | Trained between February and April 2024 |
| Status | Static model trained on offline dataset with cutoff date October 2023. Future versions may be released. |
Datasets
The training data combines publicly available documents, synthetic data, and high-quality chat-format supervised data.
Software
License
The model is licensed under the MIT license.