🚀 Phi-4-mini-instruct
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, supporting 128K token context length. It's suitable for broad multilingual commercial and research use.
🚀 Quick Start
The Phi-4-mini-instruct model is designed for broad multilingual commercial and research use. The installation steps and usage examples below show how to get started.
✨ Features
- Multilingual Support: Supports a wide range of languages including Arabic, Chinese, Czech, and more.
- 128K Token Context Length: Enables handling of long conversations and complex tasks.
- Enhanced Performance: Incorporates supervised fine-tuning and direct preference optimization for precise instruction adherence and robust safety measures.
📦 Installation
Inference with vLLM
Requirements
flash_attn==2.7.4.post1
torch==2.5.1
vllm>=0.7.3
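The pinned requirements above can usually be installed with pip (the exact command may vary with your CUDA setup), for example:
pip install flash_attn==2.7.4.post1 torch==2.5.1 "vllm>=0.7.3"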
Inference with Transformers
Requirements
flash_attn==2.7.4.post1
torch==2.5.1
transformers==4.49.0
accelerate==1.3.0
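As above, these can usually be installed with pip; note that flash_attn typically needs a CUDA toolchain available to build:
pip install flash_attn==2.7.4.post1 torch==2.5.1 transformers==4.49.0 accelerate==1.3.0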
💻 Usage Examples
Basic Usage
Inference with vLLM
from vllm import LLM, SamplingParams

# Load the model; trust_remote_code allows custom model code from the Hub.
llm = LLM(model="microsoft/Phi-4-mini-instruct", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

# Greedy decoding (temperature=0.0) with a 500-token budget.
sampling_params = SamplingParams(
    max_tokens=500,
    temperature=0.0,
)

output = llm.chat(messages=messages, sampling_params=sampling_params)
print(output[0].outputs[0].text)
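vLLM can also expose the model behind its OpenAI-compatible HTTP API (for example via vllm serve microsoft/Phi-4-mini-instruct). A minimal client sketch, assuming such a server is running locally on the default port 8000; the base_url and the dummy api_key are placeholders:

from openai import OpenAI

# Point the OpenAI client at the assumed local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-4-mini-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
    ],
    max_tokens=500,
    temperature=0.0,
)
print(response.choices[0].message.content)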
Inference with Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.random.manual_seed(0)

model_path = "microsoft/Phi-4-mini-instruct"

# Load the model and tokenizer; "auto" lets Transformers pick device placement and dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

# Apply the chat template and append the assistant header so the model continues the conversation.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generation_args = {
    "max_new_tokens": 500,
    "temperature": 0.0,
    "do_sample": False,  # greedy decoding; temperature has no effect when sampling is disabled
}

output_ids = model.generate(input_ids, **generation_args)
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
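Equivalently, the Transformers text-generation pipeline can wrap the model and tokenizer loaded above; a brief sketch reusing those objects:

from transformers import pipeline

# Reuse the already-loaded model and tokenizer in a chat-aware text-generation pipeline.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

output = pipe(
    messages,
    max_new_tokens=500,
    do_sample=False,
    return_full_text=False,  # return only the newly generated text
)
print(output[0]["generated_text"])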
📚 Documentation
Model Summary
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites. It belongs to the Phi-4 model family and supports 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning and direct preference optimization.
Intended Uses
Primary Use Cases
The model is intended for broad multilingual commercial and research use. It is suited to general-purpose AI systems and applications in memory/compute-constrained environments, latency-bound scenarios, and tasks that require strong reasoning.
Use Case Considerations
Developers should be aware of the model's limitations, including performance differences across languages, potential for generating inappropriate content, and information reliability issues. They should also comply with applicable laws and regulations.
Release Notes
This release of Phi-4-mini-instruct is based on user feedback from the Phi-3 series. It employs a new architecture, a larger vocabulary, and better post-training techniques, and despite having only 3.8B parameters it achieves multilingual language understanding and reasoning ability similar to that of much larger models.
Usage
Tokenizer
Phi-4-mini-instruct supports a vocabulary size of up to 200064 tokens. The tokenizer files can be used for downstream fine-tuning.
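The tokenizer can be loaded directly from the Hub to inspect its size; a quick sanity-check sketch (the reported count may include added special tokens):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")
print(len(tokenizer))  # total number of tokens known to the tokenizer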
Input Formats
- Chat format:
<|system|>Insert System Message<|end|><|user|>Insert User Message<|end|><|assistant|>
- Tool-enabled function-calling format: The user should provide available tools in the system prompt, wrapped by <|tool|> and <|/tool|> tokens (see the sketch after this list).
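A minimal sketch of building such a tool-enabled prompt by hand; the get_weather tool definition is purely illustrative and not part of the model card:

import json

# Hypothetical tool definition used only to illustrate the prompt layout.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {"city": {"description": "Name of the city", "type": "str"}},
}]

prompt = (
    "<|system|>You are a helpful assistant with some tools."
    f"<|tool|>{json.dumps(tools)}<|/tool|><|end|>"
    "<|user|>What is the weather in Paris today?<|end|>"
    "<|assistant|>"
)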
Responsible AI Considerations
The Phi family of models may have limitations such as unfair behavior, unreliable information, and generation of inappropriate content. Developers should be aware of these issues and take appropriate measures.
Training
Model
| Property | Details |
|----------|---------|
| Model Type | Dense decoder-only Transformer model |
| Training Data | 5T tokens from various sources |
| Supported Languages | Arabic, Chinese, Czech, etc. |
| Release Date | February 2025 |
Training Datasets
The training data includes publicly available documents, synthetic data, and high-quality chat format supervised data. A decontamination process was applied to the dataset.
Fine-tuning
A basic example of multi-GPU supervised fine-tuning (SFT) with the TRL and Accelerate libraries is provided alongside the model; a minimal sketch is shown below.
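A minimal single-script SFT sketch with TRL, assuming a recent TRL version; the dataset choice, output path, and hyperparameters are illustrative only:

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative chat-format dataset; substitute your own supervised data.
train_dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

trainer = SFTTrainer(
    model="microsoft/Phi-4-mini-instruct",
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="./phi4-mini-sft",  # assumed output path
        max_seq_length=2048,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()

For multi-GPU training, such a script would typically be launched with accelerate launch (script name and Accelerate configuration are up to you).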
Safety Evaluation and Red-Teaming
Various evaluation techniques were used to assess the model's safety. The model is resistant to jailbreak techniques across languages, but there are still some limitations, especially in function calling scenarios and long conversations.
Software
The key software dependencies (PyTorch, flash_attn, and either vLLM or Transformers with Accelerate) are pinned in the Installation section above.
Hardware
The model uses flash attention by default, which requires specific GPU hardware to run efficiently. It has been tested on NVIDIA A100, A6000, and H100 GPUs.
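As a rough check, flash attention generally requires an Ampere-or-newer GPU (compute capability 8.0+, which covers the A100, A6000, and H100); a small sketch:

import torch

# Compute capability 8.0+ (Ampere or newer) is generally required for flash attention.
major, minor = torch.cuda.get_device_capability()
print(f"GPU compute capability: {major}.{minor}; flash attention supported: {major >= 8}")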
License
The model is licensed under the MIT license.
Trademarks
Use of Microsoft trademarks or logos must follow Microsoft’s Trademark & Brand Guidelines.
Appendix A: Benchmark Methodology
The benchmark methodology includes some exceptions for optimizing prompts, but few-shot examples and prompt formats are kept consistent when comparing different models. The model was evaluated across a wide range of public and internal benchmarks.
🔧 Technical Details
Model Quality
The 3.8B-parameter Phi-4-mini-instruct model was compared with a set of models over a variety of benchmarks using an internal benchmark platform. Overall, it achieves a level of multilingual language understanding and reasoning ability similar to that of much larger models, but it remains limited by its size for certain tasks.
Benchmark Methodology
The benchmark methodology has specific rules for optimizing prompts, allowing some model-specific adjustments while keeping few-shot examples and prompt formats consistent when comparing different models.
Training Datasets
The training data of Phi-4-mini-instruct is a combination of publicly available documents, synthetic data, and high-quality chat format supervised data. A decontamination process was carried out to ensure data quality.
Safety Evaluation
Various evaluation techniques were used to evaluate the model's safety, including red teaming, adversarial conversation simulations, and multilingual safety evaluation benchmark datasets.
📄 License
The model is licensed under the MIT license.