🚀 Phi-2 Model Introduction
Phi-2 is a 2.7-billion-parameter Transformer model. It delivers strong performance on common sense, language understanding, and logical reasoning benchmarks among models with fewer than 13 billion parameters. This open-source model aims to help the research community explore safety challenges.
🚀 Quick Start
Phi-2 has been integrated into the `transformers` library as of version 4.37.0; make sure you are using a version equal to or higher than this. If you encounter an attention overflow issue with FP16, enable or disable autocast on the `PhiAttention.forward()` function.
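A minimal sketch of one possible workaround is shown below. It assumes `PhiAttention` is importable from `transformers.models.phi.modeling_phi` (the module path may differ across `transformers` versions, and the remote modeling code is used instead when `trust_remote_code=True`), and that disabling autocast around the attention computation avoids the overflow; treat it as an experiment, not a guaranteed fix.

```python
# Minimal sketch: wrap PhiAttention.forward so its computation runs with
# autocast disabled, as one way to work around FP16 attention overflow.
# Assumption: this import path is valid for your transformers version.
import torch
from transformers.models.phi.modeling_phi import PhiAttention

_original_forward = PhiAttention.forward

def _forward_no_autocast(self, *args, **kwargs):
    # Disable autocast for the attention computation; set enabled=True to try
    # the opposite behavior if needed.
    with torch.autocast("cuda", enabled=False):
        return _original_forward(self, *args, **kwargs)

PhiAttention.forward = _forward_no_autocast
```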
✨ Features
- High Performance in Benchmarks: Demonstrates nearly state-of-the-art performance in common sense, language understanding, and logical reasoning among sub-13-billion-parameter models.
- Open Source for Research: Designed to assist the research community in exploring safety challenges such as reducing toxicity and understanding biases.
📦 Installation
Ensure you have `transformers` version 4.37.0 or higher installed.
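For example, the library can be installed or upgraded with pip:

```bash
pip install "transformers>=4.37.0"
```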
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
    """
    Print all primes between 1 and n
    """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
Advanced Usage
The Phi-2 model supports several prompt formats:
QA Format
You can provide a standalone question as a prompt:
Write a detailed analogy between mathematics and a lighthouse.
The model generates its answer after the final ".". To get more concise answers, use the "Instruct: <prompt>\nOutput:" format:
Instruct: Write a detailed analogy between mathematics and a lighthouse.
Output: Mathematics is like a lighthouse. Just as a lighthouse guides ships safely to shore, mathematics provides a guiding light in the world of numbers and logic. It helps us navigate through complex problems and find solutions. Just as a lighthouse emits a steady beam of light, mathematics provides a consistent framework for reasoning and problem-solving. It illuminates the path to understanding and helps us make sense of the world around us.
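A minimal sketch of this format in code, reusing the `model` and `tokenizer` loaded in the Basic Usage example above:

```python
# Minimal sketch, reusing the model and tokenizer from the Basic Usage example.
prompt = "Instruct: Write a detailed analogy between mathematics and a lighthouse.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```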
Chat Format
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: ...
The model generates text after the first "Bob:".
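A chat prompt is passed to the model the same way; here is a minimal sketch reusing the objects from the Basic Usage example:

```python
# Minimal sketch, reusing the model and tokenizer from the Basic Usage example.
chat_prompt = (
    "Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?\n"
    "Bob:"
)
inputs = tokenizer(chat_prompt, return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```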
Code Format
```python
def print_prime(n):
    """
    Print all primes between 1 and n
    """
    primes = []
    for num in range(2, n+1):
        is_prime = True
        for i in range(2, int(math.sqrt(num))+1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    print(primes)
```
The model generates text after the comments.
📚 Documentation
Intended Uses
Due to the nature of the training data, Phi-2 is best for QA, chat, and code-related prompts.
Limitations of Phi-2
- Inaccurate Outputs: May generate incorrect code snippets and statements.
- Limited Code Scope: Mainly trained on Python with common packages. Manual verification is recommended for other packages or languages.
- Instruction Adherence: Struggles to follow complex or nuanced instructions.
- Language Limitations: Primarily understands standard English and may misinterpret informal English, slang, or other languages.
- Societal Biases: May generate content reflecting societal biases.
- Toxicity: Can produce harmful content if explicitly prompted.
- Verbosity: Tends to produce irrelevant or extra text after the first answer.
Training
Model
| Property | Details |
|----------|---------|
| Architecture | A Transformer-based model with a next-word prediction objective |
| Context length | 2048 tokens |
| Dataset size | 250B tokens, a combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, assessed by AOAI GPT-4 |
| Training tokens | 1.4T tokens |
| GPUs | 96xA100-80G |
| Training time | 14 days |
Software
- PyTorch
- DeepSpeed
- [Flash-Attention](https://github.com/HazyResearch/flash-attention)
📄 License
The model is licensed under the MIT license.
⚖️ Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.