🚀 Phi-2 Model Introduction
Phi-2 is a 2.7-billion-parameter Transformer model. It delivers strong performance on common sense, language understanding, and logical reasoning benchmarks among models with fewer than 13 billion parameters. This open-source model aims to help the research community explore safety challenges.
🚀 Quick Start
Phi-2 has been integrated into the `transformers` library as of version 4.37.0; make sure you are using a version equal to or higher than this. If you encounter an attention overflow issue with FP16, enable or disable autocast on the `PhiAttention.forward()` function.
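A minimal sketch of one possible workaround is shown below. It assumes `PhiAttention` is importable from `transformers.models.phi.modeling_phi` (the module path may differ across `transformers` versions, and the remote modeling code is used instead when `trust_remote_code=True`), and that disabling autocast around the attention computation avoids the overflow; treat it as an experiment, not a guaranteed fix.

```python
# Minimal sketch: wrap PhiAttention.forward so its computation runs with
# autocast disabled, as one way to work around FP16 attention overflow.
# Assumption: this import path is valid for your transformers version.
import torch
from transformers.models.phi.modeling_phi import PhiAttention

_original_forward = PhiAttention.forward

def _forward_no_autocast(self, *args, **kwargs):
    # Disable autocast for the attention computation; set enabled=True to try
    # the opposite behavior if needed.
    with torch.autocast("cuda", enabled=False):
        return _original_forward(self, *args, **kwargs)

PhiAttention.forward = _forward_no_autocast
```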
✨ Features
- High Performance in Benchmarks: Demonstrates nearly state-of-the-art performance in common sense, language understanding, and logical reasoning among sub-13-billion-parameter models.
- Open Source for Research: Designed to assist the research community in exploring safety challenges such as reducing toxicity and understanding biases.
📦 Installation
Ensure you have `transformers` version 4.37.0 or higher installed.
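For example, the library can be installed or upgraded with pip:

```bash
pip install "transformers>=4.37.0"
```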
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
    """
    Print all primes between 1 and n
    """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
Advanced Usage
The Phi-2 model supports several prompt formats:
QA Format
You can provide a standalone question as a prompt:
Write a detailed analogy between mathematics and a lighthouse.
The model generates its answer after the final ".". To get more concise answers, use the "Instruct: <prompt>\nOutput:" format:
Instruct: Write a detailed analogy between mathematics and a lighthouse.
Output: Mathematics is like a lighthouse. Just as a lighthouse guides ships safely to shore, mathematics provides a guiding light in the world of numbers and logic. It helps us navigate through complex problems and find solutions. Just as a lighthouse emits a steady beam of light, mathematics provides a consistent framework for reasoning and problem-solving. It illuminates the path to understanding and helps us make sense of the world around us.
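A minimal sketch of this format in code, reusing the `model` and `tokenizer` loaded in the Basic Usage example above:

```python
# Minimal sketch, reusing the model and tokenizer from the Basic Usage example.
prompt = "Instruct: Write a detailed analogy between mathematics and a lighthouse.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```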
Chat Format
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: ...
The model generates text after the first "Bob:".
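A chat prompt is passed to the model the same way; here is a minimal sketch reusing the objects from the Basic Usage example:

```python
# Minimal sketch, reusing the model and tokenizer from the Basic Usage example.
chat_prompt = (
    "Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?\n"
    "Bob:"
)
inputs = tokenizer(chat_prompt, return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```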
Code Format
```python
def print_prime(n):
    """
    Print all primes between 1 and n
    """
    primes = []
    for num in range(2, n+1):
        is_prime = True
        for i in range(2, int(math.sqrt(num))+1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    print(primes)
```
The model generates text after the comments.
📚 Documentation
Intended Uses
Due to the nature of the training data, Phi-2 is best for QA, chat, and code-related prompts.
Limitations of Phi-2
- Inaccurate Outputs: May generate incorrect code snippets and statements.
- Limited Code Scope: Mainly trained on Python with common packages. Manual verification is recommended for other packages or languages.
- Instruction Adherence: Struggles to follow complex or nuanced instructions.
- Language Limitations: Primarily understands standard English and may misinterpret informal English, slang, or other languages.
- Societal Biases: May generate content reflecting societal biases.
- Toxicity: Can produce harmful content if explicitly prompted.
- Verbosity: Tends to produce irrelevant or extra text after the first answer.
Training
Model
| Property | Details |
|----------|---------|
| Architecture | A Transformer-based model with a next-word prediction objective |
| Context length | 2048 tokens |
| Dataset size | 250B tokens, a combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, assessed by AOAI GPT-4 |
| Training tokens | 1.4T tokens |
| GPUs | 96xA100-80G |
| Training time | 14 days |
Software
- PyTorch
- DeepSpeed
- [Flash-Attention](https://github.com/HazyResearch/flash-attention)
📄 License
The model is licensed under the MIT license.
⚖️ Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.