Phi-3-mini-4k-instruct-gguf Open-source Model - A Lightweight, High-quality Model for Commercial and Research Use in English

Phi 3 Mini 4k Instruct Gguf

Developed by Microsoft

Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open-source model with 3.8 billion parameters, trained with a focus on high-quality, reasoning-dense data and suitable for commercial and research use in English.

Large Language Model | Supports Multiple Languages | Open Source License: MIT | #Lightweight Inference #Mathematical Logic Optimization #4K Context

Downloads 20.51k

Release Time: 4/22/2024

Model Overview

Phi-3-Mini-4K-Instruct is a lightweight, cutting-edge open-source model suitable for commercial and research use in English. The model undergoes post-training processes, including supervised fine-tuning and direct preference optimization, to ensure precise instruction following and robust safety measures.

Model Features

Lightweight Design

A lightweight model with 3.8 billion parameters, suitable for memory/computation-constrained environments.

High-Quality Reasoning

Trained on the Phi-3 datasets, which emphasize high-quality, reasoning-dense data.

Robust Safety Measures

Undergoes supervised fine-tuning and direct preference optimization to ensure precise instruction following and strong safety measures.

Long Context Support

Supports a context length of 4K tokens; a 128K variant of Phi-3 Mini is available for longer contexts.

Model Capabilities

Text Generation

Natural Language Processing

Code Generation

Logical Reasoning

Mathematical Calculation

Use Cases

Commercial Applications

Customer Support

Used to generate customer support responses, improving response speed and quality.

Content Generation

Generates high-quality commercial content such as reports, emails, etc.

Research Applications

Language Model Research

Serves as a foundational building block for generative AI capabilities, accelerating research in language and multimodal models.

Logical Reasoning

Used to test and evaluate the model's logical reasoning capabilities.

🚀 Phi-3-Mini-4K-Instruct GGUF Model

This repository offers the GGUF format of the Phi-3-Mini-4K-Instruct model, a lightweight, state-of-the-art open model with 3.8B parameters trained on the Phi-3 datasets. These datasets combine synthetic data with filtered publicly available website data, emphasizing high-quality and reasoning-dense properties.

🚀 Quick Start

Download the GGUF Files

Install Hugging Face CLI:

pip install "huggingface-hub>=0.17.1"

Login to Hugging Face:

huggingface-cli login

Download the GGUF model:

huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir . --local-dir-use-symlinks False
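The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package installed in the previous step; the helper name `download_gguf` is hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Repository and file names taken from the CLI command above.
REPO_ID = "microsoft/Phi-3-mini-4k-instruct-gguf"
FILENAME = "Phi-3-mini-4k-instruct-q4.gguf"


def download_gguf(local_dir: str = ".") -> Path:
    """Fetch the q4 GGUF file into local_dir and return its local path.

    Hypothetical helper: a thin wrapper around hf_hub_download.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    return Path(path)
```

Calling `download_gguf(".")` should place the file in the current directory, matching the `--local-dir .` option of the CLI command.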

Use with Different Tools

Ollama

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Run the phi3 model:

ollama run phi3

Building from `Modelfile`

Get the Modelfile:

huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Modelfile_q4 --local-dir /path/to/your/local/dir

Build the Ollama Model:

ollama create phi3 -f Modelfile_q4

Run the phi3 model:

ollama run phi3 "Your prompt here"
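Once the model is available in Ollama, it can also be queried programmatically through Ollama's local REST API. This is a minimal sketch using only the standard library; it assumes the Ollama server is running at its default address (`http://localhost:11434`) and that the model is named `phi3` as in the commands above. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "phi3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "phi3") -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Your prompt here"))` should print the model's reply.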

Llamafile

Download Llamafile-0.7.3 and make it executable:

wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.7.3/llamafile-0.7.3
chmod +x llamafile-0.7.3

Run the model with chat format prompt:

<|user|>\nHow to explain Internet for a medieval knight?\n<|end|>\n<|assistant|>

./llamafile-0.7.3 -ngl 9999 -m Phi-3-mini-4k-instruct-q4.gguf --temp 0.6 -p "<|user|>\nHow to explain Internet for a medieval knight?\n<|end|>\n<|assistant|>"

Run with a chat interface:

./llamafile-0.7.3 -ngl 9999 -m Phi-3-mini-4k-instruct-q4.gguf

Python

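Install llama-cpp-python with CUDA support (the leading `!` is notebook syntax; omit it in a regular shell). Recent llama-cpp-python releases use -DGGML_CUDA=on in place of -DLLAMA_CUBLAS=on:

! CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python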

Run the model:

from llama_cpp import Llama

llm = Llama(
    model_path="./Phi-3-mini-4k-instruct-q4.gguf",  # path to the GGUF file
    n_ctx=4096,       # max sequence length; longer sequences need much more memory
    n_threads=8,      # CPU threads to use; tune for your system
    n_gpu_layers=35,  # layers to offload to the GPU; set to 0 without GPU acceleration
)

prompt = "How to explain Internet to a medieval knight?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=256,    # generate up to 256 tokens
    stop=["<|end|>"],  # stop at the end-of-turn token
    echo=True,         # include the prompt in the returned text
)

print(output["choices"][0]["text"])
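Because the prompt and the generated tokens share the same 4096-token window, it can help to check the budget before calling the model. The sketch below is illustrative only; the helper name `max_new_tokens` is an assumption, not part of llama-cpp-python.

```python
N_CTX = 4096  # context window passed as n_ctx in the example above


def max_new_tokens(prompt_tokens: int, requested: int, n_ctx: int = N_CTX) -> int:
    """Clamp the requested generation length so prompt + output fits in n_ctx."""
    if prompt_tokens >= n_ctx:
        raise ValueError("prompt alone fills the context window")
    return min(requested, n_ctx - prompt_tokens)


print(max_new_tokens(prompt_tokens=100, requested=256))   # plenty of room left
print(max_new_tokens(prompt_tokens=4000, requested=256))  # clamped to what remains
```

With llama-cpp-python, the prompt token count can be obtained with `len(llm.tokenize(text.encode("utf-8")))`, and the result passed as `max_tokens`.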

✨ Features

Lightweight: With only 3.8B parameters, it's suitable for memory- and compute-constrained environments.
State-of-the-art: Performs well on benchmarks for common sense, language understanding, math, code, long context, and logical reasoning among models with fewer than 13 billion parameters.
Two Variants: Available in 4K and 128K context lengths.
Post-training: Incorporates supervised fine-tuning and direct preference optimization for precise instruction adherence and safety.
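To make the "lightweight" claim concrete, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch: it assumes every weight is stored at the same bit width and ignores the KV cache, runtime buffers, and file metadata, so real GGUF files and real memory use will be somewhat larger.

```python
PARAMS = 3.8e9  # parameter count stated in this model card


def weight_gigabytes(bits_per_weight: float, params: float = PARAMS) -> float:
    """Idealized size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_gigabytes(bits):.1f} GB")
# fp16: 7.6 GB
# 8-bit: 3.8 GB
# 4-bit: 1.9 GB
```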

📦 Installation

The installation steps are included in the "Quick Start" section above.

💻 Usage Examples

Basic Usage

<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>

Advanced Usage

<|user|>
I am going to Paris, what should I see?<|end|>
<|assistant|>
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:

1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.

These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|end|>
<|user|>
What is so great about #1?<|end|>
<|assistant|>
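Multi-turn prompts like the one above follow a regular pattern: each turn is `<|role|>`, a newline, the content, then `<|end|>`, and the prompt finishes with an open `<|assistant|>` tag. A minimal sketch of a prompt builder under that assumption follows; the function name `format_phi3_prompt` is hypothetical.

```python
from typing import Iterable, Tuple


def format_phi3_prompt(turns: Iterable[Tuple[str, str]]) -> str:
    """Render (role, content) pairs in the chat format and open an assistant turn."""
    parts = []
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{content}<|end|>\n")
    parts.append("<|assistant|>")  # the model continues from here
    return "".join(parts)


history = [
    ("user", "I am going to Paris, what should I see?"),
    ("assistant", "Here are some of the top attractions to see in Paris: ..."),
    ("user", "What is so great about #1?"),
]
print(format_phi3_prompt(history))
```

The resulting string can be passed as the prompt to any of the tools in the Quick Start section, with `<|end|>` as the stop sequence.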

📚 Documentation

Model Summary

This repo provides the GGUF format of Phi-3-Mini-4K-Instruct. The model belongs to the Phi-3 family; the Mini version comes in two variants that differ in context length (4K and 128K). It has undergone post-training for better performance and safety.

Resources and Technical Documentation

Intended Uses

Primary use cases

The model is for commercial and research use in English, suitable for memory/compute-constrained environments, latency-bound scenarios, strong reasoning tasks, and long-context applications.

Use case considerations

Developers should consider limitations, adhere to laws and regulations, and evaluate the model's suitability for specific downstream use cases, especially high-risk scenarios.

Chat Format

The model works best with prompts in the chat format:

<|user|>\nQuestion <|end|>\n<|assistant|>

Responsible AI Considerations

Quality of Service: Performs better in English, especially standard American English.
Representation of Harms & Perpetuation of Stereotypes: May over- or under-represent groups, or reinforce stereotypes.
Inappropriate or Offensive Content: May produce inappropriate content, requiring additional mitigations for sensitive contexts.
Information Reliability: Can generate inaccurate or outdated content.
Limited Scope for Code: The majority of training data is Python-based. Manually verify API uses for other packages or languages.

Training

Model

Property	Details
Architecture	Phi-3 Mini is a dense decoder-only Transformer model with 3.8B parameters, fine-tuned with SFT and DPO.
Inputs	Text, best with chat-format prompts.
Context length	4K tokens
GPUs	512 H100-80G
Training time	7 days
Training data	3.3T tokens
Outputs	Generated text
Dates	Trained between February and April 2024
Status	Static model trained on offline dataset with cutoff date October 2023. Future versions may be released.
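The figures in the table can be combined into a few rough derived quantities. This is only illustrative arithmetic: it assumes all 512 GPUs were in use for the full 7 days and that the 3.3T tokens were processed exactly once, neither of which the table states.

```python
GPUS = 512       # H100-80G GPUs, from the table
DAYS = 7         # training time, from the table
TOKENS = 3.3e12  # training tokens, from the table
PARAMS = 3.8e9   # model parameters

gpu_hours = GPUS * DAYS * 24
tokens_per_second = TOKENS / (DAYS * 24 * 3600)
tokens_per_parameter = TOKENS / PARAMS

print(f"GPU-hours: {gpu_hours:,}")
print(f"Aggregate throughput: {tokens_per_second:,.0f} tokens/s")
print(f"Per-GPU throughput: {tokens_per_second / GPUS:,.0f} tokens/s")
print(f"Tokens per parameter: {tokens_per_parameter:.0f}")
```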

Datasets

The training data combines publicly available documents, synthetic data, and high-quality chat-format supervised data.

Software

📄 License

The model is licensed under the MIT license.
