Model Overview
Model Features
Model Capabilities
Use Cases
🚀 Llama-4-Scout-17B-16E-Instruct-GGUF
This project provides a quantized GGUF version of the Llama-4-Scout-17B-16E-Instruct model, enabling efficient inference and deployment.
📚 Documentation
Model Information
Property | Details |
---|---|
Model Name | Llama-4-Scout-17B-16E-Instruct |
Base Model | unsloth/Llama-4-Scout-17B-16E-Instruct |
Model Creator | Meta |
Quantized By | Second State Inc. |
Library Name | transformers |
Languages Supported | Arabic (ar), German (de), English (en), Spanish (es), French (fr), Hindi (hi), Indonesian (id), Italian (it), Portuguese (pt), Thai (th), Tagalog (tl), Vietnamese (vi) |
Tags | facebook, meta, llama, llama-4 |
Original Model
The original model can be found at unsloth/Llama-4-Scout-17B-16E-Instruct.
Important Note
⚠️ Important Note
ONLY text mode of Llama-4 supported for now!
Run with LlamaEdge
Prerequisites
- LlamaEdge version: v0.16.16 and above
Prompt Template
- Prompt type:
llama-4-chat
- Prompt string:
<|begin_of_text|><|header_start|>system<|header_end|>
{system_prompt}<|eot|><|header_start|>user<|header_end|>
{user_message_1}<|eot|><|header_start|>assistant<|header_end|>
{assistant_message_1}<|eot|><|header_start|>user<|header_end|>
{user_message_2}<|eot|>
<|header_start|>assistant<|header_end|>
Example: Tool Use
- Prompt with user question and tool info:
Expand to see the example
```text <|begin_of_text|><|header_start|>system<|header_end|>You are a helpful assistant.<|eot|> <|header_start|>user<|header_end|>
Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
[ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "description": "The temperature unit to use. Infer this from the users location.", "enum": [ "celsius", "fahrenheit" ] } }, "required": [ "location", "unit" ] } } }, { "type": "function", "function": { "name": "predict_weather", "description": "Predict the weather in 24 hours", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "description": "The temperature unit to use. Infer this from the users location.", "enum": [ "celsius", "fahrenheit" ] } }, "required": [ "location", "unit" ] } } }, { "type": "function", "function": { "name": "sum", "description": "Calculate the sum of two numbers", "parameters": { "type": "object", "properties": { "a": { "type": "integer", "description": "the left hand side number" }, "b": { "type": "integer", "description": "the right hand side number" } }, "required": [ "a", "b" ] } } } ]
Question: How is the weather of Beijing, China in celsius?<|eot|><|header_start|>assistant<|header_end|>
</details>
- Prompt with tool results:
<details>
<summary> Expand to see the example </summary>
```text
<|begin_of_text|><|header_start|>system<|header_end|>
You are a helpful assistant.<|eot|>
<|header_start|>user<|header_end|>
How is the weather of Beijing, China in celsius?<|eot|>
<|header_start|>assistant<|header_end|>
{"name":"get_current_weather","arguments":"{\"location\":\"Beijing, China\",\"unit\":\"celsius\"}"}<|eot|>
<|header_start|>ipython<|header_end|>
{"temperature":"30","unit":"celsius"}<|eot|><|header_start|>assistant<|header_end|>
Context Size
- Context size:
10M
Run as LlamaEdge Service
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-4-Scout-17B-16E-Instruct-Q5_K_M.gguf \
llama-api-server.wasm \
--prompt-template llama-4-chat \
--ctx-size 1000000 \
--model-name Llama-4-Scout
Run as LlamaEdge Command App
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-4-Scout-17B-16E-Instruct-Q5_K_M.gguf \
llama-chat.wasm \
--prompt-template llama-4-chat \
--ctx-size 1000000
Quantized GGUF Models
Quantized with llama.cpp b5074.

