🚀 FireFunction V2: Fireworks Function Calling Model
FireFunction is a cutting-edge function calling model with a commercially viable license. It offers high-performance function calling and significant improvements over previous versions.
✨ Features
Comparison with other models
- Function-calling performance: It is competitive with GPT-4o at function calling, scoring 0.81 compared to GPT-4o's 0.80 across a series of public evaluations.
- Conversation and instruction following: Trained on Llama 3, it retains Llama 3's conversation and instruction-following capabilities. It scores 0.84 on MT-Bench, while Llama 3 scores 0.89.
- Quality improvement: It shows significant quality improvements over FireFunction v1 across a wide range of metrics.
General info
- Successor model: It is the successor to the original FireFunction (v1) model.
- Function support: It supports parallel function calling (unlike FireFunction v1) and has good instruction-following ability.
- Cost-effective hosting: Hosted on the Fireworks platform, it costs less than 10% of GPT-4o and is 2x faster; see the API sketch below.
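As an illustration of the hosted option, the sketch below calls the model through an OpenAI-compatible client. The base URL, model id, and environment variable are assumptions based on Fireworks platform conventions, not taken from this card; verify them against the Fireworks documentation.

```python
# Minimal sketch of calling the hosted model via the Fireworks
# OpenAI-compatible API; base_url and model id are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key=os.environ["FIREWORKS_API_KEY"],           # assumed env var
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}},
            "required": ["symbol"],
        },
    },
}]

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",  # assumed model id
    messages=[{"role": "user", "content": "What are the prices of GOOG and NFLX?"}],
    tools=tools,
)

# Parallel calls come back as multiple entries in tool_calls.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```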
| Property | Details |
|----------|---------|
| Model Type | Function Calling Model |
| Training Data | Based on Llama 3 |
| License | llama3 |
| Tags | function-calling |
🚀 Quick Start
Intended Use and Limitations
Supported use cases
The model is tuned to perform well in the following scenarios:
- General instruction following
- Multi-turn chat mixing vanilla messages with function calls
- Single and parallel function calling
- Up to 20 function specs at once
- Structured information extraction (see the sketch after this list)
It has an 8k context window, like Llama 3.
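For the extraction use case, a common pattern is to describe the target fields as a single function spec and let the model "call" it with the extracted values. The function name and fields below are hypothetical, shown only as a sketch:

```python
# Hypothetical spec: frame extraction as one function whose parameters
# are the fields to pull out of free-form text.
extract_spec = [{
    "name": "extract_contact",
    "description": "Extract contact details mentioned in the text",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Person's full name"},
            "email": {"type": "string", "description": "Email address, if given"},
            "company": {"type": "string", "description": "Company, if mentioned"},
        },
        "required": ["name"],
    },
}]
```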
Out-of-Scope Use
The model is not optimized for the following use cases:
- 100+ function specs
- Nested function calling
Metrics
| Benchmark | FireFunction v1 | FireFunction v2 | Llama 3 70b Instruct | GPT-4o |
|-----------|-----------------|-----------------|----------------------|--------|
| Gorilla simple | 0.91 | 0.94 | 0.925 | 0.88 |
| Gorilla multiple_function | 0.92 | 0.91 | 0.86 | 0.91 |
| Gorilla parallel_function | 0 | 0.9 | 0.86 | 0.89 |
| Gorilla parallel_multiple_function | 0 | 0.8 | 0.615 | 0.72 |
| Nexus parallel | 0.38 | 0.53 | 0.3 | 0.47 |
| MT-Bench | 0.73 | 0.84 | 0.89 | 0.93 |
| Average | 0.49 | 0.82 | 0.74 | 0.8 |
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import json
from datetime import datetime

# Load the model and tokenizer; device_map="auto" places the weights
# on the available GPU(s) automatically.
model = AutoModelForCausalLM.from_pretrained("fireworks-ai/firefunction-v2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("fireworks-ai/firefunction-v2")

# JSON-schema style specs for the functions the model may call.
function_spec = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "The stock symbol, e.g. AAPL, GOOG"
                }
            },
            "required": ["symbol"]
        }
    },
    {
        "name": "check_word_anagram",
        "description": "Check if two words are anagrams of each other",
        "parameters": {
            "type": "object",
            "properties": {
                "word1": {"type": "string", "description": "The first word"},
                "word2": {"type": "string", "description": "The second word"}
            },
            "required": ["word1", "word2"]
        }
    }
]
functions = json.dumps(function_spec, indent=4)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant with access to functions. Use them if required.'},
    {'role': 'user', 'content': 'Hi, can you tell me the current stock price of google and netflix?'}
]

# The chat template takes the function specs and the current datetime
# as extra template variables.
now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
model_inputs = tokenizer.apply_chat_template(messages, functions=functions, datetime=now, return_tensors="pt").to(model.device)

generated_ids = model.generate(model_inputs, max_new_tokens=128)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
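Note that `decoded[0]` contains the prompt as well as the generated function call. As a small follow-up sketch, reusing the variables from the example above, you can slice off the prompt tokens so only the model's new output is printed:

```python
# Print only the newly generated tokens, i.e. everything past the prompt.
prompt_len = model_inputs.shape[-1]
new_tokens = generated_ids[0][prompt_len:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```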
Advanced Usage
For more detailed usage, refer to the documentation.
📚 Documentation