🚀 FireFunction V2: Fireworks Function Calling Model
FireFunction is a cutting-edge function calling model with a commercially viable license. It offers high-performance function calling and significant improvements over previous versions.
✨ Features
Comparison with other models
- Function-calling performance: It is competitive with GPT-4o at function calling, scoring 0.81 compared to GPT-4o's 0.80 across a series of public evaluations.
- Conversation and instruction following: Trained on Llama 3, it retains Llama 3's conversation and instruction-following capabilities. It scores 0.84 on MT-Bench, while Llama 3 scores 0.89.
- Quality improvement: It shows significant quality improvements over FireFunction v1 across a wide range of metrics.
General info
- Successor model: It is the successor to the original FireFunction (v1) model.
- Function support: It supports parallel function calling (unlike FireFunction v1) and has good instruction-following ability.
- Cost-effective hosting: Hosted on the Fireworks platform, it costs less than 10% of GPT-4o and is 2x faster; see the API sketch below.
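As an illustration of the hosted option, the sketch below calls the model through an OpenAI-compatible client. The base URL, model id, and environment variable are assumptions based on Fireworks platform conventions, not taken from this card; verify them against the Fireworks documentation.

```python
# Minimal sketch of calling the hosted model via the Fireworks
# OpenAI-compatible API; base_url and model id are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key=os.environ["FIREWORKS_API_KEY"],           # assumed env var
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}},
            "required": ["symbol"],
        },
    },
}]

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",  # assumed model id
    messages=[{"role": "user", "content": "What are the prices of GOOG and NFLX?"}],
    tools=tools,
)

# Parallel calls come back as multiple entries in tool_calls.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```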
| Property | Details |
|----------|---------|
| Model Type | Function Calling Model |
| Training Data | Based on Llama 3 |
| License | llama3 |
| Tags | function-calling |
🚀 Quick Start
Intended Use and Limitations
Supported use cases
The model is tuned to perform well in the following scenarios:
- General instruction following
- Multi-turn chat mixing vanilla messages with function calls
- Single and parallel function calling
- Up to 20 function specs at once
- Structured information extraction (see the sketch after this list)
It has an 8k context window, like Llama 3.
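For the extraction use case, a common pattern is to describe the target fields as a single function spec and let the model "call" it with the extracted values. The function name and fields below are hypothetical, shown only as a sketch:

```python
# Hypothetical spec: frame extraction as one function whose parameters
# are the fields to pull out of free-form text.
extract_spec = [{
    "name": "extract_contact",
    "description": "Extract contact details mentioned in the text",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Person's full name"},
            "email": {"type": "string", "description": "Email address, if given"},
            "company": {"type": "string", "description": "Company, if mentioned"},
        },
        "required": ["name"],
    },
}]
```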
Out-of-Scope Use
The model is not optimized for the following use cases:
- 100+ function specs
- Nested function calling
Metrics
| Benchmark | FireFunction v1 | FireFunction v2 | Llama 3 70b Instruct | GPT-4o |
|-----------|-----------------|-----------------|----------------------|--------|
| Gorilla simple | 0.91 | 0.94 | 0.925 | 0.88 |
| Gorilla multiple_function | 0.92 | 0.91 | 0.86 | 0.91 |
| Gorilla parallel_function | 0 | 0.9 | 0.86 | 0.89 |
| Gorilla parallel_multiple_function | 0 | 0.8 | 0.615 | 0.72 |
| Nexus parallel | 0.38 | 0.53 | 0.3 | 0.47 |
| MT-Bench | 0.73 | 0.84 | 0.89 | 0.93 |
| Average | 0.49 | 0.82 | 0.74 | 0.8 |
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import json
from datetime import datetime

# Load the model and tokenizer; device_map="auto" places the weights
# on the available GPU(s) automatically.
model = AutoModelForCausalLM.from_pretrained("fireworks-ai/firefunction-v2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("fireworks-ai/firefunction-v2")

# JSON-schema style specs for the functions the model may call.
function_spec = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "The stock symbol, e.g. AAPL, GOOG"
                }
            },
            "required": ["symbol"]
        }
    },
    {
        "name": "check_word_anagram",
        "description": "Check if two words are anagrams of each other",
        "parameters": {
            "type": "object",
            "properties": {
                "word1": {"type": "string", "description": "The first word"},
                "word2": {"type": "string", "description": "The second word"}
            },
            "required": ["word1", "word2"]
        }
    }
]
functions = json.dumps(function_spec, indent=4)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant with access to functions. Use them if required.'},
    {'role': 'user', 'content': 'Hi, can you tell me the current stock price of google and netflix?'}
]

# The chat template takes the function specs and the current datetime
# as extra template variables.
now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
model_inputs = tokenizer.apply_chat_template(messages, functions=functions, datetime=now, return_tensors="pt").to(model.device)

generated_ids = model.generate(model_inputs, max_new_tokens=128)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
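Note that `decoded[0]` contains the prompt as well as the generated function call. As a small follow-up sketch, reusing the variables from the example above, you can slice off the prompt tokens so only the model's new output is printed:

```python
# Print only the newly generated tokens, i.e. everything past the prompt.
prompt_len = model_inputs.shape[-1]
new_tokens = generated_ids[0][prompt_len:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```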
Advanced Usage
For more detailed usage, refer to the documentation.
📚 Documentation