Falcon-H1 Model
Falcon-H1 is a family of hybrid-head language models developed by TII, offering high efficiency and performance across multiple languages.
Quick Start
To quickly start using Falcon-H1, you can choose from several libraries: Hugging Face transformers, vLLM, or a custom fork of llama.cpp.
Features
- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Architecture: Hybrid Transformers + Mamba architecture
- Language(s) (NLP): English, Multilingual
- License: Falcon-LLM License
Installation
Install transformers
Make sure to install the latest version of transformers; you can install it from source:
pip install git+https://github.com/huggingface/transformers.git
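As a quick sanity check (assuming a standard Python environment), you can print the installed version and confirm it is recent enough to include the Falcon-H1 architecture:
# Print the installed transformers version
import transformers
print(transformers.__version__)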
For vLLM, install it with pip (pip install vllm), or refer to the official vLLM documentation for more details on building vLLM from source.
đģ Usage Examples
Basic Usage - transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-1B-Base"
# Load the tokenizer and the model in bfloat16; device_map="auto" places weights automatically
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate a short completion from an example prompt
inputs = tokenizer("The Falcon-H1 architecture combines", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Advanced Usage - vLLM
For vLLM, simply start a server by executing the command below:
# pip install vllm
vllm serve tiiuae/Falcon-H1-1B-Instruct --tensor-parallel-size 2 --data-parallel-size 1
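Once the server is up, you can query it through the OpenAI-compatible API that vLLM exposes. The sketch below assumes the default endpoint (http://localhost:8000/v1) and the openai Python package; adjust the host, port, and prompt to your setup:
# pip install openai
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; the API key is unused unless --api-key is set
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="tiiuae/Falcon-H1-1B-Instruct",
    messages=[{"role": "user", "content": "Summarize the Falcon-H1 model family in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)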
Advanced Usage - llama.cpp
While the development team is working on integrating the architecture directly into the llama.cpp library, you can install the custom fork and use it directly: https://github.com/tiiuae/llama.cpp-Falcon-H1. Follow the same installation guidelines as for llama.cpp.
Documentation
Training Details
For more details about the training protocol of this model, please refer to the Falcon-H1 technical blogpost.
Evaluation
The Falcon-H1 series performs very well on a variety of tasks, including reasoning.
| Tasks | Falcon-H1-34B | Qwen2.5-72B | Qwen2.5-32B | Gemma3-27B | Llama3.1-70B | Llama4-scout |
| --- | --- | --- | --- | --- | --- | --- |
| **General** | | | | | | |
| BBH | 69.36 | 67.77 | 67.45 | 61.6 | 62.78 | 61.71 |
| MMLU | 83.46 | 85.96 | 83.18 | 78.32 | 78.49 | 77.98 |
| ARC-C | 71.25 | 72.44 | 70.48 | 70.31 | 69.2 | 62.97 |
| HellaSwag | 85.68 | 87.57 | 85.13 | 86.19 | 87.78 | 84.01 |
| Winogrande | 82.72 | 83.74 | 82.32 | 82.4 | 85.32 | 78.93 |
| **Math** | | | | | | |
| GSM8k | 76.5 | 89.76 | 90.14 | 81.35 | 80.52 | 83.24 |
| MATH lvl5 | 40.71 | 38.14 | 36.4 | 25.38 | 18.81 | 27.19 |
| **Science** | | | | | | |
| GPQA | 42.7 | 42.28 | 39.68 | 35.82 | 36.49 | 35.99 |
| MMLU-Pro | 57.18 | 60.22 | 58.05 | 49.64 | 47.07 | 50.16 |
| MMLU-stem | 83.82 | 84.81 | 82.81 | 76.59 | 70.35 | 72.57 |
| **Code** | | | | | | |
| HumanEval | 70.12 | 59.15 | 59.76 | 48.78 | 57.32 | 57.32 |
| HumanEval+ | 64.63 | 51.22 | 51.83 | 40.85 | 50.61 | 48.78 |
| MBPP | 83.33 | 87.04 | 83.07 | 76.19 | 78.84 | 77.78 |
| MBPP+ | 70.37 | 70.63 | 68.78 | 61.64 | 66.67 | 64.29 |
You can find more detailed benchmarks in our release blogpost.
License
The model is licensed under the Falcon-LLM License.
Citation
If the Falcon-H1 family of models was helpful to your work, feel free to cite us.
@misc{tiifalconh1,
title = {Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance},
url = {https://falcon-lm.github.io/blog/falcon-h1},
author = {Falcon-LLM Team},
month = {May},
year = {2025}
}