Falcon-H1
Falcon-H1 is a family of hybrid-head language models developed by the Technology Innovation Institute (TII), offering high-performance solutions for a wide range of NLP tasks.
Quick Start
To start using the Falcon-H1 model, you can choose from Hugging Face transformers, vLLM, or a custom fork of the llama.cpp library.
Features
- Model Type: Causal decoder-only
- Architecture: Hybrid Transformers + Mamba architecture
- Language: English
- License: Falcon-LLM License
Installation
Install transformers
```bash
pip install git+https://github.com/huggingface/transformers.git
```
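After installing, a quick sanity check (a minimal sketch, not part of the official instructions) is to confirm that the development build is the one your environment actually imports:

```python
# Minimal sanity check: make sure the freshly installed development build of
# transformers is the version being imported in this environment.
import transformers

print(transformers.__version__)  # a recent development version should be reported
```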
Build vLLM from source
Refer to the official vLLM documentation for more details on building vLLM from source.
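Once the build finishes, one way to confirm the installation works end to end is a small offline-generation run. The sketch below is only illustrative: the checkpoint name, prompt, and sampling settings are assumptions, not prescribed values.

```python
# Hedged sketch: offline generation with the locally built vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="tiiuae/Falcon-H1-1B-Instruct")  # illustrative checkpoint
outputs = llm.generate(["The falcon is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```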
Install llama.cpp fork
You can install our fork of the llama.cpp library and use it directly: https://github.com/tiiuae/llama.cpp-Falcon-H1. Follow the same installation guidelines as for llama.cpp.
Usage Examples
Basic Usage
Using transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-1B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Run a short sample generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
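For the instruction-tuned checkpoints, a chat-style prompt can be built with the tokenizer's chat template before generating. The sketch below is an illustration under assumptions: the model name, prompt, and generation length are placeholders, not recommended settings.

```python
# Hedged sketch: chat-style generation with an Instruct checkpoint via the chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-1B-Instruct"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarize hybrid attention + Mamba models in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```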
Using vLLM
```bash
# pip install vllm
vllm serve tiiuae/Falcon-H1-1B-Instruct --tensor-parallel-size 2 --data-parallel-size 1
```
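Once the server above is running, it exposes an OpenAI-compatible API that can be queried from any compatible client. The sketch below assumes the default local endpoint (http://localhost:8000/v1) and uses the openai Python client; both are assumptions for illustration.

```python
# Hedged sketch: query the OpenAI-compatible endpoint exposed by `vllm serve`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local server, no real key needed
response = client.chat.completions.create(
    model="tiiuae/Falcon-H1-1B-Instruct",
    messages=[{"role": "user", "content": "Explain hybrid attention-Mamba models in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```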
Advanced Usage
While we are working on integrating our architecture directly into the llama.cpp library, you can use our fork as described above.
Documentation
Model Details
- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Architecture: Hybrid Transformers + Mamba architecture
- Language(s) (NLP): English
- License: Falcon-LLM License
Training Details
For more details about the training protocol of this model, please refer to the Falcon-H1 technical blog post.
Evaluation
The Falcon-H1 series performs very well on a variety of tasks, including reasoning tasks.
| Tasks | Falcon-H1-0.5B | Qwen3-0.6B | Qwen2.5-0.5B | Gemma3-1B | Llama3.2-1B | Falcon3-1B |
| --- | --- | --- | --- | --- | --- | --- |
| **General** | | | | | | |
| BBH | 40.22 | 36.07 | 32.62 | 30.26 | 30.72 | 35.24 |
| MMLU | 55.04 | 52.64 | 47.61 | 26.33 | 32.39 | 45.14 |
| ARC-C | 46.93 | 44.8 | 35.32 | 39.33 | 39.42 | 47.87 |
| HellaSwag | 56.3 | 53.51 | 51.79 | 62.94 | 65.73 | 62.3 |
| Winogrande | 59.43 | 60.54 | 56.83 | 62.59 | 62.75 | 61.17 |
| **Math** | | | | | | |
| GSM8k | 60.2 | 50.04 | 34.8 | 2.2 | 7.05 | 34.95 |
| MATH lvl5 | 15.18 | 9.29 | 4.23 | 1.21 | 0.98 | 3.4 |
| **Science** | | | | | | |
| GPQA | 29.7 | 29.11 | 27.94 | 24.66 | 23.57 | 27.85 |
| MMLU-Pro | 30.04 | 22.99 | 18.98 | 11.31 | 11.8 | 16.11 |
| MMLU-stem | 57.12 | 50.11 | 43.74 | 27.59 | 30.19 | 40.06 |
| **Code** | | | | | | |
| HumanEval | 35.98 | 31.71 | 29.27 | 6.71 | 18.9 | 10.37 |
| HumanEval+ | 31.1 | 27.44 | 25.0 | 5.49 | 16.46 | 9.15 |
| MBPP | 52.12 | 51.06 | 40.74 | 12.7 | 35.98 | 12.43 |
| MBPP+ | 43.39 | 42.33 | 34.66 | 9.52 | 29.89 | 9.52 |
You can find more detailed benchmarks in our release blog post.
Useful Links
License
This model is licensed under the Falcon-LLM License.
Citation
If the Falcon-H1 family of models was helpful to your work, feel free to cite us:
```bibtex
@misc{tiifalconh1,
    title = {Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance},
    url = {https://falcon-lm.github.io/blog/falcon-h1},
    author = {Falcon-LLM Team},
    month = {May},
    year = {2025}
}
```