🚀 Falcon-E Model
Falcon-E is a series of powerful, universal, and fine-tunable 1.58-bit language models developed by TII (tiiuae), delivering strong performance across a range of NLP tasks at several model scales.
🚀 Quick Start
To use Falcon-E, you can rely on either the Hugging Face transformers library or the BitNet library; which one to pick depends on your target usage.
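As a minimal sketch of the transformers path (assuming a recent transformers release and a CUDA device; the prompt is purely illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"

# Load the default (BitNet) checkpoint and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# Generate a short continuation for an illustrative prompt
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```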
✨ Features
- Model Type: Causal decoder-only / Base version
- Architecture: Pure transformer, 1.58-bit version
- Language(s) (NLP): English
- License: Falcon-LLM License
📦 Installation
You can use this model with either the Hugging Face transformers library or the BitNet library. To install BitNet:

```bash
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
```
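For the transformers path, only the libraries themselves are needed. The exact package set is an assumption here (onebitllms is only required for fine-tuning), but a typical install would be:

```bash
pip install --upgrade transformers
pip install onebitllms  # only needed for fine-tuning
```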
💻 Usage Examples
Basic Usage
🤗 transformers
To run inference on the default BitNet checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
```
To use the classic bfloat16 version instead, load the bfloat16 revision:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"
revision = "bfloat16"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision=revision,
).to("cuda")
```
BitNet

```bash
python setup_env.py --hf-repo tiiuae/Falcon-E-1B-Base -q i2_s
python run_inference.py -m models/Falcon-E-1B-Base/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```

Here -p sets the prompt and -cnv runs the model in conversational (chat) mode.
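For one-shot generation instead of chat, the same script can be run without -cnv; this invocation is a sketch assuming the -n (number of tokens to predict) flag of the BitNet repository's run_inference.py, with an illustrative prompt:

```bash
python run_inference.py -m models/Falcon-E-1B-Base/ggml-model-i2_s.gguf -p "The capital of France is" -n 64
```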
Advanced Usage
Fine-tuning
For fine-tuning, load the prequantized revision of the model and use the onebitllms Python package:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"

# Load the prequantized revision (higher-precision weights meant for fine-tuning)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="prequantized"
)

# Swap nn.Linear layers for BitNet linear layers before training
model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)

trainer.train()

# After training, quantize the fine-tuned checkpoint back to 1.58-bit
quantize_to_1bit(output_directory)
```
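For reference, a fuller end-to-end sketch is shown below. The dataset, hyperparameters, and output path are illustrative assumptions (not an official recipe), and the SFTTrainer argument names follow recent trl releases:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"
output_directory = "falcon-e-1b-sft"  # illustrative output path

# Load the prequantized revision and swap in BitNet linear layers
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, revision="prequantized"
)
model = replace_linear_with_bitnet_linear(model)

# Illustrative dataset and hyperparameters
dataset = load_dataset("trl-lib/Capybara", split="train")
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir=output_directory, per_device_train_batch_size=4, bf16=True),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()

# Quantize the fine-tuned bfloat16 checkpoint back to 1.58-bit
quantize_to_1bit(output_directory)
```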
📚 Documentation
Model Details
| Property | Details |
|---|---|
| Developed by | https://www.tii.ae |
| Model Type | Causal decoder-only / Base version |
| Architecture | Pure transformer, 1.58-bit version |
| Language(s) (NLP) | English |
| License | Falcon-LLM License |
Training Details
For more details about the training protocol of this model, please refer to the Falcon-E technical blogpost.
Evaluation
The following tables report our internal pipeline benchmarks. Note that evaluation results are normalized scores from the former Hugging Face leaderboard v2 tasks.
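The Avg. column appears to be the plain arithmetic mean of the six task scores; as a quick sanity check against the Falcon-E-1B-Base row below (this small helper is ours, not part of the evaluation pipeline):

```python
# Scores for Falcon-E-1B-Base, copied from the table below
scores = {
    "IFEVAL": 32.9,
    "Math-Hard": 10.97,
    "GPQA": 2.8,
    "MuSR": 3.65,
    "BBH": 12.28,
    "MMLU-Pro": 17.82,
}

# Arithmetic mean over the six tasks
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 13.40, matching the reported Avg.
```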
For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |
For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | 999MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |
Instruction fine-tuned models - For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |
Instruction fine-tuned models - For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | 999MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |
Useful links
- Falcon-E technical blogpost: https://falcon-lm.github.io/blog/falcon-edge
- BitNet library: https://github.com/microsoft/BitNet
📄 License
This model is made available under the Falcon-LLM License.
🔧 Technical Details
For more technical details, please refer to the Falcon-E technical blogpost.
📝 Citation
If the Falcon-E family of models was helpful to your work, please consider citing us:
```bibtex
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    year = {2025},
    url = {https://falcon-lm.github.io/blog/falcon-edge}
}
```