# 🚀 Falcon-E Model
Falcon-E is a series of powerful language models developed by [TII](https://www.tii.ae). It offers high performance with a relatively low memory footprint, making it suitable for a variety of NLP tasks.
## 🚀 Quick Start
Currently, you can use this model through either the Hugging Face `transformers` library or the BitNet library. There are multiple ways to interact with the model depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the `bfloat16` version of the BitNet model.
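All three variants live in the same Hugging Face repository and are selected through the `revision` argument of `from_pretrained`: the default revision holds the BitNet checkpoint, `prequantized` the checkpoint for fine-tuning, and `bfloat16` the classic weights, as the usage examples below illustrate.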
## ✨ Features
- **Low Memory Footprint**: The Falcon-E models, such as Falcon-E-1B-Base and Falcon-E-3B-Base, have significantly lower memory footprints than their counterparts, making them more resource-efficient.
- **Multiple Variants**: Available as the BitNet model, a prequantized checkpoint for fine-tuning, and a `bfloat16` version for inference.
## 📦 Installation
To use the model with the Hugging Face `transformers` library, make sure you have the latest version of `transformers` installed:
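```bash
pip install -U transformers
```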
For BitNet, you can follow these steps:
```bash
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-3B-Instruct -q i2_s
```
## 💻 Usage Examples

### Basic Usage

#### 🤗 transformers
If you want to perform inference on the BitNet checkpoint:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-3B-Instruct"

# The default revision hosts the BitNet checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
```
If you want to use the classic `bfloat16` version:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-3B-Instruct"
revision = "bfloat16"

# The bfloat16 weights live on a dedicated revision of the same repo
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision=revision,
).to("cuda")
```
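In either case, generation then works through the standard `transformers` API. Below is a minimal sketch assuming the Instruct checkpoints ship a chat template; the prompt and `max_new_tokens` value are illustrative:

```python
messages = [{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```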
#### BitNet

```bash
python run_inference.py -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```
### Advanced Usage

To fine-tune the model, load the `prequantized` revision of the model and use the `onebitllms` Python package:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-3B-Instruct"

# Load the prequantized (bfloat16) revision intended for fine-tuning
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="prequantized"
)

# Swap the linear layers for BitNet linear layers before training
model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)
trainer.train()

# After training, quantize the saved checkpoint back to 1.58-bit
# (`output_directory` refers to the directory the trainer saved to)
quantize_to_1bit(output_directory)
```
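In this workflow, fine-tuning happens on the higher-precision prequantized weights while the injected BitNet linear layers emulate 1.58-bit quantization during training; the final `quantize_to_1bit` call then converts the saved checkpoint into the 1-bit format used for inference.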
## 📚 Documentation

### Model Details

| Property | Details |
|----------|---------|
| Developed by | [https://www.tii.ae](https://www.tii.ae) |
| Model Type | Causal decoder-only / Base version |
| Architecture | Pure transformer - 1.58-bit version |
| Language(s) (NLP) | English |
| License | Falcon-LLM License |
### Training Details

For more details about the training protocol of this model, please refer to the [Falcon-E technical blog post](https://falcon-lm.github.io/blog/falcon-edge).
### Evaluation

We report our internal pipeline benchmarks in the tables below. Note that the evaluation results are normalized scores from the former Hugging Face leaderboard v2 tasks.

#### For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |
#### For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | 955MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |
Below are the results for the instruction fine-tuned models:

#### For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |
#### For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | 955MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |
### Useful links

- [Falcon-E technical blog post](https://falcon-lm.github.io/blog/falcon-edge)
- [BitNet inference framework](https://github.com/microsoft/BitNet)
## 📄 License

This model is licensed under the Falcon-LLM License.
## 📚 Citation

If the Falcon-E family of models was helpful to your work, feel free to cite it:

```bibtex
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}
```