# 🚀 Falcon-E Model
Falcon-E is a series of powerful language models developed by [TII](https://www.tii.ae). It offers high performance with a relatively low memory footprint, making it suitable for a variety of NLP tasks.
## 🚀 Quick Start
Currently, you can use this model through either the Hugging Face `transformers` library or the BitNet library. There are multiple ways to interact with the model depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the `bfloat16` version of the BitNet model.
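All three variants live in the same Hugging Face repository and are selected through the `revision` argument of `from_pretrained`: the default revision holds the BitNet checkpoint, `prequantized` the checkpoint for fine-tuning, and `bfloat16` the classic weights, as the usage examples below illustrate.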
## ✨ Features
- **Low Memory Footprint**: The Falcon-E models, such as Falcon-E-1B-Base and Falcon-E-3B-Base, have significantly lower memory footprints than their counterparts, making them more resource-efficient.
- **Multiple Variants**: Available as the BitNet model, a prequantized checkpoint for fine-tuning, and a `bfloat16` version for inference.
## 📦 Installation
To use the model with the Hugging Face `transformers` library, make sure you have the latest version of `transformers` installed:
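```bash
pip install -U transformers
```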
For BitNet, you can follow these steps:
```bash
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-3B-Instruct -q i2_s
```
## 💻 Usage Examples

### Basic Usage

#### 🤗 transformers
If you want to perform inference on the BitNet checkpoint:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-3B-Instruct"

# The default revision hosts the BitNet checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
```
If you want to use the classic `bfloat16` version:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-3B-Instruct"
revision = "bfloat16"

# The bfloat16 weights live on a dedicated revision of the same repo
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision=revision,
).to("cuda")
```
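In either case, generation then works through the standard `transformers` API. Below is a minimal sketch assuming the Instruct checkpoints ship a chat template; the prompt and `max_new_tokens` value are illustrative:

```python
messages = [{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```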
#### BitNet

```bash
python run_inference.py -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```
### Advanced Usage

To fine-tune the model, load the `prequantized` revision of the model and use the `onebitllms` Python package:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-3B-Instruct"

# Load the prequantized (bfloat16) revision intended for fine-tuning
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="prequantized"
)

# Swap the linear layers for BitNet linear layers before training
model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)
trainer.train()

# After training, quantize the saved checkpoint back to 1.58-bit
# (`output_directory` refers to the directory the trainer saved to)
quantize_to_1bit(output_directory)
```
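In this workflow, fine-tuning happens on the higher-precision prequantized weights while the injected BitNet linear layers emulate 1.58-bit quantization during training; the final `quantize_to_1bit` call then converts the saved checkpoint into the 1-bit format used for inference.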
## 📚 Documentation

### Model Details

| Property | Details |
|----------|---------|
| Developed by | [https://www.tii.ae](https://www.tii.ae) |
| Model Type | Causal decoder-only / Base version |
| Architecture | Pure transformer - 1.58-bit version |
| Language(s) (NLP) | English |
| License | Falcon-LLM License |
### Training Details

For more details about the training protocol of this model, please refer to the [Falcon-E technical blog post](https://falcon-lm.github.io/blog/falcon-edge).
### Evaluation

We report our internal pipeline benchmarks in the tables below. Note that the evaluation results are normalized scores from the former Hugging Face leaderboard v2 tasks.

#### For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |
#### For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | 955MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |
Below are the results for the instruction fine-tuned models:

#### For 1B scale models and below
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |
#### For 3B scale models
| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|-------|-----------|---------------|--------|-----------|------|------|-----|----------|------|
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | 955MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |
### Useful links

- [Falcon-E technical blog post](https://falcon-lm.github.io/blog/falcon-edge)
- [BitNet inference framework](https://github.com/microsoft/BitNet)
## 📄 License

This model is licensed under the Falcon-LLM License.
## 📚 Citation

If the Falcon-E family of models was helpful to your work, feel free to cite it:

```bibtex
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}
```