LFM2-350M: an open-source hybrid model for edge AI and on-device deployment, with efficient training and inference

LFM2 350M

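Developed by LiquidAI

LFM2-350M is a hybrid model developed by Liquid AI. It is designed for edge AI and on-device deployment and is efficient in both training and inference.

Large language model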

Transformers

Supports multiple languages | License: Other | #EdgeAIOptimized #MultilingualHybridModel #EfficientInference

Downloads: 1,519

Released: 7/10/2025

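Model overview

LFM2-350M is a new hybrid Liquid model suited to deployment on edge devices. It supports fast training and inference and outperforms models of similar size.

Model features

Fast training and inference

Training is 3x faster than the previous generation, and decode and prefill on CPU are 2x faster than Qwen3.

Strong performance

Outperforms similar-sized models on benchmarks covering knowledge, mathematics, instruction following, and multilingual capabilities.

New architecture

A hybrid Liquid model architecture with multiplicative gates and short convolutions.

Flexible deployment

Runs efficiently on CPU, GPU, and NPU hardware, on devices such as smartphones, laptops, or vehicles.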

Model capabilities

Text generation

Multilingual processing

Instruction following

Mathematical reasoning

Tool calling

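Use cases

Agentic tasks

Data extraction

Extract structured data from text.

Creative writing

Story generation

Generate short stories or other creative text.

Multi-turn conversation

Chat assistant

A chatbot that supports multi-turn conversations.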

🚀 LFM2-350M

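LFM2 is a new generation of hybrid models developed by Liquid AI, designed specifically for edge AI and on-device deployment. It sets a new standard in quality, speed, and memory efficiency.

We are releasing the weights of three fine-tuned checkpoints with 350M, 700M, and 1.2B parameters. They provide the following key features for building AI-powered edge applications:

Fast training and inference: LFM2 trains 3x faster than the previous generation. Compared to Qwen3, its decode and prefill speed on CPU is 2x faster.
Strong performance: LFM2 outperforms similar-sized models across multiple benchmark categories, including knowledge, mathematics, instruction following, and multilingual capabilities.
New architecture: LFM2 is a new hybrid Liquid model with multiplicative gates and short convolutions.
Flexible deployment: LFM2 runs efficiently on CPU, GPU, and NPU hardware and can be deployed on smartphones, laptops, or vehicles.

Learn more about LFM2 in our blog post.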

🚀 Quick start

LFM2 runs with transformers and llama.cpp. vLLM support is coming soon.

1. Run with `transformers`

To run LFM2, you need to install Hugging Face transformers from source (v4.54.0.dev0). You can update or install it with the following command:

pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"

Here is an example of generating an answer with transformers in Python:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "LiquidAI/LFM2-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True,
    #    attn_implementation="flash_attention_2" <- uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate an answer
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))

# <|startoftext|><|im_start|>user
# What is C. elegans?<|im_end|>
# <|im_start|>assistant
# C. elegans, also known as Caenorhabditis elegans, is a small, free-living
# nematode worm (roundworm) that belongs to the phylum Nematoda.
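
Because the example decodes with skip_special_tokens=False, the printed text contains the whole transcript, including the prompt and the template tokens. The following is a minimal sketch, not part of the original model card, of one way to pull out only the assistant's reply from such a string. The helper name extract_assistant_reply is hypothetical, and the sketch assumes the turn markers look exactly like the sample output above.

```python
def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded transcript."""
    marker = "<|im_start|>assistant\n"
    # Everything after the last assistant marker is the newest reply.
    _, found, tail = decoded.rpartition(marker)
    if not found:
        raise ValueError("no assistant turn found in decoded text")
    # Cut at the end-of-turn token if the model emitted one.
    reply = tail.split("<|im_end|>", 1)[0]
    return reply.strip()


# Transcript shaped like the sample output above.
decoded = (
    "<|startoftext|><|im_start|>user\n"
    "What is C. elegans?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "C. elegans, also known as Caenorhabditis elegans, is a small, free-living\n"
    "nematode worm (roundworm) that belongs to the phylum Nematoda.<|im_end|>"
)
print(extract_assistant_reply(decoded))
```

An alternative that avoids string handling is to decode only the newly generated tokens, for example tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).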

You can run and test the model directly with this Colab notebook.

2. Run with `llama.cpp`

You can run LFM2 with llama.cpp using its GGUF checkpoint. Find more information in the model card.

💻 Usage examples

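Advanced usage

In real applications, you may need to adjust the generation parameters for different scenarios:

# Advanced scenario: adjust the generation parameters to the task at hand for better results
# For example, raise temperature moderately when you want more creative answers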

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True,
    #    attn_implementation="flash_attention_2" <- uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Write a short story about a robot's adventure."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

# Raise temperature to make the answer more creative
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=1024,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))
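
When several tasks share one model, it can help to keep the generation settings in one place instead of repeating them at every call site. The sketch below is one possible arrangement, not something the model card prescribes. The base values are the ones used in the quick-start example, the "creative" overrides are the ones used in the example above, and the names BASE_GENERATION, TASK_OVERRIDES, and generation_kwargs are hypothetical.

```python
# Settings used by the quick-start example in this document.
BASE_GENERATION = {
    "do_sample": True,
    "temperature": 0.3,
    "min_p": 0.15,
    "repetition_penalty": 1.05,
    "max_new_tokens": 512,
}

# Hypothetical per-task overrides; "creative" mirrors the example above.
TASK_OVERRIDES = {
    "default": {},
    "creative": {"temperature": 0.7, "max_new_tokens": 1024},
}


def generation_kwargs(task: str = "default") -> dict:
    """Merge the base settings with the overrides for one task."""
    if task not in TASK_OVERRIDES:
        raise KeyError(f"unknown task preset: {task!r}")
    return {**BASE_GENERATION, **TASK_OVERRIDES[task]}


print(generation_kwargs("creative"))
```

With a loaded model, the result can be passed straight through, for example model.generate(input_ids, **generation_kwargs("creative")).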

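📚 Detailed documentation

📄 Model details

Because of their small size, we recommend fine-tuning LFM2 models on specific use cases to maximize performance. They are particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. However, we do not recommend using them for knowledge-intensive tasks or tasks that require programming skills.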

Property	Details
Parameters	354,483,968
Layers	16 (10 convolution + 6 attention)
Context length	32,768 tokens
Vocabulary size	65,536
Precision	bfloat16
Training budget	10 trillion tokens
License	LFM Open License v1.0
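
As a rough worked example of what these numbers mean for on-device use, the parameter count and precision above give the size of the weights alone. This is back-of-the-envelope arithmetic, not a measured footprint: it ignores activations, the attention cache, and runtime overhead, and quantized formats such as GGUF will differ.

```python
NUM_PARAMS = 354_483_968   # parameter count from the table above
BYTES_PER_PARAM = 2        # bfloat16 stores each value in 16 bits

weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
print(f"{weight_bytes:,} bytes")        # 708,967,936 bytes
print(f"{weight_bytes / 1e9:.2f} GB")   # 0.71 GB
```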

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

Generation parameters: we recommend the following:

temperature=0.3
min_p=0.15
repetition_penalty=1.05
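
To give some intuition for how temperature and min_p interact, here is a small pure-Python sketch of min-p filtering as it is commonly defined: after temperature scaling, a token is kept only if its probability is at least min_p times the probability of the most likely token. This illustrates the idea on made-up logits and is not the library's implementation; repetition_penalty is left out, and the function name min_p_filter is hypothetical.

```python
import math


def min_p_filter(logits, temperature=0.3, min_p=0.15):
    """Return {token_index: probability} for tokens that survive min-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep tokens whose probability is at least min_p * (top probability).
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


logits = [2.0, 1.5, 1.0, -1.0]  # made-up scores for a 4-token vocabulary
print(sorted(min_p_filter(logits, temperature=0.3)))
print(sorted(min_p_filter(logits, temperature=0.7)))
```

With these toy logits, the lower temperature leaves fewer candidate tokens than the higher one, which is why raising temperature tends to produce more varied text.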

Chat template: LFM2 uses a ChatML-like chat template, as follows:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant
It's a tiny nematode that lives in temperate soil environments.<|im_end|>

You can apply it with the dedicated .apply_chat_template() function from Hugging Face transformers.
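
To make the layout of the template concrete, the following hand-written renderer reproduces the string shown above from a list of messages. It is only an illustration: the function name render_chat is hypothetical, and the newline after each <|im_end|> is an assumption. The template shipped with the tokenizer is the authoritative version, so real code should call .apply_chat_template().

```python
def render_chat(messages, add_generation_prompt=False):
    """Render messages in the ChatML-like layout shown above (illustrative only)."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
    {
        "role": "assistant",
        "content": "It's a tiny nematode that lives in temperate soil environments.",
    },
]
print(render_chat(messages))
```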

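Tool use: it consists of four main steps:

Function definition: LFM2 takes JSON function definitions as input (JSON objects between the <|tool_list_start|> and <|tool_list_end|> special tokens), usually in the system prompt.
Function call: LFM2 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant's answer.
Function execution: the function call is executed and the result is returned (a string between the <|tool_response_start|> and <|tool_response_end|> special tokens) under the "tool" role.
Final answer: LFM2 interprets the result of the function call and answers the original user prompt in plain text.

Here is a simple example of a conversation that uses tools: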

<|startoftext|><|im_start|>system
List of tools: <|tool_list_start|>[{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|tool_list_end|><|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
<|tool_response_start|>{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}<|tool_response_end|><|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
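
The function-execution step happens in application code, not in the model. The sketch below shows one way that step could look for the conversation above: it extracts the Python-style call list from the assistant turn, dispatches to a local function, and wraps the result in the tool-response tokens. Everything here is an assumption made for illustration: the helper names parse_tool_calls and run_tool_turn are hypothetical, get_candidate_status is a stub returning the canned data from the example, and only keyword arguments with literal values are handled.

```python
import ast
import json
import re

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(assistant_text):
    """Return a list of (function_name, kwargs) found in an assistant turn."""
    match = TOOL_CALL_RE.search(assistant_text)
    if match is None:
        return []
    # Parse without executing: the payload is a Python-style list of calls.
    tree = ast.parse(match.group(1).strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python-style list of calls")
    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected simple calls such as name(arg=value)")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only explicit keyword arguments are supported")
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id):
    # Stub standing in for a real lookup; returns the data from the example.
    return {
        "candidate_id": candidate_id,
        "status": "Interview Scheduled",
        "position": "Clinical Research Associate",
        "date": "2023-11-20",
    }


TOOLS = {"get_candidate_status": get_candidate_status}  # allow-list of callables


def run_tool_turn(assistant_text):
    """Execute the requested calls and format the content of the tool turn."""
    results = [TOOLS[name](**kwargs) for name, kwargs in parse_tool_calls(assistant_text)]
    payload = results[0] if len(results) == 1 else results
    return "<|tool_response_start|>" + json.dumps(payload) + "<|tool_response_end|>"


assistant_turn = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>'
    "Checking the current status of candidate ID 12345."
)
print(run_tool_turn(assistant_turn))
```

Parsing with ast and dispatching through an allow-list avoids calling eval on model output, which matters because the call text is generated rather than trusted.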

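Architecture: a hybrid model with multiplicative gates and short convolutions: 10 double-gated short-range LIV convolution blocks and 6 grouped query attention (GQA) blocks.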
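
For readers unfamiliar with the terms, the toy example below shows what "a short convolution with multiplicative gates" can mean on a one-channel sequence: the input is multiplied elementwise by one gate, passed through a short causal convolution, and multiplied by a second gate. This is a teaching sketch under stated assumptions and not the LFM2 implementation. The gates are given here as fixed numbers, whereas in a real model they would be computed from the input, and the kernel length of 3 is an arbitrary choice.

```python
def gated_short_conv(x, in_gate, out_gate, kernel):
    """Causal 1-D convolution over x with an elementwise gate before and after."""
    # Input gate: scale each position before mixing across time.
    gated = [g * v for g, v in zip(in_gate, x)]
    out = []
    for t in range(len(gated)):
        acc = 0.0
        # kernel[0] weights the current step, kernel[1] the previous one, and so on.
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * gated[t - k]
        # Output gate: scale the mixed value at this position.
        out.append(out_gate[t] * acc)
    return out


x = [1.0, 2.0, 3.0, 4.0]
in_gate = [1.0, 0.5, 1.0, 0.0]
out_gate = [1.0, 1.0, 2.0, 1.0]
kernel = [0.5, 0.25, 0.25]
print(gated_short_conv(x, in_gate, out_gate, kernel))  # [0.5, 0.75, 4.0, 1.0]
```

Because the kernel only looks a few steps back, the cost per position is constant, in contrast to attention, whose cost grows with sequence length.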

Pre-training data mixture: approximately 75% English, 20% multilingual, and 5% code data, sourced from the web and licensed materials.
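
As a quick worked example, these shares can be turned into approximate token counts using the 10 trillion token training budget listed in the model details. The assumption that the percentages apply uniformly to the whole budget is ours, and the shares themselves are stated as approximate.

```python
TOTAL_TOKENS = 10 * 10**12  # 10 trillion tokens, from the model details
MIX_PERCENT = {"English": 75, "multilingual": 20, "code": 5}  # approximate shares

tokens = {name: TOTAL_TOKENS * pct // 100 for name, pct in MIX_PERCENT.items()}
for name, count in tokens.items():
    print(f"{name}: about {count / 10**12:.1f} trillion tokens")
```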

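Training approach:

Knowledge distillation using LFM1-7B as the teacher model.
Very large-scale SFT on 50% downstream tasks and 50% general domains.
Custom DPO with length normalization and semi-online datasets.
Iterative model merging.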

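🔧 How to fine-tune LFM2

We recommend fine-tuning LFM2 models on your use case to maximize performance.

Notebook	Description
SFT + LoRA	Supervised fine-tuning (SFT) notebook with a LoRA adapter in TRL.
DPO	Preference alignment with Direct Preference Optimization (DPO) in TRL.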
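
The notebooks themselves are not reproduced here, but one preparatory step is common to most SFT setups: putting your data into a chat-style record. The sketch below converts question and answer pairs into records with a "messages" list of role and content dictionaries, which is the conversational layout TRL's SFT tooling accepts. The helper names and the sample pair are hypothetical, and the exact fields your training script expects should be checked against the notebook you use.

```python
import json


def to_chat_example(question, answer, system_prompt=None):
    """Build one conversational training record from a question/answer pair."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})
    return {"messages": messages}


def to_jsonl(pairs, system_prompt=None):
    """Serialize (question, answer) pairs as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(to_chat_example(q, a, system_prompt), ensure_ascii=False)
        for q, a in pairs
    )


# Hypothetical data-extraction pair, for illustration only.
pairs = [("Extract the date: 'Meeting on 2023-11-20.'", '{"date": "2023-11-20"}')]
print(to_jsonl(pairs, system_prompt="You extract structured data as JSON."))
```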

📈 Performance

LFM2 outperforms similar-sized models across different evaluation categories.

1. Automated benchmarks

Model	MMLU	GPQA	IFEval	IFBench	GSM8K	MGSM	MMMLU
LFM2-350M	43.43	27.46	65.12	16.41	30.1	29.52	37.99
LFM2-700M	49.9	28.48	72.23	20.56	46.4	45.36	43.28
LFM2-1.2B	55.23	31.47	74.89	20.7	58.3	55.04	46.73
Qwen3-0.6B	44.93	22.14	64.24	19.75	36.47	41.28	30.84
Qwen3-1.7B	59.11	27.72	73.98	21.27	51.4	66.56	46.51
Llama-3.2-1B-Instruct	46.6	28.84	52.39	16.86	35.71	29.12	38.15
gemma-3-1b-it	40.08	21.07	62.9	17.72	59.59	43.6	34.43
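
To read the table benchmark by benchmark, the short script below loads the scores above and lists where one model scores higher than another. The numbers are copied from the table; the helper name wins is hypothetical, and a higher score is assumed to be better on every benchmark.

```python
BENCHMARKS = ["MMLU", "GPQA", "IFEval", "IFBench", "GSM8K", "MGSM", "MMMLU"]

# Scores copied from the table above.
SCORES = {
    "LFM2-350M": [43.43, 27.46, 65.12, 16.41, 30.1, 29.52, 37.99],
    "LFM2-700M": [49.9, 28.48, 72.23, 20.56, 46.4, 45.36, 43.28],
    "LFM2-1.2B": [55.23, 31.47, 74.89, 20.7, 58.3, 55.04, 46.73],
    "Qwen3-0.6B": [44.93, 22.14, 64.24, 19.75, 36.47, 41.28, 30.84],
    "Qwen3-1.7B": [59.11, 27.72, 73.98, 21.27, 51.4, 66.56, 46.51],
    "Llama-3.2-1B-Instruct": [46.6, 28.84, 52.39, 16.86, 35.71, 29.12, 38.15],
    "gemma-3-1b-it": [40.08, 21.07, 62.9, 17.72, 59.59, 43.6, 34.43],
}


def wins(model, baseline):
    """List the benchmarks on which `model` scores strictly higher than `baseline`."""
    return [
        name
        for name, a, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
        if a > b
    ]


for baseline in ("Qwen3-0.6B", "Llama-3.2-1B-Instruct", "gemma-3-1b-it"):
    leading = wins("LFM2-350M", baseline)
    print(f"LFM2-350M vs {baseline}: ahead on {len(leading)}/{len(BENCHMARKS)} {leading}")
```

Keep in mind when reading the comparison that the baselines listed here have more parameters than LFM2-350M.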