LFM2-350M: An Open-Source Hybrid Model for Edge AI and On-Device Use, with Efficient Training and Inference

LFM2 350M

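Developed by LiquidAI

LFM2-350M is a hybrid model developed by Liquid AI. It is designed for edge AI and on-device deployment, with efficient training and inference.

Large Language Model

Transformers

Supports multiple languages
License: Other
#EdgeAIOptimized #MultilingualHybridModel #EfficientInference

Downloads: 1,519

Release date: 7/10/2025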

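Model Overview

LFM2-350M is a new hybrid Liquid model suited to deployment on edge devices. It supports fast training and inference and outperforms models of similar size.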

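Model Features

Fast training and inference

Training is 3x faster than the previous generation, and decode and prefill on CPU are 2x faster than Qwen3.

Strong performance

Outperforms similarly sized models on benchmarks covering knowledge, mathematics, instruction following, and multilingual capabilities.

New architecture

Uses a hybrid Liquid model architecture with multiplicative gates and short convolutions.

Flexible deployment

Runs efficiently on CPU, GPU, and NPU hardware, for devices such as smartphones, laptops, or vehicles.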

Model Capabilities

Text generation

Multilingual processing

Instruction following

Mathematical reasoning

Tool calling

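Use Cases

Agentic tasks

Data extraction

Extracts structured data from text.

Creative writing

Story generation

Generates short stories or other creative text.

Multi-turn conversation

Chat assistant

A chatbot that supports multi-turn conversations.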

🚀 LFM2-350M

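LFM2 is a new generation of hybrid models developed by Liquid AI, designed specifically for edge AI and on-device deployment. It sets a new standard in quality, speed, and memory efficiency.

We are releasing the weights of three post-trained checkpoints with 350M, 700M, and 1.2B parameters. They provide the following key features for building AI-powered edge applications:

Fast training and inference: LFM2 trains 3x faster than the previous generation. Its decode and prefill on CPU are 2x faster than Qwen3.
Strong performance: LFM2 outperforms similarly sized models across multiple benchmark categories, including knowledge, mathematics, instruction following, and multilingual capabilities.
New architecture: LFM2 is a new hybrid Liquid model with multiplicative gates and short convolutions.
Flexible deployment: LFM2 runs efficiently on CPU, GPU, and NPU hardware and can be deployed on smartphones, laptops, or vehicles.

Find more information about LFM2 in our blog post.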

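🚀 Quick Start

LFM2 can be run with transformers and llama.cpp. vLLM support is coming soon.

1. Run with `transformers`

To run LFM2, you need to install Hugging Face transformers from source (v4.54.0.dev0). You can update or install it with the following command: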

pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"

Here is an example of how to generate an answer with transformers in Python:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "LiquidAI/LFM2-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True,
    #    attn_implementation="flash_attention_2" <- uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate an answer
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))

# <|startoftext|><|im_start|>user
# What is C. elegans?<|im_end|>
# <|im_start|>assistant
# C. elegans, also known as Caenorhabditis elegans, is a small, free-living
# nematode worm (roundworm) that belongs to the phylum Nematoda.

You can run and test the model directly with this Colab notebook.

2. Run with `llama.cpp`

You can run LFM2 with llama.cpp using its GGUF checkpoint. Find more information in the model card.

💻 Usage Examples

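Advanced Usage

In practice, you may need to adjust the generation parameters for different scenarios:

# Advanced scenario: tune the generation parameters to the task for better results.
# For example, raise temperature when you want more creative answers.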

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True,
    #    attn_implementation="flash_attention_2" <- uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Write a short story about a robot's adventure."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

# Raise temperature to make the answer more creative
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=1024,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))

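📚 Detailed Documentation

📄 Model Details

Because of their small size, we recommend fine-tuning LFM2 models on narrow use cases to maximize performance. They are particularly suited to agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. However, we do not recommend using them for tasks that are knowledge-intensive or that require programming skills.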

Property	Details
Parameters	354,483,968
Layers	16 (10 convolution + 6 attention)
Context length	32,768 tokens
Vocabulary size	65,536
Precision	bfloat16
Training budget	10 trillion tokens
License	LFM Open License v1.0
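
As a rough sanity check on the table above, the weight footprint can be estimated from the parameter count and the storage precision. The sketch below is only back-of-the-envelope arithmetic: the helper name `weight_bytes` is hypothetical, and the estimate ignores activations, the KV cache, and any runtime overhead.

```python
# Back-of-the-envelope estimate of weight storage for LFM2-350M.
# Parameter count and bfloat16 precision come from the table above;
# the float32 row is included only for comparison.
NUM_PARAMS = 354_483_968
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Bytes needed to store the weights alone (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype]


bf16_mib = weight_bytes(NUM_PARAMS, "bfloat16") / 2**20
fp32_mib = weight_bytes(NUM_PARAMS, "float32") / 2**20
print(f"bfloat16 weights: about {bf16_mib:.0f} MiB")
print(f"float32 weights:  about {fp32_mib:.0f} MiB")
```

In bfloat16 the weights alone come to roughly 676 MiB, which is in line with the model's positioning for phones and laptops.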

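Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

Generation parameters: We recommend the following parameters: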

temperature=0.3
min_p=0.15
repetition_penalty=1.05
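
To make the effect of `temperature` and `min_p` concrete, here is a minimal pure-Python sketch of temperature scaling followed by min-p filtering on a toy logit vector. It assumes the common definition of min-p (drop tokens whose probability is below `min_p` times the top token's probability) and does not model `repetition_penalty`. The function name and the toy logits are illustrative, not part of any library.

```python
import math


def min_p_filter(logits, temperature=0.3, min_p=0.15):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p: keep tokens with probability >= min_p * (top probability).
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


toy_logits = [2.0, 1.5, 0.5, -1.0]  # made-up values for illustration
filtered = min_p_filter(toy_logits)
print([round(p, 3) for p in filtered])
```

With these toy logits only the two most likely tokens survive the filter. Sampling then draws from the renormalized distribution.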

Chat template: LFM2 uses a ChatML-like chat template, as follows:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant
It's a tiny nematode that lives in temperate soil environments.<|im_end|>

You can apply it using the dedicated .apply_chat_template() function from Hugging Face transformers.
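
For readers who want to see what the template produces without loading a tokenizer, the following is a hand-written sketch that mirrors the format shown above. It is an illustration only: `render_chatml` is a hypothetical helper, and the template shipped with the tokenizer remains the source of truth (it may differ in whitespace or in how it handles tools).

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts in the ChatML-like layout shown above."""
    parts = ["<|startoftext|>"]
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt_text = render_chatml(
    [{"role": "user", "content": "What is C. elegans?"}],
    add_generation_prompt=True,
)
print(prompt_text)
```

In real code, prefer `tokenizer.apply_chat_template(...)` so that special tokens are handled consistently with the model's training.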

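Tool use: It consists of four main steps:

Function definition: LFM2 takes JSON function definitions as input (JSON objects between the <|tool_list_start|> and <|tool_list_end|> special tokens), usually in the system prompt.
Function call: LFM2 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer.
Function execution: The function call is executed and the result is returned (a string between the <|tool_response_start|> and <|tool_response_end|> special tokens) as a "tool" role.
Final answer: LFM2 interprets the outcome of the function call to answer the original user prompt in plain text.

Here is a simple example of a conversation using tools: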

<|startoftext|><|im_start|>system
List of tools: <|tool_list_start|>[{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|tool_list_end|><|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
<|tool_response_start|>{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}<|tool_response_end|><|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
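
The function call and function execution steps can be sketched on the application side as follows. This is a minimal illustration, not Liquid AI's reference implementation: it assumes calls use keyword arguments with literal values, as in the conversation above, and `get_candidate_status`, `TOOL_REGISTRY`, and the helper names are hypothetical stand-ins. Parsing with `ast` rather than `eval` avoids executing model output as code.

```python
import ast
import json

TOOL_CALL_START = "<|tool_call_start|>"
TOOL_CALL_END = "<|tool_call_end|>"


def parse_tool_calls(assistant_text):
    """Extract (function_name, kwargs) pairs from an assistant message."""
    start = assistant_text.find(TOOL_CALL_START)
    end = assistant_text.find(TOOL_CALL_END)
    if start == -1 or end == -1 or end < start:
        return []  # no tool call in this message
    payload = assistant_text[start + len(TOOL_CALL_START):end].strip()
    tree = ast.parse(payload, mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python-style list of calls")
    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if node.args:
            raise ValueError("this sketch only supports keyword arguments")
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id):
    # Hypothetical stub; a real tool would query a recruitment system.
    return {"candidate_id": candidate_id, "status": "Interview Scheduled"}


TOOL_REGISTRY = {"get_candidate_status": get_candidate_status}


def run_tools(assistant_text):
    """Execute each parsed call and wrap the result for the "tool" turn."""
    responses = []
    for name, kwargs in parse_tool_calls(assistant_text):
        result = TOOL_REGISTRY[name](**kwargs)
        responses.append(
            "<|tool_response_start|>" + json.dumps(result) + "<|tool_response_end|>"
        )
    return responses


message = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>Checking the current status of candidate ID 12345."
)
print(run_tools(message))
```

Each returned string would then be appended to the conversation as a message with the "tool" role before asking the model for its final answer.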

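Architecture: Hybrid model with multiplicative gates and short convolutions: 10 double-gated short-range LIV convolution blocks and 6 grouped query attention (GQA) blocks.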
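
To give some intuition for "multiplicative gates and short convolutions", here is a toy single-channel sketch in pure Python: the input is multiplied by a gate, passed through a short causal convolution, and multiplied by a second gate. This is an assumption-laden illustration, not the LFM2 implementation: in this sketch the gates are passed in as plain lists rather than computed from the input, and the projections, channel mixing, and real kernel size are omitted.

```python
def gated_short_conv(x, in_gate, out_gate, kernel):
    """Toy block: out_gate * causal_conv(in_gate * x), for a 1-D sequence."""
    gated = [g * v for g, v in zip(in_gate, x)]  # first multiplicative gate
    out = []
    for t in range(len(gated)):
        acc = 0.0
        # Short causal convolution: position t only sees t, t-1, ...
        for j, w in enumerate(kernel):
            if t - j >= 0:
                acc += w * gated[t - j]
        out.append(out_gate[t] * acc)  # second multiplicative gate
    return out


x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)
y = gated_short_conv(x, ones, ones, kernel=[0.5, 0.5])
print(y)  # [0.5, 1.5, 2.5, 3.5]
```

Because the kernel is short and causal, each output depends only on a small window of recent positions.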

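Pre-training mixture: Approximately 75% English, 20% multilingual, and 5% code data, sourced from the web and licensed materials.

Training approach:

Knowledge distillation using LFM1-7B as the teacher model.
Very large-scale SFT on 50% downstream tasks and 50% general domains.
Custom DPO with length normalization and semi-online datasets.
Iterative model merging.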

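🔧 How to Fine-Tune LFM2

We recommend fine-tuning LFM2 models on your use case to maximize performance.

Notebook	Description	Link
SFT + LoRA	Supervised fine-tuning (SFT) notebook using a LoRA adapter in TRL.
DPO	Preference alignment with Direct Preference Optimization (DPO) in TRL.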

📈 Performance

LFM2 outperforms similarly sized models across different evaluation categories.

1. Automated benchmarks

Model	MMLU	GPQA	IFEval	IFBench	GSM8K	MGSM	MMMLU
LFM2-350M	43.43	27.46	65.12	16.41	30.1	29.52	37.99
LFM2-700M	49.9	28.48	72.23	20.56	46.4	45.36	43.28
LFM2-1.2B	55.23	31.47	74.89	20.7	58.3	55.04	46.73
Qwen3-0.6B	44.93	22.14	64.24	19.75	36.47	41.28	30.84
Qwen3-1.7B	59.11	27.72	73.98	21.27	51.4	66.56	46.51
Llama-3.2-1B-Instruct	46.6	28.84	52.39	16.86	35.71	29.12	38.15
gemma-3-1b-it	40.08	21.07	62.9	17.72	59.59	43.6	34.43