Open-source Arch-Router-1.5B.gguf model: mapping queries to preferences to support model routing decisions

Arch Router 1.5B.gguf

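Developed by katanemo

Arch-Router is a 1.5B-parameter model for a preference-aligned routing framework; it maps queries to domain-action preferences to support model routing decisions.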

Large Language Model

Transformers

Language: English  License: Other  Tags: #preference-routing #multi-model-orchestration #domain-action-mapping

Downloads: 220

Released: 5/30/2025

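Model Overview

This model is a compact router that learns to map queries to user-defined domains and action types, providing a preference-alignment mechanism for routing decisions.

Model Features

Preference-aligned routing

Matches prompt requests to model strengths through domain-action mappings

Transparent and controllable

Routing decisions are transparent and configurable, so users can customize system behavior

Flexible and adaptive

Supports changing user needs and model updates without retraining

Optimized for production

Optimized for low-latency, high-throughput applications in multi-model environments

Model Capabilities

Query routing

Domain classification

Action type recognition

Model selection

Use Cases

Multi-model routing

Programming question routing

Routes programming questions to the most suitable model

Accurately identifies action types such as code generation and bug fixing

Domain-specific routing

Selects a specialized model according to the domain (for example, legal or medical)

Improves response quality on domain-specific tasks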

🚀 katanemo/Arch-Router-1.5B

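This project introduces a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing), offering a practical mechanism for encoding preferences in routing decisions. Specifically, it introduces Arch-Router, a compact 1.5B-parameter model that learns to map queries to domain-action preferences for model routing decisions.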

🔍 Model Information

Attribute	Details
Base model	Qwen/Qwen2.5-1.5B-Instruct
Language	en
Task type	Text generation
Library	transformers
License	Katanemo license

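🚀 Quick Start

With the rapid proliferation of large language models (LLMs), each optimized for different strengths, styles, or latency/cost profiles, routing has become an essential technique for putting different models to practical use. However, existing LLM routing approaches are limited in two key ways: they rely on benchmarks that often fail to capture human preferences driven by subjective evaluation criteria, and they typically select from a limited pool of models.

We introduce a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing), offering a practical mechanism for encoding preferences in routing decisions. Specifically, we introduce Arch-Router, a compact 1.5B-parameter model that learns to map queries to domain-action preferences for model routing decisions. Experiments on conversational datasets show that our approach achieves state-of-the-art (SOTA) results in matching queries with human preferences, outperforming top proprietary models.

The model is described in the paper https://arxiv.org/abs/2506.16655 and powers Arch, an open-source AI-native proxy for agents that enables seamless preference-based routing.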

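🔧 How It Works

To support effective routing, Arch-Router introduces two key concepts:

Domain: the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
Action: the specific type of operation the user wants performed (e.g., summarization, code generation, booking an appointment, translation).

Both domain and action configurations are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt and infers its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.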
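The association between routes and preferred models can be pictured as a small lookup table. The sketch below is illustrative only: the `RoutePolicy` dataclass, the `resolve_model` helper, the two example routes, and the model identifiers (`model-a`, `model-b`, `model-default`) are hypothetical names chosen for this example, not part of Arch or Arch-Router. It assumes the router returns only a route name and that the caller performs the lookup.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RoutePolicy:
    """One user-defined routing preference (hypothetical structure)."""

    name: str         # route name the router is asked to return
    description: str  # natural-language description of the domain or action
    model: str        # preferred model for this route (placeholder identifier)


# Placeholder policies: one domain-level route and one action-level route.
POLICIES: List[RoutePolicy] = [
    RoutePolicy("legal", "Questions about contracts, regulations, or legal procedure", "model-a"),
    RoutePolicy("summarization", "Condensing a document or conversation into a short summary", "model-b"),
]

DEFAULT_MODEL = "model-default"  # assumed fallback when no route matches


def resolve_model(route_name: str, policies: List[RoutePolicy] = POLICIES) -> str:
    """Return the preferred model for a route name, or the fallback model."""
    table: Dict[str, str] = {p.name: p.model for p in policies}
    return table.get(route_name, DEFAULT_MODEL)


print(resolve_model("summarization"))  # model-b
print(resolve_model("other"))          # model-default
```

Keeping the route-to-model table outside the router means a preference can be changed by editing one entry, without touching the router model.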

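✨ Key Features

Structured preference routing: aligns prompt requests with model strengths using explicit domain-action mappings.
Transparent and controllable: makes routing decisions transparent and configurable, so users can customize system behavior.
Flexible and adaptive: supports evolving user needs, model updates, and new domains/actions without retraining the router.
Production-ready: optimized for low-latency, high-throughput applications in multi-model environments.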
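Because the routes are passed to the router as text in the prompt (the usage example in this card serializes the route list with `json.dumps`), adding a new domain or action can be treated as a configuration change. The sketch below illustrates that idea under this assumption; the `add_route` helper and the `travel_booking` route are hypothetical and only serve the example.

```python
import json
from typing import Dict, List

# Starting route list, in the same name/description shape as the usage example.
routes: List[Dict[str, str]] = [
    {"name": "code_generation", "description": "Generating new code from requirements"},
]


def add_route(current: List[Dict[str, str]], name: str, description: str) -> List[Dict[str, str]]:
    """Return a new route list with one extra route; reject duplicate names."""
    if any(route["name"] == name for route in current):
        raise ValueError(f"route {name!r} already exists")
    return current + [{"name": name, "description": description}]


# Hypothetical new route: no model weights change, only the prompt content.
routes = add_route(routes, "travel_booking", "Booking flights, hotels, or other travel reservations")

# This JSON string is what would be placed inside the <routes></routes> tags.
routes_block = json.dumps(routes)
print(routes_block)
```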

📦 Installation

The code for Arch-Router-1.5B is integrated into the Hugging Face transformers library; installing the latest version is recommended:

pip install "transformers>=4.37.0"

💻 Usage Examples

Basic usage

import json
from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "katanemo/Arch-Router-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Please use our provided prompt for best performance
TASK_INSTRUCTION = """
You are a helpful assistant designed to find the best suited route.
You are provided with route description within <routes></routes> XML tags:
<routes>
\n{routes}\n
</routes>

<conversation>
\n{conversation}\n
</conversation>
"""

FORMAT_PROMPT = """
Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags.  Follow the instruction:
1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
2. You must analyze the route descriptions and find the best match route for user latest intent. 
3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.

Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
{"route": "route_name"} 
"""


# Define route config
route_config = [
    {
        "name": "code_generation",
        "description": "Generating new code snippets, functions, or boilerplate based on user prompts or requirements",
    },
    {
        "name": "bug_fixing",
        "description": "Identifying and fixing errors or bugs in the provided code across different programming languages",
    },
    {
        "name": "performance_optimization",
        "description": "Suggesting improvements to make code more efficient, readable, or scalable",
    },
    {
        "name": "api_help",
        "description": "Assisting with understanding or integrating external APIs and libraries",
    },
    {
        "name": "programming",
        "description": "Answering general programming questions, theory, or best practices",
    },
]


# Helper function to create the system prompt for our model
def format_prompt(
    route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]]
):
    return (
        TASK_INSTRUCTION.format(
            routes=json.dumps(route_config), conversation=json.dumps(conversation)
        )
        + FORMAT_PROMPT
    )


# Define conversations

conversation = [
    {
        "role": "user",
        "content": "fix this module 'torch.utils._pytree' has no attribute 'register_pytree_node'. did you mean: '_register_pytree_node'?",
    }
]

route_prompt = format_prompt(route_config, conversation)

messages = [
    {"role": "user", "content": route_prompt},
]

# 1. Apply the chat template and tokenize the prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# 2. Generate
generated_ids = model.generate(
    input_ids=input_ids,  # or just positional: model.generate(input_ids, …)
    max_new_tokens=32768,
)

# 3. Strip the prompt from each sequence
prompt_lengths = input_ids.shape[1]  # same length for every row here
generated_only = [
    output_ids[prompt_lengths:]  # slice off the prompt tokens
    for output_ids in generated_ids
]

# 4. Decode if you want text
response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0]
print(response)
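The printed response is expected to be a small JSON object such as {"route": "bug_fixing"}, as requested by the format prompt above. A minimal sketch of turning that text into a validated route name follows; the `parse_route` helper is hypothetical, and it assumes the output is valid JSON, falling back to "other" when it is not or when the name is not among the configured routes.

```python
import json
from typing import Iterable


def parse_route(response: str, valid_routes: Iterable[str], default: str = "other") -> str:
    """Extract the route name from the router's text output (hypothetical helper)."""
    try:
        data = json.loads(response.strip())
    except json.JSONDecodeError:
        return default  # output was not valid JSON
    if not isinstance(data, dict):
        return default  # valid JSON, but not an object
    route = data.get("route")
    # Accept only names that exist in the route configuration.
    return route if route in set(valid_routes) else default


valid = ["code_generation", "bug_fixing", "performance_optimization", "api_help", "programming"]
print(parse_route('{"route": "bug_fixing"}', valid))  # bug_fixing
print(parse_route("not json", valid))                 # other
```

Validating against the configured names keeps a malformed or unexpected output from selecting a model that was never configured.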