Open-source Arch-Router-1.5B.gguf model: precisely mapping queries to preferences for model routing decisions

Arch Router 1.5B.gguf

Developed by katanemo

Arch-Router is a 1.5B-parameter, preference-aligned routing model that maps queries to domain-action preferences for model routing decisions.

Large language model

Transformers

Language: English · License: Other · #preference-routing #multi-model-scheduling #domain-action-mapping

Downloads: 220

Released: 5/30/2025

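Model overview

The model is a compact router that learns to map queries to user-defined domains and action types, giving routing decisions a preference-alignment mechanism.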

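Model features

Preference-aligned routing

Matches prompts to model strengths through domain-action mappings

Transparent and controllable

Routing decisions are transparent and configurable, so users can customize system behavior

Flexible and adaptive

Supports changing user needs and model updates without retraining

Optimized for production

Optimized for low-latency, high-throughput applications in multi-model environments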

Model capabilities

Query routing

Domain classification

Action type recognition

Model selection

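Use cases

Multi-model routing

Routing programming questions

Routes programming-related questions to the best-suited model

Accurately identifies action types such as code generation and bug fixing

Domain-specific routing

Selects specialized models by domain (e.g., legal, medical)

Improves response quality on domain-specific tasks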

🚀 katanemo/Arch-Router-1.5B

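This project introduces a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing), offering a practical mechanism for encoding preferences in routing decisions. Specifically, it introduces Arch-Router, a compact 1.5B-parameter model that learns to map queries to domain-action preferences for model routing decisions.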

🔍 Model information

Property	Details
Base model	Qwen/Qwen2.5-1.5B-Instruct
Language	en
Task type	Text generation
Library name	transformers
License	Katanemo license

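🚀 Quick start

With the rapid proliferation of large language models (LLMs), each optimized for different strengths, styles, or latency/cost profiles, routing has become an essential technique for putting different models to practical use. However, existing LLM routing approaches are limited in two key ways: the benchmarks they rely on often fail to capture human preferences driven by subjective evaluation criteria, and they typically select from a limited pool of models.

We introduce a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing), offering a practical mechanism for encoding preferences in routing decisions. Specifically, we introduce Arch-Router, a compact 1.5B-parameter model that learns to map queries to domain-action preferences for model routing decisions. Experiments on conversational datasets show that our approach achieves state-of-the-art (SOTA) results in matching queries with human preferences, outperforming top proprietary models.

The model is described in the paper https://arxiv.org/abs/2506.16655 and powers Arch, an open-source AI-native proxy for agents that enables seamless preference-based routing.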

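🔧 How it works

To support effective routing, Arch-Router introduces two key concepts:

Domain – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
Action – the specific type of operation the user wants performed (e.g., summarization, code generation, appointment booking, translation).

Both domain and action configurations are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt and infers its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.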
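The association between routes and preferred models described above can be sketched as a plain lookup table. This is only an illustration under stated assumptions: `ROUTE_PREFERENCES`, `DEFAULT_MODEL`, `select_model`, and the placeholder model names are hypothetical and are not part of Arch-Router or Arch.

```python
from typing import Dict

# Hypothetical preference table: route name -> preferred model.
# Route names stand in for domain/action policies; model names are placeholders.
ROUTE_PREFERENCES: Dict[str, str] = {
    "code_generation": "model-a",
    "bug_fixing": "model-b",
    "legal_summarization": "model-c",
}

# Hypothetical fallback used for the "other" route or any unknown route.
DEFAULT_MODEL = "model-default"


def select_model(
    route_name: str,
    preferences: Dict[str, str] = ROUTE_PREFERENCES,
    default: str = DEFAULT_MODEL,
) -> str:
    """Return the model configured for a route, or the default when none is set."""
    return preferences.get(route_name, default)


print(select_model("bug_fixing"))
print(select_model("other"))
```

Keeping the preferences in data rather than in the router means that changing which model serves a route is a configuration edit on the caller's side.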

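✨ Key features

Structured preference routing: uses explicit domain-action mappings to match prompts to model strengths.
Transparent and controllable: makes routing decisions transparent and configurable, so users can customize system behavior.
Flexible and adaptive: supports evolving user needs, model updates, and new domains/actions without retraining the router.
Production-ready: optimized for low-latency, high-throughput applications in multi-model environments.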
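Because routes are supplied to the router as descriptions at inference time, supporting a new domain or action amounts to editing the route list. The sketch below assumes routes are dictionaries with `name` and `description` keys, as in the usage example on this page; `add_route` and the `travel_booking` route are hypothetical helpers introduced for illustration.

```python
import json
from typing import Dict, List


def add_route(
    route_config: List[Dict[str, str]], name: str, description: str
) -> List[Dict[str, str]]:
    """Return a new route list with one more route; reject duplicate names."""
    if any(route["name"] == name for route in route_config):
        raise ValueError(f"route {name!r} already exists")
    return route_config + [{"name": name, "description": description}]


routes = [
    {"name": "code_generation", "description": "Generating new code snippets"},
]
routes = add_route(routes, "travel_booking", "Booking flights, hotels, or appointments")

# The updated list is what would be serialized into the <routes> section of the prompt.
print(json.dumps(routes, indent=2))
```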

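📦 Installation

The code for Arch-Router-1.5B is integrated into the Hugging Face transformers library. Installing the latest version is recommended (the quotes keep the shell from treating >= as a redirect):

pip install "transformers>=4.37.0"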

💻 Usage example

Basic usage

import json
from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "katanemo/Arch-Router-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Please use our provided prompt for best performance
TASK_INSTRUCTION = """
You are a helpful assistant designed to find the best suited route.
You are provided with route description within <routes></routes> XML tags:
<routes>
\n{routes}\n
</routes>

<conversation>
\n{conversation}\n
</conversation>
"""

FORMAT_PROMPT = """
Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags.  Follow the instruction:
1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
2. You must analyze the route descriptions and find the best match route for user latest intent. 
3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.

Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
{"route": "route_name"} 
"""


# Define route config
route_config = [
    {
        "name": "code_generation",
        "description": "Generating new code snippets, functions, or boilerplate based on user prompts or requirements",
    },
    {
        "name": "bug_fixing",
        "description": "Identifying and fixing errors or bugs in the provided code across different programming languages",
    },
    {
        "name": "performance_optimization",
        "description": "Suggesting improvements to make code more efficient, readable, or scalable",
    },
    {
        "name": "api_help",
        "description": "Assisting with understanding or integrating external APIs and libraries",
    },
    {
        "name": "programming",
        "description": "Answering general programming questions, theory, or best practices",
    },
]


# Helper function to create the system prompt for our model
def format_prompt(
    route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]]
):
    return (
        TASK_INSTRUCTION.format(
            routes=json.dumps(route_config), conversation=json.dumps(conversation)
        )
        + FORMAT_PROMPT
    )


# Define conversations

conversation = [
    {
        "role": "user",
        "content": "fix this module 'torch.utils._pytree' has no attribute 'register_pytree_node'. did you mean: '_register_pytree_node'?",
    }
]

route_prompt = format_prompt(route_config, conversation)

messages = [
    {"role": "user", "content": route_prompt},
]

# 1. Apply the chat template and tokenize the prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# 2. Generate
generated_ids = model.generate(
    input_ids=input_ids,  # or just positional: model.generate(input_ids, …)
    max_new_tokens=32768,
)

# 3. Strip the prompt from each sequence
prompt_lengths = input_ids.shape[1]  # same length for every row here
generated_only = [
    output_ids[prompt_lengths:]  # slice off the prompt tokens
    for output_ids in generated_ids
]

# 4. Decode if you want text
response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0]
print(response)
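The prompt above asks the model to answer with a JSON object of the form {"route": "route_name"}, so the decoded response usually needs to be parsed before it can drive model selection. The following is a minimal sketch, not part of the original model card: `parse_route` is a hypothetical helper, and the fallback for single-quoted dictionaries assumes the output might occasionally not be strict JSON.

```python
import ast
import json
from typing import Iterable


def parse_route(response: str, valid_routes: Iterable[str], fallback: str = "other") -> str:
    """Extract the route name from the router's text output.

    Returns `fallback` when the text cannot be parsed or names an unknown route.
    """
    text = response.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: tolerate Python-style dicts with single quotes.
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError, TypeError):
            return fallback
    if not isinstance(parsed, dict):
        return fallback
    route = parsed.get("route")
    if isinstance(route, str) and route in set(valid_routes):
        return route
    return fallback


known = ["code_generation", "bug_fixing"]
print(parse_route('{"route": "bug_fixing"}', known))
print(parse_route("not json at all", known))
```

In the usage example, `valid_routes` could be built from the configured names, for instance `[r["name"] for r in route_config]`, so that a route name the model invents falls back to "other".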