LLaDA-8B-Tools Open-Source AI Model: Free Tool-Call JSON Generation and Response Data Processing


Llada 8B Tools

Developed by Proximile

LLaDA-8B-Tools is a variant of the LLaDA-8B-Instruct model, fine-tuned to strengthen tool calling. It can generate tool-call JSON and process tool response data.

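Large Language Model

Transformers

English · Open-source license: MIT · #tool-calling-optimization #diffusion-model-generation #LoRA-fine-tuning

Downloads: 78

Release date: 5/14/2025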

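Model Overview

This model was fine-tuned by Proximile LLC to improve LLaDA's performance on tool-calling tasks: generating correct tool-call JSON, processing tool response data, and giving helpful answers based on tool output.

Model Features

Enhanced Tool Calling

The model is fine-tuned to generate correct tool-call JSON and to process tool response data.

Diffusion-Based Text Generation

Text is generated with discrete diffusion, built up step by step through an iterative denoising process.

LoRA Fine-Tuning

Supervised fine-tuning with LoRA (Low-Rank Adaptation) improves the model's performance on the target task.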

Model Capabilities

Text generation

Tool calling

Processing tool response data

Generating tool-call requests in JSON format

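Use Cases

Chatbots

Weather-query tool calling

The model can generate the tool-call JSON for a weather query and turn the returned weather data into a user-friendly response.

It produces responses with detailed weather information such as temperature and humidity.

Assistants

Tool-calling assistant

The model can act as an assistant that calls tools according to the user's request and returns the processed results.

It generates tool-call requests and parses tool responses.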
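The model only emits the tool call; executing it and feeding the result back is the application's job. The sketch below shows one way to do that, assuming the output is the JSON list of `{"name": ..., "parameters": {...}}` objects described in this card. `get_weather`, `TOOLS`, and `run_tool_calls` are hypothetical names, and the weather data is mock data, not a real API.

```python
import json

# Hypothetical local implementation of the get_weather tool (returns mock data)
def get_weather(location, unit):
    return {"location": location, "temperature": 72, "unit": unit, "condition": "Partly Cloudy"}

# Registry mapping tool names, as they appear in the model's JSON, to Python callables
TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_text):
    """Parse a JSON list of tool calls and execute each one with its parameters."""
    calls = json.loads(assistant_text)
    results = []
    for call in calls:
        fn = TOOLS[call["name"]]  # KeyError if the model names an unknown tool
        results.append(fn(**call["parameters"]))
    return results

assistant_text = '[{"name": "get_weather", "parameters": {"location": "New York", "unit": "fahrenheit"}}]'
results = run_tool_calls(assistant_text)

# The tool output goes back to the model as an "ipython" message,
# matching the role used in this card's advanced usage example
tool_message = {"role": "ipython", "content": json.dumps(results[0])}
print(tool_message["content"])
```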

🚀 LLaDA-8B-Tools

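This repository contains a variant of the GSAI-ML/LLaDA-8B-Instruct model, fine-tuned by Proximile LLC to enhance its tool-calling capabilities. Proximile specializes in secure, locally deployed AI solutions for small and medium-sized businesses.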

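📅 Update Timeline

May 14, 2025: Initial public release. The training examples were missing the padding tokens that fill the rest of the generation window.
May 17, 2025: Patched the training script to include correct padding and pushed the updated model weights to this repository.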

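✨ About LLaDA

LLaDA (Large Language Diffusion with mAsking) is a novel language model architecture that generates text with discrete diffusion. Unlike traditional autoregressive models, LLaDA produces text through an iterative denoising process, progressively replacing mask tokens with predicted tokens according to their confidence scores.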
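The confidence-based unmasking described above can be illustrated with a toy, single-sequence sketch. It is not LLaDA's implementation: the tokens, predictions, and confidences are made up, and in the real model the predictions and confidences are recomputed from the logits at every step (the `generate()` function in the advanced usage example does this with tensors).

```python
# Toy sketch of one "low-confidence remasking" step, LLaDA-style.
# Assumption: predictions and confidences are fixed lists here; in the real
# model they come from argmax/softmax over the logits at each step.
MASK = "[MASK]"

def unmask_step(tokens, predictions, confidences, k):
    """Commit the k masked positions whose predictions are most confident.

    tokens:      current sequence; masked slots hold MASK
    predictions: the model's predicted token for every position
    confidences: probability assigned to each prediction
    k:           number of tokens to unmask in this step
    """
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    # Rank only the still-masked positions, highest confidence first
    chosen = sorted(masked, key=lambda i: confidences[i], reverse=True)[:k]
    new_tokens = list(tokens)
    for i in chosen:
        new_tokens[i] = predictions[i]
    return new_tokens

tokens = ["The", MASK, MASK, MASK]
predictions = ["The", "weather", "is", "sunny"]
confidences = [1.0, 0.9, 0.4, 0.7]

step1 = unmask_step(tokens, predictions, confidences, k=2)
print(step1)  # ['The', 'weather', '[MASK]', 'sunny']
step2 = unmask_step(step1, predictions, confidences, k=1)
print(step2)  # ['The', 'weather', 'is', 'sunny']
```

The least confident position ("is") stays masked longest, so it is decided last, with the most surrounding context already fixed.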

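📚 Model Description

This merged LoRA model was trained to improve LLaDA's handling of tool-calling tasks, including:

- Generating appropriate JSON for tool calls
- Processing tool response data
- Providing helpful answers based on tool output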

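Training Details

Base model: GSAI-ML/LLaDA-8B-Instruct
Training method: supervised fine-tuning (SFT) with LoRA
LoRA configuration:
- Rank (r): 128
- Alpha: 256
- Target modules: q_proj, k_proj, v_proj, gate_proj
Training data: a modified subset of the ToolACE dataset.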
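Because the published weights are a merged LoRA model, the low-rank update has already been folded into the base weights, so inference needs no adapter. The arithmetic of that merge, using the standard LoRA scaling alpha / r (256 / 128 = 2.0 with this configuration), can be sketched in plain Python. The toy matrices and the 4096 x 4096 projection size are illustrative assumptions, not LLaDA's actual weights or shapes.

```python
# Sketch of merging a LoRA update into a frozen weight matrix:
#   W_merged = W + (alpha / r) * (B @ A)
# Shapes: W is d_out x d_in, B is d_out x r, A is r x d_in.

def matmul(X, Y):
    # Plain-Python matrix product of X (n x m) and Y (m x p)
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def merge_lora(W, A, B, alpha, r):
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]

def lora_param_count(d_in, d_out, r):
    # A contributes r * d_in trainable entries, B contributes d_out * r
    return r * (d_in + d_out)

scale = 256 / 128  # alpha / r from the LoRA configuration above
print(scale)  # 2.0

# Tiny example with rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # 1 x 2
B = [[0.5], [1.0]]     # 2 x 1
print(merge_lora(W, A, B, alpha=2, r=1))  # [[2.0, 2.0], [2.0, 5.0]]

# Trainable parameters for one hypothetical 4096 x 4096 projection at r = 128
print(lora_param_count(4096, 4096, 128))  # 1048576
```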

Attribute	Details
Model type	Merged LoRA model
Training data	Modified subset of the ToolACE dataset

📦 Installation

pip install transformers peft torch bitsandbytes accelerate

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModel

# Load the fine-tuned model and its tokenizer (the LoRA weights are already merged, so peft is not needed here)
model_name = "Proximile/LLaDA-8B-Tools"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, device_map="auto")

Advanced Usage

Below is a complete example of using the model for chat completion with tool calling:

import torch
import json
from transformers import AutoTokenizer, AutoModel

# Constants: token ID of LLaDA's [MASK] token
MASK_TOKEN_ID = 126336

def add_gumbel_noise(logits, temperature):
    '''
    The Gumbel max is a method for sampling categorical distributions.
    For diffusion models, low-precision Gumbel Max affects generation quality.
    '''
    if temperature <= 0:
        return logits
        
    logits = logits.to(torch.float64)
    noise = torch.rand_like(logits, dtype=torch.float64)
    gumbel_noise = (- torch.log(noise)) ** temperature
    return logits.exp() / gumbel_noise

def get_num_transfer_tokens(mask_index, steps):
    '''
    In the reverse process, we precompute the number of tokens to transition at each step.
    '''
    mask_num = mask_index.sum(dim=1, keepdim=True)
    
    # Ensure we have at least one step
    if steps == 0:
        steps = 1
        
    base = mask_num // steps
    remainder = mask_num % steps
    
    num_transfer_tokens = torch.zeros(mask_num.size(0), steps, device=mask_index.device, dtype=torch.int64) + base
    
    for i in range(mask_num.size(0)):
        if remainder[i] > 0:
            num_transfer_tokens[i, :remainder[i]] += 1
            
    return num_transfer_tokens

def generate(model, prompt, steps=128, gen_length=128, block_length=32, temperature=0.,
             remasking='low_confidence', mask_id=MASK_TOKEN_ID):
    '''
    Generate text using LLaDA's diffusion-based generation process.
    '''
    device = next(model.parameters()).device
    prompt = prompt.to(device)
    
    x = torch.full((1, prompt.shape[1] + gen_length), mask_id, dtype=torch.long).to(device)
    x[:, :prompt.shape[1]] = prompt.clone()
    
    prompt_index = (x != mask_id)
    
    assert gen_length % block_length == 0
    num_blocks = gen_length // block_length
    
    assert steps % num_blocks == 0
    steps_per_block = steps // num_blocks
    
    for num_block in range(num_blocks):
        block_mask_index = (x[:, prompt.shape[1] + num_block * block_length: prompt.shape[1] + (num_block + 1) * block_length:] == mask_id)
        num_transfer_tokens = get_num_transfer_tokens(block_mask_index, steps_per_block)
        
        for i in range(steps_per_block):
            mask_index = (x == mask_id)
            if not mask_index.any():
                break
                
            outputs = model(x)
            logits = outputs.logits
            
            logits_with_noise = add_gumbel_noise(logits, temperature=temperature)
            x0 = torch.argmax(logits_with_noise, dim=-1)  # b, l
            
            if remasking == 'low_confidence':
                p = torch.nn.functional.softmax(logits.to(torch.float64), dim=-1)
                x0_p = torch.squeeze(
                    torch.gather(p, dim=-1, index=torch.unsqueeze(x0, -1)), -1)  # b, l
            elif remasking == 'random':
                x0_p = torch.rand((x0.shape[0], x0.shape[1]), device=x0.device)
            else:
                raise NotImplementedError(remasking)
            
            x0_p[:, prompt.shape[1] + (num_block + 1) * block_length:] = -float('inf')
            
            x0 = torch.where(mask_index, x0, x)
            confidence = torch.where(mask_index, x0_p, -float('inf'))
            
            transfer_index = torch.zeros_like(x0, dtype=torch.bool, device=x0.device)
            for j in range(confidence.shape[0]):
                _, select_index = torch.topk(confidence[j], k=num_transfer_tokens[j, i])
                transfer_index[j, select_index] = True
            x[transfer_index] = x0[transfer_index]
    
    return x

def chat_completion(model, tokenizer, messages, temperature=0.1, gen_length=128, steps=128):
    """
    Generate a chat completion.
    
    Args:
        model: The LLaDA tool calling model
        tokenizer: The tokenizer
        messages: List of message dictionaries with 'role' and 'content' keys
        temperature: Temperature for generation (0 for greedy)
        gen_length: Maximum length of generated text
        steps: Number of denoising steps
        
    Returns:
        The generated response text
    """
    # Format input for the model
    formatted_input = tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
    )
    
    # Tokenize input
    input_ids = tokenizer(formatted_input, return_tensors="pt")["input_ids"]
    
    # Generate response
    with torch.no_grad():
        output_ids = generate(
            model, 
            input_ids, 
            steps=steps,
            gen_length=gen_length,
            block_length=32,
            temperature=temperature,
            remasking='low_confidence'
        )
    
    # Decode the generated output
    generated_text = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=False).split("<|")[0]
    return generated_text

# Example usage
if __name__ == "__main__":
    # Load the fine-tuned model and tokenizer
    model_name = "Proximile/LLaDA-8B-Tools"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True, device_map="auto")
    
    # Define tool calling function schema
    tool_schema = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The unit of temperature"
                        }
                    },
                    "required": ["location", "unit"]
                }
            }
        }
    ]
    
    # Create conversation with system prompt including tool description
    system_prompt = """You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the orginal user question.

If you choose to use one or more of the following tool functions, respond with a list of JSON function calls, each with the proper arguments that best answers the given prompt.

Each tool request within the list should be in the exact format {"name": function name, "parameters": {dictionary of argument names and values}}. Do not use variables. Just a list of two-key dictionaries, each starting with the function name, followed by a dictionary of parameters.

Here are the tool functions available to you:

""" + json.dumps(tool_schema, indent=4) + """

After receiving the results back from a function call, you have to formulate your response to the user. If the information needed is not found in the returned data, either attempt a new function call, or inform the user that you cannot answer based on your available knowledge. The user cannot see the function results. You have to interpret the data and provide a response based on it.

If the user request does not necessitate a function call, simply respond to the user's query directly."""
    
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's the weather like in New York?"}
    ]
    
    # Generate assistant response (expecting tool call)
    assistant_response = chat_completion(model, tokenizer, messages)
    print(f"Assistant: {assistant_response}")
    
    # Mock tool response
    tool_response = json.dumps({
        "location": "New York, NY",
        "temperature": 72,
        "unit": "fahrenheit",
        "condition": "Partly Cloudy",
        "humidity": 65,
        "wind_speed": 8,
        "wind_direction": "NE"
    })
    
    # Add assistant and tool responses to the conversation
    messages.append({"role": "assistant", "content": assistant_response})
    messages.append({"role": "ipython", "content": tool_response})
    
    # Generate final assistant response
    final_response = chat_completion(model, tokenizer, messages)
    print(f"Assistant (with tool data): {final_response}")

# Example output:
# Assistant: [{"name": "get_weather", "parameters": {"location": "New York", "unit": "fahrenheit"}}]
# Assistant (with tool data): The current weather in New York is as follows:
# - Temperature: 72°F
# - Weather Condition: Partly Cloudy
# - Humidity: 65%
# - Wind Speed: 8 miles per hour
# - Wind Direction: Northeast
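The assertions in `generate()` and the arithmetic in `get_num_transfer_tokens()` fix how many tokens are unmasked at each denoising step. The helper below is a hypothetical pure-Python restatement of that schedule (`decoding_schedule` is not part of the model's code), useful for checking parameter combinations before loading the 8B model.

```python
# Sketch of the decoding schedule implied by generate() and
# get_num_transfer_tokens(): the generation window is split into blocks,
# the step budget is split evenly across blocks, and each block's masked
# tokens are spread as evenly as possible over that block's steps.

def decoding_schedule(gen_length, block_length, steps):
    if gen_length % block_length != 0:
        raise ValueError("gen_length must be a multiple of block_length")
    num_blocks = gen_length // block_length
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    # Every position of a fresh block starts masked; earlier steps take the remainder
    base, remainder = divmod(block_length, steps_per_block)
    per_step = [base + 1 if i < remainder else base for i in range(steps_per_block)]
    return num_blocks, steps_per_block, per_step

# chat_completion() defaults: gen_length=128, steps=128, block_length=32
num_blocks, steps_per_block, per_step = decoding_schedule(128, 32, 128)
print(num_blocks, steps_per_block, per_step[:4])  # 4 32 [1, 1, 1, 1]

# Halving the steps commits two tokens per step
print(decoding_schedule(128, 32, 64)[2][:4])  # [2, 2, 2, 2]
```

Since `chat_completion()` hard-codes `block_length=32`, a `gen_length` such as 100 fails the `gen_length % block_length == 0` assertion in `generate()`; fewer steps trade output quality for speed by committing more tokens per step.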

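⚠️ Limitations

⚠️ Important Note

LLaDA's diffusion-based generation differs from that of standard large language models and may behave differently in some contexts.

The model can still hallucinate or produce incorrectly formatted tool calls.

Tool calls must exactly match the format shown in the example, which is a modified version of the official Llama 3.1 format.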
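Because the model can still emit malformed calls, checking its output against the expected format before executing anything is a reasonable safeguard; on failure, an application could retry generation or report the error. The sketch below assumes the format shown in the example (a JSON list of objects with exactly the keys `name` and `parameters`) and a schema shaped like `tool_schema`, trimmed to the fields the check uses. `validate_tool_calls` is a hypothetical helper, not part of the model's code.

```python
import json

def validate_tool_calls(text, schema):
    """Return a list of format problems in a model's tool-call output (empty if valid)."""
    try:
        calls = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(calls, list):
        return ["top level must be a list"]
    # Map each known function name to its JSON-schema "parameters" object
    known = {t["function"]["name"]: t["function"]["parameters"] for t in schema}
    problems = []
    for i, call in enumerate(calls):
        if not isinstance(call, dict) or set(call) != {"name", "parameters"}:
            problems.append(f"call {i}: expected exactly the keys 'name' and 'parameters'")
            continue
        if call["name"] not in known:
            problems.append(f"call {i}: unknown function {call['name']!r}")
            continue
        if not isinstance(call["parameters"], dict):
            problems.append(f"call {i}: 'parameters' must be an object")
            continue
        required = known[call["name"]].get("required", [])
        missing = [p for p in required if p not in call["parameters"]]
        if missing:
            problems.append(f"call {i}: missing required parameters {missing}")
    return problems

schema = [{"type": "function", "function": {
    "name": "get_weather",
    "parameters": {"type": "object", "required": ["location", "unit"]}}}]

good = '[{"name": "get_weather", "parameters": {"location": "New York", "unit": "fahrenheit"}}]'
bad = '[{"name": "get_weather", "arguments": {"location": "New York"}}]'
print(validate_tool_calls(good, schema))  # []
print(validate_tool_calls(bad, schema))
```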

📖 Citation

If you use this model in your research, please cite both the original LLaDA paper and this adapter:

@misc{llada-8b-tools,
  author = {Proximile LLC},
  title = {LLaDA-8B-Tools},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Proximile/LLaDA-8B-Tools}}
}