🚀 Model Card for Mistral-7B-Instruct-v0.3
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.
Compared to Mistral-7B-v0.2, Mistral-7B-v0.3 has the following changes (a quick check of the tokenizer follows this list):
- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling
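The first two points can be verified locally. A minimal sketch, assuming the `MistralTokenizer.v3()` constructor and the `n_words` attribute from mistral_common's public API:
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the bundled v3 tokenizer and inspect its vocabulary size.
tok = MistralTokenizer.v3()
print(tok.instruct_tokenizer.tokenizer.n_words)  # expected: 32768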
🚀 Quick Start
Installation
It is recommended to use mistralai/Mistral-7B-Instruct-v0.3 with mistral-inference. For HF transformers code snippets, keep scrolling down.
pip install mistral_inference
Download
from huggingface_hub import snapshot_download
from pathlib import Path

# Create a local directory for the weights and tokenizer.
mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
mistral_models_path.mkdir(parents=True, exist_ok=True)

# Fetch only the files mistral-inference needs.
snapshot_download(repo_id="mistralai/Mistral-7B-Instruct-v0.3", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
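Once the download finishes, an optional sanity check (plain standard library, reusing `mistral_models_path` from above) confirms the three requested files are in place:
# Optional: verify the requested files actually landed in local_dir.
for name in ("params.json", "consolidated.safetensors", "tokenizer.model.v3"):
    assert (mistral_models_path / name).exists(), f"missing {name}"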
Chat
After installing mistral_inference, a mistral-chat CLI command should be available in your environment. You can chat with the model using the following command:
mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256
Instruction following
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Load the v3 tokenizer and the downloaded weights.
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

# Build a single-turn chat request and encode it.
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Greedy decoding (temperature 0.0), up to 64 new tokens.
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
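The same request object supports multi-turn chat. A minimal sketch, assuming `AssistantMessage` from the same mistral_common messages module, feeds the previous completion back as history:
from mistral_common.protocol.instruct.messages import AssistantMessage

# Continue the conversation by replaying the first exchange as history.
followup = ChatCompletionRequest(messages=[
    UserMessage(content="Explain Machine Learning to me in a nutshell."),
    AssistantMessage(content=result),
    UserMessage(content="Now give me a concrete example."),
])
tokens = tokenizer.encode_chat_completion(followup).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))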
Function calling
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
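Here the completion is a serialized tool call rather than prose. As a rough sketch of dispatching it (the exact surface form of the output is not guaranteed; this assumes `result` decodes to a JSON list of objects with "name" and "arguments" keys):
import json

# Assumes `result` decodes to a JSON list such as:
# [{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}]
for call in json.loads(result):
    if call["name"] == "get_current_weather":
        args = call["arguments"]
        print(f"would call get_current_weather({args['location']!r}, {args['format']!r})")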
Generate text with transformers
If you want to use the Hugging Face transformers library to generate text, you can do so like this:
from transformers import pipeline
messages = [
{"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
{"role": "user", "content": "Who are you?"},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
chatbot(messages)
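On a single GPU, the 7B checkpoint is far more comfortable in half precision. A variant of the same call with explicit dtype and device placement (these are standard pipeline keyword arguments, not specific to this model):
import torch
from transformers import pipeline

chatbot = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.bfloat16,  # halves memory versus fp32
    device_map="auto",           # place weights on available devices
)
chatbot(messages, max_new_tokens=256)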
Function calling with transformers
To use this example, you'll need transformers version 4.42.0 or higher. For more information, see the function calling guide in the transformers documentation.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
def get_current_weather(location: str, format: str):
    """
    Get the current weather

    Args:
        location: The city and state, e.g. San Francisco, CA
        format: The temperature unit to use. Infer this from the users location. (choices: ["celsius", "fahrenheit"])
    """
    pass
conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
tools = [get_current_weather]
# format and tokenize the tool use prompt
inputs = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Note that, for the sake of brevity, this example does not show the complete cycle of calling the tool and adding the tool call and tool result to the chat history so that the model can use them in its next generation. For a full tool-calling example, see the function calling guide, and note that Mistral does use tool call IDs, so these must be included in your tool calls and tool results. They should be exactly 9 alphanumeric characters.
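A minimal sketch of that round trip, following the pattern in the transformers function calling guide (the 9-character ID and the weather value below are made up for illustration; `conversation`, `tools`, `tokenizer`, and `model` come from the snippet above):
# 1. Record the tool call the model asked for, with a 9-character alphanumeric ID.
tool_call_id = "abcdef123"  # must be exactly 9 alphanumeric characters
tool_call = {"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}
conversation.append({"role": "assistant", "tool_calls": [{"type": "function", "id": tool_call_id, "function": tool_call}]})

# 2. Run the real function and append its (dummy) result for the model to read.
conversation.append({"role": "tool", "tool_call_id": tool_call_id, "name": "get_current_weather", "content": "22.0"})

# 3. Re-apply the chat template and generate the final natural-language answer.
inputs = tokenizer.apply_chat_template(conversation, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))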
🔧 Technical Details
The Mistral 7B Instruct model is a quick demonstration that the base model can easily be fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
📄 License
This model is released under the Apache 2.0 license.
The Mistral AI Team
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall
⚠️ Important Note
If you want to learn more about how we process your personal data, please read our Privacy Policy.