nsql-llama-2-7B: an open-source, free-to-deploy SQL generation model that turns natural language into SQL queries

Nsql Llama 2 7B

Developed by NumbersStation

NSQL-Llama-2-7B is a SQL generation model fine-tuned from the Llama-2 7B model and designed to convert natural language into SQL queries.

Large Language Model

Transformers

#Text-to-SQL generation #Llama 2 fine-tuning #Database query optimization

Downloads: 223

Released: 7/31/2023

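Model Overview

This model is part of the NSQL series. Given a table schema and a natural-language prompt, it generates a SQL query, and its output is mainly SELECT statements.

Model Features

Optimized for SQL generation

Both the pre-training and the fine-tuning stages target SQL generation.

Built on the Llama-2 architecture

Inherits the language understanding capabilities of the Llama-2 model.

Multi-source training data

Trained on the SQL subset of The Stack and on more than 20 public standard datasets.

Model Capabilities

Natural language to SQL conversion

Database query generation

SELECT statement generation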

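Use Cases

Database management

Stadium information queries

Generates SQL that looks up stadium capacity and similar information from a natural-language question.

Produces SQL queries that use aggregate functions such as MAX and AVG.

Work order status statistics

Generates statistical queries based on work order status.

Produces SQL queries with the appropriate WHERE conditions.

Business intelligence

Report generation

Converts business questions into SQL queries that feed reports.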

🚀 NSQL-Llama-2-7B

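NSQL is a family of open-source, autoregressive large foundation models (FMs) designed specifically for SQL generation tasks. This repository introduces a new member of the NSQL family, NSQL-Llama-2-7B. It is based on Meta's original Llama-2 7B model, was first pre-trained on a dataset of general SQL queries, and was then fine-tuned on a dataset of text-to-SQL pairs.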

🚀 Quick Start

Model inference parameters

The inference parameters are set as follows:

inference:
  parameters:
    do_sample: false
    max_length: 200
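
The two parameters above correspond to keyword arguments of `model.generate` in Transformers: `do_sample: false` selects deterministic (non-sampling) decoding, and `max_length` caps the total sequence length. For decoder-only models that total includes the prompt, so a long schema leaves fewer tokens for the SQL itself. Note that the usage examples in this card pass `max_length=500` rather than 200. The sketch below is only an illustration; `GENERATION_KWARGS` and `tokens_left_for_sql` are hypothetical names that are not part of the model or the library.

```python
# Decoding settings copied from the inference parameters listed above.
GENERATION_KWARGS = {"do_sample": False, "max_length": 200}


def tokens_left_for_sql(prompt_token_count: int, max_length: int = 200) -> int:
    """Return how many tokens remain for the generated SQL.

    max_length covers the prompt plus the generated continuation,
    so the remaining budget shrinks as the schema grows.
    """
    return max(0, max_length - prompt_token_count)


print(tokens_left_for_sql(150))  # prompt of 150 tokens
print(tokens_left_for_sql(250))  # prompt already longer than max_length
```

With a loaded model, the settings would be passed as `model.generate(input_ids, **GENERATION_KWARGS)`.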

Usage Examples

The following examples show how to use the model for text-to-SQL generation:

Basic usage

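import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights (bfloat16 halves memory use compared with float32)
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B", torch_dtype=torch.bfloat16)

# Prompt layout: table schemas, an instruction comment, the question, then "SELECT" for the model to continue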

text = """CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
    highest number,
    lowest number,
    average number
)

CREATE TABLE singer (
    singer_id number,
    name text,
    country text,
    song_name text,
    song_release_year text,
    age number,
    is_male others
)

CREATE TABLE concert (
    concert_id number,
    concert_name text,
    theme text,
    stadium_id text,
    year text
)

CREATE TABLE singer_in_concert (
    concert_id number,
    singer_id text
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- What is the maximum, the average, and the minimum capacity of stadiums ?

SELECT"""

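# Tokenize the prompt into a PyTorch tensor of input IDs
input_ids = tokenizer(text, return_tensors="pt").input_ids

# max_length counts the prompt tokens plus the generated tokens
generated_ids = model.generate(input_ids, max_length=500)
# The decoded text contains the prompt followed by the generated SQL
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))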
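
The prompt in the example follows a fixed layout: the `CREATE TABLE` statements, an instruction comment, the question as a SQL comment, and a trailing `SELECT` for the model to continue. A minimal sketch of a helper that assembles this layout is shown here; `build_prompt` and `INSTRUCTION` are hypothetical names introduced for illustration, and the shortened `stadium` schema is only sample input.

```python
INSTRUCTION = (
    "-- Using valid SQLite, answer the following questions "
    "for the tables provided above."
)


def build_prompt(create_statements, question):
    """Assemble a prompt in the same layout as the example above."""
    # Schemas are separated by blank lines, as in the example prompt.
    schema = "\n\n".join(stmt.strip() for stmt in create_statements)
    # The prompt ends with "SELECT" so the model completes the query.
    return f"{schema}\n\n{INSTRUCTION}\n\n-- {question.strip()}\n\nSELECT"


prompt = build_prompt(
    ["CREATE TABLE stadium (\n    stadium_id number,\n    name text,\n    capacity number\n)"],
    "how many stadiums in total?",
)
print(prompt)
```

The returned string can be passed to the tokenizer in place of the hand-written `text` variable.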

Other examples

Two more examples for different scenarios:

# Additional example: counting all stadiums
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B", torch_dtype=torch.bfloat16)

text = """CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- how many stadiums in total?

SELECT"""

input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=500)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# Additional example: counting open work orders
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B", torch_dtype=torch.bfloat16)

text = """CREATE TABLE work_orders (
    ID NUMBER,
    CREATED_AT TEXT,
    COST FLOAT,
    INVOICE_AMOUNT FLOAT,
    IS_DUE BOOLEAN,
    IS_OPEN BOOLEAN,
    IS_OVERDUE BOOLEAN,
    COUNTRY_NAME TEXT,
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- how many work orders are open?

SELECT"""

input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=500)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
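
Because the examples decode the whole generated sequence, the printed text contains the prompt followed by the completion. A minimal sketch of separating the SQL from the echoed prompt is given here. `extract_sql` is a hypothetical helper, the `decoded` string is a hand-written stand-in rather than real model output, and splitting on `;` assumes the query contains no semicolon inside a string literal.

```python
def extract_sql(prompt: str, decoded: str) -> str:
    """Return only the first SQL statement from the decoded output."""
    # The prompt ends with "SELECT"; keep that keyword and drop the rest
    # of the echoed prompt.
    prefix = prompt[: -len("SELECT")] if prompt.endswith("SELECT") else prompt
    completion = decoded[len(prefix):] if decoded.startswith(prefix) else decoded
    # Keep the first statement only and normalise whitespace.
    first_statement = completion.split(";")[0]
    return " ".join(first_statement.split())


prompt = "CREATE TABLE stadium (\n    stadium_id number\n)\n\n-- how many stadiums in total?\n\nSELECT"
# Hand-written stand-in for the decoded model output.
decoded = prompt + " COUNT(*) FROM stadium;\n\n-- another question"
print(extract_sql(prompt, decoded))
```

An alternative that avoids string matching is to decode only the new tokens, for example `generated_ids[0][input_ids.shape[1]:]`, and prepend `SELECT` yourself.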

For more information (for example, running against a local database), see the examples in this repository.
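
As a rough illustration of the local-database case, the sketch below uses Python's built-in `sqlite3` module. It reads the `CREATE TABLE` statements from SQLite's `sqlite_master` catalog so that the schema in the prompt matches the real database, then executes a query. This is not the repository's own example: the in-memory table, its rows, and `candidate_sql` (a hand-written stand-in for model output) are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with a small sample table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stadium (stadium_id INTEGER, name TEXT, capacity INTEGER);
    INSERT INTO stadium VALUES (1, 'North', 1000), (2, 'South', 3000);
    """
)

# Read the CREATE TABLE statements back from SQLite's catalog; these can
# be placed at the top of the prompt.
schema = [
    row[0]
    for row in conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    )
]

# Hand-written stand-in for a model-generated query (not real model output).
candidate_sql = "SELECT MAX(capacity), AVG(capacity), MIN(capacity) FROM stadium"
rows = conn.execute(candidate_sql).fetchall()

print(schema[0])
print(rows)
```

When executing generated SQL against real data, a read-only connection is a sensible precaution, since the model's output is not guaranteed to be a SELECT statement.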

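✨ Key Features

Designed for text-to-SQL generation: given a table schema and a natural-language prompt, the model generates a SQL query.
Built on the Llama-2 7B model, then pre-trained and fine-tuned so that it performs well on SQL generation tasks.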

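📦 Training Information

Training data

Attribute	Details
General SQL query data	The SQL subset of The Stack, containing 1 million training samples.
Labeled text-to-SQL pairs	Standard datasets from more than 20 public sources on the web. We held out the Spider and GeoQuery datasets for evaluation.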
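
The card does not say which metric is used on the held-out Spider and GeoQuery data. One common way to score text-to-SQL output is execution match: run the predicted query and the reference query against the same database and compare the returned rows. The sketch below illustrates that idea only and is not the authors' evaluation code; `execution_match` and the sample table are hypothetical.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    # Compare as multisets so that row order does not matter.
    return Counter(predicted) == Counter(gold)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stadium (stadium_id INTEGER, capacity INTEGER);
    INSERT INTO stadium VALUES (1, 1000), (2, 3000);
    """
)

gold = "SELECT COUNT(*) FROM stadium"
print(execution_match(conn, "SELECT COUNT(stadium_id) FROM stadium", gold))
print(execution_match(conn, "SELECT COUNT(* FROM stadium", gold))
```

Queries with an ORDER BY clause would need an order-sensitive comparison instead of the multiset check used here.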