nsql-llama-2-7B: an open-source SQL generation model, free to deploy, that turns natural language into SQL queries


Nsql Llama 2 7B

Developed by NumbersStation

NSQL-Llama-2-7B is a SQL generation model fine-tuned from the Llama-2 7B model and designed to convert natural language into SQL queries.

Large language model

Transformers

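#Text-to-SQL generation #Llama-2 fine-tuning #Database query optimization

Downloads: 223

Released: 7/31/2023

Model overview

This model is part of the NSQL family. It generates SQL queries from a given table schema and a natural language prompt, and its output is mainly SELECT queries.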

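Model features

Optimized for SQL generation

Both the pretraining and the fine-tuning stages target SQL generation.

Built on the Llama-2 architecture

Inherits the language understanding capabilities of the Llama-2 model.

Multi-source training data

Trained on the SQL subset of The Stack and on more than 20 public standard datasets.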

Model capabilities

Natural language to SQL conversion

Database query generation

SELECT statement generation

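Use cases

Database management

Stadium information queries

Generates SQL that answers natural language questions about stadiums, such as their capacity.

Can generate SQL queries that use aggregate functions such as MAX and AVG.

Work order status statistics

Generates statistical queries based on work order status.

Can generate SQL queries that include WHERE conditions.

Business intelligence

Report generation

Converts business questions into SQL queries that can be used to build reports.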

🚀 NSQL-Llama-2-7B

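NSQL is a family of autoregressive, open-source large foundation models (FMs) designed specifically for SQL generation tasks. This repository introduces a new member of the NSQL family, NSQL-Llama-2-7B. It is based on Meta's original Llama-2 7B model, further pretrained on a dataset of general SQL queries, and then fine-tuned on a dataset of text-to-SQL pairs.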

🚀 Quick start

Model inference parameters

The following parameters are used for inference:

inference:
  parameters:
    do_sample: false
    max_length: 200
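
As a minimal sketch of how these settings could be passed to `generate` in transformers, the helper below keeps them in one dictionary and lets a caller override individual values. The names `DEFAULT_GENERATION_KWARGS` and `generation_kwargs` are hypothetical and do not come from the model card. Note that `max_length` counts the prompt tokens as well as the generated ones, so a prompt with a long schema may need a larger value than 200.

```python
# Greedy decoding settings mirroring the inference parameters listed above.
DEFAULT_GENERATION_KWARGS = {"do_sample": False, "max_length": 200}


def generation_kwargs(**overrides):
    """Return a copy of the default settings with caller overrides applied."""
    merged = dict(DEFAULT_GENERATION_KWARGS)
    merged.update(overrides)
    return merged


# Intended use, assuming `model` and `input_ids` exist as in the usage examples:
# generated_ids = model.generate(input_ids, **generation_kwargs(max_length=500))
print(generation_kwargs(max_length=500))
```

Keeping the defaults in one place makes it easy to see when an example departs from them.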

Usage examples

The following examples show how to use the model for text-to-SQL generation:

Basic usage

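import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights (bfloat16 reduces memory use).
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B", torch_dtype=torch.bfloat16)

# The prompt is the table schema, an instruction comment, the question, and a leading SELECT.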

text = """CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
    highest number,
    lowest number,
    average number
)

CREATE TABLE singer (
    singer_id number,
    name text,
    country text,
    song_name text,
    song_release_year text,
    age number,
    is_male others
)

CREATE TABLE concert (
    concert_id number,
    concert_name text,
    theme text,
    stadium_id text,
    year text
)

CREATE TABLE singer_in_concert (
    concert_id number,
    singer_id text
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- What is the maximum, the average, and the minimum capacity of stadiums ?

SELECT"""

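# Tokenize the prompt; max_length below counts prompt tokens plus generated tokens.
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=500)
# The decoded text contains the prompt followed by the generated SQL.
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))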
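
The prompt above follows a fixed layout: the `CREATE TABLE` statements, an instruction comment, the question as a SQL comment, and a trailing `SELECT` for the model to continue. A minimal sketch of a helper that assembles this layout is shown here; `build_prompt` and `INSTRUCTION` are hypothetical names used for illustration, and the layout is simply inferred from the examples in this section.

```python
INSTRUCTION = (
    "-- Using valid SQLite, answer the following questions "
    "for the tables provided above."
)


def build_prompt(create_statements, question):
    """Join schema, instruction, and question into one prompt ending in SELECT."""
    schema = "\n\n".join(stmt.strip() for stmt in create_statements)
    return f"{schema}\n\n{INSTRUCTION}\n\n-- {question.strip()}\n\nSELECT"


stadium_ddl = """CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number
)"""

prompt = build_prompt([stadium_ddl], "how many stadiums in total?")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hand-written `text` variable.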

More examples

The following two examples cover different scenarios:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B", torch_dtype=torch.bfloat16)

text = """CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- how many stadiums in total?

SELECT"""

input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=500)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B", torch_dtype=torch.bfloat16)

text = """CREATE TABLE work_orders (
    ID NUMBER,
    CREATED_AT TEXT,
    COST FLOAT,
    INVOICE_AMOUNT FLOAT,
    IS_DUE BOOLEAN,
    IS_OPEN BOOLEAN,
    IS_OVERDUE BOOLEAN,
    COUNTRY_NAME TEXT,
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- how many work orders are open?

SELECT"""

input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=500)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

For more information, such as running the model against a local database, see the examples in this repository.
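
As a rough sketch of what running against a local database could look like, the code below pulls the SQL statement out of the decoded text (which contains the prompt followed by the completion) and executes it with Python's built-in `sqlite3` module. This is not the repository's own example: `extract_sql` and `run_select` are hypothetical helpers, the in-memory table and its rows are made up, and the `decoded` string is hand-written to stand in for real model output.

```python
import sqlite3

MARKER = "-- Using valid SQLite"


def extract_sql(decoded):
    """Return the statement that starts at the prompt's final SELECT line."""
    marker_pos = decoded.find(MARKER)
    if marker_pos == -1:
        raise ValueError("instruction line not found in decoded text")
    start = decoded.find("\nSELECT", marker_pos)
    if start == -1:
        raise ValueError("no SELECT found after the instruction line")
    # Keep only the first statement in case generation ran past it.
    return decoded[start:].strip().split(";", 1)[0].strip()


def run_select(conn, sql):
    """Execute a query, refusing anything that does not start with SELECT."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("refusing to run a non-SELECT statement")
    return conn.execute(sql).fetchall()


# Hand-written stand-in for decoded model output (prompt + completion).
decoded = (
    "CREATE TABLE stadium (\n    stadium_id number,\n    capacity number\n)\n\n"
    "-- Using valid SQLite, answer the following questions "
    "for the tables provided above.\n\n"
    "-- how many stadiums in total?\n\n"
    "SELECT COUNT(*) FROM stadium"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stadium (stadium_id INTEGER, capacity INTEGER)")
conn.executemany("INSERT INTO stadium VALUES (?, ?)", [(1, 100), (2, 300)])

sql = extract_sql(decoded)
rows = run_select(conn, sql)
print(sql, rows)
```

The SELECT-only check is a light safeguard, since generated SQL should be treated as untrusted input before it touches a real database.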

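✨ Key features

Designed for text-to-SQL generation: it produces SQL queries from a given table schema and a natural language prompt.
Built on the Llama-2 7B model, then pretrained and fine-tuned for SQL generation tasks.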

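📦 Training information

Training data

Attribute	Details
General SQL queries	The SQL subset of The Stack, containing 1M training samples.
Labeled text-to-SQL pairs	Standard datasets from more than 20 public sources across the web. The Spider and GeoQuery datasets were held out for evaluation.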
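
The text does not say which metric is used on the held-out datasets. One common way to score text-to-SQL output is to execute the predicted query and a reference query on the same database and compare the results. The sketch below illustrates that idea only; `execution_match` is a hypothetical helper, the table and rows are made up, and none of this is claimed to be the authors' evaluation procedure.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, reference_sql):
    """Return True when both queries yield the same rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    reference = conn.execute(reference_sql).fetchall()
    return Counter(predicted) == Counter(reference)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stadium (name TEXT, capacity INTEGER)")
conn.executemany("INSERT INTO stadium VALUES (?, ?)", [("X", 100), ("Y", 300)])

same = execution_match(
    conn,
    "SELECT MAX(capacity) FROM stadium",
    "SELECT capacity FROM stadium ORDER BY capacity DESC LIMIT 1",
)
print(same)
```

Comparing results rather than query text accepts predictions that are written differently from the reference but return the same answer.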