pip-sql-1.3b: an open-source SQL generation model, free to use, that outperforms most SQL expert models and ChatGPT on popular benchmarks

Pip Sql 1.3b

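Developed by PipableAI

A 1.3-billion-parameter SQL generation model that outperforms most SQL expert models and ChatGPT on several popular benchmarks.

Large language model · Multilingual · License: Apache-2.0 · #text-to-SQL generation #efficient SQL queries #database interaction

Downloads: 1,288

Released: 2/14/2024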

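Model overview

A text-to-SQL model distilled from a DeepSeek base model. Given a natural-language question and a database schema, it generates the corresponding SQL query.

Model features

High-performance SQL generation

Outperforms comparable models and ChatGPT on benchmarks such as Spider, SParC, and CoSQL.

Efficient parameter count

Achieves strong results with only 1.3 billion parameters, making it more efficient than larger models.

Multi-framework support

Supports both PyTorch and JAX/Flax.

Model capabilities

Natural language to SQL

Database query generation

Construction of complex SQL statements

Use cases

Database management

Business data analysis

Non-technical users query databases in natural language

Automatic generation of accurate SQL queries

Database application development

Automatic generation of database query code during rapid prototyping

Less time spent writing SQL and higher development efficiency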

🚀 pipSQL-1.3b

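pipSQL-1.3b is a 1.3-billion-parameter SQL model that outperforms most SQL expert models and ChatGPT on popular benchmarks. It is built on a DeepSeek base model and targets text-to-SQL conversion.

🚀 Quick start

You can try the project via the following links:

✨ Key features

A 1.3-billion-parameter SQL model that outperforms most SQL expert models and ChatGPT on popular benchmarks.
A distilled model built on a DeepSeek base model.
For our state-of-the-art model, see PipableAI/pip-library-etl-1.3b.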

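🔧 Technical details

How the model was built

We used softmax cross-entropy, a modified form of policy gradient, and a Q loss, all optimized in an EM setup. The loss behavior in this setup is shown below:

[Figure: loss behavior in this setup]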

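Benchmarking

For benchmarking we used "Semantic Evaluation for Text-to-SQL with Distilled Test Suites", proposed by research teams at Yale and Berkeley, which is the officially accepted evaluation framework for Spider, SParC, and CoSQL. The benchmark contains 2,200 test data points.

Test Suite SQL Eval

Model	Easy	Medium	Hard	Extra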
sqlcoder-7b-2	72.0	58.0	40.6	37.3
pipSQL-1.3b	78.5	57.5	42.1	28.3
pipSQL-7b	63.0	40.0	30.2	25.0
sqlcoder-7b	60.6	48.2	28.3	20.4
gpt-3.5	58.8	44.7	31.0	28.4
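
The evaluation framework itself is not reproduced here. Its core idea, judging a predicted query by the rows it returns rather than by its text, can be sketched roughly as follows. This is a simplified illustration on a single toy SQLite database, not the official evaluator: `execution_match`, the toy `products` table, and the sample queries are all assumptions made for the example.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows.

    Hypothetical helper for illustration; row order is ignored.
    """
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to run counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


# Toy database invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, product_price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 10.0), (2, 30.0), (3, 50.0)],
)

gold = (
    "SELECT product_id FROM products "
    "WHERE product_price > (SELECT avg(product_price) FROM products)"
)
good = "SELECT product_id FROM products WHERE product_price > 30"
bad = "SELECT product_id FROM products"

print(execution_match(conn, good, gold))
print(execution_match(conn, bad, gold))
```

Note that `good` only matches `gold` because the average price in this particular database happens to be 30. Checking against several differently populated databases, as a test suite does, is what exposes that kind of coincidental match.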

We also benchmarked on the Defog evaluation, which contains 200 test data points hand-picked by the Defog team.

Defog SQL-Eval

Team members

Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya

📦 Installation

pip install transformers

💻 Usage examples

Basic usage

prompt = f"""<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
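
As a minimal sketch of filling this template, the snippet below wraps it in a function and builds a prompt from a schema string and a question. `build_prompt` is a hypothetical helper name, not part of the model's API, and the one-table schema and question are made up for the example.

```python
def build_prompt(schema: str, question: str) -> str:
    """Fill the <schema>/<question>/<sql> template.

    Hypothetical helper for illustration; not part of any library.
    """
    return (
        f"<schema>{schema}</schema>\n"
        f"<question>{question}</question>\n"
        "<sql>"
    )


# Example inputs invented for this sketch.
schema = "CREATE TABLE Products (product_id number, product_price number);"
question = "What is the average product price?"

prompt = build_prompt(schema, question)
print(prompt)
```

The prompt deliberately ends with an open `<sql>` tag so that the model continues from there with the query.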

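Advanced usage - PyTorch

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-sql-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-sql-1.3b")

# `prompt` is the <schema>/<question>/<sql> string from the basic usage above.
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Keep only the text between <sql> and </sql>.
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])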
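
The chained `split` calls in the last line raise an `IndexError` if the decoded text contains no `<sql>` tag. A slightly more defensive way to pull out the query could look like the sketch below. `extract_sql` is a hypothetical helper, not something the model or `transformers` provides, and the sample string stands in for real decoded output.

```python
def extract_sql(generated: str) -> str:
    """Return the text between the first <sql> tag and the next </sql> tag.

    Hypothetical helper: returns an empty string when no <sql> tag is
    present, and tolerates a missing closing tag.
    """
    _, found, tail = generated.partition("<sql>")
    if not found:
        return ""  # no opening tag at all
    sql, _, _ = tail.partition("</sql>")
    return sql.strip()


# Stand-in for tokenizer.decode(...) output.
sample = "<schema>...</schema>\n<question>...</question>\n<sql>SELECT 1</sql>"
print(extract_sql(sample))
```

Returning an empty string rather than raising lets the caller decide how to handle a generation that produced no query, for example when `max_new_tokens` was too small.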

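Advanced usage - Flax

from transformers import FlaxAutoModelForCausalLM, AutoTokenizer

# from_pt=True converts the PyTorch checkpoint to Flax weights on load.
model = FlaxAutoModelForCausalLM.from_pretrained("PipableAI/pip-sql-1.3b", from_pt=True)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-sql-1.3b")

# `prompt` is the <schema>/<question>/<sql> string from the basic usage above.
inputs = tokenizer(prompt, return_tensors="jax")
outputs = model.generate(**inputs, max_new_tokens=200)
# Flax generate() returns an output object; the token ids are in `.sequences`.
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])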

Example data and queries

Database schema

CREATE TABLE Products (
  product_id number,
  parent_product_id number,
  product_name text,
  product_price number,
  product_color text,
  product_size text,
  product_description text);

CREATE TABLE Customers (
  customer_id number,
  gender_code text,
  customer_first_name text,
  customer_middle_initial text,
  customer_last_name text,
  email_address text,
  login_name text,
  login_password text,
  phone_number text,
  address_line_1 text,
  town_city text,
  county text,
  country text);

CREATE TABLE Customer_Payment_Methods (
  customer_id number,
  payment_method_code text);

CREATE TABLE Invoices (
  invoice_number number,
  invoice_status_code text,
  invoice_date time);

CREATE TABLE Orders (
  order_id number,
  customer_id number,
  order_status_code text,
  date_order_placed time);

CREATE TABLE Order_Items (
  order_item_id number,
  product_id number,
  order_id number,
  order_item_status_code text);

CREATE TABLE Shipments (
  shipment_id number,
  order_id number,
  invoice_number number,
  shipment_tracking_number text,
  shipment_date time);

CREATE TABLE Shipment_Items (
  shipment_id number,
  order_item_id number);

Example queries

Question 1: What are the email address, town, and county of the customers with the least common gender?

SELECT email_address ,  town_city ,  county FROM customers GROUP BY gender_code ORDER BY count(*) ASC LIMIT 1

Question 2: What are the price and size of the products priced above average?

SELECT product_price ,  product_size FROM products WHERE product_price  > (SELECT avg(product_price) FROM products)

Question 3: Which customers have not placed any orders? List their first name, middle initial, and last name.

SELECT T1.customer_first_name ,  T1.customer_middle_initial ,  T1.customer_last_name FROM Customers AS T1 WHERE T1.customer_id NOT IN (SELECT T2.customer_id FROM Orders AS T2)
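
One way to sanity-check a generated query is to run it against a small throwaway database. The sketch below does this for the Question 3 query using Python's built-in `sqlite3` module. The tables are trimmed copies of `Customers` and `Orders` from the schema above, and the inserted rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed copies of the Customers and Orders tables; sample rows are made up.
conn.executescript("""
CREATE TABLE Customers (
  customer_id number,
  customer_first_name text,
  customer_middle_initial text,
  customer_last_name text);
CREATE TABLE Orders (
  order_id number,
  customer_id number);
INSERT INTO Customers VALUES (1, 'Ada', 'M', 'Lovelace');
INSERT INTO Customers VALUES (2, 'Alan', 'M', 'Turing');
INSERT INTO Orders VALUES (100, 1);
""")

# The Question 3 query, unchanged apart from line wrapping.
query = (
    "SELECT T1.customer_first_name, T1.customer_middle_initial, T1.customer_last_name "
    "FROM Customers AS T1 "
    "WHERE T1.customer_id NOT IN (SELECT T2.customer_id FROM Orders AS T2)"
)

rows = conn.execute(query).fetchall()
print(rows)  # [('Alan', 'M', 'Turing')]
```

Only customer 2 has no row in `Orders`, so that is the single row returned.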

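📄 License

The model is open-sourced under the Apache 2.0 license.

Property	Details
Model type	Distilled SQL model based on DeepSeek
Training data	PipableAI/pip-txt-to-sql-spider-bird-dataset
Evaluation metric	Accuracy
Tags	sql, code, text2sql, instruction_tuned, basemodel, jax, pytorch, text-generation-inference
Library	transformers
Task	Text generation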