Chat2DB-GLM
Chat2DB-GLM is part of the open-source Chat2DB project. It converts natural language queries into structured SQL statements.
Quick Start
The open-source Chat2DB-SQL-7B model has 7B parameters and is fine-tuned from CodeLlama. It is tailored for natural-language-to-SQL conversion, supports multiple SQL dialects, and handles a context length of up to 16k.
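Because large schemas can produce long prompts, it may help to check the token budget before calling the model. The following is a minimal sketch, not part of the project: it assumes "16k" means 16,384 tokens, and `fits_context` is a hypothetical helper name.

```python
# Assumption: the "16k" context length is read as 16 * 1024 = 16,384 tokens.
CONTEXT_TOKENS = 16 * 1024


def fits_context(n_prompt_tokens, max_new_tokens, context_tokens=CONTEXT_TOKENS):
    """Return True if the prompt plus the generation budget fits in the context window."""
    return n_prompt_tokens + max_new_tokens <= context_tokens


# Example: a 1,000-token prompt with a 100-token generation budget.
small_ok = fits_context(1000, 100)
# Example: a prompt that already fills the whole window leaves no room to generate.
full_ok = fits_context(CONTEXT_TOKENS, 1)
```

With a loaded Hugging Face tokenizer, the prompt token count can be obtained as `len(tokenizer(prompt)["input_ids"])`.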
⨠Features
Dialect Support
The Chat2DB-SQL-7B model supports a range of SQL dialects, including MySQL, PostgreSQL, SQLite, and other common dialects, so it can be used across different database systems.
Model Performance
The Chat2DB-SQL-7B model performs well across multiple dialects and the key parts of a SQL statement. The following table shows its results on each key part, using generic SQL as an example and evaluated on the Spider dataset. The results also reflect the model's ability to handle SQL functions such as date and string functions.
| Dialect | select | where | group | order | function | total |
| Generic SQL | 91.5 | 83.7 | 80.5 | 98.2 | 96.2 | 77.3 |
Documentation
Model Limitations and Usage Notes
Chat2DB-SQL-7B was mainly fine-tuned for MySQL, PostgreSQL, and generic SQL. It offers basic conversion for other SQL dialects, but it may produce inaccurate output for dialect-specific functions such as date and string functions. Performance may also vary from one dataset to another.
Please note that this model is mainly for academic research and learning. While we strive to ensure output accuracy, its performance in a production environment is not guaranteed. Any potential losses from using this model are not the responsibility of this project or its contributors. We encourage users to carefully assess its applicability in specific use cases.
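One way to assess generated queries before relying on them is to check that a database engine can at least compile them against the target schema. The sketch below is an illustration, not part of the project: it uses Python's standard `sqlite3` module with an in-memory database, and `validate_sql` is a hypothetical helper name. It only checks SQLite compatibility, so MySQL- or PostgreSQL-specific functions would need to be checked against those engines instead.

```python
import sqlite3


def validate_sql(schema_statements, generated_sql):
    """Return (True, None) if SQLite can compile the query against the schema,
    otherwise (False, error_message)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Create the (empty) tables so that table and column names can be resolved.
        for ddl in schema_statements:
            conn.execute(ddl)
        # EXPLAIN makes SQLite compile the statement and report its query program
        # instead of running the query itself.
        conn.execute("EXPLAIN " + generated_sql)
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = ['CREATE TABLE "singer" ("Singer_ID" int, "Name" text, PRIMARY KEY ("Singer_ID"))']

# A query that matches the schema.
ok, err = validate_sql(schema, "SELECT count(*) FROM singer")
# A query that references a table that does not exist.
bad_ok, bad_err = validate_sql(schema, "SELECT count(*) FROM singers")
```

A query that compiles is not necessarily the query the user asked for, so this check complements rather than replaces a manual review of the results.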
Hardware Requirements
| Model | Minimum GPU Memory (Inference) | Minimum GPU Memory (Efficient Parameter Fine-Tuning) |
| Chat2DB-SQL-7B | 14GB | 20GB |
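The 14GB inference figure is consistent with a back-of-the-envelope estimate of the weight memory in half precision. The sketch below is an illustration rather than the project's own calculation: it assumes a nominal 7 billion parameters at 2 bytes each (float16, as in the usage example) and ignores activations and the key-value cache, which add to the total. The fine-tuning figure is not derived here.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Estimate the memory needed for model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Assumption: "7B" is taken as exactly 7e9 parameters, stored as float16 (2 bytes each).
fp16_gb = weight_memory_gb(7_000_000_000, 2)
print(f"float16 weights: {fp16_gb:.1f} GB")
```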
Contribution Guide
We welcome and encourage community members to contribute to the Chat2DB-GLM project. Whether you report issues, suggest new features, or submit code fixes and improvements directly, your help is highly valuable.
If you're interested in contributing, please follow our contribution guidelines:
- Report Issues: Report any issues or bugs via GitHub Issues.
- Submit Pull Requests: If you want to contribute to the codebase, fork the repository and submit a pull request (PR).
- Improve Documentation: Contributions to best practices, example code, and documentation improvements are welcome.
Usage Examples
Basic Usage
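import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Model ID on the Hugging Face Hub (a local path to the weights also works)
model_path = "Chat2DB/Chat2DB-SQL-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Load the weights in half precision; device_map="auto" places them on the available devices
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16, use_cache=True)
# return_full_text=False makes the pipeline return only the generated text, without the prompt
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, return_full_text=False, max_new_tokens=100)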
prompt = "### Database Schema\n\n['CREATE TABLE \"stadium\" (\\n\"Stadium_ID\" int,\\n\"Location\" text,\\n\"Name\" text,\\n\"Capacity\" int,\\n\"Highest\" int,\\n\"Lowest\" int,\\n\"Average\" int,\\nPRIMARY KEY (\"Stadium_ID\")\\n);', 'CREATE TABLE \"singer\" (\\n\"Singer_ID\" int,\\n\"Name\" text,\\n\"Country\" text,\\n\"Song_Name\" text,\\n\"Song_release_year\" text,\\n\"Age\" int,\\n\"Is_male\" bool,\\nPRIMARY KEY (\"Singer_ID\")\\n);', 'CREATE TABLE \"concert\" (\\n\"concert_ID\" int,\\n\"concert_Name\" text,\\n\"Theme\" text,\\n\"Stadium_ID\" text,\\n\"Year\" text,\\nPRIMARY KEY (\"concert_ID\"),\\nFOREIGN KEY (\"Stadium_ID\") REFERENCES \"stadium\"(\"Stadium_ID\")\\n);', 'CREATE TABLE \"singer_in_concert\" (\\n\"concert_ID\" int,\\n\"Singer_ID\" text,\\nPRIMARY KEY (\"concert_ID\",\"Singer_ID\"),\\nFOREIGN KEY (\"concert_ID\") REFERENCES \"concert\"(\"concert_ID\"),\\nFOREIGN KEY (\"Singer_ID\") REFERENCES \"singer\"(\"Singer_ID\")\\n);']\n\n\n### Task \n\nBased on the provided database schema information, How many singers do we have?[SQL]\n"
response = pipe(prompt)[0]["generated_text"]
print(response)
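The prompt above can also be assembled programmatically. The sketch below is a minimal illustration: `build_prompt` is a hypothetical helper, and the format is inferred from the single example prompt shown here, where the schema section looks like the Python string form of a list of `CREATE TABLE` statements. Treat the exact layout as an assumption rather than a documented specification.

```python
def build_prompt(create_statements, question):
    """Assemble a prompt in the same layout as the example above.

    create_statements: list of CREATE TABLE statements (strings)
    question: the natural language question to convert to SQL
    """
    # str() of a list reproduces the bracketed, quoted form seen in the example,
    # including newlines shown as the two characters backslash and n.
    schema = str(list(create_statements))
    return (
        "### Database Schema\n\n"
        f"{schema}\n\n\n"
        "### Task \n\n"
        f"Based on the provided database schema information, {question}[SQL]\n"
    )


prompt = build_prompt(
    ['CREATE TABLE "singer" (\n"Singer_ID" int,\n"Name" text\n);'],
    "How many singers do we have?",
)
print(prompt)
```

The resulting string can be passed to `pipe(prompt)` in the same way as the hand-written prompt.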
License
The model weights in this project are governed by a custom commercial license from Code Llama. For details, please visit: Custom Commercial License
Before using this software, please ensure you have fully understood the terms of the license.