TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
TableLLM is a powerful large language model designed to efficiently handle tabular data manipulation tasks, whether the data is embedded in spreadsheets or documents, meeting the needs of real-world office scenarios. The TableLLM series includes two scales, TableLLM-7B and TableLLM-13B, fine-tuned from CodeLlama-7b-Instruct-hf and CodeLlama-13b-Instruct-hf respectively.
TableLLM can generate either a code solution or a direct text answer for a tabular data manipulation task, depending on the scenario. Code generation is used for tabular data embedded in spreadsheets, typically covering insert, delete, update, query, merge, and plot operations on tables. Text generation is used for tabular data embedded in documents, usually for query operations on short tables.
| Paper | Training set | Github | Homepage |
Quick Start
TableLLM is ready to handle various tabular data manipulation tasks. You can start by exploring the different scales of TableLLM and their fine-tuning bases.
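As a minimal sketch, the snippet below shows one way to load a TableLLM checkpoint with Hugging Face transformers and prompt it. The repo id is a placeholder (not a verified model id), and the generation settings are illustrative only.

```python
# Minimal sketch (assumption: a published TableLLM checkpoint on the Hugging Face Hub;
# the repo id below is a placeholder, not a verified model id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/TableLLM-13b"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Fill in one of the prompt templates shown in the Documentation section below.
prompt = "[INST]Below are the first few lines of a CSV file. ...[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens (skip the prompt).
answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```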
Features
- Dual-mode Generation: Generate code solutions for spreadsheet-embedded tabular data and direct text answers for document-embedded tabular data.
- Multiple Scales: Available in two scales, TableLLM-7B and TableLLM-13B, fine-tuned on different CodeLlama models.
Documentation
Evaluation Results
We evaluated the code solution generation ability of TableLLM on three benchmarks: WikiSQL, Spider, and a self-created table operation benchmark. The text answer generation ability was tested on four benchmarks: WikiTableQuestions (WikiTQ), TAT-QA, FeTaQA, and OTTQA. The evaluation results are as follows:
| Model | WikiTQ | TAT-QA | FeTaQA | OTTQA | WikiSQL | Spider | Self-created | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TaPEX | 38.5 | - | - | - | 83.9 | 15.0 | / | 45.8 |
| TaPas | 31.5 | - | - | - | 74.2 | 23.1 | / | 42.92 |
| TableLlama | 24.0 | 22.2 | 20.5 | 6.4 | 43.7 | 9.0 | / | 20.7 |
| GPT3.5 | 58.5 | 72.1 | 71.2 | 60.8 | 81.7 | 67.4 | 77.1 | 69.8 |
| GPT4 | 74.1 | 77.1 | 78.4 | 69.5 | 84.0 | 69.5 | 77.8 | 75.8 |
| Llama2-Chat (13B) | 48.8 | 49.6 | 67.7 | 61.5 | - | - | - | 56.9 |
| CodeLlama (13B) | 43.4 | 47.2 | 57.2 | 49.7 | 38.3 | 21.9 | 47.6 | 43.6 |
| Deepseek-Coder (33B) | 6.5 | 11.0 | 7.1 | 7.4 | 72.5 | 58.4 | 73.9 | 33.8 |
| StructGPT (GPT3.5) | 52.5 | 27.5 | 11.8 | 14.0 | 67.8 | 84.8 | / | 48.9 |
| Binder (GPT3.5) | 61.6 | 12.8 | 6.8 | 5.1 | 78.6 | 52.6 | / | 42.5 |
| DATER (GPT3.5) | 53.4 | 28.4 | 18.3 | 13.0 | 58.2 | 26.5 | / | 37.0 |
| TableLLM-7B (Ours) | 58.8 | 66.9 | 72.6 | 63.1 | 86.6 | 82.6 | 78.8 | 72.8 |
| TableLLM-13B (Ours) | 62.4 | 68.2 | 74.5 | 62.5 | 90.7 | 83.4 | 80.8 | 74.7 |
Prompt Template
The prompts used for generating code solutions and text answers are introduced below.
Usage Examples
Basic Usage - Code Solution
The prompt template for insert, delete, update, query, and plot operations on a single table:
[INST]Below are the first few lines of a CSV file. You need to write a Python program to solve the provided question.
Header and first few lines of CSV file:
{csv_data}
Question: {question}[/INST]
The prompt template for the merge operation on two tables:
[INST]Below are the first few lines of two CSV files. You need to write a Python program to solve the provided question.
Header and first few lines of CSV file 1:
{csv_data1}
Header and first few lines of CSV file 2:
{csv_data2}
Question: {question}[/INST]
The `csv_data` field is filled with the first few lines of your provided table file. Here is an example:
Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
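For illustration, the sketch below shows one way to fill the single-table template from a CSV file on disk. The template constant and the `build_code_prompt` helper are hypothetical names, not part of the official repository.

```python
# Sketch: assemble the single-table code-solution prompt from a CSV file.
# CODE_PROMPT_TEMPLATE and build_code_prompt are illustrative names, not an official API.
from itertools import islice

CODE_PROMPT_TEMPLATE = (
    "[INST]Below are the first few lines of a CSV file. "
    "You need to write a Python program to solve the provided question.\n"
    "Header and first few lines of CSV file:\n"
    "{csv_data}\n"
    "Question: {question}[/INST]"
)

def build_code_prompt(csv_path: str, question: str, n_lines: int = 6) -> str:
    """Read the header plus the first few data rows and fill the template."""
    with open(csv_path, encoding="utf-8") as f:
        head = [line.rstrip("\n") for line in islice(f, n_lines)]
    return CODE_PROMPT_TEMPLATE.format(csv_data="\n".join(head), question=question)

# Example with the abalone-style CSV shown above (the file name is hypothetical).
print(build_code_prompt("abalone.csv", "What is the average number of Rings for male (M) abalones?"))
```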
Basic Usage - Text Answer
The prompt template for direct text answer generation on short tables:
[INST]Offer a thorough and accurate solution that directly addresses the Question outlined in the [Question].
### [Table Text]
{table_descriptions}
### [Table]
{table_in_csv}
### [Question]
{question}
### [Solution][/INST]
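A corresponding sketch for the text-answer template is shown below; again, the constant and helper names are illustrative rather than an official API.

```python
# Sketch: assemble the text-answer prompt for a short, document-embedded table.
# TEXT_PROMPT_TEMPLATE and build_text_prompt are illustrative names, not an official API.
TEXT_PROMPT_TEMPLATE = (
    "[INST]Offer a thorough and accurate solution that directly addresses "
    "the Question outlined in the [Question].\n"
    "### [Table Text]\n{table_descriptions}\n"
    "### [Table]\n{table_in_csv}\n"
    "### [Question]\n{question}\n"
    "### [Solution][/INST]"
)

def build_text_prompt(table_descriptions: str, table_in_csv: str, question: str) -> str:
    """Fill the template with a table description, the table in CSV form, and a question."""
    return TEXT_PROMPT_TEMPLATE.format(
        table_descriptions=table_descriptions,
        table_in_csv=table_in_csv,
        question=question,
    )
```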
For more details on how to use TableLLM, please refer to our GitHub page: https://github.com/TableLLM/TableLLM
License
This project is released under the Llama 2 license.