# ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
ChartGemma is a novel chart understanding and reasoning model. It addresses the drawbacks of existing methods by training on instruction-tuning data generated directly from chart images, achieving state-of-the-art results across multiple benchmarks.
## Documentation
The abstract of the paper states:
Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer crucial drawbacks across two critical axes affecting the performance of chart representation models: they are trained on data generated from underlying data tables of the charts, ignoring the visual trends and patterns in chart images, and use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these important drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries compared to its contemporaries.
[Paper Link](https://arxiv.org/abs/2407.04172)
## Quick Start
### Web Demo
If you wish to quickly try our model, you can use our public web demo hosted on Hugging Face Spaces, which provides a user-friendly interface!
[ChartGemma Web Demo](https://huggingface.co/spaces/ahmed-masry/ChartGemma)
### Inference
You can easily use our model for inference with the Hugging Face `transformers` library!
You just need to do the following:
- Change `image_path` to the path of your chart image on your system
- Write your query in `input_text`

We recommend using beam search with a beam size of 4, but if your machine has limited memory, you can remove the `num_beams` argument from the `generate` call (a greedy-decoding variant is sketched after the example below).
```python
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
import torch

# Download an example chart image from the ChartQA dataset
torch.hub.download_url_to_file(
    'https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/multi_col_1229.png',
    'chart_example_1.png'
)

image_path = "chart_example_1.png"
input_text = "program of thought: what is the sum of Facebook Messenger and WhatsApp values in the 18-29 age group?"

# Load the model and processor
model = PaliGemmaForConditionalGeneration.from_pretrained("ahmed-masry/chartgemma", torch_dtype=torch.float16)
processor = AutoProcessor.from_pretrained("ahmed-masry/chartgemma")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Process the inputs
image = Image.open(image_path).convert('RGB')
inputs = processor(text=input_text, images=image, return_tensors="pt")
prompt_length = inputs['input_ids'].shape[1]
inputs = {k: v.to(device) for k, v in inputs.items()}

# Generate with beam search and decode only the newly generated tokens
generate_ids = model.generate(**inputs, num_beams=4, max_new_tokens=512)
output_text = processor.batch_decode(generate_ids[:, prompt_length:], skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output_text)
```
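As noted above, if memory is limited you can drop `num_beams` so that generation falls back to greedy decoding. A minimal sketch of that variant, reusing the `model`, `processor`, `inputs`, and `prompt_length` objects from the example above:

```python
# Low-memory variant: omit num_beams so generate() falls back to greedy decoding.
# Reuses model, processor, inputs, and prompt_length from the example above.
generate_ids = model.generate(**inputs, max_new_tokens=512)
output_text = processor.batch_decode(generate_ids[:, prompt_length:], skip_special_tokens=True)[0]
print(output_text)
```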
## License
This project is licensed under the MIT license.
## Contact
If you have any questions about this work, please contact Ahmed Masry at amasry17@ku.edu.tr or ahmed.elmasry24653@gmail.com.
## Reference
Please cite our paper if you use our model in your research.
```bibtex
@misc{masry2024chartgemmavisualinstructiontuningchart,
      title={ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild},
      author={Ahmed Masry and Megh Thakkar and Aayush Bajaj and Aaryaman Kartha and Enamul Hoque and Shafiq Joty},
      year={2024},
      eprint={2407.04172},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.04172},
}
```