DiagramAgent/Diagram_to_Code_Agent
This agent is designed to convert a given diagram (visual representation) into its corresponding structured code.
Quick Start
Use this agent to convert diagrams into structured code. Typical scenarios include automated diagram editing, reverse engineering of visual diagrams, and enhancing data visualization tools.
Features
- Convert existing diagrams into structured code representations.
- Support diagram editing workflows by providing a reliable code basis for modifications.
- Capture and preserve implicit logical structures and visual details of diagrams.
Installation
The model card lists no installation steps. The usage example below imports transformers and qwen_vl_utils, so both packages must be installed in your environment.
Usage Examples
Basic Usage
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model and its processor; device_map="auto" places the weights automatically.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "DiagramAgent/Diagram_to_Code_Agent", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("DiagramAgent/Diagram_to_Code_Agent")

# A single user turn: the diagram image plus a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "image path",  # placeholder: path or URL of the diagram image
            },
            {"type": "text", "text": "your input"},  # placeholder: your instruction
        ],
    }
]

# Build the chat prompt and collect the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the model's device instead of hard-coding "cuda".
inputs = inputs.to(model.device)

# Generate, then drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
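The decoded output is a list with one string per input. The following is a minimal sketch of pulling the code out of the first result. It assumes, and the model card does not state this, that the model may wrap its answer in a Markdown code fence. The helper name extract_code and the sample string are hypothetical.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences


def extract_code(generated: str) -> str:
    """Return the body of the first fenced block, or the whole text if unfenced."""
    pattern = FENCE + r"[^\n]*\n(.*?)" + FENCE
    match = re.search(pattern, generated, flags=re.DOTALL)
    return (match.group(1) if match else generated).strip()


# Stand-in for output_text from the usage example above (hypothetical content).
output_text = [FENCE + "latex\n\\begin{tikzpicture}\n\\end{tikzpicture}\n" + FENCE]
code = extract_code(output_text[0])
print(code)
```

If the model returns unfenced text, the helper passes it through unchanged apart from trimming surrounding whitespace, so it is safe to apply in either case.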
Documentation
Model Overview
- Name: DiagramAgent/Diagram_to_Code_Agent
- Description: This agent is tasked with converting a given diagram (visual representation) into its corresponding structured code.
Intended Use
- Primary Tasks:
  - Convert existing diagrams into structured code representations.
  - Support diagram editing workflows by providing a reliable code basis for modifications.
  - Capture and preserve implicit logical structures and visual details of diagrams.
- Application Scenarios:
  - Automated diagram editing: Transforming a diagram into code to enable subsequent modifications.
  - Reverse engineering of visual diagrams for analysis and reusability.
  - Enhancing data visualization tools by integrating code-based diagram representations.
Architecture and Training Details
- Base Model: Uses the Qwen2-VL-7B model, a vision-language fusion model.
- Training Process:
  - Trained on diverse diagram samples from the DiagramGenBenchmark dataset.
  - Aims to generate code that is highly consistent with a reference code, ensuring that all diagram elements are accurately captured.
  - Uses a specialized loss function to reduce the edit distance between the generated and reference code.
- Module Interaction: Works closely with the Check Agent, which validates the generated code and provides feedback for further refinement.
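The training bullets refer to the edit distance between generated and reference code. The source does not specify the loss function, so the sketch below shows only the metric itself (plain character-level Levenshtein distance), not the training objective. The helper name edit_distance and the sample strings are hypothetical.

```python
def edit_distance(generated: str, reference: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # `generated` and the first j characters of `reference`.
    previous = list(range(len(reference) + 1))
    for i, g_char in enumerate(generated, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if g_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # delete g_char
                current[j - 1] + 1,      # insert r_char
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


reference = "A -> B;"
generated = "A -> C;"
print(edit_distance(generated, reference))
```

A lower value means the generated code is closer to the reference; identical strings give zero.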
Technical Details
The model uses Qwen2-VL-7B as its base. It is trained on the DiagramGenBenchmark dataset, aiming to generate code highly consistent with the reference code. A specialized loss function is used to reduce the edit distance between the generated and reference code. It also interacts with the Check Agent for code validation and refinement.
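The interaction with the Check Agent can be pictured as a generate, validate, refine loop. The sketch below is an assumption about the control flow only. The generate and check callables, the CheckResult type, and the max_rounds limit are all hypothetical and are not an API of this model or of the Check Agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    feedback: str = ""


def refine_with_check(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 3,
) -> str:
    """Regenerate code until the checker accepts it or the rounds run out."""
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_rounds):
        code = generate(feedback)   # feedback is None on the first round
        result = check(code)
        if result.ok:
            break
        feedback = result.feedback  # hand the checker's notes to the next round
    return code


# Toy stand-ins so the sketch runs without the real agents.
def toy_generate(feedback: Optional[str]) -> str:
    return "graph { A -- B }" if feedback else "graph { A -- B"


def toy_check(code: str) -> CheckResult:
    if code.count("{") == code.count("}"):
        return CheckResult(ok=True)
    return CheckResult(ok=False, feedback="unbalanced braces")


final_code = refine_with_check(toy_generate, toy_check)
print(final_code)
```

Capping the number of rounds keeps the loop from running indefinitely when the checker never accepts the output; the last attempt is returned in that case.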
License
The model is licensed under the Apache-2.0 license.
Paper Link
paper link
Citation
If you find our work helpful, please cite it:
@inproceedings{wei2024wordsstructuredvisualsbenchmark,
title={From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing},
author={Jingxuan Wei and Cheng Tan and Qi Chen and Gaowei Wu and Siyuan Li and Zhangyang Gao and Linzhuang Sun and Bihui Yu and Ruifeng Guo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
Information Table
| Property | Details |
|----------|---------|
| Model Type | DiagramAgent/Diagram_to_Code_Agent |
| Training Data | DiagramGenBenchmark |
| Pipeline Tag | visual-question-answering |
| Base Model | Qwen/Qwen2-VL-7B-Instruct |
| License | Apache-2.0 |