Diagram_to_Code_Agent Open-source Model - Effortlessly Convert Diagrams to Structured Code

Diagram To Code Agent

Developed by DiagramAgent

This model is a vision-language fusion model specifically designed to convert diagrams into structured code.

Image-to-Text

Safetensors

Language: English
Open Source License: Apache-2.0
Tags: #Diagram to Code #Vision-Language Fusion #Reverse Engineering

Release Time: 3/3/2025

Model Overview

The agent's task is to transform given diagrams (visual representations) into corresponding structured code, supporting diagram editing workflows and reverse engineering.

Model Features

Vision-Language Fusion

Combines visual and linguistic information to accurately understand diagram content and generate corresponding structured code.

High-Precision Code Generation

Trained with a specialized loss function that reduces the edit distance between generated code and reference code.

Modular Collaboration

Works closely with the Check Agent to verify generated code and provide optimization feedback.

Model Capabilities

Diagram to Code

Visual Question Answering

Structured Code Generation

Use Cases

Automated Diagram Editing

Diagram Reverse Engineering: Converts existing diagrams into code for subsequent modification and analysis. The generated code is highly consistent with the reference code, so that all diagram elements are accurately captured.

Data Visualization Tool Enhancement

Integrated Code Representation: Enhances data visualization tools through code-based diagram representations, providing a reliable code foundation for diagram editing workflows.

🚀 DiagramAgent/Diagram_to_Code_Agent

This agent is designed to convert a given diagram (visual representation) into its corresponding structured code.

🚀 Quick Start

This agent converts diagrams into structured code. It can be applied in scenarios such as automated diagram editing, reverse engineering of visual diagrams, and enhancing data visualization tools.

✨ Features

Convert existing diagrams into structured code representations.
Support diagram editing workflows by providing a reliable code basis for modifications.
Capture and preserve implicit logical structures and visual details of diagrams.

📦 Installation

The model card lists no installation steps. The usage example below depends on the transformers and qwen_vl_utils Python packages, which must be installed beforehand.
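A quick way to confirm the environment is ready before loading the model is to check that the needed modules can be found. This is a minimal sketch: the helper name missing_packages is hypothetical, and the module list is an assumption read off the imports in the usage example (plus PyTorch, which the model needs at load time), not an official requirements list.

```python
from importlib.util import find_spec

# Import names assumed from the usage example; not an official requirements list.
REQUIRED_MODULES = ["torch", "transformers", "qwen_vl_utils"]


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Note that the qwen_vl_utils module is distributed on PyPI under the name qwen-vl-utils.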

💻 Usage Examples

Basic Usage

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# default: Load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "DiagramAgent/Diagram_to_Code_Agent", torch_dtype="auto", device_map="auto"
)

# default processor
processor = AutoProcessor.from_pretrained("DiagramAgent/Diagram_to_Code_Agent")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "image path",  # path or URL of the diagram image
            },
            {"type": "text", "text": "your input"},  # text instruction for the model
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
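# Move the inputs to the device the model was loaded on (device_map="auto" may not pick "cuda")
inputs = inputs.to(model.device)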

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
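Here output_text is a list with one decoded string per input. Vision-language models sometimes wrap generated code in Markdown-style fences; whether this model does so is an assumption, not something the model card states. A minimal sketch of a hypothetical helper, extract_code, that returns the body of the first fenced block if one is present and the stripped text otherwise:

```python
import re

# Build the three-backtick fence marker programmatically.
FENCE = "`" * 3

# One fenced block with an optional language tag after the opening fence.
_FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(decoded: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none."""
    match = _FENCED_BLOCK.search(decoded)
    if match:
        return match.group(1).strip()
    return decoded.strip()
```

With the usage example above, extract_code(output_text[0]) would yield the code for the single input image.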

📚 Documentation

Model Overview

Name: DiagramAgent/Diagram_to_Code_Agent
Description: This agent is tasked with converting a given diagram (visual representation) into its corresponding structured code.

Intended Use

Primary Tasks:
- Convert existing diagrams into structured code representations.
- Support diagram editing workflows by providing a reliable code basis for modifications.
- Capture and preserve implicit logical structures and visual details of diagrams.
Application Scenarios:
- Automated diagram editing: Transforming a diagram into code to enable subsequent modifications.
- Reverse engineering of visual diagrams for analysis and reusability.
- Enhancing data visualization tools by integrating code-based diagram representations.

Architecture and Training Details

Base Model: Uses the Qwen2-VL-7B model, a vision-language fusion model.
Training Process:
- Trained on diverse diagram samples from the DiagramGenBenchmark dataset.
- Aims to generate code that is highly consistent with a reference code, ensuring that all diagram elements are accurately captured.
- Uses a specialized loss function to reduce the edit distance between the generated and reference code.
Module Interaction: Works closely with the Check Agent, which validates the generated code and provides feedback for further refinement.
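The model card does not specify the loss function or which edit-distance variant is meant. To illustrate the quantity being reduced, the sketch below computes the classic character-level Levenshtein distance, which can be used to compare generated code against a reference after the fact. It is an assumption chosen for illustration and not the training loss itself; both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(generated: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(generated), len(reference))
    return levenshtein(generated, reference) / longest if longest else 0.0
```

A lower normalized value means the generated code is textually closer to the reference; it says nothing about whether two differently written programs render the same diagram.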

🔧 Technical Details

The model uses Qwen2-VL-7B as its base. It is trained on the DiagramGenBenchmark dataset, aiming to generate code highly consistent with the reference code. A specialized loss function is used to reduce the edit distance between the generated and reference code. It also interacts with the Check Agent for code validation and refinement.
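How the two agents exchange feedback is not detailed in this model card. One way to picture the interaction is a bounded generate, verify, retry loop. In the sketch below, generate and check are hypothetical callables standing in for the Diagram-to-Code Agent and the Check Agent, and the feedback format (a plain string) is an assumption.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, not the DiagramAgent API:
#   generate(feedback) -> code string; feedback is None on the first attempt
#   check(code)        -> (passed, feedback message)
Generate = Callable[[Optional[str]], str]
Check = Callable[[str], Tuple[bool, str]]


def refine(generate: Generate, check: Check, max_rounds: int = 3) -> Tuple[str, bool]:
    """Regenerate code until the checker accepts it or the round budget is spent."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_rounds):
        code = generate(feedback)
        passed, feedback = check(code)
        if passed:
            return code, True
    # Budget exhausted: return the last attempt and flag it as unverified.
    return code, False
```

Capping the number of rounds keeps the loop from running indefinitely when the checker never accepts the output; the returned flag lets the caller decide what to do with unverified code.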

📄 License

The model is licensed under the Apache-2.0 license.

📑 Paper Link

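From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing (full reference in the Citation section).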

📚 Citation

If you find our work helpful, please cite it:

@inproceedings{wei2024wordsstructuredvisualsbenchmark,
  title={From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing},
  author={Jingxuan Wei and Cheng Tan and Qi Chen and Gaowei Wu and Siyuan Li and Zhangyang Gao and Linzhuang Sun and Bihui Yu and Ruifeng Guo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

📊 Information Table

Property	Details
Model Type	DiagramAgent/Diagram_to_Code_Agent
Training Data	DiagramGenBenchmark
Pipeline Tag	visual-question-answering
Base Model	Qwen/Qwen2-VL-7B-Instruct
License	Apache-2.0
