DriveLMM-o1: An Open-Source Autonomous Driving Model - Step-by-Step Reasoning over Multi-View Images


Developed by ayeshaishaq

DriveLMM-o1 is a large multimodal model fine-tuned for autonomous driving. It is built on the InternVL2.5-8B architecture, adapted with LoRA, and performs step-by-step reasoning over multi-view camera images stitched into a single input.

Multimodal Fusion · Transformers · English · License: Apache-2.0

#Autonomous Driving Reasoning #Multi-view Image Fusion #Chain-of-Thought Decision Explanation


Release date: 3/11/2025

Model Overview

DriveLMM-o1 is a large multimodal model designed for autonomous driving reasoning. It combines multi-view camera images into a panoramic view of the scene and generates detailed intermediate reasoning steps that explain its driving decisions.

Model Features

Multimodal Fusion

Integrates multi-view images for panoramic scene understanding
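DriveLMM-o1 takes several camera views stitched into one image. A minimal NumPy sketch of that stitching step follows. It assumes six same-sized surround-view images placed in a 2×3 grid, with the front cameras on top and the rear cameras below. The camera names follow the nuScenes naming convention and, like the layout and the resolution, are illustrative assumptions. The arrangement DriveLMM-o1 actually uses may differ, so check the project's code repository.

```python
from __future__ import annotations

import numpy as np

# Assumed 2x3 layout (nuScenes-style camera names): front cameras on the top row,
# rear cameras on the bottom row. DriveLMM-o1's actual arrangement may differ.
LAYOUT = [
    ["CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT"],
    ["CAM_BACK_LEFT", "CAM_BACK", "CAM_BACK_RIGHT"],
]


def stitch_views(views: dict[str, np.ndarray], layout: list[list[str]] = LAYOUT) -> np.ndarray:
    """Tile same-sized H x W x C camera images into one (rows*H) x (cols*W) x C image."""
    shapes = {img.shape for img in views.values()}
    if len(shapes) != 1:
        raise ValueError(f"all views must share one shape, got {shapes}")
    # Concatenate each row left-to-right, then stack the rows top-to-bottom.
    rows = [np.concatenate([views[name] for name in row], axis=1) for row in layout]
    return np.concatenate(rows, axis=0)


if __name__ == "__main__":
    h, w = 900, 1600  # assumed per-camera resolution, for illustration only
    views = {name: np.zeros((h, w, 3), dtype=np.uint8) for row in LAYOUT for name in row}
    print(stitch_views(views).shape)  # (1800, 4800, 3)
```

The result is a single wide image, so the downstream model sees all viewpoints in one input instead of six separate ones.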

Chain-of-Thought Reasoning

Generates detailed intermediate reasoning steps to explain decision-making processes

Efficient Adaptation

Uses dynamic image patching to process high-resolution inputs, and LoRA fine-tuning to adapt the base model with minimal additional parameters
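Dynamic image patching comes from the InternVL2.5 base model. A high-resolution image is resized to a grid of 448×448 tiles whose aspect ratio best matches the original, with the number of tiles capped at a maximum. When more than one tile is used, a 448×448 thumbnail of the whole image is also appended. The pure-Python sketch below covers the grid-selection and cropping step and is modeled on the preprocessing example in the InternVL2.5 model card. The 12-tile cap is that example's default and is assumed here; the card does not confirm it for DriveLMM-o1. The function names are hypothetical.

```python
from __future__ import annotations


def candidate_grids(min_tiles: int = 1, max_tiles: int = 12) -> list[tuple[int, int]]:
    """All (cols, rows) grids with min_tiles <= cols * rows <= max_tiles, fewest tiles first."""
    grids = {
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if min_tiles <= cols * rows <= max_tiles
    }
    return sorted(grids, key=lambda g: (g[0] * g[1], g[0]))


def choose_grid(width: int, height: int, tile: int = 448,
                min_tiles: int = 1, max_tiles: int = 12) -> tuple[int, int]:
    """Pick the grid whose aspect ratio (cols / rows) is closest to the image's.

    On an exact tie, switch to the larger grid only if the image has more pixels
    than half of that grid's area, so small images are not blown up needlessly.
    """
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(min_tiles, max_tiles):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * tile * tile * cols * rows:
            best = (cols, rows)
    return best


def tile_boxes(width: int, height: int, tile: int = 448, max_tiles: int = 12):
    """Return the chosen grid and the crop boxes (left, top, right, bottom), row by row,
    for an image first resized to (cols * tile) x (rows * tile)."""
    cols, rows = choose_grid(width, height, tile, max_tiles=max_tiles)
    boxes = [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]
    return (cols, rows), boxes


if __name__ == "__main__":
    grid, boxes = tile_boxes(4800, 1800)  # e.g. a stitched 2x3 grid of 1600x900 views
    print(grid, len(boxes))  # (5, 2) 10
```

For a stitched 4800×1800 image (aspect ratio about 2.67), the closest grid within the cap is 5×2 (aspect ratio 2.5). The image is therefore resized to 2240×896 and cut into 10 tiles. The boxes use the (left, top, right, bottom) convention of Pillow's `Image.crop`.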

Performance Gains

Improves final answer accuracy (62.36 vs. 54.87) and overall reasoning score (75.24 vs. 71.62) over its InternVL2.5-8B base model, and outperforms the other open-source models compared, as well as the closed GPT-4o, on both metrics

Model Capabilities

Multi-view Image Processing

Autonomous Driving Decision Reasoning

Scene Perception and Object Understanding

Risk Assessment

Traffic Rule Compliance Analysis

Use Cases

Autonomous Driving

Risk Assessment

Analyzes potential risks in the driving environment through multi-view images

Risk assessment accuracy reaches 73.01%

Traffic Rule Compliance

Evaluates whether driving behavior complies with traffic rules

Traffic rule adherence score reaches 81.56

Scene Perception and Object Understanding

Identifies and understands various objects and scenes in the driving environment

Scene perception and object understanding accuracy reaches 75.39%

🚀 DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning

DriveLMM-o1 is a fine-tuned large multimodal model tailored for autonomous driving, offering enhanced decision-making and interpretability in complex driving scenarios.

📦 Model Information

Property	Details
Base Model	OpenGVLab/InternVL2_5-8B
Language	en
License	apache-2.0
Pipeline Tag	image-text-to-text
Library Name	transformers
Datasets	ayeshaishaq/DriveLMMo1

📚 Paper

Paper

✨ Features

Multimodal Integration: Combines multi-view images for comprehensive scene understanding.
Step-by-Step Reasoning: Produces detailed intermediate reasoning steps to explain decisions.
Efficient Adaptation: Uses dynamic image patching and LoRA fine-tuning to handle high-resolution inputs with minimal extra parameters.
Performance Gains: Achieves significant improvements in both final answer accuracy and overall reasoning scores compared to previous open-source models.
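The Efficient Adaptation point relies on LoRA. LoRA freezes a pretrained weight matrix W and trains only a low-rank update, so an adapted layer computes W x + (alpha / r) · B A x. The NumPy sketch below illustrates this general technique on a single linear layer. The rank, alpha, and 4096 hidden size are illustrative assumptions, not DriveLMM-o1's fine-tuning settings, which the card does not state. The class and function names are hypothetical.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight            # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank       # LoRA scaling factor alpha / r
        # A gets small random values and B starts at zero, so B @ A = 0 and the
        # adapted layer initially behaves exactly like the frozen one.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the update is applied as two thin matmuls,
        # never materialising a full d_out x d_in delta matrix.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        """Only A and B are trained; the base weight stays frozen."""
        return self.A.size + self.B.size


def lora_param_ratio(d_out: int, d_in: int, rank: int) -> float:
    """Extra parameters added by LoRA relative to the full matrix: r(d_in + d_out) / (d_in * d_out)."""
    return rank * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size, for illustration only
    print(f"{lora_param_ratio(d, d, 8):.2%}")  # 0.39%
```

With rank 8 on a 4096×4096 layer, the update adds r(d_in + d_out) = 65,536 parameters. That is under 0.4% of the layer's roughly 16.8 million weights, which is why LoRA adds so few extra parameters. Because B starts at zero, fine-tuning begins from the unchanged base model.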

📊 Performance Comparison

Model	Risk Assessment Accuracy	Traffic Rule Adherence	Scene Awareness & Object Understanding	Relevance	Missing Details	Overall Reasoning Score	Final Answer Accuracy
GPT-4o (Closed)	71.32	80.72	72.96	76.65	71.43	72.52	57.84
Qwen-2.5-VL-7B	46.44	60.45	51.02	50.15	52.19	51.77	37.81
Ovis1.5-Gemma2-9B	51.34	66.36	54.74	55.72	55.74	55.62	48.85
Mulberry-7B	51.89	63.66	56.68	57.27	57.45	57.65	52.86
LLaVA-CoT	57.62	69.01	60.84	62.72	60.67	61.41	49.27
LlamaV-o1	60.20	73.52	62.67	64.66	63.41	63.13	50.02
InternVL2.5-8B	69.02	78.43	71.52	75.80	70.54	71.62	54.87
DriveLMM-o1 (Ours)	73.01	81.56	75.39	79.42	74.49	75.24	62.36
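To make the table's gains concrete, the short script below computes point differences for the two headline metrics. It compares DriveLMM-o1 against its InternVL2.5-8B base model and against GPT-4o. The scores are copied from the table; the helper name is hypothetical. For example, DriveLMM-o1 is 7.49 points ahead of InternVL2.5-8B in final answer accuracy (62.36 - 54.87).

```python
# Scores copied from the performance comparison table (higher is better).
SCORES = {
    "GPT-4o (Closed)": {"overall_reasoning": 72.52, "final_answer_accuracy": 57.84},
    "InternVL2.5-8B": {"overall_reasoning": 71.62, "final_answer_accuracy": 54.87},
    "DriveLMM-o1": {"overall_reasoning": 75.24, "final_answer_accuracy": 62.36},
}


def gain(model: str, baseline: str, metric: str) -> float:
    """Absolute improvement in points, rounded to two decimals like the table."""
    return round(SCORES[model][metric] - SCORES[baseline][metric], 2)


if __name__ == "__main__":
    for baseline in ("InternVL2.5-8B", "GPT-4o (Closed)"):
        for metric in ("overall_reasoning", "final_answer_accuracy"):
            print(f"{metric} vs {baseline}: +{gain('DriveLMM-o1', baseline, metric)} points")
```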

💻 Usage Examples

Basic Usage

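from transformers import AutoModel, AutoTokenizer
import torch

# Hugging Face repository ID of the fine-tuned DriveLMM-o1 weights
path = 'ayeshaishaq/DriveLMMo1'

# trust_remote_code=True is required because the model class comes from the
# InternVL code shipped with the checkpoint; bfloat16 and .cuda() assume a CUDA GPU.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,  # request FlashAttention (needs the flash-attn package)
    trust_remote_code=True
).eval().cuda()  # inference mode, moved to the GPU

# Slow (non-fast) tokenizer, following the InternVL2.5 loading instructions
tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)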

For detailed usage instructions and additional configurations, please refer to the OpenGVLab/InternVL2_5-8B repository.
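The loading snippet stops before inference. The InternVL2.5-8B example it points to passes the model a `pixel_values` tensor of image tiles, each 448×448 and normalized with the ImageNet mean and standard deviation. The NumPy sketch below shows that normalization and stacking step. It assumes the tiles have already been cropped and resized to 448×448. The function name is hypothetical. The commented hand-off to `model.chat` follows the InternVL2.5 example rather than anything documented specifically for DriveLMM-o1, and the question text is illustrative.

```python
from __future__ import annotations

import numpy as np

# ImageNet statistics used by InternVL's published preprocessing example.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def tiles_to_pixel_values(tiles: list[np.ndarray], tile: int = 448) -> np.ndarray:
    """Stack uint8 H x W x 3 tiles into a normalized (N, 3, tile, tile) float32 array."""
    batch = []
    for t in tiles:
        if t.shape != (tile, tile, 3):
            raise ValueError(f"expected a {tile}x{tile}x3 tile, got {t.shape}")
        x = t.astype(np.float32) / 255.0          # scale pixel values to [0, 1]
        x = (x - IMAGENET_MEAN) / IMAGENET_STD    # per-channel normalization
        batch.append(x.transpose(2, 0, 1))        # HWC -> CHW
    return np.stack(batch)

# Hypothetical hand-off to the model loaded above (needs torch, a GPU and the
# InternVL remote code; not executed here):
#   pixel_values = torch.from_numpy(tiles_to_pixel_values(tiles)).to(torch.bfloat16).cuda()
#   question = "<image>\nWhat should the ego vehicle do next? Explain step by step."
#   response = model.chat(tokenizer, pixel_values, question, dict(max_new_tokens=1024, do_sample=False))
```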

🔗 Code

https://github.com/ayesha-ishaq/DriveLMM-o1

⚠️ Important Note

While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.
