MMAlaya2 Open-Source Multimodal Model - Excellent Performance Based on Fine-Tuning, Suitable for Diverse Scenarios

Developed by DataCanvas

A multimodal model fine-tuned based on InternVL-Chat-V1-5, excelling in MMBench benchmark tests

Image-to-Text

Safetensors

Open Source License: Apache-2.0 #LoRA Fine-tuning Enhancement #Multimodal Q&A #Chinese Image Understanding

Release Time: 8/19/2024

Model Overview

MMAlaya2 improves on the multimodal understanding and generation capabilities of InternVL-Chat-V1-5 by fine-tuning 20 LoRA modules and merging them with the TIES method. On the MMBench Test (CN) it scores 82.1, the same score as GPT-4o (0513, detail-high).

Model Features

Multi-LoRA Module Fusion

Fine-tunes 20 LoRA modules and merges them with the TIES method, raising scores over the base model by 0.2 to 1.4 points across five MMBench and CCBench benchmarks

Chinese Multimodal Advantage

Scores 82.1 on the MMBench Test (CN), equal to the score of GPT-4o (0513, detail-high)

Domain-Specific Optimization

Conducts error analysis and data supplementation for specific categories like natural relationships and image emotions

Model Capabilities

Image Understanding

Multimodal Q&A

Scene Recognition

Sentiment Analysis

Style Recognition

Use Cases

Visual Q&A

Image Scene Understanding

Recognizes scenes and contexts in images

Excellent performance in MMBench image scene category

Sentiment Analysis

Analyzes emotions conveyed in images

Improved accuracy in image sentiment category

Multimodal Reasoning

Natural Relationship Understanding

Understands natural relationships between objects in images

Reduced error rate in natural relationship category

🚀 MMAlaya2

MMAlaya2 fine-tunes 20 LoRA modules on the InternVL-Chat-V1-5 model, and then merges these fine-tuned modules with the base model using the TIES method, achieving excellent performance in multiple benchmarks.

🚀 Quick Start

The model is built in two steps: 20 LoRA modules are fine-tuned on top of InternVL-Chat-V1-5, and the fine-tuned modules are then merged into the base model with TIES, one of the model merging methods provided by PEFT.

You can find the inference code here.

✨ Features

Dataset Preparation

The MMBench dataset mmbench_dev_cn_20231003.tsv covers 20 categories. For each category, we first build a training dataset by applying CoT (Chain of Thought) consistency with the InternVL-Chat-V1-5 model. For five categories (nature_relation, image_emotion, image_scene, action_recognition, and image_style), we also analyze the questions that InternVL-Chat-V1-5 answers incorrectly, then collect additional images and QA text from online sources to cover those failure cases.
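The source does not describe how CoT consistency is implemented, so the following is only a minimal sketch of one common reading: sample several chain-of-thought answers per question, take the majority answer, and keep the example only if enough samples agree. The function name `self_consistent_answer`, the `sample_answer` callback, and the `n_samples` and `min_agreement` parameters are all hypothetical and do not come from the MMAlaya2 code.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def self_consistent_answer(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Optional[Tuple[str, float]]:
    """Majority-vote over several sampled answers (hypothetical helper).

    `sample_answer` stands in for one chain-of-thought generation by the
    model, reduced to its final answer (for example an option letter).
    Returns (answer, agreement) or None when agreement is too low.
    """
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    agreement = count / n_samples
    if agreement < min_agreement:
        return None  # too inconsistent to use as a training label
    return answer, agreement


# Stub sampler with canned answers, in place of a real model call.
canned = iter(["B", "B", "A", "B", "B"])
result = self_consistent_answer(lambda q: next(canned), "Which season is shown?")
print(result)
```

In this sketch, questions whose samples disagree are dropped rather than given a noisy label; whether the authors filter this way is an assumption.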

Model Merging

Once the 20 LoRA modules are fine-tuned, they are merged into the InternVL-Chat-V1-5 model using the TIES method.
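To make the merging step concrete, here is a small pure-Python sketch of the three TIES stages (trim, elect sign, disjoint merge) applied to flat lists of weight deltas. It is illustrative only: the real model is merged through PEFT on LoRA weights, and the `density` and `scale` values as well as the function names `trim` and `ties_merge` are assumptions, not settings taken from MMAlaya2.

```python
from typing import List


def trim(task_vector: List[float], density: float) -> List[float]:
    """Keep the largest-magnitude `density` fraction of entries, zero the rest."""
    k = max(1, int(len(task_vector) * density))
    order = sorted(range(len(task_vector)),
                   key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def ties_merge(base: List[float], task_vectors: List[List[float]],
               density: float = 0.5, scale: float = 1.0) -> List[float]:
    """Merge task vectors (fine-tuned minus base weights) into `base`."""
    trimmed = [trim(tv, density) for tv in task_vectors]
    merged = []
    for j, weight in enumerate(base):
        column = [tv[j] for tv in trimmed]
        # Elect sign: the direction with the larger total magnitude wins,
        # which is the sign of the plain sum.
        positive = sum(column) >= 0
        # Disjoint merge: average only the nonzero entries that agree.
        agreeing = [v for v in column if v != 0.0 and (v > 0) == positive]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(weight + scale * delta)
    return merged


# Toy example with two task vectors over four parameters.
base = [1.0, 1.0, 1.0, 1.0]
module_a = [0.4, -0.1, 0.3, 0.0]
module_b = [0.2, 0.5, -0.6, 0.05]
merged = ties_merge(base, [module_a, module_b], density=0.5)
print(merged)
```

In the third parameter the two modules disagree in sign, so only the larger-magnitude direction contributes. Reducing this kind of interference between modules is the motivation for TIES over plain averaging.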

Benchmark Performance

A huge thank you to the OpenCompass MMBench team for updating the leaderboard on August 27, 2024. We have collected the ranks and scores from the leaderboard for reference. For example, a ranking of "7/82.1" indicates a 7th place finish with a score of 82.1 in that category. We chose GPT-4o (0513, detail-high) because it is the best-performing GPT-4o model in the MMBench Test (CN).

Model	MMBench Test (CN)	MMBench v1.1 Test (CN)	CCBench dev	MMBench Test	MMBench v1.1 Test
GPT-4o (0513, detail-high)	4/82.1	5/81.5	7/71.2	4/83.4	5/83
MMAlaya2	7/82.1	8/79.7	8/70	9/82.5	9/80.6
InternVL-Chat-V1.5	14/80.7	15/79.1	9/69.8	11/82.3	10/80.3

On the MMBench Test (CN), MMAlaya2 reaches an average score of 82.1, which is 1.4 points above the 80.7 scored by InternVL-Chat-V1-5. Although MMAlaya2 ranks 7th, its score equals that of GPT-4o (0513, detail-high), which ranks 4th. On the other four benchmarks, namely MMBench v1.1 Test (CN), CCBench dev, MMBench Test, and MMBench v1.1 Test, scores improve over InternVL-Chat-V1-5 by 0.2 to 0.6 points, narrowing the gap to GPT-4o.

We found this result noteworthy and are therefore sharing the model publicly.
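The comparison against the base model can be reproduced from the table. The sketch below parses the "rank/score" cells and computes the per-benchmark score difference between MMAlaya2 and InternVL-Chat-V1.5. The cell values are copied from the table above; the helper name `parse_cell` is illustrative.

```python
from typing import Dict, Tuple

BENCHMARKS = [
    "MMBench Test (CN)",
    "MMBench v1.1 Test (CN)",
    "CCBench dev",
    "MMBench Test",
    "MMBench v1.1 Test",
]

# Cells copied from the leaderboard table, in "rank/score" form.
MMALAYA2 = ["7/82.1", "8/79.7", "8/70", "9/82.5", "9/80.6"]
INTERNVL = ["14/80.7", "15/79.1", "9/69.8", "11/82.3", "10/80.3"]


def parse_cell(cell: str) -> Tuple[int, float]:
    """Split a 'rank/score' cell into (rank, score)."""
    rank, score = cell.split("/")
    return int(rank), float(score)


# Score gain of MMAlaya2 over the base model, rounded to one decimal.
deltas: Dict[str, float] = {}
for name, ours, theirs in zip(BENCHMARKS, MMALAYA2, INTERNVL):
    deltas[name] = round(parse_cell(ours)[1] - parse_cell(theirs)[1], 1)

for name, gain in deltas.items():
    print(f"{name}: +{gain}")
```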

📄 License

This project is released under the MIT license, matching the license of the InternVL-Chat-V1-5 model. Note that InternLM2 is licensed separately under the Apache-2.0 license.

📚 Documentation

Citation

If you find this project useful in your research, please consider citing:

@misc{datacanvas2024mmalaya2,
    author = {DataCanvas Ltd.},
    title = {MMAlaya2},
    year = {2024},
    howpublished = {\url{https://huggingface.co/DataCanvas/MMAlaya2}},
}
