# Chitrarth: Bridging Vision and Language for a Billion People
Chitrarth integrates a state-of-the-art multilingual LLM with a vision module, enabling image-text processing across 10 Indian languages and English and aiming to serve a vast population.
## Quick Start
To quickly get started with Chitrarth, you can access the model via the web interface at Chitrarth Online. For local inference, follow the steps in the "Usage Examples" section below.
## Features
- Multilingual Support: Supports 10 Indic languages (Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese) along with English.
- Model Architecture: Utilizes Krutrim-1 as the base LLM and SigLIP as the visual encoder with a 2-layer MLP.
- General-Purpose Usage: A general-purpose VLM suitable for various image-text-to-text tasks.
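To illustrate the architecture described above, here is a minimal sketch of a LLaVA-style 2-layer MLP projector that maps SigLIP image features into the LLM embedding space. All dimensions and the choice of ReLU are illustrative assumptions, not the released Chitrarth configuration.

```python
# Illustrative sketch: a 2-layer MLP projecting visual features into the
# LLM embedding space. Dimensions below are assumptions for demonstration.
import numpy as np

def mlp_projector(image_features, w1, b1, w2, b2):
    """Project (n_patches, vision_dim) features to (n_patches, llm_dim)."""
    hidden = np.maximum(image_features @ w1 + b1, 0.0)  # ReLU for simplicity
    return hidden @ w2 + b2

# Hypothetical dimensions for the sketch
vision_dim, hidden_dim, llm_dim, n_patches = 1152, 4096, 4096, 729
rng = np.random.default_rng(0)
feats = rng.standard_normal((n_patches, vision_dim))
w1 = rng.standard_normal((vision_dim, hidden_dim)) * 0.01
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, llm_dim)) * 0.01
b2 = np.zeros(llm_dim)

projected = mlp_projector(feats, w1, b1, w2, b2)
print(projected.shape)  # (729, 4096)
```

The projected patch embeddings are then concatenated with the text token embeddings before being fed to the LLM, which is the standard connector design in LLaVA-style VLMs.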
## Installation
```shell
git clone https://github.com/ola-krutrim/Chitrarth.git
conda create --name chitrarth python=3.10
conda activate chitrarth
cd Chitrarth
pip install -e .
```
## Usage Examples

### Basic Usage
```shell
python chitrarth/inference.py --model-path "krutrim-ai-labs/chitrarth" --image-file "assets/govt_school.jpeg" --query "Explain the image."
```
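For batching several queries from Python, the documented CLI can be wrapped in a small helper. The script path and flags below come from the command above; the helper functions themselves (`build_command`, `run_chitrarth`) are illustrative, not part of the repository.

```python
# Illustrative wrapper around the documented inference CLI.
import subprocess

def build_command(image_file, query, model_path="krutrim-ai-labs/chitrarth"):
    """Assemble the CLI invocation shown in the README."""
    return [
        "python", "chitrarth/inference.py",
        "--model-path", model_path,
        "--image-file", image_file,
        "--query", query,
    ]

def run_chitrarth(image_file, query, **kwargs):
    """Run inference and return the script's stdout."""
    result = subprocess.run(
        build_command(image_file, query, **kwargs),
        capture_output=True, text=True,
    )
    return result.stdout
```

Example: `run_chitrarth("assets/govt_school.jpeg", "Explain the image.")` mirrors the shell command above.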
## Documentation

### Model Summary
| Property | Details |
|---|---|
| Model Type | Krutrim-1 as the base LLM, SigLIP as the visual encoder with a 2-layer MLP |
| Languages Supported | 10 Indic languages (Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese) and English |
| Usage | General-purpose VLM |
### Evaluation Results

Chitrarth performs strongly against state-of-the-art VLMs on a range of academic multimodal tasks. It consistently outperforms IDEFICS 2 (7B) and PALO 7B across benchmarks while remaining competitive on TextVQA and VizWiz.
We introduce BharatBench, a comprehensive evaluation benchmark suite covering 10 under-resourced Indic languages across 3 tasks. Chitrarth's performance on the BharatBench evaluation framework sets a strong baseline for future research in this domain.
| Language | POPE | LLaVA-Bench | MMVet |
|---|---|---|---|
| Telugu | 79.9 | 54.8 | 43.76 |
| Hindi | 78.68 | 51.5 | 38.85 |
| Bengali | 83.24 | 53.7 | 33.24 |
| Malayalam | 85.29 | 55.5 | 25.36 |
| Kannada | 85.52 | 58.1 | 46.19 |
| Assamese | 55.59 | 59.1 | 37.29 |
| Tamil | 83.28 | 58.3 | 34.31 |
| Marathi | 79.17 | 52.8 | 40.96 |
| Gujarati | 84.75 | 55.9 | 39.03 |
| Odia | 82.03 | 62.8 | 19.67 |
| English | 87.63 | 67.9 | 30.49 |
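As a quick summary of the BharatBench numbers above, the per-task averages across all 11 languages can be computed directly from the table (the scores below are copied verbatim; the aggregation itself is ours, not an officially reported metric):

```python
# Aggregate the BharatBench scores from the table into per-task averages.
scores = {
    "Telugu":    (79.9,  54.8, 43.76),
    "Hindi":     (78.68, 51.5, 38.85),
    "Bengali":   (83.24, 53.7, 33.24),
    "Malayalam": (85.29, 55.5, 25.36),
    "Kannada":   (85.52, 58.1, 46.19),
    "Assamese":  (55.59, 59.1, 37.29),
    "Tamil":     (83.28, 58.3, 34.31),
    "Marathi":   (79.17, 52.8, 40.96),
    "Gujarati":  (84.75, 55.9, 39.03),
    "Odia":      (82.03, 62.8, 19.67),
    "English":   (87.63, 67.9, 30.49),
}

tasks = ("POPE", "LLaVA-Bench", "MMVet")
averages = {
    task: round(sum(row[i] for row in scores.values()) / len(scores), 2)
    for i, task in enumerate(tasks)
}
print(averages)  # {'POPE': 80.46, 'LLaVA-Bench': 57.31, 'MMVet': 35.38}
```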
## License
This code repository and the model weights are licensed under the Krutrim Community License.
## Citation
```bibtex
@inproceedings{khan2024chitrarth,
  title={Chitrarth: Bridging Vision and Language for a Billion People},
  author={Shaharukh Khan and Ayush Tarun and Abhinav Ravi and Ali Faraz and Praveen Kumar Pokala and Anagha Bhangare and Raja Kolla and Chandra Khatri and Shubham Agarwal},
  booktitle={NeurIPS Multimodal Algorithmic Reasoning},
  year={2024}
}
```
## Contact
Contributions are welcome! If you have any improvements or suggestions, feel free to submit a pull request on GitHub.
## Acknowledgement
Chitrarth is built with reference to the code of the following projects: Transformers and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!