# Chitrarth: Bridging Vision and Language for a Billion People
Chitrarth integrates a state-of-the-art multilingual LLM with a vision module, enabling image-text processing across 10 Indian languages and English and aiming to serve a vast population.
## Quick Start
To quickly get started with Chitrarth, you can access the model via the web interface at Chitrarth Online. For local inference, follow the steps in the "Usage Examples" section below.
## Features
- Multilingual Support: Supports 10 Indic languages (Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese) along with English.
- Model Architecture: Utilizes Krutrim-1 as the base LLM and SigLIP as the visual encoder with a 2-layer MLP.
- General-Purpose Usage: A general-purpose VLM suitable for various image-text-to-text tasks.
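To illustrate the architecture described above, here is a minimal sketch of a LLaVA-style 2-layer MLP projector that maps SigLIP image features into the LLM embedding space. All dimensions and the choice of ReLU are illustrative assumptions, not the released Chitrarth configuration.

```python
# Illustrative sketch: a 2-layer MLP projecting visual features into the
# LLM embedding space. Dimensions below are assumptions for demonstration.
import numpy as np

def mlp_projector(image_features, w1, b1, w2, b2):
    """Project (n_patches, vision_dim) features to (n_patches, llm_dim)."""
    hidden = np.maximum(image_features @ w1 + b1, 0.0)  # ReLU for simplicity
    return hidden @ w2 + b2

# Hypothetical dimensions for the sketch
vision_dim, hidden_dim, llm_dim, n_patches = 1152, 4096, 4096, 729
rng = np.random.default_rng(0)
feats = rng.standard_normal((n_patches, vision_dim))
w1 = rng.standard_normal((vision_dim, hidden_dim)) * 0.01
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, llm_dim)) * 0.01
b2 = np.zeros(llm_dim)

projected = mlp_projector(feats, w1, b1, w2, b2)
print(projected.shape)  # (729, 4096)
```

The projected patch embeddings are then concatenated with the text token embeddings before being fed to the LLM, which is the standard connector design in LLaVA-style VLMs.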
## Installation
```shell
git clone https://github.com/ola-krutrim/Chitrarth.git
conda create --name chitrarth python=3.10
conda activate chitrarth
cd Chitrarth
pip install -e .
```
## Usage Examples

### Basic Usage
```shell
python chitrarth/inference.py --model-path "krutrim-ai-labs/chitrarth" --image-file "assets/govt_school.jpeg" --query "Explain the image."
```
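For batching several queries from Python, the documented CLI can be wrapped in a small helper. The script path and flags below come from the command above; the helper functions themselves (`build_command`, `run_chitrarth`) are illustrative, not part of the repository.

```python
# Illustrative wrapper around the documented inference CLI.
import subprocess

def build_command(image_file, query, model_path="krutrim-ai-labs/chitrarth"):
    """Assemble the CLI invocation shown in the README."""
    return [
        "python", "chitrarth/inference.py",
        "--model-path", model_path,
        "--image-file", image_file,
        "--query", query,
    ]

def run_chitrarth(image_file, query, **kwargs):
    """Run inference and return the script's stdout."""
    result = subprocess.run(
        build_command(image_file, query, **kwargs),
        capture_output=True, text=True,
    )
    return result.stdout
```

Example: `run_chitrarth("assets/govt_school.jpeg", "Explain the image.")` mirrors the shell command above.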
## Documentation

### Model Summary
| Property | Details |
|---|---|
| Model Type | Krutrim-1 as the base LLM, SigLIP as the visual encoder with a 2-layer MLP |
| Languages Supported | 10 Indic languages (Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese) and English |
| Usage | General-purpose VLM |
### Evaluation Results

Chitrarth performs strongly against state-of-the-art VLMs on a range of academic multimodal tasks. It consistently outperforms IDEFICS 2 (7B) and PALO 7B across benchmarks while remaining competitive on TextVQA and VizWiz.
We introduce BharatBench, a comprehensive evaluation benchmark suite covering 10 under-resourced Indic languages across 3 tasks. Chitrarth's performance on the BharatBench evaluation framework sets a strong baseline for future research in this domain.
| Language | POPE | LLaVA-Bench | MMVet |
|---|---|---|---|
| Telugu | 79.9 | 54.8 | 43.76 |
| Hindi | 78.68 | 51.5 | 38.85 |
| Bengali | 83.24 | 53.7 | 33.24 |
| Malayalam | 85.29 | 55.5 | 25.36 |
| Kannada | 85.52 | 58.1 | 46.19 |
| Assamese | 55.59 | 59.1 | 37.29 |
| Tamil | 83.28 | 58.3 | 34.31 |
| Marathi | 79.17 | 52.8 | 40.96 |
| Gujarati | 84.75 | 55.9 | 39.03 |
| Odia | 82.03 | 62.8 | 19.67 |
| English | 87.63 | 67.9 | 30.49 |
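As a quick summary of the BharatBench numbers above, the per-task averages across all 11 languages can be computed directly from the table (the scores below are copied verbatim; the aggregation itself is ours, not an officially reported metric):

```python
# Aggregate the BharatBench scores from the table into per-task averages.
scores = {
    "Telugu":    (79.9,  54.8, 43.76),
    "Hindi":     (78.68, 51.5, 38.85),
    "Bengali":   (83.24, 53.7, 33.24),
    "Malayalam": (85.29, 55.5, 25.36),
    "Kannada":   (85.52, 58.1, 46.19),
    "Assamese":  (55.59, 59.1, 37.29),
    "Tamil":     (83.28, 58.3, 34.31),
    "Marathi":   (79.17, 52.8, 40.96),
    "Gujarati":  (84.75, 55.9, 39.03),
    "Odia":      (82.03, 62.8, 19.67),
    "English":   (87.63, 67.9, 30.49),
}

tasks = ("POPE", "LLaVA-Bench", "MMVet")
averages = {
    task: round(sum(row[i] for row in scores.values()) / len(scores), 2)
    for i, task in enumerate(tasks)
}
print(averages)  # {'POPE': 80.46, 'LLaVA-Bench': 57.31, 'MMVet': 35.38}
```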
## License
This code repository and the model weights are licensed under the Krutrim Community License.
## Citation
```bibtex
@inproceedings{khan2024chitrarth,
  title={Chitrarth: Bridging Vision and Language for a Billion People},
  author={Shaharukh Khan and Ayush Tarun and Abhinav Ravi and Ali Faraz and Praveen Kumar Pokala and Anagha Bhangare and Raja Kolla and Chandra Khatri and Shubham Agarwal},
  booktitle={NeurIPS Multimodal Algorithmic Reasoning},
  year={2024}
}
```
## Contact
Contributions are welcome! If you have any improvements or suggestions, feel free to submit a pull request on GitHub.
## Acknowledgement
Chitrarth is built with reference to the code of the following projects: Transformers and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!