# Multimodal vision-language model
BioTrove CLIP
MIT
BioTrove-CLIP is a set of CLIP-style vision-language foundation models for biodiversity, trained on a dataset of 40 million images covering 33,000 plant and animal species.
Text-to-Image English
BGLab
48
2
Qwen For Jawi V1
A Jawi OCR model fine-tuned from Qwen2-VL-2B-Instruct, designed specifically for recognizing historical Malay texts.
Image-to-Text
Transformers

culturalheritagenus
155
1
Featured Recommended AI Models