# Multimodal Evaluation
TinyLLaVA-Video-R1
Apache-2.0
TinyLLaVA-Video-R1 is a small-scale video reasoning model built on TinyLLaVA-Video, a model with a traceable training process. Reinforcement learning improves its reasoning and thinking abilities, and the model exhibits emergent "aha moments".
Video-to-Text
Transformers

Zhang199
123
2
Llava Critic 7b Hf
A Transformers-compatible vision-language model with image understanding and text generation capabilities.
Image-to-Text
Transformers

FuryMartin
21
1
Uiclip Jitteredwebsites 2 224 Paraphrased
MIT
UIClip is a multimodal model that scores the design quality of user interface (UI) screenshots and their relevance to textual descriptions.
Text-to-Image
Transformers

biglab
9,739
1
ChartVE
Apache-2.0
ChartVE is a visual entailment model designed to evaluate the factual accuracy of generated caption sentences relative to input charts.
Image-to-Text
Transformers, English

khhuang
38
3