# Synthetic Data Training
- **SmolVLM 500M Anime Caption v0.2** (Andres77872, Apache-2.0) · Image-to-Text, English · 17 downloads, 0 likes. A vision-language model specialized in describing anime-style images, fine-tuned from SmolVLM-500M-Base.
- **Cockatiel 8B** (Fr0zencr4nE) · Video-to-Text, Transformers · 19 downloads, 0 likes. A video caption model based on VILA-v1.5-8B that generates detailed, human-preference-aligned captions for input videos.
- **Poseless 3B** (homebrewltd, Apache-2.0) · Multimodal Fusion, Transformers · 98 downloads, 7 likes. A robotic hand control framework that maps 2D images directly to joint angles through projected representations, eliminating the need for explicit pose estimation.
- **Poseless 3B** (Menlo, Apache-2.0) · Pose Estimation, Transformers · 65 downloads, 10 likes. A vision-language-model-based robotic hand control framework that maps 2D images directly to joint angles without explicit pose estimation.
- **GLiNER BioMed Bi Large v1.0** (Ihor, Apache-2.0) · Sequence Labeling, English · 56 downloads, 1 like. An efficient open NER model suite built on the GLiNER framework and specialized for recognizing biomedical entity types.
- **GLiNER BioMed Bi Base v1.0** (Ihor, Apache-2.0) · Sequence Labeling, English · 25 downloads, 1 like. The base-size member of the GLiNER-BioMed suite, recognizing multiple biomedical entity types.
- **GLiNER BioMed Large v1.0** (Ihor, Apache-2.0) · Sequence Labeling, English · 163 downloads, 6 likes. A GLiNER-based biomedical NER model achieving state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.
- **Asagi 8B** (MIL-UT, Apache-2.0) · Image-to-Text, Transformers, Japanese · 58 downloads, 4 likes. A large-scale Japanese vision-language model trained on extensive Japanese datasets drawn from diverse sources.
- **Slam Scaled** (slprl, MIT) · Audio Generation, Transformers · 792 downloads, 6 likes. A high-quality speech language model trained on a single GPU within 24 hours, fine-tuned from Qwen2.5-0.5B with HuBERT tokens as its vocabulary.
- **ModernBERT Large Bias Type Classifier** (cirimus, MIT) · Text Classification, Transformers, English · 424 downloads, 2 likes. A text classification model fine-tuned from ModernBERT-large to detect and classify various types of bias in text.
- **Asagi 14B** (MIL-UT, Apache-2.0) · Image-to-Text, Transformers, Japanese · 83 downloads, 9 likes. A large-scale Japanese vision-language model trained on a wide range of Japanese datasets drawn from diverse sources.
- **FLUX.1 Dev ControlNet Upscaler** (R1000, Other) · Image Enhancement · 106 downloads, 3 likes. A ControlNet model developed by the Jasper research team for upscaling low-resolution images.
- **Multilingual Sentiment Analysis** (tabularisai) · Text Classification, Transformers, multilingual · 162.07k downloads, 145 likes. A multilingual sentiment analysis model fine-tuned from DistilBERT, supporting 21 languages and suited to scenarios such as social media and customer feedback analysis.
- **Euclid ConvNeXt XXLarge 120524** (euclid-multimodal, Apache-2.0) · Text-to-Image, Transformers, English · 22 downloads, 4 likes. A multimodal large language model trained to strengthen low-level geometric perception through high-fidelity synthetic visual descriptions.
- **mStyleDistance** (StyleDistance, MIT) · Text Embedding · 207 downloads, 2 likes. A multilingual style embedding model that places texts with similar writing styles close together and texts with different styles far apart, regardless of content or language.
- **Pegasus X Base Synthsumm Open 16k** (BEE-spoke-data, Apache-2.0) · Text Generation, Transformers, English · 115 downloads, 2 likes. A summarization model fine-tuned from pegasus-x-base on synthetic data, excelling at long-document summarization.
- **FLUX.1 Dev ControlNet Upscaler** (jasperai, Other) · Image Enhancement · 11.16k downloads, 710 likes. A ControlNet model developed by the Jasper research team for upscaling low-resolution images.
- **Reflection Llama 3.1 70B** (mattshumer) · Large Language Model, Transformers · 199 downloads, 1,712 likes. An open-source large language model trained with "reflection tuning", enabling it to detect its own reasoning errors and correct course.
- **Depth Anything V2 Metric Indoor Large Hf** (depth-anything) · 3D Vision, Transformers · 47.99k downloads, 9 likes. Depth Anything V2 fine-tuned for indoor metric depth estimation on the synthetic Hypersim dataset; compatible with the transformers library.
- **Depth Anything V2 Metric Indoor Base Hf** (depth-anything) · 3D Vision, Transformers · 9,056 downloads, 1 like. The base-size variant fine-tuned for indoor metric depth estimation on the synthetic Hypersim dataset.
- **Depth Anything V2 Metric Indoor Small Hf** (depth-anything) · 3D Vision, Transformers · 750 downloads, 2 likes. The small variant fine-tuned for indoor metric depth estimation on Hypersim; compatible with the transformers library.
- **Depth Anything V2 Metric Outdoor Small Hf** (depth-anything) · 3D Vision, Transformers · 459 downloads, 1 like. The small variant fine-tuned for outdoor metric depth estimation on the synthetic Virtual KITTI dataset.
- **Depth Anything V2 Metric Outdoor Base Hf** (depth-anything) · 3D Vision, Transformers · 436 downloads, 0 likes. The base variant fine-tuned for outdoor metric depth estimation on Virtual KITTI; compatible with the transformers library.
- **Robust Sentiment Analysis** (tabularisai, Apache-2.0) · Text Classification, Transformers, English · 2,632 downloads, 14 likes. A sentiment analysis model fine-tuned from distilbert/distilbert-base-uncased, trained solely on synthetic data and supporting five sentiment classes.
- **StyleDistance** (StyleDistance, MIT) · Text Embedding, English · 492 downloads, 4 likes. A style embedding model that places texts with similar writing styles close together and texts with different styles far apart, independent of content.
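Style embedding models such as StyleDistance produce fixed-size vectors that downstream code typically compares with cosine similarity. A minimal sketch of that comparison, using toy hand-written vectors (hypothetical values, not real model outputs):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy 4-d "style embeddings" (illustrative values, not model output):
formal_a = [0.9, 0.1, 0.0, 0.2]
formal_b = [0.8, 0.2, 0.1, 0.3]
casual   = [0.1, 0.9, 0.7, 0.0]

# Texts sharing a writing style should score higher than texts that differ.
print(cosine(formal_a, formal_b) > cosine(formal_a, casual))  # True
```

In practice the vectors would come from encoding two texts with the model; the ranking logic stays the same.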
- **Gemma 2 9B It SPPO Iter3** (UCLA-AGI) · Large Language Model, Transformers, English · 6,704 downloads, 125 likes. An 8.9-billion-parameter language model from the third iteration of Self-Play Preference Optimization (SPPO), starting from google/gemma-2-9b-it and fine-tuned on the UltraFeedback dataset.
- **Qwen2 1.5B Summarize** (thepowerfuldeez, Apache-2.0) · Text Generation, Transformers, English · 228 downloads, 1 like. A specialized summarization model fine-tuned for two rounds from Qwen2-1.5B-Instruct.
- **TrOCR Base Ru** (sherstpasha99, Apache-2.0) · Text Recognition, Transformers, multilingual · 30 downloads, 0 likes. An optical character recognition model fine-tuned from microsoft/trocr-base-handwritten on synthetic Russian and English data for image-to-text tasks.
- **Merlinite 7b Lab** (instructlab, Apache-2.0) · Large Language Model, Transformers · 285 downloads, 22 likes. A language model built on Mistral-7B-v0.1 and trained with the LAB alignment method from IBM Research, performing strongly across multiple benchmarks.
- **RoBERTa Base Zeroshot v2.0 C** (MoritzLaurer, MIT) · Text Classification, Transformers, English · 3,188 downloads, 4 likes. A zero-shot classification model based on the RoBERTa architecture that classifies text without task-specific training data, runs on GPU or CPU, and was trained entirely on business-friendly data.
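Zero-shot classifiers of this kind are usually natural language inference (NLI) models under the hood: each candidate label is rewritten as a hypothesis and the label whose hypothesis is most entailed by the input wins. A toy sketch of that scoring loop, with a stub standing in for the real NLI model (the template wording and stub scores are illustrative assumptions, not the model's actual configuration):

```python
def zero_shot_classify(text, labels, entail_score):
    """Return the label whose hypothesis the NLI scorer finds most entailed.

    `entail_score(premise, hypothesis) -> float` stands in for a real
    NLI model; in production it would be the zero-shot model itself.
    """
    template = "This example is about {}."
    scores = {label: entail_score(text, template.format(label)) for label in labels}
    return max(scores, key=scores.get)

# Stub scorer for illustration: pretends the model strongly entails 'sports'.
def stub_scorer(premise, hypothesis):
    return 0.9 if "sports" in hypothesis else 0.1

print(zero_shot_classify("The match went to penalties.", ["sports", "finance"], stub_scorer))
# sports
```

Because the labels arrive at inference time as hypotheses, no retraining is needed when the label set changes.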
- **Zephyr 7b Gemma v0.1** (HuggingFaceH4, Other) · Large Language Model, Transformers · 502 downloads, 124 likes. A language model fine-tuned from google/gemma-7b with Direct Preference Optimization (DPO) on publicly available synthetic datasets, designed to serve as a helpful assistant.
- **TrOCR Base Ru** (raxtemur, Apache-2.0) · Text Recognition, Transformers, multilingual · 977 downloads, 26 likes. A Russian and English OCR model fine-tuned from microsoft/trocr-base-handwritten, specializing in handwritten and printed text recognition.
- **OpenMath Mistral 7B v0.1 Hf** (nvidia, Apache-2.0) · Large Language Model, Transformers, multilingual · 22 downloads, 31 likes. An OpenMath model that solves mathematical problems by combining textual reasoning with code blocks executed by a Python interpreter, fine-tuned from Mistral-7B-v0.1.
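Models in the OpenMath family interleave prose reasoning with fenced code blocks that an external Python interpreter executes, feeding results back into the solution. A toy re-implementation of that execution step (the response string and block format here are illustrative assumptions, not NVIDIA's actual scaffolding, which also sandboxes execution):

```python
import re

def run_python_blocks(response: str) -> dict:
    """Execute every fenced python block in a model response and
    return the shared namespace the blocks produced."""
    namespace = {}
    for block in re.findall(r"```python\n(.*?)```", response, re.DOTALL):
        exec(block, namespace)  # a real scaffold would sandbox this step
    return namespace

response = (
    "Let x be the product.\n"
    "```python\nx = 6 * 7\n```\n"
    "So the answer is x."
)
print(run_python_blocks(response)["x"])  # 42
```

The shared namespace lets later blocks build on variables defined in earlier ones, mirroring a multi-turn tool-use loop.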
- **Ko Deplot** (nuua, Apache-2.0) · Image-to-Text, Transformers, multilingual · 252 downloads, 5 likes. A Korean visual question answering model built on Google's Pix2Struct architecture and fine-tuned from DePlot, supporting chart-image question answering in Korean and English.
- **Orca 2 13b** (microsoft, Other) · Large Language Model, Transformers · 11.10k downloads, 666 likes. A research-oriented language model from Microsoft focused on improving the reasoning capabilities of small language models.
- **Orca 2 7b** (microsoft, Other) · Large Language Model, Transformers · 120.21k downloads, 219 likes. The 7B variant of Orca 2, fine-tuned from LLaMA 2 with the same focus on small-model reasoning.
- **Donutlicenses3v3** (felipebandeira, MIT) · Text Recognition, Transformers, English · 54 downloads, 5 likes. A model that extracts structured information from images of EU driver's licenses and returns the result as JSON.
- **TrOCR Small Korean** (team-lucid, Apache-2.0) · Image-to-Text, Korean · 342 downloads, 17 likes. A Korean image-to-text model with a vision encoder-decoder architecture, using DeiT as the image encoder and RoBERTa as the text decoder.
- **Pythia 2.8b Deduped Synthetic Instruct** (lambdalabs, Apache-2.0) · Large Language Model, Transformers, English · 46 downloads, 6 likes. An instruction model fine-tuned from the deduplicated Pythia-2.8B and optimized on a synthetic instruction dataset.
- **OCTFusion Exp1 HKDB Synthetic** (g30rv17ys) · Image Classification, Transformers · 33 downloads, 0 likes. A PyTorch-based image classification model reporting 100% accuracy on synthetic data.