# Synthetic Data Training

## SmolVLM 500M Anime Caption v0.2
Apache-2.0 · Image-to-Text · English · by Andres77872 · 17 downloads · 0 likes

A vision-language model specialized in describing anime-style images, fine-tuned from SmolVLM-500M-Base.

## Cockatiel 8B
Video-to-Text · Transformers · by Fr0zencr4nE · 19 downloads · 0 likes

A video caption generation model based on VILA-v1.5-8B that produces detailed, human-preference-aligned captions for input videos.

## PoseLess 3B
Apache-2.0 · Multimodal Fusion · Transformers · by homebrewltd · 98 downloads · 7 likes

PoseLess is a robotic hand control framework that maps 2D images directly to joint angles via projected representations, eliminating the need for explicit pose estimation.

## PoseLess 3B
Apache-2.0 · Pose Estimation · Transformers · by Menlo · 65 downloads · 10 likes

Poseless-3B is a vision-language model (VLM)-based robotic hand control framework that maps 2D images directly to joint angles without explicit pose estimation.

## GLiNER BioMed Bi Large v1.0
Apache-2.0 · Sequence Labeling · English · by Ihor · 56 downloads · 1 like

GLiNER-BioMed is an efficient open NER model suite built on the GLiNER framework and specialized for the biomedical domain, recognizing a wide range of biomedical entity types.

## GLiNER BioMed Bi Base v1.0
Apache-2.0 · Sequence Labeling · English · by Ihor · 25 downloads · 1 like

The base-sized bi-encoder variant of the GLiNER-BioMed suite, likewise specialized for biomedical named entity recognition across multiple entity types.

## GLiNER BioMed Large v1.0
Apache-2.0 · Sequence Labeling · English · by Ihor · 163 downloads · 6 likes

The large variant of the GLiNER-BioMed suite, achieving state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.

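All three GLiNER-BioMed checkpoints expose the standard GLiNER interface. A minimal sketch, assuming the `gliner` Python package is installed and inferring the large checkpoint's repository ID from the listing:

```python
# Zero-shot biomedical NER with GLiNER-BioMed (sketch).
# Repository ID inferred from the listing; adjust if it differs.
from gliner import GLiNER

model = GLiNER.from_pretrained("Ihor/gliner-biomed-large-v1.0")

text = "Mutations in BRCA1 are associated with an elevated risk of breast cancer."
labels = ["gene", "disease", "chemical"]  # entity types are free-form strings

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f'{ent["text"]} -> {ent["label"]} ({ent["score"]:.2f})')
```

Because the label set is passed at inference time, the same checkpoint can be pointed at new entity types without retraining.
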
## Asagi 8B
Apache-2.0 · Image-to-Text · Transformers · Japanese · by MIL-UT · 58 downloads · 4 likes

Asagi-8B is a large-scale Japanese vision-language model (VLM) trained on extensive Japanese datasets that integrate diverse data sources.

## Slam Scaled
MIT · Audio Generation · Transformers · by slprl · 792 downloads · 6 likes

A high-quality speech language model trained on a single GPU within 24 hours, fine-tuned from Qwen2.5-0.5B and using HuBERT tokens as its vocabulary.

## ModernBERT Large Bias Type Classifier
MIT · Text Classification · Transformers · English · by cirimus · 424 downloads · 2 likes

A text classification model fine-tuned from ModernBERT-large to detect and classify various types of bias in text.

## Asagi 14B
Apache-2.0 · Image-to-Text · Transformers · Japanese · by MIL-UT · 83 downloads · 9 likes

Asagi-14B is a large-scale Japanese vision-language model (VLM) trained on a wide range of Japanese datasets that integrate diverse data sources.

## FLUX.1 Dev ControlNet Upscaler
Other · Image Enhancement · by R1000 · 106 downloads · 3 likes

A ControlNet model developed by the Jasper research team for upscaling low-resolution images.

## Multilingual Sentiment Analysis
Text Classification · Transformers · Multilingual · by tabularisai · 162.07k downloads · 145 likes

A multilingual sentiment analysis model fine-tuned from DistilBERT, supporting 21 languages and suited to scenarios such as social media and customer feedback analysis.

## Euclid ConvNeXt XXLarge 120524
Apache-2.0 · Text-to-Image · Transformers · English · by euclid-multimodal · 22 downloads · 4 likes

A multimodal large language model trained specifically to strengthen low-level geometric perception, improving geometric analysis through high-fidelity synthetic visual descriptions.

## mStyleDistance
MIT · Text Embedding · by StyleDistance · 207 downloads · 2 likes

mStyleDistance is a multilingual style embedding model that embeds texts with similar writing styles close together and texts with different styles far apart, independent of content and language.

## Pegasus-X Base Synthsumm Open 16k
Apache-2.0 · Text Generation · Transformers · English · by BEE-spoke-data · 115 downloads · 2 likes

A text summarization model fine-tuned from pegasus-x-base on synthetic data, excelling at long-document summarization.

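As a pegasus-x-base fine-tune it should load through the standard transformers summarization pipeline; a minimal sketch, with the repository ID guessed from the listing:

```python
# Long-document summarization with a PEGASUS-X fine-tune (sketch).
# Repository ID guessed from the listing; substitute the actual ID if it differs.
from transformers import pipeline

summarizer = pipeline("summarization", model="BEE-spoke-data/pegasus-x-base-synthsumm_open-16k")

with open("report.txt") as f:      # any long document
    long_text = f.read()

result = summarizer(long_text, max_length=128, min_length=32, truncation=True)
print(result[0]["summary_text"])
```
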
## FLUX.1 Dev ControlNet Upscaler
Other · Image Enhancement · by jasperai · 11.16k downloads · 710 likes

A ControlNet model developed by the Jasper research team for upscaling low-resolution images.

## Reflection Llama 3.1 70B
Large Language Model · Transformers · by mattshumer · 199 downloads · 1,712 likes

Reflection Llama-3.1 70B is an open-source large language model trained with "reflection tuning", a technique intended to let the model detect errors in its own reasoning and correct course.

## Depth Anything V2 Metric Indoor Large HF
3D Vision · Transformers · by depth-anything · 47.99k downloads · 9 likes

A Depth Anything V2 fine-tune for indoor metric depth estimation, trained on the synthetic Hypersim dataset and compatible with the transformers library.

## Depth Anything V2 Metric Indoor Base HF
3D Vision · Transformers · by depth-anything · 9,056 downloads · 1 like

The base-sized Depth Anything V2 fine-tune for indoor metric depth estimation, trained on the synthetic Hypersim dataset.

## Depth Anything V2 Metric Indoor Small HF
3D Vision · Transformers · by depth-anything · 750 downloads · 2 likes

The small Depth Anything V2 fine-tune for indoor metric depth estimation, trained on the synthetic Hypersim dataset and compatible with the transformers library.

## Depth Anything V2 Metric Outdoor Small HF
3D Vision · Transformers · by depth-anything · 459 downloads · 1 like

The small Depth Anything V2 fine-tune for outdoor metric depth estimation, trained on the synthetic Virtual KITTI dataset.

## Depth Anything V2 Metric Outdoor Base HF
3D Vision · Transformers · by depth-anything · 436 downloads · 0 likes

The base-sized Depth Anything V2 fine-tune for outdoor metric depth estimation, trained on the synthetic Virtual KITTI dataset and compatible with the transformers library.

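All five metric checkpoints above share the same transformers interface, so one depth-estimation pipeline call covers them; a minimal sketch, with the indoor-large repository ID inferred from the listing:

```python
# Indoor metric depth estimation with Depth Anything V2 (sketch).
# Checkpoint ID inferred from the listing; swap in another variant as needed.
from transformers import pipeline
from PIL import Image

pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf")

image = Image.open("living_room.jpg")          # any indoor photo
result = pipe(image)

result["depth"].save("living_room_depth.png")  # depth map rendered as an image
# result["predicted_depth"] holds the raw per-pixel depth tensor.
```
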
## Robust Sentiment Analysis
Apache-2.0 · Text Classification · Transformers · English · by tabularisai · 2,632 downloads · 14 likes

A sentiment analysis model fine-tuned from distilbert/distilbert-base-uncased on purely synthetic data, classifying text into five sentiment classes.

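A minimal usage sketch with the transformers text-classification pipeline, repository ID inferred from the listing:

```python
# Five-class sentiment prediction (sketch).
# Repository ID inferred from the listing; the label set is the model's own.
from transformers import pipeline

classifier = pipeline("text-classification", model="tabularisai/robust-sentiment-analysis")

print(classifier("The battery life is fantastic, but the screen scratches easily."))
# -> a list like [{'label': ..., 'score': ...}] with one of the five sentiment classes
```
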
## StyleDistance
MIT · Text Embedding · English · by StyleDistance · 492 downloads · 4 likes

StyleDistance is a style embedding model that embeds texts with similar writing styles close together and texts with different styles far apart, independent of content.

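Both StyleDistance models are embedding models; assuming they are packaged in sentence-transformers format (an assumption, not confirmed by the listing), comparing styles would look like:

```python
# Comparing writing styles with StyleDistance (sketch).
# Assumes sentence-transformers packaging; repository ID inferred from the listing.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("StyleDistance/styledistance")

texts = [
    "u wont BELIEVE what happened lol",    # informal style
    "omg the game last nite was crazy!!",  # informal style, different topic
    "The committee convened at noon.",     # formal style
]
embeddings = model.encode(texts)
print(cos_sim(embeddings, embeddings))  # style-similar pairs should score higher
```
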
## Gemma 2 9B IT SPPO Iter3
Large Language Model · Transformers · English · by UCLA-AGI · 6,704 downloads · 125 likes

An 8.9-billion-parameter language model from the third iteration of Self-Play Preference Optimization (SPPO), starting from google/gemma-2-9b-it and fine-tuned on the UltraFeedback dataset.

## Qwen2 1.5B Summarize
Apache-2.0 · Text Generation · Transformers · English · by thepowerfuldeez · 228 downloads · 1 like

A specialized summarization model fine-tuned for two rounds from Qwen2-1.5B-Instruct.

## TrOCR Base Ru
Apache-2.0 · Text Recognition · Transformers · Multilingual · by sherstpasha99 · 30 downloads · 0 likes

TrOCR-Ru is an optical character recognition model fine-tuned from microsoft/trocr-base-handwritten on synthetic Russian and English datasets, focused on image-to-text tasks.

## Merlinite 7B Lab
Apache-2.0 · Large Language Model · Transformers · by instructlab · 285 downloads · 22 likes

Merlinite 7B is a language model built on Mistral-7B-v0.1 and trained with the LAB alignment method developed by IBM Research; it performs strongly across multiple benchmarks.

## RoBERTa Base Zeroshot v2.0 C
MIT · Text Classification · Transformers · English · by MoritzLaurer · 3,188 downloads · 4 likes

A zero-shot text classification model based on the RoBERTa architecture that requires no task-specific training data, runs on either GPU or CPU, and was trained entirely on commercially friendly data.

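A minimal sketch with the transformers zero-shot-classification pipeline, repository ID inferred from the listing; candidate labels are supplied at inference time rather than learned from training data:

```python
# Zero-shot text classification (sketch); no task-specific training data required.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/roberta-base-zeroshot-v2.0-c")

result = classifier(
    "The central bank raised interest rates by 50 basis points.",
    candidate_labels=["finance", "sports", "technology"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label first
```
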
## Zephyr 7B Gemma v0.1
Other · Large Language Model · Transformers · by HuggingFaceH4 · 502 downloads · 124 likes

Zephyr 7B Gemma is a language model fine-tuned from google/gemma-7b with Direct Preference Optimization (DPO) on publicly available synthetic datasets, intended to serve as a helpful assistant.

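Like most chat fine-tunes, it can be driven through the text-generation pipeline with a chat-style message list (supported in recent transformers releases); a minimal sketch:

```python
# Chat-style generation with Zephyr 7B Gemma (sketch).
# Requires enough GPU memory for a 7B model; device_map="auto" needs accelerate.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-gemma-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
out = pipe(messages, max_new_tokens=80)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```
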
## TrOCR Base Ru
Apache-2.0 · Text Recognition · Transformers · Multilingual · by raxtemur · 977 downloads · 26 likes

A Russian and English OCR model fine-tuned from microsoft/trocr-base-handwritten, specializing in recognizing handwritten and printed text.

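TrOCR fine-tunes load through the transformers VisionEncoderDecoderModel API; a minimal sketch, with the repository ID inferred from the listing:

```python
# Russian/English handwriting OCR with a TrOCR fine-tune (sketch).
# Repository ID inferred from the listing.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("raxtemur/trocr-base-ru")
model = VisionEncoderDecoderModel.from_pretrained("raxtemur/trocr-base-ru")

image = Image.open("handwritten_line.png").convert("RGB")  # single text lines work best
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
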
## OpenMath Mistral 7B v0.1 HF
Apache-2.0 · Large Language Model · Transformers · Multilingual · by nvidia · 22 downloads · 31 likes

The OpenMath models solve mathematical problems by combining textual reasoning with code blocks executed by a Python interpreter; this model is fine-tuned from Mistral-7B-v0.1.

## Ko-DePlot
Apache-2.0 · Image-to-Text · Transformers · Multilingual · by nuua · 252 downloads · 5 likes

ko-deplot is a Korean visual question answering model based on Google's Pix2Struct architecture, fine-tuned from the DePlot model and supporting chart-image question answering in Korean and English.

## Orca 2 13B
Other · Large Language Model · Transformers · by microsoft · 11.10k downloads · 666 likes

Orca 2 is a research-oriented language model developed by Microsoft, focusing on enhancing the reasoning capabilities of small language models.

## Orca 2 7B
Other · Large Language Model · Transformers · by microsoft · 120.21k downloads · 219 likes

Orca 2 is a research-oriented language model developed by Microsoft, focusing on enhancing the reasoning capabilities of small language models; this version is fine-tuned from LLaMA 2.

## DonutLicenses3v3
MIT · Text Recognition · Transformers · English · by felipebandeira · 54 downloads · 5 likes

A model for extracting structured information from EU driver's license images and returning the result in JSON format.

## TrOCR Small Korean
Apache-2.0 · Image-to-Text · Korean · by team-lucid · 342 downloads · 17 likes

TrOCR is a Korean image-to-text model with a vision encoder-decoder architecture, using DeiT as the image encoder and RoBERTa as the text decoder.

## Pythia 2.8B Deduped Synthetic Instruct
Apache-2.0 · Large Language Model · Transformers · English · by lambdalabs · 46 downloads · 6 likes

An instruction-following model fine-tuned from the deduplicated Pythia-2.8B and optimized on a synthetic instruction dataset.

## OCTFusion Exp1 HKDB Synthetic
Image Classification · Transformers · by g30rv17ys · 33 downloads · 0 likes

OCTFusion is a PyTorch-based image classification model that achieved 100% accuracy on synthetic data.