pmc_vit-l-14_hf Open-Source Vision-Language Model - Empowering Image-Text Association Applications through Fine-Tuning on Specific Datasets

pmc_vit-l-14_hf

Developed by ryanyip7777

A vision-language model based on CLIP-ViT-L/14 and fine-tuned on the PMC-OA dataset

Text-to-Image

Transformers

#Medical Image-Text Alignment #PMC Literature Adaptation #Multimodal Retrieval

Downloads 260

Release Time: 9/7/2023

Model Overview

This model is a fine-tuned version of OpenAI CLIP-ViT-L/14, specifically optimized for biomedical literature image-text matching tasks.

Model Features

Biomedical Domain Optimization

Fine-tuned on the PMC-OA biomedical literature dataset, enhancing the ability to process medical images and text

Multimodal Understanding

Capable of processing both image and text inputs, understanding the semantic relationships between them

Model Capabilities

Image Feature Extraction

Text Feature Extraction

Cross-modal Similarity Calculation

Image-Text Matching

Use Cases

Medical Research

Medical Literature Image Retrieval

Retrieve relevant medical images based on text descriptions

Medical Image Annotation

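Rank candidate captions or labels for a medical image by image-text similarity (the model scores text against images; it does not generate text)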
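The retrieval use case above comes down to ranking stored image embeddings by their similarity to a text-query embedding. The sketch below shows that ranking step in plain Python. It assumes the embeddings have already been computed by the model; the file names and 3-dimensional vectors are made-up placeholders, and `retrieve_top_k` is a hypothetical helper, not part of any library.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embed, image_index, k=2):
    """Return the k (image_id, score) pairs most similar to the query.

    image_index maps an image identifier to its precomputed embedding.
    """
    scored = (
        (cosine(query_embed, embed), image_id)
        for image_id, embed in image_index.items()
    )
    return [(image_id, score) for score, image_id in heapq.nlargest(k, scored)]


# Toy index: in practice these would be image embeddings from the model.
image_index = {
    "fig_ct.png": [0.9, 0.1, 0.0],
    "fig_histology.png": [0.1, 0.9, 0.1],
    "fig_plot.png": [0.0, 0.2, 0.9],
}
# Toy query: in practice this would be the text embedding of the search phrase.
query = [1.0, 0.2, 0.0]

results = retrieve_top_k(query, image_index, k=2)
```

For a large collection, the image embeddings would normally be computed once and stored, so that only the query text needs to be encoded at search time.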

🚀 clip-vit-l-14-pmc-finetuned

This model is a fine-tuned version of openai/clip-vit-large-patch14, trained on the pmc_oa dataset to improve image-text matching on that data.

🚀 Quick Start

This model is a fine-tuned version of openai/clip-vit-large-patch14 on the pmc_oa dataset (https://huggingface.co/datasets/axiong/pmc_oa). It achieves the following results on the evaluation set:

Loss: 1.0125
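The card does not define the reported loss. Assuming it is the symmetric contrastive loss that `CLIPModel` returns when called with `return_loss=True` (the average of a text-to-image and an image-to-text cross-entropy over the batch similarity matrix), the computation can be sketched in plain Python as follows. The two logit matrices are toy inputs chosen for illustration.

```python
import math


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(logits_per_text):
    """Average of text-to-image and image-to-text cross-entropy."""
    logits_per_image = [list(col) for col in zip(*logits_per_text)]
    caption_loss = cross_entropy_diagonal(logits_per_text)
    image_loss = cross_entropy_diagonal(logits_per_image)
    return (caption_loss + image_loss) / 2.0


# Toy 2x2 similarity matrices (rows: texts, columns: images).
uniform = [[0.0, 0.0], [0.0, 0.0]]    # model cannot tell pairs apart
matched = [[10.0, 0.0], [0.0, 10.0]]  # matching pairs score much higher

loss_uniform = symmetric_contrastive_loss(uniform)
loss_matched = symmetric_contrastive_loss(matched)
```

Under that assumption the loss depends on the batch size: a model that scores every pair equally yields ln(N) for a batch of N pairs, so with an evaluation batch of 8 on a single device the chance level would be about 2.08.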

✨ Features

Fine-tuned on the pmc_oa dataset to potentially improve performance on related tasks.

📦 Installation

No specific installation steps are provided in the original document.

💻 Usage Examples

Basic Usage

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")
processor = CLIPProcessor.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (num_images, num_texts)
probs = logits_per_image.softmax(dim=1)  # softmax over the candidate texts gives per-label probabilities
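For readers who want to see what `logits_per_image` contains: in CLIP-style models it is the cosine similarity between the image embedding and each text embedding, multiplied by a learned temperature. The sketch below reproduces that arithmetic in plain Python on toy 3-dimensional vectors. The vectors and the `logit_scale` value of 100.0 are illustrative assumptions, not values read from this checkpoint.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_style_logits(image_embed, text_embeds, logit_scale=100.0):
    """Scaled cosine similarity between one image and several texts."""
    image_unit = l2_normalize(image_embed)
    logits = []
    for text_embed in text_embeds:
        text_unit = l2_normalize(text_embed)
        cosine = sum(a * b for a, b in zip(image_unit, text_unit))
        logits.append(logit_scale * cosine)
    return logits


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings standing in for the model's projected features.
image_embed = [0.9, 0.1, 0.0]
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

logits = clip_style_logits(image_embed, text_embeds)
probs = softmax(logits)
```

Because the softmax runs only over the texts supplied, the probabilities are relative to that candidate set; adding or removing a candidate changes every value.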

Advanced Usage

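To reproduce the fine-tuning, use the run_clip.py script from the Transformers examples (https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text). The command starts from the base openai/clip-vit-large-patch14 checkpoint: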

python -W ignore run_clip.py --model_name_or_path openai/clip-vit-large-patch14 \
      --output_dir ./clip-vit-l-14-pmc-finetuned \
      --train_file data/pmc_roco_train.csv \
      --validation_file data/pmc_roco_valid.csv \
      --image_column image --caption_column caption \
      --max_seq_length 77 \
      --do_train --do_eval \
      --per_device_train_batch_size 16 --per_device_eval_batch_size 8 \
      --remove_unused_columns=False \
      --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
      --overwrite_output_dir \
      --num_train_epochs 10 \
      --logging_dir ./pmc_vit_logs \
      --save_total_limit 2 \
      --report_to tensorboard
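The command expects CSV files whose column names match `--image_column image` and `--caption_column caption`. The card does not show those files; the sketch below assumes each row holds a path to an image file and its caption, which is how the example script describes the image column. The paths and captions are made-up placeholders, and `write_pairs_csv` is a hypothetical helper.

```python
import csv
import io


def write_pairs_csv(pairs, handle):
    """Write (image_path, caption) pairs with the header: image,caption."""
    writer = csv.DictWriter(handle, fieldnames=["image", "caption"])
    writer.writeheader()
    for image_path, caption in pairs:
        writer.writerow({"image": image_path, "caption": caption})


# Placeholder rows; real data would come from the training set.
pairs = [
    ("images/example_0001.jpg", "Axial CT image of the chest."),
    ("images/example_0002.jpg", "Histology section, H&E stain."),
]

# Written to an in-memory buffer here; pass an open file to write to disk.
buffer = io.StringIO()
write_pairs_csv(pairs, buffer)
csv_text = buffer.getvalue()
```

Using the `csv` module rather than manual string joining matters here because captions often contain commas and quotes, which the writer escapes automatically.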

📚 Documentation

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10.0
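With `lr_scheduler_type: linear` and zero warmup steps, the learning rate starts at 5e-05 and decays in a straight line to zero by the last optimizer step. A minimal sketch of that schedule follows. The total of 1000 steps is a hypothetical figure, since the card does not state the dataset size or the number of steps, and `linear_schedule_lr` is an illustrative helper rather than a library function.

```python
def linear_schedule_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate at a given step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (unused when warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; depends on dataset size, batch size, and epochs
lrs = [linear_schedule_lr(step, total_steps) for step in (0, 500, 1000)]
```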

Framework versions

Transformers 4.31.0
Pytorch 2.0.1
Datasets 2.14.4
Tokenizers 0.13.3

🔧 Technical Details

The model is a fine-tuned version of openai/clip-vit-large-patch14 on the pmc_oa dataset. Fine-tuning used the run_clip.py example script with the hyperparameters listed under Training hyperparameters.

📄 License

No license information is provided in the original document.
