🚀 Kosmos-2-PokemonCards-trl-merged
A Kosmos-2 model fine-tuned on Pokémon card data to generate text descriptions from card images.
🚀 Quick Start
This model is designed for image-to-text tasks. Use the code example below to quickly start generating text descriptions from images.
✨ Features
- Fine-tuned Model: Fine-tuned from [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224) for image-to-text generation on Pokémon cards.
- Easy to Use: Simple code examples are provided to help you quickly start using the model.
📦 Installation
No installation steps are given in the original card; as a working assumption, the usage example below needs the Hugging Face stack plus a few common helpers:
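```bash
# Assumed requirements (versions are not pinned in the original card);
# `accelerate` backs the `device_map="auto"` argument used below.
pip install transformers accelerate torch pillow requests
```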
💻 Usage Examples
Basic Usage
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

# Load the base Kosmos-2 processor and the fine-tuned model.
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged",
    device_map="auto",
    low_cpu_mem_usage=True,
)

# Download a sample Pokémon card image.
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# Build the multimodal inputs and move them to the model's device
# (works on CPU as well as GPU, unlike a hard-coded "cuda:0").
prompt = "Pokemon name is"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)

# Generate a short continuation of the prompt.
with torch.no_grad():
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Keep only the text after the image tokens and before the first " and".
print(generated_text.split("</image>")[-1].split(" and")[0] + ".")
# Output: Pokemon name is Wartortle.
```
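Note the final `split` calls: the decoded text still contains the prompt and the `</image>` marker, so the example keeps only the model's continuation and drops everything from the first " and" onward. This trimming is the post-processing referred to in the Limitations section below.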
📚 Documentation
Model Details
Model Description
- Developed by: [Mit1208](https://huggingface.co/Mit1208)
- Finetuned from model: [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224)
Training Details
You can find the training details in https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb.
Inference Details
You can find the inference details in https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb.
Limitations
This model was fine-tuned on the free Colab tier, so training used only 300 samples for 85 epochs.
The model hallucinates very frequently, so its output needs post-processing (the usage example above trims everything after the first clause). Another way to handle this is to update the training data: use conversation-style data and/or set the tokenizer's padding token to the tokenizer's EOS token, as sketched below.
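A minimal sketch of that padding-token change (an illustration, not code taken from the training notebook):

```python
from transformers import AutoProcessor

# Load the base processor whose tokenizer is used during fine-tuning.
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

# Reuse the EOS token as the padding token so the model is trained to
# emit EOS after the answer instead of continuing to generate
# (hypothetical mitigation, per the limitation noted above).
processor.tokenizer.pad_token = processor.tokenizer.eos_token
processor.tokenizer.pad_token_id = processor.tokenizer.eos_token_id
```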
📄 License
This project is licensed under the CC BY-NC 4.0 license.