🚀 Kosmos-2-PokemonCards-trl-merged
A Kosmos-2 model fine-tuned on Pokémon card data to generate text descriptions from card images.
🚀 Quick Start
This model is designed for image-to-text tasks. Use the code example below to quickly start generating text descriptions from images.
✨ Features
- Fine-tuned Model: Fine-tuned from [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224) for image-to-text generation on Pokémon cards.
- Easy to Use: Simple code examples are provided to help you quickly start using the model.
📦 Installation
No installation steps are given in the original card; as a working assumption, the usage example below needs the Hugging Face stack plus a few common helpers:
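```bash
# Assumed requirements (versions are not pinned in the original card);
# `accelerate` backs the `device_map="auto"` argument used below.
pip install transformers accelerate torch pillow requests
```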
💻 Usage Examples
Basic Usage
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

# Load the base Kosmos-2 processor and the fine-tuned model.
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged",
    device_map="auto",
    low_cpu_mem_usage=True,
)

# Download a sample Pokémon card image.
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# Build the multimodal inputs and move them to the model's device
# (works on CPU as well as GPU, unlike a hard-coded "cuda:0").
prompt = "Pokemon name is"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)

# Generate a short continuation of the prompt.
with torch.no_grad():
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Keep only the text after the image tokens and before the first " and".
print(generated_text.split("</image>")[-1].split(" and")[0] + ".")
# Output: Pokemon name is Wartortle.
```
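Note the final `split` calls: the decoded text still contains the prompt and the `</image>` marker, so the example keeps only the model's continuation and drops everything from the first " and" onward. This trimming is the post-processing referred to in the Limitations section below.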
📚 Documentation
Model Details
Model Description
- Developed by: [Mit1208](https://huggingface.co/Mit1208)
- Finetuned from model: [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224)
Training Details
You can find the training details in https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb.
Inference Details
You can find the inference details in https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb.
Limitations
This model was fine-tuned on the free Colab tier, so training used only 300 samples for 85 epochs.
The model hallucinates very frequently, so its output needs post-processing (the usage example above trims everything after the first clause). Another way to handle this is to update the training data: use conversation-style data and/or set the tokenizer's padding token to the tokenizer's EOS token, as sketched below.
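A minimal sketch of that padding-token change (an illustration, not code taken from the training notebook):

```python
from transformers import AutoProcessor

# Load the base processor whose tokenizer is used during fine-tuning.
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

# Reuse the EOS token as the padding token so the model is trained to
# emit EOS after the answer instead of continuing to generate
# (hypothetical mitigation, per the limitation noted above).
processor.tokenizer.pad_token = processor.tokenizer.eos_token
processor.tokenizer.pad_token_id = processor.tokenizer.eos_token_id
```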
📄 License
This project is licensed under the CC BY-NC 4.0 license.