vit-gpt2-verifycode-caption: an open-source CAPTCHA recognition model

Vit Gpt2 Verifycode Caption

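Developed by AIris-Channel

A CAPTCHA recognition model built on the ViT-GPT2 architecture and fine-tuned on a training set of 60,000 images to read the CAPTCHA text in an image.

Image-to-Text

Transformers

License: Apache-2.0 #CAPTCHA recognition #ViT-GPT2 architecture #high-accuracy character recognition

Downloads: 28

Released: 8/17/2023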

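Model overview

This is an image-to-text model built specifically for CAPTCHA recognition: it takes a CAPTCHA image as input and outputs the text the image contains.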

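Model features

Efficient CAPTCHA recognition

Fine-tuned on a training set of 60,000 images to read the text in CAPTCHA images.

ViT-GPT2 architecture

Pairs a vision encoder (ViT) with a text decoder (GPT-2), so the image is encoded once and the text is then generated from that encoding.

Easy to integrate

Exposes the standard Transformers interface, so it can be loaded and called like any other model in the library.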
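Because the model is called through an ordinary Python function, integrating it into an application is mostly a matter of feeding it images in manageable batches. The sketch below is one possible way to do that and is not part of the model card: `chunked`, `predict_in_batches`, and the `predict_fn` parameter are hypothetical names, and the default batch size of 8 is an arbitrary assumption. `predict_fn` stands for any callable that maps a list of image paths to a list of strings, such as the `predict_step` function in the quick-start code.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def predict_in_batches(
    image_paths: Sequence[str],
    predict_fn: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # arbitrary default; tune to available memory
) -> List[str]:
    """Call `predict_fn` once per batch and return all results in input order."""
    results: List[str] = []
    for batch in chunked(image_paths, batch_size):
        results.extend(predict_fn(batch))
    return results
```

With the quick-start code loaded, a call could look like `predict_in_batches(paths, predict_step, batch_size=8)`. Passing the predictor in as an argument keeps the helper independent of the model, which also makes it easy to test with a stub.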

Model capabilities

Image-to-text

CAPTCHA recognition

Image captioning

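Use cases

Security verification

Website CAPTCHA recognition

Automatically reads the CAPTCHA shown during website login or registration, which speeds up automated testing.

Accurately recognizes the CAPTCHA text

Automated testing

Handles the CAPTCHA verification step automatically inside an automated test pipeline.

Raises the degree of test automation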
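In an automated test, the raw string returned by the model usually needs a small clean-up step before it is typed into a form. The following is a minimal sketch under stated assumptions, not something the model card prescribes: it assumes the CAPTCHA alphabet is ASCII letters and digits and that case does not matter (the card's own example returns lower-case text for an image named in upper case). `normalize_captcha` and `expected_length` are hypothetical names.

```python
import re
from typing import Optional

# Assumption: valid CAPTCHA characters are ASCII letters and digits only.
_NON_ALNUM = re.compile(r"[^0-9a-z]")


def normalize_captcha(raw: str, expected_length: Optional[int] = None) -> Optional[str]:
    """Lower-case a prediction and strip everything except letters and digits.

    Returns None when nothing is left, or when `expected_length` is given and
    the cleaned text has a different length, so the caller can retry or skip.
    """
    cleaned = _NON_ALNUM.sub("", raw.strip().lower())
    if not cleaned:
        return None
    if expected_length is not None and len(cleaned) != expected_length:
        return None
    return cleaned
```

Returning `None` instead of raising lets a test harness decide whether to request a fresh CAPTCHA or fail the step.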

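🚀 世萌 CAPTCHA Recognition Model

The 世萌 CAPTCHA recognition model handles an image-to-text task, framed here as image captioning. It is fine-tuned from vit-gpt2 on a training set of 60,000 images to recognize CAPTCHA text.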

🚀 Quick start

The model can be used with the Transformers library. Example code:

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
import torch
from PIL import Image

# The model, image processor, and tokenizer all load from the same repository.
model = VisionEncoderDecoderModel.from_pretrained("AIris-Channel/vit-gpt2-verifycode-caption")
feature_extractor = ViTImageProcessor.from_pretrained("AIris-Channel/vit-gpt2-verifycode-caption")
tokenizer = AutoTokenizer.from_pretrained("AIris-Channel/vit-gpt2-verifycode-caption")

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Generation settings: at most 16 tokens per output, beam search with 4 beams.
max_length = 16
num_beams = 4
gen_kwargs = {"max_length": max_length, "num_beams": num_beams}


def predict_step(image_paths):
    """Return the predicted CAPTCHA text for each image path, in order."""
    images = []
    for image_path in image_paths:
        i_image = Image.open(image_path)
        # The image processor expects 3-channel RGB input.
        if i_image.mode != "RGB":
            i_image = i_image.convert(mode="RGB")
        images.append(i_image)

    # Resize and normalize the images into one batched tensor.
    pixel_values = feature_extractor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    # Generate token IDs, then decode them to plain strings.
    output_ids = model.generate(pixel_values, **gen_kwargs)
    preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [pred.strip() for pred in preds]


# 'ZZZTVESE.jpg' is a local CAPTCHA image file.
pred = predict_step(['ZZZTVESE.jpg'])
print(pred)  # expected per the model card: ['zzztvese']
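The example above names the image after the text it contains (`ZZZTVESE.jpg` yields `zzztvese`), which suggests a simple way to spot-check the model on a folder of labelled images. The sketch below rests on that assumption, namely that the file stem is the ground-truth text and that comparison is case-insensitive; the model card does not state this convention. `label_from_path` and `score_predictions` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, Sequence


def label_from_path(image_path: str) -> str:
    """Assumed convention: the file stem is the ground-truth text (case-insensitive)."""
    return Path(image_path).stem.lower()


def score_predictions(image_paths: Sequence[str], preds: Sequence[str]) -> Dict[str, float]:
    """Compute exact-match and per-character accuracy for a batch of predictions."""
    if len(image_paths) != len(preds):
        raise ValueError("image_paths and preds must have the same length")
    if not image_paths:
        raise ValueError("at least one prediction is required")

    exact = 0
    char_hits = 0
    char_total = 0
    for path, pred in zip(image_paths, preds):
        label = label_from_path(path)
        guess = pred.strip().lower()
        exact += int(guess == label)
        # Compare position by position; missing or extra characters count as misses.
        char_hits += sum(a == b for a, b in zip(label, guess))
        char_total += max(len(label), len(guess))

    return {
        "exact_match": exact / len(image_paths),
        "char_accuracy": char_hits / char_total if char_total else 0.0,
    }
```

Exact match is the number that matters for passing a CAPTCHA, while per-character accuracy helps show whether failures are near misses or complete misreads.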

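✨ Key features

Image captioning: converts an image to text, so it fits image captioning tasks.
Fine-tuned: fine-tuned from vit-gpt2 to adapt it to CAPTCHA recognition.
Large training set: trained on 60,000 images to support the model's accuracy and generalization.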
