vit-gpt2-verifycode-caption: Open-Source CAPTCHA Recognition Model


Vit Gpt2 Verifycode Caption

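Developed by AIris-Channel

A CAPTCHA recognition model built on the ViT-GPT2 architecture and fine-tuned on a training set of 60,000 images. It reads the text shown in a CAPTCHA image.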

Image-to-Text

Transformers

License: Apache-2.0 #CAPTCHA recognition #ViT-GPT2 architecture #High-accuracy character recognition

Downloads: 28

Released: August 17, 2023

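Model Overview

This is an image-to-text model built specifically for CAPTCHA recognition: it takes a CAPTCHA image as input and outputs the corresponding text.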

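Model Features

Efficient CAPTCHA recognition

Fine-tuned on a training set of 60,000 images to recognize a variety of CAPTCHAs.

ViT-GPT2 architecture

Pairs a vision encoder (ViT) with a text decoder (GPT-2) to convert images into text.

Easy integration

Exposes the standard Transformers interface, so it can be integrated into existing applications with little extra code.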
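As a rough illustration of the Transformers integration, the sketch below wraps the model in the high-level `pipeline` API. It assumes an installed Transformers version that still ships the `image-to-text` pipeline task and that the model repository contains the processor and tokenizer files the pipeline needs. `load_captioner` and `extract_text` are hypothetical helper names, not part of the model card.

```python
def load_captioner(model_id="AIris-Channel/vit-gpt2-verifycode-caption"):
    """Build an image-to-text pipeline for the CAPTCHA model (downloads weights)."""
    # Imported lazily so extract_text() can be used without Transformers installed.
    from transformers import pipeline
    return pipeline("image-to-text", model=model_id)


def extract_text(pipeline_output):
    """Return the decoded string from one image's pipeline result.

    The image-to-text pipeline returns a list of dicts, each with a
    "generated_text" key; only the first candidate is used here.
    """
    return pipeline_output[0]["generated_text"].strip()


# Usage (requires network access to download the model):
# captioner = load_captioner()
# print(extract_text(captioner("ZZZTVESE.jpg")))
```

The pipeline handles image loading, preprocessing, generation, and decoding in one call, which is convenient for quick checks; the explicit `VisionEncoderDecoderModel` route gives more control over batching and generation settings.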

Model Capabilities

Image-to-text

CAPTCHA recognition

Image caption generation

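Use Cases

Security verification

Website CAPTCHA recognition

Automatically recognizes the CAPTCHAs shown during website login or registration, making automated testing more efficient.

Accurately recognizes CAPTCHA text

Automated testing

Handles the CAPTCHA verification step automatically within an automated test pipeline.

Increases the degree of test automation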
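One way the CAPTCHA step might be wired into an automated test is a small retry loop, sketched below for a test environment you control. Everything here is an assumption for illustration: `solve_with_retry`, the three callables it takes, and the `CODE_PATTERN` format check are hypothetical and not part of the model or of any test framework.

```python
import re

# Assumed CAPTCHA format: 4 to 8 lowercase letters or digits. Adjust per site.
CODE_PATTERN = re.compile(r"[a-z0-9]{4,8}")


def solve_with_retry(fetch_captcha, recognize, submit, max_attempts=3):
    """Try up to max_attempts CAPTCHAs; return the accepted text or None.

    fetch_captcha() -> path of a freshly downloaded CAPTCHA image
    recognize(path) -> predicted text (e.g. a wrapper around the model)
    submit(text)    -> True if the site under test accepted the text
    """
    for _ in range(max_attempts):
        image_path = fetch_captcha()
        text = recognize(image_path).strip().lower()
        if not CODE_PATTERN.fullmatch(text):
            continue  # implausible prediction: request a fresh CAPTCHA
        if submit(text):
            return text
    return None
```

Rejecting predictions that do not match the expected format before submitting avoids spending a login attempt on an obviously wrong answer.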

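🚀 世萌 CAPTCHA Recognition Model

The 世萌 CAPTCHA recognition model performs an image-to-text task, framed as image captioning. It is fine-tuned from vit-gpt2 on a training set of 60,000 images and is effective at recognizing CAPTCHAs.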

🚀 Quick Start

The model can be used with the Transformers library. Example code:

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
import torch
from PIL import Image

# Load the fine-tuned model together with its image processor and tokenizer.
model = VisionEncoderDecoderModel.from_pretrained("AIris-Channel/vit-gpt2-verifycode-caption")
feature_extractor = ViTImageProcessor.from_pretrained("AIris-Channel/vit-gpt2-verifycode-caption")
tokenizer = AutoTokenizer.from_pretrained("AIris-Channel/vit-gpt2-verifycode-caption")

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Generation settings: at most 16 tokens per prediction, beam search with 4 beams.
max_length = 16
num_beams = 4
gen_kwargs = {"max_length": max_length, "num_beams": num_beams}


def predict_step(image_paths):
    """Return one predicted CAPTCHA string per image path."""
    images = []
    for image_path in image_paths:
        i_image = Image.open(image_path)
        # The image processor expects 3-channel RGB input.
        if i_image.mode != "RGB":
            i_image = i_image.convert(mode="RGB")
        images.append(i_image)

    # Resize and normalize the batch, then move it to the model's device.
    pixel_values = feature_extractor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    output_ids = model.generate(pixel_values, **gen_kwargs)

    # Decode token IDs to text and trim surrounding whitespace.
    preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    preds = [pred.strip() for pred in preds]
    return preds


pred = predict_step(['ZZZTVESE.jpg'])
print(pred)  # ['zzztvese']
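The `num_beams` setting in `gen_kwargs` selects beam search during generation. The toy sketch below shows the general idea on a hand-made probability table; it is not the Transformers implementation (which adds details such as length penalties), and `beam_search` and `toy_step` are hypothetical names used only for illustration.

```python
import math


def toy_step(prefix):
    """Hand-made next-token log-probabilities standing in for a decoder."""
    if prefix == ():
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix == ("a",):
        return {"x": math.log(0.6), "y": math.log(0.4)}
    if prefix == ("b",):
        return {"z": math.log(0.95), "<eos>": math.log(0.05)}
    return {"<eos>": 0.0}  # every two-token sequence ends here


def beam_search(step, num_beams, max_length, eos="<eos>"):
    """Keep the num_beams best partial sequences at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_length):
        candidates = [
            (prefix + (token,), score + logp)
            for prefix, score in beams
            for token, logp in step(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:num_beams]:
            if prefix[-1] == eos:
                finished.append((prefix[:-1], score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off by max_length
    best_tokens, _ = max(finished, key=lambda c: c[1])
    return "".join(best_tokens)


print(beam_search(toy_step, num_beams=1, max_length=16))  # ax
print(beam_search(toy_step, num_beams=2, max_length=16))  # bz
```

With a single beam the search commits to "a" because it is the best first token, and ends with "ax" (probability 0.36). With two beams it also keeps "b" alive and finds "bz" (probability 0.38), which is the kind of early mistake a wider beam can avoid.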

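✨ Key Features

Image captioning: converts images to text, which suits image captioning tasks.
Fine-tuned: fine-tuned from vit-gpt2 to better fit the CAPTCHA recognition task.
Rich training data: trained on a set of 60,000 images, which supports the model's accuracy and generalization.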
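The page reports no accuracy figures, so it can be useful to measure exact-match accuracy on your own labeled images. The sketch below assumes a hypothetical convention in which the file name encodes the answer (as with `ZZZTVESE.jpg` and `zzztvese` in the Quick Start example); `label_from_path` and `exact_match_accuracy` are illustrative names, not part of the model.

```python
from pathlib import Path


def label_from_path(image_path):
    """Assumed convention: the file stem is the CAPTCHA text, case-insensitive."""
    return Path(image_path).stem.lower()


def exact_match_accuracy(image_paths, preds):
    """Fraction of predictions that match the file-name label exactly."""
    if len(image_paths) != len(preds):
        raise ValueError("image_paths and preds must have the same length")
    if not image_paths:
        return 0.0
    hits = sum(
        label_from_path(path) == pred.strip().lower()
        for path, pred in zip(image_paths, preds)
    )
    return hits / len(image_paths)
```

Given a list of paths, the predictions returned by `predict_step` can be passed in directly, for example `exact_match_accuracy(paths, predict_step(paths))`. Exact match is a strict metric: a single wrong character counts the whole CAPTCHA as a miss, which mirrors how a website would treat it.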
