donut-base-japanese-visual-novel Open-source Model - Accurately Identify Visual Novel Text and Options

Donut Base Japanese Visual Novel

Developed by oshizo

This model is a fine-tune of naver-clova-ix/donut-base, trained on a synthetic dataset of visual novel-style images and designed to recognize dialogue text and options in visual novels.

Image-to-Text

Transformers

Japanese

Open Source License: MIT

#Visual Novel Text Recognition #Japanese Document Parsing #Game UI Extraction

Downloads 14

Release Time: 5/3/2023

Model Overview

The Donut model is fine-tuned to recognize text content in visual novel-style images, including dialogues, options, and character names.

Model Features

Specialized for Visual Novels

Optimized for visual novel-style images; recognizes dialogue, options, and character names.

Layout Adaptation

The training data covers several common visual novel layouts and their variants, so the model can handle a range of formatting styles.

Furigana Filtering

Designed to ignore furigana (phonetic annotations) and focus on accurately reading the main text content.

UI Element Filtering

Trained to avoid, as far as possible, reading non-dialogue UI elements such as SAVE and LOAD buttons and date displays.

Model Capabilities

Visual Novel Image Recognition

Japanese Text Extraction

Dialogue Option Parsing

Character Name Recognition

Use Cases

Game Development

Visual Novel Text Extraction

Automatically recognizes dialogue content and options in visual novel game screenshots

Outputs the dialogue information as structured JSON

Game Testing Automation

Used for automated testing of text display in visual novel games

Verifies whether game text is displayed correctly
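As a minimal sketch of such a check, the snippet below compares the text recognized from a screenshot with the line the game script expects. It is an illustration, not part of the model: the helper names and the 0.95 threshold are assumptions, and the comparison uses only the Python standard library.

```python
import difflib
import unicodedata


def normalize_text(text: str) -> str:
    """Fold full-width/half-width variants (NFKC) and drop all whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(folded.split())


def text_similarity(expected: str, recognized: str) -> float:
    """Return a similarity ratio in [0, 1] between the two normalized strings."""
    matcher = difflib.SequenceMatcher(
        None, normalize_text(expected), normalize_text(recognized)
    )
    return matcher.ratio()


def is_displayed_correctly(
    expected: str, recognized: str, threshold: float = 0.95
) -> bool:
    """Hypothetical pass/fail rule: the threshold is an assumed value to tune."""
    return text_similarity(expected, recognized) >= threshold
```

NFKC folding is used so that a full-width "！" in the script and a half-width "!" in the recognized text are not counted as a mismatch.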

Localization Tools

Translation Assistance

Extracts visual novel text for translation work

Provides accurate extraction of text to be translated

🚀 Donut (base-sized model, fine-tuned on a synthetic visual novel-like dataset)

This is a model obtained by fine-tuning naver-clova-ix/donut-base on a synthetic dataset of visual novel-like images.

🚀 Quick Start

Please refer to the sample notebook sample_predictions_colab.ipynb.

oshizo/donut-base-japanese-visual-novel
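The notebook is the authoritative reference. For rough orientation only, the sketch below follows the generic Donut inference pattern from the Hugging Face Transformers documentation; it has not been verified against this model. The task start token (`task_prompt`) is not stated on this page and has to be taken from the sample notebook. `read_screenshot` and `strip_special_tokens` are illustrative names.

```python
import re

MODEL_ID = "oshizo/donut-base-japanese-visual-novel"


def strip_special_tokens(sequence: str, eos_token: str, pad_token: str) -> str:
    """Remove EOS/PAD markers and the leading task-start tag from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def read_screenshot(image_path: str, task_prompt: str) -> dict:
    """Run one screenshot through the model and return the parsed dictionary.

    task_prompt is model-specific (assumption: copy it from the sample notebook).
    """
    # Heavy dependencies are imported lazily so the helper above stays usable alone.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = strip_special_tokens(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json converts Donut's tag sequence into a nested dictionary.
    return processor.token2json(sequence)
```

The model and processor are loaded inside the function for brevity; for more than a handful of images, loading them once and reusing them would be the more sensible design.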

💻 Usage Examples

Basic Usage

Example outputs, one dictionary per input screenshot:

{'options': '', 'names': '結月', 'messages': 'この神社には古い言い伝えがあるの。神樹の下で誓いを立てると、その願いは必ず叶うという。心を開いて、自分の想いを信じてみて。'}

{'options': ['行こう!', '今回は見送る', '準備を整えるまで待って(会話から抜けます)', '旅の目的について詳しく教えてください'], 'names': 'リリアン', 'messages': '私たちの使命は、新たな発見と交流を通じて地球と宇宙の未来を築くこと。この壮大な旅に参加する準備はできているかしら?'}

{'options': ['全力で攻撃する!勝利をつかめ!', '堅実に守り、敵の隙を待とう。'], 'names': '', 'messages': '敵を誘い込んで、戦術を駆使せよ。'}

{'options': 'もちろん、手伝います!', 'names': '下尾崎菊欠郎', 'messages': 'この書斎は重要な手がかりが隠されているかもしれない。君も協力してくれるか?'}

📚 Documentation

Specifications

It does not read furigana. The goal is to read the main text without being affected even if furigana is displayed.
The goal is to avoid reading UI elements such as "SAVE" and "LOAD" and date displays such as "Day 2" and "4/3" as much as possible.
It outputs a JSON with three keys: options, names, and messages.
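In the example outputs under Basic Usage, a field may be an empty string, a single string, or a list of strings (see `options`). Downstream code is usually simpler if every field has one shape. The following is a minimal sketch of such a normalization step; `as_list` and `normalize_prediction` are hypothetical helper names, and treating a missing key as empty is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List

FIELDS = ("options", "names", "messages")


def as_list(value: Any) -> List[str]:
    """Map '', a single string, or a list of strings to a list of strings."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value if item != ""]


def normalize_prediction(raw: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return a dict with exactly the three documented keys, each a list."""
    return {key: as_list(raw.get(key, "")) for key in FIELDS}
```

With this in place, a screenshot without choices yields `options == []`, and a screenshot with a single choice yields a one-element list, so callers can always iterate over `options`.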

Layouts included in training

The training data includes the following layouts, as well as variants of each layout in which some elements are absent.

Layouts not included in training

Patterns not included in the training data, such as the following, cannot be read well.

Other limitations

Because the model was trained and evaluated only on images 1,920px wide and 1,080px high, recognition accuracy may decrease when the aspect ratio differs significantly.
The decoder's tokenizer is based on XLMRobertaTokenizer with about 1,500 Japanese kanji added. Kanji that are missing from the tokenizer vocabulary cannot be output.
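For the image-size limitation above, one possible mitigation is to scale a screenshot and pad it onto a 1,920 × 1,080 canvas before inference. This is an assumption that has not been tested against this model, and it may not help if the layout inside the frame differs from the training layouts. The sketch computes the geometry in plain Python and applies it with Pillow; the function names are illustrative.

```python
from typing import Tuple

TARGET_W, TARGET_H = 1920, 1080  # resolution the model was trained on


def letterbox_geometry(
    width: int, height: int, target_w: int = TARGET_W, target_h: int = TARGET_H
) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_left, pad_top) that fits the image in the target."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Use the smaller scale factor so the whole image fits inside the canvas.
    scale = min(target_w / width, target_h / height)
    new_w = min(target_w, max(1, round(width * scale)))
    new_h = min(target_h, max(1, round(height * scale)))
    # Center the resized image; leftover space becomes padding.
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def letterbox_image(image):
    """Paste a resized copy of a PIL image onto a black 1920x1080 canvas."""
    from PIL import Image  # imported lazily; only needed for this function

    new_w, new_h, left, top = letterbox_geometry(*image.size)
    canvas = Image.new("RGB", (TARGET_W, TARGET_H), (0, 0, 0))
    canvas.paste(image.convert("RGB").resize((new_w, new_h)), (left, top))
    return canvas
```

For example, an 800 × 600 screenshot is scaled to 1,440 × 1,080 and centered with 240px of padding on each side.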

Training method

More detailed information is described in the following note article.

Memo on fine-tuning the end-to-end document image recognition model Donut

📄 License

This project is licensed under the MIT license.
