InfiMM-HD Open-Source Multimodal Model: Free to Deploy for Understanding and Generating Combined Image-and-Text Content

Developed by Infi-MM

InfiMM-HD is a high-resolution multimodal model that can understand and generate content combining images and text.

Image-to-Text

Transformers

English #high-resolution-multimodal #image-to-text #multimodal-understanding

Downloads 17

Release Time : 3/3/2024

Model Overview

The model specializes in high-resolution multimodal understanding and handles joint image-text tasks such as image caption generation.

Model Features

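High-resolution image understanding

Processes high-resolution images and extracts rich visual information.

Multimodal fusion

Fuses visual and textual information effectively to achieve cross-modal understanding.

Chinese optimization

Specifically optimized for Chinese-language scenarios.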

Model Capabilities

Image captioning

Visual question answering

Multimodal content understanding

Image-to-text

Use Cases

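Content generation

Automatic image description

Generates detailed Chinese-language descriptions of images.

Produces accurate, richly detailed image descriptions.

Assistive tools

Visual assistance

Helps visually impaired users understand image content.

Provides detailed textual descriptions of images.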

🚀 InfiMM-HD

A multimodal text-generation model that takes combined image and text input and delivers high-resolution image understanding.

🚀 Quick Start

To get started with the base model, use the following code.

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("Infi-MM/infimm-hd", trust_remote_code=True)

prompts = [
    {
        "role": "user",
        "content": [
            {"image": "/xxx/test.jpg"},  # replace with the path to your image
            "Please describe the image in detail.",
        ],
    }
]
inputs = processor(prompts)
# load the model in bf16 on GPU 0
model = AutoModelForCausalLM.from_pretrained(
    "Infi-MM/infimm-hd",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(0).eval()

inputs["batch_images"] = inputs["batch_images"].to(torch.bfloat16)
for k in inputs:
    inputs[k] = inputs[k].to(model.device)

generated_ids = model.generate(
    **inputs,
    min_new_tokens=0,
    max_new_tokens=256,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_text)
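
The prompt structure in the example above can be wrapped in a small helper so that captioning and visual question answering reuse the same code path. The following is a minimal sketch: `build_prompt` is a hypothetical helper, not part of the InfiMM-HD code, and it assumes that a question-answering request uses the same single-turn structure as the captioning request, with only the instruction text changed.

```python
from typing import Any, Dict, List


def build_prompt(image_path: str, instruction: str) -> List[Dict[str, Any]]:
    """Build a single-turn user prompt in the structure used by the Quick Start example.

    Hypothetical helper for illustration; it only assembles the Python data
    structure and does not load the image or call the model.
    """
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    return [
        {
            "role": "user",
            "content": [
                {"image": image_path},  # path to a local image file
                instruction,  # free-form instruction or question
            ],
        }
    ]


# Assumption: captioning and VQA differ only in the instruction text.
caption_prompt = build_prompt("/xxx/test.jpg", "Please describe the image in detail.")
vqa_prompt = build_prompt("/xxx/test.jpg", "How many people are in the image?")
```

Either result can be passed to `processor(...)` in place of the hand-written `prompts` list from the example above.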

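📚 Documentation

For details, see the paper (arXiv:2403.01487, cited below). The pretrained model and PyTorch code are available in the project's GitHub repository, and you can build your own model starting from the pretrained model.

📄 License

This project is released under the CC BY-NC 4.0 license.

Copyright in the images belongs to their original authors.

See LICENSE for details.

Contact

If you have any questions, feel free to contact infimmbytedance@gmail.com.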

Citation

@misc{liu2024infimmhd,
      title={InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding}, 
      author={Haogeng Liu and Quanzeng You and Xiaotian Han and Yiqi Wang and Bohan Zhai and Yongfei Liu and Yunzhe Tao and Huaibo Huang and Ran He and Hongxia Yang},
      year={2024},
      eprint={2403.01487},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}