doubutsu-2b-pt-756
doubutsu is a family of small Vision-Language Models (VLMs) designed to be finetuned for specific use cases. It's built by @qtnx_ and @yeswondwerr.
Quick Start
This model is not intended for standalone use. You need to either finetune it with this notebook or use an existing adapter.
Usage Tip
These models work best at low sampling temperatures. We recommend a temperature between 0.1 and 0.3.
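To see why a low temperature makes output more deterministic, here is a minimal sketch of temperature-scaled softmax, the standard way a temperature is applied to next-token scores. It is a general illustration, not doubutsu's own sampling code; the function name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.3, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As the temperature drops toward the recommended range, almost all probability mass moves to the highest-scoring token, so sampling rarely picks a low-scoring one.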
Usage Examples
Basic Usage
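import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "qresearch/doubutsu-2b-pt-756"

# Load the base model in half precision on the GPU.
# trust_remote_code=True lets transformers run the custom model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda")

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    use_fast=True,
)

# Attach a finetuned adapter; the base model is not meant to be used alone.
model.load_adapter("qresearch/doubutsu-2b-lora-756-docci")

# Replace "IMAGE" with the path to an image file.
image = Image.open("IMAGE")

print(
    model.answer_question(
        image, "Describe the image", tokenizer, max_new_tokens=128, temperature=0.1
    ),
)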
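If you want to caption several images with the same settings, the call above can be wrapped in a small helper. This is only a sketch: `describe_images` and `RECOMMENDED_TEMPERATURE` are hypothetical names, and the helper assumes `model.answer_question` takes the same arguments as in the basic example.

```python
# Range suggested in the usage tip; the constant name is an assumption.
RECOMMENDED_TEMPERATURE = (0.1, 0.3)


def describe_images(model, tokenizer, images, prompt="Describe the image",
                    max_new_tokens=128, temperature=0.1):
    """Ask the same question about each image and return the answers in order."""
    low, high = RECOMMENDED_TEMPERATURE
    if not low <= temperature <= high:
        raise ValueError(
            f"temperature {temperature} is outside the recommended range {low}-{high}"
        )
    return [
        # Same call shape as the basic usage example.
        model.answer_question(
            image, prompt, tokenizer,
            max_new_tokens=max_new_tokens, temperature=temperature,
        )
        for image in images
    ]
```

With the `model` and `tokenizer` from the basic example, it could be called as `describe_images(model, tokenizer, [Image.open("IMAGE")])`. Rejecting out-of-range temperatures is a design choice for this sketch; you may prefer a warning instead.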
License
This project is licensed under the Apache-2.0 license.
Acknowledgements
- Liu et al.: LLaVA
- Moon et al.: AnyMAL
- vikhyatk: moondream codebase
| Property | Details |
|---|---|
| Pipeline Tag | image-text-to-text |
| Datasets | liuhaotian/LLaVA-CC3M-Pretrain-595K |
| License | Apache-2.0 |