🚀 OpenFly
OpenFly是一个平台,包含用于空中视觉语言导航(VLN)的多功能工具链和大规模基准测试。该代码完全基于HuggingFace,简洁且性能高效。
✨ 主要特性
- 提供多功能工具链和大规模基准测试,适用于空中视觉语言导航。
- 代码基于HuggingFace,简洁且性能高效。
📦 安装指南
文档未提及具体安装步骤,暂不提供。
💻 使用示例
基础用法
from typing import Dict, List, Optional, Union
from pathlib import Path
import numpy as np
import torch
from PIL import Image
from transformers import LlamaTokenizerFast
from transformers import AutoConfig, AutoImageProcessor, AutoModelForVision2Seq, AutoProcessor
import os, json
from model.prismatic import PrismaticVLM
from model.overwatch import initialize_overwatch
from model.action_tokenizer import ActionTokenizer
from model.vision_backbone import DinoSigLIPViTBackbone, DinoSigLIPImageTransform
from model.llm_backbone import LLaMa2LLMBackbone
from extern.hf.configuration_prismatic import OpenFlyConfig
from extern.hf.modeling_prismatic import OpenVLAForActionPrediction
from extern.hf.processing_prismatic import PrismaticImageProcessor, PrismaticProcessor
AutoConfig.register("openvla", OpenFlyConfig)
AutoImageProcessor.register(OpenFlyConfig, PrismaticImageProcessor)
AutoProcessor.register(OpenFlyConfig, PrismaticProcessor)
AutoModelForVision2Seq.register(OpenFlyConfig, OpenVLAForActionPrediction)
model_name_or_path="IPEC-COMMUNITY/openfly-agent-7b"
processor = AutoProcessor.from_pretrained(model_name_or_path)
model = AutoModelForVision2Seq.from_pretrained(
model_name_or_path,
attn_implementation="flash_attention_2",
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=True,
trust_remote_code=True,
).to("cuda:0")
image = Image.fromarray(cv2.imread("example.png"))
prompt = "Take off, go straight pass the river"
inputs = processor(prompt, [image, image, image]).to("cuda:0", dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key="vln_norm", do_sample=False)
print(action)
📚 详细文档
如需完整详细信息,请阅读我们的论文并查看我们的项目页面。
🔧 技术细节
模型详情
📄 许可证
本项目采用MIT许可证。