Fuyu-8B

Developed by Adept AI
Fuyu-8B is a multimodal text-and-image transformer developed by Adept AI. It is designed for digital agents, supports arbitrary image resolutions, responds quickly, and has a streamlined architecture.
Downloads 14.22k
Release date: 10/17/2023

Model Overview

Fuyu-8B is a multimodal model that takes image and text inputs and generates text output. It is particularly suited to digital agent applications such as parsing charts and answering questions about user interfaces.

Model Features

Streamlined Architecture
Uses a decoder-only Transformer with no separate image encoder. Image patches are linearly projected and fed directly into the first Transformer layer, which makes the architecture simple to understand, scale, and deploy.
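
The patch-projection step described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not Adept's implementation: the patch size, the small hidden width, the random weights, and the function names `patchify` and `project_patches` are all assumptions chosen for the example.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened patches in raster-scan order."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("this sketch expects H and W to be multiples of the patch size")
    rows, cols = h // patch, w // patch
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    grid = image.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    # One row per patch, left to right and then top to bottom.
    return grid.reshape(rows * cols, patch * patch * c)


def project_patches(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear projection that maps each flattened patch to one embedding vector."""
    return patches @ weight + bias


# Assumed, illustrative sizes (not taken from the model card).
PATCH, HIDDEN = 30, 64

rng = np.random.default_rng(0)
image = rng.random((60, 90, 3))                      # a 60x90 RGB image
patches = patchify(image, PATCH)                     # 2 rows x 3 cols = 6 patches
weight = rng.normal(scale=0.02, size=(PATCH * PATCH * 3, HIDDEN))
bias = np.zeros(HIDDEN)
tokens = project_patches(patches, weight, bias)      # what the first layer would receive

print(patches.shape, tokens.shape)
```

In this sketch a single matrix multiply does the work that a separate image encoder would do in other designs, so the rows of `tokens` can sit in the same sequence as text embeddings.
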
Arbitrary Image Resolution Support
Treats the sequence of image tokens like the sequence of text tokens: there are no image-specific position embeddings, and the model is fed as many image tokens as the image requires, in raster-scan order. This is what lets it accept arbitrary image resolutions.
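
To make the variable-length idea concrete, the sketch below builds a token layout for an image of any size. It is a hedged illustration, not the model's actual tokenizer: the patch size of 30, the rounding up of partial patches, the `<row-break>` marker that ends each row of patches, and the helper names are all assumptions made for this example.

```python
import math

# Hypothetical marker telling the model that a row of patches has ended.
IMAGE_NEWLINE = "<row-break>"


def image_token_layout(height: int, width: int, patch: int = 30) -> tuple[int, int]:
    """Return (rows, cols) of patches needed to cover the image.

    Partial patches at the right or bottom edge are rounded up, which
    assumes the image is padded to a whole number of patches.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return math.ceil(height / patch), math.ceil(width / patch)


def build_sequence(height: int, width: int, prompt_tokens: list[str], patch: int = 30) -> list[str]:
    """Lay out image tokens in raster-scan order, then append the text prompt."""
    rows, cols = image_token_layout(height, width, patch)
    sequence: list[str] = []
    for r in range(rows):
        # Left to right within a row, rows from top to bottom.
        sequence.extend(f"patch[{r},{c}]" for c in range(cols))
        sequence.append(IMAGE_NEWLINE)
    sequence.extend(prompt_tokens)
    return sequence


small = build_sequence(60, 90, ["What", "is", "shown", "?"])
large = build_sequence(100, 100, ["What", "is", "shown", "?"])
print(len(small), len(large))
```

Nothing in the sequence encodes a fixed grid size, so a larger image simply produces a longer sequence, in the same way a longer sentence does.
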
Fast Response
Returns responses for large images in under 100 milliseconds, making it suitable for real-time applications.
Multi-Scenario Optimization
Although optimized for digital agent scenarios, it still performs well on standard image-understanding benchmarks, and it responds well to few-shot prompting and to fine-tuning for a variety of use cases.

Model Capabilities

Image Understanding
Text Generation
Chart Parsing
User Interface Question Answering
Fine-Grained Screen Image Localization

Use Cases

Digital Agent
Chart Parsing
Parse chart data and answer related questions
Scored 64.5 on AI2D, a diagram-understanding benchmark
User Interface Interaction
Answer user interface-based questions
Image Understanding
Visual Question Answering
Answer natural language questions about image content
Scored 74.2 on the VQAv2 test
Image Caption Generation
Generate COCO-style image captions
Scored 141 on the COCO Captions benchmark