Liquid_V1_7B Open-Source Model - Fusing Image Codes and Text Tokens for Visual Understanding and Generation

Liquid V1 7B

Developed by Junfeng5

Liquid is an autoregressive generation paradigm that achieves seamless fusion of visual understanding and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens in a shared feature space.

Text-to-Image

Transformers

English

Open Source License: MIT

#Multimodal Generation #Autoregressive Model #Image-Text Fusion

Downloads 11.35k

Release Time: 2/21/2025

Model Overview

Liquid is an innovative Multimodal Large Language Model (MLLM) that seamlessly integrates vision and text using only a single Large Language Model (LLM), without relying on externally pre-trained visual embeddings.

Model Features

Single-Model Multimodal Fusion

Achieves seamless fusion of vision and text using only a single Large Language Model (LLM), without relying on externally pre-trained visual embeddings.

Autoregressive Generation Paradigm

Tokenizes images into discrete codes and learns these code embeddings alongside text tokens in a shared feature space.

Multi-Scale Variants

Provides six pre-trained versions with parameter sizes ranging from 0.5B to 32B, and a 7B instruction-tuned version based on GEMMA.

Mutual Promotion of Understanding and Generation

Explores scaling laws for multimodal hybrid models, discovering mutual promotion between understanding tasks and generation tasks.

Model Capabilities

Text Generation

Image Generation

Visual Understanding

Multimodal Fusion

Use Cases

Content Creation

Multimodal Content Generation

Generate images from text descriptions, or generate descriptive text from images.

Achieves seamless conversion between text and images.

Education

Interactive Learning Tool

Helps students understand complex concepts through multimodal interaction.

Enhances learning experience and comprehension.

🚀 Liquid - An Auto-Regressive Generation Paradigm

Liquid is an auto-regressive generation paradigm that integrates visual comprehension and generation. It tokenizes images into discrete codes and learns the code embeddings together with text tokens in a shared vision-language feature space. Unlike previous MLLMs, it uses a single LLM and needs no external pretrained visual embeddings such as CLIP. It also explores the scaling law of this multimodal hybrid model and finds that understanding and generation tasks mutually promote each other.

📚 Documentation

Model Details

We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. Liquid explores the scaling law of this multimodal hybrid model and discovers the phenomenon of mutual promotion between understanding and generation tasks.
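The idea of learning image codes and text tokens in one feature space can be illustrated with a small sketch. It assumes that image codes are appended to the text vocabulary by a fixed offset, so one embedding table serves both modalities. The offset scheme, the sizes, and all function names below are hypothetical and chosen for illustration only; the source does not describe Liquid's actual ID layout.

```python
import random

# Hypothetical sizes for illustration; these are NOT Liquid's real values.
TEXT_VOCAB_SIZE = 1000      # ordinary text tokens occupy IDs 0..999
IMAGE_CODEBOOK_SIZE = 512   # discrete image codes produced by an image tokenizer
EMBED_DIM = 8               # width of the shared embedding space


def image_code_to_token_id(code):
    """Map a discrete image code to its ID in the unified vocabulary."""
    if not 0 <= code < IMAGE_CODEBOOK_SIZE:
        raise ValueError(f"image code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def token_id_to_image_code(token_id):
    """Inverse mapping: recover the image code from a unified token ID."""
    if not is_image_token(token_id):
        raise ValueError(f"not an image token: {token_id}")
    return token_id - TEXT_VOCAB_SIZE


def is_image_token(token_id):
    """True if the ID falls in the range reserved for image codes."""
    return TEXT_VOCAB_SIZE <= token_id < TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE


def build_sequence(text_ids, image_codes):
    """Concatenate text tokens and image codes into one token sequence."""
    return list(text_ids) + [image_code_to_token_id(c) for c in image_codes]


# A single embedding table covers text tokens and image codes alike,
# so both modalities live in the same feature space.
_rng = random.Random(0)
embedding_table = [
    [_rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
    for _ in range(TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE)
]


def embed(token_ids):
    """Look up one embedding vector per token, regardless of modality."""
    return [embedding_table[t] for t in token_ids]
```

Under these assumptions, `build_sequence([5, 17, 42], [0, 3, 511])` yields a single list of IDs that `embed` can process without any separate vision encoder, which is the property the paragraph above describes.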

Variations: Liquid comes in six pre-trained sizes (0.5B, 1B, 2B, 7B, 9B, and 32B parameters, drawn from multiple model families) and one instruction-tuned 7B variant (from GEMMA).

Input: The models take text and images as input.

Output: The models generate text or images.

Model Architecture: Liquid is an auto-regressive model that extends existing LLMs and uses a transformer architecture.
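As a rough illustration of what auto-regressive generation means here, the sketch below appends one token at a time, each chosen from scores computed over the sequence so far. It is a minimal sketch only: `toy_scores` is a hypothetical stand-in for a transformer, `EOS_ID` is an assumed end-of-sequence ID, and greedy selection is an assumption, since the source does not state which decoding strategy Liquid uses.

```python
from typing import Callable, List

EOS_ID = 0  # hypothetical end-of-sequence token ID


def greedy_decode(
    next_token_scores: Callable[[List[int]], List[float]],
    prompt: List[int],
    max_new_tokens: int,
) -> List[int]:
    """Extend the prompt one token at a time, always taking the top-scoring ID."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == EOS_ID:
            break  # stop once the end-of-sequence token is produced
    return tokens


def toy_scores(tokens: List[int]) -> List[float]:
    """Toy stand-in for a model: favors (last token + 1) modulo a 5-token vocabulary."""
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]
```

In a unified-vocabulary model, the same loop would emit text tokens or image codes depending on which IDs score highest; decoding the image codes back to pixels would be a separate step not shown here.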

Citation

@article{wu2024liquid,
    title={Liquid: Language Models are Scalable Multi-modal Generators},
    author={Wu, Junfeng and Jiang, Yi and Ma, Chuofan and Liu, Yuliang and Zhao, Hengshuang and Yuan, Zehuan and Bai, Song and Bai, Xiang},
    journal={arXiv preprint arXiv:2412.04332},
    year={2024}
}

📄 License

This project is licensed under the MIT License.

Information Table

Property	Details
Library Name	transformers
Datasets	mlfoundations/dclm-baseline-1.0, cerebras/SlimPajama-627B, bigcode/starcoderdata, JourneyDB/JourneyDB
Language	en
Base Model	google/gemma-7b
Pipeline Tag	any-to-any
