Janus-Pro-1B-ONNX: Open-Source Multimodal Model for Text-to-Image and Image-to-Text Tasks


Janus Pro 1B ONNX

Developed by onnx-community

Janus-Pro-1B is a multimodal causal language model that supports various tasks such as text-to-image and image-to-text.

Text-to-Image

Transformers

Open Source License: MIT
Tags: #Multimodal Generation #Image-Text Interconversion #LaTeX Formula Recognition

Downloads 3,010

Release Time: 1/27/2025

Model Overview

Janus-Pro-1B-ONNX provides the Janus-Pro-1B multimodal model as ONNX weights. It generates images from text and text from images, which makes it suitable for a range of cross-modal applications.

Model Features

Multimodal Support

Generates text from images and images from text, so a single model can handle cross-modal tasks.

ONNX Compatibility

Provides ONNX weights for easy deployment in environments like Transformers.js.

Efficient Generation

Supports efficient text and image generation, suitable for real-time application scenarios.

Model Capabilities

Text-to-Image Generation

Image-to-Text Generation

Image-Text-to-Text Generation

Use Cases

Content Generation

Image Captioning

Generates descriptive text based on input images.

Text-to-Image Generation

Generates corresponding images based on text descriptions.

Education

Formula Conversion

Converts mathematical formulas in images to LaTeX code.

🚀 Janus-Pro-1B ONNX for Transformers.js

This project adapts the deepseek-ai/Janus-Pro-1B model with ONNX weights to be compatible with the Transformers.js library, enabling multimodal tasks such as text-to-image, image-to-text, and image-text-to-text.

🚀 Quick Start

The model from https://huggingface.co/deepseek-ai/Janus-Pro-1B is adapted with ONNX weights to work seamlessly with Transformers.js.

📦 Installation

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @huggingface/transformers

💻 Usage Examples

Basic Usage

Image+text to text

import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/Janus-Pro-1B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);

// Prepare inputs
const conversation = [
  {
    role: "<|User|>",
    content: "<image_placeholder>\nConvert the formula into latex code.",
    images: ["https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/quadratic_formula.png"],
  },
];
const inputs = await processor(conversation);

// Generate response
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 150,
  do_sample: false,
});

// Decode output
const new_tokens = outputs.slice(null, [inputs.input_ids.dims.at(-1), null]);
const decoded = processor.batch_decode(new_tokens, { skip_special_tokens: true });
console.log(decoded[0]);
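
The decode step above keeps only the newly generated tokens: each sequence returned by generate starts with the prompt tokens, and the slice drops the first inputs.input_ids.dims.at(-1) columns while keeping every row. The sketch below shows the same idea on plain nested arrays. stripPromptTokens is a hypothetical helper used only for illustration; it is not part of Transformers.js, and the token ids are made up.

```javascript
// Hypothetical helper (not a Transformers.js API): mimics
// outputs.slice(null, [promptLength, null]) on plain nested arrays.
// Each row is one generated sequence: prompt tokens followed by new tokens.
function stripPromptTokens(sequences, promptLength) {
  // Keep every row (batch dimension), drop the first promptLength columns.
  return sequences.map((row) => row.slice(promptLength));
}

// Example with made-up token ids: a 3-token prompt followed by 2 new tokens.
const exampleSequences = [[101, 7, 8, 42, 43]];
console.log(stripPromptTokens(exampleSequences, 3)); // [ [ 42, 43 ] ]
```

Decoding only the trimmed tokens is what keeps the prompt text out of the printed answer.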

Advanced Usage

Text to image

import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/Janus-Pro-1B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);

// Prepare inputs
const conversation = [
  {
    role: "<|User|>",
    content: "A stunning princess from kabul in red, white traditional clothing, blue eyes, brown hair",
  },
];
const inputs = await processor(conversation, { chat_template: "text_to_image" });

// Generate response
const num_image_tokens = processor.num_image_tokens;
const outputs = await model.generate_images({
  ...inputs,
  min_new_tokens: num_image_tokens,
  max_new_tokens: num_image_tokens,
  do_sample: true,
});

// Save the generated image
await outputs[0].save("test.png");
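
The text-to-image steps above can be wrapped in a small reusable function. This is a minimal sketch, not an official API: generateImage and its outputPath parameter are hypothetical names, and the function assumes processor and model were loaded with from_pretrained as shown above. Setting min_new_tokens and max_new_tokens to the same value asks the model for exactly processor.num_image_tokens tokens, mirroring the example.

```javascript
// Hypothetical wrapper around the text-to-image example above.
// `processor` and `model` are assumed to be the objects returned by
// AutoProcessor.from_pretrained and MultiModalityCausalLM.from_pretrained.
async function generateImage(processor, model, prompt, outputPath = "output.png") {
  // Single-turn conversation using the same role string as the example.
  const conversation = [{ role: "<|User|>", content: prompt }];
  const inputs = await processor(conversation, { chat_template: "text_to_image" });

  // Request exactly as many new tokens as one image needs.
  const numImageTokens = processor.num_image_tokens;
  const images = await model.generate_images({
    ...inputs,
    min_new_tokens: numImageTokens,
    max_new_tokens: numImageTokens,
    do_sample: true,
  });

  // Save the first generated image and report where it was written.
  await images[0].save(outputPath);
  return outputPath;
}
```

A call would look like await generateImage(processor, model, "A watercolor fox", "fox.png"). Passing the processor and model in as arguments means they are loaded once and reused across prompts.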

📄 License

This project is licensed under the MIT license.

📚 Documentation

Property	Details
Model Type	deepseek-ai/Janus-Pro-1B
Pipeline Tag	any-to-any
Library Name	transformers.js
Tags	text-to-image, image-to-text, image-text-to-text
