# Janus-1.3B ONNX with Transformers.js
This project makes [deepseek-ai/Janus-1.3B](https://huggingface.co/deepseek-ai/Janus-1.3B) compatible with Transformers.js by providing ONNX weights. It supports multiple modalities, including text-to-image, image-to-text, and image-text-to-text.
## Quick Start

### Installation
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
```bash
npm i @huggingface/transformers
```
### Usage Examples

#### Example 1: Image+Text to Text
```js
import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/Janus-1.3B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);

// Prepare inputs
const conversation = [
  {
    role: "User",
    content: "<image_placeholder>\nConvert the formula into latex code.",
    images: ["https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/quadratic_formula.png"],
  },
];
const inputs = await processor(conversation);

// Generate response
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 150,
  do_sample: false,
});

// Decode only the newly generated tokens
const new_tokens = outputs.slice(null, [inputs.input_ids.dims.at(-1), null]);
const decoded = processor.batch_decode(new_tokens, { skip_special_tokens: true });
console.log(decoded[0]);
```
Sample output:

```
Sure, here is the LaTeX code for the given formula:

x = \frac{-b \pm \sqrt{b^2 - 4a c}}{2a}

This code represents the mathematical expression for the variable \( x \).
```
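The `outputs.slice(null, [inputs.input_ids.dims.at(-1), null])` call strips the prompt tokens from the generated sequence, since `generate()` returns the prompt followed by the newly generated tokens. The same idea on plain arrays (a hypothetical helper for illustration, not part of the library):

```js
// Illustrative sketch: generate() returns the full sequence
// (prompt tokens + new tokens), so we drop the first `promptLength`
// entries to keep only the model's answer.
function sliceNewTokens(sequence, promptLength) {
  return sequence.slice(promptLength);
}

const promptIds = [101, 2023, 2003];             // 3 prompt tokens (made-up ids)
const fullOutput = [101, 2023, 2003, 7592, 999]; // prompt followed by 2 new tokens
console.log(sliceNewTokens(fullOutput, promptIds.length)); // → [ 7592, 999 ]
```

In the real example, the prompt length is read from `inputs.input_ids.dims.at(-1)`, the last dimension of the tokenized input tensor.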
#### Example 2: Text to Image
```js
import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/Janus-1.3B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);

// Prepare inputs
const conversation = [
  {
    role: "User",
    content: "A cute and adorable baby fox with big brown eyes, autumn leaves in the background enchanting,immortal,fluffy, shiny mane,Petals,fairyism,unreal engine 5 and Octane Render,highly detailed, photorealistic, cinematic, natural colors.",
  },
];
const inputs = await processor(conversation, { chat_template: "text_to_image" });

// Generate image tokens
const num_image_tokens = processor.num_image_tokens;
const outputs = await model.generate_images({
  ...inputs,
  min_new_tokens: num_image_tokens,
  max_new_tokens: num_image_tokens,
  do_sample: true,
});

// Save the generated image
await outputs[0].save("test.png");
```
Sample outputs: *(generated example images not reproduced here; results vary between runs because sampling is enabled)*
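Because `do_sample: true` is set, image tokens are drawn stochastically from the model's output distribution rather than greedily, which is why every run yields a different image. A minimal sketch of temperature-based softmax sampling in plain JavaScript (illustrative only; this is not the library's internal implementation):

```js
// Convert raw logits into a probability distribution.
// Subtracting the max logit keeps Math.exp numerically stable.
function softmax(logits, temperature = 1.0) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draw a token index at random, weighted by the softmax probabilities.
function sampleToken(logits, temperature = 1.0) {
  const probs = softmax(logits, temperature);
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

const logits = [2.0, 1.0, 0.5];
console.log(softmax(logits));     // probabilities summing to 1
console.log(sampleToken(logits)); // index 0, 1, or 2, weighted by probability
```

Lower temperatures sharpen the distribution toward the highest-logit token; higher temperatures make the draws more uniform and the generated images more varied.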
## Try It Out

Want to play around with the model? Check out the online WebGPU demo.
## License

The license for this project is `other`.
## Documentation

### Model Information

| Property | Details |
|----------|---------|
| Model Type | Compatible with Transformers.js using ONNX weights, supporting text-to-image, image-to-text, and image-text-to-text modalities |
| Base Model | [deepseek-ai/Janus-1.3B](https://huggingface.co/deepseek-ai/Janus-1.3B) |
| Pipeline Tag | any-to-any |
| Library Name | transformers.js |
| Tags | text-to-image, image-to-text, image-text-to-text |