BiRefNet
BiRefNet is a model for high-resolution dichotomous image segmentation, achieving SOTA performance on multiple tasks such as DIS, HRSOD, and COD.
Quick Start
This repo is the official implementation of "Bilateral Reference for High-Resolution Dichotomous Image Segmentation" (CAAI AIR 2024). Visit our GitHub repo: https://github.com/ZhengPeng7/BiRefNet for more details -- codes, docs, and model zoo!
✨ Features
- Multiple Applications: Suitable for background removal, mask generation, dichotomous image segmentation, camouflaged object detection, and salient object detection.
- Trained on DIS-TR: Trained on the DIS-TR dataset and validated on DIS-TEs and DIS-VD.
- SOTA Performance: Achieved state-of-the-art performance on three tasks (DIS, HRSOD, and COD).
📦 Installation
0. Install Packages:
pip install -qr https://raw.githubusercontent.com/ZhengPeng7/BiRefNet/main/requirements.txt
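The usage examples below move the model to a CUDA device and run it in half precision, so it is worth confirming first that your PyTorch build can see a GPU. A minimal, optional sanity check (nothing here is specific to BiRefNet):

```python
import torch

# The examples below call .to('cuda') and .half(), so a CUDA-capable GPU
# and a CUDA build of PyTorch are assumed.
print(torch.__version__)
print(torch.cuda.is_available())
```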
💻 Usage Examples
Basic Usage
Load BiRefNet:
Use codes + weights from HuggingFace
Load everything directly from HuggingFace -- Pro: no need to download the BiRefNet codes manually; Con: the codes on HuggingFace might not be the latest version (I'll try to keep them up to date).
from transformers import AutoModelForImageSegmentation
birefnet = AutoModelForImageSegmentation.from_pretrained('ZhengPeng7/BiRefNet', trust_remote_code=True)
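Since `trust_remote_code=True` also pulls the model code from the Hub, you can optionally pin a revision for reproducibility. This is only a sketch; the `revision` value below is a placeholder, not something required for normal use:

```python
from transformers import AutoModelForImageSegmentation

# 'main' is a placeholder -- replace it with a specific commit hash or tag
# from the HF repo if you want fully reproducible downloads.
birefnet = AutoModelForImageSegmentation.from_pretrained(
    'ZhengPeng7/BiRefNet', trust_remote_code=True, revision='main'
)
```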
Use codes from GitHub + weights from HuggingFace
Use the weights from HuggingFace but the latest codes from GitHub -- Pro: the codes are always the latest; Con: you need to clone the BiRefNet repo from my GitHub.
# Download codes
git clone https://github.com/ZhengPeng7/BiRefNet.git
cd BiRefNet
from models.birefnet import BiRefNet
birefnet = BiRefNet.from_pretrained('ZhengPeng7/BiRefNet')
Use codes from GitHub + weights from local space
Use both the codes and the weights locally.
# Use codes and weights locally (run inside the cloned BiRefNet repo)
import torch
from models.birefnet import BiRefNet
from utils import check_state_dict

birefnet = BiRefNet(bb_pretrained=False)
state_dict = torch.load(PATH_TO_WEIGHT, map_location='cpu')
state_dict = check_state_dict(state_dict)
birefnet.load_state_dict(state_dict)
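To confirm the local weights loaded correctly, a quick smoke test can be run before moving on. This is only a sketch; the expected output shape is an assumption based on the 1024x1024 input size used in the inference example below.

```python
import torch

# Dummy forward pass on CPU; as in the inference example below, the last
# element of the returned predictions is taken as the final map.
birefnet.eval()
with torch.no_grad():
    dummy = torch.randn(1, 3, 1024, 1024)
    pred = birefnet(dummy)[-1]
print(pred.shape)  # expected: a single-channel map at the input resolution
```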
Use the loaded BiRefNet for inference
from PIL import Image
import matplotlib.pyplot as plt
import torch
from torchvision import transforms
from models.birefnet import BiRefNet
birefnet = ...  # load BiRefNet with any of the options above
torch.set_float32_matmul_precision(['high', 'highest'][0])
birefnet.to('cuda')
birefnet.eval()
birefnet.half()
def extract_object(birefnet, imagepath):
    # Data settings
    image_size = (1024, 1024)
    transform_image = transforms.Compose([
        transforms.Resize(image_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    image = Image.open(imagepath)
    input_images = transform_image(image).unsqueeze(0).to('cuda').half()

    # Prediction
    with torch.no_grad():
        preds = birefnet(input_images)[-1].sigmoid().cpu()
    pred = preds[0].squeeze()

    # Resize the mask back to the original size and use it as the alpha channel
    pred_pil = transforms.ToPILImage()(pred)
    mask = pred_pil.resize(image.size)
    image.putalpha(mask)
    return image, mask
plt.axis("off")
plt.imshow(extract_object(birefnet, imagepath='PATH-TO-YOUR_IMAGE.jpg')[0])
plt.show()
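To process many images, the `extract_object` helper above can simply be looped over a folder. A minimal sketch; the `inputs`/`outputs` folder names are placeholders:

```python
from pathlib import Path

out_dir = Path('outputs')
out_dir.mkdir(exist_ok=True)
for img_path in sorted(Path('inputs').glob('*.jpg')):
    image_rgba, mask = extract_object(birefnet, imagepath=str(img_path))
    # Save as PNG to keep the alpha matte added by putalpha().
    image_rgba.save(out_dir / f'{img_path.stem}.png')
```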
Advanced Usage
Use the inference endpoint from your local code:
You may need to click deploy and set up the endpoint yourself, which incurs some cost.
import requests
import base64
from io import BytesIO
from PIL import Image
YOUR_HF_TOKEN = 'xxx'
API_URL = "xxx"
headers = {
"Authorization": "Bearer {}".format(YOUR_HF_TOKEN)
}
def base64_to_bytes(base64_string):
    # Remove the data URI prefix if present
    if "data:image" in base64_string:
        base64_string = base64_string.split(",")[1]
    # Decode the Base64 string into raw bytes
    image_bytes = base64.b64decode(base64_string)
    return image_bytes

def bytes_to_base64(image_bytes):
    # Wrap the raw bytes and open them as a PIL image
    image_stream = BytesIO(image_bytes)
    image = Image.open(image_stream)
    return image

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "https://hips.hearstapps.com/hmg-prod/images/gettyimages-1229892983-square.jpg",
    "parameters": {}
})
output_image = bytes_to_base64(base64_to_bytes(output))
output_image
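Despite its name, the `bytes_to_base64` helper above returns a `PIL.Image` object, so the endpoint result can be saved directly:

```python
# Save the decoded endpoint output; the file name is arbitrary.
output_image.save('endpoint_output.png')
```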
Documentation
Information Table
| Property | Details |
|----------|---------|
| Model Type | BiRefNet for standard dichotomous image segmentation (DIS) |
| Training Data | Trained on DIS-TR and validated on DIS-TEs and DIS-VD |
Try our online demos for inference:
- Online Image Inference on Colab
- Online Inference with GUI on Hugging Face with adjustable resolutions
- Inference and evaluation of your given weights
Acknowledgement:
- Many thanks to @Freepik for their generous support on GPU resources for training higher resolution BiRefNet models and more of my explorations.
- Many thanks to @fal for their generous support on GPU resources for training better general BiRefNet models.
- Many thanks to @not-lain for his help on the better deployment of our BiRefNet model on HuggingFace.
Citation
@article{zheng2024birefnet,
title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
journal={CAAI Artificial Intelligence Research},
volume = {3},
pages = {9150038},
year={2024}
}
License
This project is licensed under the MIT license. See the LICENSE file in the GitHub repository for details.