
FLIP Base 32

Developed by FLIP-dataset
This is a vision-language model based on the CLIP architecture, specifically post-trained on 80 million face images.
Release Time: 6/28/2023

Model Overview

Built on the CLIP architecture, this model improves performance on face-related tasks through post-training on 80 million face images. It is suited to tasks such as face recognition and face image retrieval.

Model Features

Large-scale face data training
Post-trained on 80 million face images to enhance performance on face-related tasks.
Based on CLIP architecture
Inherits the powerful visual-language alignment capabilities of the CLIP model.
Efficient training
Trained on 8 A100 GPUs using the TencentPretrain framework, which is optimized for training efficiency.
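CLIP-style post-training of the kind described above aligns image and text embeddings with a symmetric contrastive (InfoNCE) objective. A minimal NumPy sketch of that objective, using random stand-in embeddings rather than real model outputs, might look like:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    The 0.07 temperature is CLIP's published initial value, shown here
    only for illustration.
    """
    # L2-normalize so dot products are cosine similarities, as in CLIP.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # matched pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
aligned_loss = clip_contrastive_loss(img, img)  # identical pairs give a low loss
```

Minimizing this loss pulls each face image toward its paired caption and away from the other captions in the batch, which is what transfers CLIP's image-text alignment to the face domain.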

Model Capabilities

Face image feature extraction
Image-text matching
Face image retrieval
Cross-modal understanding
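The image-text matching capability listed above reduces, at inference time, to scoring candidate captions against an image by cosine similarity and taking a softmax. A small sketch with toy embeddings (the vectors and captions are illustrative, not model outputs):

```python
import numpy as np

def match_texts_to_image(image_emb, text_embs, temperature=0.07):
    """Rank candidate text embeddings against one image embedding.

    Returns softmax probabilities over the candidates, the standard
    scoring rule for CLIP-style models at inference time.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    exp = np.exp(logits - logits.max())   # stable softmax
    return exp / exp.sum()

# Toy embeddings: the first candidate points the same way as the image.
image = np.array([1.0, 0.0, 0.0])
texts = np.array([[0.9, 0.1, 0.0],    # "a photo of a face"  (close match)
                  [0.0, 1.0, 0.0],    # unrelated caption
                  [0.0, 0.0, 1.0]])   # unrelated caption
probs = match_texts_to_image(image, texts)
best = int(np.argmax(probs))          # index 0 wins
```

The same similarity scores, computed image-against-image instead of image-against-text, drive the retrieval use cases below.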

Use Cases

Face recognition
Face verification
Verify whether two face images belong to the same person.
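Face verification with embeddings from a model like this typically comes down to thresholding the cosine similarity of two face vectors. A sketch, where the embeddings are toy values and the 0.6 threshold is an illustrative number that would be tuned on a labeled verification set, not a published operating point:

```python
import numpy as np

def verify(face_a, face_b, threshold=0.6):
    """Decide whether two face embeddings belong to the same person.

    face_a, face_b: embedding vectors (in practice, outputs of the
    model's image encoder). threshold is a hypothetical cutoff.
    """
    a = face_a / np.linalg.norm(face_a)
    b = face_b / np.linalg.norm(face_b)
    similarity = float(a @ b)
    return similarity >= threshold, similarity

# Two nearby vectors stand in for two photos of the same person.
same, sim = verify(np.array([0.2, 0.9, 0.1]), np.array([0.25, 0.85, 0.12]))
# A dissimilar vector stands in for a different person.
diff, _ = verify(np.array([0.2, 0.9, 0.1]), np.array([-0.9, 0.1, 0.4]))
```

Returning the raw similarity alongside the decision is useful in practice, since deployments often pick different thresholds for different false-accept budgets.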
Face search
Search for similar faces in a large database.
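Face search over a large database is, at its core, nearest-neighbor retrieval over precomputed embeddings. A brute-force NumPy sketch with random stand-in vectors (real systems would swap the linear scan for an approximate-nearest-neighbor index such as FAISS):

```python
import numpy as np

def search_faces(query_emb, database, top_k=3):
    """Return indices of the top_k database embeddings most similar to the query.

    database: (n, dim) array of precomputed, model-extracted face embeddings.
    """
    q = query_emb / np.linalg.norm(query_emb)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    # argsort ascending, keep the last top_k, reverse for best-first order.
    return np.argsort(sims)[-top_k:][::-1]

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 16))              # 100 fake face embeddings
query = db[42] + 0.01 * rng.normal(size=16)  # near-duplicate of entry 42
ranked = search_faces(query, db)             # entry 42 should rank first
```

Because embeddings are extracted once and reused across queries, the per-query cost is a single matrix-vector product plus a sort.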
Content moderation
Face content filtering
Identify and filter inappropriate face content.