FoodSeg103 Open-Source Food Image Dataset - Over 100 Types of Ingredient Annotations to Boost Food Image Recognition

Test2

Developed by mccaly

FoodSeg103 is a dataset containing 7,118 food images, annotated with 104 ingredient categories, with an average of 6 ingredient labels and pixel-level masks per image.

Image Segmentation

Transformers

Open Source License: Apache-2.0 #Food Image Segmentation #Multimodal Pre-training #Fine-grained Recognition

Release Time : 7/14/2023

Model Overview

This model is used for semantic segmentation of food images, capable of identifying and segmenting multiple ingredients in an image.

Model Features

Large-scale Food Image Dataset

Contains 7,118 images annotated with 104 ingredient categories, with an average of 6 ingredient labels and pixel-level masks per image.

Multimodal Pre-training Method

Proposes the ReLeM multimodal pre-training method, explicitly equipping the segmentation model with rich and semantic food knowledge.

Multiple Baseline Models

Provides multiple baseline models based on dilated convolution, feature pyramid, and vision transformers.

Model Capabilities

Food Image Segmentation

Ingredient Recognition

Pixel-level Mask Generation

Use Cases

Food Industry

Food Ingredient Analysis

Used to analyze ingredients in food images, aiding in nutritional calculation and dietary management.

Accurately identifies and segments multiple ingredients.

Smart Dining

Used for food recognition and ingredient analysis in smart dining systems.

Enhances the automation and intelligence level of dining systems.

Health Management

Diet Recording

Helps users record ingredients and nutritional content in their diet.

Provides accurate ingredient recognition and segmentation results.

🚀 A Large-Scale Benchmark for Food Image Segmentation

This project builds a new food image dataset, FoodSeg103, and proposes a multi-modality pre-training approach, ReLeM, to facilitate fine-grained food image understanding.

🚀 Quick Start

Dataset

Download the dataset archive from url and unzip it into the ./data folder, so that the data ends up in ./data/FoodSeg103/. The archive password is LARCdataset9947.
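The unzip step can also be scripted. The following is a minimal sketch using Python's standard zipfile module; the function name extract_dataset and the archive file name in the usage note are assumptions for illustration, not part of the project. The standard library only handles legacy ZIP encryption, so if the archive turns out to be AES-encrypted, an external tool such as 7z would be needed instead.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir="./data", password="LARCdataset9947"):
    """Extract a (possibly password-protected) zip archive into target_dir.

    Returns the sorted list of member names found in the archive.
    extract_dataset is a hypothetical helper, not part of the repository.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # zipfile expects the password as bytes; it is ignored for
        # members that are not encrypted.
        archive.extractall(path=target, pwd=password.encode("utf-8"))
        return sorted(archive.namelist())
```

Calling extract_dataset("FoodSeg103.zip"), where the file name stands in for wherever the download was saved, should leave the data under ./data/FoodSeg103/ if the archive contains a top-level FoodSeg103 folder.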

Installation

Please refer to get_started.md for installation.

Train & Test

Train script

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=${PORT:-300} tools/train.py --config [config] --work-dir [work-dir] --launcher pytorch

Example

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=${PORT:-300} tools/train.py --config configs/foodnet/SETR_Naive_768x768_80k_base_RM.py --work-dir checkpoints/SETR_Naive_ReLeM --launcher pytorch
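The training command ties three things together: the GPUs made visible, the number of processes per node, and the master port. The sketch below assembles the same command from a list of GPU ids so that --nproc_per_node always matches the number of visible devices. The helper name build_train_command is hypothetical; the flags are the ones shown in the command above.

```python
import os


def build_train_command(config, work_dir, gpu_ids, port=None):
    """Return (env, argv) for the distributed training launch shown above.

    One process is started per GPU, so --nproc_per_node is derived from
    the number of entries in gpu_ids. Hypothetical helper for illustration.
    """
    if port is None:
        # Mirrors ${PORT:-300}: use $PORT if set, otherwise 300.
        port = os.environ.get("PORT", "300")
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpu_ids))
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        f"--master_port={port}",
        "tools/train.py",
        "--config", config,
        "--work-dir", work_dir,
        "--launcher", "pytorch",
    ]
    return env, argv
```

The result can be passed to subprocess.run(argv, env=env, check=True) from the repository root. On Linux, ports below 1024 usually require elevated privileges, so it may be necessary to set PORT to a higher value when running as a regular user.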

Test script

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=${PORT:-999} tools/test.py [config] [weights] --launcher pytorch --eval mIoU

Example

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=${PORT:-999} tools/test.py checkpoints/SETR_Naive_ReLeM/SETR_Naive_768x768_80k_base_RM.py checkpoints/SETR_Naive_ReLeM/iter_80000.pth --launcher pytorch --eval mIoU
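The --eval mIoU option reports mean intersection over union, and the benchmark also lists mAcc (mean per-class accuracy). As a rough illustration of how these two metrics are commonly defined, and not the project's own evaluation code, the sketch below computes both from a pixel-level confusion matrix. It assumes flattened lists of integer class ids; the function names are made up for this example.

```python
def confusion_matrix(gt, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        matrix[g][p] += 1
    return matrix


def miou_and_macc(matrix):
    """Return (mIoU, mAcc) as fractions in [0, 1].

    IoU is averaged over classes that occur in the ground truth or the
    prediction; accuracy is averaged over classes that occur in the
    ground truth.
    """
    n = len(matrix)
    ious, accs = [], []
    for c in range(n):
        tp = matrix[c][c]
        gt_total = sum(matrix[c])                          # TP + FN
        pred_total = sum(matrix[r][c] for r in range(n))   # TP + FP
        union = gt_total + pred_total - tp
        if union > 0:
            ious.append(tp / union)
        if gt_total > 0:
            accs.append(tp / gt_total)
    if not ious or not accs:
        raise ValueError("confusion matrix contains no labelled pixels")
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: six pixels, three classes.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, macc = miou_and_macc(confusion_matrix(gt, pred, num_classes=3))
print(f"mIoU={miou:.3f} mAcc={macc:.3f}")  # prints "mIoU=0.500 mAcc=0.667"
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5; the benchmark reports the same kind of quantity as a percentage.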

✨ Features

We build a new food image dataset, FoodSeg103, containing 7,118 images. We annotate these images with 104 ingredient classes, and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
We use three popular semantic segmentation methods (i.e., Dilated Convolution based, Feature Pyramid based, and Vision Transformer based) as baselines, and evaluate them as well as ReLeM on our new dataset.
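To make "ingredient labels and pixel-wise masks" concrete: a mask assigns one class id to every pixel, and the ingredient labels of an image are the distinct ids that occur in its mask. The sketch below shows this on toy data. It assumes that masks are given as nested lists of integers and that id 0 denotes background; both are assumptions for illustration, and the class ids used are made up.

```python
def ingredient_ids(mask, background_id=0):
    """Return the sorted ingredient class ids present in a 2-D mask."""
    return sorted({pixel for row in mask for pixel in row} - {background_id})


def mean_labels_per_image(masks, background_id=0):
    """Average number of distinct ingredient labels over a list of masks."""
    counts = [len(ingredient_ids(mask, background_id)) for mask in masks]
    return sum(counts) / len(counts)


# Two toy 2x3 masks with made-up class ids.
masks = [
    [[0, 0, 5], [5, 12, 12]],   # ingredients 5 and 12
    [[7, 7, 7], [0, 0, 0]],     # ingredient 7 only
]
print(ingredient_ids(masks[0]))       # prints [5, 12]
print(mean_labels_per_image(masks))   # prints 1.5
```

Running the same count over the real annotation masks would be one way to check the reported average of about 6 labels per image.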

📚 Documentation

Introduction

We build a new food image dataset, FoodSeg103, containing 7,118 images. We annotate these images with 104 ingredient classes, and each image has an average of 6 ingredient labels and pixel-wise masks. In addition, we propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.

In this software, we use three popular semantic segmentation methods (i.e., Dilated Convolution based, Feature Pyramid based, and Vision Transformer based) as baselines, and evaluate them as well as ReLeM on our new dataset. We believe that FoodSeg103 and the models pre-trained with ReLeM can serve as a benchmark to facilitate future work on fine-grained food image understanding.

Please refer to our paper and our homepage for more details.

Leaderboard

Please refer to the leaderboard on the Papers with Code website.

Benchmark and model zoo

Note: We have finished the course, so the models are available again. Please download the trained models from this link.

Encoder	Decoder	Crop Size	Batch Size	mIoU	mAcc	Link
R-50	FPN	512x1024	8	27.8	38.2	Model+Config
ReLeM-R-50	FPN	512x1024	8	29.1	39.8	Model+Config
R-50	CCNet	512x1024	8	35.5	45.3	Model+Config
ReLeM-R-50	CCNet	512x1024	8	36.8	47.4	Model+Config
PVT-S	FPN	512x1024	8	31.3	43.0	Model+Config
ReLeM-PVT-S	FPN	512x1024	8	32.0	44.1	Model+Config
ViT-16/B	Naive	768x768	4	41.3	52.7	Model+Config
ReLeM-ViT-16/B	Naive	768x768	4	43.9	57.0	Model+Config
ViT-16/B	PUP	768x768	4	38.5	49.1	Model+Config
ReLeM-ViT-16/B	PUP	768x768	4	42.5	53.9	Model+Config
ViT-16/B	MLA	768x768	4	45.1	57.4	Model+Config
ReLeM-ViT-16/B	MLA	768x768	4	43.3	55.9	Model+Config
ViT-16/L	MLA	768x768	4	44.5	56.6	Model+Config
Swin-S	UperNet	512x1024	8	41.6	53.6	Model+Config
Swin-B	UperNet	512x1024	8	41.2	53.9	Model+Config

[1] We do not include the implementation of Swin in this software. You can use the official implementation together with our provided models.
[2] We use a step-wise learning policy to train the PVT model, since we found that this policy yields higher performance; for the other baselines we adopt the default settings.
[3] Due to time limitations, we use Recipe1M to train ReLeM-PVT-S, while the other ReLeM models are trained with Recipe1M+.
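To read the effect of ReLeM off the table more easily, the mIoU difference between each baseline and its ReLeM counterpart can be computed directly. The numbers in the sketch below are copied from the table above; the dictionary layout and the function name miou_gains are just one convenient way to organize them.

```python
# (encoder, decoder): (baseline mIoU, ReLeM mIoU), copied from the table above.
RESULTS = {
    ("R-50", "FPN"): (27.8, 29.1),
    ("R-50", "CCNet"): (35.5, 36.8),
    ("PVT-S", "FPN"): (31.3, 32.0),
    ("ViT-16/B", "Naive"): (41.3, 43.9),
    ("ViT-16/B", "PUP"): (38.5, 42.5),
    ("ViT-16/B", "MLA"): (45.1, 43.3),
}


def miou_gains(results):
    """Return the ReLeM-minus-baseline mIoU difference for each pair."""
    return {key: round(relem - base, 1) for key, (base, relem) in results.items()}


for (encoder, decoder), gain in miou_gains(RESULTS).items():
    print(f"{encoder:10s} {decoder:6s} {gain:+.1f}")
```

According to these figures, ReLeM raises mIoU in five of the six paired settings (by 0.7 to 4.0 points), while the ViT-16/B + MLA pair is 1.8 points lower with ReLeM.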

ReLeM

We learn the recipe information using the im2recipe implementation with small modifications, trained on the Recipe1M+ dataset (with the test images of FoodSeg103 removed). Because the lmdb file is very large (>35 GB), I may upload it later.
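Removing the benchmark's test images from the pre-training data avoids leaking test content into the pre-trained encoder. A minimal sketch of such a filter follows, assuming that both datasets identify images by a shared id; the function name and the ids are hypothetical, and this is not the project's actual preprocessing code.

```python
def drop_benchmark_test_images(pretrain_ids, benchmark_test_ids):
    """Keep only pre-training image ids that are not benchmark test images."""
    blocked = set(benchmark_test_ids)
    return [image_id for image_id in pretrain_ids if image_id not in blocked]


# Made-up ids for illustration.
kept = drop_benchmark_test_images(["a", "b", "c", "d"], ["b", "d"])
print(kept)  # prints ['a', 'c']
```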

It takes about 2 to 3 weeks to train a ReLeM ViT-Base model with 8 Tesla V100 cards, so I strongly recommend using my pre-trained models (link).

Other Issues

If you run into other issues while using the software, you can check the original mmsegmentation (see its documentation for more details).

Acknowledgement

The segmentation software in this project was developed mainly by extending mmsegmentation.

🔧 Technical Details

We build a new food image dataset, FoodSeg103, and propose a multi-modality pre-training approach, ReLeM. We use three popular semantic segmentation methods as baselines and evaluate them on the new dataset. Training and testing scripts are provided, and the performance of the different models on the benchmark is also reported.

📄 License

This project is released under the Apache 2.0 license.

📖 Citation

If you find this project useful in your research, please consider citing:

@inproceedings{wu2021foodseg,
    title={A Large-Scale Benchmark for Food Image Segmentation},
    author={Wu, Xiongwei and Fu, Xin and Liu, Ying and Lim, Ee-Peng and Hoi, Steven CH and Sun, Qianru},
    booktitle={Proceedings of ACM international conference on Multimedia},
    year={2021}
}
