Perception LM 1B
🚀 Perception Language Model (PLM)
Perception Language Model (PLM) is a state-of-the-art, fully open and reproducible multimodal large language model (MLLM) designed for transparent research in image and video understanding. It addresses the challenges in these fields by leveraging advanced techniques and high-quality data, providing valuable insights and tools for researchers.
🚀 Quick Start
The training and evaluation code for PLM is available in the perception_models codebase. You can find detailed instructions and more information in the GitHub repo.
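For a first taste of inference, the checkpoints can also be loaded through Hugging Face transformers. The sketch below is a minimal, hedged example: it assumes transformers ships PLM support and that the checkpoint is published under the id `facebook/Perception-LM-1B`; verify both against the GitHub repo and the model hub before relying on it.

```python
# Minimal inference sketch (assumptions: transformers has PLM support and the
# hub id "facebook/Perception-LM-1B" exists; adjust to your actual checkpoint).
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="facebook/Perception-LM-1B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # any local path or URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# The pipeline applies the chat template, runs the model, and returns the generated text.
print(pipe(text=messages, max_new_tokens=64))
```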
✨ Features
- Advanced Architecture: PLM consists of a vision encoder paired with a small-scale (<8B parameters) LLM decoder (see the illustrative sketch after this list).
- Data-Driven Approach: The work starts with an analysis of standard training pipelines using available data, without relying on proprietary model distillation.
- Large-Scale Synthetic Data: It investigates large-scale synthetic data and establishes key scaling laws to identify data gaps in video understanding, especially for spatio-temporal reasoning and fine-grained understanding tasks.
- High-Quality Human-Labeled Data: To fill the identified gaps, 2.8M high-quality human-labeled samples are created, nearly an order of magnitude more than the largest existing video datasets.
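To make the first bullet concrete, the toy PyTorch module below illustrates the overall composition (vision encoder → projector → LLM decoder). It is a sketch only: the module names, the single linear projector, and the dimensions are assumptions for illustration, not the actual perception_models implementation.

```python
# Illustrative sketch of the vision-encoder + LLM-decoder composition described
# in the Features list. Names, projector design, and dimensions are assumptions;
# the real implementation lives in the perception_models codebase.
import torch
import torch.nn as nn


class ToyPerceptionLM(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm_decoder: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 2048):
        super().__init__()
        self.vision_encoder = vision_encoder            # produces visual tokens from pixels
        self.projector = nn.Linear(vision_dim, llm_dim)  # maps visual tokens into LLM space
        self.llm_decoder = llm_decoder                   # small-scale (<8B) language model

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Encode the image/video frames, project into the decoder's embedding space,
        # then prepend the visual tokens to the text embeddings for autoregressive decoding.
        visual_tokens = self.projector(self.vision_encoder(pixel_values))
        decoder_inputs = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm_decoder(decoder_inputs)
```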
📚 Documentation
Model Overview
PLM was introduced in "PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding". You can also refer to the tech report and the GitHub repository.
Resources
Property | Details | Documentation |
---|---|---|
Evaluation | Evaluation of PLM using lmms-eval | docs/evaluation.md |
Training / Finetuning | Training and finetuning instructions for PLM | docs/training.md |
PLM-VideoBench | Evaluation on PLM-VideoBench using lmms-eval | docs/plm_videobench.md |
End-to-End Finetuning Example | End-to-end finetuning example on radiology images | docs/finetune_example.md |
Generating Response | Generate responses using a trained model with generate.py | generate.py |
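The authoritative path for generation is the repo's generate.py (last row of the table above). For readers who prefer the Hugging Face API, a roughly equivalent flow looks like the hedged sketch below; the hub id, dtype choice, and chat-template behaviour are assumptions to check against the official docs and generate.py.

```python
# Hedged sketch of response generation via transformers, mirroring what the
# repo's generate.py does conceptually. The hub id and processor behaviour are
# assumptions; consult docs/ and generate.py for the supported workflow.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "facebook/Perception-LM-1B"  # assumed hub id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/xray.png"},  # placeholder image
            {"type": "text", "text": "What does this image show?"},
        ],
    }
]

# Build model inputs from the chat messages (template application + tokenization).
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding so only the new response is printed.
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0]
print(answer)
```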
Benchmark Results
PLM Image Benchmark Results
Model | DocVQA | ChartQA | TextVQA | InfoQA | AI2D | OCRBench | COCO | Nocap | Flickr | MMMU | VQAv2 | OKVQA | VizWiz | MME | SEED | BLINK | CVBench | RealWorldQA | VSR | POPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLM1B | 90.7 | 78.6 | 82.1 | 63.0 | 84.9 | 807 | 138.6 | 124.2 | 100.5 | 34.8 | 81.7 | 61.0 | 59.7 | 1603 | 76.3 | 46.8 | 73.8 | 67.1 | 68.8 | 88.4 |
PLM3B | 93.8 | 84.3 | 84.3 | 74.6 | 90.9 | 830 | 144.9 | 126.5 | 98.0 | 41.2 | 84.3 | 66.8 | 64.0 | 1879 | 78.5 | 55.4 | 81.4 | 72.4 | 80.4 | 88.7 |
PLM8B | 94.6 | 85.5 | 86.5 | 80.9 | 92.7 | 870 | 146.7 | 129.9 | 105.6 | 46.1 | 85.6 | 69.6 | 67.0 | 1989 | 79.3 | 56.0 | 81.3 | 75.0 | 82.8 | 89.9 |
PLM Video Benchmark Results
Model | VATEX | DREAM-1K | How2QA | MVBench | NExTQA | PerceptionTest (test) | STAR | TVQA | VideoMME | TVBench | ActivityNetQA | EgoSchema (test) | TemporalBench | TOMATO | MotionBench (dev) | TempCompass (MCQ) | CGBench (clue) | Charades-STA | VideoHallucer | EventHallusion |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLM1B | 92.5 | 34.3 | 86.4 | 70.1 | 80.3 | 72.7 | 83.7 | 50.3 | 49.2 | 50.4 | 62.5 | 60.4 | 18.2 | 25.5 | 52.2 | 64.6 | 43.6 | 55.2 | 49.2 | 79.5 |
PLM3B | 96.1 | 37.4 | 89.4 | 74.7 | 83.4 | 79.3 | 84.8 | 55.3 | 54.9 | 58.9 | 66.2 | 66.9 | 23.4 | 30.9 | 60.4 | 69.3 | 47.2 | 57.7 | 55.5 | 76.5 |
PLM8B | 99.7 | 35.9 | 90.7 | 77.1 | 84.1 | 82.7 | 84.9 | 59.3 | 58.3 | 63.5 | 67.3 | 68.8 | 28.3 | 33.2 | 61.4 | 72.7 | 46.4 | 58.6 | 57.7 | 77.3 |
📄 License
The use of PLM is subject to the FAIR Noncommercial Research License. By clicking “I Accept” or using or distributing any portion of the Research Materials, you agree to be bound by this Agreement.
Key Terms
- Acceptable Use Policy: The FAIR Acceptable Use Policy applicable to Research Materials.
- Agreement: The terms and conditions for using, reproducing, distributing, and modifying the Research Materials.
- Documentation: Specifications, manuals, and documentation accompanying the Research Materials distributed by Meta.
- Licensee: You, your employer, or any other person or entity entering into this Agreement.
- Meta: Meta Platforms Ireland Limited (if in the EEA or Switzerland) or Meta Platforms, Inc. (outside the EEA or Switzerland).
- Noncommercial Research Uses: Non-commercial research use cases not primarily for commercial advantage or monetary compensation.
- Research Materials: Documentation, models, software, algorithms, and related elements distributed by Meta under this Agreement.
Prohibited Uses
You agree not to use the Research Materials for:
- Illegal or Unlawful Activities: Such as violence, terrorism, exploitation of children, human trafficking, sexual solicitation, and other criminal activities.
- Harassment and Discrimination: Engaging in, promoting, or facilitating harassment, abuse, discrimination, or other unlawful or harmful conduct.
- Unauthorized Professional Practice: Unauthorized or unlicensed practice of any profession.
- Sensitive Information: Collecting, processing, or disclosing sensitive personal information without proper consent.
- Infringement: Engaging in actions that infringe, misappropriate, or violate third - party rights.
- Malicious Code: Creating or facilitating the creation of malicious code or anything that could harm a website or computer system.
- Dangerous Activities: Engaging in activities presenting a risk of death or bodily harm, such as military, warfare, or illegal weapon-related activities.
- Deception: Intentionally deceiving or misleading others, including generating fraud, disinformation, or spam.
- Failure to Disclose: Failing to appropriately disclose any known dangers of the Research Materials.
Please report any violations of this Policy at https://docs.google.com/forms/d/e/1FAIpQLSeb11cryAopJ7LNrC4nxEUXrHY26hfkXQMf_uH-oFgA3WlYZQ/viewform.
📚 Citation
If you find our code useful for your research, please consider citing:
@article{cho2025PerceptionLM,
title={PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding},
author={Jang Hyun Cho and Andrea Madotto and Effrosyni Mavroudi and Triantafyllos Afouras and Tushar Nagarajan and Muhammad Maaz and Yale Song and Tengyu Ma and Shuming Hu and Hanoona Rasheed and Peize Sun and Po-Yao Huang and Daniel Bolya and Suyog Jain and Miguel Martin and Huiyu Wang and Nikhila Ravi and Shashank Jain and Temmy Stark and Shane Moon and Babak Damavandi and Vivian Lee and Andrew Westbury and Salman Khan and Philipp Kr\"{a}henb\"{u}hl and Piotr Doll{\'a}r and Lorenzo Torresani and Kristen Grauman and Christoph Feichtenhofer},
journal={arXiv},
year={2025}
}
@article{bolya2025PerceptionEncoder,
title={Perception Encoder: The best visual embeddings are not at the output of the network},
author={Daniel Bolya and Po-Yao Huang and Peize Sun and Jang Hyun Cho and Andrea Madotto and Chen Wei and Tengyu Ma and Jiale Zhi and Jathushan Rajasegaran and Hanoona Rasheed and Junke Wang and Marco Monteiro and Hu Xu and Shiyu Dong and Nikhila Ravi and Daniel Li and Piotr Doll{\'a}r and Christoph Feichtenhofer},
journal={arXiv},
year={2025}
}

