🚀 DeepReviewer
DeepReviewer is a set of generative large language models tailored for academic paper review. It provides structured feedback and multiple review modes, supporting self-improvement in research and advancing automated academic evaluation.
🚀 Quick Start
The models in this repository can be used with the `transformers` or `vllm` libraries. Generating review comments requires a long context (about 14,000 tokens of input and 5,000 tokens of output), so make sure you have enough GPU memory. Recommended configurations:
| Model Name | Recommended Config (bs>=5) | Minimum Config (bs=1) |
|---|---|---|
| DeepReviewer-7B | 1 x RTX 3090/4090/5090 (bf16) | 1 x RTX 4070 (int8) |
| DeepReviewer-14B | 1 x A100 (bf16) | 1 x RTX 3090/4090/5090 (int8) |
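Before loading a model, you can check that your GPU matches the table above; a minimal sketch using plain PyTorch (not part of the DeepReviewer API):

```python
# Minimal sketch: report the name and total memory of the first CUDA device.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB total")
```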
Getting Your Paper Text
If you have the original LaTeX or Markdown version of your paper, you can skip this step. If you only have a PDF, convert it to Markdown or LaTeX first; tools like MagicPDF or other PDF-to-text converters are recommended.
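Once converted, load the text for the examples below; a minimal sketch, assuming your converter produced a file named `paper.md` (a hypothetical filename):

```python
# Minimal sketch: read the converted paper into a string for evaluate().
from pathlib import Path

paper_content = Path("paper.md").read_text(encoding="utf-8")
```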
Using with vllm
```python
from ai_researcher.deep_reviewer import DeepReviewer

# Initialize DeepReviewer
reviewer = DeepReviewer(
    model_size="14B",  # Use "7B" for the smaller model
    device="cuda",
    tensor_parallel_size=1,  # Increase for multi-GPU setups
    gpu_memory_utilization=0.95
)

# Load paper content
paper_content = "Your paper content here"  # Replace with actual paper content

# Generate reviews in different modes
# Fast Mode for a quick overview
fast_review = reviewer.evaluate([paper_content], mode="Fast Mode")

# Standard Mode with multiple simulated reviewers
standard_review = reviewer.evaluate([paper_content], mode="Standard Mode", reviewer_num=3)

# Parse the review results
for result in standard_review:
    print("--- Meta-Review ---")
    print(f"Summary: {result['meta_review'].get('summary', 'N/A')}")
    print(f"Rating: {result['meta_review'].get('rating', 'N/A')}")
    print(f"Decision: {result['decision']}")
```
✨ Features
- Multi-Mode Reviews: DeepReviewer offers three review modes: Fast Mode for quick reviews, Standard Mode for simulated multiple-reviewer perspectives, and Best Mode for comprehensive reviews.
- Near-Human Evaluation: It can automatically evaluate paper quality, providing comprehensive analysis, strengths, weaknesses, and suggestions.
- Diverse Purposes: Suitable for various research-related uses such as paper improvement, writing practice, and serving as a reward model for reinforcement learning systems.
💻 Usage Examples
Basic Usage
The basic workflow is identical to the Quick Start example above: initialize DeepReviewer, pass your paper text to evaluate(), and parse the returned meta-review. The full code is shown in the "Using with vllm" section.
Advanced Usage
In advanced scenarios, you can adjust the parameters to your needs, such as changing the number of reviewers in Standard Mode or using a different device configuration. For example, increase tensor_parallel_size for multi-GPU acceleration:

```python
# Multi-GPU setup: shard the model across 4 GPUs.
reviewer = DeepReviewer(
    model_size="14B",
    device="cuda",
    tensor_parallel_size=4,  # Increase for multi-GPU setups
    gpu_memory_utilization=0.95
)
```
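Best Mode, the most comprehensive setting, follows the same interface; a hedged sketch, assuming evaluate() accepts it like the other modes:

```python
# Hedged sketch: Best Mode review using the same interface as above.
best_review = reviewer.evaluate([paper_content], mode="Best Mode")
for result in best_review:
    print(f"Decision: {result['decision']}")
```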
📚 Documentation
Model Info
Homepage & Demo: http://ai-researcher.net
DeepReviewer is a set of generative large language models, available in 7B and 14B sizes, that have undergone additional supervised training for academic paper review. Both are text-only models built on the Phi-4 pre-trained language model. They use a multi-stage reasoning framework to generate in-depth, structured reviews of academic papers.
DeepReviewer offers three review modes to balance depth against efficiency (see the sketch after this list):
- Fast Mode: Quick reviews with summary, scores, and key points
- Standard Mode: Simulated multiple reviewer perspectives with verification
- Best Mode: Most comprehensive reviews with detailed analysis across all dimensions
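A minimal sketch of choosing among these modes; the helper function and its decision criteria are illustrative, not part of the documented API:

```python
# Illustrative sketch: pick a review mode by how much depth you need.
# The mode names are as documented above; the helper itself is hypothetical.
def pick_mode(want_multiple_reviewers: bool, want_full_depth: bool) -> str:
    if want_full_depth:
        return "Best Mode"
    if want_multiple_reviewers:
        return "Standard Mode"
    return "Fast Mode"
```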
Under the license, any model created, trained, distributed, or replicated from these models may not be used for any formal review work.
Intended Uses
Expected Use Cases
DeepReviewer models are suitable for research purposes in multiple languages, including but not limited to the following objectives:
- Paper Improvement: Assist in enhancing the quality and clarity of academic papers.
- Writing Practice: Provide a platform for users to practice and refine their academic writing skills.
- Self-assessment Tool: Enable researchers to evaluate their own work before submission.
- Learning Aid: Support students and researchers in understanding the peer review process.
- Feedback Simulation: Offer simulated peer review feedback to prepare authors for actual reviews.
- Revision Guide: Provide structured guidance for revising academic papers.
- Concept Validator: Help researchers validate their ideas and hypotheses.
- Reward Model: Serve as a component in machine learning systems for academic writing improvement (see the sketch after this list).
- Educational Resource: Act as a teaching tool for academic writing and peer review processes.
- Research Assistant: Aid in literature reviews and research methodology refinement.
- Supplementary Tool: Complement human review in informal, non - official settings.
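For the reward-model use case, a hedged sketch of turning a Fast Mode review into a scalar reward; the 1-10 rating scale and the normalization are assumptions, not part of the documented API:

```python
# Hedged sketch: use the meta-review rating as an RL reward signal.
# Assumes ratings on a 1-10 scale (as at ICLR); verify against real output.
def review_reward(reviewer, paper_text: str) -> float:
    result = reviewer.evaluate([paper_text], mode="Fast Mode")[0]
    rating = result["meta_review"].get("rating")
    return float(rating) / 10.0 if rating is not None else 0.0
```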
Out of Scope
The following uses are not permitted:
- Official Reviews: DeepReviewer explicitly prohibits use for official peer reviews in any capacity.
- Legal or Ethical Decisions: Not designed to make judgments on research ethics or legal compliance.
- Factual Verification: While it can offer feedback, it should not be the sole source for fact-checking or verifying scientific claims.
- Plagiarism Detection: Not equipped to serve as a plagiarism detection tool.
- Publication Decisions: Cannot be used to make final decisions on whether a paper should be published.
- Expert Consultation: Not a replacement for expert consultation in specialized fields.
If you are unsure whether your use meets the license requirements, please contact us for further inquiry.
Ethical Considerations
- Academic Integrity: Although DeepReviewer is designed to help researchers improve paper quality, it must not replace the real peer review process. We strongly recommend using this tool only as an aid for self-improvement and learning.
- Fairness: The model may carry biases, especially when evaluating interdisciplinary or emerging-field research. Users should keep this in mind and treat the model's feedback with caution.
- Responsible Use: We call on users to use this model responsibly; per our agreement, it must not be used to produce false review opinions or to manipulate the academic evaluation process.
- Transparency: When content generated by this model is used in any public setting, its DeepReviewer origin should be clearly stated to maintain transparency and honesty in academia.
Limitations
- Knowledge Cutoff Date: The model's knowledge is cut off at October 2024, so it may lack understanding of technologies, methods, or research trends that emerged after that date. This may lead it to undervalue some highly innovative research.
- Pure Text Limitations: As a text-only model, DeepReviewer cannot directly parse or evaluate images, charts, or complex formulas, which may affect its assessment of papers that rely heavily on visual elements.
- Depth in Specialized Fields: Although the model has been trained across various domains, its evaluation may not match human experts in very specialized or cutting-edge sub-fields.
- Lack of Real-time Information: The model cannot access real-time academic databases or the latest published papers, which may bias its assessment of research novelty.
- Disciplinary Bias: Due to limitations in the training data, the model may favor certain disciplines or research methods. Users should weigh its feedback against other opinions.
- Language and Cultural Limitations: The model may perform poorly on papers with cultural nuances or field-specific terminology outside its training distribution.
📄 License
The code in this repository is open-sourced under the Apache-2.0 license. The model weights are released under the DeepReviewer License, which adds terms to ensure the model is not misused.
📊 Model Performance
ICLR 2024
| Metric | DeepReviewer-7B | DeepReviewer-14B | CycleReviewer-70B | GPT-o1 | DeepSeek-R1 | Gemini-2.0-Flash-Thinking |
|---|---|---|---|---|---|---|
| Rating MSE ↓ | 1.8262 | 1.3137 | 2.4870 | 4.3414 | 4.1648 | 4.9297 |
| Rating MAE ↓ | 1.0870 | 0.9102 | 1.2514 | 1.7294 | 1.6526 | 1.8711 |
| Decision Accuracy ↑ | 0.5975 | 0.6406 | 0.6304 | 0.4500 | 0.5248 | 0.5743 |
| Decision F1 ↑ | 0.5428 | 0.6307 | 0.5696 | 0.4424 | 0.4988 | 0.5197 |
| Rating Spearman ↑ | 0.2126 | 0.3559 | 0.3356 | 0.2621 | 0.3256 | 0.0745 |
| Pairwise Rating Acc ↑ | 0.5749 | 0.6242 | 0.6160 | 0.5881 | 0.6206 | 0.5343 |
ICLR 2025
| Metric | DeepReviewer-7B | DeepReviewer-14B | CycleReviewer-70B | GPT-o1 | DeepSeek-R1 | Gemini-2.0-Flash-Thinking |
|---|---|---|---|---|---|---|
| Rating MSE ↓ | 1.6730 | 1.3410 | 2.4294 | 4.3072 | 4.7719 | 3.9232 |
| Rating MAE ↓ | 1.0379 | 0.9243 | 1.2128 | 1.7917 | 1.8099 | 1.6470 |
| Decision Accuracy ↑ | 0.6660 | 0.6878 | 0.6782 | 0.4167 | 0.4259 | 0.6139 |
| Decision F1 ↑ | 0.5564 | 0.6227 | 0.5737 | 0.4157 | 0.4161 | 0.4808 |
| Rating Spearman ↑ | 0.2973 | 0.4047 | 0.2674 | 0.2991 | 0.3237 | 0.2565 |
| Pairwise Rating Acc ↑ | 0.6038 | 0.6402 | 0.5928 | 0.6318 | 0.6289 | 0.6040 |
DeepReviewer significantly outperforms the other models on most metrics despite its smaller parameter count. The 14B model achieves particularly strong results on Decision Accuracy and Rating MSE, demonstrating its reliability in assessing overall paper quality.
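For reference, a sketch of how the rating metrics above are standardly computed; the toy numbers are illustrative, and the actual evaluation code is not shown in this document:

```python
# Illustrative sketch: standard definitions of the rating metrics.
import numpy as np
from scipy.stats import spearmanr

pred = np.array([5.0, 6.5, 3.0])  # model-predicted ratings (toy data)
gold = np.array([6.0, 6.0, 4.0])  # ground-truth ratings (toy data)

mse = np.mean((pred - gold) ** 2)   # Rating MSE (lower is better)
mae = np.mean(np.abs(pred - gold))  # Rating MAE (lower is better)
rho, _ = spearmanr(pred, gold)      # Rating Spearman (higher is better)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  Spearman={rho:.4f}")
```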
📖 Citation
```bibtex
@inproceedings{weng2025cycleresearcher,
  title={CycleResearcher: Improving Automated Research via Automated Review},
  author={Yixuan Weng and Minjun Zhu and Guangsheng Bao and Hongbo Zhang and Jindong Wang and Yue Zhang and Linyi Yang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=bjcsVLoHYs}
}

@misc{zhu2025deepreviewimprovingllmbasedpaper,
  title={DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process},
  author={Minjun Zhu and Yixuan Weng and Linyi Yang and Yue Zhang},
  year={2025},
  eprint={2503.08569},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.08569}
}
```
📮 Contact
- Submit an Issue
- Email: zhuminjun@westlake.edu.cn

