Open-source Models - Free Identification of Scientific Literature: Challenges and Research Directions

Scientific Challenges And Directions

Developed by DanL

A multi-label text classification model, fine-tuned from PubMedBERT, for identifying challenges and research directions in scientific literature

Text Classification

Transformers

English

#Biomedical Text Classification #Research Challenge Identification #Research Direction Prediction

Downloads 28

Release Time: 3/2/2022

Model Overview

This model aims to assist scientists and medical professionals in identifying challenges (problems, difficulties, knowledge gaps) and potential research directions (suggestions, hypotheses, exploration needs) from scientific literature, with a particular focus on the COVID-19 pandemic and related research fields.

Model Features

Biomedical Domain Optimization

Based on the PubMedBERT pre-trained model, specifically optimized for biomedical literature

Multi-label Classification

Assigns two independent labels to a text, challenge and research direction, so a text can receive either label, both, or neither

Expert-annotated Data

Training data annotated by experts with backgrounds in biomedicine and bioNLP

Model Capabilities

Scientific Literature Analysis

Challenge Identification

Research Direction Identification

Multi-label Text Classification

Use Cases

Research Assistance

Literature Review Assistance

Quickly identify key challenges and research gaps in large volumes of literature

Improves literature review efficiency and helps researchers locate key issues

Research Direction Discovery

Automatically extract suggested future research directions from literature

Assists researchers in planning research pathways

Academic Search Engines

Challenge and Direction Retrieval

Build specialized search engines for retrieving scientific challenges and research directions

See the search engine linked under Quick Start for an example application

🚀 scientific-challenges-and-directions

We present a novel resource to help scientists and medical professionals discover challenges and potential directions across scientific literature, focusing on a broad corpus related to the COVID-19 pandemic and related historical research.

🚀 Quick Start

This project offers a valuable resource for scientists and medical professionals to explore scientific challenges and potential research directions in the literature related to the COVID-19 pandemic and historical research. Here are some key links and information:

Our paper: A Search Engine for Discovery of Scientific Challenges and Directions
Our dataset: scientific-challenges-and-directions-dataset
Our search engine: https://challenges.apps.allenai.org/

✨ Features

Challenge and Direction Definition:
- Challenge: A sentence mentioning a problem, difficulty, flaw, limitation, failure, lack of clarity, or knowledge gap.
- Research direction: A sentence mentioning suggestions or needs for further research, hypotheses, speculations, indications or hints that an issue is worthy of exploration.
Model Application: This model is designed for multi-label text classification, helping to identify challenges and directions in scientific literature.
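
To make the multi-label setup concrete, here is a minimal sketch of how two independent labels are typically decided from a classifier's raw outputs: each label gets its own sigmoid score and its own yes/no decision, so a sentence can be a challenge, a direction, both, or neither. The label order, the 0.5 threshold, and the example logits are assumptions for illustration, not values taken from the model.

```python
import math

# Assumed label order for illustration; the real order comes from the model config.
LABELS = ("challenge", "direction")


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Decide each label independently (multi-label), unlike softmax multi-class.

    The 0.5 threshold is an assumed default, not a documented setting.
    """
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


# Hypothetical logits for three sentences.
for logits in [(2.0, -1.0), (1.5, 0.3), (-2.0, -3.0)]:
    print(logits, predict_labels(logits))
```

Because the labels do not compete with each other, raising or lowering the threshold for one label does not affect the other.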

💻 Usage Examples

Basic Usage

You can use the model for inference with the provided example notebook in our repo. See Inference_Notebook.ipynb.
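
For readers who want a rough idea of what inference looks like outside the notebook, the following is a sketch using the standard Transformers sequence-classification classes. It is not the notebook's code. The Hub identifier `DanL/scientific-challenges-and-directions`, the 0.5 threshold, and the choice to pass a sentence without its surrounding context are all assumptions; the notebook remains the authoritative reference for the input format.

```python
def format_prediction(sentence, probs, id2label, threshold=0.5):
    """Turn per-label probabilities into a readable record.

    `threshold` is an assumed default, not a documented setting.
    """
    scores = {id2label[i]: p for i, p in enumerate(probs)}
    labels = [name for name, p in scores.items() if p >= threshold]
    return {"text": sentence, "scores": scores, "labels": labels}


def classify(sentences, model_id="DanL/scientific-challenges-and-directions",
             threshold=0.5):
    """Score sentences with the fine-tuned model (model_id is an assumption)."""
    # Imported here so format_prediction can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    # Multi-label: apply a sigmoid per label rather than a softmax over labels.
    probs = torch.sigmoid(logits).tolist()
    # Label names are read from the model config rather than hard-coded.
    return [format_prediction(s, p, model.config.id2label, threshold)
            for s, p in zip(sentences, probs)]
```

Calling `classify(["Further studies are needed to clarify the mechanism."])` downloads the weights on first use and returns one record per sentence.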

Advanced Usage

For training the model, refer to the training notebook in the same repo.

📚 Documentation

Model description

This model is a fine-tuned version of PubMedBERT on the scientific-challenges-and-directions-dataset, designed for multi-label text classification.

Training and evaluation data

The scientific-challenges-and-directions model was trained on a dataset of 2894 sentences and their surrounding contexts, drawn from 1786 full-text papers in the CORD-19 corpus and labeled for challenges and directions by expert annotators with biomedical and bioNLP backgrounds. For full details on the train/test split of the data, see Section 3.1 of our paper.

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning rate: 2e-05
train batch size: 8
eval batch size: 4
seed: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr scheduler type: linear
lr scheduler warmup steps: 500
num epochs: 30
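
As an illustration of what a linear scheduler with 500 warmup steps does to the learning rate, here is a small pure-Python sketch. It follows the usual shape of linear warmup followed by linear decay to zero (the same shape as the Transformers `get_linear_schedule_with_warmup` helper). `TOTAL_STEPS` is a hypothetical value, since the number of optimizer steps depends on the training split size, which is not stated here.

```python
BASE_LR = 2e-5       # learning rate listed above
WARMUP_STEPS = 500   # warmup steps listed above


def linear_lr(step, total_steps, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Then decay linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 10_000  # hypothetical; not taken from the training run

for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

The peak rate of 2e-05 is reached exactly at the end of warmup; before and after that point the rate is lower.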

Training results

The model achieves the following results on the test set:

Precision Challenge: 0.768719
Recall Challenge: 0.780405
F1 Challenge: 0.774518
Precision Direction: 0.758112
Recall Direction: 0.774096
F1 Direction: 0.766021
Precision (micro avg. on both labels): 0.764894
Recall (micro avg. on both labels): 0.778139
F1 (micro avg. on both labels): 0.771459
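
As a quick consistency check on the numbers above, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall values; only the figures listed above are used, and the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results above.
REPORTED = {
    "challenge": (0.768719, 0.780405, 0.774518),
    "direction": (0.758112, 0.774096, 0.766021),
    "micro avg": (0.764894, 0.778139, 0.771459),
}

for name, (p, r, f1) in REPORTED.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.6f}, reported = {f1:.6f}")
```

The recomputed values agree with the reported ones up to rounding in the last digit.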

Framework versions

Transformers 4.15.0
PyTorch 1.10.0+cu111
Datasets 1.17.0
Tokenizers 0.10.3

📋 Information Table

Property	Details
Model Type	A fine-tuned version of PubMedBERT on the scientific-challenges-and-directions-dataset for multi-label text classification
Training Data	A collection of 2894 sentences and their surrounding contexts from 1786 full-text papers in the CORD-19 corpus, labeled by expert annotators

📖 Citation

If using our dataset and models, please cite:

@misc{lahav2021search,
      title={A Search Engine for Discovery of Scientific Challenges and Directions}, 
      author={Dan Lahav and Jon Saad Falcon and Bailey Kuehl and Sophie Johnson and Sravanthi Parasa and Noam Shomron and Duen Horng Chau and Diyi Yang and Eric Horvitz and Daniel S. Weld and Tom Hope},
      year={2021},
      eprint={2108.13751},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📞 Contact us

Please don't hesitate to reach out. Email: lahav@mail.tau.ac.il, tomh@allenai.org.
