scientific-challenges-and-directions
We present a novel resource to help scientists and medical professionals discover challenges and potential directions across scientific literature, focusing on a broad corpus related to the COVID-19 pandemic and related historical research.
Quick Start
This project helps scientists and medical professionals explore scientific challenges and potential research directions in the literature on the COVID-19 pandemic and related historical research. The sections below cover the label definitions, the usage notebooks, the training data, and the evaluation results.
Features
- Challenge and Direction Definition:
  - Challenge: A sentence mentioning a problem, difficulty, flaw, limitation, failure, lack of clarity, or knowledge gap.
  - Research direction: A sentence mentioning suggestions or needs for further research, hypotheses, speculations, indications or hints that an issue is worthy of exploration.
- Model Application: This model is designed for multi-label text classification, helping to identify challenges and directions in scientific literature.
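Because the task is multi-label, a sentence can be a challenge, a direction, both, or neither. The following is a minimal sketch of how such targets might be encoded as a multi-hot vector. The label order and the helper names `to_multi_hot` and `from_multi_hot` are illustrative assumptions, not taken from the released code.

```python
# Assumed label order; the released dataset may order its columns differently.
LABELS = ("challenge", "direction")


def to_multi_hot(tags):
    """Encode a collection of label names as a multi-hot float vector."""
    tags = set(tags)
    unknown = tags - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if name in tags else 0.0 for name in LABELS]


def from_multi_hot(vector):
    """Decode a multi-hot vector back into a list of label names."""
    return [name for name, flag in zip(LABELS, vector) if flag >= 0.5]


# A sentence that states a limitation and also calls for further study
# carries both labels at once; a plain factual sentence carries none.
both = to_multi_hot({"challenge", "direction"})
neither = to_multi_hot(set())
```

Keeping the two labels as independent positions, rather than as one of four mutually exclusive classes, is what lets a classifier score each label separately.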
Usage Examples
Basic Usage
You can use the model for inference with the example notebook provided in our repo. See Inference_Notebook.ipynb.
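For readers who prefer a script to the notebook, the following is a minimal sketch of multi-label inference with a sequence-classification checkpoint. It assumes the checkpoint loads with the Hugging Face `AutoTokenizer` and `AutoModelForSequenceClassification` classes, that the output columns are ordered (challenge, direction), and that 0.5 is a reasonable decision threshold. `load_checkpoint`, `classify`, and the `model_path` argument are hypothetical names; the notebook remains the authoritative reference.

```python
import math


def load_checkpoint(model_path):
    """Load tokenizer and model from a path or hub id supplied by the caller."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(sentences, tokenizer, model, threshold=0.5,
             label_names=("challenge", "direction")):
    """Score a list of sentences; each label is decided independently (multi-label)."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits.tolist()  # one row of raw scores per sentence
    results = []
    for row in logits:
        probs = {name: sigmoid(value) for name, value in zip(label_names, row)}
        results.append({
            "probabilities": probs,
            "labels": [name for name, p in probs.items() if p >= threshold],
        })
    return results
```

A per-label sigmoid is used instead of a softmax because the two labels are not mutually exclusive. For larger batches, wrapping the model call in `torch.no_grad()` would avoid building an autograd graph.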
Advanced Usage
For training the model, refer to the training notebook in the same repo.
Documentation
Model description
This model is a fine-tuned version of PubMedBERT on the scientific-challenges-and-directions-dataset, designed for multi-label text classification.
Training and evaluation data
The scientific-challenges-and-directions model is trained on a dataset of 2894 sentences and their surrounding contexts, drawn from 1786 full-text papers in the CORD-19 corpus and labeled for classification of challenges and directions by expert annotators with biomedical and bioNLP backgrounds. For full details on the train/test split of the data, see Section 3.1 of our paper.
Technical Details
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning rate: 2e-05
- train batch size: 8
- eval batch size: 4
- seed: 4
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr scheduler type: linear
- lr scheduler warmup steps: 500
- num epochs: 30
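To make the scheduler settings concrete, here is a small sketch of a linear schedule with warmup as we understand it: the rate rises linearly from 0 to the peak of 2e-05 over the first 500 steps, then decays linearly to 0 by the final step. The function name `linear_lr` and the `total_steps` value in the example are assumptions; the real total depends on the training-set size, the batch size of 8, and the 30 epochs.

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=500):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step count.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down proportionally to the steps that remain.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length, chosen only to illustrate the shape of the curve.
TOTAL_STEPS = 10_000
for step in (0, 250, 500, 5_250, 10_000):
    print(f"step {step:>6}: lr = {linear_lr(step, TOTAL_STEPS):.2e}")
```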
Training results
The model achieves the following results on the test set:
- Precision Challenge: 0.768719
- Recall Challenge: 0.780405
- F1 Challenge: 0.774518
- Precision Direction: 0.758112
- Recall Direction: 0.774096
- F1 Direction: 0.766021
- Precision (micro avg. on both labels): 0.764894
- Recall (micro avg. on both labels): 0.778139
- F1 (micro avg. on both labels): 0.771459
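As a quick consistency check, each reported F1 value should be the harmonic mean of its precision and recall, F1 = 2PR / (P + R). The sketch below recomputes F1 from the figures listed above; the helper name `f1_score` and the dictionary layout are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results list above.
REPORTED = {
    "challenge": (0.768719, 0.780405, 0.774518),
    "direction": (0.758112, 0.774096, 0.766021),
    "micro avg": (0.764894, 0.778139, 0.771459),
}

for name, (p, r, reported_f1) in REPORTED.items():
    recomputed = f1_score(p, r)
    print(f"{name}: recomputed F1 = {recomputed:.6f}, reported = {reported_f1:.6f}")
```

The recomputed values agree with the reported ones up to rounding of the six-digit inputs.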
Framework versions
- Transformers 4.15.0
- PyTorch 1.10.0+cu111
- Datasets 1.17.0
- Tokenizers 0.10.3
Information Table
| Property | Details |
|---|---|
| Model Type | A fine-tuned version of PubMedBERT on the scientific-challenges-and-directions-dataset for multi-label text classification |
| Training Data | A collection of 2894 sentences and their surrounding contexts from 1786 full-text papers in the CORD-19 corpus, labeled by expert annotators |
Citation
If using our dataset and models, please cite:
@misc{lahav2021search,
title={A Search Engine for Discovery of Scientific Challenges and Directions},
author={Dan Lahav and Jon Saad Falcon and Bailey Kuehl and Sophie Johnson and Sravanthi Parasa and Noam Shomron and Duen Horng Chau and Diyi Yang and Eric Horvitz and Daniel S. Weld and Tom Hope},
year={2021},
eprint={2108.13751},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contact us
Please don't hesitate to reach out.
Email: lahav@mail.tau.ac.il, tomh@allenai.org