🚀 Psych-Search
Psych-Search aims to bring cutting-edge NLP to mental health practitioners, providing a foundation for classification and NLU models in the mental health field.
🚀 Quick Start
Psych-Search is an ongoing project that aims to bring advanced NLP to mental health practitioners. The model presented here serves as the basis for both the traditional classification models and the NLU models in the Psych-Search application. The application has two objectives. The first is to use a combination of traditional text classification models to expand the MeSH taxonomy with categories relevant to mental health practitioners who design suicide prevention programs for adolescent communities in the United States. The second is to automatically extract and standardize entities such as risk factors and protective factors.
Our initial expansion of the MeSH taxonomy includes the following categories:
- Prevention Strategies
- Protective Factors
We are actively seeking partners for this project. Please contact us at nlp4good@gmail.com.
✨ Features
Model description
This model extends allenai/scibert_scivocab_uncased. Using SciBERT as the base model, it was further pretrained on abstract text only, drawn from PubMed research in Psychology and Psychiatry. Training ran for 10 epochs on approximately 3.5 million papers, and the model was evaluated on a task similar to BioASQ Task A.
Intended uses & limitations
How to use
from transformers import AutoTokenizer, AutoModel
mname = "nlp4good/psych-search"
tokenizer = AutoTokenizer.from_pretrained(mname)
model = AutoModel.from_pretrained(mname)
Limitations and bias
This model was trained on all PubMed abstracts categorized under Psychology and Psychiatry. As of March 1, this amounts to approximately 3.2 million papers with abstract text. Among these 3.2 million papers, those in relevant sparse mental health categories were back-translated to increase the representation of those categories.
There are several limitations with this dataset, including significant discrepancies in the number of papers associated with Sexual and Gender Minorities. The training data had the following breakdown across gender groups:
| Female | Male | Sexual and Gender Minorities |
|---|---|---|
| 1,896,301 | 1,945,279 | 4,529 |
Similar discrepancies exist within Ethnic Groups as defined in the MeSH taxonomy:
| African Americans | Arabs | Asian Americans | Hispanic Americans | Indians, Central American | Indians, North American | Indians, South American | Indigenous Peoples | Mexican Americans |
|---|---|---|---|---|---|---|---|---|
| 31,027 | 2,437 | 5,612 | 18,893 | 124 | 5,657 | 633 | 174 | 3,234 |
These discrepancies can significantly impact information retrieval systems, downstream machine learning models, and other NLP applications that utilize these pretrained models.
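To make the scale of these gaps concrete, the counts from the two tables above can be turned into shares and ratios. The following is a minimal sketch, not part of the original project: the helper names `shares` and `imbalance_ratio` are illustrative. Because a single paper may carry several descriptors, the shares are best read as fractions of the summed counts, not of distinct papers (an assumption about how the counts were produced).

```python
# Counts copied from the gender and ethnic group tables above.
gender_counts = {
    "Female": 1_896_301,
    "Male": 1_945_279,
    "Sexual and Gender Minorities": 4_529,
}
ethnic_counts = {
    "African Americans": 31_027,
    "Arabs": 2_437,
    "Asian Americans": 5_612,
    "Hispanic Americans": 18_893,
    "Indians, Central American": 124,
    "Indians, North American": 5_657,
    "Indians, South American": 633,
    "Indigenous Peoples": 174,
    "Mexican Americans": 3_234,
}


def shares(counts):
    """Return each group's fraction of the summed counts (hypothetical helper)."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def imbalance_ratio(counts):
    """Return the ratio of the largest count to the smallest count."""
    return max(counts.values()) / min(counts.values())


for name, counts in [("Gender", gender_counts), ("Ethnic group", ethnic_counts)]:
    print(name)
    # List groups from the largest share to the smallest.
    for group, share in sorted(shares(counts).items(), key=lambda kv: -kv[1]):
        print(f"  {group:32s} {share:7.3%}")
    print(f"  largest/smallest ratio: {imbalance_ratio(counts):.0f}x")
```

With these counts, Sexual and Gender Minorities make up roughly 0.12% of the gender-tagged total. A check like this can be rerun on any downstream dataset before fine-tuning.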
Training data
This model was trained on all PubMed abstracts categorized under Psychology and Psychiatry. As of March 1, this corresponds to approximately 3.2 million papers with abstract text. To increase the representation of sparser mental health categories, papers in those categories were back-translated: from English to French, and then from French back to English. Papers in the following categories were back-translated:
- Depressive Disorder
- Risk Factors
- Mental Disorders
- Child, Preschool
- Mental Health
In total, this process added 557,980 additional papers to the training data.
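The back-translation step described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which translation system was used, so the two translators are passed in as plain callables. The record layout (a dict with `abstract` and `mesh` keys) and the function names are hypothetical.

```python
from typing import Callable, Iterable

# Categories listed above as back-translated.
SPARSE_CATEGORIES = {
    "Depressive Disorder",
    "Risk Factors",
    "Mental Disorders",
    "Child, Preschool",
    "Mental Health",
}

# A translator is any function mapping text to text; the actual
# machine translation system is an assumption left to the caller.
Translator = Callable[[str], str]


def back_translate(text: str, en_to_fr: Translator, fr_to_en: Translator) -> str:
    """Translate English -> French -> English to obtain a paraphrase."""
    return fr_to_en(en_to_fr(text))


def augment(papers: Iterable[dict], en_to_fr: Translator, fr_to_en: Translator) -> list:
    """Return the original papers plus back-translated copies of sparse ones.

    Each paper is assumed to be a dict with an "abstract" string and a
    "mesh" list of descriptor names (hypothetical layout).
    """
    augmented = []
    for paper in papers:
        augmented.append(paper)
        # Only papers tagged with at least one sparse category get a copy.
        if SPARSE_CATEGORIES & set(paper["mesh"]):
            paraphrase = back_translate(paper["abstract"], en_to_fr, fr_to_en)
            augmented.append({**paper, "abstract": paraphrase})
    return augmented
```

Keeping the translators as parameters makes the logic easy to test with stand-in functions, and lets any translation model be swapped in later without changing the augmentation code.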
Training procedure
The model was further pretrained on Psychology and Psychiatry PubMed papers for 10 epochs. Default parameters were used, except that the gradient accumulation steps were set to 4 and the per-device train batch size was set to 32. Two Nvidia 3090 GPUs were used for model development.
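The settings above imply an effective batch size per optimizer step, which the short calculation below works out. It is a rough sketch that assumes data-parallel training across both GPUs. The step estimate additionally assumes one training example per paper and uses the approximate 3.5 million figure quoted earlier. The source states neither assumption, and the function names are illustrative.

```python
import math

PER_DEVICE_BATCH_SIZE = 32    # from the text
GRAD_ACCUMULATION_STEPS = 4   # from the text
NUM_GPUS = 2                  # from the text
NUM_EPOCHS = 10               # from the text
APPROX_EXAMPLES = 3_500_000   # assumption: one example per paper


def effective_batch_size(per_device, accumulation, devices):
    """Examples contributing to one optimizer step under data parallelism."""
    return per_device * accumulation * devices


def optimizer_steps(num_examples, batch_size, epochs):
    """Rough total of optimizer steps, counting a final partial batch per epoch."""
    return math.ceil(num_examples / batch_size) * epochs


batch = effective_batch_size(PER_DEVICE_BATCH_SIZE, GRAD_ACCUMULATION_STEPS, NUM_GPUS)
steps = optimizer_steps(APPROX_EXAMPLES, batch, NUM_EPOCHS)
print(f"effective batch size: {batch}")
print(f"approximate optimizer steps: {steps}")
```

The effective batch size is the more reliable of the two numbers. The step count shifts with how abstracts are split into training sequences.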
Evaluation results
To assess the effectiveness of Psych-Search in the mental health domain, an evaluation task was designed by fine-tuning Psych-Search on a task similar to BioASQ Task A. In this evaluation, large-scale biomedical indexing was performed using the MeSH descriptors associated with each paper in Psychology and Psychiatry. The evaluation metric is the micro F1 score across all second-level descriptors within Psychology and Psychiatry, which corresponds to the 38 MeSH categories used during evaluation.
| Model | Micro F1 Score |
|---|---|
| bert-base-uncased | 0.7348 |
| SciBERT Scivocab Uncased | 0.7394 |
| Psych-Search | 0.7415 |
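For readers unfamiliar with the metric, micro F1 in a multi-label setting pools true positives, false positives, and false negatives over every paper and every label before computing a single F1 score. The sketch below is a plain Python illustration of that definition, not the project's evaluation code, and the label names in the example are placeholders.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 for multi-label data.

    Both arguments are sequences of label collections, one per paper.
    Counts are pooled across all papers and labels before computing F1.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not assigned
        fn += len(truth - pred)   # labels assigned but not predicted
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Placeholder labels for two papers.
truth = [{"Mental Health", "Risk Factors"}, {"Depressive Disorder"}]
predicted = [{"Mental Health"}, {"Depressive Disorder", "Risk Factors"}]
print(f"micro F1: {micro_f1(truth, predicted):.4f}")
```

Because counts are pooled, frequent descriptors dominate the score, so a micro F1 value says little about performance on the sparse categories discussed under Limitations and bias.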
Next Steps
If you are interested in continuing this work or have other ideas on how to build upon it, please contact us at nlp4good@gmail.com. Our goal is to bring state-of-the-art NLP capabilities to under-researched areas, with mental health as our top priority.
📄 License
This project is licensed under the Apache-2.0 license.