Psych-Search Open-Source NLP Model - Empowering Mental Health Practitioners to Automatically Extract Risk and Protective Factors

Psych Search

Developed by nlp4good

A natural language processing model designed for mental health practitioners, extending the MESH classification system and enabling automatic extraction of risk and protective factors

Large Language Model
Language: English
Open Source License: Apache-2.0
Tags: #Mental Health Classification #Medical Literature Analysis #Risk Factor Extraction

Release Date: 3/2/2022

Model Overview

A psychology domain pre-trained model based on SciBERT, used for mental health-related text classification and entity extraction, specifically optimized for U.S. youth suicide prevention programs

Model Features

Psychology Domain Optimization

Continued pre-training on approximately 3.5 million psychology and psychiatry paper abstracts

MESH Classification Extension

Added mental health-specific categories such as prevention strategies and protective factors

Data Augmentation Techniques

Employed English-French back-translation to enhance data representation for sparse categories

Model Capabilities

Mental health text classification

Risk factor identification

Protective factor extraction

Academic literature analysis

Use Cases

Mental Health Research

Suicide Prevention Program Analysis

Identifying risk and protective factors in literature

Improved retrieval efficiency for related studies

Mental Health Literature Classification

Automatically labeling research directions in psychology papers

Micro F1 of 0.7415 on MESH category indexing, compared with 0.7394 for SciBERT and 0.7348 for bert-base-uncased

🚀 Psych-Search

Psych-Search aims to bring cutting-edge NLP to mental health practitioners, providing a foundation for classification and NLU models in the mental health field.

🚀 Quick Start

Psych-Search is an ongoing project that aims to bring advanced NLP to mental health practitioners. The model presented here serves as the base for both the traditional classification models and the NLU models in the Psych-Search application. The application has two objectives. First, it uses a combination of traditional text classification models to expand the MESH taxonomy with categories relevant to mental health practitioners who design suicide prevention programs for adolescent communities in the United States. Second, it automatically extracts and standardizes entities such as risk factors and protective factors.

Our initial expansion of the MESH taxonomy includes the following categories:

Prevention Strategies
Protective Factors

We are actively seeking partners for this project. Please contact us at nlp4good@gmail.com.

✨ Features

Model description

This model extends allenai/scibert_scivocab_uncased. Starting from SciBERT as the base model, it was further pretrained using only abstract texts from Psychology and Psychiatry PubMed research. Training ran for 10 epochs on approximately 3.5 million papers, and the model was evaluated on a task similar to BioASQ Task A.

Intended uses & limitations

How to use

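from transformers import AutoTokenizer, AutoModel

# Hugging Face Hub identifier of the Psych-Search checkpoint
mname = "nlp4good/psych-search"
# Load the tokenizer and the pretrained encoder weights
tokenizer = AutoTokenizer.from_pretrained(mname)
model = AutoModel.from_pretrained(mname)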
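
The loading snippet above returns the bare encoder, which outputs one vector per token. A minimal sketch of turning those outputs into one vector per abstract follows. Mean pooling over non-padding tokens is an assumption made here for illustration; the model card does not prescribe a pooling strategy. The helper names `masked_mean` and `embed_abstracts` are hypothetical.

```python
from typing import List, Sequence


def masked_mean(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding has mask 0)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_abstracts(texts: List[str], model_name: str = "nlp4good/psych-search") -> List[List[float]]:
    """Return one mean-pooled vector per text (assumed pooling, not from the model card)."""
    # Imported lazily so masked_mean can be used without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Pad to the longest text in the batch and truncate to the BERT input limit.
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, tokens, hidden)

    return [
        masked_mean(vectors, mask)
        for vectors, mask in zip(hidden.tolist(), batch["attention_mask"].tolist())
    ]
```

Calling `embed_abstracts(["Family support as a protective factor in adolescents."])` downloads the checkpoint on first use and returns a list containing one vector.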

Limitations and bias

This model was trained on all PubMed abstracts categorized under Psychology and Psychiatry. As of March 1, this amounts to approximately 3.2 million papers with abstract text. Papers among these 3.2 million that fall into relevant sparse mental health categories were back-translated to increase the representation of those categories.

There are several limitations with this dataset, including significant discrepancies in the number of papers associated with Sexual and Gender Minorities. The training data had the following breakdown across gender groups:

Female	Male	Sexual and Gender Minorities
1,896,301	1,945,279	4,529

Similar discrepancies exist within Ethnic Groups as defined in the MESH taxonomy:

African Americans	Arabs	Asian Americans	Hispanic Americans	Indians, Central American	Indians, North American	Indians, South American	Indigenous Peoples	Mexican Americans
31,027	2,437	5,612	18,893	124	5,657	633	174	3,234

These discrepancies can significantly impact information retrieval systems, downstream machine learning models, and other NLP applications that utilize these pretrained models.
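
To make the size of the gender-group imbalance concrete, the counts from the table above can be converted to shares. This is a small illustrative sketch; `label_shares` is a hypothetical helper, and because a paper may carry more than one label, the result is a share of label assignments rather than of papers.

```python
from typing import Dict

# Counts copied from the gender-group table above.
gender_counts = {
    "Female": 1_896_301,
    "Male": 1_945_279,
    "Sexual and Gender Minorities": 4_529,
}


def label_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each label's fraction of all label assignments."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = label_shares(gender_counts)
for label, share in shares.items():
    print(f"{label}: {share:.2%}")
```

Under this calculation, Sexual and Gender Minorities account for roughly 0.12% of the gender-group label assignments.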

Training data

This model was trained on all PubMed abstracts categorized under Psychology and Psychiatry. As of March 1, this corresponds to approximately 3.2 million papers with abstract text. Papers among these 3.2 million that fall into relevant sparse categories were back-translated from English to French and then from French to English to increase the representation of sparser mental health categories. Papers in the following categories were back-translated:

Depressive Disorder
Risk Factors
Mental Disorders
Child, Preschool
Mental Health

In total, this process added 557,980 additional papers to the training data.
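
The back-translation step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the translation system they used is not named, so the two translators are passed in as plain functions, and the record layout (`abstract`, `categories`, `augmented`) and helper names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set

Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into the pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment_sparse(
    papers: List[Dict[str, Any]],
    sparse_categories: Set[str],
    en_to_fr: Translator,
    fr_to_en: Translator,
) -> List[Dict[str, Any]]:
    """Return the original papers plus a back-translated copy of each sparse-category paper."""
    augmented = list(papers)
    for paper in papers:
        # Only papers tagged with at least one sparse category are duplicated.
        if sparse_categories & set(paper["categories"]):
            paraphrase = back_translate(paper["abstract"], en_to_fr, fr_to_en)
            augmented.append({**paper, "abstract": paraphrase, "augmented": True})
    return augmented
```

Passing the translators as arguments keeps the sketch independent of any particular machine translation model or service.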

Training procedure

The model was further pretrained on Psychology and Psychiatry PubMed papers for 10 epochs. Default parameters were used, with two exceptions: gradient accumulation steps were set to 4 and the per-device train batch size was set to 32. Two Nvidia 3090 GPUs were used for model development.
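
The settings above imply an effective batch size that can be worked out directly. The sketch below assumes both GPUs were used in data-parallel training, which the text does not state explicitly. The dictionary keys follow the Hugging Face `TrainingArguments` field names as an assumption about the tooling; the training framework is not named in the source.

```python
# Non-default settings reported above, keyed by the Hugging Face
# TrainingArguments field names (assumed tooling).
pretraining_args = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 4,
}


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


# Assumes data-parallel training across the two GPUs mentioned above.
examples_per_step = effective_batch_size(
    pretraining_args["per_device_train_batch_size"],
    pretraining_args["gradient_accumulation_steps"],
    num_devices=2,
)
print(examples_per_step)  # 32 * 4 * 2 = 256
```

If only one GPU was active during a run, the same formula gives 128 examples per optimizer step.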

Evaluation results

To assess the effectiveness of Psych-Search in the mental health domain, an evaluation task was designed by fine-tuning Psych-Search on a task similar to BioASQ Task A. In this task, large-scale biomedical indexing is performed using the MESH taxonomy associated with each paper in Psychology and Psychiatry. The evaluation metric is the micro F1 score across all second-level descriptors within Psychology and Psychiatry, which corresponds to 38 different MESH categories.

Model	Micro F1 Score
bert-base-uncased	0.7348
SciBERT Scivocab Uncased	0.7394
Psych-Search	0.7415
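
For readers unfamiliar with the metric, micro F1 for multi-label indexing pools true positives, false positives, and false negatives over every paper and category before computing a single score. The following is a generic sketch of that definition, not the authors' evaluation script; the function name and the toy labels are illustrative only.

```python
from typing import List, Set


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-paper sets of gold and predicted labels."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        tp += len(gold & pred)   # labels predicted and correct
        fp += len(pred - gold)   # labels predicted but not in gold
        fn += len(gold - pred)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no labels at all.
    return 2 * tp / denom if denom else 0.0


gold = [{"Mental Disorders", "Risk Factors"}, {"Mental Health"}]
pred = [{"Mental Disorders"}, {"Mental Health", "Risk Factors"}]
score = micro_f1(gold, pred)
print(f"{score:.4f}")
```

Because counts are pooled before averaging, frequent categories influence micro F1 more than sparse ones, which is worth keeping in mind when reading the table above.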

Next Steps

If you are interested in continuing this work or have other ideas on how to build upon it, please contact us at nlp4good@gmail.com. Our goal is to bring state-of-the-art NLP capabilities to under-researched areas, with mental health as our top priority.

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Model Type	An extension of allenai/scibert_scivocab_uncased
Training Data	Approximately 3.2 million PubMed abstracts under Psychology and Psychiatry, plus 557,980 additional papers produced by back-translating relevant sparse mental health categories
