social-bias-ner: Open-Source Named Entity Recognition Model for Detecting Social Bias Categories in Text

Developed by ethical-spectacle

A named entity recognition model, fine-tuned from BERT, for detecting social bias categories in text

Sequence Labeling

Transformers

Language: English

License: MIT (open source)

Tags: Multi-label Bias Identification, BERT Fine-tuning, Text Fairness Detection

Downloads: 3,435

Release date: 9/20/2024

Model Overview

This model performs multi-label token classification to identify three types of social bias in text: generalizations (GEN), unfairness (UNFAIR), and stereotypes (STEREO). Because the task is multi-label, a single token can carry more than one of these tags at once.
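To make the multi-label idea concrete, here is a minimal sketch in plain Python. It assumes each token receives one logit per tag and that each logit is passed through its own sigmoid and compared against a threshold, which is how the Quick Start code on this page decodes predictions. The logits are made up for illustration, and the tag order follows the id2label mapping shown in the Quick Start.

```python
import math

# Tag order taken from the id2label mapping in the Quick Start code
LABELS = ["O", "B-STEREO", "I-STEREO", "B-GEN", "I-GEN", "B-UNFAIR", "I-UNFAIR"]

def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_tags(logits, threshold=0.5):
    """Return every tag whose independent sigmoid probability exceeds the threshold."""
    tags = [label for label, z in zip(LABELS, logits) if sigmoid(z) > threshold]
    return tags or ["O"]  # fall back to 'O' when nothing passes the threshold

# Hypothetical logits for a single token (not real model output)
example_logits = [-3.0, 2.0, -1.5, 1.2, -2.0, -4.0, -0.5]
print(multilabel_tags(example_logits))  # ['B-STEREO', 'B-GEN']
```

Unlike a softmax, which would force one winning tag per token, independent sigmoids let the same token start both a stereotype span and a generalization span.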

Model Features

Multi-label Classification Capability

Supports simultaneous identification of multiple social bias types in text

Detection Performance

Reports an F1 score of 0.7864 and a recall of 0.7617 for identifying expressions of social bias

Eco-friendly Training

Fine-tuning on a T4 GPU produced only 8 kg of CO2-equivalent emissions

Model Capabilities

Text Bias Detection

Multi-label Entity Recognition

Social Bias Classification

Use Cases

Content Moderation

Social Media Content Screening

Automatically detects potentially biased expressions in user-generated content

Marks text segments containing stereotypes or unfair evaluations

Academic Research

Bias Language Analysis

Used in social science research to quantify bias levels in textual materials

Provides structured annotation data to support statistical analysis
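As a rough illustration of how such annotations could feed a statistical analysis, the sketch below computes the share of tokens flagged for each bias category. It assumes input records shaped like the token/label dictionaries produced by the predict function in the Quick Start section; the function name `category_rates` and the sample records are hypothetical and are not model output.

```python
from collections import Counter

CATEGORIES = ("GEN", "UNFAIR", "STEREO")

def category_rates(tagged_tokens):
    """Fraction of tokens carrying a B- or I- tag for each bias category."""
    counts = Counter()
    for item in tagged_tokens:
        # Strip the B-/I- prefix so both count toward the same category
        present = {label.split("-", 1)[1] for label in item["labels"] if label != "O"}
        counts.update(present)
    total = len(tagged_tokens)
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}

# Hand-written sample records (hypothetical, for illustration only)
sample = [
    {"token": "teenagers", "labels": ["B-GEN", "B-STEREO"]},
    {"token": "are", "labels": ["I-STEREO"]},
    {"token": "here", "labels": ["O"]},
    {"token": ".", "labels": ["O"]},
]
print(category_rates(sample))  # {'GEN': 0.25, 'UNFAIR': 0.0, 'STEREO': 0.5}
```

Token-level rates like these depend on the decision threshold and on wordpiece tokenization, so they are best compared across texts processed with identical settings.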

🚀 Social Bias NER

This NER model, fine-tuned from BERT, performs multi-label token classification of generalizations, unfairness, and stereotypes to detect social bias in text.

🚀 Quick Start

The Transformers pipeline API doesn't have a class for multi-label token classification, but you can use the code below to load the model, run it, and format the output.

Basic Usage

import json
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# load the base tokenizer and the fine-tuned model, then move the model to GPU if available
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('ethical-spectacle/social-bias-ner')
model.eval()
model.to('cuda' if torch.cuda.is_available() else 'cpu')

# ids to labels we want to display
id2label = {
    0: 'O',
    1: 'B-STEREO',
    2: 'I-STEREO',
    3: 'B-GEN',
    4: 'I-GEN',
    5: 'B-UNFAIR',
    6: 'I-UNFAIR'
}

# predict function you'll want to use if using in your own code
def predict_ner_tags(sentence):
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = inputs['input_ids'].to(model.device)
    attention_mask = inputs['attention_mask'].to(model.device)

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        probabilities = torch.sigmoid(logits)
        predicted_labels = (probabilities > 0.5).int() # remember to try your own threshold

    result = []
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
    for i, token in enumerate(tokens):
        if token not in tokenizer.all_special_tokens:
            label_indices = (predicted_labels[0][i] == 1).nonzero(as_tuple=False).squeeze(-1)
            labels = [id2label[idx.item()] for idx in label_indices] if label_indices.numel() > 0 else ['O']
            result.append({"token": token, "labels": labels})

    return json.dumps(result, indent=4)
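The function above returns one record per wordpiece, so a word split by the tokenizer appears as several "##" pieces. The sketch below shows one possible way to merge those pieces back into words and collect contiguous tagged spans per category. The helper names `merge_wordpieces` and `extract_spans` are hypothetical, and the sample input, including its wordpiece split, is hand-written for illustration rather than real model output.

```python
def merge_wordpieces(tagged_tokens):
    """Join '##' continuation pieces onto the previous word and union their labels."""
    words = []
    for item in tagged_tokens:
        token, labels = item["token"], set(item["labels"]) - {"O"}
        if token.startswith("##") and words:
            words[-1]["word"] += token[2:]
            words[-1]["labels"] |= labels
        else:
            words.append({"word": token, "labels": labels})
    return words

def extract_spans(words, entity_types=("GEN", "UNFAIR", "STEREO")):
    """Collect contiguous runs of tagged words for each entity type."""
    spans = []
    for entity in entity_types:
        current = []
        for w in words:
            begins = f"B-{entity}" in w["labels"]
            tagged = begins or f"I-{entity}" in w["labels"]
            # Close the open span when it ends or when a new B- tag starts
            if current and (not tagged or begins):
                spans.append((entity, " ".join(current)))
                current = []
            if tagged:
                current.append(w["word"])
        if current:
            spans.append((entity, " ".join(current)))
    return spans

# Hand-written sample in the shape of json.loads(predict_ner_tags(...))
sample = [
    {"token": "nurses", "labels": ["B-GEN", "B-STEREO"]},
    {"token": "are", "labels": ["I-STEREO"]},
    {"token": "always", "labels": ["I-STEREO"]},
    {"token": "over", "labels": ["I-STEREO", "B-UNFAIR"]},
    {"token": "##worked", "labels": ["I-STEREO", "I-UNFAIR"]},
    {"token": ".", "labels": ["O"]},
]
print(extract_spans(merge_wordpieces(sample)))
# [('GEN', 'nurses'), ('UNFAIR', 'overworked'), ('STEREO', 'nurses are always overworked')]
```

With the model loaded, the same helpers could be applied as extract_spans(merge_wordpieces(json.loads(predict_ner_tags(text)))). Spans of different categories may overlap, which is expected for multi-label output.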

✨ Features

This NER model is fine-tuned from BERT for multi-label token classification of:

(GEN)eralizations
(UNFAIR)ness
(STEREO)types

You can try it out in Hugging Face Spaces :).

📚 Documentation

GUS-Net Project Details:

Resources:

Please visit this collection for the datasets and model presented in the GUS-Net paper.
GUS-Net was implemented as part of The Fair-ly Project, in a Chrome Extension, and PyPI package.

Please cite:

@article{powers2024gusnet,
  title={{GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes}},
  author={Maximus Powers and Umang Mavani and Harshitha Reddy Jonala and Ansh Tiwari and Hua Wei},
  journal={arXiv preprint arXiv:2410.08388},
  year={2024},
  url={https://arxiv.org/abs/2410.08388}
}

Give our research group, Ethical Spectacle, a follow ;).

📄 License

This project is licensed under the MIT license.

Property	Details
Model Type	Fine-tuned from BERT for multi-label token classification
Training Data	Not specified
Metrics	F1: 0.7864, Recall: 0.7617
Base Model	bert-base-uncased
CO2 Eq Emissions	Emissions: 8 kg, Training Type: fine-tuning, Geographical Location: Phoenix, AZ, Hardware Used: T4
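For readers who want to reproduce metrics of this kind on their own annotations, here is a minimal sketch of micro-averaged precision, recall, and F1 over multi-hot token labels. The page does not state which averaging method produced the reported F1 and recall, so micro-averaging is an assumption here, and the toy data is hypothetical.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall, and F1 over multi-hot label rows."""
    tp = fp = fn = 0
    for gold_row, pred_row in zip(gold, pred):
        for g, p in zip(gold_row, pred_row):
            if g and p:
                tp += 1  # label present in both gold and prediction
            elif p:
                fp += 1  # predicted but not in gold
            elif g:
                fn += 1  # in gold but missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1

# Hypothetical toy data: one row per token, columns are (GEN, UNFAIR, STEREO)
gold = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
pred = [[0, 1, 0], [1, 1, 0], [0, 0, 0], [0, 1, 1]]
print(micro_prf(gold, pred))  # (0.8, 0.8, 0.8)
```

Running this at several decision thresholds is one simple way to act on the "try your own threshold" advice in the Quick Start code.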
