Open-source model: xlm-roberta-large-english-cap-minor-platforms - Essential for multilingual text classification and fine-grained topic encoding

Xlm Roberta Large English Cap Minor Platforms

Developed by poltextlab

Multilingual text classification model based on xlm-roberta-large architecture, specifically designed for minor topic coding in the Comparative Agendas Project

Text Classification

PyTorch

Other

Open Source License: MIT

#Multilingual zero-shot classification #Policy text analysis #Comparative Agendas Project specific

Downloads 18

Release Time: 4/8/2025

Model Overview

This model is fine-tuned on English training data labeled with minor topic codes from the Comparative Agendas Project, primarily used for zero-shot text classification tasks.

Model Features

Multilingual support

Based on XLM-RoBERTa architecture, capable of processing multilingual texts

Minor topic coding

Optimized specifically for minor topic coding tasks in the Comparative Agendas Project

Academic use only

Primarily intended for academic research; non-academic use requires a special application

Model Capabilities

Zero-shot text classification

Multilingual text processing

Minor topic recognition

Use Cases

Political text analysis

Policy document classification

Performing minor topic coding on government policy documents

Accuracy 0.39, weighted F1 score 0.3

Comparative political research

Supporting cross-national comparative analysis of political agendas

🚀 xlm-roberta-large-english-cap-minor-platforms

An xlm-roberta-large model fine-tuned on English training data for text classification.

🚀 Quick Start

This is an xlm-roberta-large model fine-tuned on English training data labeled with minor topic codes from the Comparative Agendas Project.

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
pipe = pipeline(
    model="poltextlab/xlm-roberta-large-english-cap-minor-platforms",
    task="text-classification",
    tokenizer=tokenizer,
    use_fast=False,
    token="<your_hf_read_only_token>"
)

text = "We will place an immediate 6-month halt on the finance driven closure of beds and wards, and set up an independent audit of needs and facilities."
pipe(text)
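
A text-classification pipeline returns a list of dictionaries, each holding a `label` and a `score`. Given the modest accuracy reported for this model, it may be useful to route low-scoring predictions to manual review. The following is a minimal sketch assuming that output shape; the helper name `split_by_confidence`, the 0.5 threshold, and the sample labels are illustrative and not part of the model card.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.5
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (accepted, needs_review) by their score."""
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for pred in predictions:
        if pred["score"] >= threshold:
            accepted.append(pred)
        else:
            needs_review.append(pred)
    return accepted, needs_review


# Hypothetical pipeline output; the real label strings depend on the model config.
sample = [
    {"label": "LABEL_A", "score": 0.91},
    {"label": "LABEL_B", "score": 0.22},
]
accepted, needs_review = split_by_confidence(sample, threshold=0.5)
```

The threshold is a tuning choice: a higher value sends more items to review and accepts fewer automatically.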

Advanced Usage

# Access to the model is gated, so it must be loaded with an access token.
# Recent versions of the Transformers package take the `token` parameter;
# earlier versions use `use_auth_token` instead, as shown here.
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
pipe = pipeline(
    model="poltextlab/xlm-roberta-large-english-cap-minor-platforms",
    task="text-classification",
    tokenizer=tokenizer,
    use_fast=False,
    # In earlier versions, use use_auth_token instead
    use_auth_token="<your_hf_read_only_token>" 
)

text = "Your input text here"
pipe(text)
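
To avoid hardcoding the access token in scripts, it can be read from an environment variable and passed under whichever keyword the installed Transformers version expects. This is a minimal sketch: the helper `auth_kwargs` and its `legacy` flag are hypothetical, and `HF_TOKEN` is assumed to be the variable where the token is stored.

```python
import os
from typing import Dict


def auth_kwargs(legacy: bool = False, env_var: str = "HF_TOKEN") -> Dict[str, str]:
    """Build the authentication keyword argument for a gated model.

    legacy=True selects `use_auth_token` (earlier Transformers versions);
    otherwise `token` is used.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    key = "use_auth_token" if legacy else "token"
    return {key: token}
```

The returned dictionary can be unpacked into the `pipeline(...)` call with `**auth_kwargs()` in place of the literal token argument.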

📚 Documentation

Model Performance

The model was evaluated on a test set of 8922 examples (20% of the available data).

Accuracy: 0.39
Weighted Average F1-score: 0.3
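
For reference, both reported metrics can be computed from paired lists of gold and predicted labels. The sketch below uses plain Python and toy labels; it is not the authors' evaluation script, and the function names are illustrative. Weighted F1 here means per-class F1 averaged with weights proportional to each class's share of the gold labels.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("Inputs must be non-empty and of equal length.")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("Inputs must be non-empty and of equal length.")
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy labels for illustration only; these are not CAP minor topic codes.
gold = ["a", "a", "b", "c"]
pred = ["a", "b", "b", "c"]
acc = accuracy(gold, pred)
wf1 = weighted_f1(gold, pred)
```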

Inference Platform

This model is used by the CAP Babel Machine, a free, open-source natural language processing tool designed to simplify and speed up comparative research projects.

Cooperation

Model performance can be significantly improved by extending our training sets. We appreciate every submission of CAP-coded corpora (of any domain and language) at poltextlab{at}poltextlab{dot}com or by using the CAP Babel Machine.

Debugging and Issues

This architecture uses the sentencepiece tokenizer. With transformers versions earlier than 4.27, sentencepiece must be installed manually before the model can be run.
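
A quick environment check can confirm whether this applies before loading the model. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the 4.27 cutoff is taken from the note above.

```python
import importlib.util
import re


def sentencepiece_installed() -> bool:
    """Return True if the sentencepiece package can be imported."""
    return importlib.util.find_spec("sentencepiece") is not None


def needs_manual_sentencepiece(transformers_version: str) -> bool:
    """Return True for transformers versions earlier than 4.27."""
    match = re.match(r"(\d+)\.(\d+)", transformers_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {transformers_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (4, 27)
```

In practice the version string would come from `transformers.__version__`; if `needs_manual_sentencepiece` returns True and `sentencepiece_installed` returns False, installing sentencepiece (for example with `pip install sentencepiece`) is the step described above.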

If you encounter a RuntimeError when loading the model using the from_pretrained() method, adding ignore_mismatched_sizes=True should solve the issue.
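
One way to apply this is to retry with the flag only when the first load fails. This is a sketch under assumptions: the wrapper `load_classifier` is hypothetical, and it assumes a Transformers version that accepts the `token` parameter. Note that with `ignore_mismatched_sizes=True`, weights whose shapes do not match the checkpoint are newly initialized rather than loaded, so the warnings printed during loading are worth checking.

```python
def load_classifier(model_id: str, token: str):
    """Load a sequence classification model, retrying on a RuntimeError."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification

    try:
        return AutoModelForSequenceClassification.from_pretrained(
            model_id, token=token
        )
    except RuntimeError:
        # Fallback suggested in the note above.
        return AutoModelForSequenceClassification.from_pretrained(
            model_id, token=token, ignore_mismatched_sizes=True
        )
```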

📄 License

This model is released under the MIT license.

🔍 Additional Information

Metrics

accuracy
f1-score

Gated Access

Our models are intended for academic use only. If you are not affiliated with an academic institution, please provide a rationale for using our models. Please allow us a few business days to manually review subscriptions.

Gated Fields

Property	Details
Name	text
Country	country
Institution	text
Institution Email	text
Please specify your academic use case	text
