🚀 News Category Classification for IPTC NewsCodes
This model classifies news content in Norwegian, Swedish, and English into 16 categories according to IPTC NewsCodes. It is a fine-tuned version of [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) trained on a private dataset, with the goal of outperforming Claude Haiku and GPT-3.5 on this specific use case.
✨ Features
- Trained on a limited set of English, Swedish, and Norwegian news titles for news content classification.
- Fine-tuned on a skewed dataset that has been slightly augmented for stability.
- Categorizes news texts into the 16 IPTC NewsCodes categories.
📦 Installation
This model card does not document explicit installation steps. The standard Hugging Face stack listed under framework versions (`transformers`, `torch`) should be sufficient to load the model.
💻 Usage Examples
Basic Usage
To use this model for news category classification, input a news title (or a title plus a short lead) and read off the predicted category, as in the examples below; a code sketch follows them.
Input: Mann siktet for drapsforsøk på Slovakias statsministeren ("Man charged with attempted murder of Slovakia's prime minister")
Output: politics

Input: Tre døde i kioskbrann i Tyskland ("Three dead in kiosk fire in Germany")
Output: disaster, accident, and emergency incident

Input: Kultfilm får Netflix-oppfølger. Kultfilmen «Happy Gilmore» fra 1996 får en oppfølger på Netflix. Det røper strømmetjenesten selv på X, tidligere Twitter. – Happy Gilmore er tilbake! ("Cult film gets a Netflix sequel. The 1996 cult film «Happy Gilmore» is getting a sequel on Netflix. The streaming service revealed this itself on X, formerly Twitter. 'Happy Gilmore is back!'")
Output: arts, culture, entertainment and media
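The snippet below is a minimal sketch of how such a call could look with the Hugging Face `pipeline` API, assuming the model is published on the Hub with a standard sequence-classification head; the repository name is a placeholder, not the actual model ID.

```python
from transformers import pipeline

# Placeholder model ID; replace with the actual repository name of this model.
classifier = pipeline("text-classification", model="your-org/iptc-news-classifier")

result = classifier("Tre døde i kioskbrann i Tyskland")
print(result)
# Expected shape: [{'label': 'disaster, accident, and emergency incident', 'score': ...}]
```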
Advanced Usage
When using the model, it is recommended to assign a category only if the top label's score is at least 0.60 (60%); below that threshold the model should be treated as uncertain about the classification.
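The following is a sketch of how that 60% threshold could be applied in practice, reusing the placeholder model ID from the basic example:

```python
from typing import Optional

from transformers import pipeline

# Placeholder model ID; replace with the actual repository name of this model.
classifier = pipeline("text-classification", model="your-org/iptc-news-classifier")

CONFIDENCE_THRESHOLD = 0.60  # only accept predictions scoring at least 60%


def categorize(title: str) -> Optional[str]:
    """Return the predicted IPTC category, or None if the model is uncertain."""
    prediction = classifier(title)[0]
    if prediction["score"] >= CONFIDENCE_THRESHOLD:
        return prediction["label"]
    return None  # below the threshold: treat the classification as uncertain


print(categorize("Tre døde i kioskbrann i Tyskland"))
```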
📚 Documentation
Model description
The model is a test model for demonstration purposes. It is intended to categorize Norwegian, Swedish, and English news content into the 16 specified categories. Several categories would, however, need more training data before the model performs reliably across all of them.
Intended uses & limitations
Use it to categorize news texts. Assign a category only if the top label's score is at least 0.60; otherwise treat the classification as uncertain.
Performance
It achieves the following results on the evaluation set:
- Loss: 0.8030
- Accuracy: 0.7431
- F1: 0.7474
- Precision: 0.7695
- Recall: 0.7431
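The averaging scheme behind these aggregate scores is not stated. The sketch below shows one common way such metrics are computed via a `Trainer` `compute_metrics` hook, assuming weighted averaging (an assumption that is at least consistent with the reported recall matching accuracy).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(eval_pred):
    """Aggregate metrics of the kind reported above (weighted averaging is an assumption)."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```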
See the performance (accuracy) for each label below:
| Category | Accuracy |
|----------|----------|
| Arts, culture, entertainment and media | 0.6842 |
| Conflict, war and peace | 0.7351 |
| Crime, law and justice | 0.8918 |
| Disaster, accident, and emergency incident | 0.8699 |
| Economy, business, and finance | 0.6893 |
| Environment | 0.4483 |
| Health | 0.7222 |
| Human interest | 0.3182 |
| Labour | 0.5 |
| Lifestyle and leisure | 0.5556 |
| Politics | 0.7909 |
| Science and technology | 0.4583 |
| Society | 0.3538 |
| Sport | 0.9615 |
| Weather | 1.0 |
| Religion | 0.0 |
Training and evaluation data
The training data is a private, skewed dataset of English, Swedish, and Norwegian news titles. The model was trained with the Hugging Face `Trainer`, using a learning rate of 2e-05 and a batch size of 16 for 3 epochs.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|----------|---------|
| learning_rate | 2e-05 |
| train_batch_size | 16 |
| eval_batch_size | 16 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 500 |
| num_epochs | 3 |
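As a sketch, these hyperparameters map directly onto the Hugging Face `TrainingArguments` used by `Trainer`; the output directory is an assumption, and the (private) datasets and model wiring are omitted here.

```python
from transformers import TrainingArguments

# A direct mapping of the reported hyperparameters onto TrainingArguments.
# These arguments would then be passed to Trainer together with the model
# and the private train/eval datasets.
training_args = TrainingArguments(
    output_dir="iptc-news-classifier",  # assumption, not stated in the card
    learning_rate=2e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective training batch size of 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=500,
    seed=42,
)
```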
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | Accuracy Label Arts, culture, entertainment and media | Accuracy Label Conflict, war and peace | Accuracy Label Crime, law and justice | Accuracy Label Disaster, accident, and emergency incident | Accuracy Label Economy, business, and finance | Accuracy Label Environment | Accuracy Label Health | Accuracy Label Human interest | Accuracy Label Labour | Accuracy Label Lifestyle and leisure | Accuracy Label Politics | Accuracy Label Religion | Accuracy Label Science and technology | Accuracy Label Society | Accuracy Label Sport | Accuracy Label Weather |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9761 | 0.2907 | 200 | 1.4046 | 0.6462 | 0.6164 | 0.6057 | 0.6462 | 0.3158 | 0.8315 | 0.7629 | 0.7055 | 0.5437 | 0.0 | 0.5 | 0.0 | 0.0 | 0.3333 | 0.4843 | 0.0 | 0.0833 | 0.0 | 0.9615 | 0.0 |
| 1.2153 | 0.5814 | 400 | 1.0225 | 0.6894 | 0.6868 | 0.7652 | 0.6894 | 0.7895 | 0.6554 | 0.8196 | 0.8562 | 0.6408 | 0.2414 | 0.8333 | 0.1364 | 0.0 | 0.6667 | 0.8467 | 0.0 | 0.375 | 0.0154 | 0.9615 | 1.0 |
| 0.954 | 0.8721 | 600 | 0.8858 | 0.7231 | 0.7138 | 0.7309 | 0.7231 | 0.7368 | 0.7795 | 0.8918 | 0.8699 | 0.6214 | 0.3448 | 0.8889 | 0.1818 | 1.0 | 0.5556 | 0.6899 | 0.0 | 0.0833 | 0.0 | 0.9615 | 0.0 |
| 0.6662 | 1.1628 | 800 | 0.9381 | 0.6881 | 0.7009 | 0.7618 | 0.6881 | 0.7895 | 0.6126 | 0.8454 | 0.8630 | 0.6505 | 0.4483 | 0.7222 | 0.2273 | 1.0 | 0.4444 | 0.8293 | 0.0 | 0.5417 | 0.2308 | 0.9615 | 1.0 |
| 0.5554 | 1.4535 | 1000 | 0.8791 | 0.7025 | 0.7124 | 0.7628 | 0.7025 | 0.7368 | 0.6478 | 0.9021 | 0.8562 | 0.6602 | 0.3103 | 0.7778 | 0.3636 | 0.5 | 0.5556 | 0.8084 | 0.0 | 0.5 | 0.1846 | 0.9615 | 1.0 |
| 0.4396 | 1.7442 | 1200 | 0.8275 | 0.7175 | 0.7280 | 0.7686 | 0.7175 | 0.7895 | 0.6631 | 0.8196 | 0.8836 | 0.6893 | 0.3793 | 0.8333 | 0.4091 | 0.5 | 0.5556 | 0.8362 | 0.0 | 0.4167 | 0.3692 | 0.9615 | 1.0 |
| 0.383 | 2.0349 | 1400 | 0.7929 | 0.745 | 0.7501 | 0.7653 | 0.745 | 0.6842 | 0.7841 | 0.8866 | 0.8767 | 0.7087 | 0.4483 | 0.7778 | 0.4091 | 0.5 | 0.5556 | 0.6899 | 0.0 | 0.4167 | 0.2923 | 0.9615 | 0.0 |
| 0.3418 | 2.3256 | 1600 | 0.8042 | 0.7438 | 0.7440 | 0.7686 | 0.7438 | 0.7895 | 0.7351 | 0.9072 | 0.8493 | 0.7864 | 0.4483 | 0.7778 | 0.3182 | 0.5 | 0.5556 | 0.7909 | 0.0 | 0.4167 | 0.1846 | 0.9615 | 0.0 |
| 0.248 | 2.6163 | 1800 | 0.8387 | 0.7275 | 0.7325 | 0.7610 | 0.7275 | 0.6842 | 0.6891 | 0.8814 | 0.8699 | 0.7573 | 0.4138 | 0.8333 | 0.4091 | 0.5 | 0.5556 | 0.8014 | 0.0 | 0.4167 | 0.2769 | 0.9615 | 0.0 |
| 0.2525 | 2.9070 | 2000 | 0.8137 | 0.735 | 0.7413 | 0.7697 | 0.735 | 0.6842 | 0.7106 | 0.8763 | 0.8699 | 0.6796 | 0.4483 | 0.7222 | 0.3636 | 0.5 | 0.5556 | 0.8153 | 0.0 | 0.4583 | 0.3385 | 0.9615 | 0.0 |
Framework versions
- Transformers 4.40.2
- Pytorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
🔧 Technical Details
The model is fine-tuned from [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) on a private dataset. The dataset is skewed but has been augmented to some extent. Training uses the hyperparameters listed above to improve performance on news category classification.
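A minimal sketch of the starting point for such a fine-tune, assuming a standard sequence-classification head with 16 labels on top of the base checkpoint named above:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BASE_CHECKPOINT = "KB/bert-base-swedish-cased"  # base model named in this card
NUM_LABELS = 16  # the 16 IPTC NewsCodes categories

tokenizer = AutoTokenizer.from_pretrained(BASE_CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_CHECKPOINT, num_labels=NUM_LABELS
)
# The model would then be fine-tuned on the private dataset using the
# hyperparameters listed under "Training hyperparameters".
```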
📄 License
No license is specified for this model.