🚀 INTERPRESS NEWS CLASSIFICATION
This project classifies Interpress news articles using machine learning, reaching 97% accuracy on real-world data.
🚀 Quick Start
The following sections will guide you through the dataset, model performance, and how to use the model with PyTorch and TensorFlow.
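For a quick sanity check, the model can also be loaded through the transformers pipeline API. This is a minimal sketch, not part of the original instructions; the sample sentence is only illustrative:

from transformers import pipeline

# Minimal quick-start sketch: load the fine-tuned model straight from the Hub.
# Note: the pipeline may return raw label names such as "LABEL_2"; the mapping
# from ids to category names is given in the usage examples below.
classifier = pipeline(
    "text-classification",
    model="serdarakyol/interpress-turkish-news-classification",
)

print(classifier("Dolar ve euro haftaya yükselişle başladı."))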
✨ Features
- Real-world data: Utilizes a real-world news dataset from Interpress.
- High accuracy: The model achieves 97% accuracy on both training and validation data.
- Multi-framework support: Can be used with both PyTorch and TensorFlow.
📦 Installation
To use this model, you need to install the transformers library:

pip install transformers

or pin a specific version:

pip install transformers==4.3.3
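The usage examples below also rely on torch (for the PyTorch example), tensorflow (for the TensorFlow example), and numpy. Install whichever framework you plan to use, for example:

pip install torch numpy

pip install tensorflow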
📚 Documentation
Dataset
The dataset was downloaded from Interpress and consists of real-world news articles. The full dataset contains 273K articles; after filtering, 108K articles were used for this model. For more information about the dataset, please visit this link.
Model
The model achieves 97% accuracy on both the training and validation data. The data is split into 80% for training and 20% for validation; a minimal sketch of this split is shown after the figures below. The results are as follows:
Classification report

Confusion matrix

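For reference, here is a minimal sketch of the 80/20 split. The variable names texts and labels are illustrative placeholders for the filtered dataset, not taken from the original training code:

from sklearn.model_selection import train_test_split

# `texts` is assumed to be a list of news bodies, `labels` the matching category
# ids (0-9). Only the 80/20 ratio comes from the model card; the seed and
# stratification below are illustrative choices.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts,
    labels,
    test_size=0.2,     # 20% held out for validation
    random_state=42,   # illustrative seed
    stratify=labels,   # keep category proportions similar in both splits
)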
💻 Usage Examples
Usage for PyTorch
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("serdarakyol/interpress-turkish-news-classification")
model = AutoModelForSequenceClassification.from_pretrained("serdarakyol/interpress-turkish-news-classification")

# Use a GPU if one is available, otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    model = model.cuda()
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('GPU name is:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

def prediction(news):
    """Return the predicted category id for a single news text."""
    news = [news]
    # Tokenize, pad/truncate to 512 tokens, and return PyTorch tensors
    indices = tokenizer.batch_encode_plus(
        news,
        max_length=512,
        add_special_tokens=True,
        return_attention_mask=True,
        padding='max_length',
        truncation=True,
        return_tensors='pt')
    inputs = indices["input_ids"].to(device)
    masks = indices["attention_mask"].to(device)
    with torch.no_grad():
        output = model(inputs, token_type_ids=None, attention_mask=masks)
    # Pick the class with the highest logit
    logits = output[0].detach().cpu().numpy()
    pred = np.argmax(logits, axis=1)[0]
    return pred
# Sample Turkish news text: "No US sanctions on Prince Salman. White House spokesperson
# Psaki argued that not sanctioning Mohammed bin Salman was 'the right decision' ..."
news = "ABD'den Prens Selman'a yaptırım yok Beyaz Saray Sözcüsü Psaki, Muhammed bin Selman'a yaptırım uygulamamanın \"doğru karar\" olduğunu savundu. Psaki, \"Tarihimizde, Demokrat ve Cumhuriyetçi başkanların yönetimlerinde diplomatik ilişki içinde olduğumuz ülkelerin liderlerine yönelik yaptırım getirilmemiştir\" dedi."
You can find the original news story at this link (news date: 02/03/2021).
labels = {
    0: "Culture-Art",
    1: "Economy",
    2: "Politics",
    3: "Education",
    4: "World",
    5: "Sport",
    6: "Technology",
    7: "Magazine",
    8: "Health",
    9: "Agenda"
}

pred = prediction(news)
print(labels[pred])
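If you also want class probabilities instead of just the top label, the function above can be extended with a softmax over the logits. This is an illustrative sketch, not part of the original card:

import torch.nn.functional as F

def prediction_with_scores(news):
    # Same preprocessing as `prediction`, but returns the full probability distribution
    encoded = tokenizer(news, max_length=512, padding='max_length',
                        truncation=True, return_tensors='pt')
    with torch.no_grad():
        output = model(encoded["input_ids"].to(device),
                       attention_mask=encoded["attention_mask"].to(device))
    probs = F.softmax(output.logits, dim=-1)[0]
    return {labels[i]: float(p) for i, p in enumerate(probs)}

print(prediction_with_scores(news))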
Usage for TensorFlow
import tensorflow as tf
import numpy as np
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("serdarakyol/interpress-turkish-news-classification")
model = TFBertForSequenceClassification.from_pretrained("serdarakyol/interpress-turkish-news-classification")

# `news` and `labels` are the sample text and the id-to-category mapping
# defined in the PyTorch example above
inputs = tokenizer(news, return_tensors="tf")
inputs["labels"] = tf.reshape(tf.constant(1), (-1, 1))  # dummy label, only needed to get a loss value

outputs = model(inputs)
loss = outputs.loss
logits = outputs.logits

pred = np.argmax(logits, axis=1)[0]
print(labels[pred])
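If you only need inference, the dummy label (and therefore the loss) is unnecessary; a shorter, illustrative variant:

# Inference-only sketch: encode, run the model, and take the argmax of the logits
inputs = tokenizer(news, return_tensors="tf")
logits = model(inputs).logits
pred = int(tf.argmax(logits, axis=1)[0])
print(labels[pred])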
Acknowledgments
Thanks to @yavuzkomecoglu for contributions.
If you have any questions, please don't hesitate to contact me.
