roberta_with_kornli open-source model: free to use for Korean zero-shot classification tasks

Roberta With Kornli

Developed by pongjin

This model is built on klue/roberta-base and fine-tuned on the mnli and xnli subsets of the kor_nli dataset. It is designed specifically for Korean zero-shot classification.

Text classification

Transformers

Korean | Open-source license: Apache-2.0 | #Korean zero-shot classification #NLI fine-tuned model #Financial text analysis

Downloads: 52

Release date: 6/22/2023

Model overview

Fine-tuned from the KLUE-RoBERTa base model, this model classifies Korean text zero-shot, that is, without any task-specific training on the target labels.
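To make the idea concrete, here is a minimal sketch of how NLI-based zero-shot classification is usually framed: each candidate label becomes a hypothesis sentence, an NLI model scores how strongly the text entails each hypothesis, and the entailment scores are normalized across labels. The helper names and the logits below are hypothetical placeholders for illustration; a real run would take the logits from the fine-tuned model.

```python
import math


def build_hypotheses(labels, template="이는 {}에 관한 것이다."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["주식", "부동산"]
hypotheses = build_hypotheses(labels)

# Hypothetical entailment logits, one per (text, hypothesis) pair.
# These are made-up numbers, not model output.
entailment_logits = [2.0, 0.0]
scores = softmax(entailment_logits)

# Rank labels from most to least likely.
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
```

Because the labels only appear inside the hypothesis text, new categories can be added at inference time without retraining.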

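Model features

Korean zero-shot classification

Zero-shot classification tailored to Korean text; no task-specific training is required.

KLUE-RoBERTa base

Built on the RoBERTa model released alongside the Korean Language Understanding Evaluation (KLUE) benchmark, which gives it strong Korean language understanding.

Custom argument handling

A custom argument handler works around a compatibility issue in the zero-shot classification pipeline of transformers 4.7.0.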
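The workaround amounts to changing the shape of the model input. Instead of handing the tokenizer a (premise, hypothesis) pair, which the usage example's code comment says causes `token_type_ids` to be attached automatically, the two sentences are joined into one string around the separator token. The sketch below shows only that string construction; `join_with_sep` is a hypothetical helper, and the `"[SEP]"` default is an assumption standing in for `tokenizer.sep_token`.

```python
def join_with_sep(premise, hypothesis, sep_token="[SEP]"):
    """Join premise and hypothesis into a single input string.

    `sep_token` defaults to "[SEP]" as an assumption; in practice it
    would be read from the loaded tokenizer (`tokenizer.sep_token`).
    """
    return f"{premise} {sep_token} {hypothesis}"


premise = "배당락 D-1 코스피, 2330선 상승세...외인·기관 사자"
hypothesis = "이는 {}에 관한 것이다.".format("주식")

# Default pipeline behaviour: a two-element pair per candidate label.
pair_input = [premise, hypothesis]

# Custom handler behaviour: one pre-joined string per candidate label.
single_input = join_with_sep(premise, hypothesis)
```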

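Model capabilities

Korean text classification

Zero-shot learning

Natural language inference

Use cases

Financial news classification

Stock news classification

Automatically sorts Korean financial news into predefined categories such as stocks and foreign exchange.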

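In the example, the model assigns the headline '배당락 D-1 코스피, 2330선 상승세...외인·기관 사자' to '주식' (stocks) with a score of about 0.505, the highest of the six candidate labels. This figure is the model's score for that label, not an accuracy measurement.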

Content moderation

Korean content classification

Automatically classifies user-generated Korean content for use in content moderation and recommendation systems.

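🚀 Zero-shot classification model

This model is klue/roberta-base fine-tuned on the mnli and xnli subsets of kor_nli.

🚀 Quick start

This model was created by following the tutorial at: https://github.com/Huffon/klue-transformers-tutorial.git

✨ Key features

Supports zero-shot classification tasks
RoBERTa-based model fine-tuned on the kor_nli dataset

📦 Installation

No specific installation instructions are provided.

💻 Usage examples

Basic usage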

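from abc import ABC, abstractmethod

from transformers import AutoTokenizer, pipeline

# Assumption: the tokenizer is loaded from the same checkpoint as the model.
tokenizer = AutoTokenizer.from_pretrained("pongjin/roberta_with_kornli")


class ArgumentHandler(ABC):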
    """
    Base interface for handling arguments for each :class:`~transformers.pipelines.Pipeline`.
    """

    @abstractmethod
    def __call__(self, *args, **kwargs):
        raise NotImplementedError()


class CustomZeroShotClassificationArgumentHandler(ArgumentHandler):
    """
    Handles arguments for zero-shot for text classification by turning each possible label into an NLI
    premise/hypothesis pair.
    """

    def _parse_labels(self, labels):
        if isinstance(labels, str):
            labels = [label.strip() for label in labels.split(",")]
        return labels

    def __call__(self, sequences, labels, hypothesis_template):
        if len(labels) == 0 or len(sequences) == 0:
            raise ValueError("You must include at least one label and at least one sequence.")
        if hypothesis_template.format(labels[0]) == hypothesis_template:
            raise ValueError(
                (
                    'The provided hypothesis_template "{}" was not able to be formatted with the target labels. '
                    "Make sure the passed template includes formatting syntax such as {{}} where the label should go."
                ).format(hypothesis_template)
            )

        if isinstance(sequences, str):
            sequences = [sequences]
        labels = self._parse_labels(labels)

        sequence_pairs = []
        for sequence in sequences:
            for label in labels:
                # Modification: passing the two sentences as a pair makes the tokenizer attach
                # `token_type_ids` automatically, so join them into one string around `sep_token`.
                sequence_pairs.append(f"{sequence} {tokenizer.sep_token} {hypothesis_template.format(label)}")

        return sequence_pairs, sequences

Advanced usage

classifier = pipeline(
    "zero-shot-classification",
    args_parser=CustomZeroShotClassificationArgumentHandler(),
    model="pongjin/roberta_with_kornli"
)

sequence = "배당락 D-1 코스피, 2330선 상승세...외인·기관 사자"
candidate_labels = ["외환", "환율", "경제", "금융", "부동산", "주식"]

classifier(
    sequence,
    candidate_labels,
    hypothesis_template='이는 {}에 관한 것이다.',
)

>>{'sequence': '배당락 D-1 코스피, 2330선 상승세...외인·기관 사자',
 'labels': ['주식', '금융', '경제', '외환', '환율', '부동산'],
 'scores': [0.5052872896194458,
  0.17972524464130402,
  0.13852974772453308,
  0.09460823982954025,
  0.042949128895998,
  0.038900360465049744]}
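The result is a plain dictionary, so downstream code can read the winning label directly. The sketch below copies the output shown above and adds a hypothetical helper, `top_prediction`, that returns the best label only when its score clears a cutoff. The `min_score` default of 0.5 is an illustrative assumption, not a recommended value.

```python
# Result copied from the example output above.
result = {
    "sequence": "배당락 D-1 코스피, 2330선 상승세...외인·기관 사자",
    "labels": ["주식", "금융", "경제", "외환", "환율", "부동산"],
    "scores": [
        0.5052872896194458,
        0.17972524464130402,
        0.13852974772453308,
        0.09460823982954025,
        0.042949128895998,
        0.038900360465049744,
    ],
}


def top_prediction(result, min_score=0.5):
    """Return (label, score) for the best label, or (None, score) if below min_score."""
    best = max(range(len(result["scores"])), key=lambda i: result["scores"][i])
    label, score = result["labels"][best], result["scores"][best]
    return (label, score) if score >= min_score else (None, score)


label, score = top_prediction(result)
total = sum(result["scores"])  # scores are normalized across the candidate labels
```

With a stricter cutoff such as `top_prediction(result, min_score=0.6)`, the helper returns `None` for the label, which is one way to route low-confidence items to manual review.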