KeywordGen-v2: An Open-Source Keyword Generation Model, Free to Deploy, for Product Review Analysis

Developed by mrutyunjay-patil

KeywordGen-v2 is a T5-based model fine-tuned to generate keywords from text. It is particularly well suited to product review analysis.

Text Generation

Transformers

Multilingual

Open-source license: Apache-2.0

Tags: #ProductReviewKeywordExtraction #T5FineTuning #EcommerceTextAnalysis

Downloads: 83

Released: 8/6/2023

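Model Overview

This model is a fine-tuned version of the T5 base model that extracts keywords of 2 to 8 words from input text. It performs especially well on product reviews and helps users grasp the core themes of a text quickly.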

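Model Features

Optimized for product reviews

The model is tuned specifically for product reviews and extracts their key points and themes efficiently.

Multiple keyword generation

The model can generate keywords of 2 to 8 words, covering several core themes of a text.

Prefix optimization

Adding the prefix 'Keyword: ' before the input significantly improves generation quality.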
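The prefix rule can be applied in one place so that it is never forgotten or doubled. The following is a minimal sketch; `with_prefix` and `TASK_PREFIX` are hypothetical names introduced here for illustration and are not part of the model or of the Transformers library.

```python
# Hypothetical helper (not part of the model card): prepend the task prefix exactly once.
TASK_PREFIX = "Keyword: "

def with_prefix(text: str, prefix: str = TASK_PREFIX) -> str:
    """Return `text` with the task prefix prepended, without doubling it."""
    cleaned = text.strip()
    bare_prefix = prefix.strip()  # "Keyword:" without the trailing space
    if cleaned.startswith(bare_prefix):
        # Already prefixed: normalise the spacing after the prefix.
        return prefix + cleaned[len(bare_prefix):].lstrip()
    return prefix + cleaned

print(with_prefix("The display is vibrant and sharp."))
```

Passing every review through a helper like this before tokenization keeps the prefix consistent between single-input and multi-input code paths.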

Model Capabilities

Text generation

Keyword extraction

Product review analysis

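Use Cases

Product review analysis

Keyword extraction from electronics reviews

Extracts keywords from reviews of electronic products, helping readers quickly understand the main points of user feedback.

Generates keywords such as 'vivid screen colors' and 'excellent battery life'.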
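To turn per-review keywords into an overview of user feedback, the generated strings can be counted across reviews. The sketch below assumes that each decoded output is a single comma-separated string; the source does not state the output format, so check this against real outputs first. The function names are hypothetical, and the sample strings are placeholders rather than actual model results.

```python
from collections import Counter

def split_keywords(output: str, sep: str = ",") -> list[str]:
    """Split one generated string into lower-cased keywords (assumes `sep`-separated output)."""
    return [part.strip().lower() for part in output.split(sep) if part.strip()]

def count_keywords(outputs: list[str]) -> Counter:
    """Count how many reviews mention each keyword (each keyword counted once per review)."""
    counts: Counter = Counter()
    for output in outputs:
        counts.update(set(split_keywords(output)))
    return counts

# Placeholder strings standing in for decoded model outputs; these are not real results.
sample_outputs = [
    "vibrant display, battery life",
    "Battery life, build quality",
]
for keyword, n in count_keywords(sample_outputs).most_common():
    print(f"{n}  {keyword}")
```

Lower-casing before counting merges variants such as "Battery life" and "battery life"; heavier normalisation (stemming, synonym merging) is left out on purpose.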

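Multilingual review analysis

Supports keyword extraction from English reviews, which makes it suitable for analyzing products sold internationally.

Generates English keywords that are closely related to the review content.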

🚀 KeywordGen-v2 Model

KeywordGen-v2 is a T5-based model fine-tuned to generate keywords from a piece of text. Given an input text, it returns the relevant keywords.

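✨ Key Features

KeywordGen-v2 is the second version in the KeywordGen series. It is built on the T5 base model and fine-tuned specifically to generate keywords from product reviews.

By extracting the key points and themes of a product review, the model can provide useful insights. The output is expected to contain keywords of 2 to 8 words. The model performs better when the input is at least two to three sentences long.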
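Because short inputs tend to give weaker results, it can help to flag reviews that fall below the suggested length before sending them to the model. This is a rough sketch: the regular expression is a simple heuristic, not a real sentence segmenter, and `count_sentences`, `is_long_enough`, and the threshold of 2 are assumptions introduced here.

```python
import re

MIN_SENTENCES = 2  # assumption based on the "two to three sentences" guidance

def count_sentences(text: str) -> int:
    """Rough sentence count: split on '.', '!' or '?' followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for part in parts if part.strip())

def is_long_enough(text: str, minimum: int = MIN_SENTENCES) -> bool:
    """Return True when the text has at least `minimum` sentences by the heuristic above."""
    return count_sentences(text) >= minimum

review = "Battery life is impressive."
if not is_long_enough(review):
    print("Input is short; the generated keywords may be less reliable.")
```

Abbreviations such as "e.g. this" will be counted as sentence boundaries, so treat the result as a hint rather than a hard filter.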

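📦 Installation

Using this model requires the Hugging Face Transformers library. The examples below also need PyTorch (for return_tensors="pt") and SentencePiece (used by T5Tokenizer). All three can be installed with the following command.

pip install transformers sentencepiece torch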

💻 Usage Examples

Basic usage

The model can be used directly in a text generation pipeline. For best results, prefix the input with "Keyword: ".

The following shows how to use the model in Python with the Hugging Face Transformers library.

Single input

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Initialize the tokenizer and model
tokenizer = T5Tokenizer.from_pretrained("mrutyunjay-patil/keywordGen-v2")
model = T5ForConditionalGeneration.from_pretrained("mrutyunjay-patil/keywordGen-v2")

# Define your input sequence, prefixing with "Keyword: "
input_sequence = "Keyword: I purchased the new Android smartphone last week and I've been thoroughly impressed. The display is incredibly vibrant and sharp, and the battery life is surprisingly good, easily lasting a full day with heavy usage."

# Encode the input sequence
input_ids = tokenizer.encode(input_sequence, return_tensors="pt")

# Generate output
outputs = model.generate(input_ids)
output_sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(output_sequence)

Multiple inputs

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Initialize the tokenizer and model
tokenizer = T5Tokenizer.from_pretrained("mrutyunjay-patil/keywordGen-v2")
model = T5ForConditionalGeneration.from_pretrained("mrutyunjay-patil/keywordGen-v2")

# Define the prefix
task_prefix = "Keyword: "

# Define your list of input sequences
inputs = [
    "Absolutely love this tablet. It has a clear, sharp screen and runs apps smoothly without any hiccups.",
    "The headphones are fantastic with great sound quality, but the build quality could be better.",
    "Bought this smartwatch last week, and I'm thrilled with its performance. Battery life is impressive.",
    "This laptop exceeded my expectations. Excellent speed, plenty of storage, and light weight. Perfect for my needs.",
    "The camera quality on this phone is exceptional. It captures detailed and vibrant photos. However, battery life is not the best."
]

# Loop through each input and generate keywords
for sample in inputs:
    input_sequence = task_prefix + sample
    input_ids = tokenizer.encode(input_sequence, return_tensors="pt")
    outputs = model.generate(input_ids)
    output_sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(sample, "\n --->", output_sequence)
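The loop above encodes and generates one review at a time. For larger review sets, the same work can be done in padded batches. The following is a sketch under stated assumptions: `chunked` and `generate_keywords` are hypothetical helpers, and the batch size of 8 and `max_new_tokens=32` are arbitrary illustrative values, not settings recommended by the model author.

```python
from typing import Iterator

MODEL_ID = "mrutyunjay-patil/keywordGen-v2"
TASK_PREFIX = "Keyword: "

def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def generate_keywords(
    reviews: list[str], batch_size: int = 8, max_new_tokens: int = 32
) -> list[str]:
    """Generate one keyword string per review, processing reviews in padded batches."""
    # Imported lazily so `chunked` can be used without Transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()

    results: list[str] = []
    for batch in chunked(reviews, batch_size):
        # Pad to the longest review in the batch; the attention mask hides the padding.
        encoded = tokenizer(
            [TASK_PREFIX + review for review in batch],
            return_tensors="pt",
            padding=True,
            truncation=True,
        )
        outputs = model.generate(
            input_ids=encoded["input_ids"],
            attention_mask=encoded["attention_mask"],
            max_new_tokens=max_new_tokens,
        )
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Calling `generate_keywords(inputs)` with the list defined above returns the keyword strings in the same order as the reviews. Loading the tokenizer and model once inside the function, rather than once per review, is the main saving besides batching.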

🔧 Technical Details

The model was trained on a custom dataset, using the T5 base model as its starting point.

📄 License

This model is released under the Apache-2.0 license.

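⚠️ Limitations and Future Work

As with any machine learning model, the output of this keyword generator depends on the data it was trained on. If the input text contains inappropriate or biased content, the model may generate inappropriate or biased keywords. Future versions aim to improve robustness and fairness and to minimize potential bias.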