question_decomposer_t5: an open-source model that decomposes complex questions into sub-questions

Question Decomposer T5

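Developed by thenHung

A sequence-to-sequence model based on T5-base, designed specifically to decompose a complex question into multiple sub-questions.

Text generation

Safetensors

English #question-decomposition #multi-turn-question-answering #T5-model

Downloads: 317

Released: 11/20/2024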

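Model overview

The model decomposes a complex, multi-part question into a sequence of simpler sub-questions, which makes it suitable for downstream processing or for use in question answering systems.

Model features

Complex question decomposition

Breaks a complex question with multiple parts into independent sub-questions.

Sequence-to-sequence architecture

Uses a T5-based seq2seq architecture, which is well suited to text generation tasks.

Multi-question handling

Handles compound questions that require several pieces of information.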

Model capabilities

Text generation

Question decomposition

Natural language processing

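Use cases

Question answering systems

Complex question processing

Preprocesses complex questions in a question answering system by decomposing them into sub-questions that can each be answered on their own.

Improves the system's ability to handle compound questions.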
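
The preprocessing step described above can be sketched as a small pipeline: decompose first, then answer each sub-question separately. This is only an illustration, not the author's implementation. `answer_compound_question`, `fake_decompose`, and `fake_answer` are hypothetical names, and the stand-in functions return made-up placeholder values so the sketch runs without downloading the model.

```python
from typing import Callable, Dict, List


def answer_compound_question(
    question: str,
    decompose: Callable[[str], List[str]],
    answer: Callable[[str], str],
) -> Dict[str, str]:
    """Decompose a compound question, then answer each sub-question on its own."""
    sub_questions = decompose(question)
    # Fall back to the original question if the decomposer returns nothing.
    if not sub_questions:
        sub_questions = [question]
    return {sub_q: answer(sub_q) for sub_q in sub_questions}


# Hypothetical stand-in for the model: returns fixed sub-questions.
def fake_decompose(question: str) -> List[str]:
    return ["What is the height of John?", "What is the height of Mary?"]


# Hypothetical stand-in for a QA backend: the heights are placeholder values.
def fake_answer(sub_question: str) -> str:
    facts = {
        "What is the height of John?": "180 cm",
        "What is the height of Mary?": "165 cm",
    }
    return facts.get(sub_question, "unknown")


answers = answer_compound_question(
    "Who is taller between John and Mary?", fake_decompose, fake_answer
)
print(answers)
```

In a real system, `decompose` would wrap the model call and `answer` would wrap whatever question answering backend is in use. Combining the individual answers into a final response is left out of this sketch.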

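Information retrieval

Multi-dimensional query decomposition

Decomposes a query that spans several dimensions into independent queries.

Improves the accuracy of search systems.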
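
One way to use decomposed queries in retrieval is to run each sub-query separately and merge the hits. The sketch below assumes a hypothetical `search` callable and a toy in-memory index (`TOY_INDEX`) with invented document IDs. It shows only the merge step and makes no claim about how much accuracy improves in practice.

```python
from typing import Callable, Dict, List


def search_decomposed(
    sub_queries: List[str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Run each sub-query separately and merge hits, keeping first-seen order."""
    merged: List[str] = []
    seen = set()
    for query in sub_queries:
        for doc_id in search(query):
            # Skip documents already returned by an earlier sub-query.
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical toy index standing in for a real search backend.
TOY_INDEX: Dict[str, List[str]] = {
    "What is the capital of France?": ["doc_paris", "doc_france"],
    "When was the capital of France established?": ["doc_history", "doc_paris"],
}

results = search_decomposed(list(TOY_INDEX), lambda q: TOY_INDEX.get(q, []))
print(results)  # ['doc_paris', 'doc_france', 'doc_history']
```

Deduplicating by first appearance is the simplest merge policy. A production system would more likely combine per-query relevance scores.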

## 🚀 Question Decomposer (based on T5 and Seq2seq)

*This project is a question decomposer based on T5 and Seq2seq that breaks a compound question into multiple sub-questions.*

### Dataset and model information
| Property | Details |
|----------|---------|
| Dataset | microsoft/ms_marco |
| Language | en |
| Base model | google-t5/t5-base |
| Pipeline tag | text2text-generation |

### Example
Example: What is the capital of France and when was it established?
- What is the capital of France?
- When was the capital of France established?
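
The usage code in this card prefixes the input with `decompose question: ` and splits the decoded output on `[SEP]`. A minimal sketch of those two text-handling steps, without loading the model, might look like the following. `build_prompt` and `parse_sub_questions` are hypothetical helper names, and stripping whitespace and dropping empty fragments is an added assumption rather than something the card specifies.

```python
from typing import List

PROMPT_PREFIX = "decompose question: "  # prefix used in the card's usage code
SEPARATOR = "[SEP]"  # separator the card's usage code splits on


def build_prompt(question: str) -> str:
    """Prepend the task prefix to the question."""
    return f"{PROMPT_PREFIX}{question.strip()}"


def parse_sub_questions(decoded_output: str) -> List[str]:
    """Split decoded model output into sub-questions, dropping empty pieces."""
    parts = (part.strip() for part in decoded_output.split(SEPARATOR))
    return [part for part in parts if part]


prompt = build_prompt("Who is taller between John and Mary?")
print(prompt)

sub_questions = parse_sub_questions(
    "What is the height of John? [SEP] What is the height of Mary?"
)
print(sub_questions)
```

Splitting on the bare `[SEP]` token and trimming each piece tolerates missing or extra spaces around the separator, which an exact split on `" [SEP] "` would not.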

👉🏻 Try the [demo](https://huggingface.co/spaces/thenHung/Demo-question-decomposer) here.

## 🚀 Quick start

### 💻 Usage example

#### Basic usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Set device
device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model_path = "thenHung/question_decomposer_t5"
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)
model.to(device)
model.eval()

# Decompose question
question = "Who is taller between John and Mary?"
input_text = f"decompose question: {question}"
input_ids = tokenizer(
    input_text,
    max_length=128,
    padding="max_length",
    truncation=True,
    return_tensors="pt"
).input_ids.to(device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_length=128,
        num_beams=4,
        early_stopping=True
    )

# Decode output
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
sub_questions = decoded_output.split(" [SEP] ")

# Print sub-questions
print(sub_questions)
# ['What is the height of John?', 'What is the height of Mary?']