Open-source LLMLingua-2 model: efficient task-agnostic prompt compression, free to deploy

Llmlingua 2 Xlm Roberta Large Meetingbank

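Developed by Microsoft

LLMLingua-2 is a token classification model fine-tuned from XLM-RoBERTa large and used for task-agnostic prompt compression.

Large language model

Transformers

Open-source license: MIT #PromptCompression #MultilingualSupport #MeetingTranscriptProcessing

Downloads: 33.74k

Release date: 3/17/2024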

Model overview

This model performs token classification for task-agnostic prompt compression, using each token's preservation probability as the compression metric.
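To make "preservation probability" concrete, here is a minimal sketch of how a two-class token classifier's raw scores could be turned into one probability per token. It assumes the classification head emits two logits per token and that the second one corresponds to the "preserve" class; that ordering, and the function name `preserve_probability`, are illustrative assumptions rather than details confirmed by this page.

```python
import math


def preserve_probability(logit_discard: float, logit_preserve: float) -> float:
    """Two-class softmax, returning the probability of the 'preserve' class.

    The (discard, preserve) argument order is an assumption made for
    illustration; check the model's label mapping before relying on it.
    """
    # Subtract the larger logit first so exp() cannot overflow.
    m = max(logit_discard, logit_preserve)
    e_discard = math.exp(logit_discard - m)
    e_preserve = math.exp(logit_preserve - m)
    return e_preserve / (e_discard + e_preserve)


# Equal logits carry no preference, so the probability is one half.
p_neutral = preserve_probability(0.0, 0.0)
# A much larger 'preserve' logit pushes the probability towards 1.
p_confident = preserve_probability(-2.0, 3.0)
```

A token with a probability near 1 is a strong candidate to keep; a token near 0 is a strong candidate to drop.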

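Model features

Task-agnostic prompt compression

Compresses prompts efficiently without depending on a specific task

Data distillation

Trained with a data distillation procedure to improve compression efficiency and faithfulness

Multilingual support

Built on XLM-RoBERTa, which supports multilingual processing

Model capabilities

Text compression

Token classification

Multilingual processing

Use cases

Meeting transcript processing

Meeting transcript compression

Compresses verbose meeting transcripts while retaining the key information

Improves the efficiency of downstream tasks such as QA and summarization

Prompt optimization

LLM prompt compression

Shortens the input prompt while preserving its meaning

Lowers compute cost and speeds up inference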

🚀 LLMLingua-2-XLM-RoBERTa-Large-MeetingBank

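This model was introduced in the paper LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression (Pan et al., 2024). It is an XLM-RoBERTa (large) model fine-tuned to perform token classification for task-agnostic prompt compression. The probability $p_{preserve}$ of each token $x_i$ is used as the compression metric. The model is trained on an extractive text compression dataset constructed with the methodology proposed in LLMLingua-2, using training examples from MeetingBank (Hu et al., 2023) as the seed data.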
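As a rough illustration of how $p_{preserve}$ can drive compression, the sketch below keeps the highest-probability words until a target rate is met and then restores their original order. It is a simplified stand-in written for this page, not the library's implementation; the function name `compress_by_probability` and the sample probabilities are hypothetical.

```python
import math


def compress_by_probability(words, probs, rate):
    """Keep roughly `rate` of the words, preferring high preserve probability.

    Simplified sketch: the sample values used below are made up for
    illustration and are not real model output.
    """
    if len(words) != len(probs):
        raise ValueError("words and probs must have the same length")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if not words:
        return []
    # Number of words to keep, rounded up so short inputs keep at least one.
    n_keep = max(1, math.ceil(rate * len(words)))
    # Rank positions by descending probability; ties go to the earlier word.
    ranked = sorted(range(len(words)), key=lambda i: (-probs[i], i))
    # Restore document order for the selected positions.
    kept_positions = sorted(ranked[:n_keep])
    return [words[i] for i in kept_positions]


words = ["So", "um", "we", "should", "revise", "the", "timeline"]
probs = [0.10, 0.05, 0.60, 0.70, 0.95, 0.30, 0.90]  # hypothetical p_preserve values
compressed = compress_by_probability(words, probs, rate=0.5)
print(" ".join(compressed))
```

Because selection is purely extractive, the output is always a subsequence of the input, which is what keeps the compressed prompt faithful to the original wording.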

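You can use this dataset to evaluate the model on downstream tasks such as question answering (QA) and summarization over compressed meeting transcripts.

For more details, see the LLMLingua-2 and LLMLingua Series home pages.

🚀 Quick start

This model is designed for task-agnostic prompt compression. The usage example below shows how to call it.

💻 Usage example

Basic usage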

from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True
)

original_prompt = """John: So, um, I've been thinking about the project, you know, and I believe we need to, uh, make some changes. I mean, we want the project to succeed, right? So, like, I think we should consider maybe revising the timeline.
Sarah: I totally agree, John. I mean, we have to be realistic, you know. The timeline is, like, too tight. You know what I mean? We should definitely extend it.
"""
results = compressor.compress_prompt_llmlingua2(
    original_prompt,
    rate=0.6,
    force_tokens=['\n', '.', '!', '?', ','],
    chunk_end_tokens=['.', '\n'],
    return_word_label=True,
    drop_consecutive=True
)

print(results.keys())
print(f"Compressed prompt: {results['compressed_prompt']}")
print(f"Original tokens: {results['origin_tokens']}")
print(f"Compressed tokens: {results['compressed_tokens']}")
print(f"Compression rate: {results['rate']}")

# Recover the per-word labels over the original prompt.
word_sep = "\t\t|\t\t"
label_sep = " "
lines = results["fn_labeled_original_prompt"].split(word_sep)
annotated_results = []
for line in lines:
    # Split on the last separator only, so the trailing label is always isolated.
    word, label = line.rsplit(label_sep, 1)
    # Label '1' is shown as '+', any other label as '-'.
    annotated_results.append((word, '+' if label == '1' else '-'))
print("Annotated results:")
for word, label in annotated_results[:10]:
    print(f"{word} {label}")
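The labeled string can also be inspected without loading the model. The sketch below parses a small hand-written string that mimics the `word_sep` and `label_sep` layout used above and reports what fraction of words is labeled `1`. The sample string and the helper names `parse_labeled_prompt` and `kept_fraction` are hypothetical, and treating label `1` as "kept" is an assumption based on the `+`/`-` mapping in the example above.

```python
WORD_SEP = "\t\t|\t\t"  # same separators as in the usage example above
LABEL_SEP = " "


def parse_labeled_prompt(labeled):
    """Turn 'word label' items into (word, kept) pairs; label '1' is read as kept."""
    pairs = []
    for item in labeled.split(WORD_SEP):
        # Split on the last separator only, so the label is always isolated.
        word, label = item.rsplit(LABEL_SEP, 1)
        pairs.append((word, label == "1"))
    return pairs


def kept_fraction(pairs):
    """Fraction of words marked as kept (0.0 for an empty list)."""
    if not pairs:
        return 0.0
    return sum(1 for _, kept in pairs if kept) / len(pairs)


# Hand-written sample in the same layout; not real model output.
sample = WORD_SEP.join(["John: 1", "So, 0", "um, 0", "project 1"])
pairs = parse_labeled_prompt(sample)
kept_words = [word for word, kept in pairs if kept]
print(kept_words, kept_fraction(pairs))
```

Comparing this word-level fraction with the requested `rate` is a quick sanity check, though the two need not match exactly if the rate is applied at the token level rather than the word level.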

📄 License

This project is licensed under the MIT License.

📚 Citation

@article{wu2024llmlingua2,
    title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
    author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
    url = "https://arxiv.org/abs/2403.12968",
    journal = "ArXiv preprint",
    volume = "abs/2403.12968",
    year = "2024",
}