Open-Source LLMLingua-2 Model: Efficient Task-Agnostic Prompt Compression

LLMLingua-2 XLM-RoBERTa Large MeetingBank

Developed by Microsoft

LLMLingua-2 is a token classification model fine-tuned from XLM-RoBERTa (large) for task-agnostic prompt compression.

Large Language Model

Transformers

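License: MIT #PromptCompression #MultilingualSupport #MeetingTranscriptProcessing

Downloads: 33.74k

Release date: 3/17/2024

Model Overview

The model performs token classification for task-agnostic prompt compression. The probability of preserving each token serves as the compression metric.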

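Model Features

Task-agnostic prompt compression

Compresses prompts without depending on any specific downstream task

Data distillation

Trained with a data distillation method to make compression more efficient and more faithful to the original text

Multilingual support

Built on XLM-RoBERTa, so it can process text in multiple languages

Model Capabilities

Text compression

Token classification

Multilingual processing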

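Use Cases

Meeting transcript processing

Meeting transcript compression

Compresses lengthy meeting transcripts while preserving the key information

Improves the efficiency of downstream tasks such as QA and summarization

Prompt optimization

LLM prompt compression

Shortens the input prompt while keeping its meaning intact

Lowers compute cost and speeds up inference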
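
To make the cost reduction concrete, the sketch below derives the compression rate, the compression ratio, and the number of tokens saved from two token counts. The helper name `compression_stats` and the sample counts are hypothetical and chosen only for illustration. It also assumes that cost scales with the number of input tokens, which may not hold for every deployment.

```python
def compression_stats(origin_tokens: int, compressed_tokens: int) -> dict:
    """Summarize how much a prompt shrank (hypothetical helper, not a library API)."""
    if origin_tokens <= 0:
        raise ValueError("origin_tokens must be positive")
    if not 0 <= compressed_tokens <= origin_tokens:
        raise ValueError("compressed_tokens must be between 0 and origin_tokens")
    return {
        # Fraction of the original tokens that remain after compression.
        "rate": compressed_tokens / origin_tokens,
        # How many times shorter the compressed prompt is.
        "ratio": origin_tokens / compressed_tokens if compressed_tokens else float("inf"),
        # Absolute number of input tokens no longer sent to the LLM.
        "tokens_saved": origin_tokens - compressed_tokens,
    }


# Illustrative numbers only; they are not measurements of this model.
stats = compression_stats(origin_tokens=1000, compressed_tokens=600)
print(stats)
```

With these sample counts the prompt keeps 60% of its tokens, so 400 input tokens are saved on every call that reuses the compressed prompt.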

🚀 LLMLingua-2-XLM-RoBERTa-large-MeetingBank

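This model was introduced in the paper LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression (Pan et al., 2024). It is an XLM-RoBERTa (large) model fine-tuned to perform token classification for task-agnostic prompt compression. The preservation probability $p_{preserve}$ of each token $x_i$ is used as the compression metric. The model is trained on an extractive text compression dataset built with the method proposed in LLMLingua-2, using training examples from MeetingBank (Hu et al., 2023) as seed data.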
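
The role of $p_{preserve}$ can be illustrated with a minimal, dependency-free sketch: each token gets a preservation probability from a two-class softmax, and the tokens with the highest probabilities are kept in their original order until the target rate is met. This is a simplification under stated assumptions, not the library's implementation: the function names, the sample logits, and the convention that the second logit means "preserve" are all assumptions made for illustration, and options such as `force_tokens` and chunking are not modeled.

```python
import math


def preserve_probability(discard_logit: float, preserve_logit: float) -> float:
    """Two-class softmax; returns the probability of the 'preserve' class.

    Assumption: the classifier emits one logit per class and the second
    class means 'preserve'.
    """
    m = max(discard_logit, preserve_logit)  # subtract the max for numerical stability
    e_discard = math.exp(discard_logit - m)
    e_preserve = math.exp(preserve_logit - m)
    return e_preserve / (e_discard + e_preserve)


def compress_by_rate(words, probs, rate):
    """Keep the ceil(rate * n) most probable words, in their original order."""
    if len(words) != len(probs):
        raise ValueError("words and probs must have the same length")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    n_keep = math.ceil(rate * len(words))
    # Rank positions by preservation probability, highest first.
    ranked = sorted(range(len(words)), key=lambda i: probs[i], reverse=True)
    # Restore document order for the positions that survive.
    kept = sorted(ranked[:n_keep])
    return [words[i] for i in kept]


# Made-up logits (discard, preserve) for illustration; not real model output.
words = ["So", "um", "we", "should", "extend", "the", "timeline"]
logits = [(2.0, -1.0), (3.0, -2.0), (0.0, 0.5), (-0.5, 0.5),
          (-2.0, 2.0), (1.0, 0.0), (-1.5, 1.5)]
probs = [preserve_probability(d, p) for d, p in logits]
compressed = compress_by_rate(words, probs, rate=0.5)
print(" ".join(compressed))  # we should extend timeline
```

Selecting by rank rather than by a fixed probability cutoff guarantees that the requested fraction of words is kept regardless of how confident the classifier is on a given input.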

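Using this dataset, you can evaluate the model on downstream tasks such as question answering (QA) and summarization over compressed meeting transcripts.

For more details, see the home pages of LLMLingua-2 and the LLMLingua series.

🚀 Quick Start

The model performs token classification for task-agnostic prompt compression and can make downstream tasks such as QA and summarization more efficient.

✨ Key Features

Fine-tuned from XLM-RoBERTa (large) for task-agnostic prompt compression.
Uses the token preservation probability as the compression metric.
Trained on a purpose-built extractive text compression dataset.

💻 Usage Example

Basic usage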

from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True
)

original_prompt = """John: So, um, I've been thinking about the project, you know, and I believe we need to, uh, make some changes. I mean, we want the project to succeed, right? So, like, I think we should consider maybe revising the timeline.
Sarah: I totally agree, John. I mean, we have to be realistic, you know. The timeline is, like, too tight. You know what I mean? We should definitely extend it.
"""
results = compressor.compress_prompt_llmlingua2(
    original_prompt,
    rate=0.6,
    force_tokens=['\n', '.', '!', '?', ','],
    chunk_end_tokens=['.', '\n'],
    return_word_label=True,
    drop_consecutive=True
)

print(results.keys())
print(f"Compressed prompt: {results['compressed_prompt']}")
print(f"Original tokens: {results['origin_tokens']}")
print(f"Compressed tokens: {results['compressed_tokens']}")
print(f"Compression rate: {results['rate']}")

# get the annotated results over the original prompt
word_sep = "\t\t|\t\t"
label_sep = " "
lines = results["fn_labeled_original_prompt"].split(word_sep)
annotated_results = []
for line in lines:
    word, label = line.split(label_sep)
    annotated_results.append((word, '+') if label == '1' else (word, '-')) # list of tuples: (word, label)
print("Annotated results:")
for word, label in annotated_results[:10]:
    print(f"{word} {label}")
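
The annotation loop above can also be turned into a small helper that returns only the words the compressor kept. The sketch below assumes the labeled string uses the same separators as the example (`"\t\t|\t\t"` between items, a single space before the label) and that label `1` marks a kept word. The helper name `kept_words` and the sample string are hypothetical, and the sample is synthetic rather than real model output.

```python
WORD_SEP = "\t\t|\t\t"  # separator between labeled items, as in the example above
LABEL_SEP = " "         # separator between a word and its label


def kept_words(labeled_prompt: str) -> list:
    """Return the words labeled '1' (kept), in their original order."""
    kept = []
    for item in labeled_prompt.split(WORD_SEP):
        if not item:
            continue  # skip empty fragments, e.g. from an empty input string
        # Split on the last separator so the label is always the final field.
        word, _, label = item.rpartition(LABEL_SEP)
        if label == "1":
            kept.append(word)
    return kept


# Synthetic labeled string for illustration only.
sample = WORD_SEP.join(["So 0", "um 0", "extend 1", "timeline 1"])
print(kept_words(sample))  # ['extend', 'timeline']
```

Splitting on the last separator, rather than the first, keeps the parsing predictable if a word fragment itself ever contains a space.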

📄 License

This project is released under the MIT License.

📚 Citation

@article{wu2024llmlingua2,
    title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
    author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
    url = "https://arxiv.org/abs/2403.12968",
    journal = "ArXiv preprint",
    volume = "abs/2403.12968",
    year = "2024",
}