llama-7b-v1-Receipt-Key-Extraction open-source model - free key information extraction from English and Arabic receipts

Llama 7b V1 Receipt Key Extraction

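Developed by abdoelsayed

A 7-billion-parameter model based on LLaMA v1 for key information extraction from English and Arabic receipt items

Large language model

Transformers

Supports multiple languages

#receipt key information extraction #multilingual support #retail data analysis

Downloads: 41

Release date: 9/21/2023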

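Model Overview

This is a 7-billion-parameter model built on the LLaMA v1 architecture, designed specifically to extract key information from receipt text in English and Arabic.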

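Model Features

Multilingual support

Extracts key information from both English and Arabic receipts

High-accuracy extraction

Accurately extracts key fields from receipts, including class, brand, weight, and unit price

Based on the AMuRD dataset

Trained on an annotated multilingual receipts dataset, making it suitable for cross-lingual key information extraction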

Model Capabilities

Text information extraction

Multilingual processing

Structured data generation

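Use Cases

Retail

Automated receipt processing

Automatically extracts product information from retail receipts

Improves data-processing efficiency and reduces manual data-entry errors

Financial systems

Automated expense reimbursement

Automatically identifies and classifies expense items in reimbursement documents

Simplifies the reimbursement workflow and speeds up financial processing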

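🚀 llama-7b-v1-Receipt-Key-Extraction

llama-7b-v1-Receipt-Key-Extraction is a 7-billion-parameter model based on LLaMA v1, intended mainly for research on key information extraction from English and Arabic receipts.

🚀 Quick Start

Model Use

This model is intended only for research on key information extraction from items in English and Arabic receipts.

Getting Started with the Model

Use the following code to get started with the model: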

# pip install -q transformers torch sentencepiece

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

checkpoint = "abdoelsayed/llama-7b-v1-Receipt-Key-Extraction"
device = "cuda" if torch.cuda.is_available() else "cpu"

# The slow (SentencePiece) tokenizer is requested explicitly with use_fast=False.
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint,
    model_max_length=512,
    padding_side="right",
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Note: top_p was undefined in the original snippet; 0.75 is an assumed default.
def generate_response(instruction, input_text, max_new_tokens=100, temperature=0.1, top_p=0.75, top_k=40, num_beams=4):
    # Alpaca-style prompt: instruction, input, then an empty response slot.
    prompt = f"Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].to(device)
    # do_sample is left unset, so decoding is beam search and the sampling
    # parameters (temperature, top_p, top_k) have no effect.
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
    )
    with torch.no_grad():
        # generate() returns a tensor of token ids by default, one row per sequence.
        outputs = model.generate(input_ids=input_ids, generation_config=generation_config, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(outputs[0])
    # Keep only the text after the response marker and drop the end-of-sequence token.
    return decoded.split("### Response:")[-1].replace("</s>", "").strip()

instruction = "Extract the class, Brand, Weight, Number of units, Size of units, Price, T.Price, Pack, Unit from the following sentence"
input_text = "Americana Okra zero 400 gm"

response = generate_response(instruction, input_text)
print(response)
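The prompt construction and output clean-up used by generate_response can be checked without loading the 7B checkpoint. The following is a minimal sketch that factors those two steps into pure functions. The names PROMPT_TEMPLATE, build_prompt, and extract_response are hypothetical helpers introduced here for illustration and are not part of the model's repository; the template text and the "### Response:" marker are taken from the snippet above.

```python
# Hypothetical helpers (not from the model repository) that mirror the
# prompt format and post-processing of generate_response above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:"
)


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the Alpaca-style template with an instruction and a receipt line."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input_text=input_text)


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, without the EOS token."""
    return decoded.split("### Response:")[-1].replace("</s>", "").strip()


if __name__ == "__main__":
    prompt = build_prompt(
        "Extract the class, Brand, Weight from the following sentence",
        "Americana Okra zero 400 gm",
    )
    # "<model output>" is a placeholder, not real model output.
    print(extract_response(prompt + " <model output></s>"))
```

Keeping these steps separate from the model call makes it easy to unit-test the prompt format, which matters because an instruction-tuned model is usually sensitive to deviations from the template it was trained on.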

📚 Documentation

Citation

Please cite the model using the following format:

@misc{abdallah2023amurd,
    title={AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification},
    author={Abdelrahman Abdallah and Mahmoud Abdalla and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
    year={2023},
    eprint={2309.09800},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

📄 License

This model is released under the llama2 license.

📋 Model Information

Property	Details
Model type	llama-7b-v1-Receipt-Key-Extraction
Supported languages	English, Arabic
Evaluation metrics	Accuracy, F1 score
Library	transformers
Reference paper	AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification
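
The table lists accuracy and F1 as evaluation metrics but does not say how they are computed. As a rough illustration only, the sketch below assumes exact-match accuracy over extracted field values and a whitespace-token-overlap F1. Both definitions, and the function names exact_match_accuracy and token_f1, are assumptions made here and may differ from the protocol used in the AMuRD paper.

```python
from collections import Counter
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, using whitespace tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict and suits short categorical fields such as a brand name, while token-level F1 gives partial credit when a prediction such as a weight string is only partly correct.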