🚀 Orkhan/llama-2-7b-absa
Orkhan/llama-2-7b-absa is a fine-tuned version of the Llama-2-7b model, optimized for aspect-based sentiment analysis (ABSA) using a manually labelled dataset of 2,000 sentences. This fine-tuning enables the model to identify aspects and analyze their sentiment accurately, making it a valuable tool for fine-grained sentiment analysis across a wide range of applications.
Its advantage over traditional ABSA models is that llama-2-7b-absa generalizes well, so you do not need domain-specific labelled data to train a model for your own use case. The trade-off is that you may need more compute.

🚀 Quick Start
When running inference, keep in mind that the model was trained on sentences, not paragraphs. It runs in a free Google Colab notebook with a T4 GPU.
Open the Colab notebook
What does it do?
You pass in a sentence and get back the aspects, opinions, sentiments, and phrases (opinion + aspect) found in that sentence.
prompt = "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere."
raw_result, output_dict = process_prompt(prompt, base_model)
print(output_dict)
>>>{'user_prompt': "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
'interpreted_input': " Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
'aspects': ['weather', 'birds', 'smell'],
'opinions': ['nice', 'flying', 'bad'],
'sentiments': ['Positive', 'Positive', 'Negative'],
'phrases': ['nice weather', 'flying birds', 'bad smell']}
📦 Installation
Install dependencies
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
Import the required libraries
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
import torch
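Before loading the 7B model, it can help to confirm that a CUDA GPU is visible; in fp16 the weights alone take roughly 13 GB, which fits on a Colab T4 (16 GB). An optional sanity check (not part of the original card):

# Optional: fail early if no GPU is available.
assert torch.cuda.is_available(), "A CUDA GPU (e.g. a Colab T4) is required"
print(torch.cuda.get_device_name(0))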
Load the model and merge it with the LoRA weights
model_name = "Orkhan/llama-2-7b-absa"
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1
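If fp16 does not fit in your GPU memory, one possible alternative (a sketch under the pinned library versions above, not part of the original card) is to load the checkpoint in 4-bit via the BitsAndBytesConfig that is already imported:

# Hypothetical 4-bit load for memory-constrained GPUs; apart from a small
# quantization error, it should behave like the fp16 load above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
)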
Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
💻 Usage Examples
Basic usage
For processing the model's input and output, it is recommended to use the following ABSA-related helper functions:
def process_output(result, user_prompt):
    # The model generates text of the form
    # "### Human: <input> ### Assistant: ## Aspect detected: ...
    #  ## Opinion detected: ... ## Sentiment detected: ...",
    # so each field is recovered by splitting on those markers.
    generated_text = result[0]['generated_text']
    interpreted_input = generated_text.split('### Assistant:')[0].split('### Human:')[1]
    new_output = generated_text.split('### Assistant:')[1].split(')')[0].strip()
    aspects = new_output.split('Aspect detected:')[1].split('##')[0]
    opinions = new_output.split('Opinion detected:')[1].split('## Sentiment detected:')[0]
    sentiments = new_output.split('## Sentiment detected:')[1]
    # Turn the comma-separated fields into lists; the emptiness check also
    # handles the single-item case and trailing commas.
    aspect_list = [aspect.strip() for aspect in aspects.split(',') if aspect.strip()]
    opinion_list = [opinion.strip() for opinion in opinions.split(',') if opinion.strip()]
    sentiments_list = [sentiment.strip() for sentiment in sentiments.split(',') if sentiment.strip()]
    phrases = [opinion + ' ' + aspect for opinion, aspect in zip(opinion_list, aspect_list)]
    output_dict = {
        'user_prompt': user_prompt,
        'interpreted_input': interpreted_input,
        'aspects': aspect_list,
        'opinions': opinion_list,
        'sentiments': sentiments_list,
        'phrases': phrases,
    }
    return output_dict
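To see the format the parser assumes, here is a synthetic generated_text written by hand for illustration (the string is invented; real text comes from the model):

# Illustration only: a hand-written generated_text in the expected marker format.
fake_result = [{'generated_text': (
    "### Human: The food was great, but the service was slow. "
    "### Assistant: ## Aspect detected: food, service "
    "## Opinion detected: great, slow "
    "## Sentiment detected: Positive, Negative)"
)}]
# Yields aspects ['food', 'service'], opinions ['great', 'slow'],
# sentiments ['Positive', 'Negative'], phrases ['great food', 'slow service'].
print(process_output(fake_result, "The food was great, but the service was slow."))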
def process_prompt(user_prompt, model):
    # Wrap the sentence in the prompt template the model was trained on.
    edited_prompt = "### Human: " + user_prompt + '.###'
    # max_length heuristic: roughly four times the input length in tokens,
    # so the generated answer is not cut off.
    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer,
                    max_length=len(tokenizer.encode(user_prompt)) * 4)
    result = pipe(edited_prompt)
    output_dict = process_output(result, user_prompt)
    return result, output_dict
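Note that process_prompt builds a fresh pipeline on every call. If you plan to score many sentences, a reasonable variation (an assumption, not part of the original card) is to construct the pipeline once and pass max_length at call time:

# Hypothetical helper reusing a single pipeline across prompts.
absa_pipe = pipeline(task="text-generation", model=base_model, tokenizer=tokenizer)

def process_prompt_reusing_pipe(user_prompt, pipe=absa_pipe):
    edited_prompt = "### Human: " + user_prompt + '.###'
    result = pipe(edited_prompt, max_length=len(tokenizer.encode(user_prompt)) * 4)
    return result, process_output(result, user_prompt)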
Run inference
prompt = "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere."
raw_result, output_dict = process_prompt(prompt, base_model)
print('raw_result: ', raw_result)
print('output_dict: ', output_dict)
Example output
raw_result:
[{'generated_text': "### Human: Such a nice weather, birds are flying, but there's a bad smell coming from somewhere. ..."}]
output_dict:
{'user_prompt': "Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
'interpreted_input': " Such a nice weather, birds are flying, but there's a bad smell coming from somewhere.",
'aspects': ['weather', 'birds', 'smell'],
'opinions': ['nice', 'flying', 'bad'],
'sentiments': ['Positive', 'Positive', 'Negative'],
'phrases': ['nice weather', 'flying birds', 'bad smell']}
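If you want to analyze several sentences, a simple loop works; this sketch (not from the original card) also respects the sentence-level training noted above, so split paragraphs into sentences first:

# Hypothetical batch run over a few example sentences.
sentences = [
    "The battery life is amazing.",
    "The screen is too dim, but the keyboard feels great.",
]
for sentence in sentences:
    _, result = process_prompt(sentence, base_model)
    print(result['phrases'], result['sentiments'])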
Using the full code
You can find the full code in the following Colab notebook:
Open the Colab notebook
📄 License
This project is licensed under the Apache-2.0 license.