🚀 Qwen2.5-14B-CIC-ACLARC
This model is fine-tuned for citation intent classification. It is built on Qwen 2.5 14B Instruct and trained on the ACL-ARC dataset.
GGUF version: https://huggingface.co/sknow-lab/Qwen2.5-14B-CIC-ACLARC-GGUF
🚀 Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "sknow-lab/Qwen2.5-14B-CIC-ACLARC"
# Load the fine-tuned model; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.
########
# OBJECTIVE #
You will be given a sentence containing a citation, you must output the appropriate class as an answer.
########
# CLASS DEFINITIONS #
The six (6) possible classes are the following: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
The definitions of the classes are:
1 - BACKGROUND: The cited paper provides relevant Background information or is part of the body of literature.
2 - MOTIVATION: The citing paper is directly motivated by the cited paper.
3 - USES: The citing paper uses the methodology or tools created by the cited paper.
4 - EXTENDS: The citing paper extends the methods, tools or data, etc. of the cited paper.
5 - COMPARES_CONTRASTS: The citing paper expresses similarities or differences to, or disagrees with, the cited paper.
6 - FUTURE: The cited paper may be a potential avenue for future work.
########
# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION@@ tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
- Do not provide any explanation or elaboration.
"""
test_citing_sentence = "However , the method we are currently using in the ATIS domain ( @@CITATION@@ ) represents our most promising approach to this problem."
user_prompt = f"""
{test_citing_sentence}
### Question: Which is the most likely intent for this citation?
a) BACKGROUND
b) MOTIVATION
c) USES
d) EXTENDS
e) COMPARES_CONTRASTS
f) FUTURE
### Answer:
"""
# Build the chat-formatted prompt from the system and user messages.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Drop the prompt tokens so that only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
Details about the system prompt and query template can be found in the paper.
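To classify more than one sentence, the query template from the quick-start code can be wrapped in a small helper. This is a minimal sketch, not the authors' code: `QUERY_TEMPLATE`, `build_messages`, and the `@@CITATION@@` check are hypothetical names and choices introduced here, and the template text simply mirrors the `user_prompt` shown above.

```python
# Hypothetical helper (not part of the released code): reuses the
# quick-start query template for an arbitrary citing sentence.
QUERY_TEMPLATE = """
{sentence}
### Question: Which is the most likely intent for this citation?
a) BACKGROUND
b) MOTIVATION
c) USES
d) EXTENDS
e) COMPARES_CONTRASTS
f) FUTURE
### Answer:
"""


def build_messages(system_prompt, citing_sentence):
    """Return the chat messages for one citing sentence."""
    # The system prompt tells the model to analyze only the tagged citation,
    # so a sentence without the tag is rejected here (an assumed safeguard).
    if "@@CITATION@@" not in citing_sentence:
        raise ValueError("citing_sentence must contain the @@CITATION@@ tag")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": QUERY_TEMPLATE.format(sentence=citing_sentence)},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the quick-start code.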
A cleanup function may be needed to extract the predicted label from the output. You can find our implementation on GitHub.
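For illustration, such a cleanup step could look like the sketch below. It is not the implementation from the GitHub repository: `extract_label` and its matching rules are assumptions made here. It returns the first class name found in the response, falls back to a bare option letter such as `c)`, and returns `None` when nothing can be recovered.

```python
import re

# Class names exactly as listed in the system prompt.
LABELS = ["BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE"]

# Option letters from the query template, mapped to their classes.
LETTER_TO_LABEL = dict(zip("abcdef", LABELS))


def extract_label(response):
    """Return the predicted class name, or None if none can be recovered.

    Hypothetical helper, not the authors' cleanup function.
    """
    text = response.strip().upper()

    # Case 1: a class name appears in the output. Keep the earliest match.
    # Underscores may have been emitted as spaces or hyphens, and the
    # lookarounds stop "USES" from matching inside a word like "REUSES".
    best = None
    for label in LABELS:
        pattern = r"(?<![A-Z_])" + label.replace("_", r"[\s_\-]+") + r"(?![A-Z_])"
        match = re.search(pattern, text)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), label)
    if best is not None:
        return best[1]

    # Case 2: the output is only an option letter, e.g. "c", "c)" or "(c)".
    match = re.match(r"\(?([A-F])\)?[\s.:]*$", text)
    if match:
        return LETTER_TO_LABEL[match.group(1).lower()]

    return None
```

Calling `extract_label(response)` on the decoded output then yields one of the six class names, or `None` for outputs that should be inspected manually.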
📚 Documentation
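ACL-ARC Classes
| Class | Description |
| --- | --- |
| Background | The cited paper provides relevant background information or is part of the body of literature. |
| Motivation | The citing paper is directly motivated by the cited paper. |
| Uses | The citing paper uses the methodology or tools created by the cited paper. |
| Extends | The citing paper extends the methods, tools, or data of the cited paper. |
| Comparison or Contrast | The citing paper expresses similarities or differences to, or disagrees with, the cited paper. |
| Future | The cited paper may be a potential avenue for future work. |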
📄 License
This project is licensed under the Apache-2.0 license.
📚 Citation
@misc{koloveas2025llmspredictcitationintent,
      title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs},
      author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
      year={2025},
      eprint={2502.14561},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14561},
}