🚀 ESM-2 Sequence Classifier
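This is a small sequence classifier trained on synthetic data generated by GPT-4. It assigns each protein sequence to one of three classes: enzymes (class 0), receptor proteins (class 1), and structural proteins (class 2). The classifier was trained from facebook/esm2_t6_8M_UR50D, one of the ESM-2 models.
This model has not been thoroughly tested and is intended for experimental and educational purposes only. Use it with caution.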
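The class indices described above can be turned into readable names with a small lookup. The following is a minimal sketch: `ID2LABEL` and `label_for` are illustrative names, not part of the published model, and the label strings are assumptions taken from the class description rather than from the model's own configuration.

```python
# Hypothetical lookup built from the class description above;
# the published model's config may not carry these label strings.
ID2LABEL = {
    0: "enzyme",
    1: "receptor protein",
    2: "structural protein",
}


def label_for(class_id: int) -> str:
    """Return the readable name for a predicted class index."""
    try:
        return ID2LABEL[class_id]
    except KeyError:
        raise ValueError(f"unknown class id: {class_id}") from None
```

A predicted index from the model could then be passed through `label_for` before printing.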
💻 Usage Example
Basic Usage
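import torch
from transformers import AutoTokenizer, EsmForSequenceClassification

# Load the fine-tuned classifier and the tokenizer of its base ESM-2 checkpoint.
model = EsmForSequenceClassification.from_pretrained("AmelieSchreiber/esm2_t6_8M_UR50D_sequence_classifier_v1")
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")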
new_sequences_0 = [
"ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK",
"GVALDECKALDYLPGKPLPMDGKVCQCGSKTPLRP",
"VLPGYTCGELDCKPGKPLPKCGADKTQVATPFLRG",
"TCGALVQYPSCADPPVLRGSDSSVKACKKLDPQDK",
"GALCEECKLCPGADYKPMDGDRLPAAATSKTRPVG",
"PAVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYG",
"VLGYTCGALDCKPGKPLPKCGADKTQVATPFLRGA",
"CGALVQYPSCADPPVLRGSDSSVKACKKLDPQDKT",
"ALCEECKLCPGADYKPMDGDRLPAAATSKTRPVGK",
"AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR",
]
new_sequences_1 = [
"VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD",
"KDQVLTVPTYACRCCPKMDSKGRVPSTLRVKSARS",
"PLAGVACGRGLDYRCPRKMVPGDLQVTPATQRPYG",
"CGVRLGYPGCADVPLRGRSSFAPRACMKKDPRVTR",
"RKGVAYLYECRKLRCRADYKPRGMDGRRLPKASTT",
"RPTGAVNCKQAKVYRGLPLPMMGKVPRVCRSRRPY",
"RLDGGYTCGQALDCKPGRKPPKMGCADLKSTVATP",
"LGTCRKLVRYPQCADPPVMGRSSFRPKACCRQDPV",
"RVGYAMCSPKLCSCRADYKPPMGDGDRLPKAATSK",
"QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY",
]
new_sequences_2 = [
"VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT",
"KDPTVMTVGTYSCQCPKQDSRGSVQPTSRVKTSRSK",
"PLVGKACGRSSDYKCPGQMVSGGSKQTPASQRPSYD",
"CGKKLVGYPSSKADVPLQGRSSFSPKACKKDPQMTS",
"RKGVASLYCSSKLSCKAQYSKGMSDGRSPKASSTTS",
"RPKSAASCEQAKSYRSLSLPSMKGKVPSKCSRSKRP",
"RSDVSYTSCSQSKDCKPSKPPKMSGSKDSSTVATPS",
"LSTCSKKVAYPSSKADPPSSGRSSFSMKACKKQDPPV",
"RVGSASSEPKSSCSVQSYSKPSMSGDSSPKASSTSK",
"QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ",
]
new_sequences = new_sequences_0 + new_sequences_1 + new_sequences_2
# Tokenize all sequences as one padded batch.
inputs = tokenizer(new_sequences, return_tensors="pt", padding=True, truncation=True)
# Run inference without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits
# The predicted class is the index of the largest logit.
predicted_class_ids = torch.argmax(logits, dim=-1)
for sequence, predicted_class in zip(new_sequences, predicted_class_ids):
    print(f"Sequence: {sequence}, Predicted class: {predicted_class.item()}")
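The example above prints raw class indices only. As a rough sketch of how the output might be inspected further, the helpers below convert one row of logits into probabilities and compare predictions against expected classes. Both `softmax` and `per_class_agreement` are illustrative helpers written in plain Python, and the `expected` list rests on an assumption: that each `new_sequences_N` list holds ten sequences intended for class N, which the variable names suggest but the text does not state.

```python
import math
from collections import Counter


def softmax(logits):
    """Convert one row of raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_agreement(predicted, expected):
    """Fraction of predictions that match the expected class, per expected class."""
    hits, totals = Counter(), Counter()
    for p, e in zip(predicted, expected):
        totals[e] += 1
        if p == e:
            hits[e] += 1
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Assumption: new_sequences_0/1/2 each contain 10 sequences meant for class 0/1/2.
expected = [0] * 10 + [1] * 10 + [2] * 10
```

With the model loaded, `per_class_agreement(predicted_class_ids.tolist(), expected)` would give a quick per-class summary, and `softmax(logits[0].tolist())` would show how confident the model is about the first sequence. Thirty synthetic sequences are far too few to serve as an evaluation; this is only a sanity check.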
📄 License
This project is released under the MIT License.