🚀 GPTuz Model
GPTuz is a state-of-the-art language model for Uzbek based on the GPT-2 small model. It was trained for more than one day on an NVIDIA V100 32GB GPU, using 0.53GB of data collected from kun.uz, with transfer learning and fine-tuning techniques.
🚀 Quick Start
Loading the model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the GPTuz tokenizer and model from the Hugging Face Hub.
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead.
tokenizer = AutoTokenizer.from_pretrained("rifkat/GPTuz")
model = AutoModelForCausalLM.from_pretrained("rifkat/GPTuz")
tokenizer.model_max_length = 1024
```
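Before loading, it can help to pick the inference device explicitly. This is a minimal sketch using standard PyTorch device handling; it is not part of the original card, and the commented-out loading line only illustrates where the device would be applied.

```python
import torch

# Choose GPU when available, otherwise fall back to CPU
# (generic PyTorch pattern, nothing GPTuz-specific).
def pick_device() -> torch.device:
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = pick_device()
# model = AutoModelForCausalLM.from_pretrained("rifkat/GPTuz").to(device)
```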
Generating a single word
```python
text = "Covid-19 га қарши эмлаш бошланди,"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])

# Take the highest-probability token at the last position.
predicted_index = torch.argmax(outputs.logits[0, -1, :]).item()
predicted_text = tokenizer.decode([predicted_index])
print('input text:', text)
print('predicted text:', predicted_text)
```
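Rather than taking only the argmax, you can inspect the top-k next-token candidates. The sketch below is self-contained: `logits` stands in for `outputs.logits[0, -1, :]` from the snippet above (a random tensor is used so no model download is needed), and decoding each candidate id with the real tokenizer is left as a comment.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(50257)  # placeholder for outputs.logits[0, -1, :]; 50257 = GPT-2 vocab size

# Convert logits to probabilities and keep the 5 most likely token ids.
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values.tolist(), top.indices.tolist()):
    # With the real model: tokenizer.decode([idx])
    print(f"token id {idx}: p={p:.6f}")
```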
Generating a full sequence
```python
text = "Covid-19 га қарши эмлаш бошланди, "
inputs = tokenizer(text, return_tensors="pt")

# Sample a continuation of up to 50 tokens with top-k sampling.
sample_outputs = model.generate(
    inputs.input_ids,
    pad_token_id=50256,  # GPT-2 end-of-text token, used for padding
    do_sample=True,
    max_length=50,
    top_k=40,
    num_return_sequences=1,
)
for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i + 1, tokenizer.decode(sample_output.tolist())))
```
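> Generated">
To make the `top_k=40` argument concrete, here is a sketch of what top-k sampling does inside `generate`: keep only the k highest-scoring logits, mask the rest to negative infinity, renormalize, and draw one token. The random `logits` tensor stands in for the model's real next-token logits so the example runs without the model.

```python
import torch

def top_k_sample(logits: torch.Tensor, k: int = 40) -> int:
    """Sample one token id from the top-k entries of a 1-D logits tensor."""
    topk = torch.topk(logits, k)
    masked = torch.full_like(logits, float("-inf"))
    masked[topk.indices] = topk.values          # keep only the top-k logits
    probs = torch.softmax(masked, dim=-1)       # -inf entries get probability 0
    return torch.multinomial(probs, num_samples=1).item()

torch.manual_seed(0)
logits = torch.randn(50257)  # placeholder for the model's next-token logits
token_id = top_k_sample(logits, k=40)
```

Because everything outside the top 40 is masked out, the sampled id is always one of the 40 most likely candidates, which keeps generations fluent while still varying between runs.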
📚 Documentation
Model information
| Attribute | Details |
| --- | --- |
| Model type | Uzbek language model based on the GPT-2 small model |
| Training data | 0.53GB of data collected from kun.uz |
| Training hardware | NVIDIA V100 32GB GPU |
| Training techniques | Transfer learning and fine-tuning |
Citation
```bibtex
@misc{rifkat_davronov_2022,
  author    = {Adilova Fatima and Rifkat Davronov and Samariddin Kushmuratov and Ruzmat Safarov},
  title     = {GPTuz (Revision 2a7e6c0)},
  year      = 2022,
  url       = {https://huggingface.co/rifkat/GPTuz},
  doi       = {10.57967/hf/0143},
  publisher = {Hugging Face}
}
```
📄 License
This project is licensed under the Apache-2.0 license.