Arsh-llm Open-Source Language Model: Free Generation of Creative Stories, Coherent Text, and Practical Code


Arsh Llm

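Developed by Arsh-ai

Arsh-llm is a 50-million-parameter language model based on the Llama architecture. It is built to generate creative stories, coherent text, and practical code.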

Large Language Model

Transformers

Language: English | License: MIT | #LightweightStoryGeneration #DialogueFineTuning #CodeAssistedGeneration

Downloads: 1,481

Released: 5/27/2025

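Model Overview

Arsh-llm is a compact language model that has been pretrained and fine-tuned for tasks such as creative writing, code generation, and conversational AI.

Model Features

Compact and efficient

Only 50 million parameters, trained on a T4 GPU, with a low resource footprint.

Versatile generation

Generates creative stories, coherent text, and practical code snippets.

Dialogue-optimized

Fine-tuned for about 20 hours on conversational data, which makes it suitable for chatbots and similar applications.

Open-source license

Released under the MIT license, which permits free use and modification.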

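Model Capabilities

Creative writing

Text generation

Code generation

Conversational interaction

Math problem solving

Use Cases

Creative writing

Short story generation

Generate engaging short stories or narrative prompts.

Programming assistance

Code snippet generation

Generate practical code snippets for a variety of programming tasks.

Conversational AI

Chatbots

Give chatbots or assistants natural conversational ability.

Educational tools

Math problem solving

Help solve math problems or explain concepts step by step.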

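🚀 Arsh-llm: A Compact, Capable Model with 50 Million Parameters

Arsh-llm is a 50-million-parameter language model built on the Llama architecture, designed to generate creative stories, coherent text, and practical code. It was pretrained for 35 hours on a T4 GPU using a small but carefully curated set of datasets, then fine-tuned for 20 hours on conversational data. Think of it as a lean, efficient text-generation machine with plenty of room to grow. Its training loss falls between 1.2 and 1.9, which is a promising start, and further training is expected to improve performance. Buckle up, this is just the beginning!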
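
To put the reported loss range in more familiar terms, the sketch below converts it to perplexity. It assumes the 1.2 to 1.9 figure is a mean per-token cross-entropy measured in nats, which is the usual convention for causal language model training but is not stated in the model card. The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Loss range reported for Arsh-llm (assumed to be per-token cross-entropy in nats).
low, high = 1.2, 1.9
print(f"perplexity range: {perplexity(low):.2f} to {perplexity(high):.2f}")
# prints "perplexity range: 3.32 to 6.69"
```

Under that assumption, the model is on average about as uncertain as a uniform choice among roughly 3 to 7 tokens on its training data. This is a training-set figure and says nothing about held-out performance.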

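📚 Model Overview

Property	Details
Architecture	Llama-based causal language model
Parameters	50 million
Context length	128 tokens
Pretraining time	About 35 hours on an NVIDIA T4 GPU
Fine-tuning time	About 20 hours on conversational datasets
Training loss	1.2 - 1.9 (room for improvement!)
Library	Transformers (Hugging Face)
License	MIT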

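📦 Datasets

Arsh-llm was trained on a mix of datasets to give it range across storytelling, text generation, and code-related tasks:

roneneldan/TinyStories: short creative stories for narrative generation.
Salesforce/wikitext: Wikipedia-based text for general knowledge and textual coherence.
abhinand/alpaca-gpt4-sharegpt: instruction-based conversational data for task-oriented responses.
shibing624/sharegpt_gpt4: high-quality conversational data for chat-style interaction.
ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions: math problems with step-by-step solutions, used to improve logical reasoning.

Fine-tuning used a structured ShareGPT chat template to strengthen conversational ability, which makes Arsh-llm a solid starting point for dialogue-based applications.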
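
As a rough illustration of what ShareGPT-style data looks like when prepared for a chat template, the sketch below converts one record into the role/content message list that `apply_chat_template()` expects. The field names (`conversations`, `from`, `value`) follow the common ShareGPT convention and are an assumption here; they have not been checked against the specific datasets listed above. `ROLE_MAP` and `sharegpt_to_messages` are hypothetical names, not part of any library or of the author's training code.

```python
from typing import Dict, List

# Assumed mapping from ShareGPT speaker labels to chat-template roles.
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
}


def sharegpt_to_messages(record: Dict) -> List[Dict[str, str]]:
    """Convert one ShareGPT-style record into a list of role/content messages."""
    messages = []
    for turn in record.get("conversations", []):
        role = ROLE_MAP.get(turn.get("from"))
        if role is None:
            continue  # skip speaker labels we do not know how to map
        messages.append({"role": role, "content": turn.get("value", "").strip()})
    return messages


# A made-up record used only to show the shape of the data.
example = {
    "conversations": [
        {"from": "human", "value": "What is 2 + 3?"},
        {"from": "gpt", "value": "2 + 3 = 5."},
    ]
}
print(sharegpt_to_messages(example))
```

The resulting list has the same shape as `conversation_history` in the quick-start script, so it can be passed to the tokenizer's chat template in the same way.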

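🎯 Use Cases

Arsh-llm is a versatile model suited to the following scenarios:

Creative writing: generate engaging short stories or narrative prompts.
Code generation: generate practical code snippets for a variety of programming tasks.
Conversational AI: give chatbots or assistants natural conversational ability.
Educational tools: help solve math problems or explain concepts step by step.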

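⚠️ Important Note

This model is still under development. For production-level performance, further pretraining on larger datasets followed by post-training on conversational data is recommended.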

🚀 Quick Start

To use Arsh-llm, load it directly from Hugging Face:

Basic usage

import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "arshiaafshani/Arsh-llm"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Arsh llm is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history
conversation_history = []

conversation_history.append({"role": "system", "content": "You are a helpful assistant."})

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exited from the chat. Bye!")
        break

    # Append user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt.
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,  # the model card lists a 128-token context; consider lowering this
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

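    # The returned 'generated_text' includes the prompt plus the generation.
    full_text = response[0]["generated_text"]
    # Extract the assistant's response by removing the prompt portion.
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the reply so later turns see the full conversation.
    conversation_history.append({"role": "assistant", "content": bot_response})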
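
Because `conversation_history` grows with every turn while the model overview lists a context length of only 128 tokens, long chats will eventually exceed what the model can attend to. The sketch below shows one possible way to keep the system message plus the most recent turns within a token budget. `trim_history` and `count_tokens` are hypothetical names introduced for illustration, and the default word-count tokenizer is only a stand-in. With the real model you could pass something like `lambda text: len(chatbot.tokenizer.encode(text))`, keeping in mind that the chat template itself adds a few tokens per message.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[Message]:
    """Keep system messages plus the newest turns that fit within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    # Reserve room for the system messages first.
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if cost > budget:
            break  # this turn and everything older is dropped
        kept.append(message)
        budget -= cost

    return system + kept[::-1]  # restore chronological order


# Made-up history used only to demonstrate the trimming behaviour.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a story about a fox."},
    {"role": "assistant", "content": "Once there was a small red fox."},
    {"role": "user", "content": "Make it shorter."},
]
trimmed = trim_history(history, max_tokens=12)
print([m["role"] for m in trimmed])
```

In the chat loop, this would be applied to `conversation_history` just before building `messages`. Dropping the oldest turns first is a simple policy; summarizing old turns is another option but is beyond this sketch.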