Trendyol-LLM-7B-chat-v4.1.0 open-source model - free Turkish text generation for the e-commerce domain

Trendyol LLM 7B Chat V4.1.0

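Developed by Trendyol

Trendyol LLM v4.1.0 is a generative model based on Trendyol LLM base v4.0 (a version of Qwen2.5 7B with continued pretraining on 13 billion tokens), focused on the e-commerce domain and Turkish language understanding.

Large language model

Safetensors

License: Apache-2.0 #E-commerce domain optimization #Turkish enhancement #Function calling support

Downloads: 854

Released: 3/7/2025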

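Model overview

Trendyol LLM v4.1.0 is a generative model with enhanced e-commerce domain knowledge (such as product description generation, attribute extraction, and content summarization) and improved Turkish language understanding. It also supports function calling.

Model features

Enhanced e-commerce domain knowledge

The model is tuned for e-commerce tasks such as product description generation, attribute extraction, and content summarization.

Improved Turkish language understanding

The model is optimized for Turkish so that it can better understand and generate Turkish content.

Function calling support

Function calling is partially supported, which makes the model more useful in interactive, tool-using applications.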

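Model capabilities

Product description generation

Attribute extraction

Content summarization

Fashion dialogue

Product tag extraction

Category detection

User persona interpretation

Retrieval-augmented generation (RAG)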

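Use cases

E-commerce

Product description generation

Automatically generates appealing product descriptions from product attributes.

Can help improve product page conversion rates.

User persona interpretation

Generates user personas from user behavior data.

Helps merchants better understand user needs.

Multilingual support

Turkish content generation

Generates high-quality Turkish text.

Serves the needs of Turkish-speaking users.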

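🚀 Trendyol LLM v4.1.0: a fashion-forward large language model

Trendyol LLM v4.1.0 is a generative model based on Trendyol LLM base v4.0, which is Qwen2.5 7B with continued pretraining on 13 billion tokens. This repository hosts the chat model.

✨ Key features

Enhanced e-commerce knowledge
- Description generation
- Attribute extraction
- Summarization
- Fashion dialogue
- Product tag extraction
- Category detection
- Persona interpretation based on behavior
- Retrieval-augmented generation (RAG)
- And more
Improved Turkish language knowledge
Function calling support (partially complete; to be fully completed in later iterations)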

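📦 Installation guide

The documentation does not describe installation steps. Dependencies can be installed the same way as for the transformers library: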

pip install transformers torch
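
The usage example on this page turns on FlashAttention 2, which depends on the separately installed `flash-attn` package that the command above does not cover. As a minimal sketch, the hypothetical helper `pick_attn_implementation` (not part of the original model card) checks whether `flash_attn` is importable and otherwise falls back to `"sdpa"`, the PyTorch scaled-dot-product attention backend in transformers:

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return an attention backend name accepted by transformers.

    Hypothetical helper: prefers FlashAttention 2 when the optional
    flash_attn package is installed, otherwise falls back to "sdpa".
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


print(pick_attn_implementation())
```

The returned string could be passed as the `attn_implementation` entry of `model_kwargs` when the pipeline is created.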

💻 Usage examples

Basic usage

import torch
from transformers import pipeline

model_id = "Trendyol/Trendyol-LLM-7B-chat-v4.1.0"

# Build a text-generation pipeline that loads the model in bfloat16.
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "use_cache": True,
        # Needs the flash-attn package; replaces the deprecated use_flash_attention_2=True.
        "attn_implementation": "flash_attention_2",
    },
    device_map="auto",
)


sampling_params = dict(do_sample=True, temperature=0.3, top_k=50, top_p=0.9, repetition_penalty=1.1)
DEFAULT_SYSTEM_PROMPT = "Sen yardımsever bir asistansın ve sana verilen talimatlar doğrultusunda en iyi cevabı üretmeye çalışacaksın."

messages = [
    {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
    {"role": "user", "content": "Türkiye'de kaç il var?"}
]

outputs = pipe(
    messages,
    max_new_tokens=1024,
    return_full_text=False,
    **sampling_params
)

print(outputs[0]["generated_text"])
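
The feature list names product description generation, but the page gives no prompt for it. The sketch below shows one way to turn product attributes into chat messages. The helper `build_description_messages`, the Turkish instruction wording, and the sample attributes are illustrative assumptions, not a prompt format published by the model authors. Only the system prompt is taken from the basic example.

```python
# System prompt copied from the basic usage example.
DEFAULT_SYSTEM_PROMPT = (
    "Sen yardımsever bir asistansın ve sana verilen talimatlar "
    "doğrultusunda en iyi cevabı üretmeye çalışacaksın."
)


def build_description_messages(attributes: dict) -> list:
    """Build chat messages asking for a product description.

    Hypothetical helper: the instruction text below is an assumed prompt,
    not one documented for this model.
    """
    attribute_lines = "\n".join(f"- {key}: {value}" for key, value in attributes.items())
    user_prompt = (
        "Aşağıdaki ürün özelliklerine göre kısa ve ilgi çekici "
        "bir ürün açıklaması yaz:\n" + attribute_lines
    )
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


# Sample attributes are made up for illustration.
messages = build_description_messages(
    {"Ürün": "Keten gömlek", "Renk": "Mavi", "Beden": "M"}
)
print(messages[1]["content"])
```

The resulting `messages` list has the same shape as the one in the basic example, so it could be passed to `pipe(...)` with the same sampling parameters.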

Advanced usage

tools = [
    {
        "name": "get_city_count",
        "description": "Get current city count of given country.",
        "parameters": {
            "type": "object",
            "properties": {
                "country_name": {
                    "type": "string",
                    "description": 'The name of the country to get the count for.',
                },
            },
            "required": ["country_name"],
        },
    },
    {
        "name": "get_temperature_date",
        "description": "Get temperature at a location and date.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": 'The location to get the temperature for, in the format "City, State, Country".',
                },
                "date": {
                    "type": "string",
                    "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": 'The unit to return the temperature in. Defaults to "celsius".',
                },
            },
            "required": ["location", "date"],
        },
    },
]

messages = [
    {"role": "system", "content": "Sen, aşağıdaki fonksiyonlara erişimi olan yardımcı bir asistansın. Gerektiğinde bunları kullanabilirsin -"},
    {"role": "user", "content": "Türkiye'de kaç il var?"}
]

text = pipe.tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, tokenize=False)
inputs = pipe.tokenizer(text, return_tensors="pt").to(pipe.model.device)
outputs = pipe.model.generate(**inputs, max_new_tokens=512)
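# Decode only the newly generated tokens instead of slicing the decoded string by prompt length.
output_text = pipe.tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])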
print(output_text)
# '<function>{"name": "get_city_count", "arguments": \'{"country_name": "Turkey"}\'}</function><|im_end|>'
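
The example stops at printing the raw function-call text. As a hedged sketch of the next step, the code below extracts the call and dispatches it to a local Python function. It assumes the output always follows the `<function>...</function>` shape shown in the comment above, where the payload is a dict literal whose `arguments` value is a single-quoted JSON string. `parse_function_calls`, `TOOL_REGISTRY`, and the stub `get_city_count` are hypothetical names introduced for illustration.

```python
import ast
import json
import re

# Matches every <function>...</function> span in the generated text.
FUNCTION_CALL_RE = re.compile(r"<function>(.*?)</function>", re.DOTALL)


def parse_function_calls(output_text: str) -> list:
    """Return (name, arguments) pairs found in the model output."""
    calls = []
    for payload in FUNCTION_CALL_RE.findall(output_text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            # The sample output quotes "arguments" with single quotes,
            # which is a valid Python literal but not valid JSON.
            call = ast.literal_eval(payload)
        arguments = call.get("arguments", {})
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        calls.append((call["name"], arguments))
    return calls


def get_city_count(country_name: str):
    """Stub tool with placeholder data; a real tool would query a data source."""
    return {"Turkey": 81}.get(country_name)


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY = {"get_city_count": get_city_count}

sample = (
    '<function>{"name": "get_city_count", '
    '"arguments": \'{"country_name": "Turkey"}\'}</function><|im_end|>'
)
calls = parse_function_calls(sample)
results = [TOOL_REGISTRY[name](**arguments) for name, arguments in calls]
print(calls, results)
```

Malformed payloads would raise from `ast.literal_eval` or `json.loads`. A production integration would likely catch those errors and validate arguments against the tool schema before calling anything.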

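📚 Detailed documentation

Limitations, risks, bias, and ethical considerations

Limitations and known biases

Primary function and application: Trendyol LLM is an autoregressive language model designed primarily to predict the next token in a text string. Although it is often used in a variety of applications, the model has not been extensively tested in real-world use. Its effectiveness and reliability across different scenarios remain largely unverified.
Language comprehension and generation: The model was trained primarily on standard English and Turkish. Its performance may be limited when understanding or generating slang, informal language, or other languages, which can lead to errors or misinterpretations.
False information generation: Users should be aware that Trendyol LLM may produce inaccurate or misleading information. Treat its outputs as a starting point or suggestion, not as a definitive answer.

Risks and ethical considerations

Potential for harmful use: There is a risk that Trendyol LLM could be used to generate offensive or harmful language. We strongly discourage any such use and emphasize the need for application-specific safety and fairness evaluations before deployment.
Unintended content and bias: The model was trained on a large corpus of text data that was not explicitly checked for offensive content or existing biases. As a result, it may inadvertently produce content that reflects those biases or inaccuracies.
Toxicity: Despite our efforts to select appropriate training data, the model can still generate harmful content, especially when explicitly prompted to do so. We encourage the open-source community to develop strategies for minimizing such risks.

Recommendations for safe and ethical use

Human oversight: We recommend adding a human review layer or using filters to manage and improve output quality in public-facing applications. This approach helps reduce the risk of unintentionally generating objectionable content.
Application-specific testing: Developers who intend to use Trendyol LLM should carry out thorough safety testing and tuning for their specific application. This is essential because the model's responses can be unpredictable and may occasionally be biased, inaccurate, or offensive.
Responsible development and deployment: Developers and users of Trendyol LLM are responsible for ensuring that their use meets ethical and safety standards. We urge users to be mindful of the model's limitations and to put appropriate safeguards in place to prevent misuse or harmful outcomes.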
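
The human-oversight recommendation can be sketched as a small gate that decides whether a generated text is published directly or routed to a reviewer. This is only an illustration under simple assumptions: the blocklist term, the length limit, and the names `ReviewDecision` and `gate_output` are placeholders, and a real deployment would likely use a dedicated moderation model or service instead of substring checks.

```python
from dataclasses import dataclass

# Placeholder blocklist; a real deployment would maintain its own list.
BLOCKED_TERMS = {"example_blocked_term"}


@dataclass
class ReviewDecision:
    text: str
    needs_human_review: bool
    reason: str


def gate_output(text: str, max_chars: int = 2000) -> ReviewDecision:
    """Flag outputs that are empty, too long, or contain a blocked term."""
    if not text.strip():
        return ReviewDecision(text, True, "empty output")
    lowered = text.casefold()
    for term in sorted(BLOCKED_TERMS):
        if term in lowered:
            return ReviewDecision(text, True, f"blocked term: {term}")
    if len(text) > max_chars:
        return ReviewDecision(text, True, "output too long")
    return ReviewDecision(text, False, "ok")


decision = gate_output("Türkiye'de 81 il bulunmaktadır.")
print(decision.needs_human_review, decision.reason)
```

Outputs flagged with `needs_human_review=True` would go to a review queue, while the rest could be shown to users directly.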