seed-coder-triton-8b-v1: open-source large language model with long-sequence input and efficient training

Seed Coder Triton 8b V1

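Developed by winglian

A large language model fine-tuned from ByteDance-Seed/Seed-Coder-8B-Base on a specific dataset, with support for long input sequences and efficient training strategies.

Large language model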

Transformers

License: MIT #long-sequence inference #code generation optimization #efficient fine-tuning

Downloads: 1,388

Released: 5/13/2025

Model overview

This model is the result of fine-tuning Seed-Coder-8B-Base on the axolotl-ai-internal/gpumode-py2triton-reasoning-v2 dataset, and it targets the tasks covered by that dataset.

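Model features

Long-sequence support

Supports input sequences of up to 16384 tokens, which suits long text and complex code.

Efficient training strategy

Uses sample packing and padding to the sequence length, combined with several optimization plugins, to improve training efficiency.

Optimized implementation

Uses LigerPlugin (with its RoPE, RMSNorm, and GLU activation options enabled in the training config) and related optimizations to improve training performance.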

Model capabilities

Code generation

Logical reasoning

Long-text processing

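Use cases

Code-related

Code generation

Generates code that implements a requested function.

Evaluation-set loss: 0.2177

Code reasoning

Understands and analyzes the logic of existing code.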

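🚀 Transformers model project

This project fine-tunes a model using the transformers library. The model is the result of fine-tuning a base model on a specific dataset, and it is intended for tasks in that dataset's domain.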

🚀 Quick start

This model is a fine-tuned version of ByteDance-Seed/Seed-Coder-8B-Base on the axolotl-ai-internal/gpumode-py2triton-reasoning-v2 dataset. It achieves the following result on the evaluation set:

Loss: 0.2177

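✨ Key features

Fine-tuned from the base model ByteDance-Seed/Seed-Coder-8B-Base.
Uses several plugins and optimizations, such as LigerPlugin and CutCrossEntropyPlugin.
Supports long input sequences of up to 16384 tokens.
Uses sample packing and padding to the sequence length to improve training efficiency.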
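
Sample packing concatenates several short samples into one sequence of at most `sequence_len` tokens, so less of each batch is spent on padding. The sketch below illustrates the idea with a first-fit-decreasing heuristic. It is an illustration only, not Axolotl's actual packing algorithm, and the function name and sample lengths are made up for the example.

```python
def pack_first_fit_decreasing(lengths, sequence_len=16384):
    """Greedily group sample lengths into bins of at most sequence_len tokens."""
    bins = []  # each bin holds the lengths of the samples packed together
    free = []  # remaining token capacity of each bin
    for n in sorted(lengths, reverse=True):
        if n > sequence_len:
            raise ValueError(f"sample of {n} tokens exceeds sequence_len")
        for i, capacity in enumerate(free):
            if n <= capacity:
                bins[i].append(n)
                free[i] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([n])
            free.append(sequence_len - n)
    return bins


# Hypothetical token counts for six samples.
lengths = [9000, 7000, 6000, 5000, 4000, 1000]
packed = pack_first_fit_decreasing(lengths)

# With pad_to_sequence_len, each packed sequence is padded up to 16384 tokens.
padding = [16384 - sum(b) for b in packed]
print(packed, padding)
```

Without packing, these six samples would occupy six padded sequences. Here they fit into two, each with 384 padding tokens.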

📦 Installation

The source documentation gives no installation steps.

💻 Usage example

The source documentation gives no code example.
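
As a starting point, a minimal sketch of loading the model with the transformers library might look like the following. Several things here are assumptions rather than facts from the card: the repository id `winglian/seed-coder-triton-8b-v1`, the prompt wording, and that the published tokenizer ships the llama3 chat template named in the training config. It also assumes `torch`, `transformers`, and `accelerate` are installed and that enough GPU memory is available.

```python
# Assumed repository id; the card names the model and author but not the exact repo.
MODEL_ID = "winglian/seed-coder-triton-8b-v1"


def build_messages(source_code: str) -> list:
    """Wrap a code snippet in a single-turn chat request (prompt wording is illustrative)."""
    prompt = "Rewrite the following Python code as a Triton kernel:\n\n" + source_code
    return [{"role": "user", "content": prompt}]


def generate(source_code: str, max_new_tokens: int = 1024) -> str:
    """Load the model and generate a reply for one snippet."""
    # Imported lazily so build_messages can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Assumes the tokenizer carries a chat template.
    inputs = tokenizer.apply_chat_template(
        build_messages(source_code),
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output[0, prompt_len:], skip_special_tokens=True)
```

Calling `generate("...")` with a code snippet downloads the weights on first use. Because the config adds `<think>` and `</think>` tokens, the reply may contain a reasoning section before the final code.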

📚 Detailed documentation

Axolotl configuration details

Axolotl version: 0.10.0.dev0

base_model: ByteDance-Seed/Seed-Coder-8B-Base

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true

chat_template: llama3
datasets:
  - path: axolotl-ai-internal/gpumode-py2triton-reasoning-v2
    type: chat_template
    split: train

dataset_prepared_path: last_run_prepared
val_set_size: 0.005
output_dir: ./outputs/out

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

wandb_project: seed-coder-8b-grpo-triton
wandb_entity: axolotl-ai
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_fused
max_grad_norm: 0.1
neftune_noise_alpha: 10
lr_scheduler: cosine
learning_rate: 1e-6
lr_groups:
  - name: embeddings
    modules:
      - embed_tokens
      - lm_head
    lr: 0.00003  # scale up LR for embeddings as these are unused tokens

bf16: true
tf32: true

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_steps: 100
evals_per_epoch: 5
saves_per_epoch: 1
weight_decay: 0.01
deepspeed: deepspeed_configs/zero1.json
special_tokens:
  eos_token: <|eot_id|>
added_tokens_overrides:
  7: <|start_header_id|>
  8: <|end_header_id|>
  9: <|eot_id|>
  10: <think>
  11: </think>
fix_untrained_tokens: [7, 8, 9, 10, 11]
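
The `lr_groups` entry gives `embed_tokens` and `lm_head` a learning rate of 3e-5 while every other parameter uses the base rate of 1e-6. The sketch below shows one way such a mapping can be resolved. Matching module names by substring is an assumption made for illustration, not a description of Axolotl's internals, and the parameter names are hypothetical.

```python
def assign_learning_rates(param_names, lr_groups, default_lr):
    """Map each parameter name to the LR of the first group whose module name it contains."""
    rates = {}
    for name in param_names:
        rates[name] = default_lr
        for group in lr_groups:
            if any(module in name for module in group["modules"]):
                rates[name] = group["lr"]
                break
    return rates


# Values taken from the config above.
lr_groups = [{"name": "embeddings", "modules": ["embed_tokens", "lm_head"], "lr": 0.00003}]

# Hypothetical parameter names in the style of a Hugging Face causal LM.
param_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "lm_head.weight",
]

lrs = assign_learning_rates(param_names, lr_groups, default_lr=1e-6)
for name, lr in lrs.items():
    print(f"{name}: {lr:g}")
```

The embedding and output layers train 30 times faster than the rest of the network, which fits the config comment that the overridden tokens (ids 7 to 11) were previously unused.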

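Training hyperparameters

The following hyperparameters were used during training:

Learning rate: 1e-06
Training batch size: 2
Evaluation batch size: 2
Random seed: 42
Distributed type: multi-GPU
Number of devices: 10
Total training batch size: 20
Total evaluation batch size: 20
Optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
Learning rate scheduler type: cosine
Learning rate scheduler warmup steps: 100
Number of epochs: 3.0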
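
The total batch size of 20 is the per-device batch size (2) times the gradient accumulation steps (1) times the number of devices (10). The cosine schedule with 100 warmup steps can be sketched in plain Python as follows. The formula is the common linear-warmup-then-cosine-decay shape. The value `total_steps=654` is an assumption inferred from the results table (about 218 steps per epoch over 3 epochs), not a number stated in the card.

```python
import math

micro_batch_size = 2
gradient_accumulation_steps = 1
num_devices = 10
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step, base_lr=1e-6, warmup_steps=100, total_steps=654):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total train batch size:", total_train_batch_size)
for step in (0, 50, 100, 377, 654):
    print(step, f"{lr_at(step):.3e}")
```

The rate peaks at 1e-6 at step 100 and falls to half of that at step 377, the midpoint of the decay phase.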

Training results

Training loss	Epoch	Step	Validation loss
0.5293	0.0046	1	5.7151
0.4449	0.2018	44	0.4878
0.425	0.4037	88	0.4319
0.3437	0.6055	132	0.3322
0.2903	0.8073	176	0.2893
0.2528	1.0092	220	0.2677
0.2578	1.2110	264	0.2531
0.2522	1.4128	308	0.2414
0.2403	1.6147	352	0.2352
0.232	1.8165	396	0.2252
0.2093	2.0183	440	0.2360
0.2406	2.2202	484	0.2311
0.2523	2.4220	528	0.2260
0.2139	2.6239	572	0.2259
0.2296	2.8257	616	0.2177
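
A few facts can be read off the table programmatically. The sketch below copies the rows above into a list, finds the evaluation with the lowest validation loss, lists the steps where validation loss rose compared with the previous evaluation, and estimates the number of steps per epoch. The variable names are illustrative.

```python
# (training loss, epoch, step, validation loss), copied from the table above.
rows = [
    (0.5293, 0.0046, 1, 5.7151),
    (0.4449, 0.2018, 44, 0.4878),
    (0.425, 0.4037, 88, 0.4319),
    (0.3437, 0.6055, 132, 0.3322),
    (0.2903, 0.8073, 176, 0.2893),
    (0.2528, 1.0092, 220, 0.2677),
    (0.2578, 1.2110, 264, 0.2531),
    (0.2522, 1.4128, 308, 0.2414),
    (0.2403, 1.6147, 352, 0.2352),
    (0.232, 1.8165, 396, 0.2252),
    (0.2093, 2.0183, 440, 0.2360),
    (0.2406, 2.2202, 484, 0.2311),
    (0.2523, 2.4220, 528, 0.2260),
    (0.2139, 2.6239, 572, 0.2259),
    (0.2296, 2.8257, 616, 0.2177),
]

# Evaluation with the lowest validation loss.
best_row = min(rows, key=lambda r: r[3])
best = (best_row[2], best_row[3])

# Steps where validation loss increased relative to the previous evaluation.
regressions = [cur[2] for prev, cur in zip(rows, rows[1:]) if cur[3] > prev[3]]

# Steps per epoch, estimated from the last logged row.
steps_per_epoch = round(rows[-1][2] / rows[-1][1])

print("best (step, val loss):", best)
print("regression steps:", regressions)
print("steps per epoch:", steps_per_epoch)
```

The lowest validation loss is the final logged value, 0.2177 at step 616, which matches the evaluation loss reported in the card. The only increase between consecutive evaluations occurs at step 440, just after the start of the third epoch.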