Transformer Model
This project is based on the transformers library. It fine-tunes the base model ByteDance-Seed/Seed-Coder-8B-Base on the axolotl-ai-internal/gpumode-py2triton-reasoning-v2 dataset, which targets translating Python/PyTorch code into Triton kernels with intermediate reasoning.
Quick Start

See the axolotl config below. The axolotl version used is 0.10.0.dev0.
```yaml
base_model: ByteDance-Seed/Seed-Coder-8B-Base

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true

chat_template: llama3
datasets:
  - path: axolotl-ai-internal/gpumode-py2triton-reasoning-v2
    type: chat_template
    split: train
dataset_prepared_path: last_run_prepared
val_set_size: 0.005
output_dir: ./outputs/out

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

wandb_project: seed-coder-8b-grpo-triton
wandb_entity: axolotl-ai
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_fused
max_grad_norm: 0.1
neftune_noise_alpha: 10
lr_scheduler: cosine
learning_rate: 1e-6
lr_groups:
  - name: embeddings
    modules:
      - embed_tokens
      - lm_head
    lr: 0.00003

bf16: true
tf32: true

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_steps: 100
evals_per_epoch: 5
saves_per_epoch: 1
weight_decay: 0.01
deepspeed: deepspeed_configs/zero1.json

special_tokens:
  eos_token: <|eot_id|>
added_tokens_overrides:
  7: <|start_header_id|>
  8: <|end_header_id|>
  9: <|eot_id|>
  10: <think>
  11: </think>
fix_untrained_tokens: [7, 8, 9, 10, 11]
```
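A minimal inference sketch with the transformers library is shown below. The model id is a placeholder (substitute the repository id or local path where this fine-tune was saved), and the prompt is only an example; the chat format follows the llama3 template configured above, with the model's reasoning emitted between `<think>` and `</think>` before the final answer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: replace with the actual repository id or local checkpoint directory.
model_id = "path/to/seed-coder-8b-py2triton"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Convert this PyTorch function to a Triton kernel:\n"
                   "def add(x, y):\n    return x + y",
    },
]
# Apply the llama3 chat template configured in the axolotl config above.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)

# Keep special tokens so the <think>...</think> reasoning span stays visible.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=False))
```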
Features
This model is a fine-tuned version of [ByteDance-Seed/Seed-Coder-8B-Base](https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Base) on the axolotl-ai-internal/gpumode-py2triton-reasoning-v2 dataset. It achieves a loss of 0.2177 on the evaluation set.
Documentation
Model Information
| Property | Details |
| --- | --- |
| Library Name | transformers |
| License | MIT |
| Base Model | ByteDance-Seed/Seed-Coder-8B-Base |
| Datasets | axolotl-ai-internal/gpumode-py2triton-reasoning-v2 |
Training and Evaluation
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 10
- total_train_batch_size: 20
- total_eval_batch_size: 20
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
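For reference, these settings map roughly onto the following transformers `TrainingArguments`. This is an illustrative sketch only: axolotl builds the actual trainer arguments internally, and the DeepSpeed ZeRO-1, Liger kernel, multi-GPU, and sample-packing pieces are not reflected here.

```python
from transformers import TrainingArguments

# Approximate mirror of the hyperparameters listed above (illustrative, not the
# arguments axolotl actually constructs).
training_args = TrainingArguments(
    output_dir="./outputs/out",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # micro_batch_size
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    weight_decay=0.01,
    max_grad_norm=0.1,
    bf16=True,
    tf32=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    logging_steps=1,
    seed=42,
)
```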
Training Results
| Training Loss | Epoch | Step | Validation Loss |
| --- | --- | --- | --- |
| 0.5293 | 0.0046 | 1 | 5.7151 |
| 0.4449 | 0.2018 | 44 | 0.4878 |
| 0.425 | 0.4037 | 88 | 0.4319 |
| 0.3437 | 0.6055 | 132 | 0.3322 |
| 0.2903 | 0.8073 | 176 | 0.2893 |
| 0.2528 | 1.0092 | 220 | 0.2677 |
| 0.2578 | 1.2110 | 264 | 0.2531 |
| 0.2522 | 1.4128 | 308 | 0.2414 |
| 0.2403 | 1.6147 | 352 | 0.2352 |
| 0.232 | 1.8165 | 396 | 0.2252 |
| 0.2093 | 2.0183 | 440 | 0.2360 |
| 0.2406 | 2.2202 | 484 | 0.2311 |
| 0.2523 | 2.4220 | 528 | 0.2260 |
| 0.2139 | 2.6239 | 572 | 0.2259 |
| 0.2296 | 2.8257 | 616 | 0.2177 |
Framework Versions
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1
License
This project is licensed under the MIT license.