đ Dolphin 2.9.1 Yi 1.5 34b đŦ
Curated and trained by Eric Hartford, Lucas Atkins, and Fernando Fernandes, together with Cognitive Computations. Dolphin 2.9.1 is a fine-tune of Yi-1.5-34B offering strong general performance and a diverse set of capabilities.
đ Quick Start
This model is based on Yi-1.5-34B and is governed by the Apache 2.0 license. Although the base model's maximum position embeddings are 4k, we used a rope theta of 1000000.0 and trained with a sequence length of 8k. We also plan to train on the upcoming 32k version.
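The extended RoPE settings should be visible in the released model's configuration. A minimal sketch of checking them with đ¤ Transformers, assuming the model is published as cognitivecomputations/dolphin-2.9.1-yi-1.5-34b (repo id assumed, not stated in this card):

```python
from transformers import AutoConfig

# Assumed repo id; substitute the actual Hugging Face model id.
config = AutoConfig.from_pretrained("cognitivecomputations/dolphin-2.9.1-yi-1.5-34b")

# Yi-1.5 uses the Llama architecture, so these fields should be present.
print(config.rope_theta)               # expected: 1000000.0
print(config.max_position_embeddings)  # declared context window of the release
```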
⨠Features
- High Performance: Achieves 77.4 on MMLU with the 34b model and performs well across a wide range of tasks.
- Long Sequence Support: Despite the base model's 4k limit, we trained with an 8k sequence length, enhancing its ability to handle longer texts.
- Diverse Skills: Dolphin-2.9.1 has a variety of instruction, conversational, and coding skills. It also has initial agentic abilities and supports function calling.
đĻ Installation
The model card does not include dedicated installation steps; the model works with the standard Hugging Face Transformers stack, as sketched below.
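A minimal loading sketch, assuming the repo id cognitivecomputations/dolphin-2.9.1-yi-1.5-34b and enough GPU memory (or an appropriate quantization) for a 34B model:

```python
# pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.1-yi-1.5-34b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training used bf16
    device_map="auto",           # shard across available GPUs
)
```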
đģ Usage Examples
Basic Usage
The model uses the ChatML prompt template format. Here is an example:
```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
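Rather than assembling the prompt by hand, the tokenizer's chat template can build it. A sketch, continuing from the loading example in the Installation section and assuming the released tokenizer ships the ChatML template shown above:

```python
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about the ocean."},
]

# Renders the ChatML prompt and appends the assistant header for generation.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```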
Advanced Usage
Beyond plain chat, the model handles complex, multi-step instructions and was trained on function-calling and agent-style data, so it can be prompted to emit structured tool calls, as sketched below.
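The card does not pin down a specific tool-calling format, so the following is only an illustrative sketch: tools are described in the system message and the model is asked to answer with a JSON function call.

```python
import json

system = (
    "You are Dolphin, a helpful AI assistant with access to tools.\n"
    "Available tools:\n"
    '{"name": "get_weather", "parameters": {"city": "string"}}\n'
    "When a tool is needed, reply with a single JSON object of the form "
    '{"name": ..., "arguments": {...}}.'
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "What is the weather in Lisbon right now?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# The reply is expected (not guaranteed) to be a parseable JSON tool call.
try:
    print("tool call:", json.loads(reply))
except json.JSONDecodeError:
    print("plain answer:", reply)
```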
đ Documentation
- Model Information:
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of [01-ai/Yi-1.5-34B](https://huggingface.co/01-ai/Yi-1.5-34B) |
| Training Data | cognitivecomputations/Dolphin-2.9, teknium/OpenHermes-2.5, m-a-p/CodeFeedback-Filtered-Instruction, cognitivecomputations/dolphin-coder, cognitivecomputations/samantha-data, microsoft/orca-math-word-problems-200k, Locutusque/function-calling-chatml, internlm/Agent-FLAN |
- Training Hyperparameters:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
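For reference, the effective batch size follows directly from these settings: train_batch_size × gradient_accumulation_steps × num_devices = 1 × 8 × 8 = 64, which matches total_train_batch_size.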
- Training Results:
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6265 | 0.0 | 1 | 0.6035 |
| 0.4674 | 0.25 | 327 | 0.4344 |
| 0.4337 | 0.5 | 654 | 0.4250 |
| 0.4346 | 0.75 | 981 | 0.4179 |
| 0.3985 | 1.0 | 1308 | 0.4118 |
| 0.3128 | 1.23 | 1635 | 0.4201 |
| 0.3261 | 1.48 | 1962 | 0.4157 |
| 0.3259 | 1.73 | 2289 | 0.4122 |
| 0.3126 | 1.98 | 2616 | 0.4079 |
| 0.2265 | 2.21 | 2943 | 0.4441 |
| 0.2297 | 2.46 | 3270 | 0.4427 |
| 0.2424 | 2.71 | 3597 | 0.4425 |
- Framework Versions:
- Transformers 4.40.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
đ§ Technical Details
The training process used the Axolotl framework. Here is the axolotl config:
```yaml
base_model: 01-ai/Yi-1.5-34B
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

datasets:
  - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
dataset_prepared_path: yi34b
val_set_size: 0.01
output_dir: ./out-yi

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9-yi-34b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|startoftext|>"
  eos_token: "<|im_end|>"
  pad_token: "<unk>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"
```
đ License
This model is governed by the Apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT-4, among other models.
Additional Information
- Sponsors:
- Crusoe Cloud - provided excellent on-demand 8xH100 node
- [OnDemand](https://on-demand.io/) - provided inference sponsorship
- Discord: Join our Discord
- Evals:
â ī¸ Important Note
The model is uncensored. We have filtered the dataset to remove alignment and bias, making the model more compliant. However, you are advised to implement your own alignment layer before exposing the model as a service, as it will be highly compliant with any requests, even unethical ones. Please read [this blog post](https://erichartford.com/uncensored-models) about uncensored models. You are responsible for any content you create using this model. Enjoy responsibly.