CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity Open Source Model

Developed by brucethemoose

This is a high-density merge built on the Yi-34B-200K foundation model. It combines several Yi-34B fine-tunes using the DARE Ties method and keeps the base model's 200K-token long-context capability.

Large Language Model

Transformers

Language: English. License: Other. Tags: 200K long context, multi-model fusion, high-density merging

Downloads 94

Release Time: 12/9/2023

Model Overview

The model merges several fine-tunes of the same base, including Dolphin-2.2-yi-34b-200k, Nous-Capybara-34B, and Tess-M-v1.4, using mergekit's DARE Ties method. It retains Yi-34B-200K's long-context capabilities and averages 72.15 on the Open LLM Leaderboard benchmarks.

Model Features

Long-context processing

Supports a context window of up to 200K tokens, which suits lengthy documents and complex reasoning tasks.

High-density merging

Uses the DARE Ties method to merge several fine-tunes of the same base at a higher-than-recommended density, which gave better perplexity and leaderboard scores in the author's tests.

Multi-model advantage fusion

Integrates the strengths of multiple models like Dolphin, Capybara, and Tess, providing diverse capabilities.

Efficient inference

Runs on 24GB GPUs and supports 45K-75K context lengths on exllamav2.

Model Capabilities

Text generation

Long-text understanding

Complex reasoning

Q&A systems

Knowledge-based Q&A

Use Cases

Knowledge-based Q&A

AI2 Reasoning Challenge

Performance on few-shot samples from the AI2 Reasoning Challenge (ARC)

Normalized accuracy 67.41

Commonsense reasoning

HellaSwag test

Commonsense reasoning capability on the HellaSwag dataset

Normalized accuracy 85.77

Mathematical reasoning

GSM8k math problems

Ability to solve elementary school math word problems

Accuracy 61.33

🚀 CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity

This is a text-generation model created by merging multiple models with a new, experimental implementation of "dare ties". It performs well on various text-generation tasks.

🚀 Quick Start

This is a text-generation model. Keep the following points in mind when using it:

Prompt Template

SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:

It might recognize ChatML from Dolphin+Xaberius, and Llama-chat from Airoboros. Sometimes the model "spells out" the stop token as </s>, as Capybara does, so you may need to add </s> as an additional stopping condition.
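The template and the extra stop condition can be sketched in a few lines of Python. This is a minimal illustration, not part of the original model card: the helper names `build_prompt` and `truncate_at_stop` are hypothetical, and the exact whitespace after `ASSISTANT:` is an assumption.

```python
def build_prompt(system_message: str, prompt: str) -> str:
    """Fill the SYSTEM/USER/ASSISTANT template shown above."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"


def truncate_at_stop(text: str, stop_strings=("</s>",)) -> str:
    """Cut generated text at the earliest stop string, if one appears.

    This covers the case where the model writes out "</s>" as plain text
    instead of emitting the real end-of-sequence token.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


prompt = build_prompt("You are a helpful assistant.", "Name one prime number.")
print(prompt)
print(truncate_at_stop("Two is prime.</s>USER: next"))
```

Most inference backends accept a list of stop strings directly; post-processing like this is only needed when they do not.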

Running the Model

Being a Yi model, it tends to run "hot" by default. Try disabling the BOS token and/or running a lower temperature with 0.05-0.13 MinP, a little repetition penalty, and no other samplers.
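For readers unfamiliar with MinP, the following is a rough sketch of the idea in plain Python: tokens whose probability falls below `min_p` times the top token's probability are discarded before sampling. The function name `min_p_filter`, the example logits, and the choice to apply temperature before MinP are assumptions for illustration; real backends differ in sampler order.

```python
import math


def min_p_filter(logits, min_p=0.1, temperature=0.8):
    """Return renormalized probabilities after temperature scaling and MinP.

    Applying temperature first is an assumption; backends differ here.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only tokens at least min_p times as likely as the best token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


# Made-up logits for a four-token vocabulary.
probs = min_p_filter([5.0, 4.0, 0.0, -3.0], min_p=0.1, temperature=0.8)
print(probs)
```

A lower temperature sharpens the distribution first, so fewer tokens clear the MinP threshold.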

24GB GPUs can run Yi-34B-200K models at 45K-75K context with exllamav2. More details can be found in this post.

It is recommended to use exl2 quantizations profiled on data similar to the desired task. The model is especially sensitive to the quantization data at low bpw! Some quantizations are published here: 4bpw 3.1bpw.

To load this model in full-context backends like transformers and vllm, you must change max_position_embeddings in config.json to a value lower than 200,000, otherwise you will run out of memory (OOM)!
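One way to make that change is a small script like the sketch below. The helper name `cap_context` and the target value of 32768 are illustrative assumptions, and the demo runs against a stand-in config.json in a temporary directory rather than a real model folder.

```python
import json
import tempfile
from pathlib import Path


def cap_context(config_path, max_positions):
    """Lower max_position_embeddings in a config.json; return the old value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("max_position_embeddings")
    config["max_position_embeddings"] = max_positions
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous


# Demo on a stand-in file; point cap_context at the real config.json instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "config.json"
    demo.write_text(json.dumps({"max_position_embeddings": 200000}),
                    encoding="utf-8")
    old = cap_context(demo, 32768)  # 32768 is an arbitrary example value
    new = json.loads(demo.read_text(encoding="utf-8"))["max_position_embeddings"]
    print(old, "->", new)
```

Pick a value that fits your memory budget; the right number depends on the backend and quantization.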

✨ Features

Model Merging: This model is created by merging Dolphin-2.2-yi-34b-200k, Nous-Capybara-34B, Tess-M-v1.4, Airoboros-3_1-yi-34b-200k, PlatYi-34B-200K-Q, and Una-xaberius-34b-v1beta with a new, experimental implementation of "dare ties" via mergekit.
Good Performance: It performs well on various text-generation tasks, as shown in the Open LLM Leaderboard results.

🔧 Technical Details

Model Merging Configuration

This variant is merged with a "higher than recommended" density using the following config, and the tokenizer from chargoddard's Yi-Llama:

models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    parameters:
      weight: 0.19
      density: 0.6
  - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: 0.14
      density: 0.5
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.19
      density: 0.6
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200K-Q
    parameters:
      weight: 0.14
      density: 0.5
  - model: /home/alpha/FastModels/ehartford_dolphin-2.2-yi-34b-200k
    parameters:
      weight: 0.19
      density: 0.6
  - model: /home/alpha/FastModels/fblgit_una-xaberius-34b-v1beta
    parameters:
      weight: 0.15
      density: 0.08
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:
  int8_mask: true
dtype: bfloat16
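As a quick sanity check on the config above, the weights can be totaled and the densities compared against 0.5. The sketch below copies the weight and density values by hand from the YAML; the variable names and shortened model labels are illustrative.

```python
# (label, weight, density) copied by hand from the merge config above.
MERGED = [
    ("Tess-34B-v1.4", 0.19, 0.6),
    ("airoboros-3_1-yi-34b-200k", 0.14, 0.5),
    ("Nous-Capybara-34B", 0.19, 0.6),
    ("PlatYi-34B-200K-Q", 0.14, 0.5),
    ("dolphin-2.2-yi-34b-200k", 0.19, 0.6),
    ("una-xaberius-34b-v1beta", 0.15, 0.08),
]

total_weight = sum(weight for _, weight, _ in MERGED)
above_half = [name for name, _, density in MERGED if density > 0.5]

print(f"total weight: {total_weight:.2f}")
print("density above 0.5:", above_half)
```

Three of the six merged models use a density of 0.6, which is what makes this the "high density" variant.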

Testing Notes

Various densities were tested with perplexity tests and long-context prompts. Relatively high densities seem to perform better, contrary to the findings of the Super Mario paper.
This particular version is merged with more than the "recommended" max density of 0.5. This seems to result in even better perplexity and a much higher position on the HF leaderboard, but it is unclear whether this translates to better output.
Weights that add up to 1 seem to be optimal.
DARE Ties also seems to produce better, lower-perplexity merges than a regular ties merge, task arithmetic, or a slerp merge.
Xaberius is not a 200K model, so it was merged at a very low density to try to preserve Yi 200K's long-context performance while still inheriting some of Xaberius's performance.
Other finetunes were not included because they aren't trained on the 200K base.
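To make the role of density concrete, here is a toy sketch of the drop-and-rescale step that DARE is built on: each difference between a fine-tune and the base is kept with probability equal to the density, and survivors are divided by the density so the expected delta stays the same. This is an illustration under simplifying assumptions (flat lists instead of tensors, a hypothetical `dare_delta` helper), not mergekit's implementation, and it omits the TIES sign-election step.

```python
import random


def dare_delta(base, finetuned, density, rng):
    """Keep each delta with probability `density`, rescale kept ones by 1/density."""
    out = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < density:
            out.append(delta / density)  # rescale so the expectation is unchanged
        else:
            out.append(0.0)  # dropped parameter falls back to the base value
    return out


rng = random.Random(0)
base = [0.0] * 10000
finetuned = [1.0] * 10000  # toy fine-tune: every delta equals 1.0
sparse = dare_delta(base, finetuned, density=0.6, rng=rng)

kept_fraction = sum(1 for d in sparse if d != 0.0) / len(sparse)
mean_delta = sum(sparse) / len(sparse)
print(kept_fraction, mean_delta)
```

A density of 0.08, as used for Xaberius, keeps only a small share of that model's deltas, which fits the stated goal of limiting its effect on long-context behavior.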

📄 License

The model is under the yi-license.

📚 Documentation

Credits

https://github.com/cg123/mergekit/tree/dare
https://huggingface.co/ehartford/dolphin-2.2-yi-34b-200k
https://huggingface.co/kyujinpy/PlatYi-34B-200K-Q
https://huggingface.co/NousResearch/Nous-Capybara-34B/
https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k
https://huggingface.co/migtissera/Tess-M-v1.4
https://huggingface.co/fblgit/una-xaberius-34b-v1beta
https://huggingface.co/chargoddard/Yi-34B-200K-Llama
https://huggingface.co/01-ai/Yi-34B-200K

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	72.15
AI2 Reasoning Challenge (25-shot)	67.41
HellaSwag (10-shot)	85.77
MMLU (5-shot)	77.44
TruthfulQA (0-shot)	57.84
Winogrande (5-shot)	83.11
GSM8k (5-shot)	61.33
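The reported average is the unweighted mean of the six benchmark scores, which a short check confirms. The dictionary below simply restates the table; the variable names are illustrative.

```python
# Scores copied from the leaderboard table above.
scores = {
    "ARC (25-shot)": 67.41,
    "HellaSwag (10-shot)": 85.77,
    "MMLU (5-shot)": 77.44,
    "TruthfulQA (0-shot)": 57.84,
    "Winogrande (5-shot)": 83.11,
    "GSM8k (5-shot)": 61.33,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```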

Possibly obsolete

This model might be replaced by https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5. Old model description is provided above.
