🚀 Laser-Dolphin-Mixtral-2x7b-dpo
A medium-sized Mixture-of-Experts (MoE) model for text generation, based on the pre-trained cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser, with improved evaluation performance.
🚀 Quick Start
Using Ollama
```bash
ollama run macadeliccc/laser-dolphin-mixtral-2x7b-dpo
```
Using Python
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
        prompt (str): Prompt for the model.

    Returns:
        str: The generated response from the model.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "Write a quicksort algorithm in python"
print("Response:")
print(generate_response(prompt), "\n")
```
You can also check the Colab notebook for usage examples.
✨ Features
- Improved Performance: The new version shows a ~1 point increase in evaluation performance on average.
- Multiple Quantizations: Available in ExLlamaV2, GGUF, and AWQ quantizations.
📦 Installation
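The card does not list installation steps. As a minimal assumption, the Python example above needs transformers, plus accelerate and bitsandbytes for the 4-bit load:

```bash
pip install transformers accelerate bitsandbytes
```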
💻 Usage Examples
Basic Usage
The basic usage is the same as the Quick Start Python example above.
Advanced Usage
Use the 4-bit model definition (load_in_4bit=True, as in the example above) to run the model in roughly 9 GB of VRAM while still exceeding the single 7B model by roughly 5-6 points. The code is otherwise the same as the basic usage, but you can adjust the generation parameters to your needs; a sketch follows below.
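As an illustration (not from the original card), this sketch loads the model with an explicit BitsAndBytesConfig and formats the prompt with the tokenizer's chat template; the EQ Bench run below reports ChatML as the prompt format. The sampling values are arbitrary examples to tune.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"

# Explicit 4-bit config, equivalent in spirit to load_in_4bit=True.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# ChatML-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Write a quicksort algorithm in python"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Example generation parameters -- adjust to your needs.
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```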
📚 Documentation
Overview
This model is a medium-sized MoE implementation based on cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser. The new version shows a ~1 point increase in evaluation performance on average.
Process
- The process is outlined in this notebook.
- The mergekit_config is in the files; one way to inspect it is sketched after this list.
- The models used in the configuration are not lasered, but the final product is; this is a change from the last version.
- This process is experimental. Your mileage may vary.
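To inspect the merge configuration yourself, a minimal sketch using huggingface_hub (the filename mergekit_config.yml is an assumption based on the usual mergekit convention; check the repo's file listing if it differs):

```python
from huggingface_hub import hf_hub_download

# Assumed filename -- mergekit conventionally writes mergekit_config.yml.
path = hf_hub_download(
    repo_id="macadeliccc/laser-dolphin-mixtral-2x7b-dpo",
    filename="mergekit_config.yml",
)
print(open(path).read())
```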
Future Goals
- [ ] Function Calling
- [ ] v2 with new base model to improve performance
Quantizations
ExLlamav2
These are the recommended quantizations for users running the model on GPU. Thanks to user bartowski, we now have ExLlamaV2 quantizations from 3.5 through 8.0 bpw. They are available here:
| Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
|--------|-----:|-------------:|----------:|-----------:|-----------:|-------------|
| 8_0 | 8.0 | 8.0 | 13.7 GB | 15.1 GB | 17.2 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
| 6_5 | 6.5 | 8.0 | 11.5 GB | 12.9 GB | 15.0 GB | Near unquantized performance at vastly reduced size, recommended. |
| 5_0 | 5.0 | 6.0 | 9.3 GB | 10.7 GB | 12.8 GB | Slightly lower quality vs 6.5, great for 12 GB cards with 16k context. |
| 4_25 | 4.25 | 6.0 | 8.2 GB | 9.6 GB | 11.7 GB | GPTQ equivalent bits per weight. |
| 3_5 | 3.5 | 6.0 | 7.0 GB | 8.4 GB | 10.5 GB | Lower quality, not recommended. |
His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
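To grab a specific bpw branch, one option is huggingface_hub's snapshot_download with the branch name as the revision. The repo id below is a placeholder, since the card links to the quant repo rather than naming it:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the ExLlamaV2 quant repo linked above.
snapshot_download(
    repo_id="bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2",  # assumption
    revision="6_5",  # branch from the table above, e.g. 6.5 bpw
    local_dir="laser-dolphin-exl2-6_5",
)
```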
GGUF
Current GGUF Quantizations
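A minimal llama-cpp-python sketch for running a GGUF quant locally; the filename is illustrative, so match it to whichever quant you download:

```python
from llama_cpp import Llama

# Illustrative filename -- use the GGUF file you actually downloaded.
llm = Llama(model_path="laser-dolphin-mixtral-2x7b-dpo.Q4_K_M.gguf", n_ctx=4096)
out = llm("Write a quicksort algorithm in python", max_tokens=256)
print(out["choices"][0]["text"])
```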
AWQ
Current AWQ Quantizations
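Recent transformers versions can load AWQ checkpoints directly through from_pretrained (with autoawq installed); a sketch, with the repo id a placeholder for the AWQ repo linked above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the AWQ quant repo linked above.
awq_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo-AWQ"  # assumption
tokenizer = AutoTokenizer.from_pretrained(awq_id)
model = AutoModelForCausalLM.from_pretrained(awq_id, device_map="auto")
```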
TheBloke
These quants may result in unpredictable behavior, as the model has been updated since they were made; new quants are available. Quantizations provided by TheBloke.
HF Spaces
- GGUF chat available here
- 4-bit bnb chat available here
Eval
EQ Bench
```
----Benchmark Complete----
2024-01-31 16:55:37
Time taken: 31.1 mins
Prompt Format: ChatML
Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF
Score (v2): 72.76
Parseable: 171.0
---------------
Batch completed
Time taken: 31.2 mins
---------------
```
You can check the evaluation Colab.
Summary of previous evaluation
Detailed current evaluation
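The tables below follow lm-evaluation-harness output. As a hedged sketch (the exact harness version and flags used for the card are not stated), a run like the following produces results in this format:

```python
import lm_eval

# Illustrative invocation -- task names and harness version may differ from
# those used to produce the tables below.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=macadeliccc/laser-dolphin-mixtral-2x7b-dpo,load_in_4bit=True",
    tasks=["arc_challenge", "hellaswag", "winogrande"],
)
print(results["results"])
```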
AGIEval
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| agieval_aqua_rat | 0 | acc | 21.26 | ± | 2.57 |
|  |  | acc_norm | 21.65 | ± | 2.59 |
| agieval_logiqa_en | 0 | acc | 34.72 | ± | 1.87 |
|  |  | acc_norm | 35.64 | ± | 1.88 |
| agieval_lsat_ar | 0 | acc | 26.96 | ± | 2.93 |
|  |  | acc_norm | 26.96 | ± | 2.93 |
| agieval_lsat_lr | 0 | acc | 45.88 | ± | 2.21 |
|  |  | acc_norm | 46.08 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 59.48 | ± | 3.00 |
|  |  | acc_norm | 59.48 | ± | 3.00 |
| agieval_sat_en | 0 | acc | 73.79 | ± | 3.07 |
|  |  | acc_norm | 73.79 | ± | 3.07 |
| agieval_sat_en_without_passage | 0 | acc | 42.23 | ± | 3.45 |
|  |  | acc_norm | 41.26 | ± | 3.44 |
| agieval_sat_math | 0 | acc | 37.27 | ± | 3.27 |
|  |  | acc_norm | 33.18 | ± | 3.18 |
Average: 42.25%
GPT4All
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| arc_challenge | 0 | acc | 58.36 | ± | 1.44 |
|  |  | acc_norm | 58.02 | ± | 1.44 |
| arc_easy | 0 | acc | 82.20 | ± | 0.78 |
|  |  | acc_norm | 77.40 | ± | 0.86 |
| boolq | 1 | acc | 87.52 | ± | 0.58 |
| hellaswag | 0 | acc | 67.50 | ± | 0.47 |
|  |  | acc_norm | 84.43 | ± | 0.36 |
| openbookqa | 0 | acc | 34.40 | ± | 2.13 |
|  |  | acc_norm | 47.00 | ± | 2.23 |
| piqa | 0 | acc | 81.61 | ± | 0.90 |
|  |  | acc_norm | 82.59 | ± | 0.88 |
| winogrande | 0 | acc | 77.19 | ± | 1.18 |
Average: 73.45%
GSM8K
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| gsm8k | 2 | exact_match,get-answer | 0.75 |  |  |
|  |  | exact_match_stderr,get-answer | 0.01 |  |  |
|  |  | alias | gsm8k |  |  |
TruthfulQA
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| truthfulqa_mc | 1 | mc1 | 45.90 | ± | 1.74 |
|  |  | mc2 | 63.44 | ± | 1.56 |
Average: 63.44%
Bigbench
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.42 | ± | 3.59 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 60.70 | ± | 2.55 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 38.37 | ± | 3.03 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 21.73 | ± | 2.18 |
|  |  | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 35.00 | ± | 2.14 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.57 | ± | 1.61 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 50.33 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.00 | ± | 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.00 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 60.35 | ± | 1.09 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 51.12 | ± | 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 32.26 | ± | 1.48 |
| bigbench_snarks | 0 | multiple_choice_grade | ... | ± | ... |
📄 License
This project is licensed under the Apache-2.0 license.