đ Atla Selene Mini
Atla Selene Mini is a state-of-the-art small language model-as-a-judge (SLMJ). It achieves performance comparable to models 10x its size and outperforms GPT-4o on multiple benchmarks.
đ Quick Start
Installation
You can install the necessary libraries and load the model as follows:
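If the libraries are not already installed, they can typically be obtained with `pip install transformers accelerate torch` (this card does not pin versions; `accelerate` is assumed here because the loading code uses `device_map="auto"`).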
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model_id = "AtlaAI/Selene-1-Mini-Llama-3.1-8B"
# Load the model weights across available devices, plus the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
Usage
prompt = "I heard you can evaluate my responses?"
messages = [{"role": "user", "content": prompt}]
# Apply the Llama 3 chat template and move the inputs to the GPU
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
# Generate, strip the prompt tokens from the output, then decode
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
⨠Features
- High performance: Atla Selene Mini achieves performance comparable to models 10x its size and outperforms GPT-4o on RewardBench, EvalBiasBench, and AutoJ.
- Multilingual support: Supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Versatile evaluation: Can be used as a general-purpose evaluation model, supporting different inputs and scoring scales, generating structured evaluation outputs, and providing qualitative critiques with reasoning.
- Long context length: Has a context length of 128K tokens.
đĻ Model Details
| Property | Details |
|----------|---------|
| Developed by | Atla |
| Model Type | Post-trained from Llama-3.1-8B |
| Language(s) (NLP) | Primarily English but supports German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Context length | 128K |
đģ Usage Examples
Basic Usage
The basic loading and generation example is shown in the Quick Start section above.
Advanced Usage
To achieve the best results, use the prompts we used for training, available here, and remember to apply the Llama 3 conversation template.
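As an illustration, here is a minimal sketch of sending an evaluation request through the Llama 3 chat template. It reuses the `model`, `tokenizer`, and `device` from the Quick Start, and the rubric wording below is a hypothetical placeholder rather than one of the actual training prompts:
# NOTE: the rubric below is an illustrative placeholder, not an official Selene prompt.
eval_prompt = """You are evaluating a response to an instruction against a scoring rubric.
Give a score from 1 to 5, followed by a brief critique explaining your reasoning.
Instruction: {instruction}
Response: {response}"""
messages = [{
    "role": "user",
    "content": eval_prompt.format(
        instruction="Summarise the water cycle in one sentence.",
        response="Water evaporates, condenses into clouds, and falls back as rain or snow.",
    ),
}]
# apply_chat_template inserts the Llama 3 special tokens, so the model sees the conversation format it expects
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])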
đ Documentation
- Absolute scoring: Try our cookbook to get started.
- RAG hallucination: Check out our cookbook for this use case.
đ License
This model is licensed under the Apache 2.0 license.
đ Contact
đ Citation
@misc{alexandru2025atlaseleneminigeneral,
title={Atla Selene Mini: A General Purpose Evaluation Model},
author={Andrei Alexandru and Antonia Calvi and Henry Broomfield and Jackson Golden and Kyle Dai and Mathias Leys and Maurice Burger and Max Bartolo and Roman Engeler and Sashank Pisupati and Toby Drane and Young Sun Park},
year={2025},
eprint={2501.17195},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.17195},
}
â ī¸ Important Note
Remember to apply the Llama 3 conversation template (as done via `tokenizer.apply_chat_template` in the examples above); not doing so might lead to unexpected behavior.
đĄ Usage Tip
To achieve the best results, use the prompts we used for training, available here.