🚀 Llama-Krikri-8B-Instruct: An Instruction-tuned Large Language Model for the Greek language
Llama-Krikri-8B-Instruct is an instruction-tuned large language model based on Llama-3.1-8B, extending its capabilities for Greek. It can handle various tasks in Greek and English, such as chat, translation, and text generation.
🚀 Quick Start
With Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
model.to(device)
system_prompt = "Είσαι το Κρικρί, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για τα ελληνικά και εκπαιδεύτηκες από το ΙΕΛ του Ε.Κ. \"Αθηνά\"."
user_prompt = "Σε τι διαφέρει ένα κρικρί από ένα λάμα;"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(input_prompt['input_ids'], max_new_tokens=256, do_sample=True)
print(tokenizer.batch_decode(outputs)[0])
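Note that batch_decode above returns the full sequence, prompt included. A minimal sketch for printing only the newly generated part, reusing the objects defined above:

# Skip the prompt tokens so that only the model's reply is decoded
prompt_length = input_prompt['input_ids'].shape[1]
reply_tokens = outputs[0][prompt_length:]
print(tokenizer.decode(reply_tokens, skip_special_tokens=True))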
With OpenAI compatible server via vLLM
vllm serve ilsp/Llama-Krikri-8B-Instruct \
--enforce-eager \
--dtype 'bfloat16' \
--api-key token-abc123
Then, the model can be queried from Python with the OpenAI client:
from openai import OpenAI
api_key = "token-abc123"
base_url = "http://localhost:8000/v1"
client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)
system_prompt = "Είσαι ένα ανεπτυγμένο μεταφραστικό σύστημα που απαντάει με λίστες Python. Δεν γράφεις τίποτα άλλο στις απαντήσεις σου πέρα από τις μεταφρασμένες λίστες."
user_prompt = "Δώσε μου την παρακάτω λίστα με μεταφρασμένο κάθε string της στα ελληνικά: ['Ethics of duty', 'Postmodern ethics', 'Consequentialist ethics', 'Utilitarian ethics', 'Deontological ethics', 'Virtue ethics', 'Relativist ethics']"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
response = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=messages,
    temperature=0.0,
    top_p=0.95,
    max_tokens=8192,
    stream=False,
)
print(response.choices[0].message.content)
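Because the system prompt asks for replies formatted as Python lists, the returned text can be parsed directly. A minimal sketch, assuming the model followed the requested format:

import ast

# Parse the reply (expected to be a Python list literal) into a list of strings;
# this raises a ValueError/SyntaxError if the model deviated from the format
translated_list = ast.literal_eval(response.choices[0].message.content)
print(translated_list)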
✨ Features
- Base Model Features:
  - Vocabulary extension of the Llama-3.1 tokenizer with Greek tokens.
  - 128k context length (approximately 80,000 Greek words).
  - Extended pretraining on a large corpus for Greek language proficiency, including 56.7 billion monolingual Greek tokens, 21 billion monolingual English tokens, 5.5 billion Greek-English parallel data tokens, and 7.8 billion math and code tokens. The total corpus was upsampled to 110 billion tokens.
- Instruct Model Features:
  - Enhanced chat capabilities and instruction-following in both Greek and English.
  - Document translation between Greek and multiple languages (French, German, Italian, Portuguese, Spanish); see the translation sketch after this list.
  - Great performance on generation, comprehension, and editing tasks.
  - Domain-specific expertise for legal, financial, medical, and scientific applications.
  - Retrieval-Augmented Generation (RAG) with 128k context length.
  - Improved coding and agentic capabilities.
  - Conversion or structured extraction in data-to-text & text-to-data settings.
  - Analytical thinking and Chain-of-Thought (CoT) reasoning.
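To illustrate the translation capability mentioned above, here is a minimal sketch that reuses the OpenAI-compatible client from the Quick Start section; the system prompt wording is an illustrative assumption, not an officially recommended prompt:

# Assumes the `client` object from the vLLM example above
translation_messages = [
    {"role": "system", "content": "You are a translation system. Translate the user's text into Greek and return only the translation."},
    {"role": "user", "content": "Large language models can now follow instructions in many languages."},
]
translation = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=translation_messages,
    temperature=0.0,
    max_tokens=1024,
)
print(translation.choices[0].message.content)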
📦 Installation
No dedicated installation is required beyond the standard transformers (and, optionally, vLLM) packages used in the Quick Start examples above.
📚 Documentation
Model Information
Base Model
- The Llama-3.1 tokenizer is extended with Greek tokens.
- It has a 128k context length, equivalent to about 80,000 Greek words.
- Pretraining is extended using a large corpus:
  - 56.7 billion monolingual Greek tokens from public resources.
  - 21 billion monolingual English tokens and 5.5 billion Greek-English parallel data tokens to ensure bilingual capabilities.
  - 7.8 billion math and code tokens.
The corpus composition is as follows:
| Sub-corpus | Tokens | Percentage |
|------------|--------|------------|
| Greek | 56.7 B | 62.3% |
| English | 21.0 B | 23.1% |
| Parallel | 5.5 B | 6.0% |
| Math/Code | 7.8 B | 8.6% |
| **Total** | 91 B | 100% |
Chosen subsets of this 91-billion-token corpus were upsampled, resulting in a final size of 110 billion tokens.
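A quick way to see the effect of the extended tokenizer is to check the vocabulary size and the token count of a Greek sentence; the sample sentence below is an illustrative assumption:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")

sample = "Η τεχνητή νοημοσύνη αλλάζει τον τρόπο που εργαζόμαστε."  # illustrative Greek sentence ("AI is changing the way we work.")
print(len(tokenizer))                   # total vocabulary size, including the added Greek tokens
print(len(tokenizer.tokenize(sample)))  # number of tokens needed for the Greek sample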
Instruct Model
Llama-Krikri-8B-Instruct is post-trained on top of Llama-Krikri-8B-Base and provides the features listed above.
Post-training Methodology
- 2-stage Supervised Fine-Tuning with Greek & English instruction-response pairs and multi-turn conversations:
  - Stage 1: 856,946 instruction-response pairs (371,379 Greek + 485,567 English).
  - Stage 2: 638,408 instruction-response pairs (279,948 Greek + 358,460 English).
- Alignment with Greek & English preference triplets:
  - Length Normalized DPO: 92,394 preference triplets (47,132 Greek + 45,262 English); a sketch of this objective follows below.
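For reference, a common formulation of the Length Normalized DPO objective divides each policy-to-reference log-ratio by the token length of the corresponding response before applying the usual DPO loss; the exact variant used for Krikri is not documented here, so the following should be read only as a sketch:

$$\mathcal{L}_{\text{LN-DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \frac{\beta}{|y_l|}\log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

where $y_w$ and $y_l$ are the preferred and rejected responses, $|y|$ is the response length in tokens, $\pi_{\mathrm{ref}}$ is the reference model, and $\beta$ is the usual DPO temperature.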
Post-training Data Construction
- Collect existing high-quality datasets such as Tulu 3, SmolTalk, etc.
- Translate data into Greek using an in-house tool.
- Regenerate translated data and create preference triplets.
- Distill models such as Gemma 2 27B IT.
- Score data with Skywork Reward Gemma 2 27B v0.2 and filter it using rules (see the scoring sketch after this list).
- Create data for translation using parallel corpora from [ELRC-SHARE](https://elrc-share.eu/).
- Synthetically extract question-answer pairs and dialogues from various sources.
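A minimal sketch of the reward-scoring step described above, assuming the publicly released Skywork/Skywork-Reward-Gemma-2-27B-v0.2 checkpoint and an illustrative score threshold (the actual filtering rules are not detailed here):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reward model used for scoring; the checkpoint id and threshold below are assumptions for illustration
rm_name = "Skywork/Skywork-Reward-Gemma-2-27B-v0.2"
rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
rm_model = AutoModelForSequenceClassification.from_pretrained(rm_name, torch_dtype=torch.bfloat16, device_map="auto")

def reward_score(prompt: str, response: str) -> float:
    # The reward model assigns a single scalar score to a (prompt, response) conversation
    conversation = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}]
    input_ids = rm_tokenizer.apply_chat_template(conversation, tokenize=True, return_tensors="pt").to(rm_model.device)
    with torch.no_grad():
        return rm_model(input_ids).logits[0][0].item()

# Keep only candidate pairs that clear an illustrative threshold
candidate_pairs = [("What is 2 + 2?", "2 + 2 equals 4.")]  # hypothetical data
THRESHOLD = 0.0
filtered = [(p, r) for p, r in candidate_pairs if reward_score(p, r) >= THRESHOLD]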
Evaluation
- Chat Evaluation Suite:
  - Evaluated on Greek IFEval, English IFEval, [Greek MT-Bench](https://huggingface.co/datasets/ilsp/mt-bench-greek), and English MT-Bench, using gpt-4o-2024-08-06 as the judge model.
  - Llama-Krikri-8B Instruct outperforms Llama-3.1-8B Instruct by +21.7% and +7.3% on Greek and English IFEval respectively. It also shows strong performance on the MT-Bench benchmarks.
| Model | IFEval EL (strict avg) | IFEval EN (strict avg) | MT-Bench EL | MT-Bench EN |
|-------|------------------------|------------------------|-------------|-------------|
| Qwen 2.5 7B Instruct | 46.2% | 74.8% | 5.83 | 7.87 |
| EuroLLM 9B Instruct | 51.3% | 64.5% | 5.98 | 6.27 |
| Aya Expanse 8B | 50.4% | 62.2% | 7.68 | 6.92 |
| Meltemi 7B v1.5 Instruct | 32.7% | 41.2% | 6.25 | 5.46 |
| Llama-3.1-8B Instruct | 45.8% | 75.1% | 6.46 | 7.25 |
| Llama-Krikri-8B Instruct | 67.5% | 82.4% | 7.96 | 7.21 |
- Arena-Hard-Auto Evaluation:
  - Used [Arena-Hard-Auto](https://huggingface.co/datasets/lmarena-ai/arena-hard-auto-v0.1) and its Greek-translated version.
  - Two scores are reported: No Style Control and With Style Control.
  - Llama-Krikri-8B Instruct scores higher than models over 8 times its size and is competitive with closed-source and high-performing open-source models.
🔧 Technical Details
The post-training process involves multi-stage Supervised Fine-Tuning and alignment with preference triplets. The data construction uses a variety of methods, including collecting existing datasets, translation, distillation, reward-based filtering, and synthetic data extraction.
📄 License
This model is released under the Llama 3.1 Community License.
⚠️ Important Note
Please use the official quantized versions. The model's weights have been updated, so there is no guarantee that third-party quantizations reflect the latest, improved version.
⚠️ Important Note
More information on post-training, methodology, and evaluation is coming soon.