🚀 Llama-Krikri-8B-Instruct: An Instruction-tuned Large Language Model for the Greek language
Llama-Krikri-8B-Instruct is an instruction-tuned large language model based on Llama-3.1-8B, extending its capabilities for Greek. It can handle various tasks in Greek and English, such as chat, translation, and text generation.
🚀 Quick Start
With Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
model.to(device)
system_prompt = "Είσαι το Κρικρί, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για τα ελληνικά και εκπαιδεύτηκες από το ΙΕΛ του Ε.Κ. \"Αθηνά\"."
user_prompt = "Σε τι διαφέρει ένα κρικρί από ένα λάμα;"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(input_prompt['input_ids'], max_new_tokens=256, do_sample=True)
print(tokenizer.batch_decode(outputs)[0])
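Note that batch_decode above returns the full sequence, prompt included. A minimal sketch for printing only the newly generated part, reusing the objects defined above:

# Skip the prompt tokens so that only the model's reply is decoded
prompt_length = input_prompt['input_ids'].shape[1]
reply_tokens = outputs[0][prompt_length:]
print(tokenizer.decode(reply_tokens, skip_special_tokens=True))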
With OpenAI compatible server via vLLM
vllm serve ilsp/Llama-Krikri-8B-Instruct \
--enforce-eager \
--dtype 'bfloat16' \
--api-key token-abc123
Then, the model can be queried from Python with the OpenAI client:
from openai import OpenAI
api_key = "token-abc123"
base_url = "http://localhost:8000/v1"
client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)
system_prompt = "Είσαι ένα ανεπτυγμένο μεταφραστικό σύστημα που απαντάει με λίστες Python. Δεν γράφεις τίποτα άλλο στις απαντήσεις σου πέρα από τις μεταφρασμένες λίστες."
user_prompt = "Δώσε μου την παρακάτω λίστα με μεταφρασμένο κάθε string της στα ελληνικά: ['Ethics of duty', 'Postmodern ethics', 'Consequentialist ethics', 'Utilitarian ethics', 'Deontological ethics', 'Virtue ethics', 'Relativist ethics']"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
response = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=messages,
    temperature=0.0,
    top_p=0.95,
    max_tokens=8192,
    stream=False,
)
print(response.choices[0].message.content)
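Because the system prompt asks for replies formatted as Python lists, the returned text can be parsed directly. A minimal sketch, assuming the model followed the requested format:

import ast

# Parse the reply (expected to be a Python list literal) into a list of strings;
# this raises a ValueError/SyntaxError if the model deviated from the format
translated_list = ast.literal_eval(response.choices[0].message.content)
print(translated_list)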
✨ Features
- Base Model Features:
  - Vocabulary extension of the Llama-3.1 tokenizer with Greek tokens.
  - 128k context length (approximately 80,000 Greek words).
  - Extended pretraining on a large corpus for Greek language proficiency, including 56.7 billion monolingual Greek tokens, 21 billion monolingual English tokens, 5.5 billion Greek-English parallel data tokens, and 7.8 billion math and code tokens. The total corpus was upsampled to 110 billion tokens.
- Instruct Model Features:
  - Enhanced chat capabilities and instruction-following in both Greek and English.
  - Document translation between Greek and multiple languages (French, German, Italian, Portuguese, Spanish); see the translation sketch after this list.
  - Great performance on generation, comprehension, and editing tasks.
  - Domain-specific expertise for legal, financial, medical, and scientific applications.
  - Retrieval-Augmented Generation (RAG) with 128k context length.
  - Improved coding and agentic capabilities.
  - Conversion or structured extraction in data-to-text & text-to-data settings.
  - Analytical thinking and Chain-of-Thought (CoT) reasoning.
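To illustrate the translation capability mentioned above, here is a minimal sketch that reuses the OpenAI-compatible client from the Quick Start section; the system prompt wording is an illustrative assumption, not an officially recommended prompt:

# Assumes the `client` object from the vLLM example above
translation_messages = [
    {"role": "system", "content": "You are a translation system. Translate the user's text into Greek and return only the translation."},
    {"role": "user", "content": "Large language models can now follow instructions in many languages."},
]
translation = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=translation_messages,
    temperature=0.0,
    max_tokens=1024,
)
print(translation.choices[0].message.content)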
📦 Installation
No dedicated installation is required beyond the standard transformers (and, optionally, vLLM) packages used in the Quick Start examples above.
📚 Documentation
Model Information
Base Model
- The Llama-3.1 tokenizer is extended with Greek tokens.
- It has a 128k context length, equivalent to about 80,000 Greek words.
- Pretraining is extended using a large corpus:
  - 56.7 billion monolingual Greek tokens from public resources.
  - 21 billion monolingual English tokens and 5.5 billion Greek-English parallel data tokens to ensure bilingual capabilities.
  - 7.8 billion math and code tokens.
The corpus composition is as follows:
| Sub-corpus | Tokens | Percentage |
|------------|--------|------------|
| Greek | 56.7 B | 62.3% |
| English | 21.0 B | 23.1% |
| Parallel | 5.5 B | 6.0% |
| Math/Code | 7.8 B | 8.6% |
| **Total** | 91 B | 100% |
Chosen subsets of this 91-billion-token corpus were upsampled, resulting in a final size of 110 billion tokens.
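A quick way to see the effect of the extended tokenizer is to check the vocabulary size and the token count of a Greek sentence; the sample sentence below is an illustrative assumption:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")

sample = "Η τεχνητή νοημοσύνη αλλάζει τον τρόπο που εργαζόμαστε."  # illustrative Greek sentence ("AI is changing the way we work.")
print(len(tokenizer))                   # total vocabulary size, including the added Greek tokens
print(len(tokenizer.tokenize(sample)))  # number of tokens needed for the Greek sample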
Instruct Model
Llama-Krikri-8B-Instruct is post-trained on top of Llama-Krikri-8B-Base and provides the features listed above.
Post-training Methodology
- 2-stage Supervised Fine-Tuning with Greek & English instruction-response pairs and multi-turn conversations:
  - Stage 1: 856,946 instruction-response pairs (371,379 Greek + 485,567 English).
  - Stage 2: 638,408 instruction-response pairs (279,948 Greek + 358,460 English).
- Alignment with Greek & English preference triplets:
  - Length Normalized DPO: 92,394 preference triplets (47,132 Greek + 45,262 English); a sketch of this objective follows below.
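For reference, a common formulation of the Length Normalized DPO objective divides each policy-to-reference log-ratio by the token length of the corresponding response before applying the usual DPO loss; the exact variant used for Krikri is not documented here, so the following should be read only as a sketch:

$$\mathcal{L}_{\text{LN-DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \frac{\beta}{|y_l|}\log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

where $y_w$ and $y_l$ are the preferred and rejected responses, $|y|$ is the response length in tokens, $\pi_{\mathrm{ref}}$ is the reference model, and $\beta$ is the usual DPO temperature.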
Post-training Data Construction
- Collect existing high-quality datasets such as Tulu 3, SmolTalk, etc.
- Translate data into Greek using an in-house tool.
- Regenerate translated data and create preference triplets.
- Distill models such as Gemma 2 27B IT.
- Score data with Skywork Reward Gemma 2 27B v0.2 and filter it using rules (see the scoring sketch after this list).
- Create data for translation using parallel corpora from [ELRC-SHARE](https://elrc-share.eu/).
- Synthetically extract question-answer pairs and dialogues from various sources.
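A minimal sketch of the reward-scoring step described above, assuming the publicly released Skywork/Skywork-Reward-Gemma-2-27B-v0.2 checkpoint and an illustrative score threshold (the actual filtering rules are not detailed here):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reward model used for scoring; the checkpoint id and threshold below are assumptions for illustration
rm_name = "Skywork/Skywork-Reward-Gemma-2-27B-v0.2"
rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
rm_model = AutoModelForSequenceClassification.from_pretrained(rm_name, torch_dtype=torch.bfloat16, device_map="auto")

def reward_score(prompt: str, response: str) -> float:
    # The reward model assigns a single scalar score to a (prompt, response) conversation
    conversation = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}]
    input_ids = rm_tokenizer.apply_chat_template(conversation, tokenize=True, return_tensors="pt").to(rm_model.device)
    with torch.no_grad():
        return rm_model(input_ids).logits[0][0].item()

# Keep only candidate pairs that clear an illustrative threshold
candidate_pairs = [("What is 2 + 2?", "2 + 2 equals 4.")]  # hypothetical data
THRESHOLD = 0.0
filtered = [(p, r) for p, r in candidate_pairs if reward_score(p, r) >= THRESHOLD]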
Evaluation
- Chat Evaluation Suite:
  - Evaluated on Greek IFEval, English IFEval, [Greek MT-Bench](https://huggingface.co/datasets/ilsp/mt-bench-greek), and English MT-Bench, using gpt-4o-2024-08-06 as the judge model.
  - Llama-Krikri-8B Instruct outperforms Llama-3.1-8B Instruct by +21.7% and +7.3% on Greek and English IFEval respectively. It also shows strong performance on the MT-Bench benchmarks.
| Model | IFEval EL (strict avg) | IFEval EN (strict avg) | MT-Bench EL | MT-Bench EN |
|-------|------------------------|------------------------|-------------|-------------|
| Qwen 2.5 7B Instruct | 46.2% | 74.8% | 5.83 | 7.87 |
| EuroLLM 9B Instruct | 51.3% | 64.5% | 5.98 | 6.27 |
| Aya Expanse 8B | 50.4% | 62.2% | 7.68 | 6.92 |
| Meltemi 7B v1.5 Instruct | 32.7% | 41.2% | 6.25 | 5.46 |
| Llama-3.1-8B Instruct | 45.8% | 75.1% | 6.46 | 7.25 |
| Llama-Krikri-8B Instruct | 67.5% | 82.4% | 7.96 | 7.21 |
- Arena-Hard-Auto Evaluation:
  - Used [Arena-Hard-Auto](https://huggingface.co/datasets/lmarena-ai/arena-hard-auto-v0.1) and its Greek-translated version.
  - Two scores are reported: No Style Control and With Style Control.
  - Llama-Krikri-8B Instruct scores higher than models over 8 times its size and is competitive with closed-source and high-performing open-source models.
🔧 Technical Details
The post-training process involves multi-stage Supervised Fine-Tuning and alignment with preference triplets. The data construction uses a variety of methods, including collecting existing datasets, translation, distillation, reward-based filtering, and synthetic data extraction.
📄 License
This model is released under the Llama 3.1 Community License.
⚠️ Important Note
Please use the official quantized versions. The model's weights have been updated, so there is no guarantee that third-party quantizations reflect the latest, improved version.
⚠️ Important Note
More information on post-training, methodology, and evaluation is coming soon.