Llama-2-13b-chat-norwegian Open-source Model - Optimizing Norwegian Text Understanding and Generation

Llama 2 13b Chat Norwegian

Developed by RuterNorway

A Norwegian version fine-tuned based on Meta's Llama 2 13b Chat model, optimized for Norwegian text understanding and generation

Large Language Model

Transformers

Supports Multiple Languages

#Norwegian optimization #Multi-turn dialogue #Instruction fine-tuning

Downloads 23

Release Time : 8/16/2023

Model Overview

This model is fine-tuned on Norwegian datasets and is suitable for commercial and research purposes in Norwegian, serving as an assistant-like chat tool

Model Features

Norwegian Optimization

Specially fine-tuned for Norwegian, optimizing text understanding and generation capabilities in Norwegian

Multi-dataset Training

Combines the Norwegian alpaca dataset and 15k Norwegian OpenOrca data to enhance model performance

Safety Mechanisms

Retains Meta's original model safety mechanisms to reduce the generation of harmful, biased, or inappropriate content

Multi-prompt Format Support

Supports both Llama2 Chat and Alpaca prompt formats, adapting to different usage scenarios

Model Capabilities

Norwegian text generation

Norwegian Q&A

Norwegian summarization

Norwegian chat

Use Cases

Commercial Applications

Norwegian Customer Service Assistant

Used for automating customer service in Norwegian businesses

Education & Research

Norwegian Language Learning Aid

Assists students learning Norwegian with language practice

🚀 Llama 2 13b Chat Norwegian

Llama-2-13b-chat-norwegian is a variant of Meta's Llama 2 13b Chat model. It's finetuned on a mix of Norwegian datasets, enabling it to understand and generate text in Norwegian, which is valuable for NLP tasks in the Norwegian language.

🚀 Quick Start

Llama-2-13b-chat-norwegian can be used for Norwegian text generation as soon as a Python environment with the Hugging Face transformers library is set up.
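As a rough guide to sizing that environment, the model weights alone need about (number of parameters) x (bytes per parameter) of memory. The sketch below assumes a nominal 13 billion parameters, read off the "13b" in the model name rather than an exact count, and ignores activations and the attention cache, so the figures are lower bounds. The function name estimate_weight_memory_gib is hypothetical.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the "13b" in the model name.
PARAMS_13B = 13e9

for label, nbytes in [("float32", 4), ("float16 / bfloat16", 2), ("int8", 1)]:
    gib = estimate_weight_memory_gib(PARAMS_13B, nbytes)
    print(f"{label}: about {gib:.1f} GiB for weights")
```

Halving the bytes per parameter halves the estimate, which is why lower-precision loading is the usual first step when the weights do not fit on the available hardware.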

✨ Features

Norwegian Language Support: This model is specifically tuned to understand and generate text in Norwegian, making it suitable for various Norwegian NLP applications.
Finetuned on Diverse Datasets: It's finetuned on a combination of Norwegian datasets, including norwegian-alpaca and machine-translated data from OpenOrca, along with a small subset of custom-made instructional data.

📦 Installation

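The model card lists no specific installation steps. The usage examples below need the Hugging Face transformers library and PyTorch, since they import transformers and request PyTorch tensors from the tokenizer.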

💻 Usage Examples

Basic Usage

The model can be used in a text-generation pipeline. For example, in a Python environment with the appropriate Hugging Face libraries:

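from transformers import AutoModelForCausalLM, AutoTokenizer

# Download (or load from the local cache) the tokenizer and the model weights
model_name = "RuterNorway/Llama-2-13b-chat-norwegian"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the input; passing the full encoding also supplies the attention mask
input_text = "This is a test input"
inputs = tokenizer(input_text, return_tensors="pt")

# max_new_tokens bounds the length of the generated continuation
output = model.generate(**inputs, max_new_tokens=128)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)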

Advanced Usage

For more complex tasks such as chat-based interactions, you can use the appropriate prompt templates provided by the model:

# Using the Llama2 Chat prompt template (reuses tokenizer and model from Basic Usage)
system_prompt = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. Please answer in the same language as the user."
# The Llama tokenizer prepends the <s> (BOS) token by default, so it is left out of the string
prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\nThis is a test question[/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
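Because generate on a causal language model returns the prompt tokens followed by the continuation, the decoded string printed above contains the prompt as well as the answer. A minimal sketch of separating the two, assuming the reply is whatever follows the last [/INST] marker. The helper extract_reply is hypothetical and not part of transformers.

```python
def extract_reply(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text after the last marker, or the whole string if the marker is absent."""
    _, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


# Illustrative decoded output: prompt followed by a made-up model reply
decoded = "[INST] <<SYS>>\nSvar kort.\n<</SYS>>\nHva er hovedstaden i Norge?[/INST] Oslo."
print(extract_reply(decoded))
```

Splitting on the last marker rather than the first keeps the helper usable for multi-turn prompts, where earlier turns contain their own [/INST] markers.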

📚 Documentation

Data

Norwegian alpaca: A key dataset used for finetuning.
15k Norwegian OpenOrca (to be released): Another dataset that contributes to the model's training.
Small subset of custom-made instructional data: Adds specific knowledge and patterns to the model.

Intended Use

This model is intended for commercial and research use in Norwegian and can be used as an assistant-like chat.

Prompt Template

Llama2 Chat Prompt Format:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. Please answer in the same language as the user.
<</SYS>>
This is a test question[/INST] This is an answer </s><s>

See the original implementation here.
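The format above can be assembled programmatically for multi-turn conversations. The sketch below follows the layout shown in this section: the system prompt is wrapped in <<SYS>> tags inside the first [INST] block, and each completed turn is closed with </s> before the next <s>. The function build_llama2_chat_prompt and its turn structure are hypothetical, and the exact whitespace should be checked against the original implementation before relying on it.

```python
from typing import List, Optional, Tuple


def build_llama2_chat_prompt(system: str, turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a Llama2 Chat prompt from (user_message, assistant_answer) pairs.

    The answer of the last pair may be None to leave the prompt open for generation.
    """
    parts = []
    for i, (user, answer) in enumerate(turns):
        # Only the first user message carries the system prompt
        user_block = f"<<SYS>>\n{system}\n<</SYS>>\n{user}" if i == 0 else user
        segment = f"<s>[INST] {user_block}[/INST]"
        if answer is not None:
            segment += f" {answer} </s>"
        parts.append(segment)
    return "".join(parts)


prompt = build_llama2_chat_prompt(
    "Please answer in the same language as the user.",
    [("Hei!", "Hei! Hva kan jeg hjelpe deg med?"), ("Hva er Ruter?", None)],
)
print(prompt)
```

Since the string already contains <s> markers, it is probably safest to tokenize it with add_special_tokens=False so that the tokenizer does not prepend a second beginning-of-sequence token.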

Alpaca Prompt Format:

### Instruction:
Summarize the following text.
### Input:
Text to be summarized
### Response:
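A small helper can fill in the Alpaca format shown above. This is a minimal sketch: the name build_alpaca_prompt is hypothetical, and dropping the Input section when no input text is given is an assumption borrowed from common Alpaca usage, not something this model card specifies.

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the Alpaca template; the Input section is omitted when input_text is empty."""
    lines = ["### Instruction:", instruction]
    if input_text:
        lines += ["### Input:", input_text]
    # The trailing empty entry ends the prompt with a newline after the Response header
    lines += ["### Response:", ""]
    return "\n".join(lines)


print(build_alpaca_prompt("Oppsummer teksten.", "Ruter planlegger kollektivtrafikken i Oslo."))
```

The returned string ends right after the Response header, so the model's continuation is the answer itself.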

Why this model?

As a Norwegian company, we recognize the urgent need for powerful language models tailored to specific languages. Our goal is to democratize information, promote innovation, and create a more inclusive digital ecosystem by providing this open-source Norwegian model. We hope it will serve as a foundational resource for future specialized Norwegian models and strengthen the Norwegian NLP community.

Limitations

Knowledge Limitation: It's an LLM, not a knowledge model, and can't be expected to have more information about Norway than the base model.
Task-Specific Performance: Generally performs better on summarization, question-answering, and chat tasks than on tasks requiring in-depth knowledge of Norway, specific domains, or free-form answering.
Data Quality: The training data is machine-translated and may contain grammatical and other errors.
Prompt Tuning: The model is released as is and usually requires prompt tuning for optimal results.

License

Llama 2 is licensed under the LLAMA 2 [Community License](https://ai.meta.com/resources/models-and-libraries/llama-downloads/), Copyright © Meta Platforms, Inc. All Rights Reserved. See the original [model card](https://huggingface.co/meta-llama/Llama-2-13b) for more information. Also, from [norwegian-alpaca](https://huggingface.co/NbAiLab/norwegian-alpaca), note that "the current version uses OpenAI's gpt-3.5-turbo; hence, this dataset cannot be used to create models that compete in any way against OpenAI."

Disclaimer

As-Is Availability: The model is available "as is", and Ruter As takes no responsibility for further use.
Ethical Considerations: Although the safeguards implemented by Meta seem to work as expected during testing, developers should refer to the Ethical Considerations and Limitations from the original model card:

Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios.
For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.
Please see the Responsible Use Guide available at https://ai.meta.com/llama/responsible-use-guide/


Credits

This model was developed at Ruters AI Lab, which is part of Ruters Data & AI division.
