🚀 Polka-1.1B-Chat
eryk-mazus/polka-1.1b-chat
is the first Polish model trained to act as a helpful, conversational assistant that can be run locally. It combines continued pretraining on Polish text, a custom extended tokenizer, and preference fine-tuning to make local Polish text generation practical and efficient.

✨ Features
- Base Model: The model is based on TinyLlama-1.1B. It uses a custom, extended tokenizer for more efficient Polish text generation (see the sketch after this list) and was additionally pretrained on 5.7 billion tokens.
- Fine-Tuning: It was fine-tuned on around 60k synthetically generated and machine-translated multi-turn conversations, followed by Direct Preference Optimization (DPO).
- Context Size: It supports a context size of 4,096 tokens.
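The extended tokenizer encodes Polish text in fewer tokens, which is what makes generation more efficient. As a quick check (a sketch: the TinyLlama checkpoint id and the sample sentence are illustrative assumptions), compare how many tokens each tokenizer needs for the same Polish text:
from transformers import AutoTokenizer

polka = AutoTokenizer.from_pretrained("eryk-mazus/polka-1.1b-chat")
tinyllama = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

text = "Programowanie w języku polskim bywa wyzwaniem dla małych modeli."  # "Programming in Polish can be a challenge for small models."
print("polka tokens:    ", len(polka(text)["input_ids"]))
print("tinyllama tokens:", len(tinyllama(text)["input_ids"]))
A lower count for the polka tokenizer means more Polish text fits into the 4,096-token context and fewer decoding steps per sentence.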
📦 Installation
No model-specific installation is required: the example below runs with the standard Hugging Face stack. Install torch and transformers, plus accelerate, which device_map="auto" relies on (e.g. pip install torch transformers accelerate).
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_name = "eryk-mazus/polka-1.1b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token  # the model defines no dedicated pad token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

# Stream tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

system_prompt = "Jesteś pomocnym asystentem."  # "You are a helpful assistant."
chat = [{"role": "system", "content": system_prompt}]

user_input = "Napisz krótką piosenkę o programowaniu."  # "Write a short song about programming."
chat.append({"role": "user", "content": user_input})

# Render the conversation with the model's ChatML chat template and move the
# input ids to the device holding the model's first parameters
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
first_param_device = next(model.parameters()).device
inputs = inputs.to(first_param_device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.15,
        top_p=0.95,
        do_sample=True,
        streamer=streamer,
    )

# Decode only the newly generated tokens and keep the reply in the history
new_tokens = outputs[0, inputs.size(1):]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
chat.append({"role": "assistant", "content": response})
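Because the decoded reply is appended back to chat, further turns can repeat the same steps. A minimal interactive loop under the same settings (a sketch, not part of the original card; mind the 4,096-token context window as the history grows):
# Hypothetical multi-turn loop: each turn re-renders the full chat history
while True:
    chat.append({"role": "user", "content": input("> ")})
    inputs = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, return_tensors="pt"
    ).to(first_param_device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            pad_token_id=tokenizer.eos_token_id,
            max_new_tokens=512,
            temperature=0.2,
            repetition_penalty=1.15,
            top_p=0.95,
            do_sample=True,
            streamer=streamer,
        )
    response = tokenizer.decode(outputs[0, inputs.size(1):], skip_special_tokens=True)
    chat.append({"role": "assistant", "content": response})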
Advanced Usage
The model also works with vLLM, which can substantially increase inference throughput compared to plain transformers generation.
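A minimal sketch of offline inference with vLLM, reusing the sampling settings from the example above (the dtype and parameter set are assumptions; adjust them to your hardware and vLLM version):
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "eryk-mazus/polka-1.1b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# dtype="bfloat16" assumes a bf16-capable GPU; use "float16" otherwise
llm = LLM(model=model_name, dtype="bfloat16")

chat = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},
    {"role": "user", "content": "Napisz krótką piosenkę o programowaniu."},
]
# Render the conversation to a ChatML prompt string
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

sampling_params = SamplingParams(
    temperature=0.2, top_p=0.95, repetition_penalty=1.15, max_tokens=512
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)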
📚 Documentation
Prompt format
This model uses ChatML as the prompt format:
<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?<|im_end|>
<|im_start|>assistant
Dla dorosłych osób zaleca się spożywanie około 2000-3000 kcal dziennie, aby utrzymać optymalne zdrowie i dobre samopoczucie.<|im_end|>
This format is available as a chat template, which means you can format messages with the tokenizer.apply_chat_template() method, as demonstrated in the usage example above.
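To inspect the rendered prompt directly, ask the template for a string instead of token ids (a small sketch reusing the chat list from the usage example):
# Render to a raw ChatML string rather than token ids
prompt_text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt_text)  # the string should end with "<|im_start|>assistant\n"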
📄 License
This project is licensed under the MIT license.