Llama3.1 Typhoon2 8B Instruct
🚀 Llama3.1-Typhoon2-8B
Llama3.1-Typhoon2-8B is an 8-billion-parameter Thai large language model (Instruct) based on Llama3.1-8B. It offers high performance across a range of tasks, including instruction following, function calling, and specific domains such as math and coding.
🚀 Quick Start
To quickly start using the Llama3.1-Typhoon2-8B model, you need transformers 4.45.0 or newer installed. Here is a simple example that loads the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/llama3.1-typhoon2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
✨ Features
Instruction-Following & Function-Call Performance
The model performs strongly on instruction following and function calling in both Thai and English; see the detailed comparison table below.
Specific-Domain Performance (Math & Coding)
The model also performs well in specific domains such as math and coding.
Long-Context Performance
The model handles long-context tasks effectively, supporting a context length of up to 90k tokens.
Detailed Performance
The following table compares Typhoon2 with its base model across Thai and English benchmarks:
Model | IFEval-TH | IFEval-EN | MT-Bench TH | MT-Bench EN | Thai Code-Switching (t=0.7) | Thai Code-Switching (t=1.0) | FunctionCall-TH | FunctionCall-EN | GSM8K-TH | GSM8K-EN | MATH-TH | MATH-EN | HumanEval-TH | HumanEval-EN | MBPP-TH | MBPP-EN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Llama3.1 8B Instruct | 58.04% | 77.64% | 5.109 | 8.118 | 93% | 11.2% | 36.92% | 66.06% | 45.18% | 62.4% | 24.42% | 48% | 51.8% | 67.7% | 64.6% | 66.9% |
Typhoon2 Llama3 8B Instruct | 72.60% | 76.43% | 5.7417 | 7.584 | 98.8% | 98% | 75.12% | 79.08% | 71.72% | 81.0% | 38.48% | 49.04% | 58.5% | 68.9% | 60.8% | 63.0% |
📦 Installation
To use the model, install the transformers library, version 4.45.0 or newer, using pip. Quote the requirement so the shell does not interpret >= as a redirection:

```bash
pip install "transformers>=4.45.0"
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/llama3.1-typhoon2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a male AI assistant named Typhoon created by SCB 10X to be helpful, harmless, and honest. Typhoon is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks. Typhoon responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Typhoon avoids starting responses with the word “Certainly” in any way. Typhoon follows this information in all languages, and always responds to the user in the language they use or request. Typhoon is now being connected with a human. Write in fluid, conversational prose, Show genuine interest in understanding requests, Express appropriate emotions and empathy. Also showing information in term that is easy to understand and visualized."},
    # Thai: "Please give me a grilled chicken recipe."
    {"role": "user", "content": "ขอสูตรไก่ย่าง"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop at either the model's EOS token or the Llama 3.1 end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)

# Decode only the newly generated tokens, not the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
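For interactive use you may want tokens printed as they are generated rather than after generation completes. A minimal sketch using the TextStreamer utility from transformers, reusing the model, tokenizer, input_ids, and terminators from the example above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated,
# skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    streamer=streamer,
)
```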
Advanced Usage
Inference Server Hosting Example
```bash
pip install vllm
vllm serve scb10x/llama3.1-typhoon2-8b-instruct
# see more information at https://docs.vllm.ai/
```
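Once the server is up, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A minimal client sketch using the openai Python package; the base URL and placeholder API key below are assumptions for a default local deployment:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; any key works unless the
# server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="scb10x/llama3.1-typhoon2-8b-instruct",
    # Thai: "Please give me a grilled chicken recipe."
    messages=[{"role": "user", "content": "ขอสูตรไก่ย่าง"}],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
```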
Function-Call Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import ast

model_name = "scb10x/llama3.1-typhoon2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map='auto'
)

get_weather_api = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, New York",
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The unit of temperature to return",
            },
        },
        "required": ["location"],
    },
}

search_api = {
    "name": "search",
    "description": "Search for information on the internet",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query, e.g. 'latest news on AI'",
            }
        },
        "required": ["query"],
    },
}

get_stock = {
    "name": "get_stock_price",
    "description": "Get the stock price",
    "parameters": {
        "type": "object",
        "properties": {
            "symbol": {
                "type": "string",
                "description": "The stock symbol, e.g. AAPL, GOOG",
            }
        },
        "required": ["symbol"],
    },
}

# Tool definitions use the same format as OpenAI tools.
openai_format_tools = [get_weather_api, search_api, get_stock]

messages = [
    {"role": "system", "content": "You are an expert in composing functions."},
    # Thai: "What are the stock prices of Tasla (TLS) and Amazon (AMZ)?"
    {"role": "user", "content": "ขอราคาหุ้น Tasla (TLS) และ Amazon (AMZ) ?"},
]

inputs = tokenizer.apply_chat_template(
    messages, tools=openai_format_tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=1,
    eos_token_id=[tokenizer.eos_token_id, 128009],  # 128009 is the <|eot_id|> token
)

# Decode only the newly generated tokens, not the prompt.
response = outputs[0][inputs.shape[-1]:]
print("Here Output:", tokenizer.decode(response, skip_special_tokens=True))
```
The model's raw output is a string of Python-style function calls. The decoding utilities below parse it into structured dictionaries:

```python
# Decoding function utilities

def resolve_ast_by_type(value):
    """Resolve an AST node for a single argument value into a Python object."""
    if isinstance(value, ast.Constant):
        if value.value is Ellipsis:
            output = "..."
        else:
            output = value.value
    elif isinstance(value, ast.UnaryOp):
        # Assumes a unary minus applied to a constant, e.g. -3.
        output = -value.operand.value
    elif isinstance(value, ast.List):
        output = [resolve_ast_by_type(v) for v in value.elts]
    elif isinstance(value, ast.Dict):
        output = {
            resolve_ast_by_type(k): resolve_ast_by_type(v)
            for k, v in zip(value.keys, value.values)
        }
    elif isinstance(value, ast.NameConstant):
        # Legacy alias of ast.Constant; handles booleans/None on older Pythons.
        output = value.value
    elif isinstance(value, ast.BinOp):
        # Handles arithmetic expressions as arguments, e.g. 2 + 3.
        output = eval(ast.unparse(value))
    elif isinstance(value, ast.Name):
        output = value.id
    elif isinstance(value, ast.Call):
        if len(value.keywords) == 0:
            output = ast.unparse(value)
        else:
            output = resolve_ast_call(value)
    elif isinstance(value, ast.Tuple):
        output = tuple(resolve_ast_by_type(v) for v in value.elts)
    elif isinstance(value, ast.Lambda):
        # Evaluate the lambda expression itself, yielding a callable.
        output = eval(ast.unparse(value))
    elif isinstance(value, ast.Ellipsis):  # legacy alias of ast.Constant
        output = "..."
    elif isinstance(value, ast.Subscript):
        output = ast.unparse(value.value) + "[" + ast.unparse(value.slice) + "]"
    else:
        raise Exception(f"Unsupported AST type: {type(value)}")
    return output


def resolve_ast_call(elem):
    """Turn an ast.Call node into {function_name: {arg_name: arg_value, ...}}."""
    # Handle dotted function names such as module.func by walking the attribute chain.
    func_parts = []
    func_part = elem.func
    while isinstance(func_part, ast.Attribute):
        func_parts.append(func_part.attr)
        func_part = func_part.value
    if isinstance(func_part, ast.Name):
        func_parts.append(func_part.id)
    func_name = ".".join(reversed(func_parts))
    args_dict = {}
    for arg in elem.keywords:
        output = resolve_ast_by_type(arg.value)
        args_dict[arg.arg] = output
    return {func_name: args_dict}


def ast_parse(input_str, language="Python"):
    """Parse a string of one or more function calls into a list of dictionaries."""
    if language == "Python":
        cleaned_input = input_str.strip("[]'")
        parsed = ast.parse(cleaned_input, mode="eval")
        extracted = []
        if isinstance(parsed.body, ast.Call):
            extracted.append(resolve_ast_call(parsed.body))
        else:
            # Multiple comma-separated calls parse as a tuple of ast.Call nodes.
            for elem in parsed.body.elts:
                assert isinstance(elem, ast.Call)
                extracted.append(resolve_ast_call(elem))
        return extracted
    else:
        raise NotImplementedError(f"Unsupported language: {language}")


def parse_nested_value(value):
    """
    Parse a potentially nested value from the AST output.

    Args:
        value: The value to parse; either a nested dictionary that encodes
            another function call, or a simple value.

    Returns:
        str: A string representation of the value, handling nested function
            calls and nested dictionary function arguments.
    """
    if isinstance(value, dict):
        # A dict whose values are all dicts is treated as a nested function call.
        if all(isinstance(v, dict) for v in value.values()):
            func_name = list(value.keys())[0]
            args = value[func_name]
            args_str = ", ".join(
                f"{k}={parse_nested_value(v)}" for k, v in args.items()
            )
            return f"{func_name}({args_str})"
        else:
            # Otherwise treat it as plain key-value pairs.
            return (
                "{"
                + ", ".join(f"'{k}': {parse_nested_value(v)}" for k, v in value.items())
                + "}"
            )
    return repr(value)


def default_decode_ast_prompting(result, language="Python"):
    """Normalize the model output to a bracketed list, then parse it."""
    result = result.strip("`\n ")
    if not result.startswith("["):
        result = "[" + result
    if not result.endswith("]"):
        result = result + "]"
    decoded_output = ast_parse(result, language)
    return decoded_output


fc_result = default_decode_ast_prompting(tokenizer.decode(response, skip_special_tokens=True))
print(fc_result)  # e.g. [{'get_stock_price': {'symbol': 'TLS'}}, {'get_stock_price': {'symbol': 'AMZ'}}]
```
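The parse_nested_value helper defined above is not exercised by the example itself; it renders a decoded call, including nested function calls in its arguments, back into a call string. A minimal sketch with hypothetical decoded data (the function names below are made up for illustration):

```python
# Hypothetical decoded result: a get_weather call whose "location" argument
# is itself a nested function call.
decoded = {"get_weather": {"location": {"get_user_city": {"user_id": 42}}, "unit": "celsius"}}

func_name = list(decoded.keys())[0]
args = decoded[func_name]
args_str = ", ".join(f"{k}={parse_nested_value(v)}" for k, v in args.items())
print(f"{func_name}({args_str})")
# -> get_weather(location=get_user_city(user_id=42), unit='celsius')
```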
📚 Documentation
Model Description
Property | Details |
---|---|
Model Type | An 8B instruct decoder-only model based on the Llama architecture. |
Training Data | Not specified in the original document. |
Requirement | transformers 4.45.0 or newer. |
Context Length | 90k tokens |
Primary Language(s) | Thai and English |
License | [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
Intended Uses & Limitations
This model is an instruct model. However, it is still under development. It incorporates some level of guardrails, but it may still produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.
Follow us
You can follow us on Twitter: https://twitter.com/opentyphoon
Support
Join our Discord community for support: https://discord.gg/us5gAYmrxw
Citation
If you find Typhoon2 useful for your work, please cite it using:
```bibtex
@misc{typhoon2,
  title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
  author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
  year={2024},
  eprint={2412.13702},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.13702},
}
```
⚠️ Important Note
This model is still under development and may produce inaccurate, biased, or otherwise objectionable answers. Developers should assess the risks according to their use cases.
💡 Usage Tip
Make sure to install the transformers library, version 4.45.0 or newer, before using the model. For inference server hosting, refer to the official vLLM documentation at https://docs.vllm.ai/.

