Mallam 1.1B 4096
A 1.1B-parameter model pre-trained on Malay text, using the Mistral architecture and supporting a 4096-token context length
Downloads 201
Release date: 11/27/2023
Model Overview
This is a 1.1B-parameter large language model optimized for Malay, pre-trained from scratch on 90 billion tokens of Malay text. It is suited to Malay text generation and comprehension tasks.
Model Features
Malay language optimization
Specifically trained and optimized for Malay text
Long context support
Handles contexts of up to 4096 tokens
Efficient training
Training was completed efficiently on a Ray cluster of 5 nodes (4×A100 80GB each)
Model Capabilities
Malay text generation
Long text comprehension
Language model reasoning
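The capabilities above can be exercised with the standard Hugging Face `transformers` generation API. The sketch below is a minimal example; the repository id `mesolitica/mallam-1.1b-4096` and the sampling parameters are assumptions, not confirmed by this page, so check the model's Hugging Face card for the exact id and recommended settings.

```python
# Minimal sketch: Malay text generation with transformers.
# Assumption: the model is published as "mesolitica/mallam-1.1b-4096" on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mesolitica/mallam-1.1b-4096"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A short Malay prompt; the model continues it.
prompt = "Kuala Lumpur ialah"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,      # keep total length within the 4096-token context
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base (pre-trained, not instruction-tuned) model, it works best as a text continuer rather than a chat assistant; dialogue use cases typically require further fine-tuning.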
Use Cases
Text generation
Malay content creation
Generate Malay articles, stories, or other creative content
Dialogue systems
Build Malay chatbots or virtual assistants
Education
Language learning assistance
Help users learning Malay to practice and understand the language