# Minueza-2-96M
Minueza-2-96M is a compact language model based on the Llama architecture and trained from scratch on English and Portuguese datasets. It provides a lightweight foundation for fine-tuning toward specific applications, with the limitations expected of a model this size compared to larger models.
⨠Features
- Compact Size: With only 96 million parameters, it can run in mobile web browsers via Wllama and Transformers.js, and runs quickly on machines without a GPU.
- Multilingual Training: Trained from scratch on English and Portuguese datasets.
- Fine-Tunable: Can serve as a base for fine-tunes using the ChatML format.
## 📦 Installation
```bash
pip install transformers==4.50.0 torch==2.6.0
```
## 💻 Usage Examples

### Basic Usage
```python
from transformers import pipeline, TextStreamer
import torch

prompt = "This book tells the story"

# Load the model on GPU if available, otherwise fall back to CPU
generate_text = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

# Stream the sampled continuation of the prompt to stdout
generate_text(
    prompt,
    streamer=TextStreamer(generate_text.tokenizer, skip_special_tokens=True),
    do_sample=True,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    top_k=0,
    min_p=0.05,
    repetition_penalty=1.1,
)
```
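For finer control over generation, the model can also be loaded directly with `AutoTokenizer` and `AutoModelForCausalLM` instead of the pipeline helper. The snippet below is a minimal sketch using the same sampling parameters as the example above; it assumes the untuned base model and the standard `transformers` generation API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and weights directly
tokenizer = AutoTokenizer.from_pretrained("Felladrin/Minueza-2-96M")
model = AutoModelForCausalLM.from_pretrained("Felladrin/Minueza-2-96M").to(device)

inputs = tokenizer("This book tells the story", return_tensors="pt").to(device)

# Sample a continuation with the same settings as the pipeline example
outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    top_k=0,
    min_p=0.05,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```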
## 📚 Documentation

### Summary
Minueza-2-96M is a compact language model based on the Llama architecture. It was trained from scratch on English and Portuguese datasets, utilising a context length of 4096 tokens and processing 185 billion tokens during the training process. With a parameter count of only 96 million, this model serves as a lightweight foundation that can be subsequently fine-tuned for specific applications.
Due to its compact size, the model has significant limitations in reasoning, factual knowledge, and general capabilities compared to larger models. It may generate incorrect, irrelevant, or nonsensical outputs. Furthermore, as it was trained on internet text data, it may harbour biases and potentially produce inappropriate content.
### Intended Uses
This model was created with the following objectives in mind:
- Run on mobile web browsers via Wllama and Transformers.js.
- Run fast on machines without GPU.
- Serve as a base for fine-tunes using the ChatML format (see the sketch below).
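As a point of reference for such fine-tunes, the snippet below is a minimal sketch of how a conversation is laid out in the ChatML format. The role names and the example conversation are purely illustrative, and the base model itself is not instruction-tuned, so it will not follow this format until it has been fine-tuned on ChatML data.

```python
# Minimal ChatML formatting sketch (illustrative; the base model is not chat-tuned)
def to_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML prompt string."""
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model generates the reply
    prompt += "<|im_start|>assistant\n"
    return prompt

print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about the Minueza-2-96M model."},
]))
```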
## 🔧 Technical Details

### Model Architecture
This is a transformer model with the Llama architecture, trained on a context window of 4096 tokens.
| Property | Details |
|----------|---------|
| max_position_embeddings | 4096 |
| hidden_size | 672 |
| intermediate_size | 2688 |
| num_hidden_layers | 8 |
| num_attention_heads | 12 |
| num_key_value_heads | 4 |
| head_dim | 56 |
| attention_dropout | 0.1 |
| vocab_size | 32000 |
| rope_theta | 500000 |
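As a sanity check on the listed configuration, the sketch below builds a `LlamaConfig` from the table above and counts the resulting parameters. Setting `tie_word_embeddings=False` is an assumption here, chosen because untied input and output embeddings are what bring the total to roughly 96M.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Configuration values taken from the architecture table above
config = LlamaConfig(
    max_position_embeddings=4096,
    hidden_size=672,
    intermediate_size=2688,
    num_hidden_layers=8,
    num_attention_heads=12,
    num_key_value_heads=4,
    head_dim=56,
    attention_dropout=0.1,
    vocab_size=32000,
    rope_theta=500000,
    tie_word_embeddings=False,  # assumption: untied embeddings account for the ~96M total
)

# Instantiate with random weights just to count parameters (~96M)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```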
Pretraining was performed with the following hyperparameters:
| Property | Details |
|----------|---------|
| learning_rate | 0.0003 |
| warmup_steps | 2000 |
| weight_decay | 0.1 |
| max_grad_norm | 2.0 |
| total_train_batch_size | 512 (2M tokens per batch) |
| seed | 42 |
| optimizer | Adam with betas=(0.9, 0.95) and epsilon=1e-08 |
| lr_scheduler_type | linear |
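For readers who want to reproduce a comparable setup with the Hugging Face `Trainer`, the sketch below maps the table above onto `TrainingArguments`. The split of the 512-sequence total batch into a per-device batch size and gradient accumulation steps is an assumption, since the original hardware layout is not specified.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="minueza-2-96m-pretraining",  # hypothetical output directory
    learning_rate=3e-4,
    warmup_steps=2000,
    weight_decay=0.1,
    max_grad_norm=2.0,
    per_device_train_batch_size=32,   # assumption: single device
    gradient_accumulation_steps=16,   # 32 * 16 = 512 sequences per optimizer step
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
)
```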
## 📄 License
This model is licensed under the Apache License 2.0.