🚀 LenguajeNatural.AI Chat and Instruction Model 2B (LeNIA-Chat)
Developed by LenguajeNatural.AI, this model provides advanced text generation, chat, and instruction-following capabilities for the Spanish-speaking community.
🚀 Quick Start
You can use this model through the Hugging Face API or integrate it into your applications with the transformers library. Here is an example of how to load the model and generate a response:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and model from the Hugging Face Hub
model_name = "LenguajeNaturalAI/leniachat-gemma-2b-v0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build the chat and render it with the model's chat template;
# add_generation_prompt=True appends the assistant-turn marker so the model answers as the assistant
messages = [
    {"role": "system", "content": "Eres un asistente que ayuda al usuario a lo largo de la conversación resolviendo sus dudas."},
    {"role": "user", "content": "¿Qué fue la revolución industrial?"}
]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Generate and decode the answer
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
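The same chat can also be run through the high-level text-generation pipeline. This is a minimal sketch, not taken from the upstream card: the chat is rendered to a plain prompt with the model's template, and return_full_text=False is used so only the newly generated answer is printed.

from transformers import pipeline

# Build a text-generation pipeline around the model (downloads weights on first use)
model_name = "LenguajeNaturalAI/leniachat-gemma-2b-v0"
pipe = pipeline("text-generation", model=model_name)

messages = [
    {"role": "system", "content": "Eres un asistente que ayuda al usuario a lo largo de la conversación resolviendo sus dudas."},
    {"role": "user", "content": "¿Qué fue la revolución industrial?"}
]
# Render the chat with the model's template, then pass the plain prompt to the pipeline
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
answer = pipe(prompt, max_new_tokens=50, return_full_text=False)[0]["generated_text"]
print(answer)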
✨ Features
- Designed for the Spanish-speaking Community: The model has been trained exclusively in Spanish to maximize its effectiveness for Spanish-speaking users.
- Advanced Training Phases: Trained in three distinct phases, covering multi-task learning in Spanish, high-quality instruction training, and chat and abstract QA training.
- Based on a Well-known Model: Fine-tuned from google/gemma-2b, incorporating advanced features for better text generation and understanding in Spanish chat and instruction tasks.
📦 Installation
The examples in this card require the transformers library and PyTorch, which can be installed with pip install torch transformers.
💻 Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "LenguajeNaturalAI/leniachat-gemma-2b-v0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
messages = [
{"role": "system", "content": "Eres un asistente que ayuda al usuario a lo largo de la conversación resolviendo sus dudas."},
{"role": "user", "content": "¿Qué fue la revolución industrial?"}
]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Advanced Usage
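A minimal sketch of a more controlled generation setup, assuming the same model and chat template as above. It is not taken from the upstream card: the sampling parameters (temperature, top_p, repetition_penalty) are illustrative placeholders to tune for your use case, and transformers' TextStreamer is used to print tokens as they are generated.

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

model_name = "LenguajeNaturalAI/leniachat-gemma-2b-v0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Eres un asistente que ayuda al usuario a lo largo de la conversación resolviendo sus dudas."},
    {"role": "user", "content": "¿Qué fue la revolución industrial?"}
]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Stream the answer to stdout token by token
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,          # sample instead of greedy decoding
        temperature=0.7,         # illustrative values, not recommended settings from the authors
        top_p=0.9,
        repetition_penalty=1.1,
        streamer=streamer,
    )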
📚 Documentation
Model Details
This model has been developed by LenguajeNatural.AI to provide the Spanish-speaking community with advanced tools for text generation, chat, and instruction following. It is the first in a series of models the company plans to release.
Training
The model was trained in three phases:
- Multi-task Learning in Spanish: Using multiple supervised datasets for FLAN-style training.
- High-quality Instruction Training: Fine-tuning the model to understand and generate responses to complex instructions.
- Chat and Abstract QA Training: Optimizing the model for smooth conversations and for generating responses to abstract questions.

Training in all three phases was carried out with the autotransformers library; a rough, illustrative sketch of one such supervised phase is shown below.
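The exact autotransformers configuration is not reproduced in this card. As a rough illustration only, the sketch below shows what a single supervised fine-tuning phase can look like using the plain transformers Trainer; the dataset, text formatting, hyperparameters, and output path are placeholders, not the ones used to train LeNIA-Chat.

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base_model = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Toy supervised examples; placeholder formatting, not the chat template used by LeNIA-Chat
examples = [
    {"text": "Instrucción: ¿Qué fue la revolución industrial?\nRespuesta: Fue un proceso de mecanización de la producción iniciado en el siglo XVIII."},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-sketch", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=dataset,
    # Causal-LM objective: the collator pads batches and sets labels from the input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()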
Evaluation
To ensure the quality of the model, a comprehensive evaluation was conducted on several datasets, showing strong performance in Spanish text generation and instruction understanding. The specific evaluation results for the LeNIA-Chat models are reported in the following table.

Uses and Limitations
This model is designed for Spanish text generation applications, chatbots, and virtual assistants. Although it has been trained to minimize biases and errors, users should evaluate its performance in their specific use context, be aware of the limitations inherent to language models, and use the model responsibly. In particular, since the base model has only 2 billion parameters, this model shares the limitations typical of models of that size.
Future Versions
The developers plan to continue improving this model and launch future versions with expanded capabilities. You can stay updated on their website or their LinkedIn page.
📄 License
This model is distributed under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Language model for text generation, chat, and instruction in Spanish |
| Training Data | Trained in three phases using multiple supervised datasets, with the help of the autotransformers library |
| Base Model | google/gemma-2b |
| Language | Spanish |
| Maximum Sequence Length | 8192 tokens |
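Given the 8192-token context window, it can be useful to check that a rendered prompt fits before calling generate. A small sketch (the prompt string is a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LenguajeNaturalAI/leniachat-gemma-2b-v0")
prompt = "..."  # placeholder: the fully rendered chat prompt
n_tokens = len(tokenizer(prompt)["input_ids"])
assert n_tokens <= 8192, f"Prompt uses {n_tokens} tokens, above the 8192-token context window"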

💡 Usage Tip
Although this model has been trained to minimize biases and errors, it is recommended to evaluate its performance in your specific use context.
⚠️ Important Note
Users should be aware of the inherent limitations of language models and use this model responsibly. Also, since the base model has only 2 billion parameters, this model shares the inherent limitations of models of that size.