Llama-3.1-Tango-70b-bnb_4b: An Open-Source Large Language Model Optimized for Spanish-Speaking Regions


Llama 3.1 Tango 70b Bnb 4b

Developed by sandbox-ai

Tango-70B-Instruct is a large language model designed to enhance language performance in Spanish-speaking regions.

Large Language Model

Transformers

Supports Multiple Languages #Spanish optimization #70B parameters #Customization for Latin American regions

Downloads 1,768

Release date: 12/9/2024

Model Overview

Tango-70B-Instruct is fine-tuned on a specific Spanish dataset to improve its performance on regional varieties of Spanish. It can handle a range of natural language processing tasks.

Model Features

Regional optimization

Fine-tuned on a modified variant of spanish-ir/messirve to improve performance on regional varieties of Spanish.

Multilingual support

Supports English and Spanish.

Multi-task processing

Can handle various natural language processing tasks, such as question answering, sentiment analysis, text classification, and summarization.
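Because the model is instruction-tuned, each of these tasks is normally written as a chat prompt instead of being handled by a task-specific head. The sketch below shows one way to do this. The template wording, TASK_TEMPLATES and the build_messages helper are hypothetical illustrations and are not prompts published by sandbox-ai. The returned list has the same shape as the messages list that the Quick Start example passes to tokenizer.apply_chat_template.

```python
from typing import Dict, List

# Hypothetical instruction templates (illustrative wording, not from the model authors).
TASK_TEMPLATES: Dict[str, str] = {
    "qa": "Answer the following question concisely:\n{text}",
    "sentiment": "Classify the sentiment of this text as positive, negative, or neutral:\n{text}",
    "classification": "Assign a single topic label to this news headline:\n{text}",
    "summarization": "Summarize the following article in two sentences:\n{text}",
}


def build_messages(task: str, text: str) -> List[Dict[str, str]]:
    """Wrap `text` in the instruction for `task` as a single-turn chat message list."""
    if task not in TASK_TEMPLATES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_TEMPLATES)}")
    # Only the template is parsed by str.format; braces inside `text` are kept literally.
    return [{"role": "user", "content": TASK_TEMPLATES[task].format(text=text)}]


# Example: a sentiment request about a Spanish sentence
messages = build_messages("sentiment", "¡Qué partidazo jugó la Selección anoche!")
print(messages[0]["content"])
```

The resulting list can be passed to tokenizer.apply_chat_template in the same way as the Quick Start example does.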

Model Capabilities

Text generation

Question answering system

Sentiment analysis

Text classification

Summary generation

Use Cases

Education

Answering primary school science questions in Catalan

Performs well on the ARC_ca dataset

Political analysis

Analysis of opinions from the 2016 Basque election campaign

Performs well on the BEC2016eu dataset

Narrative reasoning

Completion of Basque narratives

Performs well on the xStoryCloze_eu dataset

🚀 Tango-70B-Instruct

Tango-70B-Instruct is a large language model fine-tuned to improve performance on regional varieties of Spanish.

🚀 Quick Start

Tango-70B-Instruct can be used through the Hugging Face Transformers library. It requires 2 or more 80GB GPUs (NVIDIA Ampere or newer) and at least 150GB of free disk space for the download.
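
The hardware figures above follow from simple arithmetic on the parameter count. The sketch below assumes a nominal 70 × 10^9 parameters. The base model is downloaded as a 16-bit checkpoint, which comes to about 140 GB and matches the ~150 GB disk requirement. It is quantized to 4-bit NF4 only at load time. The figure of ~0.127 extra bits per parameter for double-quantized NF4 constants is the QLoRA paper's estimate and is treated here as an approximation. These numbers cover the weights only. They leave out activations, the KV cache, CUDA overhead, and any modules that bitsandbytes does not quantize.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 70e9  # nominal "70B"; the exact parameter count differs slightly

# 16-bit checkpoint (what is downloaded for the base model)
bf16 = weight_memory_gb(N_PARAMS, 16)
# NF4 with double quantization: 4 bits plus roughly 0.127 bits/param of
# quantization constants (QLoRA paper estimate, block size 64)
nf4 = weight_memory_gb(N_PARAMS, 4 + 0.127)

print(f"16-bit weights: ~{bf16:.0f} GB")
print(f"NF4 weights:    ~{nf4:.1f} GB")
```

The sketch prints about 140 GB for the 16-bit weights and about 36.1 GB for the 4-bit weights.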

This code has been tested with Transformers v4.44.0, torch v2.4.0, and 2 A100 80GB GPUs. Any setup that supports meta-llama/Llama-3.1-70B-Instruct should also support this model. The example also needs the peft, bitsandbytes, and accelerate packages. If you run into problems, try upgrading with pip install -U transformers.
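
Before starting a ~150 GB download, it helps to check that every package the example imports is installed. This is a small stdlib-only sketch. The REQUIRED tuple and the missing_packages helper are illustrative names. The list reflects that transformers needs accelerate for device_map="auto" and bitsandbytes for 4-bit loading.

```python
import importlib.util
from typing import Iterable, List

# Import names used by the Quick Start example (illustrative list).
REQUIRED = ("torch", "transformers", "peft", "bitsandbytes", "accelerate")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install before running the example: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

The helper only checks that each package is present. To compare against the tested versions (Transformers v4.44.0, torch v2.4.0), importlib.metadata.version("transformers") and similar calls report the installed versions.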

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
adapter_model_id = "sandbox-ai/Tango-70b"

# Create quantization config for 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load tokenizer from base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the base model with 4-bit quantization
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",  # This will automatically handle model sharding
    trust_remote_code=True
)

# Load the PEFT adapter
model = PeftModel.from_pretrained(
    base_model,
    adapter_model_id,
    device_map="auto",  # The base model is already sharded; adapter weights are placed alongside its layers
)

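# Test prompt in Rioplatense (Argentine) Spanish. English translation:
# "Welcome. Your name is 'Tango', you are the first AI made in Latin America, based on
# a 70-billion-parameter Large Language Model and created in Argentina. What is the
# importance of building native AI in Latin America? What benefits come from having
# created you, compared to depending on AIs created in the USA, France, or China?"
hola_mundo = """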
Bienvenido. 
Tu nombre es "Tango", sos la primer IA hecha en LatinoAmérica, basada en un Large Language Model de 70 billones de parámetros y creada en Argentina. 

Cuál es la importancia de hacer IA nativa en LatinoAmérica? qué beneficios trae haberte creado, en comparación a depender de las IAs creadas en USA, Francia o China?

"""

# Test prompt
messages = [
    {"role": "user", "content": hola_mundo}
]

# Format the input using the chat template
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response with memory-efficient settings
with torch.inference_mode():
    outputs = model.generate(
        inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # Set padding token
        attention_mask=torch.ones_like(inputs)  # Add attention mask
    )

# Decode only the newly generated tokens (outputs[0] also contains the prompt) and print them
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
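
The generate call samples tokens with do_sample=True, temperature=0.7 and top_p=0.95. The sketch below is a simplified single-step illustration of that technique: temperature scaling followed by nucleus (top-p) filtering over one vector of next-token logits. It is not the transformers code path. That code works on batched tensors, may apply other logits processors, and may handle the top-p boundary slightly differently. The toy logits and all helper names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_candidates(logits: Sequence[float], temperature: float, top_p: float) -> List[int]:
    """Token ids kept after temperature scaling and top-p filtering, most likely first."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    probs = softmax([x / temperature for x in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)  # the most likely token is always kept
        cumulative += probs[i]
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    return kept


def sample_token(logits: Sequence[float], temperature: float = 0.7, top_p: float = 0.95,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the nucleus, weighted by the temperature-scaled probabilities."""
    rng = rng or random.Random()
    probs = softmax([x / temperature for x in logits])
    kept = nucleus_candidates(logits, temperature, top_p)
    # random.choices treats weights as relative, so no explicit renormalization is needed
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -5.0]  # made-up scores for a 4-token vocabulary
print(nucleus_candidates(toy_logits, temperature=1.0, top_p=0.85))  # [0, 1]
print(nucleus_candidates(toy_logits, temperature=0.5, top_p=0.85))  # [0]
```

A lower temperature sharpens the distribution, so fewer tokens are needed to reach top_p. This is why the second call keeps only the most likely token.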

✨ Features

Regional Spanish enhancement: Tango-70B-Instruct is fine-tuned on a modified variant of spanish-ir/messirve to improve performance on regional varieties of Spanish.
Multilingual support: supports both English and Spanish.

📦 Model Information

Property	Details
Model Type	Tango-70B-Instruct
License	llama3.1
Supported Languages	English, Spanish
Hosted inference	false
Fine-tuned	true
Tags	nvidia, llama3.1, spanish, tango
Datasets	spanish-ir/messirve
Base Model	nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Pipeline Tag	text-generation
Library Name	transformers

📚 Documentation

Model Overview

Tango-70B-Instruct is a large language model trained by sandbox-ai on a modified variant of spanish-ir/messirve to improve performance on regional varieties of Spanish.

See the GitHub repository for details.

Terms of use

By accessing this model, you agree to the Llama 3.1 license terms and conditions, the acceptable use policy, and Meta's privacy policy.

Evaluation Metrics

Task	Description	Language	Metric	Task type
AQuAS	Abstractive Question-Answering in Spanish	ES	sas_encoder	Abstractive QA
ARC_ca	Grade-school level science questions in Catalan	CA	acc	Multiple-choice QA
BEC2016eu	Basque Election Campaign 2016 Opinion Dataset	EU	f1	Sentiment Analysis
Belebele Glg	Reading Comprehension in Galician	GL	acc	Reading Comprehension
BertaQA	Trivia dataset with global and local questions about the Basque Country	EU	acc	Multiple-choice QA
BHTCv2	Topic Classification of News Headlines in Basque	EU	f1	Classification, Topic Classification
caBREU	Article Summarization in Catalan	CA	bleu	Summarization
CatalanQA	Extractive QA in Catalan	CA	f1	Extractive QA
CatCoLA	Linguistic Acceptability in Catalan	CA	mcc	Linguistic Acceptability
ClinDiagnosES	Diagnosis of clinical cases in Spanish	ES	sas_encoder	Open QA
ClinTreatES	Treatment for clinical cases in Spanish	ES	sas_encoder	Open QA
COPA_ca	Choice Of Plausible Alternatives in Catalan	CA	acc	Reasoning
CoQCat	Conversational Question Answering in Catalan	CA	f1	Extractive QA
Crows Pairs Spanish	Bias evaluation using stereotypes	ES	pct_stereotype	Bias Detection
EpecKorrefBin	Coreference resolution in Basque	EU	acc	Coreference Resolution, Textual Entailment
EsCoLA	Spanish Corpus of Linguistic Acceptability	ES	mcc	Linguistic Acceptability
EusExams	Public Service examination questions in Basque	EU	acc	Multiple-choice QA
EusProficiency	C1-level proficiency questions in Basque	EU	acc	Multiple-choice QA
EusReading	EGA exams reading comprehension in Basque	EU	acc	Multiple-choice QA
EusTrivia	Trivia questions in Basque	EU	acc	Multiple-choice QA
Fake News ES	Fake News Detection in Spanish	ES	acc	Classification
GalCoLA	Galician Corpus of Linguistic Acceptability	GL	mcc	Linguistic Acceptability
HumorQA	White-humour joke classification	ES	acc	Classification
MGSM_ca	Grade-school math problems in Catalan	CA	exact_match	Math Reasoning
MGSM_es	Grade-school math problems in Spanish	ES	exact_match	Math Reasoning
MGSM_eu	Grade-school math problems in Basque	EU	exact_match	Math Reasoning
MGSM_gl	Grade-school math problems in Galician	GL	exact_match	Math Reasoning
NoticIA	A Clickbait Article Summarization Dataset in Spanish	ES	rouge1	Summarization
OffendES	Classification of offensive comments in Spanish	ES	acc	Classification
OpenBookQA_ca	Multi-step reasoning QA in Catalan	CA	acc	Reasoning
OpenBookQA_gl	Multi-step reasoning QA in Galician	GL	acc	Reasoning
Parafraseja	Paraphrase identification in Catalan	CA	acc	Paraphrasing
ParafrasesGL	Paraphrase identification in Galician	GL	acc	Paraphrasing
PAWS_ca	Paraphrase Adversaries from Word Scrambling in Catalan	CA	acc	Paraphrasing
PAWS-X_es	Paraphrase Adversaries from Word Scrambling in Spanish	ES	acc	Paraphrasing
PAWS_gl	Paraphrase Adversaries from Word Scrambling in Galician	GL	acc	Paraphrasing
PIQA_ca	Physical Interaction QA in Catalan	CA	acc	Reasoning
QNLIeu	Textual Entailment in Basque	EU	acc	NLI, Textual Entailment
RagQuAS	Retrieval-Augmented Generation and Question-Answering in Spanish	ES	sas_encoder	Abstractive QA
SIQA_ca	Social Interaction QA in Catalan	CA	acc	Reasoning
SpaLawEx	Spanish Law School Access Exams	ES	acc	Multiple-choice QA
SummarizationGL	Abstractive Summarization in Galician	GL	bleu	Summarization
TE-ca	Textual Entailment in Catalan	CA	acc	Textual Entailment
TELEIA	Test de Español como Lengua Extranjera para Inteligencia Artificial (Spanish as a Foreign Language test for AI)	ES	acc	Multiple-choice QA
VaxxStance	Stance detection on the Antivaxxers movement	EU	f1	Sentiment Analysis, Stance Detection
WiCeu	Word sense disambiguation in Basque	EU	acc	Textual Entailment
WNLI_ca	Winograd-schema-type dataset in Catalan	CA	acc	NLI, Textual Entailment
WNLI ES	Winograd-schema-type dataset in Spanish	ES	acc	NLI, Textual Entailment
XCOPA_eu	Choice Of Plausible Alternatives in Basque	EU	acc	Reasoning
XNLI_ca	Cross-lingual Natural Language Inference in Catalan	CA	acc	NLI, Textual Entailment
XNLI_es	Cross-lingual Natural Language Inference in Spanish	ES	acc	NLI
XNLI_eu	Cross-lingual Natural Language Inference in Basque	EU	acc	NLI, Textual Entailment
XQuAD_ca	Cross-lingual Question Answering Dataset in Catalan	CA	f1	Extractive QA
XQuAD_es	Cross-lingual Question Answering Dataset in Spanish	ES	f1	Extractive QA
xStoryCloze_ca	Narrative completion in Catalan	CA	acc	Reasoning
xStoryCloze_es	Narrative completion in Spanish	ES	acc	Reasoning
xStoryCloze_eu	Narrative completion in Basque	EU	acc	Reasoning
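
The Metric column names standard scores but does not define them. The sketch below implements three of them (acc, f1 and mcc) for binary 0/1 labels in pure Python. The page does not say how each benchmark averages F1 across classes (binary, macro or micro), so using binary positive-class F1 here is an assumption. The sketch does not cover sas_encoder, bleu, rouge1, exact_match or pct_stereotype, and the gold/pred data is made up.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """acc: fraction of predictions that equal the gold label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """f1 for the positive class, 2TP / (2TP + FP + FN); 0.0 when undefined."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
print(accuracy(gold, pred), f1_binary(gold, pred), mcc(gold, pred))
```

In this example, tp = 2, tn = 2, fp = 1 and fn = 1. That gives accuracy = F1 = 2/3 and MCC = (2·2 − 1·1)/√(3·3·3·3) = 1/3. MCC is the usual choice for the CoLA-style acceptability tasks because it remains informative when the classes are imbalanced.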

📄 License

This model is released under the Llama 3.1 license. By accessing this model, you agree to the Llama 3.1 license terms and conditions, the acceptable use policy, and Meta's privacy policy.
