Nano Mistral
🚀 Model Card for crumb/nano-mistral
This is a model card for a 🤗 Transformers model pushed to the Hub. It provides details about the model, including its uses, training, and evaluation.
🚀 Quick Start
Use the code below to get started with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")

inputs = tokenizer(["Once upon a time,"], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Unpack the tokenized inputs as keyword arguments; passing the dict
# positionally to `generate` is a bug.
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_k=20, do_sample=True)

for text in tokenizer.batch_decode(outputs):
    print(text)
```
✨ Features
- General Web Text Completions: generates general web-text continuations with extremely low resource use.
- Mistral Architecture: a compact model built on the Mistral architecture.
📦 Installation
Install the 🤗 Transformers library and PyTorch: `pip install transformers torch`
📚 Documentation
Model Details
Model Description
This is the model card of a 🤗 Transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: me
- Model type: Mistral
- Language(s) (NLP): en
- License: apache
Uses
- General Use: general web-text completion at extremely low resource use.
- Out-of-Scope Use: this is not an instruction-tuned model; it is not suited to chat or instruction following.
Bias, Risks, and Limitations
The model is trained on web text; although the data was filtered, there is no guarantee that it is free of toxic content.
Training Details
Training Data
Training Procedure
Parameter | Value |
---|---|
Context Length | 2048 |
Batch Size | 128 |
Learning Rate | 6e-4 |
Scheduler | One-Cycle |
Adam eps | 1e-8 |
Adam beta1 | 0.9 |
Adam beta2 | 0.95 |
Weight Decay | 0.1 |
Max Grad Norm | 1.0 |
Optimizer | adamw_torch |
Tokens | 3,401,640,960 |
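For context, the batch and context settings above fix the number of tokens consumed per optimizer step, which gives an approximate total step count (simple arithmetic from the table, not stated in the card):

```python
# Tokens processed per optimizer step, from the hyperparameter table above
context_length = 2048
batch_size = 128
tokens_per_step = batch_size * context_length  # 262,144

total_tokens = 3_401_640_960
steps = total_tokens / tokens_per_step
print(f"{tokens_per_step=}, steps≈{steps:,.0f}")  # roughly 12,976 steps
```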
Training Hyperparameters
- Training regime: bf16 non-mixed precision
Evaluation
Testing Data, Factors & Metrics
Testing Data
A held-out set of crumb/askmistral-pile-2-15.
Metrics
Open LLM Leaderboard evaluation datasets and settings.
Results
Open LLM Leaderboard mean score ± stderr: 29.30 ± 0.42
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 25 | acc | 0.1843 | ± | 0.0113 |
arc_challenge | 1 | none | 25 | acc_norm | 0.2167 | ± | 0.0120 |
truthfulqa_mc2 | 2 | none | 0 | acc | 0.4719 | ± | 0.0156 |
winogrande | 1 | none | 5 | acc | 0.517 | ± | 0.014 |
hellaswag | 1 | none | 10 | acc | 0.2803 | ± | 0.0045 |
hellaswag | 1 | none | 10 | acc_norm | 0.2886 | ± | 0.0045 |
gsm8k | 3 | strict-match | 5 | exact_match | 0.0008 | ± | 0.0008 |
gsm8k | 3 | flexible-extract | 5 | exact_match | 0.0099 | ± | 0.0027 |
MMLU
Mean accuracy ± stderr: 0.2540 ± 0.0044
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
world_religions | 0 | none | 5 | acc | 0.2222 | ± | 0.0319 |
virology | 0 | none | 5 | acc | 0.2711 | ± | 0.0346 |
us_foreign_policy | 0 | none | 5 | acc | 0.3300 | ± | 0.0473 |
sociology | 0 | none | 5 | acc | 0.2388 | ± | 0.0301 |
security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
public_relations | 0 | none | 5 | acc | 0.2273 | ± | 0.0401 |
professional_psychology | 0 | none | 5 | acc | 0.2484 | ± | 0.0175 |
professional_medicine | 0 | none | 5 | acc | 0.4596 | ± | 0.0303 |
professional_law | 0 | none | 5 | acc | 0.2464 | ± | 0.0110 |
professional_accounting | 0 | none | 5 | acc | 0.2021 | ± | 0.0240 |
prehistory | 0 | none | 5 | acc | 0.2130 | ± | 0.0228 |
philosophy | 0 | none | 5 | acc | 0.2219 | ± | 0.0236 |
nutrition | 0 | none | 5 | acc | 0.2157 | ± | 0.0236 |
moral_scenarios | 0 | none | 5 | acc | 0.2380 | ± | 0.0142 |
moral_disputes | 0 | none | 5 | acc | 0.2486 | ± | 0.0233 |
miscellaneous | 0 | none | 5 | acc | 0.2516 | ± | 0.0155 |
medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
marketing | 0 | none | 5 | acc | 0.2265 | ± | 0.0274 |
management | 0 | none | 5 | acc | 0.1748 | ± | 0.0376 |
machine_learning | 0 | none | 5 | acc | 0.3125 | ± | 0.0440 |
logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
jurisprudence | 0 | none | 5 | acc | 0.2315 | ± | 0.0408 |
international_law | 0 | none | 5 | acc | 0.3140 | ± | 0.0424 |
human_sexuality | 0 | none | 5 | acc | 0.2519 | ± | 0.0381 |
human_aging | 0 | none | 5 | acc | 0.3049 | ± | 0.0309 |
high_school_world_history | 0 | none | 5 | acc | 0.2658 | ± | 0.0288 |
high_school_us_history | 0 | none | 5 | acc | 0.2451 | ± | 0.0302 |
high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
high_school_psychology | 0 | none | 5 | acc | 0.1963 | ± | 0.0170 |
high_school_physics | 0 | none | 5 | acc | 0.3046 | ± | 0.0376 |
high_school_microeconomics | 0 | none | 5 | acc | 0.2773 | ± | 0.0291 |
high_school_mathematics | 0 | none | 5 | acc | 0.2667 | ± | 0.0270 |
high_school_macroeconomics | 0 | none | 5 | acc | 0.2667 | ± | 0.0224 |
high_school_government_and_politics | 0 | none | 5 | acc | 0.2591 | ± | 0.0316 |
high_school_geography | 0 | none | 5 | acc | 0.2424 | ± | 0.0305 |
high_school_european_history | 0 | none | 5 | acc | 0.2242 | ± | 0.0326 |
high_school_computer_science | 0 | none | 5 | acc | 0.2800 | ± | 0.0451 |
high_school_chemistry | 0 | none | 5 | acc | 0.2857 | ± | 0.0318 |
high_school_biology | 0 | none | 5 | acc | 0.3129 | ± | 0.0264 |
global_facts | 0 | none | 5 | acc | 0.1500 | ± | 0.0359 |
formal_logic | 0 | none | 5 | acc | 0.1905 | ± | 0.0351 |
elementary_mathematics | 0 | none | 5 | acc | 0.2513 | ± | 0.0223 |
electrical_engineering | 0 | none | 5 | acc | 0.2759 | ± | 0.0372 |
econometrics | 0 | none | 5 | acc | 0.2456 | ± | 0.0405 |
conceptual_physics | 0 | none | 5 | acc | 0.2638 | ± | 0.0288 |
computer_security | 0 | none | 5 | acc | 0.1800 | ± | 0.0386 |
college_physics | 0 | none | 5 | acc | 0.2549 | ± | 0.0434 |
college_medicine | 0 | none | 5 | acc | 0.2023 | ± | 0.0306 |
college_mathematics | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
college_computer_science | 0 | none | 5 | acc | 0.2700 | ± | 0.0446 |
college_chemistry | 0 | none | 5 | acc | 0.2500 | ± | 0.0435 |
college_biology | 0 | none | 5 | acc | 0.2222 | ± | 0.0348 |
clinical_knowledge | 0 | none | 5 | acc | 0.2377 | ± | 0.0262 |
business_ethics | 0 | none | 5 | acc | 0.2100 | ± | 0.0409 |
astronomy | 0 | none | 5 | acc | 0.1776 | ± | 0.0311 |
anatomy | 0 | none | 5 | acc | 0.2593 | ± | 0.0379 |
abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
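The reported leaderboard mean can be roughly cross-checked from the per-task values above. This is a sketch under an assumption: that the mean averages arc_challenge (acc_norm), hellaswag (acc_norm), the MMLU mean, truthfulqa_mc2, winogrande, and gsm8k (flexible-extract), each expressed as a percentage.

```python
# Hypothetical cross-check of the reported mean (29.30); which metric
# variant enters the average is an assumption, not stated in the card.
scores = {
    "arc_challenge (acc_norm)": 21.67,
    "hellaswag (acc_norm)": 28.86,
    "mmlu": 25.40,
    "truthfulqa_mc2": 47.19,
    "winogrande": 51.70,
    "gsm8k (flexible-extract)": 0.99,
}
mean = sum(scores.values()) / len(scores)
print(f"{mean:.2f}")  # 29.30, matching the reported mean
```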
🔧 Technical Details
Model Architecture and Objective
Mistral architecture, trained with a causal language modeling objective.
Compute Infrastructure
Hardware
Lambda Vector workstation with 2× NVIDIA A6000 GPUs.
Software
Hugging Face Transformers, PyTorch, and a custom trainer.
🌱 Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: A6000
- Hours used: 34.74
- Cloud Provider: n/a
- Compute Region: Iowa
- Carbon Emitted: 4.5 kg CO₂eq
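The emissions figure is consistent with the standard energy × grid-intensity estimate used by the calculator. A rough sketch, assuming one A6000 drawing its ~300 W TDP and a grid intensity of about 0.432 kg CO₂eq/kWh for the Iowa region (both values are assumptions; the actual calculator inputs are not documented here):

```python
hours = 34.74            # reported GPU hours
power_kw = 0.300         # assumed: A6000 board power (300 W TDP)
grid_kg_per_kwh = 0.432  # assumed: grid carbon intensity, Iowa region

energy_kwh = hours * power_kw
emissions_kg = energy_kwh * grid_kg_per_kwh
print(f"{emissions_kg:.1f} kg CO2eq")  # ≈ 4.5, matching the reported estimate
```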
📄 License
Apache

