🚀 Loquace-7B-Mistral v0.1
Loquace is an Italian-speaking, instruction-finetuned Large Language Model. It aims to democratize AI and LLMs in the Italian landscape by showing that users can fine-tune a model on their own datasets with minimal resources.
✨ Features
- Italian Instruction Following: It excels at following instructions in Italian.
- Prompt Engineering Responsiveness: Responds well to prompt-engineering techniques.
- RAG Compatibility: Performs effectively in a RAG (Retrieval Augmented Generation) setup.
- Cost-Effective Training: Trained on the Loquace-102K dataset using QLoRA with Mistral-7B-Instruct as the base model. Training took only 4 hours on a single 3090 GPU on Genesis Cloud, costing a little over 1 euro.
- Truly Open Source: The model, dataset, and code to replicate the results are fully open-sourced.
- Garage Creation: Developed in a garage in southern Italy.
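The low training cost above is plausible given QLoRA's memory footprint. A back-of-the-envelope estimate (all numbers below are illustrative assumptions, not the actual Loquace training configuration):

```python
# Rough VRAM estimate for QLoRA fine-tuning of a 7B-parameter model.
# Every number here is an illustrative assumption, not the Loquace config.

params = 7e9                 # base model parameters
bytes_4bit = 0.5             # NF4 quantization: ~4 bits per weight
base_weights_gb = params * bytes_4bit / 1e9

# LoRA adapters: assume low-rank updates totalling ~0.1% of the base params.
lora_params = params * 0.001
# Adapter weights + gradients + Adam optimizer states (two fp32 moments):
adapter_gb = lora_params * (2 + 2 + 8) / 1e9

total_gb = base_weights_gb + adapter_gb
print(f"base weights: {base_weights_gb:.1f} GB, adapters+optimizer: {adapter_gb:.2f} GB")
print(f"~{total_gb:.1f} GB before activations -- comfortably inside a 24 GB 3090")
```

The frozen base model dominates, which is why 4-bit quantization is what makes single-consumer-GPU fine-tuning feasible; activations and batch size add on top of this, but still fit in 24 GB.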
📦 Installation
The related code for fine-tuning can be found at:
https://github.com/cosimoiaia/Loquace
The 8-bit quantized GGUF version for CPU inference of Loquace can be found here.
Here is a list of clients and libraries known to support GGUF:
- llama.cpp: The source project for GGUF, offering a CLI and a server option.
- text-generation-webui: A widely used web UI with numerous features and powerful extensions, supporting GPU acceleration.
- KoboldCpp: A fully-featured web UI with GPU acceleration across all platforms and GPU architectures, especially suitable for storytelling.
- LM Studio: An easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
- LoLLMS Web UI: A great web UI with many interesting and unique features, including a full model library for easy model selection.
- Faraday.dev: An attractive and user-friendly character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
- ctransformers: A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- llama-cpp-python: A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- candle: A Rust ML framework with a focus on performance (including GPU support) and ease of use.
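Several of the clients above (llama-cpp-python, ctransformers) can expose an OpenAI-compatible server for the GGUF model. A minimal sketch of building the request body such a server expects; the endpoint path, port, and sampling parameters are assumptions, not part of this model card:

```python
import json

def build_completion_request(instruction: str) -> str:
    """Build a JSON body for an OpenAI-compatible /v1/completions endpoint
    (e.g. the server shipped with llama-cpp-python; port/path are assumptions)."""
    # Loquace uses an Alpaca-style "### Instruction / ### Response" prompt.
    prompt = f"### Instruction: {instruction}\n### Response:\n"
    body = {
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.7,
        "stop": ["### Instruction:"],  # stop before the model starts a new turn
    }
    return json.dumps(body)

# POST the returned string with e.g. urllib.request to
# http://localhost:8000/v1/completions (address is an assumption).
```

Matching the fine-tuning prompt template exactly matters: GGUF runtimes do not apply it for you, so the client is responsible for wrapping the user's instruction.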
Previous releases of the Loquace family:
The Loquace family began in early 2023 to demonstrate the feasibility of fine-tuning a Large Language Model in a different language. You can find other family members here:
- https://huggingface.co/cosimoiaia/Loquace-70m - Based on pythia-70m
- https://huggingface.co/cosimoiaia/Loquace-410m - Based on pythia-410m
- https://huggingface.co/cosimoiaia/Loquace-7B - Based on Falcon-7B
- https://huggingface.co/cosimoiaia/Loquace-12B - Based on pythia-12B
- https://huggingface.co/cosimoiaia/Loquace-20B - Based on gpt-neox-20B
Although these earlier models do not achieve strong results, they demonstrate that cost-effective fine-tuning is possible in a short time, with a relatively raw dataset and extremely limited resources.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoTokenizer, MistralForCausalLM


def generate_prompt(instruction):
    prompt = f"""### Instruction: {instruction}
### Response:
"""
    return prompt


model_name = "cosimoiaia/Loquace-7B-Mistral"

model = MistralForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=False)

prompt = generate_prompt("Chi era Dante Alighieri?")
# Use the device the model was placed on by device_map="auto".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    num_beams=2,
    top_k=50,
    top_p=0.95,
    max_new_tokens=2046,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("Response:")[1].strip())
```
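The `split("Response:")` in the last line assumes the marker appears in the decoded text. A slightly more defensive helper (hypothetical, not part of the original script) falls back to the full string when the marker is missing and takes the text after the *last* occurrence:

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Return the text after the last response marker, or the whole string
    stripped if the marker is absent (e.g. an unexpectedly truncated output)."""
    _, sep, tail = decoded.rpartition(marker)
    return tail.strip() if sep else decoded.strip()
```

For example, `extract_response("### Instruction: Chi era Dante?\n### Response:\nUn poeta.")` returns `"Un poeta."`.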
Advanced Usage
> Pretendi di essere Dante Alighieri e raccontami la tua vita.
Sono Dante Alighieri, un poeta e scrittore italiano del XIV secolo. Nacqui a Firenze nel 1265 e ho trascorso gran parte della mia vita viaggiando in Europa e scrivendo poesie. Ho anche scritto la Divina Commedia, una delle opere letterarie più famose di tutti i tempi.
> Quanto fa 7 x 14 - 98 ?
7 x 14 = 98, quindi la risposta è 0.
> La radice quadrata di 144 ?
La radice quadrata di 144 è 12.
> Inventa una ricetta pugliese con la polenta.
La ricetta è chiamata "Polenta al burro di arachidi e formaggio". Per prepararla, inizia bollendo la polenta secondo le istruzioni sulla confezione. Mentre la polenta si cucina, mescola insieme 1/2 tazza di burro di arachidi, 1/4 di tazza di formaggio grattugiato e 1/4 di tazza di parmigiano grattugiato. Una volta che la polenta è cotta, versala su una teglia da forno e distribuire il composto di burro di arachidi e formaggio sopra la polenta. Metti in forno a 350 gradi per 15 minuti o fino a quando la parte superiore è dorata. Servi caldo con un'insalata di pomodoro e basilico fresco.
📚 Documentation
Limitations
- Loquace-7B may not handle complex or nuanced queries well and may struggle with ambiguous or poorly formatted inputs.
- The model may generate factually incorrect or nonsensical responses. It should be used with caution, and outputs should be carefully verified.
📄 License
This project is licensed under the Apache-2.0 license.
Model Author
Cosimo Iaia cosimo.iaia@gmail.com
Model Card for Loquace-7B-Mistral (Italian Version translated from Loquace)