🚀 Sentence Transformers for RAG and More
This README describes testing various models with AnythingLLM (ALLM) using LM Studio as a server, and offers guidance on using embedders for Retrieval-Augmented Generation (RAG). It also includes tips on document preparation, system prompts, and more.
🚀 Quick Start
All models have been tested with ALLM using LM Studio as the server. They should also work with Ollama; the setup for local documents is almost the same. GPT4All offers only one embedder (nomic), and koboldcpp support is still under development.
⚠️ Important Note
Sometimes, the results are more accurate when the "chat with document only" option is used. Also keep in mind that the embedder is just one part of a good RAG system.
✨ Features
Model Impressions
Models such as nomic-embed-text (up to 2048t context length), mxbai-embed-large, mug-b-1.6, snowflake-arctic-embed-l-v2.0 (up to 8192t context length), Ger-RAG-BGE-M3 (German, up to 8192t context length), german-roberta, and bge-m3 (up to 8192t context length) work well. Other models' performance may vary.
Similarity of Embedders
With the same settings, these embedders find 6-7 of the same 10 snippets from a book, meaning only 3-4 snippets differ between them.
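This overlap can be measured mechanically: given the top-10 snippet IDs returned by two embedders for the same query, count the shared IDs. A minimal sketch (the snippet IDs below are made-up placeholders, not real retrieval results):

```python
def snippet_overlap(top_a, top_b):
    """Count how many retrieved snippet IDs two embedders share."""
    return len(set(top_a) & set(top_b))

# Hypothetical top-10 results from two different embedders for one query.
embedder_a = [3, 7, 12, 18, 25, 31, 44, 52, 60, 71]
embedder_b = [3, 7, 12, 18, 25, 31, 44, 90, 91, 92]

shared = snippet_overlap(embedder_a, embedder_b)
print(f"{shared} of 10 snippets are identical")  # 7 of 10, in the observed 6-7 range
```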
💻 Usage Examples
Using with Large Context
Set the main model's context length (Max Tokens) to 16000t, set the embedder model's (Max Embedding Chunk Length) to 1024t, and set (Max Context Snippets) to 14. In ALLM, also set (Text Splitting & Chunking Preferences - Text Chunk Size) to 1024-character parts and (Search Preference) to "accuracy".
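These numbers have to fit together: 14 snippets of up to 1024 tokens each already consume most of a 16000-token context window. A quick sanity check (token counts are rough; real tokenizers vary, and the answer also shares the window):

```python
MAX_CONTEXT = 16000   # main model context length (Max Tokens)
CHUNK_TOKENS = 1024   # Max Embedding Chunk Length
MAX_SNIPPETS = 14     # Max Context Snippets

snippet_budget = CHUNK_TOKENS * MAX_SNIPPETS  # worst-case tokens used by snippets
remaining = MAX_CONTEXT - snippet_budget      # left for system prompt, question, answer

print(f"snippets: up to {snippet_budget}t, remaining: {remaining}t")
# snippets: up to 14336t, remaining: 1664t
```

If the remaining budget is too small for your system prompt and expected answer, reduce either the snippet count or the chunk length.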
Understanding Embedding and Search
When you ask a question about a document, the system searches for keywords or semantically similar terms. If it finds relevant terms, it cuts out a 1024-token text snippet around them and passes it to the model for the answer.
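The retrieval step can be sketched with toy bag-of-words vectors standing in for a real embedder (a production system would use one of the models listed below; the example texts here are invented):

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedder: bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "The treaty was signed in 1648 after long negotiations.",
    "Cooking pasta requires salted boiling water.",
    "Negotiations over the treaty lasted several years.",
]
query = "When was the treaty signed?"

q = embed(query)
ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
print(ranked[0])  # the chunk sharing the most query terms ranks first
```

A real embedder replaces the word counts with dense vectors, so "treaty" would also match semantically related terms like "agreement".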
💡 Usage Tip
- If you expect multiple matches in your docs, try 16 or more snippets; if you expect only 2, don't request more.
- A chunk length of ~1024t gives more context per snippet, while ~256t gives more isolated facts but takes longer because there are more chunks to search.
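The chunk-size trade-off above can be seen directly: the same document yields far more chunks at ~256 characters than at ~1024, which means more embeddings to compute and search. A naive character-based splitter (ALLM's splitter handles boundaries more carefully than this sketch):

```python
def chunk_text(text, size):
    """Naive fixed-size chunking; real splitters respect sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

document = "x" * 10240  # stand-in for a 10,240-character document

print(len(chunk_text(document, 1024)))  # 10 chunks
print(len(chunk_text(document, 256)))   # 40 chunks
```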
📚 Documentation
Main Model Importance
The main model is crucial, especially for handling long context. Some models degrade even with relatively small inputs, while well-developed ones hold up.
System Prompts
System prompts can significantly influence the output. Here are some examples:
- "You are a helpful assistant who provides an overview of ... under the aspects of ... . You use attached excerpts from the collection to generate your answers! Weight each individual excerpt in order, with the most important excerpts at the top and the less important ones further down. The context of the entire article should not be given too much weight. Answer the user's question! After your answer, briefly explain why you included excerpts (1 to X) in your response and justify briefly if you considered some of them unimportant!"
- "You are an imaginative storyteller who crafts compelling narratives with depth, creativity, and coherence. Your goal is to develop rich, engaging stories that captivate readers, staying true to the themes, tone, and style appropriate for the given prompt. You use attached excerpts from the collection to generate your answers! When generating stories, ensure the coherence in characters, setting, and plot progression. Be creative and introduce imaginative twists and unique perspectives."
- "You are a warm and engaging companion who loves to talk about cooking, recipes, and the joy of food. Your aim is to share delicious recipes, cooking tips, and the stories behind different food cultures in a personal, welcoming, and knowledgeable way."
Document Preparation
Prepare your DOC/PDF documents carefully: bad input leads to bad output. Python-based PDF parsers such as pdfplumber, fitz/PyMuPDF, and Camelot work well for simple text and table conversion.
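A minimal pdfplumber sketch for turning a PDF into plain text before indexing (`report.pdf` and `report.txt` are placeholder names; install with `pip install pdfplumber` first). The whitespace clean-up step is an extra suggestion here, not part of pdfplumber itself:

```python
import re

def clean(text):
    """Collapse the stray line breaks and double spaces PDF extraction often leaves."""
    return re.sub(r"\s+", " ", text).strip()

def pdf_to_txt(pdf_path, txt_path):
    """Extract each page's text and write one cleaned paragraph per page."""
    import pdfplumber  # third-party: pip install pdfplumber
    with pdfplumber.open(pdf_path) as pdf:
        pages = [page.extract_text() or "" for page in pdf.pages]
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write("\n\n".join(clean(p) for p in pages))

# pdf_to_txt("report.pdf", "report.txt")  # placeholder filenames

print(clean("Bad   input\nleads  to\nbad output"))  # Bad input leads to bad output
```

For tables, `page.extract_tables()` in pdfplumber returns row lists that you can flatten into text yourself.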
Indexing Option
For fast search across thousands of PDFs, you can use JabRef (https://github.com/JabRef/jabref/tree/v6.0-alpha?tab=readme-ov-file) or DocFetcher (https://docfetcher.sourceforge.io/en/index.html).
📄 License
All licenses and terms of use remain with the models' original authors.
List of Models
- avemio/German-RAG-BGE-M3-MERGED-x-SNOWFLAKE-ARCTIC-HESSIAN-AI (German, English)
- maidalun1020/bce-embedding-base_v1 (English and Chinese)
- maidalun1020/bce-reranker-base_v1 (English, Chinese, Japanese, and Korean)
- BAAI/bge-reranker-v2-m3 (English and Chinese)
- BAAI/bge-reranker-v2-gemma (English and Chinese)
- BAAI/bge-m3 (English and Chinese)
- avsolatorio/GIST-large-Embedding-v0 (English)
- ibm-granite/granite-embedding-278m-multilingual (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese)
- ibm-granite/granite-embedding-125m-english
- Labib11/MUG-B-1.6 (?)
- mixedbread-ai/mxbai-embed-large-v1 (multi)
- nomic-ai/nomic-embed-text-v1.5 (English, multi)
- Snowflake/snowflake-arctic-embed-l-v2.0 (English, multi)
- intfloat/multilingual-e5-large-instruct (100 languages)
- T-Systems-onsite/german-roberta-sentence-transformer-v2
- mixedbread-ai/mxbai-embed-2d-large-v1
- jinaai/jina-embeddings-v2-base-en