
🖥️ Local RAG App with Hybrid Search

A document Q&A application that pairs Retrieval-Augmented Generation (RAG) using hybrid search with local LLMs. Built with RAGLite for robust document processing and retrieval, and Streamlit for an intuitive chat interface, the system combines document-specific knowledge with the local LLM's capabilities to deliver accurate, contextual responses.

Demo:

https://github.com/user-attachments/assets/375da089-1ab9-4bf4-b6f3-733f44e47403

Quick Start

For immediate testing, use this verified model pairing:

# LLM Model
bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf@4096

# Embedder Model
lm-kit/bge-m3-gguf/bge-m3-Q4_K_M.gguf@1024

These models offer a good balance of performance and resource usage, and have been verified to work well together even on a MacBook Air M2 with 8GB RAM.

Features

  • Local LLM Integration:

    • Uses llama-cpp-python models for local inference
    • Supports various quantization formats (Q4_K_M recommended)
    • Configurable context window sizes
  • Document Processing:

    • PDF document upload and processing
    • Automatic text chunking and embedding
    • Hybrid search combining semantic and keyword matching
    • Reranking for better context selection
  • Multi-Model Integration (see the configuration sketch after this list):

    • Local LLM for text generation (e.g., Llama-3.2-3B-Instruct)
    • Local embeddings using BGE models
    • FlashRank for local reranking
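
Together, these components map onto a single RAGLite configuration object. Below is a minimal sketch, not the app's exact code: the model strings are the tested pair from Quick Start, the database URL is a placeholder, and the FlashRank model name follows RAGLite's documented example:

    from raglite import RAGLiteConfig
    from rerankers import Reranker

    # Placeholder database URL; substitute your own connection string.
    my_config = RAGLiteConfig(
        db_url="postgresql://user:pass@ep-xyz.region.aws.neon.tech/dbname",
        llm="llama-cpp-python/bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf@4096",
        embedder="llama-cpp-python/lm-kit/bge-m3-gguf/bge-m3-Q4_K_M.gguf@1024",
        reranker=Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank"),
    )

The same config object is reused for document insertion and retrieval (see How to Run below).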

Prerequisites

  1. Install spaCy Model:

    pip install https://github.com/explosion/spacy-models/releases/download/xx_sent_ud_sm-3.7.0/xx_sent_ud_sm-3.7.0-py3-none-any.whl
    
  2. Install Accelerated llama-cpp-python (optional but recommended; a quick sanity check follows this list):

    # Configure installation variables
    LLAMA_CPP_PYTHON_VERSION=0.3.2
    PYTHON_VERSION=310  # use 310 for Python 3.10, 311 for 3.11, 312 for 3.12
    ACCELERATOR=metal  # For Mac
    # ACCELERATOR=cu121  # For NVIDIA GPU
    PLATFORM=macosx_11_0_arm64  # For Mac
    # PLATFORM=linux_x86_64  # For Linux
    # PLATFORM=win_amd64  # For Windows
    
    # Install accelerated version
    pip install "https://github.com/abetlen/llama-cpp-python/releases/download/v$LLAMA_CPP_PYTHON_VERSION-$ACCELERATOR/llama_cpp_python-$LLAMA_CPP_PYTHON_VERSION-cp$PYTHON_VERSION-cp$PYTHON_VERSION-$PLATFORM.whl"
    
  3. Install Dependencies:

    git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
    cd awesome-llm-apps/rag_tutorials/local_hybrid_search_rag
    pip install -r requirements.txt
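
To confirm that steps 1 and 2 took effect, you can run a quick check; llama_supports_gpu_offload comes from llama-cpp-python's low-level bindings:

    # Optional sanity check for steps 1-2.
    import spacy
    from llama_cpp import llama_supports_gpu_offload

    nlp = spacy.load("xx_sent_ud_sm")    # raises OSError if the spaCy model wheel isn't installed
    print(llama_supports_gpu_offload())  # True if the Metal/CUDA-accelerated build is active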
    

Model Setup

RAGLite extends LiteLLM with support for llama.cpp models via llama-cpp-python. To select a llama.cpp model (e.g., from bartowski's collection), use a model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", where n_ctx is an optional parameter specifying the model's context size.

  1. LLM Model Path Format:

    llama-cpp-python/<hugging_face_repo_id>/<filename>@<context_length>
    

    Example:

    bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf@4096
    
  2. Embedder Model Path Format:

    llama-cpp-python/<hugging_face_repo_id>/<filename>@<context_length>
    

    Example:

    lm-kit/bge-m3-gguf/bge-m3-Q4_K_M.gguf@1024
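
Both paths resolve to GGUF files on the Hugging Face Hub. To pre-download and smoke-test the LLM outside the app, you can load it directly with llama-cpp-python; a sketch using the Quick Start repo and filename (requires huggingface-hub):

    from llama_cpp import Llama

    # Downloads the GGUF from the Hugging Face Hub on first use.
    llm = Llama.from_pretrained(
        repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",
        filename="Llama-3.2-3B-Instruct-Q4_K_M.gguf",
        n_ctx=4096,
    )
    print(llm("Q: What is hybrid search? A:", max_tokens=64)["choices"][0]["text"])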
    

Database Setup

The application supports multiple database backends:

  • PostgreSQL (Recommended):
    • Create a free serverless PostgreSQL database at Neon in a few clicks
    • Get instant provisioning and scale-to-zero capability
    • Connection string format: postgresql://user:pass@ep-xyz.region.aws.neon.tech/dbname
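
Before saving the configuration in the app, you can verify the connection string on its own. A minimal check using SQLAlchemy (a RAGLite dependency; assumes a PostgreSQL driver such as psycopg2 is installed, and the URL below is the placeholder format from above):

    from sqlalchemy import create_engine, text

    # Placeholder URL; substitute your actual Neon connection string.
    engine = create_engine("postgresql://user:pass@ep-xyz.region.aws.neon.tech/dbname")
    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())  # prints 1 if the database is reachable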

How to Run

  1. Start the Application:

    streamlit run local_main.py
    
  2. Configure the Application:

    • Enter LLM model path
    • Enter embedder model path
    • Set database URL
    • Click "Save Configuration"
  3. Upload Documents:

    • Upload PDF files through the interface
    • Wait for processing completion
  4. Start Chatting:

    • Ask questions about your documents
    • Get answers from the local LLM, grounded in the retrieved context (the retrieval flow is sketched below)
    • The app falls back to the LLM's general knowledge when the documents don't contain an answer
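
Under the hood, steps 3 and 4 correspond roughly to RAGLite's document and search APIs. A minimal sketch, reusing the my_config object from the Features section; report.pdf and the question are hypothetical:

    from pathlib import Path
    from raglite import hybrid_search, insert_document, rerank_chunks

    # Step 3: convert, chunk, embed, and index the PDF.
    insert_document(Path("report.pdf"), config=my_config)

    # Step 4: hybrid search (semantic + keyword), then FlashRank reranking.
    question = "What does the report conclude?"
    chunk_ids, scores = hybrid_search(question, num_results=10, config=my_config)
    chunks = rerank_chunks(question, chunk_ids, config=my_config)

The reranked chunks are what the local LLM sees as context when generating its answer.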

Notes

  • A context window of 4096 is recommended for most use cases
  • Q4_K_M quantization offers a good balance of speed and quality
  • The BGE-M3 embedder at @1024 (a 1024-token context, matching its 1024-dimensional embeddings) works well
  • Local models require sufficient RAM and CPU/GPU resources
  • Metal acceleration is available on Apple Silicon Macs, CUDA on NVIDIA GPUs

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.