🐋 Qwen 3 Local RAG Reasoning Agent
This application demonstrates how to build a powerful Retrieval-Augmented Generation (RAG) system using Qwen 3 and Gemma 3 models running locally via Ollama. It combines document processing, vector search, and web search capabilities to provide accurate, context-aware responses to user queries. Built with Agno v2.0.
Features
- 🧠 Multiple Local LLM Options:
  - Qwen3 (1.7b, 8b) - Alibaba's latest language models
  - Gemma3 (1b, 4b) - Google's efficient language models with multimodal capabilities
  - DeepSeek (1.5b) - Alternative model option
- 📚 Comprehensive RAG System:
  - Upload and process PDF documents
  - Extract content from web URLs
  - Intelligent chunking and embedding
  - Similarity search with an adjustable threshold
- 🌐 Web Search Integration:
  - Fallback to web search when document knowledge is insufficient
  - Configurable domain filtering
  - Source attribution in responses
- 🔄 Flexible Operation Modes:
  - Toggle between RAG and direct LLM interaction
  - Force web search when needed
  - Adjust similarity thresholds for document retrieval
- 💾 Vector Database Integration:
  - Qdrant vector database for efficient similarity search
  - Persistent storage of document embeddings
- 🔧 Agno v2.0 Framework:
  - Uses the Agno v2.0 Knowledge embedder system
  - Debug mode for an enhanced development experience
  - Modern agent architecture with improved tool integration
How to Get Started
Prerequisites
- Ollama installed locally
- Python 3.8+
- Qdrant running locally (via Docker) for vector storage
- Exa API key (optional, for web search capability)
- Agno v2.0 installed
Installation
- Clone the GitHub repository:

  ```bash
  git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  cd awesome-llm-apps/rag_tutorials/qwen_local_rag
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Pull the required models using Ollama:

  ```bash
  ollama pull qwen3:1.7b              # or any other model you want to use
  ollama pull snowflake-arctic-embed  # for embeddings
  ```

- Run Qdrant locally through Docker:

  ```bash
  docker pull qdrant/qdrant
  docker run -p 6333:6333 -p 6334:6334 \
      -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
      qdrant/qdrant
  ```

- Get your API keys (optional):
  - Exa API key (for the web search fallback capability)

- Run the application:

  ```bash
  streamlit run qwen_local_rag_agent.py
  ```
How It Works
- Document Processing:
  - PDF files are processed using PyPDFLoader
  - Web content is extracted using WebBaseLoader
  - Documents are split into chunks with RecursiveCharacterTextSplitter
  - Metadata is added to track source types and timestamps
- Vector Database:
  - Document chunks are embedded using Ollama's embedding models via Agno's OllamaEmbedder
  - Embeddings are stored in the Qdrant vector database
  - Similarity search retrieves relevant documents for a query, with a configurable threshold
- Query Processing:
  - User queries are analyzed to determine the best information source
  - The system checks document relevance against the similarity threshold
  - Falls back to web search if no relevant documents are found (when enabled)
  - Supports a forced web search mode via toggle
- Response Generation:
  - The local LLM (Qwen/Gemma/DeepSeek) generates responses based on the retrieved context
  - Agno agents use debug mode for enhanced visibility into tool calls
  - Sources are cited and displayed to the user
  - Web search results are clearly indicated when used
  - The reasoning process is displayed for reasoning models
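The chunking-plus-metadata step above can be sketched in plain Python. In the app this is handled by LangChain's RecursiveCharacterTextSplitter; the `split_text` and `to_documents` helpers below are simplified, hypothetical stand-ins, not the library's implementation:

```python
from datetime import datetime, timezone

def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list:
    """Greedy character-based splitter: emit fixed-size windows that overlap
    so context is not lost at chunk boundaries (a simplified stand-in for
    RecursiveCharacterTextSplitter)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

def to_documents(text: str, source_type: str) -> list:
    """Attach the same kind of per-chunk metadata the app tracks:
    source type ('pdf' or 'url') and an ingestion timestamp."""
    ts = datetime.now(timezone.utc).isoformat()
    return [
        {"content": c, "source_type": source_type, "timestamp": ts}
        for c in split_text(text)
    ]
```

Each resulting dictionary maps onto one embedded vector in Qdrant; the metadata lets the UI attribute answers back to a PDF or URL later.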
Configuration Options
- Model Selection: Choose between different Qwen, Gemma, and DeepSeek models
- RAG Mode: Toggle between RAG-enabled and direct LLM interaction
- Search Tuning: Adjust similarity threshold (0.0-1.0) for document retrieval
- Web Search: Enable/disable web search fallback and configure domain filtering
- Debug Mode: Agents use debug mode by default for better visibility into tool calls and execution flow
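The Search Tuning threshold behaves like the sketch below: a retrieved chunk is kept only if its cosine similarity to the query clears the configured cutoff, and an empty result tells the app to fall back to web search. The function names and toy vectors are illustrative assumptions, not the app's actual code:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, store, threshold=0.7):
    """`store` maps chunk text to its embedding. Chunks scoring below
    `threshold` are discarded; returning an empty hit list signals the
    caller to fall back to web search (second element of the tuple)."""
    scored = [(cosine_similarity(query_vec, v), text) for text, v in store.items()]
    hits = [text for score, text in sorted(scored, reverse=True) if score >= threshold]
    return (hits, False) if hits else ([], True)
```

Raising the threshold toward 1.0 makes retrieval stricter (more web-search fallbacks); lowering it toward 0.0 keeps weakly related chunks in context.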
Use Cases
- Document Q&A: Ask questions about your uploaded documents
- Research Assistant: Combine document knowledge with web search
- Local Privacy: Process sensitive documents without sending data to external APIs
- Offline Operation: Run advanced AI capabilities with limited or no internet access
Requirements
See requirements.txt for the complete list of dependencies.