🐋 Qwen 3 Local RAG Reasoning Agent

This RAG application demonstrates how to build a Retrieval-Augmented Generation (RAG) system using Qwen 3 and Gemma 3 models running locally via Ollama. It combines document processing, vector search, and web search to provide accurate, context-aware responses to user queries. Built with Agno v2.0.

Features

  • 🧠 Multiple Local LLM Options:

    • Qwen3 (1.7b, 8b) - Alibaba's latest language models
    • Gemma3 (1b, 4b) - Google's efficient language models with multimodal capabilities
    • DeepSeek (1.5b) - Alternative model option
  • 📚 Comprehensive RAG System:

    • Upload and process PDF documents
    • Extract content from web URLs
    • Intelligent chunking and embedding
    • Similarity search with adjustable threshold
  • 🌐 Web Search Integration:

    • Fallback to web search when document knowledge is insufficient
    • Configurable domain filtering
    • Source attribution in responses
  • 🔄 Flexible Operation Modes:

    • Toggle between RAG and direct LLM interaction
    • Force web search when needed
    • Adjust similarity thresholds for document retrieval
  • 💾 Vector Database Integration:

    • Qdrant vector database for efficient similarity search
    • Persistent storage of document embeddings
  • 🔧 Agno v2.0 Framework:

    • Uses Agno v2.0 Knowledge embedder system
    • Debug mode for enhanced development experience
    • Modern agent architecture with improved tool integration

How to Get Started

Prerequisites

  • Ollama installed locally
  • Python 3.8+
  • Qdrant running locally (via Docker) for vector storage
  • Exa API key (optional, for web search capability)
  • Agno v2.0 installed

Installation

  1. Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/rag_tutorials/qwen_local_rag
  2. Install the required dependencies:
pip install -r requirements.txt
  3. Pull the required models using Ollama:
ollama pull qwen3:1.7b # Or any other model you want to use
ollama pull snowflake-arctic-embed # For embeddings
  4. Run Qdrant locally through Docker:
docker pull qdrant/qdrant

docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant
  5. Get your API keys (optional):

    • Exa API key (for web search fallback capability)
  6. Run the application:

streamlit run qwen_local_rag_agent.py

How It Works

  1. Document Processing:

    • PDF files are processed using PyPDFLoader
    • Web content is extracted using WebBaseLoader
    • Documents are split into chunks with RecursiveCharacterTextSplitter
    • Metadata is added to track source types and timestamps
  2. Vector Database:

    • Document chunks are embedded using Ollama's embedding models via Agno's OllamaEmbedder
    • Embeddings are stored in Qdrant vector database
    • Similarity search retrieves relevant documents based on query with configurable threshold
  3. Query Processing:

    • User queries are analyzed to determine the best information source
    • System checks document relevance using similarity threshold
    • Falls back to web search if no relevant documents are found (when enabled)
    • Supports forced web search mode via toggle
  4. Response Generation:

    • Local LLM (Qwen/Gemma/DeepSeek) generates responses based on retrieved context
    • Agno agents use debug mode for enhanced visibility into tool calls
    • Sources are cited and displayed to the user
    • Web search results are clearly indicated when used
    • Reasoning process is displayed for reasoning models
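The chunking step described above can be sketched as a simplified, dependency-free stand-in for what RecursiveCharacterTextSplitter does. This is an illustration only, not the app's actual code; real splitters recurse through a hierarchy of separators, and the function name and defaults here are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring natural boundaries.

    Simplified sketch of recursive character splitting: try to cut at a
    paragraph, line, or sentence boundary inside the window, and carry an
    overlap so context is preserved across chunk edges.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer the last paragraph/line/sentence break inside the window
            for sep in ("\n\n", "\n", ". ", " "):
                cut = text.rfind(sep, start, end)
                if cut > start:
                    end = cut + len(sep)
                    break
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        # Step forward, keeping `overlap` characters of context; the max()
        # guarantees progress even for pathological inputs
        start = max(end - overlap, start + 1)
    return chunks


doc = "Qwen 3 is a family of language models. " * 20
pieces = chunk_text(doc, chunk_size=200, overlap=40)
print(len(pieces))
```

Each chunk would then be embedded (via `OllamaEmbedder` in the actual app) and upserted into Qdrant along with its source metadata.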

Configuration Options

  • Model Selection: Choose between different Qwen, Gemma, and DeepSeek models
  • RAG Mode: Toggle between RAG-enabled and direct LLM interaction
  • Search Tuning: Adjust similarity threshold (0.0-1.0) for document retrieval
  • Web Search: Enable/disable web search fallback and configure domain filtering
  • Debug Mode: Agents use debug mode by default for better visibility into tool calls and execution flow
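The threshold-based routing that these options control can be sketched in a few lines of dependency-free Python. The names, vectors, and default threshold below are illustrative assumptions; in the app itself Qdrant computes the similarity scores:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def choose_source(query_vec, doc_vecs, threshold=0.7,
                  force_web=False, web_enabled=True):
    """Return ("documents", hit_indices) or ("web", []).

    Mirrors the fallback rules: use retrieved chunks when any score clears
    the similarity threshold, otherwise fall back to web search if enabled;
    a forced web search bypasses retrieval entirely.
    """
    if force_web and web_enabled:
        return "web", []
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    hits = [i for i, s in enumerate(scores) if s >= threshold]
    if hits:
        return "documents", hits
    return ("web", []) if web_enabled else ("documents", hits)


# Toy 3-dimensional "embeddings" for illustration only
query = [1.0, 0.0, 0.0]
docs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(choose_source(query, docs, threshold=0.7))
```

Raising the threshold makes the agent stricter about what counts as a relevant document hit, which in turn makes the web-search fallback fire more often.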

Use Cases

  • Document Q&A: Ask questions about your uploaded documents
  • Research Assistant: Combine document knowledge with web search
  • Local Privacy: Process sensitive documents without sending data to external APIs
  • Offline Operation: Run advanced AI capabilities with limited or no internet access

Requirements

See requirements.txt for the complete list of dependencies.