diff --git a/rag_tutorials/qwen_local_rag/README.md b/rag_tutorials/qwen_local_rag/README.md
index f715a26..ca4bb47 100644
--- a/rag_tutorials/qwen_local_rag/README.md
+++ b/rag_tutorials/qwen_local_rag/README.md
@@ -1,6 +1,6 @@
 # 🐋 Qwen 3 Local RAG Reasoning Agent
 
-This RAG Application demonstrates how to build a powerful Retrieval-Augmented Generation (RAG) system using locally running Qwen 3 and Gemma 3 models via Ollama. It combines document processing, vector search, and web search capabilities to provide accurate, context-aware responses to user queries.
+This RAG Application demonstrates how to build a powerful Retrieval-Augmented Generation (RAG) system using locally running Qwen 3 and Gemma 3 models via Ollama. It combines document processing, vector search, and web search capabilities to provide accurate, context-aware responses to user queries. Built with Agno v2.0.
 
 ## Features
 
@@ -29,6 +29,11 @@ This RAG Application demonstrates how to build a powerful Retrieval-Augmented Ge
 - Qdrant vector database for efficient similarity search
 - Persistent storage of document embeddings
 
+- **🔧 Agno v2.0 Framework**:
+
+  - Uses Agno v2.0 Knowledge embedder system
+  - Debug mode for enhanced development experience
+  - Modern agent architecture with improved tool integration
 
 ## How to Get Started
 
@@ -36,8 +41,9 @@
 - [Ollama](https://ollama.ai/) installed locally
 - Python 3.8+
-- Qdrant account (free tier available) for vector storage
+- Qdrant running locally (via Docker) for vector storage
 - Exa API key (optional, for web search capability)
+- Agno v2.0 installed
 
 ### Installation
 
@@ -58,9 +64,11 @@ pip install -r requirements.txt
 ```bash
 ollama pull qwen3:1.7b  # Or any other model you want to use
-ollama pull snowflake-arctic-embed  # Or any other model you want to use
+ollama pull snowflake-arctic-embed  # For embeddings
 ```
 
-4. Run Qdrant locally through docker
+
+4. Run Qdrant locally through Docker:
+
 ```bash
 docker pull qdrant/qdrant
@@ -69,12 +77,11 @@ docker run -p 6333:6333 -p 6334:6334 \
     qdrant/qdrant
 ```
 
+5. Get your API keys (optional):
-4. Get your API keys:
-
-   - Exa API key (optional, for web search)
+   - Exa API key (for web search fallback capability)
 
-5. Run the application:
+6. Run the application:
 
 ```bash
 streamlit run qwen_local_rag_agent.py
@@ -87,28 +94,36 @@ streamlit run qwen_local_rag_agent.py
 
    - PDF files are processed using PyPDFLoader
    - Web content is extracted using WebBaseLoader
    - Documents are split into chunks with RecursiveCharacterTextSplitter
+   - Metadata is added to track source types and timestamps
+
 2. **Vector Database**:
 
-   - Document chunks are embedded using Ollama's embedding models
+   - Document chunks are embedded using Ollama's embedding models via Agno's OllamaEmbedder
    - Embeddings are stored in Qdrant vector database
-   - Similarity search retrieves relevant documents based on query
+   - Similarity search retrieves relevant documents based on query with configurable threshold
+
 3. **Query Processing**:
 
    - User queries are analyzed to determine the best information source
    - System checks document relevance using similarity threshold
-   - Falls back to web search if no relevant documents are found
+   - Falls back to web search if no relevant documents are found (when enabled)
+   - Supports forced web search mode via toggle
+
 4. **Response Generation**:
 
-   - Local LLM (Qwen/Gemma) generates responses based on retrieved context
+   - Local LLM (Qwen/Gemma/DeepSeek) generates responses based on retrieved context
+   - Agno agents use debug mode for enhanced visibility into tool calls
    - Sources are cited and displayed to the user
    - Web search results are clearly indicated when used
+   - Reasoning process is displayed for reasoning models
 
 ## Configuration Options
 
 - **Model Selection**: Choose between different Qwen, Gemma, and DeepSeek models
 - **RAG Mode**: Toggle between RAG-enabled and direct LLM interaction
-- **Search Tuning**: Adjust similarity threshold for document retrieval
+- **Search Tuning**: Adjust similarity threshold (0.0-1.0) for document retrieval
 - **Web Search**: Enable/disable web search fallback and configure domain filtering
+- **Debug Mode**: Agents use debug mode by default for better visibility into tool calls and execution flow
 
 ## Use Cases
diff --git a/rag_tutorials/qwen_local_rag/qwen_local_rag_agent.py b/rag_tutorials/qwen_local_rag/qwen_local_rag_agent.py
index 44403ef..098b2d0 100644
--- a/rag_tutorials/qwen_local_rag/qwen_local_rag_agent.py
+++ b/rag_tutorials/qwen_local_rag/qwen_local_rag_agent.py
@@ -13,7 +13,7 @@ from qdrant_client import QdrantClient
 from qdrant_client.models import Distance, VectorParams
 from langchain_core.embeddings import Embeddings
 from agno.tools.exa import ExaTools
-from agno.embedder.ollama import OllamaEmbedder
+from agno.knowledge.embedder.ollama import OllamaEmbedder
 
 
 class OllamaEmbedderr(Embeddings):
@@ -254,7 +254,7 @@ def get_web_search_agent() -> Agent:
         2. Compile and summarize the most relevant information
         3. Include sources in your response
         """,
-        show_tool_calls=True,
+        debug_mode=True,
         markdown=True,
     )
 
@@ -279,7 +279,7 @@ def get_rag_agent() -> Agent:
         Always maintain high accuracy and clarity in your responses.
         """,
-        show_tool_calls=True,
+        debug_mode=True,
         markdown=True,
     )
 
diff --git a/rag_tutorials/qwen_local_rag/requirements.txt b/rag_tutorials/qwen_local_rag/requirements.txt
index f988a22..6489382 100644
--- a/rag_tutorials/qwen_local_rag/requirements.txt
+++ b/rag_tutorials/qwen_local_rag/requirements.txt
@@ -1,4 +1,4 @@
-agno
+agno>=2.2.10
 pypdf
 exa
 qdrant-client