refactor: Update qwen_local_rag_agent to use Agno v2.0 and enhance README

- Changed import path for OllamaEmbedder to reflect new Agno structure.
- Switched from show_tool_calls to debug_mode for improved debugging experience.
Shubhamsaboo
2025-11-09 14:44:22 -08:00
parent 9bc6394fae
commit 013aa48bf5
3 changed files with 32 additions and 17 deletions

README.md

@@ -1,6 +1,6 @@
# 🐋 Qwen 3 Local RAG Reasoning Agent
-This RAG Application demonstrates how to build a powerful Retrieval-Augmented Generation (RAG) system using locally running Qwen 3 and Gemma 3 models via Ollama. It combines document processing, vector search, and web search capabilities to provide accurate, context-aware responses to user queries.
+This RAG Application demonstrates how to build a powerful Retrieval-Augmented Generation (RAG) system using locally running Qwen 3 and Gemma 3 models via Ollama. It combines document processing, vector search, and web search capabilities to provide accurate, context-aware responses to user queries. Built with Agno v2.0.
## Features
@@ -29,6 +29,11 @@ This RAG Application demonstrates how to build a powerful Retrieval-Augmented Ge
- Qdrant vector database for efficient similarity search
- Persistent storage of document embeddings
+- **🔧 Agno v2.0 Framework**:
+ - Uses Agno v2.0 Knowledge embedder system
+ - Debug mode for enhanced development experience
+ - Modern agent architecture with improved tool integration
## How to Get Started
@@ -36,8 +41,9 @@ This RAG Application demonstrates how to build a powerful Retrieval-Augmented Ge
- [Ollama](https://ollama.ai/) installed locally
- Python 3.8+
-- Qdrant account (free tier available) for vector storage
+- Qdrant running locally (via Docker) for vector storage
- Exa API key (optional, for web search capability)
+- Agno v2.0 installed
### Installation
@@ -58,9 +64,11 @@ pip install -r requirements.txt
```bash
ollama pull qwen3:1.7b # Or any other model you want to use
-ollama pull snowflake-arctic-embed # Or any other model you want to use
+ollama pull snowflake-arctic-embed # For embeddings
```
-4. Run Qdrant locally through docker
+4. Run Qdrant locally through Docker:
```bash
docker pull qdrant/qdrant
@@ -69,12 +77,11 @@ docker run -p 6333:6333 -p 6334:6334 \
qdrant/qdrant
```
-5. Get your API keys (optional):
+4. Get your API keys:
-- Exa API key (optional, for web search)
+- Exa API key (for web search fallback capability)
-5. Run the application:
+6. Run the application:
```bash
streamlit run qwen_local_rag_agent.py
@@ -87,28 +94,36 @@ streamlit run qwen_local_rag_agent.py
- PDF files are processed using PyPDFLoader
- Web content is extracted using WebBaseLoader
- Documents are split into chunks with RecursiveCharacterTextSplitter
+- Metadata is added to track source types and timestamps
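A minimal sketch of the splitting and metadata step, assuming a fixed-size chunker with overlap (the app itself uses `RecursiveCharacterTextSplitter`, which additionally splits on separators; `chunk_text` and `with_metadata` here are illustrative helpers, not the app's code):

```python
from datetime import datetime, timezone

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into overlapping fixed-size chunks (a simplified stand-in
    for RecursiveCharacterTextSplitter)."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks share context
    return chunks

def with_metadata(chunks: list, source_type: str) -> list:
    """Attach source-type and ingestion-timestamp metadata to each chunk."""
    ts = datetime.now(timezone.utc).isoformat()
    return [{"text": c, "source_type": source_type, "ingested_at": ts}
            for c in chunks]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.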
2. **Vector Database**:
-- Document chunks are embedded using Ollama's embedding models
+- Document chunks are embedded using Ollama's embedding models via Agno's OllamaEmbedder
- Embeddings are stored in Qdrant vector database
-- Similarity search retrieves relevant documents based on query
+- Similarity search retrieves relevant documents based on query with configurable threshold
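The retrieval step can be sketched with an in-memory store and cosine similarity standing in for Qdrant (the `search` helper and its tuple layout are hypothetical; the app delegates storage and search to `qdrant-client`):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(store: list, query_vec: list, threshold: float, top_k: int = 3) -> list:
    """Return up to top_k (text, score) pairs whose similarity clears the
    configurable threshold, best match first."""
    scored = [(text, cosine_similarity(vec, query_vec)) for text, vec in store]
    scored = [pair for pair in scored if pair[1] >= threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Raising the threshold trades recall for precision, which is exactly the knob the sidebar exposes.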
3. **Query Processing**:
- User queries are analyzed to determine the best information source
- System checks document relevance using similarity threshold
-- Falls back to web search if no relevant documents are found
+- Falls back to web search if no relevant documents are found (when enabled)
+- Supports forced web search mode via toggle
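The routing described above can be sketched as follows (`route_query` and its parameter names are illustrative, not taken from the app's code):

```python
def route_query(best_similarity: float, threshold: float,
                web_search_enabled: bool, force_web: bool) -> str:
    """Pick the information source for a query, mirroring the steps above."""
    if force_web:                     # forced web search mode via toggle
        return "web"
    if best_similarity >= threshold:  # relevant documents found
        return "documents"
    if web_search_enabled:            # fallback when no documents are relevant
        return "web"
    return "llm"                      # answer from the local model alone
```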
4. **Response Generation**:
-- Local LLM (Qwen/Gemma) generates responses based on retrieved context
+- Local LLM (Qwen/Gemma/DeepSeek) generates responses based on retrieved context
+- Agno agents use debug mode for enhanced visibility into tool calls
- Sources are cited and displayed to the user
- Web search results are clearly indicated when used
+- Reasoning process is displayed for reasoning models
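How cited sources might be rendered can be sketched as follows (`format_answer` is a hypothetical helper; the app's actual display goes through Streamlit):

```python
def format_answer(answer: str, sources: list, used_web: bool) -> str:
    """Append a source list to the answer, clearly flagging web results."""
    lines = [answer, "", "**Sources:**"]
    if used_web:
        lines.append("_(results retrieved via web search)_")
    lines.extend(f"- {s}" for s in sources)
    return "\n".join(lines)
```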
## Configuration Options
- **Model Selection**: Choose between different Qwen, Gemma, and DeepSeek models
- **RAG Mode**: Toggle between RAG-enabled and direct LLM interaction
-- **Search Tuning**: Adjust similarity threshold for document retrieval
+- **Search Tuning**: Adjust similarity threshold (0.0-1.0) for document retrieval
- **Web Search**: Enable/disable web search fallback and configure domain filtering
+- **Debug Mode**: Agents use debug mode by default for better visibility into tool calls and execution flow
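These options could be grouped as a single settings object (a hypothetical sketch; the names and defaults are illustrative, except the `qwen3:1.7b` model tag, the 0.0-1.0 threshold range, and debug mode being on by default, which the README states):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AppConfig:
    """Illustrative grouping of the sidebar options listed above."""
    model: str = "qwen3:1.7b"           # Qwen / Gemma / DeepSeek model tag
    rag_enabled: bool = True            # RAG mode vs. direct LLM interaction
    similarity_threshold: float = 0.7   # document-retrieval cutoff, 0.0-1.0
    web_search_fallback: bool = False   # fall back to web search when needed
    domain_filter: Optional[list] = None  # restrict web search to domains
    debug_mode: bool = True             # Agno debug output for tool calls
```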
## Use Cases

qwen_local_rag_agent.py

@@ -13,7 +13,7 @@ from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from langchain_core.embeddings import Embeddings
from agno.tools.exa import ExaTools
-from agno.embedder.ollama import OllamaEmbedder
+from agno.knowledge.embedder.ollama import OllamaEmbedder
class OllamaEmbedderr(Embeddings):
@@ -254,7 +254,7 @@ def get_web_search_agent() -> Agent:
2. Compile and summarize the most relevant information
3. Include sources in your response
""",
-show_tool_calls=True,
+debug_mode=True,
markdown=True,
)
@@ -279,7 +279,7 @@ def get_rag_agent() -> Agent:
Always maintain high accuracy and clarity in your responses.
""",
-show_tool_calls=True,
+debug_mode=True,
markdown=True,
)

requirements.txt

@@ -1,4 +1,4 @@
-agno
+agno>=2.2.10
pypdf
exa
qdrant-client