🔥 Agentic RAG with EmbeddingGemma

This Streamlit app demonstrates an agentic Retrieval-Augmented Generation (RAG) workflow that uses Google's EmbeddingGemma for embeddings and Llama 3.2 as the language model, both running locally via Ollama.

Features

  • Local AI Models: Uses EmbeddingGemma for vector embeddings and Llama 3.2 for text generation
  • PDF Knowledge Base: Dynamically add PDF URLs to build a knowledge base
  • Vector Search: Efficient similarity search using LanceDB
  • Interactive UI: Streamlit interface for adding PDF sources and querying the knowledge base
  • Streaming Responses: Real-time response generation with tool call visibility
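
To illustrate what streaming looks like underneath, here is a minimal sketch of reading a streamed reply from Ollama's HTTP API, which sends newline-delimited JSON chunks carrying a `response` text fragment and a `done` flag. How the app itself wires streaming and tool-call display into Streamlit is not shown in this README, so treat this as an illustration only. The function name `iter_stream_text` and the sample chunks are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from newline-delimited JSON chunks."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        text = chunk.get("response", "")
        if text:
            yield text
        if chunk.get("done"):
            break  # the final chunk marks the end of the reply


# Hypothetical sample chunks standing in for a live stream.
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "", "done": true}',
]
print("".join(iter_stream_text(sample)))
```

In a UI, each yielded fragment would be appended to the displayed answer as it arrives instead of being joined at the end.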

How to Get Started?

  1. Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/rag_tutorials/agentic_rag_embedding_gemma
  2. Install the required dependencies:
pip install -r requirements.txt
  3. Ensure Ollama is installed and running with the required models:

    • Pull the models: ollama pull embeddinggemma:latest and ollama pull llama3.2:latest
    • Start the Ollama server if it is not already running (for example, with ollama serve)
  4. Run the Streamlit app:

streamlit run agentic_rag_embeddinggemma.py

(Note: The app file is in the root of the project directory you changed into in step 1.)

  5. Open your web browser to the URL provided (usually http://localhost:8501) to interact with the RAG agent.
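
Before launching the app, it can help to confirm that the Ollama server is reachable and that both models have been pulled. The sketch below is not part of the repository. It assumes Ollama is listening on its default address, `http://localhost:11434`, and uses the `/api/tags` endpoint, which lists locally available models. The helper names `missing_models`, `fetch_tags`, and `check_ollama` are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # assumed default; adjust if yours differs
REQUIRED = ["embeddinggemma:latest", "llama3.2:latest"]


def missing_models(tags_payload: dict, required: List[str]) -> List[str]:
    """Return the required model names absent from an /api/tags payload."""
    installed = {m.get("name") for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def check_ollama() -> None:
    """Print a short readiness report; requires a running Ollama server."""
    try:
        missing = missing_models(fetch_tags(), REQUIRED)
    except OSError as exc:  # covers connection refused and timeouts
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        print("Missing models:", ", ".join(missing) if missing else "none")
```

Calling `check_ollama()` from a Python shell reports any model that still needs an `ollama pull`.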

How It Works?

  1. Knowledge Base Setup: Add PDF URLs in the sidebar to load and index documents.
  2. Embedding Generation: EmbeddingGemma creates vector embeddings for semantic search.
  3. Query Processing: User queries are embedded and searched against the knowledge base.
  4. Response Generation: Llama 3.2 generates answers based on retrieved context.
  5. Tool Integration: The agent uses search tools to fetch relevant information.
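
The steps above can be sketched in a few lines of plain Python. This is a toy illustration of the retrieve-then-generate flow, not the app's code: `toy_embed` is a word-count stand-in for EmbeddingGemma, the in-memory list stands in for LanceDB, and the prompt string is what would be handed to Llama 3.2. All function names and sample chunks here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for an embedding model: a unit-length word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; inputs are already unit length, so a dot product suffices."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def search(query: str, index: List[Tuple[str, Vector]], top_k: int = 2) -> List[Tuple[float, str]]:
    """Embed the query and return the top_k most similar chunks with scores."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, hits: List[Tuple[float, str]]) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {chunk}" for _, chunk in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Steps 1-2: index a few hypothetical document chunks.
chunks = [
    "LanceDB stores vectors for similarity search.",
    "Streamlit renders the web interface.",
    "Ollama serves local language models.",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

# Steps 3-5: the agent would call search() as a tool, then pass the prompt to the LLM.
question = "Which component stores vectors?"
hits = search(question, index, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In an agentic setup, the model decides when to call the search tool instead of retrieval always running before generation, which is the main difference from a fixed RAG pipeline.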

Requirements

  • Python 3.8+
  • Ollama installed and running
  • Required models: embeddinggemma:latest, llama3.2:latest

Technologies Used

  • Agno: Framework for building AI agents
  • Streamlit: Web app framework
  • LanceDB: Vector database
  • Ollama: Local LLM server
  • EmbeddingGemma: Google's embedding model
  • Llama 3.2: Meta's language model