awesome-llm-apps

👉 Click here to follow our complete step-by-step tutorial and learn how to build this from scratch with detailed code walkthroughs, explanations, and best practices.

This Streamlit app demonstrates an agentic Retrieval-Augmented Generation (RAG) Agent using Google's EmbeddingGemma for embeddings and Llama 3.2 as the language model, all running locally via Ollama.

Features

Local AI Models: Uses EmbeddingGemma for vector embeddings and Llama 3.2 for text generation
PDF Knowledge Base: Dynamically add PDF URLs to build a knowledge base
Vector Search: Efficient similarity search using LanceDB
Interactive UI: Beautiful Streamlit interface for adding sources and querying
Streaming Responses: Real-time response generation with tool call visibility

How to Get Started?

Clone the GitHub repository

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/rag_tutorials/agentic_rag_embedding_gemma

Install the required dependencies:

pip install -r requirements.txt

Ensure Ollama is installed and running with the required models:
- Pull the models: ollama pull embeddinggemma:latest and ollama pull llama3.2:latest
- Start Ollama server if not running
Run the Streamlit app:

streamlit run agentic_rag_embeddinggemma.py

(Note: The app file is in the root directory)

Open your web browser to the URL provided (usually http://localhost:8501) to interact with the RAG agent.

How It Works?

Knowledge Base Setup: Add PDF URLs in the sidebar to load and index documents.
Embedding Generation: EmbeddingGemma creates vector embeddings for semantic search.
Query Processing: User queries are embedded and searched against the knowledge base.
Response Generation: Llama 3.2 generates answers based on retrieved context.
Tool Integration: The agent uses search tools to fetch relevant information.

Requirements

Python 3.8+
Ollama installed and running
Required models: embeddinggemma:latest, llama3.2:latest

Technologies Used

Agno: Framework for building AI agents
Streamlit: Web app framework
LanceDB: Vector database
Ollama: Local LLM server
EmbeddingGemma: Google's embedding model
Llama 3.2: Meta's language model