# 🔍 Knowledge Graph RAG with Verifiable Citations

A Streamlit application demonstrating how **Knowledge Graph-based Retrieval-Augmented Generation (RAG)** provides multi-hop reasoning with fully verifiable source attribution.

## 🎯 What Makes This Different?

Traditional vector-based RAG finds similar text chunks, but struggles with:

- Questions requiring information from multiple documents
- Complex reasoning chains
- Providing verifiable sources for each claim

**Knowledge Graph RAG** solves these by:

1. **Building a structured graph** of entities and relationships from documents
2. **Traversing connections** to find related information (multi-hop reasoning)
3. **Tracking provenance** so every claim links back to its source

## ✨ Features

| Feature | Description |
|---------|-------------|
| 🔗 **Multi-hop Reasoning** | Traverse entity relationships to answer complex questions |
| 📚 **Verifiable Citations** | Every claim includes source document and text |
| 🧠 **Reasoning Trace** | See exactly how the answer was derived |
| 🏠 **Fully Local** | Uses Ollama for LLM, Neo4j for graph storage |

## 🚀 Quick Start

### Prerequisites

1. **Ollama** - Local LLM inference

   ```bash
   # Install from https://ollama.ai
   ollama pull llama3.2
   ```

2. **Neo4j** - Knowledge graph database

   ```bash
   # Using Docker
   docker run -d \
     --name neo4j \
     -p 7474:7474 -p 7687:7687 \
     -e NEO4J_AUTH=neo4j/password \
     neo4j:latest
   ```

### Installation

```bash
# Clone and navigate
cd knowledge_graph_rag_citations

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run knowledge_graph_rag.py
```

## 📖 How It Works

### Step 1: Document → Knowledge Graph

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│    Document     │ ──► │  LLM Extraction  │ ──► │ Knowledge Graph │
│   (Text/PDF)    │     │  (Entities+Rels) │     │    (Neo4j)      │
└─────────────────┘     └──────────────────┘     └─────────────────┘
```

The LLM extracts:

- **Entities**: People, organizations, concepts, technologies
- **Relationships**: How entities connect (e.g., "works_for", "created", "uses")
- **Provenance**: Source document and chunk for each extraction

### Step 2: Query → Multi-hop Traversal

```
┌─────────┐     ┌─────────────┐     ┌─────────────┐     ┌──────────┐
│  Query  │ ──► │ Find Start  │ ──► │  Traverse   │ ──► │ Context  │
│         │     │  Entities   │     │  Relations  │     │ + Sources│
└─────────┘     └─────────────┘     └─────────────┘     └──────────┘
```

### Step 3: Answer → Verified Citations

```
┌─────────────┐     ┌─────────────┐     ┌──────────────────┐
│  Context    │ ──► │  Generate   │ ──► │  Answer with     │
│  + Sources  │     │  Answer     │     │  [1][2] Citations│
└─────────────┘     └─────────────┘     └──────────────────┘
                                                  │
                                                  ▼
                                        ┌──────────────────┐
                                        │ Citation Details │
                                        │ • Source Doc     │
                                        │ • Source Text    │
                                        │ • Reasoning Path │
                                        └──────────────────┘
```

## 🖥️ Usage Example

### 1. Add a Document

Paste or select a sample document. The system extracts entities and relationships:

```
Document: "GraphRAG was developed by Microsoft Research. Darren Edge led the project..."

Extracted:
├── Entity: GraphRAG (TECHNOLOGY)
├── Entity: Microsoft Research (ORGANIZATION)
├── Entity: Darren Edge (PERSON)
└── Relationship: Darren Edge --[WORKS_FOR]--> Microsoft Research
```

### 2. Ask a Question

```
Question: "Who developed GraphRAG and what organization are they from?"
```

### 3. Get Verified Answer

```
Answer: GraphRAG was developed by researchers at Microsoft Research [1],
with Darren Edge leading the project [2].

Citations:
[1] Source: AI Research Paper
    Text: "GraphRAG is a technique developed by Microsoft Research..."
[2] Source: AI Research Paper
    Text: "...introduced by researchers including Darren Edge..."
```

## 🔧 Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| Neo4j URI | `bolt://localhost:7687` | Neo4j connection string |
| Neo4j User | `neo4j` | Database username |
| Neo4j Password | - | Database password |
| LLM Model | `llama3.2` | Ollama model for extraction/generation |

## 🏗️ Architecture

```
knowledge_graph_rag_citations/
├── knowledge_graph_rag.py   # Main Streamlit application
├── requirements.txt         # Python dependencies
└── README.md                # This file
```

### Key Components

- **`KnowledgeGraphManager`**: Neo4j interface for graph operations
- **`extract_entities_with_llm()`**: LLM-based entity/relationship extraction
- **`generate_answer_with_citations()`**: Multi-hop RAG with provenance tracking

## 🎓 Learn More

This example is inspired by [VeritasGraph](https://github.com/bibinprathap/VeritasGraph), an enterprise-grade framework for:

- On-premise knowledge graph RAG
- Visual reasoning traces (Veritas-Scope)
- LoRA-tuned LLM integration

## 📝 License

MIT License
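
## 🧪 Appendix: A Minimal Traversal Sketch

The multi-hop traversal and provenance tracking described in "How It Works" can be illustrated with a small in-memory example. This is a sketch under stated assumptions, not the application's actual code: it uses a plain Python list in place of Neo4j, and the names `Relationship`, `traverse`, and `build_citations`, as well as the `DEVELOPED` relation type, are hypothetical and do not come from `knowledge_graph_rag.py`. Each edge carries its source document and supporting text, so every hop that contributes to an answer can be cited.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One extracted edge plus the provenance needed to cite it (hypothetical type)."""
    source: str       # start entity
    relation: str     # relation label, e.g. "WORKS_FOR"
    target: str       # end entity
    source_doc: str   # document the edge was extracted from
    source_text: str  # supporting text from that document


def traverse(relationships, start, max_hops=2):
    """Breadth-first walk from `start`, treating edges as undirected.

    Returns the edges reached within `max_hops`, in discovery order.
    """
    neighbors = {}
    for rel in relationships:
        neighbors.setdefault(rel.source, []).append((rel.target, rel))
        neighbors.setdefault(rel.target, []).append((rel.source, rel))

    visited = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        entity, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget spent; do not expand this entity
        for other, rel in neighbors.get(entity, []):
            if rel not in found:
                found.append(rel)
            if other not in visited:
                visited.add(other)
                queue.append((other, hops + 1))
    return found


def build_citations(found):
    """Assign [1], [2], ... to each distinct (document, text) pair."""
    citations = {}
    for rel in found:
        key = (rel.source_doc, rel.source_text)
        citations.setdefault(key, len(citations) + 1)
    return citations


# Sample edges modeled on the usage example; DEVELOPED is an assumed relation.
relationships = [
    Relationship("Microsoft Research", "DEVELOPED", "GraphRAG", "AI Research Paper",
                 "GraphRAG is a technique developed by Microsoft Research..."),
    Relationship("Darren Edge", "WORKS_FOR", "Microsoft Research", "AI Research Paper",
                 "...introduced by researchers including Darren Edge..."),
]

found = traverse(relationships, "GraphRAG", max_hops=2)
citations = build_citations(found)
for rel in found:
    number = citations[(rel.source_doc, rel.source_text)]
    print(f"{rel.source} --[{rel.relation}]--> {rel.target} [{number}]")
```

With `max_hops=2`, the walk starting at `GraphRAG` reaches `Darren Edge` through `Microsoft Research`, which is the second hop needed for the sample question. With `max_hops=1` only the first edge is returned. In a Neo4j-backed version the same idea would presumably be expressed as a graph query, with the provenance fields stored as relationship properties.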