Voice Enabled RAG Agent with OpenAI Agents SDK

2026-05-05 15:35:43 -05:00 · 2025-03-25 05:12:20 +05:30
parent fc02a7f3dd
commit bdfd184feb
3 changed files with 477 additions and 0 deletions
--- a/rag_tutorials/voice_rag_openaisdk/README.md
+++ b/rag_tutorials/voice_rag_openaisdk/README.md
@@ -0,0 +1,68 @@
+## 🎙️ Voice RAG with OpenAI SDK
+
+This script demonstrates how to build a voice-enabled Retrieval-Augmented Generation (RAG) system using OpenAI's SDK and Streamlit. The application allows users to upload PDF documents, ask questions, and receive both text and voice responses using OpenAI's text-to-speech capabilities.
+
+### Features
+
+- Creates a voice-enabled RAG system using OpenAI's SDK
+- Supports PDF document processing and chunking
+- Uses Qdrant as the vector database for efficient similarity search
+- Implements real-time text-to-speech with multiple voice options
+- Provides a user-friendly Streamlit interface
+- Allows downloading of generated audio responses
+- Supports multiple document uploads and tracking
+
+### How to get Started?
+
+1. Clone the GitHub repository
+```bash
+git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
+cd awesome-llm-apps/rag_tutorials/voice_rag_openaisdk
+```
+
+2. Install the required dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+3. Set up your API keys:
+- Get your [OpenAI API key](https://platform.openai.com/)
+- Set up a [Qdrant Cloud](https://cloud.qdrant.io/) account and get your API key and URL
+- Create a `.env` file with your credentials:
+```bash
+OPENAI_API_KEY='your-openai-api-key'
+QDRANT_URL='your-qdrant-url'
+QDRANT_API_KEY='your-qdrant-api-key'
+```
+
+4. Run the Voice RAG application:
+```bash
+streamlit run rag_voice.py
+```
+
+5. Open your web browser and navigate to the URL provided in the console output to interact with the Voice RAG system.
+
+### How it works?
+
+1. **Document Processing:** 
+   - Upload PDF documents through the Streamlit interface
+   - Documents are split into chunks using LangChain's RecursiveCharacterTextSplitter
+   - Each chunk is embedded using FastEmbed and stored in Qdrant
+
+2. **Query Processing:**
+   - User questions are converted to embeddings
+   - Similar documents are retrieved from Qdrant
+   - A processing agent generates a clear, spoken-word friendly response
+   - A TTS agent optimizes the response for speech synthesis
+
+3. **Voice Generation:**
+   - Text responses are converted to speech using OpenAI's TTS
+   - Users can choose from multiple voice options
+   - Audio can be played directly or downloaded as MP3
+
+4. **Features:**
+   - Real-time audio streaming
+   - Multiple voice personality options
+   - Document source tracking
+   - Download capability for audio responses
+   - Progress tracking for document processing