awesome-llm-apps/rag_tutorials/voice_rag_openaisdk/README.md

## 🎙️ Voice RAG with OpenAI SDK

This script demonstrates how to build a voice-enabled Retrieval-Augmented Generation (RAG) system using OpenAI's SDK and Streamlit. The application allows users to upload PDF documents, ask questions, and receive both text and voice responses using OpenAI's text-to-speech capabilities.

### Features

- Creates a voice-enabled RAG system using OpenAI's SDK
- Supports PDF document processing and chunking
- Uses Qdrant as the vector database for efficient similarity search
- Implements real-time text-to-speech with multiple voice options
- Provides a user-friendly Streamlit interface
- Allows downloading of generated audio responses
- Supports multiple document uploads and tracking

### How to get Started?

1. Clone the GitHub repository
```bash
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/rag_tutorials/voice_rag_openaisdk
```

2. Install the required dependencies:
```bash
pip install -r requirements.txt
```

3. Set up your API keys:
- Get your [OpenAI API key](https://platform.openai.com/)
- Set up a [Qdrant Cloud](https://cloud.qdrant.io/) account and get your API key and URL
- Create a `.env` file with your credentials:
```bash
OPENAI_API_KEY='your-openai-api-key'
QDRANT_URL='your-qdrant-url'
QDRANT_API_KEY='your-qdrant-api-key'
```

4. Run the Voice RAG application:
```bash
streamlit run rag_voice.py
```

5. Open your web browser and navigate to the URL provided in the console output to interact with the Voice RAG system.

### How it works?

1. **Document Processing:**
   - Upload PDF documents through the Streamlit interface
   - Documents are split into chunks using LangChain's RecursiveCharacterTextSplitter
   - Each chunk is embedded using FastEmbed and stored in Qdrant

2. **Query Processing:**
   - User questions are converted to embeddings
   - Similar documents are retrieved from Qdrant
   - A processing agent generates a clear, spoken-word friendly response
   - A TTS agent optimizes the response for speech synthesis

3. **Voice Generation:**
   - Text responses are converted to speech using OpenAI's TTS
   - Users can choose from multiple voice options
   - Audio can be played directly or downloaded as MP3

4. **Features:**
   - Real-time audio streaming
   - Multiple voice personality options
   - Document source tracking
   - Download capability for audio responses
   - Progress tracking for document processing