mirror of
https://github.com/Shubhamsaboo/awesome-llm-apps.git
synced 2026-05-05 15:35:43 -05:00
Voice Enabled RAG Agent with OpenAI Agents SDK
This commit is contained in:
68
rag_tutorials/voice_rag_openaisdk/README.md
Normal file
68
rag_tutorials/voice_rag_openaisdk/README.md
Normal file
@@ -0,0 +1,68 @@
|
||||
## 🎙️ Voice RAG with OpenAI SDK
|
||||
|
||||
This script demonstrates how to build a voice-enabled Retrieval-Augmented Generation (RAG) system using OpenAI's SDK and Streamlit. The application allows users to upload PDF documents, ask questions, and receive both text and voice responses using OpenAI's text-to-speech capabilities.
|
||||
|
||||
### Features
|
||||
|
||||
- Creates a voice-enabled RAG system using OpenAI's SDK
|
||||
- Supports PDF document processing and chunking
|
||||
- Uses Qdrant as the vector database for efficient similarity search
|
||||
- Implements real-time text-to-speech with multiple voice options
|
||||
- Provides a user-friendly Streamlit interface
|
||||
- Allows downloading of generated audio responses
|
||||
- Supports multiple document uploads and tracking
|
||||
|
||||
### How to get Started?
|
||||
|
||||
1. Clone the GitHub repository
|
||||
```bash
|
||||
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
|
||||
cd awesome-llm-apps/rag_tutorials/voice_rag_openaisdk
|
||||
```
|
||||
|
||||
2. Install the required dependencies:
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
3. Set up your API keys:
|
||||
- Get your [OpenAI API key](https://platform.openai.com/)
|
||||
- Set up a [Qdrant Cloud](https://cloud.qdrant.io/) account and get your API key and URL
|
||||
- Create a `.env` file with your credentials:
|
||||
```bash
|
||||
OPENAI_API_KEY='your-openai-api-key'
|
||||
QDRANT_URL='your-qdrant-url'
|
||||
QDRANT_API_KEY='your-qdrant-api-key'
|
||||
```
|
||||
|
||||
4. Run the Voice RAG application:
|
||||
```bash
|
||||
streamlit run rag_voice.py
|
||||
```
|
||||
|
||||
5. Open your web browser and navigate to the URL provided in the console output to interact with the Voice RAG system.
|
||||
|
||||
### How it works?
|
||||
|
||||
1. **Document Processing:**
|
||||
- Upload PDF documents through the Streamlit interface
|
||||
- Documents are split into chunks using LangChain's RecursiveCharacterTextSplitter
|
||||
- Each chunk is embedded using FastEmbed and stored in Qdrant
|
||||
|
||||
2. **Query Processing:**
|
||||
- User questions are converted to embeddings
|
||||
- Similar documents are retrieved from Qdrant
|
||||
- A processing agent generates a clear, spoken-word friendly response
|
||||
- A TTS agent optimizes the response for speech synthesis
|
||||
|
||||
3. **Voice Generation:**
|
||||
- Text responses are converted to speech using OpenAI's TTS
|
||||
- Users can choose from multiple voice options
|
||||
- Audio can be played directly or downloaded as MP3
|
||||
|
||||
4. **Features:**
|
||||
- Real-time audio streaming
|
||||
- Multiple voice personality options
|
||||
- Document source tracking
|
||||
- Download capability for audio responses
|
||||
- Progress tracking for document processing
|
||||
Reference in New Issue
Block a user