## 🧬 Multimodal AI Agent

A Streamlit application that combines video analysis and web search using Google's Gemini 2.0 Flash model. The agent analyzes uploaded videos and answers questions by combining visual understanding with web search results.

### Features

- Video analysis using Gemini 2.0 Flash
- Web research integration via DuckDuckGo
- Support for multiple video formats (MP4, MOV, AVI)
- Real-time video processing
- Combined visual and textual analysis
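As an illustration of the format support listed above, here is a minimal sketch of an extension check that could run before a video is sent for analysis. The names `SUPPORTED_EXTENSIONS` and `is_supported_video` are hypothetical and are not taken from `multimodal_agent.py`; the app itself may validate uploads differently.

```python
from pathlib import Path

# Formats named in the Features list (assumed to map to these extensions).
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi"}


def is_supported_video(filename: str) -> bool:
    """Return True if the file name ends with a supported video extension.

    Hypothetical helper for illustration; the comparison is case-insensitive.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name, so it does not guarantee that the file contents are a valid video.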

### How to Get Started

1. Clone the GitHub repository

```bash
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/ai_agent_tutorials/multimodal_ai_agent
```
2. Install the required dependencies:

```bash
pip install -r requirements.txt
```
3. Get your Google Gemini API Key

- Sign up for a [Google AI Studio account](https://aistudio.google.com/apikey) and obtain your API key.

4. Set your Gemini API key as an environment variable

```bash
export GOOGLE_API_KEY=your_api_key_here
```

5. Run the Streamlit App
```bash
streamlit run multimodal_agent.py
```
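If the app cannot authenticate, it may help to confirm that the key from step 4 is visible to Python. The following is a minimal sketch using only the standard library; `get_gemini_api_key` is a hypothetical helper, not part of `multimodal_agent.py`, and it assumes the app reads the key from `GOOGLE_API_KEY`.

```python
import os


def get_gemini_api_key(env=os.environ) -> str:
    """Return the Gemini API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; `env` defaults to the process
    environment and can be replaced with a plain dict.
    """
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; see step 4.")
    return key
```

Run it in the same shell session where the variable was set, since an exported variable is only visible to processes started from that shell.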