[PR #2412] [CLOSED] Added /screenshot command for multimodal model chats #10881

Closed
opened 2026-04-12 23:14:09 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2412
Author: @ac-99
Created: 2/8/2024
Status: Closed

Base: main ← Head: feature/multimodal-screenshot


📝 Commits (6)

  • c3ca0c2 added concise /screenshot instruction
  • 1882b29 updated go.mod and go.sum
  • f3f6131 added screenshot capturing function and command to interactive sessions
  • 3e5c384 more informative screenshot message
  • 03bbc2f added test for capture screenshots
  • 217194d updated screenshot trigger to account for /screenshot in any part of the input

📊 Changes

4 files changed (+96 additions, -0 deletions)

📝 cmd/interactive.go (+50 -0)
📝 cmd/interactive_test.go (+32 -0)
📝 go.mod (+4 -0)
📝 go.sum (+10 -0)

📄 Description

Added the ability to feed the current screen directly to multimodal models with a /screenshot command.

This enables a more dynamic experience: users can get contextual responses from their multimodal assistants more quickly and easily.

Example use cases

  1. Research assistant -- lets the multimodal LM use your current screen as context and suggest ideas, e.g. "what's this animal?"
  2. Study assistant -- lets the multimodal LM provide explanations, clarifications, and examples based on the text currently on screen, e.g. "explain this diagram"
  3. Design assistant -- get quick, direct input on designs

Usage

The user types /screenshot anywhere in the terminal input, in the same way a path/to/image is included today. Multiple displays are supported.

Implementation

  1. The /screenshot command is detected in the user input.
  2. captureScreenshots is called.
  3. Each screenshot is saved in the temporary directory (as returned by os.TempDir), with a file name based on the image size and screen index number.
  4. The resulting paths are appended to the user input line variable.

As a result, these paths are then processed in the same way as existing path/to/file.png images.

I also added some basic sanity checks with tests.

Issues

I don't seem to be able to run the tests locally for some reason, so I'd appreciate some support with that.

Requesting review and input from @jmorganca. I'm more than open to making changes or updates -- this is my first open-source contribution!


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:14:09 -05:00

Reference: github-starred/ollama#10881