[GH-ISSUE #7374] Reinstate OLLAMA_RUNNERS_DIR #30447

Closed
opened 2026-04-22 10:04:01 -05:00 by GiteaMirror · 5 comments

Originally created by @StarPet on GitHub (Oct 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7374

Originally assigned to: @dhiltgen on GitHub.

It appears that OLLAMA_RUNNERS_DIR was removed from the code; at least I couldn't find it with GitHub's search. Currently (0.3.14) Ollama is using /tmp/ollama<number>/runners again, as it did before OLLAMA_RUNNERS_DIR was introduced (or when it was not set).
IMHO, using /tmp for executables is not a good idea. I'd prefer to have some control over where the executables are extracted to.
So, please, bring back the OLLAMA_RUNNERS_DIR.

GiteaMirror added the feature request and needs more info labels 2026-04-22 10:04:02 -05:00

@rick-github commented on GitHub (Oct 26, 2024):

[OLLAMA_TMPDIR](https://github.com/ollama/ollama/blob/35ec7f079ff3d08b1b837bffe202107abaa00555/envconfig/config.go#L250)

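For context, the linked envconfig code is where that variable is read. As a rough illustration, here is a minimal Go sketch of the lookup pattern (an approximation, not the verbatim ollama source): prefer OLLAMA_TMPDIR, otherwise fall back to the OS default temp directory.

```go
package main

import (
	"fmt"
	"os"
)

// tmpDir mirrors the envconfig pattern: prefer OLLAMA_TMPDIR if set,
// otherwise fall back to the OS default temp directory.
func tmpDir() string {
	if dir := os.Getenv("OLLAMA_TMPDIR"); dir != "" {
		return dir
	}
	return os.TempDir() // typically /tmp on Linux
}

func main() {
	fmt.Println("temporary files will be staged under:", tmpDir())
}
```

Under that scheme, starting the server with `OLLAMA_TMPDIR=/var/lib/ollama/tmp ollama serve` would relocate the staging area away from /tmp.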

@StarPet commented on GitHub (Oct 26, 2024):

Thanks for pointing to OLLAMA_TMPDIR.

Changing the names of environment variables breaks existing setups.


@dhiltgen commented on GitHub (Nov 5, 2024):

Sorry for the churn. We're still refining how we handle tuning the behavior of the subprocess binaries across the supported OSes.

Can you clarify your use case and why you were leveraging this variable to change the default behavior? Depending on the scenario you're trying to solve for, #7499 might help make things simpler.


@StarPet commented on GitHub (Nov 6, 2024):

> Can you clarify your use case and why you were leveraging this variable to change the default behavior? Depending on the scenario you're trying to solve for, #7499 might help make things simpler.
Sure. Putting executables in /tmp is, IMHO, not a good idea. For one, some distributions mount /tmp as a RAM disk, so you take away precious memory (on smaller systems). Others put it on the system disk; since most disks are now SSDs, and Ollama is not exactly small, that means more writes to the system SSD and faster wear on its cells. Security may be another concern, since more users have access to /tmp; even though the sticky bit offers some protection, it is still not a good idea.
I'd like to put the runners on a fast NVMe drive, and I'd hope that Ollama checks on startup whether they are up to date and reuses the existing ones.

HTH,
Peter

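The reuse behavior Peter describes could be handled with a simple staleness check at startup. A hedged sketch, assuming the server records the version of the extracted payload in a marker file (the `version` file convention here is hypothetical, not ollama's actual scheme):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runnersUpToDate reports whether an existing extraction at dir matches
// wantVersion, by comparing against a marker file written after extraction.
// The "version" marker file is a hypothetical convention for this sketch.
func runnersUpToDate(dir, wantVersion string) bool {
	data, err := os.ReadFile(filepath.Join(dir, "version"))
	return err == nil && string(data) == wantVersion
}

func main() {
	dir := filepath.Join(os.TempDir(), "ollama-runners")
	const version = "0.3.14" // would come from the server build in practice

	if runnersUpToDate(dir, version) {
		fmt.Println("reusing existing runners in", dir)
		return
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		panic(err)
	}
	// ...extract the runner binaries into dir here, then record the version...
	if err := os.WriteFile(filepath.Join(dir, "version"), []byte(version), 0o644); err != nil {
		panic(err)
	}
	fmt.Println("extracted fresh runners into", dir)
}
```

With a check like this, a fixed directory on an NVMe drive would be populated once per release and reused on subsequent starts.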

@dhiltgen commented on GitHub (Nov 6, 2024):

Thanks for explaining. Yes, this is the intent of the documented OLLAMA_TMPDIR setting. We didn't document the runner variable in the help output since we weren't sure exactly how we wanted to structure the runners longer term as we evolve the architecture. Sorry that this was confusing.

```
% ollama serve --help
Start ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_TMPDIR              Location for temporary files
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")
```
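For completeness, the `/tmp/ollama<number>` directory mentioned in the issue is the pattern Go's `os.MkdirTemp` produces when given the prefix "ollama". A minimal sketch of how the staging path could be derived from OLLAMA_TMPDIR (an assumption about the mechanism, not the verbatim ollama source):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Honor OLLAMA_TMPDIR if set, otherwise use the OS default.
	root := os.Getenv("OLLAMA_TMPDIR")
	if root == "" {
		root = os.TempDir()
	}
	// MkdirTemp appends a random suffix, yielding e.g. /tmp/ollama1234567890.
	dir, err := os.MkdirTemp(root, "ollama")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	fmt.Println("runners would be extracted under:", filepath.Join(dir, "runners"))
}
```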