[GH-ISSUE #5237] OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS do not work in WSL2 #3277

Closed
opened 2026-04-12 13:49:24 -05:00 by GiteaMirror · 2 comments

Originally created by @Genesis1231 on GitHub (Jun 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5237

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I have been trying to get OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS working for the past two days,
but somehow it just doesn't work.

I added these two environment variables, but I can still only load one model and run one inference at a time.
There is certainly enough VRAM:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   45C    P5             14W /  105W |    5776MiB /  16376MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     41069      C   /ollama_llama_server                        N/A      |
+-----------------------------------------------------------------------------------------+

and I can echo both:

$ echo $OLLAMA_MAX_LOADED_MODELS
2
$ echo $OLLAMA_NUM_PARALLEL
2

Does anyone else have this problem? Am I doing something wrong? Thanks in advance.
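
For reference, if the server in WSL2 is started by hand rather than as a service, these variables have to be in the environment of the server process itself, not just the interactive shell. A minimal sketch (the values are the ones quoted above):

```shell
# Stop any already-running server, then start it with the variables set inline
# so the scheduler reads them at startup.
OLLAMA_NUM_PARALLEL=2 OLLAMA_MAX_LOADED_MODELS=2 ollama serve
```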

GiteaMirror added the needs more info label 2026-04-12 13:49:24 -05:00

@dhiltgen commented on GitHub (Jun 23, 2024):

The most likely explanation is that you're not setting these for the server. Ollama is a client-server architecture, and on a Linux system the server typically runs as a systemd service.
See https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server for instructions on how to configure the server.
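
Following that FAQ, a typical way to apply these to a systemd-managed server is a service override (a sketch of the documented approach; the values are the ones from this issue):

```shell
# Open an override file for the service and add the environment variables
# under [Service], e.g.:
#   [Service]
#   Environment="OLLAMA_NUM_PARALLEL=2"
#   Environment="OLLAMA_MAX_LOADED_MODELS=2"
sudo systemctl edit ollama.service

# Reload unit files and restart so the server starts with the new settings.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```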


@Genesis1231 commented on GitHub (Jun 24, 2024):

Thanks, it works now. It turned out I wasn't configuring that correctly.

Reference: github-starred/ollama#3277