[GH-ISSUE #4011] When my ollama has both the gemma and llama large models, how can I enable them at the same time? Thank you. #64524

Closed
opened 2026-05-03 17:58:05 -05:00 by GiteaMirror · 2 comments

Originally created by @joylijoy on GitHub (Apr 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4011

When my ollama has both the gemma and llama large models, how can I enable them at the same time? Does pressing CTRL+D mean to exit the large model? Thank you.


@L-A commented on GitHub (Apr 29, 2024):

It depends on what you're looking to do!

**Using a client**

If you want to use both models from a client that connects to `ollama`, you can specify their respective tags in the client, for example `llama3` if you're running the default tag, or `gemma:7b-instruct-v1.1-q8_0` for a specific quantization. Ollama exposes an API where an app can ask it to keep the model in memory.
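For the API route, a minimal sketch (assuming the default local server on `localhost:11434`; the `keep_alive` field asks the server to keep that model resident after the request):

```shell
# Ask llama3 a question and keep it loaded for 30 minutes afterwards
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "keep_alive": "30m"
}'

# Address gemma the same way; whether both stay resident at once
# depends on available memory and your server version
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:7b-instruct-v1.1-q8_0",
  "prompt": "Summarize this in one line.",
  "keep_alive": "30m"
}'
```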

**From the command line**

If you want to use both models from the command line at the same time, you can spawn two different tabs in your terminal (or however you manage multiple sessions) and run a different model in each. Make sure your machine has enough RAM to load both models in memory at the same time. You're right that exiting (the `/bye` command) would unload a model from RAM, but it doesn't remove it from your computer.
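A minimal sketch of that workflow (the model tags are whatever you have pulled locally):

```shell
# Terminal 1: interactive session with gemma
ollama run gemma

# Terminal 2: interactive session with llama3
ollama run llama3

# Either terminal: list which models are currently loaded in memory
# (available in newer CLI versions)
ollama ps
```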


@pdevine commented on GitHub (May 1, 2024):

As @L-A mentioned, just start the REPL twice (i.e. in one terminal run `ollama run gemma`, and in the other `ollama run llama3`). If you want both to run at the same time you'll need to have the 0.1.33 pre-release installed. Follow the directions on the release page [here](https://github.com/ollama/ollama/releases).
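For reference, a sketch of enabling that concurrency in the 0.1.33 pre-release (assuming its experimental environment variables, which may change in later releases):

```shell
# Set these where the ollama server runs, then restart it:
# keep up to two models loaded at once, and allow parallel requests per model
OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_NUM_PARALLEL=4 ollama serve
```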

I'll go ahead and close the issue.
