[GH-ISSUE #4367] better docs on python library settings #28485

Closed
opened 2026-04-22 06:41:53 -05:00 by GiteaMirror · 4 comments

Originally created by @nikhil-swamix on GitHub (May 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4367

What is the issue?

Many of the GPU-related options in the source file C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ollama\_types.py seem to do nothing; I've tried tweaking all of them! Please improve the documentation on them: of the resource allocation settings, only the use_mlock and use_mmap options work. GPU selection is completely opaque; it usually picks the most idle GPU. Also, changing any of the model init (aka load-time) options restarts the server. Is there a hot-reload setting for the model init params, like /set in interactive mode? Or is that not feasible because they can't be changed once the model is initialized?
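
To be concrete, by "/set in interactive mode" I mean the CLI's runtime parameter command (the model name is just an example):

ollama run llama3
>>> /set parameter num_ctx 4096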

For reference, the file I'm talking about, in particular num_gpu and main_gpu:

from typing import Sequence, TypedDict

class Options(TypedDict, total=False):
  # load-time options
  numa: bool
  num_ctx: int
  num_batch: int
  num_gpu: int
  main_gpu: int
  low_vram: bool
  f16_kv: bool
  logits_all: bool
  vocab_only: bool
  use_mmap: bool
  use_mlock: bool
  embedding_only: bool
  num_thread: int

  # runtime options
  num_keep: int
  seed: int
  num_predict: int
  top_k: int
  top_p: float
  tfs_z: float
  typical_p: float
  repeat_last_n: int
  temperature: float
  repeat_penalty: float
  presence_penalty: float
  frequency_penalty: float
  mirostat: int
  mirostat_tau: float
  mirostat_eta: float
  penalize_newline: bool
  stop: Sequence[str]
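
For reference, everything is currently passed as one flat dict to ollama.chat; a minimal sketch (the model name is just an example):

import ollama

# Load-time and runtime options all share one flat dict today.
response = ollama.chat(
    model="llama3",  # example model name
    messages=[{"role": "user", "content": "hello"}],
    options={
        "num_ctx": 32000,    # load-time: context window size
        "main_gpu": 1,       # load-time: GPU selection, opaque in practice
        "temperature": 0.2,  # runtime: sampling temperature
    },
)
print(response["message"]["content"])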

suggestion

Can the options be nested, like:
settings = {
  "cpu": {...},
  "gpu": {...},
  "llamacpp": {...},
}

# and a few aliases, like
max_tokens = num_ctx

# and converters from human-friendly values, like
num_ctx = "32k"  ->  32,000
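
A minimal sketch of what such an alias-and-converter layer could look like (OPTION_ALIASES, parse_human_size, and normalize_options are hypothetical names, not part of the ollama library):

# Hypothetical helpers -- nothing here exists in the ollama library.
OPTION_ALIASES = {"max_tokens": "num_ctx"}
INT_KEYS = {"num_ctx", "num_batch", "num_predict"}

def parse_human_size(value) -> int:
    # Convert strings like "32k" into integers; pass numbers through.
    if isinstance(value, str) and value.lower().endswith("k"):
        return int(float(value[:-1]) * 1000)
    return int(value)

def normalize_options(raw: dict) -> dict:
    # Resolve aliases and human-friendly strings into plain Options keys.
    out = {}
    for key, value in raw.items():
        key = OPTION_ALIASES.get(key, key)
        if key in INT_KEYS and isinstance(value, str):
            value = parse_human_size(value)
        out[key] = value
    return out

print(normalize_options({"max_tokens": "32k"}))  # {'num_ctx': 32000}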

however

With llama-cpp-python (from llama_cpp import Llama), this works: setting os.environ["CUDA_VISIBLE_DEVICES"] = "1" and various other options takes effect. Please add equivalent features; using Ollama programmatically, or through the OpenAI client, is very difficult. For example, if we want to customize parameters like mirostat through profiles such as ("creative", "precise", "balanced"), like Copilot, we need first-class support. The ollama.chat method also shows very little documentation in its docstring.
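
For example, profiles could be a thin layer over the existing options dict; a minimal sketch (PROFILES and chat_with_profile are hypothetical, the option keys are real ones from the Options TypedDict above, and the model name is just an example):

import ollama

# Hypothetical presets -- not part of the ollama library.
PROFILES = {
    "creative": {"temperature": 1.2, "top_p": 0.95, "mirostat": 2},
    "balanced": {"temperature": 0.8, "top_p": 0.9},
    "precise": {"temperature": 0.2, "top_k": 20, "repeat_penalty": 1.1},
}

def chat_with_profile(model, messages, profile="balanced"):
    # ollama.chat already accepts an options dict of Options keys.
    return ollama.chat(model=model, messages=messages, options=PROFILES[profile])

response = chat_with_profile(
    "llama3",
    [{"role": "user", "content": "Write a haiku."}],
    profile="creative",
)
print(response["message"]["content"])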

question

May I work on improving the documentation and on exposing more of the raw settings provided by llama.cpp? Let me know. Ollama provides a good level of automation but has customization issues.

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the python and documentation labels 2026-04-22 06:41:53 -05:00

@dhiltgen commented on GitHub (May 23, 2024):

Ollama is a client-server architecture, and the parameters you're referring to are the client-side settings. What's slightly confusing is that the num_gpu setting is actually the number of layers of the model to load onto the GPU. The docs for this part of the API should be improved; however, if you're looking for how to configure the GPU, see https://github.com/ollama/ollama/blob/main/docs/gpu.md
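
For illustration (the model name and layer count are placeholders), num_gpu controls how many layers are offloaded rather than which GPU is used:

import ollama

# num_gpu is a layer count, not a GPU index: 0 keeps the model on the CPU.
cpu_only = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "hello"}],
    options={"num_gpu": 0},
)

# Offload up to 20 layers to the GPU; the remainder runs on the CPU.
partial = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "hello"}],
    options={"num_gpu": 20},
)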


@nikhil-swamix commented on GitHub (May 23, 2024):

Hi @dhiltgen,
Indeed, at this point I've started learning Go to understand the internals better.
I'd like to ask: is https://wiki.mutable.ai/ollama/ollama any good? How do you view it,
and could it be included as a small link in the documentation?
It provides a great entry point for developers, and reads more like a developer guide than a user guide, but the concept is nice!
May I add it at the bottom of the README, under the title "Alternative/Autogenerated Documentation"?


@ParthSareen commented on GitHub (Dec 2, 2024):

Hey! Going to close this out as we've had a ton of updates on the Python library since the opening of this ticket. Happy to hop on a new one if needed - or feel free to reopen with updates :D


@nikhil-swamix commented on GitHub (Dec 2, 2024):

great


Reference: github-starred/ollama#28485