Mirror of https://github.com/ollama/ollama.git (synced 2026-03-11 17:34:04 -05:00)
Currently, context length is unbounded: the cache keeps growing indefinitely, regardless of the model's trained context length. This change caps it and enforces semantics similar to most cloud services:

- Long prompts result in an error, not truncation.
- Generation that would exceed the context length is stopped.
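A minimal sketch of the two enforcement points described above. The names (`checkPrompt`, `maxNewTokens`, `ErrPromptTooLong`) and the token-count interface are assumptions for illustration, not Ollama's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPromptTooLong is a hypothetical error returned instead of
// silently truncating a prompt that exceeds the context length.
var ErrPromptTooLong = errors.New("prompt exceeds model context length")

// checkPrompt errors out when the prompt alone would overflow the
// context window (the "error, not truncation" semantics).
func checkPrompt(promptTokens, numCtx int) error {
	if promptTokens > numCtx {
		return ErrPromptTooLong
	}
	return nil
}

// maxNewTokens returns how many tokens may still be generated before
// the context cap stops generation.
func maxNewTokens(promptTokens, numCtx int) int {
	remaining := numCtx - promptTokens
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	// A 5000-token prompt against a 4096-token context: rejected.
	fmt.Println(checkPrompt(5000, 4096))
	// A 4000-token prompt leaves 96 tokens of generation headroom.
	fmt.Println(maxNewTokens(4000, 4096))
}
```

Generation loops would decrement the remaining budget per emitted token and stop at zero rather than letting the cache grow past the trained context length.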