mirror of
https://github.com/ollama/ollama.git
synced 2026-04-29 15:38:27 -05:00
When truncating inputs to fit the context window at the beginning of a sequence, we remove the minimum amount possible. However, this may leave the cut point in the middle of a set of inputs that the model specified must not be split up. To avoid this, we also need to remove the rest of the partial batch.
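
The adjustment can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the `Input` type, its `SameBatch` field (assumed here to mean "this many following inputs must stay in the same batch as this one"), and the `truncateStart` helper are hypothetical names chosen for the example.

```go
package main

// Input is a hypothetical stand-in for one element of a sequence.
type Input struct {
	Token int
	// SameBatch is the number of inputs immediately following this one
	// that must be kept in the same batch as it (assumed convention).
	SameBatch int
}

// truncateStart drops inputs from the front so that at most limit remain.
// It starts from the minimum number of inputs to discard, then pushes the
// cut point forward whenever it would land inside a group that must not
// be split, so the rest of that partial group is removed as well.
func truncateStart(inputs []Input, limit int) []Input {
	if limit < 0 {
		limit = 0
	}
	if len(inputs) <= limit {
		return inputs // already fits, nothing to remove
	}

	// Minimum number of leading inputs that must go.
	discard := len(inputs) - limit

	// Only groups that start before the cut point can be split by it.
	for i := 0; i < discard && i < len(inputs); i++ {
		end := i + inputs[i].SameBatch // index of the last member of the group
		if inputs[i].SameBatch > 0 && discard <= end {
			// The group's first input is discarded but some members
			// would survive: discard the whole group instead.
			discard = end + 1
		}
	}

	if discard > len(inputs) {
		discard = len(inputs)
	}
	return inputs[discard:]
}
```

For example, with eight inputs where the input at index 2 has `SameBatch` set to 2 (so indices 2 through 4 form one group), truncating to a limit of 5 would naively cut at index 3. The sketch moves the cut to index 5 instead, so the result is shorter than the limit but contains no partial group.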