[GH-ISSUE #6437] How to use batch processing when using an LLM #50558


Originally created by @PassStory on GitHub (Aug 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6437

I noticed the API does not support processing a batch of prompts. GPU utilization is low, and I want to use a batch mode to improve GPU utilization and accelerate inference. How can I do that?


@rick-github commented on GitHub (Aug 20, 2024):

```bash
#!/bin/bash
# Run a batch of Ollama API requests from a JSONL file, one request per line.
# Each line follows the OpenAI batch format: {"custom_id": ..., "url": ..., "body": ...}.
# Requires jq and GNU parallel.

[ -z "$1" ] && { echo "usage: $0 batch-file" ; exit 1 ; }

export OLLAMA_HOST=${OLLAMA_HOST-localhost:11434}

# Extract the request fields from one JSONL line, POST the body to the
# server, and tag the response with the originating custom_id.
get_completion() {
  id="$(jq -r .custom_id <<< "$1")"
  url="$(jq -r .url <<< "$1")"
  body="$(jq -cr .body <<< "$1")"
  curl -s "$OLLAMA_HOST$url" -d "$body" | jq -c --arg id "$id" '{"custom_id":$id}+.'
}
export -f get_completion

# Fan the requests out across OLLAMA_NUM_PARALLEL concurrent jobs.
parallel --jobs ${OLLAMA_NUM_PARALLEL-1} get_completion < "$1"
```
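
For reference, a minimal batch file the script can consume might look like the sketch below. The field layout (`custom_id`, `url`, `body`) is exactly what the script reads with jq; the model name and prompts are illustrative, and `"stream": false` makes `/api/generate` return a single complete JSON object per request rather than a stream of chunks:

```
{"custom_id":"req-1","url":"/api/generate","body":{"model":"llama3.1","prompt":"Why is the sky blue?","stream":false}}
{"custom_id":"req-2","url":"/api/generate","body":{"model":"llama3.1","prompt":"Explain batching in one sentence.","stream":false}}
```

You would then run something like `OLLAMA_NUM_PARALLEL=4 ./batch.sh requests.jsonl` (script and file names here are placeholders). Note that sending requests concurrently from the client only helps if the Ollama server is also configured with a matching `OLLAMA_NUM_PARALLEL`, since that server-side setting caps how many requests are processed at the same time.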