[GH-ISSUE #687] Error: error reading llm response: bufio.Scanner: token too long #26073

Closed
opened 2026-04-22 01:58:47 -05:00 by GiteaMirror · 4 comments

Originally created by @reggi on GitHub (Oct 3, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/687

Originally assigned to: @BruceMacD on GitHub.

I'm passing in about 62 articles that I wrote from the web and trying to get some analysis on them, and I keep seeing this error:

> `Error: error reading llm response: bufio.Scanner: token too long`

Some text response comes back and then after a couple sentences it throws that error.

Is there a better way to do this, or is my machine just not capable? Running an M1 MacBook Air.

```bash
function generate_output() {
  echo -e "Below is a series of articles that I wrote\n"
  echo -e "\nEach article is prefixed with the string 'Here's another article:'\n"
  echo -e "\nStarting Articles\n"

  files=(*.txt)
  count=${#files[@]}
  i=0

  for file in "${files[@]}"; do
    cat "$file"
    ((i++))
    if [ $i -lt $count ]; then
      echo -e "\nHere's another article:\n"
    fi
  done

  echo -e "\nEnd of Articles\n"
  echo -e "\nCan you interpret some common themes from this series of articles?\n"
  # echo -e "\nWhat is the core theme from these articles?\n"
  # echo -e "\nWrite a bio for this author.\n"
}

ollama run mistral "$(generate_output)"
# ollama run codellama "$(generate_output)"
```


@BruceMacD commented on GitHub (Oct 3, 2023):

Thanks for reporting this. I've opened #692 to get the error fixed.

As a side note, the LLM's context window won't be able to fit a lot of long articles; they will be automatically truncated. A better approach would be to break this down into individual requests, then send a final request that asks the LLM to create a summary based on those shorter outputs.
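
For illustration, a minimal sketch of that approach in the same shell-script style (the per-article prompt wording and the `summaries.txt` scratch file are assumptions, not something prescribed in this thread):

```bash
#!/usr/bin/env bash
# Sketch: one small request per article, then a final request over the
# short per-article summaries instead of one giant prompt.

: > summaries.txt   # scratch file for the intermediate summaries

for file in *.txt; do
  # Each prompt holds a single article, which stays well inside the context window.
  ollama run mistral "Summarize the following article in three sentences:

$(cat "$file")" >> summaries.txt
  echo >> summaries.txt
done

# Final request over the much shorter combined summaries.
ollama run mistral "Below are short summaries of a series of articles by one author.
What common themes run through them?

$(cat summaries.txt)"
```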


@reggi commented on GitHub (Oct 3, 2023):

@BruceMacD thanks for the feedback. Just to confirm, should I do this via curl rather than piping / passing arguments via a shell script? Is memory able to persist across curl requests?
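
For reference, a single request over the local HTTP API might look roughly like the sketch below (assuming Ollama's default port 11434 and the `/api/generate` endpoint; `article.txt` is a placeholder, and `jq` is only used to build the JSON so newlines in the prompt survive intact):

```bash
# Sketch of one generation request against the local HTTP API.
prompt="Summarize the following article in three sentences:

$(cat article.txt)"   # article.txt is a placeholder file name

curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --arg model mistral --arg prompt "$prompt" \
        '{model: $model, prompt: $prompt, stream: false}')"
```

Each HTTP request stands on its own, so anything the model should remember has to be included in the prompt (or carried forward explicitly) rather than persisting on the server between calls.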


@BruceMacD commented on GitHub (Oct 4, 2023):

You could still use piping. I'd just suggest breaking up the piped requests into smaller prompts so that the model doesn't lose context.
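
As a rough sketch, breaking the piped input up per article could look like this (assuming `ollama run` reads piped stdin as the prompt when it is not attached to a terminal):

```bash
# Sketch: one piped request per article instead of one giant prompt.
for file in *.txt; do
  printf 'Summarize this article in three sentences:\n\n%s\n' "$(cat "$file")" \
    | ollama run mistral
done
```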


@reggi commented on GitHub (Oct 4, 2023):

@BruceMacD from experience, newlines (\n) are split and each line is sent as a separate prompt; I had no way of providing a block of text with newlines as a single prompt via piping.
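
One way to sidestep that, for illustration, is to keep passing each multi-line block as a single quoted argument (as the original script does) so the newlines never go through the pipe; `article.txt` is a placeholder:

```bash
# Workaround sketch: the whole multi-line text is one quoted argument,
# so the newlines stay inside a single prompt.
ollama run mistral "Summarize this article:

$(cat article.txt)"
```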
