[GH-ISSUE #11360] Add REST endpoint to count prompt tokens for selected model #7495

Closed
opened 2026-04-12 19:34:35 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Tarmenale2 on GitHub (Jul 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11360

It would be very useful to have a REST endpoint that returns the token count for a given prompt, like:

```http
POST /api/tokenize
{
  "model": "qwen:7b",
  "prompt": "Some input here"
}
```

Response:

```json
{ "token_count": 123 }
```

**Use case**:
Before generating a response, I want to decide which model to load based on prompt length — e.g. use a small-context model for short prompts and Qwen3 (with 32k+ context) for longer ones. Without token count info, I'd need to load a model just to check, which defeats the purpose.
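To make the use case concrete, here is a minimal client-side sketch of the routing this endpoint would enable. Note that `/api/tokenize` and its `token_count` response field are the *proposal* in this issue, not an existing Ollama API; the model names and the 4096-token cutoff are illustrative placeholders.

```python
import json
import urllib.request

# Hypothetical models and threshold -- adjust to whatever you actually run.
SMALL_MODEL = "llama3.2:1b"   # small-context model for short prompts
LARGE_MODEL = "qwen3:8b"      # 32k+ context model for long prompts
SMALL_CONTEXT_LIMIT = 4096    # illustrative routing cutoff, in tokens

def count_tokens(prompt: str, model: str = SMALL_MODEL,
                 base_url: str = "http://localhost:11434") -> int:
    """Call the PROPOSED /api/tokenize endpoint and return token_count.

    This only works if the feature requested in this issue is implemented;
    the request/response shapes mirror the example above.
    """
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/tokenize",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token_count"]

def pick_model(token_count: int) -> str:
    """Route short prompts to the small model, long ones to the large one."""
    return SMALL_MODEL if token_count <= SMALL_CONTEXT_LIMIT else LARGE_MODEL
```

The point is that `pick_model` can run *before* any model weights are loaded, which is exactly what is impossible today without a tokenize endpoint.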

GiteaMirror added the feature request label 2026-04-12 19:34:35 -05:00
Author
Owner

@rick-github commented on GitHub (Jul 10, 2025):

https://github.com/ollama/ollama/pull/8106


Reference: github-starred/ollama#7495