[GH-ISSUE #962] API Improvements #62505

Open
opened 2026-05-03 09:15:56 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @knoopx on GitHub (Nov 1, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/962

I'm currently writing a web UI for Ollama, but I find the API quite limited and cumbersome.
What is your vision/plan for it? Is it in a frozen state, or are you planning to improve it?

Here's some criticism:

  • Mixed model/generation endpoints; some namespacing would be nice.

  • Mixed `model`/`name` params that refer to the same thing.

  • `/api/tags`: why is this named "tags"?

  • `GET /api/tags` to list all available local models, but `POST /api/show` to get one?

  • Some endpoints throw errors, while others return `status` as a JSON property.

  • No way to query the available public models repository.

  • `POST /api/create`: doesn't allow specifying the `Modelfile` as raw text, so there's no way to create models without client-side file system access. There's also no way to specify the model file as a plain object. For this to work, `FROM` would need to handle remote resources as well.

  • `POST /api/show`: returns a string, which forces the client to parse it to get the actual data. It would be nice if it also returned a JSON object.

  • `POST /api/embeddings`: without batching support it is mostly useless.

  • `template` in `Modelfile`:
    To properly support chat agents, it would be nice to have a chat-specific generation endpoint and to be able to iterate over the messages in the `template`.
    Otherwise the feature itself is quite limited and requires the client to override and re-implement most of the logic (and it also needs to know all the underlying model parameters).
    (This is how Hugging Face does it: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json#L34)

    Example:

    Define the `Modelfile` `template` as:

    {{range .Messages}}
    <|{{ .Role }}|>
    {{ .Content }}
    </s>
    {{end}}
    

    instead of:

    {{- if .System }}
    <|system|>
    {{ .System }}
    </s>
    {{- end }}
    <|user|>
    {{ .Prompt }}
    </s>
    <|assistant|>
    

    and then passing the messages as a JSON array to

    POST /api/chat/generate

    {
        ...,
        "messages": [
            {"role": "system", "content": "you are an assistant"},
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hi there"}
        ]
    }
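
    The proposed message-iterating template above can be exercised directly with Go's standard `text/template` package (which is what `Modelfile` templates use). The following is a runnable sketch; the `Message` struct and `renderChat` helper are illustrative and not part of Ollama's code:

    ```go
    package main

    import (
    	"bytes"
    	"fmt"
    	"text/template"
    )

    type Message struct {
    	Role    string
    	Content string
    }

    // The proposed message-iterating template from the example above.
    const chatTemplate = `{{range .Messages}}<|{{ .Role }}|>
    {{ .Content }}
    </s>
    {{end}}`

    // renderChat applies the template to a message list, the way a
    // chat-aware generation endpoint could do server-side.
    func renderChat(messages []Message) (string, error) {
    	tmpl, err := template.New("chat").Parse(chatTemplate)
    	if err != nil {
    		return "", err
    	}
    	var buf bytes.Buffer
    	data := struct{ Messages []Message }{messages}
    	if err := tmpl.Execute(&buf, data); err != nil {
    		return "", err
    	}
    	return buf.String(), nil
    }

    func main() {
    	out, err := renderChat([]Message{
    		{Role: "system", Content: "you are an assistant"},
    		{Role: "user", Content: "hello"},
    	})
    	if err != nil {
    		panic(err)
    	}
    	fmt.Print(out)
    }
    ```

    Because the template ranges over `.Messages`, the server no longer needs separate `.System`/`.Prompt` slots, and the client never has to know the model's wire format.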
    

Here's my app if you want to have a peek: https://github.com/knoopx/llm-workbench

GiteaMirror added the feature request and api labels 2026-05-03 09:16:16 -05:00
Author
Owner

@mysticfall commented on GitHub (Nov 25, 2023):

+1 for the chat agent support and potential template format change.

Even though LangChain supports Ollama out of the box, its model implementation is wrong because it uses its own prompt format (i.e. an Alpaca-like one) to preprocess the input, which is then wrapped again with a model-specific prompt template once the request reaches the server. (See https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/chat_models/ollama.ts#L256)

It's a problem that LangChain should fix, but the real issue is that there's no way to correctly implement the model with how Ollama currently handles the prompt template.

To be specific, LangChain presupposes that a chat model can process a list of messages in a single prompt, where each message can come from the system, the user, or the AI.

But even if we changed `ChatOllama` to query the model-specific template and send the formatted prompt using the `raw` parameter, there would be no way to parse the template to extract the proper format for each message type.

Author
Owner

@qsdhj commented on GitHub (May 14, 2024):

Hi, is someone working on the feature to enable batch processing for embeddings? Without it, the feature is not usable except for basic testing with small corpora of text.
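
Until batching lands, a client embedding a corpus has to issue one request per text. A minimal sketch of that overhead, assuming the single-prompt request shape (`model` + `prompt`) that `/api/embeddings` accepts — the `embeddingRequests` helper is illustrative, not part of any client library:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingRequests builds one /api/embeddings payload per input text,
// since the endpoint only accepts a single prompt at a time. With a
// batched endpoint, all texts could go in one request instead.
func embeddingRequests(model string, texts []string) []string {
	payloads := make([]string, 0, len(texts))
	for _, t := range texts {
		body, _ := json.Marshal(map[string]string{
			"model":  model,
			"prompt": t,
		})
		payloads = append(payloads, string(body))
	}
	return payloads
}

func main() {
	reqs := embeddingRequests("llama2", []string{"hello", "world"})
	// N texts means N round trips to the server.
	fmt.Println(len(reqs))
	fmt.Println(reqs[0])
}
```

For a corpus of N documents this means N HTTP round trips and N separate model invocations, which is what makes the endpoint impractical beyond small tests.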

Author
Owner

@IvanoBilenchi commented on GitHub (May 17, 2024):

Batch embeddings really are a must for the whole embeddings feature to be usable. It looks like some work was done in #3642, though it's been in draft state for a while.

Reference: github-starred/ollama#62505