[GH-ISSUE #5461] Webhook support #49928

Open
opened 2026-04-28 13:25:55 -05:00 by GiteaMirror · 2 comments

Originally created by @drale2k on GitHub (Jul 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5461

Now that Ollama supports parallel requests (https://github.com/ollama/ollama/issues/358), I would like to suggest support for webhooks on completed generations.

The reason is that longer-running tasks like summarization can take some time, and Ollama will queue up generations if you send more than your server can handle in parallel. This forces my web app to keep an open handle to each HTTP request, which exhausts my server resources, since I have to do hundreds of summarizations per day.

If we could have a simple webhook to report status changes of a generation (completed, error), parallelisation would be much more useful. I think Replicate does a good job here; a first implementation could be inspired by [their API](https://replicate.com/docs/webhooks) but be smaller in scope.
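For illustration, the request shape might look something like the sketch below. It is loosely modeled on Replicate's webhook API; the `webhook` field, the job-ID response, and the callback payload are all assumptions, not an existing Ollama API.

```python
import requests

# Hypothetical: register a callback URL with a generate request.
# Instead of blocking until the generation finishes, Ollama would
# answer immediately with a job ID (a sketch only, not a real API).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:latest",
        "prompt": "Summarize this article: ...",
        "stream": False,
        "webhook": "https://example.com/ollama-callback",  # assumed field
    },
)
print(resp.json())  # e.g. {"job": "2b0d4ca0-..."} in this sketch

# On completion or error, Ollama would POST the final response body
# (with a status field) to the webhook URL, so no connection stays open.
```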

GiteaMirror added the feature request label 2026-04-28 13:25:55 -05:00

@ZhelinCheng commented on GitHub (Dec 2, 2024):

I also think this feature is very much needed.


@craiga commented on GitHub (Jan 31, 2025):

FWIW I'm working on a project which will queue requests to Ollama, and then send webhooks once they've been run.

Webhook support in Ollama itself would be much nicer, but this suits my use case of wanting to interact with Ollama from a web app.

If there's interest, I'd be happy to open source this work.

So far I've got it working by configuring the Ollama Python client to send requests to my webhook server instead of Ollama.

My webhook server sends a simulated Ollama response and adds the request to a queue. A worker in the background sends those requests on to Ollama, then sends a POST request to a configured URL once it's done.
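Roughly, that pattern could be sketched like this. This is a minimal sketch of the mechanism described, not the actual project: the endpoint, `WEBHOOK_URL`, and the response shape are assumptions.

```python
import queue
import threading
import uuid

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs: queue.Queue = queue.Queue()

OLLAMA_URL = "http://localhost:11434"           # the real Ollama server
WEBHOOK_URL = "http://localhost:8000/callback"  # assumed receiver URL

@app.post("/api/generate")
def enqueue():
    # Answer immediately with a simulated Ollama response and queue the job.
    job_id = str(uuid.uuid4())
    jobs.put((job_id, request.get_json()))
    return jsonify({"job": job_id, "response": f"Queued as job {job_id}."})

def worker():
    while True:
        job_id, payload = jobs.get()
        # Force a non-streaming request so Ollama returns a single JSON body,
        # then forward that body to the configured webhook URL.
        payload = {**payload, "stream": False}
        result = requests.post(f"{OLLAMA_URL}/api/generate", json=payload).json()
        requests.post(WEBHOOK_URL, json={"job": job_id, **result})

threading.Thread(target=worker, daemon=True).start()
app.run(port=11435)
```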

In practice, it looks like this:

```python
In [1]: import ollama

In [2]: o = ollama.Client("http://localhost:11435")  # my webhook server

In [3]: o.generate(model="llama3.2:latest", prompt="Hey, tell me a funny joke!")
Out[3]: GenerateResponse(
    model=None,
    created_at='2025-01-31 20:25:22.798389+00:00',
    done=None,
    done_reason=None,
    total_duration=None,
    load_duration=None,
    prompt_eval_count=None,
    prompt_eval_duration=None,
    eval_count=None,
    eval_duration=None,
    response=(
        "You have sent a request to Ollama webhooks. Your request has entered a queue."
        " Once it has been processed, an HTTP POST request will be sent to"
        " http://localhost:8000/ollama-webhooks/job=2b0d4ca0-8298-43c5-9400-4a70059f8a7c"
        " with Ollama's response in that request's body. You can check these details again"
        " by visiting http://localhost:11435/jobs/2b0d4ca0-8298-43c5-9400-4a70059f8a7c/."
        "\n\n"
        "I've sent the job ID in this JSON response, but have also included it below just"
        " in case you need to parse it from this message."
        "\n\n"
        "2b0d4ca0-8298-43c5-9400-4a70059f8a7c"
    ),
    context=None
)
```

The raw JSON being returned includes a "job" key with the job ID, but I couldn't figure out a way to get at this data through the client library as-is, so I also put all of those details in the simulated response.
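For what it's worth, bypassing the client library and posting plain HTTP would get at that key directly. A sketch, assuming the top-level "job" key described above:

```python
import requests

# Talk to the webhook proxy directly instead of via the Ollama client,
# so the raw JSON (including the "job" key) is available as-is.
raw = requests.post(
    "http://localhost:11435/api/generate",  # the proxy, not Ollama itself
    json={"model": "llama3.2:latest", "prompt": "Hey, tell me a funny joke!"},
).json()
print(raw["job"])  # e.g. "2b0d4ca0-8298-43c5-9400-4a70059f8a7c"
```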
