[GH-ISSUE #10255] litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Ollama_chatException - {"error":"json: cannot unmarshal array into Go struct field ChatRequest.messages.content of type string"} #6729

Closed
opened 2026-04-12 18:29:17 -05:00 by GiteaMirror · 3 comments

Originally created by @liatoutou on GitHub (Apr 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10255

What is the issue?

When using the new Google ADK with ollama_chat and litellm (1.65.8 and 1.66.0) in this script:

import datetime
from zoneinfo import ZoneInfo
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (41 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }


def get_current_time(city: str) -> dict:
    """Returns the current time in a specified city.

    Args:
        city (str): The name of the city for which to retrieve the current time.

    Returns:
        dict: status and result or error msg.
    """

    if city.lower() == "new york":
        tz_identifier = "America/New_York"
    else:
        return {
            "status": "error",
            "error_message": (
                f"Sorry, I don't have timezone information for {city}."
            ),
        }

    tz = ZoneInfo(tz_identifier)
    now = datetime.datetime.now(tz)
    report = (
        f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'
    )
    return {"status": "success", "report": report}


root_agent = Agent(
    name="weather_time_agent",
    model=LiteLlm(
        model='ollama_chat/mistral-small3.1'
    ),
    description=(
        "Agent to answer questions about the time and weather in a city."
    ),
    instruction=(
        "I can answer your questions about the time and weather in a city."
    ),
    tools=[get_weather, get_current_time],
)

I get the following error:

Relevant log output

LLM Request:
-----------------------------------------------------------
System Instruction:
I can answer your questions about the time and weather in a city.

You are an agent. Your internal name is "weather_time_agent".

 The description about you is "Agent to answer questions about the time and weather in a city."
-----------------------------------------------------------
Contents:
{"parts":[{"text":"Can you tell me the time?"}],"role":"user"}
{"parts":[{"text":"In which city?"}],"role":"model"}
{"parts":[{"text":"New york"}],"role":"user"}
{"parts":[{"function_call":{"id":"95ce73b8-91cd-4f3f-a450-afbd6acf6dae","args":{"city":"New york"},"name":"get_current_time"}}],"role":"model"}  
{"parts":[{"function_response":{"id":"95ce73b8-91cd-4f3f-a450-afbd6acf6dae","name":"get_current_time","response":{"status":"success","report":"The current time in New york is 2025-04-13 02:07:14 EDT-0400"}}}],"role":"user"}
-----------------------------------------------------------
Functions:
get_weather: {'city': {'type': <Type.STRING: 'STRING'>}} -> None
get_current_time: {'city': {'type': <Type.STRING: 'STRING'>}} -> None
-----------------------------------------------------------

08:07:14 - LiteLLM:INFO: utils.py:3085 -
LiteLLM completion() model= mistral-small3.1; provider = ollama_chat
2025-04-13 08:07:14,271 - INFO - utils.py:3085 -
LiteLLM completion() model= mistral-small3.1; provider = ollama_chat
2025-04-13 08:07:14,297 - INFO - _client.py:1025 - HTTP Request: POST http://localhost:11434/api/show "HTTP/1.1 200 OK"
08:07:14 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: ollama_chat/mistral-small3.1
2025-04-13 08:07:14,297 - INFO - cost_calculator.py:636 - selected model name for cost calculation: ollama_chat/mistral-small3.1
2025-04-13 08:07:14,324 - INFO - _client.py:1025 - HTTP Request: POST http://localhost:11434/api/show "HTTP/1.1 200 OK"
2025-04-13 08:07:14,349 - INFO - _client.py:1025 - HTTP Request: POST http://localhost:11434/api/show "HTTP/1.1 200 OK"
2025-04-13 08:07:16,363 - INFO - _client.py:1025 - HTTP Request: POST http://localhost:11434/api/show "HTTP/1.1 200 OK"

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

2025-04-13 08:07:16,629 - ERROR - fast_api.py:616 - Error in event_generator: litellm.APIConnectionError: Ollama_chatException - {"error":"json: cannot unmarshal array into Go struct field ChatRequest.messages.content of type string"}
Traceback (most recent call last):
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\main.py", line 477, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\llms\ollama_chat.py", line 607, in ollama_acompletion 
    raise e  # don't use verbose_logger.exception, if exception is raised
    ^^^^^^^
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\llms\ollama_chat.py", line 546, in ollama_acompletion 
    raise OllamaError(status_code=resp.status, message=text)
litellm.llms.ollama_chat.OllamaError: {"error":"json: cannot unmarshal array into Go struct field ChatRequest.messages.content of type string"}  

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\cli\fast_api.py", line 605, in event_generator     
    async for event in runner.run_async(
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\runners.py", line 197, in run_async
    async for event in invocation_context.agent.run_async(invocation_context):
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\agents\base_agent.py", line 141, in run_async      
    async for event in self._run_async_impl(ctx):
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\agents\llm_agent.py", line 232, in _run_async_impl 
    async for event in self._llm_flow.run_async(ctx):
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 231, in run_async
    async for event in self._run_one_step_async(invocation_context):
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 257, in _run_one_step_async
    async for llm_response in self._call_llm_async(
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 470, in _call_llm_async
    async for llm_response in llm.generate_content_async(
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\models\lite_llm.py", line 658, in generate_content_async
    response = await self.llm_client.acompletion(**completion_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\google\adk\models\lite_llm.py", line 88, in acompletion       
    return await acompletion(
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\utils.py", line 1452, in wrapper_async
    raise e
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\utils.py", line 1313, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\main.py", line 496, in acompletion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 2214, in exception_type
    raise e
  File "C:\Users\benni\Documents\nosana\demos\agent-sandbox\venv\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 2183, in exception_type
    raise APIConnectionError(
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Ollama_chatException - {"error":"json: cannot unmarshal array into Go struct field ChatRequest.messages.content of type string"}

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.6.5

GiteaMirror added the bug label 2026-04-12 18:29:17 -05:00

@rick-github commented on GitHub (Apr 13, 2025):

The litellm library adds a tool_call_id field to the API call, which is not supported by the native Ollama API. It is, however, supported by the OpenAI compatibility endpoint:

--- agent.py.orig	2025-04-14 01:48:17.649535644 +0200
+++ agent.py	2025-04-14 01:48:36.530916307 +0200
@@ -58,7 +58,8 @@
 root_agent = Agent(
     name="weather_time_agent",
      model=LiteLlm(
-        model='ollama_chat/mistral-small3.1'
+        api_base='http://localhost:11434/v1',
+        model='openai/mistral-small3.1'
     ),
     description=(
         "Agent to answer questions about the time and weather in a city."
$ adk run .
Log setup complete: /tmp/agents_log/agent.20250414_014844.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
Running agent weather_time_agent, type exit to exit.
user: what is the time in new york?
01:48:53 - LiteLLM:INFO: utils.py:3085 - 
LiteLLM completion() model= mistral-small3.1; provider = openai
01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1
01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1
01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1
01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: mistral-small3.1
01:48:58 - LiteLLM:INFO: utils.py:3085 - 
LiteLLM completion() model= mistral-small3.1; provider = openai
01:49:07 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1
01:49:07 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1
[weather_time_agent]: The current time in new york is 2025-04-13 19:48:58 EDT-0400

<!-- gh-comment-id:2800188768 --> @rick-github commented on GitHub (Apr 13, 2025): The litellm library is adding a `tool_call_id` to the API call which is not supported in the ollama API. It is, however, supported in the OpenAI compatibility endpoint: ```diff --- agent.py.orig 2025-04-14 01:48:17.649535644 +0200 +++ agent.py 2025-04-14 01:48:36.530916307 +0200 @@ -58,7 +58,8 @@ root_agent = Agent( name="weather_time_agent", model=LiteLlm( - model='ollama_chat/mistral-small3.1' + api_base='http://localhost:11434/v1', + model='openai/mistral-small3.1' ), description=( "Agent to answer questions about the time and weather in a city." ``` ```console $ adk run . Log setup complete: /tmp/agents_log/agent.20250414_014844.log To access latest log: tail -F /tmp/agents_log/agent.latest.log Running agent weather_time_agent, type exit to exit. user: what is the time in new york? 01:48:53 - LiteLLM:INFO: utils.py:3085 - LiteLLM completion() model= mistral-small3.1; provider = openai 01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1 01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1 01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1 01:48:58 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: mistral-small3.1 01:48:58 - LiteLLM:INFO: utils.py:3085 - LiteLLM completion() model= mistral-small3.1; provider = openai 01:49:07 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1 01:49:07 - LiteLLM:INFO: cost_calculator.py:636 - selected model name for cost calculation: openai/mistral-small3.1 [weather_time_agent]: The current time in new york is 2025-04-13 19:48:58 EDT-0400 ```

@kevensen commented on GitHub (Apr 19, 2025):

I am seeing this as well

OS
Linux

GPU
Nvidia

CPU
AMD

Ollama version
0.6.5

<!-- gh-comment-id:2816751581 --> @kevensen commented on GitHub (Apr 19, 2025): I am seeing this as well **OS** Linux **GPU** Nvidia **CPU** AMD **Ollama version** 0.6.5

@rick-github commented on GitHub (Apr 21, 2025):

Have you tried https://github.com/ollama/ollama/issues/10255#issuecomment-2800188768 ?

<!-- gh-comment-id:2819063065 --> @rick-github commented on GitHub (Apr 21, 2025): Have you tried https://github.com/ollama/ollama/issues/10255#issuecomment-2800188768 ?
Reference: github-starred/ollama#6729