[GH-ISSUE #11102] model magistral is not working by python api ? #7323

Closed
opened 2026-04-12 19:22:21 -05:00 by GiteaMirror · 1 comment

Originally created by @fredmo on GitHub (Jun 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11102

What is the issue?

Hello,

I am not able to use the magistral model with the following Python script.
Could you please fix or help?

python my_script.py "test"

=> It does not answer with magistral.
=> It does answer with dolphin3.

For info, magistral works fine interactively via ollama run magistral.

(Tested on both Windows and Ubuntu.)

#!/usr/bin/env python3
import sys
from ollama import chat

# Force stdout to UTF-8 (mainly useful on Windows)
sys.stdout.reconfigure(encoding="utf-8", errors="surrogateescape")

def test_ollama(prompt):
    try:
        #response = chat(model="dolphin3", messages=[{'role': 'user', 'content': prompt}])
        response = chat(model="magistral", messages=[{'role': 'user', 'content': prompt}])
        message = response['message']['content']
        print(message)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)

if __name__ == "__main__":
    prompt = None

    # 1. If an argument is provided
    if len(sys.argv) > 1:
        prompt = sys.argv[1]

    # 2. Otherwise, read from stdin (useful for SSH or redirection)
    elif not sys.stdin.isatty():
        try:
            prompt = sys.stdin.read()
        except Exception as e:
            print(f"Error reading from stdin: {e}", file=sys.stderr)
            sys.exit(1)

    # 3. If nothing is provided
    else:
        print("Please provide an argument or standard input.", file=sys.stderr)
        sys.exit(1)

    # Call the main function
    test_ollama(prompt)

Relevant log output

Pressing Ctrl+C during the hang produces the following traceback:

python my_script.py "test?"
Traceback (most recent call last):
  File "C:\Users\me\my_script.py", line 40, in <module>
    test_ollama(prompt)
  File "C:\Users\me\my_script.py", line 13, in test_ollama
    response = chat(model="magistral", messages=[{'role': 'user', 'content': prompt}])
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\ollama\_client.py", line 342, in chat
    return self._request(
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\ollama\_client.py", line 180, in _request
    return cls(**self._request_raw(*args, **kwargs).json())
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\ollama\_client.py", line 120, in _request_raw
    r = self._client.request(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 825, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 914, in send
    response = self._send_handling_auth(
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 1014, in _send_single_request
    response = transport.handle_request(request)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 250, in handle_request
    resp = self._pool.handle_request(req)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\connection_pool.py", line 256, in handle_request
    raise exc from None
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\connection_pool.py", line 236, in handle_request
    response = connection.handle_request(
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\connection.py", line 103, in handle_request
    return self._connection.handle_request(request)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 136, in handle_request
    raise exc
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 106, in handle_request
    ) = self._receive_response_headers(**kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 177, in _receive_response_headers
    event = self._receive_event(timeout=timeout)
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 217, in _receive_event
    data = self._network_stream.read(
  File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_backends\sync.py", line 128, in read
    return self._sock.recv(max_bytes)
KeyboardInterrupt
^C

OS

Windows

GPU

No response

CPU

No response

Ollama version

ollama version is 0.9.0

GiteaMirror added the bug label 2026-04-12 19:22:21 -05:00

@rick-github commented on GitHub (Jun 17, 2025):

$ ./11102.py hello
<think>
Okay, the user has said "hello". This seems like a greeting. I should respond in a friendly manner to acknowledge the greeting. Since it's just a hello, there might not be any specific task or question here yet. Maybe they want to start a conversation or are testing if I'm available.

But according to the instructions, the user will ask me to solve a task after this. So for now, I should just respond to the greeting.

What's an appropriate response to "hello"? It could be:
- "Hello!"
- "Hi there!"
- "Hey!"

I think any of these would work. Since it's casual and friendly, I'll go with "Hello!" to match their greeting.

But wait, the problem is just "hello". Is this a task or just a greeting? The instruction says: "A user will ask you to solve a task." So "hello" might be separate from any task. Maybe they're just saying hello first before asking for help with something else later.

For now, I'll treat it as a greeting and respond accordingly.
</think>

Summary:
The input is a simple greeting, "hello". As there's no specific task or question to solve at this point, the appropriate response is to acknowledge the greeting in a friendly manner. Therefore, the response should mirror the positivity of the initial message while keeping it concise.

Final answer:

Hello! How can I assist you today?

\boxed{Hello!}

magistral is a much larger model than dolphin3 (14 GB vs 4.9 GB). You didn't indicate your hardware, but it's possible that magistral does not entirely fit in the GPU and therefore runs slower. Additionally, you are not streaming the output, so you have to wait for the entire inference to complete before you see anything. This is different from the ollama run experience, where you see tokens as soon as the model emits them. Also note that magistral is a thinking model, so it generates far more tokens than dolphin3, which makes the wait for output even longer.
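If it helps, here is a minimal streaming sketch of the same call (assuming the official ollama-python client, where stream=True makes chat() return an iterator of partial responses). It is not the reporter's script, just an illustration:

#!/usr/bin/env python3
# Streaming variant: print tokens as soon as the model emits them,
# much like the `ollama run` experience, instead of waiting for the
# full thinking + answer to finish.
import sys
from ollama import chat

def stream_ollama(prompt):
    try:
        # With stream=True, chat() yields partial responses as they arrive.
        for chunk in chat(
            model="magistral",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        ):
            print(chunk["message"]["content"], end="", flush=True)
        print()
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)

if __name__ == "__main__":
    stream_ollama(sys.argv[1] if len(sys.argv) > 1 else "hello")

With this, the long silence turns into visible progress: you should see the <think> section arriving token by token before the final answer.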

<!-- gh-comment-id:2982000640 --> @rick-github commented on GitHub (Jun 17, 2025): ```console $ ./11102.py hello <think> Okay, the user has said "hello". This seems like a greeting. I should respond in a friendly manner to acknowledge the greeting. Since it's just a hello, there might not be any specific task or question here yet. Maybe they want to start a conversation or are testing if I'm available. But according to the instructions, the user will ask me to solve a task after this. So for now, I should just respond to the greeting. What's an appropriate response to "hello"? It could be: - "Hello!" - "Hi there!" - "Hey!" I think any of these would work. Since it's casual and friendly, I'll go with "Hello!" to match their greeting. But wait, the problem is just "hello". Is this a task or just a greeting? The instruction says: "A user will ask you to solve a task." So "hello" might be separate from any task. Maybe they're just saying hello first before asking for help with something else later. For now, I'll treat it as a greeting and respond accordingly. </think> Summary: The input is a simple greeting, "hello". As there's no specific task or question to solve at this point, the appropriate response is to acknowledge the greeting in a friendly manner. Therefore, the response should mirror the positivity of the initial message while keeping it concise. Final answer: Hello! How can I assist you today? \boxed{Hello!} ``` magistral is a much larger model than dolphin3 (14G vs 4.9G). You didn't indicate your hardware but it's possible that magistral does not enitrely fit in the GPU and so will run slower. Additionally, you are not streaming the output, so you have to wait for the entire inference to complete before you see output. This is different to the `ollama run` experience where you get to see tokens as soon as the model emits them. Also note that magistral is a thinking model, so it will generate a lot more tokens than dolphin3, which will make the wait for output even longer.
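As for whether magistral entirely fits in VRAM: recent versions of ollama-python expose ps(), which mirrors the server's /api/ps endpoint. The field names below (size, size_vram) follow that endpoint but may vary by version, so treat this as a sketch:

import ollama

# Sketch: report how much of each loaded model is resident in VRAM.
# Assumes a recent ollama-python that exposes ps(); run it while the
# model is loaded (e.g. right after a chat() call).
for m in ollama.ps()["models"]:
    total, vram = m["size"], m["size_vram"]
    pct = 100 * vram / total if total else 0
    print(f"{m['name']}: {vram}/{total} bytes in VRAM ({pct:.0f}%)")

Anything well below 100% in VRAM means part of the model is running on CPU, which would explain the slowdown relative to dolphin3.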