[GH-ISSUE #6355] Llama 3.1 Tools do not work properly #66025

Closed
opened 2026-05-03 23:39:24 -05:00 by GiteaMirror · 1 comment

Originally created by @nomisto on GitHub (Aug 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6355

### What is the issue?

I can't get tool calling with Llama 3.1 to work properly.

```python
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the original user question.'},
    {'role': 'user', 'content': 'What is the weather in Toronto?'},
    {'role': 'assistant',
     'content': '',
     'tool_calls': [{'function': {'name': 'get_current_weather',
                                  'arguments': {'city': 'Toronto'}},
                     'id': 'call_6duDxk',
                     'type': 'function'}]},
    {'role': 'ipython',
     'tool_call_id': 'call_6duDxk',
     'name': 'get_current_weather',
     'content': '{"city": "Toronto", "weather": "sunny"}'}]
```

Using ollama, I get the following (hallucinated) output:

```python
response = ollama.chat(
    model='llama3.1:8b-instruct-fp16',
    messages=messages,
    tools=tools
)
response
```

```python
{'model': 'llama3.1:8b-instruct-fp16',
 'created_at': '2024-08-14T09:12:15.708397447Z',
 'message': {'role': 'assistant',
  'content': ' \n\nThe current weather in Toronto is mostly cloudy with a temperature of 22°C (72°F) and a gentle breeze of 15 km/h (9 mph).'},
 'done_reason': 'stop',
 'done': True,
 'total_duration': 1352319726,
 'load_duration': 70634597,
 'prompt_eval_count': 101,
 'prompt_eval_duration': 44061000,
 'eval_count': 34,
 'eval_duration': 1013216000}
```

Using *plain* transformers, I get the correct answer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
device = "cuda:5"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_tensors="pt").to(device)
outputs = model.generate(inputs, do_sample=False, max_new_tokens=256)
response = tokenizer.batch_decode(outputs[:, inputs.shape[1]:], skip_special_tokens=True)[0]
response
```

```text
'The current weather in Toronto is sunny.'
```

Maybe this is related to #6129?

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.3.5

GiteaMirror added the bug label 2026-05-03 23:39:24 -05:00

@nomisto commented on GitHub (Aug 14, 2024):

Using "role": "tool" instead of "role": "ipython" resolved the issue. I don't know if this is intended or not.

Closing this since template of llama 3.1 hints that this is intended:
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>
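For anyone hitting the same problem, a minimal sketch of the workaround: leave the rest of `messages` unchanged and send the tool result with `role: 'tool'`; per the template line above, Ollama itself renders that message under the `ipython` header.

```python
# Sketch of the fix described above: only the role of the tool-result
# message changes from 'ipython' to 'tool'. Ollama's Llama 3.1 template
# maps role "tool" onto the <|start_header_id|>ipython<|end_header_id|>
# section of the prompt, so no other change should be needed.
messages[-1] = {'role': 'tool',
                'tool_call_id': 'call_6duDxk',
                'name': 'get_current_weather',
                'content': '{"city": "Toronto", "weather": "sunny"}'}

response = ollama.chat(model='llama3.1:8b-instruct-fp16',
                       messages=messages, tools=tools)
```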

Reference: github-starred/ollama#66025