[GH-ISSUE #9275] GenerateResponse.Metrics incorrect for model load #68102

Open
opened 2026-05-04 12:33:07 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @stevenh on GitHub (Feb 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9275

What is the issue?

If you run a Generate call just to load a model (no prompt), the resulting GenerateResponse.Metrics are all zero. For example:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/ollama/ollama/api"
)

func loadModel(model string) error {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		return fmt.Errorf("create client: %w", err)
	}

	fmt.Println("Loading model:", model)
	start := time.Now()
	defer func() {
		fmt.Println("Loaded model", model, "took", time.Since(start))
	}()
	if err = client.Generate(context.Background(), &api.GenerateRequest{
		Model: model,
	}, func(resp api.GenerateResponse) error {
		fmt.Printf("Generate response: %#v\n", resp)
		return nil
	}); err != nil {
		return fmt.Errorf("load: %w", err)
	}

	return nil
}

func main() {
	if err := loadModel("llama3.3"); err != nil {
		log.Fatal("load failed:", err)
	}
}

Results in:

go run main.go 
Loading model: llama3.3
api.GenerateResponse{Model:"llama3.3", CreatedAt:time.Date(2025, time.February, 21, 15, 32, 43, 86762200, time.UTC), Response:"", Done:true, DoneReason:"load", Context:[]int(nil), Metrics:api.Metrics{TotalDuration:0, LoadDuration:0, PromptEvalCount:0, PromptEvalDuration:0, EvalCount:0, EvalDuration:0}}
Loaded model llama3.3 took 43.8886362s

As you can see, the load actually took ~43 seconds, but TotalDuration and LoadDuration are both zero.

Relevant log output

api.GenerateResponse{Model:"llama3.3", CreatedAt:time.Date(2025, time.February, 21, 15, 32, 43, 86762200, time.UTC), Response:"", Done:true, DoneReason:"load", Context:[]int(nil), Metrics:api.Metrics{TotalDuration:0, LoadDuration:0, PromptEvalCount:0, PromptEvalDuration:0, EvalCount:0, EvalDuration:0}}

OS

Windows 11

GPU

Nvidia Geforce RTX 4070 Laptop GPU 8GB

CPU

Intel Core i9 14900hx

Ollama version

ollama version is 0.5.11

GiteaMirror added the bug label 2026-05-04 12:33:07 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 21, 2025):

ollama currently doesn't generate statistics for a load operation.

$ curl -s localhost:11434/api/generate -d '{"model":"qwen2.5:0.5b"}' | jq
{
  "model": "qwen2.5:0.5b",
  "created_at": "2025-02-21T16:24:24.000933491Z",
  "response": "",
  "done": true,
  "done_reason": "load"
}

Looking at the code, should be simple to add.

<!-- gh-comment-id:2675010512 -->
Author
Owner

@stevenh commented on GitHub (Feb 21, 2025):

It's odd that the response specifically has load stats but doesn't populate them; I'll have a look at the code.

<!-- gh-comment-id:2675097994 -->
Reference: github-starred/ollama#68102