[GH-ISSUE #6259] Inference fails with "llama_get_logits_ith: invalid logits id 7, reason: no logits" #50429

Closed
opened 2026-04-28 15:49:13 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @yurivict on GitHub (Aug 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6259

What is the issue?

Here is the error log:

[GIN] 2024/08/07 - 10:01:31 | 200 |  7.589808394s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-07T10:01:31.521-07:00 level=DEBUG source=sched.go:462 msg="context for request finished"
time=2024-08-07T10:01:31.521-07:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/yuri/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 duration=5m0s
time=2024-08-07T10:01:31.521-07:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/home/yuri/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 refCount=0
time=2024-08-07T10:01:52.804-07:00 level=DEBUG source=sched.go:571 msg="evaluating already loaded" model=/home/yuri/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435
DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="0x236209412000" timestamp=1723050112
time=2024-08-07T10:01:52.854-07:00 level=DEBUG source=routes.go:1346 msg="chat request" images=0 prompt="[INST] Say something.[/INST] "
DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=2 tid="0x236209412000" timestamp=1723050112
DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=3 tid="0x236209412000" timestamp=1723050112
DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=8 slot_id=0 task_id=3 tid="0x236209412000" timestamp=1723050112
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=3 tid="0x236209412000" timestamp=1723050112
llama_get_logits_ith: invalid logits id 7, reason: no logits
time=2024-08-07T10:01:53.403-07:00 level=DEBUG source=server.go:1048 msg="stopping llama server"
time=2024-08-07T10:01:53.403-07:00 level=DEBUG source=server.go:1054 msg="waiting for llama server to exit"
time=2024-08-07T10:01:53.951-07:00 level=DEBUG source=server.go:1058 msg="llama server stopped"

The llama-cpp project maintainers seem to be puzzled by this error: https://github.com/ggerganov/llama.cpp/issues/8911

OS

No response

GPU

Nvidia

CPU

Intel

Ollama version

0.3.4

GiteaMirror added the bug label 2026-04-28 15:49:13 -05:00
Author
Owner

@rick-github commented on GitHub (Aug 8, 2024):

ollama does in fact use the --embedding flag; it's shown in a part of the log that wasn't included in the post. Since ollama uses a modified version of llama.cpp, it's presumably some holdover.

Author
Owner

@yurivict commented on GitHub (Aug 8, 2024):

When we build ollama in the FreeBSD port, it defaults to using the pre-installed shared library libllama.so that comes from the llama-cpp port.

This happens in the engine executables that are placed under /tmp at runtime.

Author
Owner

@rick-github commented on GitHub (Aug 8, 2024):

That probably explains why this doesn't happen on other platforms. I ran ollama run mistral:7b-instruct-v0.3-q4_0 and typed Say something. without hitting the logits error. Linux, Nvidia RTX 4070.

Author
Owner

@yurivict commented on GitHub (Aug 8, 2024):

When CMake links the engine executables, it doesn't use the static llama library that was just built; instead it uses the external shared library.

Author
Owner

@yurivict commented on GitHub (Aug 8, 2024):

llm/ext_server/CMakeLists.txt has to use the full path to the static library in order to link against it.

The current code:

set(TARGET ollama_llama_server)
option(LLAMA_SERVER_VERBOSE "Build verbose logging option for Server" ON)
include_directories(${CMAKE_CURRENT_SOURCE_DIR})
add_executable(${TARGET} server.cpp utils.hpp json.hpp httplib.h)
install(TARGETS ${TARGET} RUNTIME)
target_compile_definitions(${TARGET} PRIVATE
    SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
)
target_link_libraries(${TARGET} PRIVATE ggml llama common llava ${CMAKE_THREAD_LIBS_INIT})
if (WIN32)
    TARGET_LINK_LIBRARIES(${TARGET} PRIVATE ws2_32)
endif()
target_compile_features(${TARGET} PRIVATE cxx_std_11)

links to the shared lib.
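
A likely mechanism, assuming the port's build defines no in-tree CMake targets named ggml or llama (an assumption, not stated in the issue): bare names in target_link_libraries fall through to the linker as -l flags, and the linker resolves them against its default search path, which on FreeBSD finds the installed shared library first.

set(TARGET ollama_llama_server)
# Illustrative sketch: with no targets named "ggml" or "llama" in scope,
# these bare names become -lggml -lllama on the link line, and the linker
# picks up the system's shared libraries (e.g. /usr/local/lib/libllama.so)
# instead of the freshly built static archives.
target_link_libraries(${TARGET} PRIVATE ggml llama)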

Author
Owner

@rick-github commented on GitHub (Aug 8, 2024):

I don't know cmake, but the documentation for target_link_libraries (https://cmake.org/cmake/help/latest/command/target_link_libraries.html) indicates that in some cases it uses the linker to find the library, so maybe an adjustment to a build-time setting like CMAKE_BUILD_RPATH (https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_RPATH.html) is needed.
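
If the RPATH route were taken, a minimal sketch might look like the following (the library directories are assumptions, not taken from ollama's build):

# Bake the build tree's library directories into the executables' RPATH so
# that, wherever the runner binary is copied at runtime (e.g. under /tmp),
# the dynamic loader looks in the build tree first. Paths are illustrative.
set(CMAKE_BUILD_RPATH "${CMAKE_BINARY_DIR};${CMAKE_BINARY_DIR}/ggml/src")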

Author
Owner

@rick-github commented on GitHub (Aug 8, 2024):

Or maybe CMAKE_LIBRARY_PATH (https://cmake.org/cmake/help/latest/variable/CMAKE_LIBRARY_PATH.html).
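
CMAKE_LIBRARY_PATH only steers find_library(), so using it would also mean resolving the archive explicitly instead of passing a bare name. A hedged sketch; the LLAMA_STATIC variable is made up for illustration:

# Configure with: cmake -DCMAKE_LIBRARY_PATH=/path/to/llama.cpp/build ...
# Asking for "libllama.a" by file name forces the static archive.
find_library(LLAMA_STATIC NAMES libllama.a REQUIRED)
target_link_libraries(${TARGET} PRIVATE ${LLAMA_STATIC})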

Author
Owner

@yurivict commented on GitHub (Aug 8, 2024):

I patched the CMakeLists.txt file in the FreeBSD port and the problem went away. Inference works now.

But this looks like a bug in CMakeLists.txt: it needs either full paths to libllama.a and libggml.a, or one of the settings you mentioned above.

I used full paths in the FreeBSD port patch.
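
A minimal sketch of that kind of patch (the build-tree paths are illustrative; the actual FreeBSD port patch may differ):

# Link the in-tree static archives by absolute path so the system
# libllama.so can never be substituted at link time. Paths are assumptions.
target_link_libraries(${TARGET} PRIVATE
    ${CMAKE_BINARY_DIR}/src/libllama.a
    ${CMAKE_BINARY_DIR}/ggml/src/libggml.a
    common llava ${CMAKE_THREAD_LIBS_INIT})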

Author
Owner

@dhiltgen commented on GitHub (Aug 9, 2024):

It sounds like this has been resolved now so I'll close the issue.

@yurivict, you should keep an eye on #5034, as the goal is to eventually vendor the native code and largely build it all with golang+cgo and move away from cmake. (We'll still need a minimal makefile to handle the nvcc/hipcc build portions for the ggml library.)

Reference: github-starred/ollama#50429