[PR #6170] [CLOSED] Compatibility Patches for LUMI Supercomputer #58739

Closed
opened 2026-04-29 13:38:08 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6170
Author: @lupreCSC
Created: 8/5/2024
Status: Closed

Base: main ← Head: lumi-patches


📝 Commits (5)

  • 32adb46 Skip unreadable AMD GPU nodes in AMDGetGPUInfo.
  • 07eb2f0 Check ROCM install for libhipblas.so* instead of libhipblas.so.2*
  • 77cc65f Index only available AMD GPUs.
  • 9d70623 Revert "Check ROCM install for libhipblas.so* instead of libhipblas.so.2*"
  • c1f6eaa Make libhipblas version runtime check compile-time configurable.

📊 Changes

2 files changed (+14 additions, -5 deletions)

View changed files

📝 Dockerfile (+1 -0)
📝 gpu/amd_linux.go (+13 -5)

📄 Description

These are 3 patches required to run Ollama on the LUMI EuroHPC Supercomputer.

  1. 32adb46833 - Skip unreadable AMD GPU nodes in AMDGetGPUInfo.
    This is a general bug fix: errors while reading (as opposed to opening) a /sys/class/kfd/kfd/topology/nodes/*/properties file were previously not caught, leaving the corresponding GPU's fields uninitialized in AMDGetGPUInfo.
  2. 07eb2f07a6 - Check ROCM install for libhipblas.so* instead of libhipblas.so.2*
    Ollama compiles fine against older ROCm versions (such as 5.6.1) but then refuses to use them, because the current detection logic requires libhipblas.so.2* even though the code does not appear to actually rely on hipblas v2. Relaxing the check to libhipblas.so* allows it to run with no observed issues.
  3. 77cc65fa1b - Index only available AMD GPUs.
    On systems that use the Slurm job scheduler and cgroups to limit access to GPU devices, Ollama's mapping of device indices to devices is inaccurate: devices end up being indexed differently by the ROCm runtime than by Ollama. (By "indexed" I mean the number assigned to a GPU device, as used, e.g., to set ROCR_VISIBLE_DEVICES.) This commit fixes that.

Please refer to the individual commit texts for more detailed descriptions.

For the last commit I have to admit that I cannot be a hundred percent sure it works on all systems. It works fine on LUMI, but I was unable to confirm this elsewhere. I am willing to drop it from this PR in favour of getting the other changes in, if you feel strongly opposed to it; the underlying problem it addresses would then need another solution. (Ideally, Ollama would use a ROCm-provided API to obtain the available devices rather than enumerating /sys manually.)
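The indexing problem the last commit addresses can be illustrated with a small sketch (all names here are assumptions, not Ollama's code): indices must be assigned only over the devices that are actually accessible, so the number Ollama later writes into ROCR_VISIBLE_DEVICES matches the numbering the ROCm runtime uses when cgroups (e.g. inside a Slurm allocation) hide some devices.

```go
package main

import "fmt"

// visibleIndex returns the runtime-visible index of target, counting
// only accessible nodes. A device hidden by cgroups must not consume
// an index, otherwise Ollama and the ROCm runtime disagree on which
// number refers to which GPU.
func visibleIndex(allNodes []string, accessible func(string) bool, target string) (int, bool) {
	idx := 0
	for _, n := range allNodes {
		if !accessible(n) {
			continue // hidden device: skip without consuming an index
		}
		if n == target {
			return idx, true
		}
		idx++
	}
	return 0, false
}

func main() {
	// 8 physical GPUs, but the cgroup only exposes nodes 4 and 5.
	nodes := []string{"0", "1", "2", "3", "4", "5", "6", "7"}
	ok := func(n string) bool { return n == "4" || n == "5" }
	idx, found := visibleIndex(nodes, ok, "5")
	fmt.Println(idx, found) // node 5 is the runtime's device 1, not 5
}
```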


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 13:38:08 -05:00

Reference: github-starred/ollama#58739