[GH-ISSUE #13821] [Feature Request] Add OpenCL and OpenBLAS support for broader hardware compatibility #71113

Closed
opened 2026-05-05 00:23:50 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @jonarser on GitHub (Jan 21, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13821

Description:
I am a user with older/limited hardware (GT 710M 1GB, Intel HD 4000) and would like to run local LLMs efficiently. Currently, Ollama and GPT4All primarily support acceleration via CUDA (NVIDIA), ROCm (AMD), and Metal (Apple), which excludes many integrated and older discrete GPUs that only have OpenCL drivers.

Proposed Solution:
Integrate the OpenCL and OpenBLAS backends from the upstream llama.cpp project. The OpenCL backend in llama.cpp (-DGGML_OPENCL=ON) already provides good cross-platform support for Adreno, older Intel HD Graphics, and many other GPUs. OpenBLAS (-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS) can significantly accelerate CPU inference, which is crucial when GPU memory is insufficient.
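For reference, a minimal sketch of what building llama.cpp directly with these backends might look like. The exact flag spellings have changed across llama.cpp versions (newer trees use `GGML_*` prefixes where older ones used `LLAMA_*`), so treat this as illustrative, not authoritative:

```bash
# Sketch: build llama.cpp with the OpenCL backend enabled.
# Flag names vary by llama.cpp version; GGML_OPENCL is the current spelling.
cmake -B build -DGGML_OPENCL=ON
cmake --build build --config Release

# Alternative CPU path: link against OpenBLAS for faster CPU inference.
# Older trees use -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS instead.
cmake -B build-blas -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build-blas --config Release
```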

This would make the project accessible to a much wider range of users with non-mainstream or legacy hardware.

My Use Case:

  • Hardware: hybrid graphics (NVIDIA GT 710M 1GB + Intel HD 4000).
  • Driver/OS: NVIDIA driver 388.xx on Windows 10 and 390.xx on Linux.
  • Current Limitation: Vulkan support is unstable and CUDA is not optimal for this old GPU; OpenCL is the only viable cross-platform compute API that works here (see the clinfo check after this list).
  • Model: trying to run Functiongemma 270m.
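To confirm that a working OpenCL driver is actually present on hardware like this, a quick check with the standard clinfo utility (assuming it is installed) could look like:

```bash
# List detected OpenCL platforms and devices; a GT 710M / HD 4000 setup
# should show up here if the OpenCL ICD loader and vendor drivers are working.
clinfo | grep -E 'Platform Name|Device Name'
```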

Additional context:

  • The llama.cpp project, which underpins these tools, already has stable OpenCL and BLAS backends (see the offload sketch after this list).
  • An issue requesting OpenCL support for Ollama already exists (https://github.com/ollama/ollama/issues/7536), indicating community interest.
  • For users with weak or no dedicated GPU, CPU acceleration via BLAS libraries is often the only performant option.
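Once llama.cpp is built with one of these backends, GPU offload is controlled via its existing `-ngl` / `--n-gpu-layers` option. A sketch of running a small model on 1 GB of VRAM; the model path and layer count below are illustrative guesses, not tested values:

```bash
# Offload a handful of layers to the 1 GB GPU and keep the rest on the CPU.
# The layer count is a guess for a small model on very limited VRAM.
./build/bin/llama-cli -m ./models/model.gguf -ngl 8 -p "Hello"
```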
GiteaMirror added the feature request label 2026-05-05 00:23:50 -05:00

Reference: github-starred/ollama#71113