[GH-ISSUE #2024] falcon model not working. #63207

Closed
opened 2026-05-03 12:35:15 -05:00 by GiteaMirror · 3 comments

Originally created by @iplayfast on GitHub (Jan 16, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2024

I've been working with https://github.com/jmorganca/ollama/issues/1691 and found that it consistently dies with falcon.
So I tried falcon on its own. It died.
So I tried removing falcon and reinstalling it.
Still died.
I can no longer get falcon to work.
I'm on Ollama version 0.1.20.
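
For reference, a rough sketch of the steps above (the exact prompt is arbitrary; any input dies the same way):

```
ollama run falcon "hello"   # dies
ollama rm falcon            # remove the model
ollama pull falcon          # reinstall it
ollama run falcon "hello"   # still dies
```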

GiteaMirror added the bug label 2026-05-03 12:35:15 -05:00

@EMREOYUN commented on GitHub (Apr 2, 2024):

Successfully reproduced.

The debug server throws a CUBLAS_STATUS_NOT_SUPPORTED error, and only with Falcon models:
```
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas
  at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:10604
  cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N,
      ne01, ne11, ne10, alpha,
      (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00,
      (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10,
      beta,
      (      void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01,
      ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"
```
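
In case anyone wants to capture the same trace, a rough sketch of the invocation (OLLAMA_DEBUG is Ollama's verbose-logging switch; the prompt is arbitrary):

```
# Start the server with verbose logging so CUDA/cuBLAS failures are printed
OLLAMA_DEBUG=1 ollama serve
# (PowerShell on Windows: $env:OLLAMA_DEBUG="1"; ollama serve)

# In a second terminal, trigger the failing model
ollama run falcon "hello"
```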

Using:

  • Ollama: 0.1.30 Windows Preview
  • CUDA: 12.4 (nvcc --version: release 12.4, V12.4.99, build cuda_12.4.r12.4/compiler.33961263_0)
  • GPU: NVIDIA RTX 3050 Laptop
  • CPU: Intel i5-11400H

@iplayfast commented on GitHub (Apr 2, 2024):

Yup, still dead with Ollama 0.1.30 on Linux.


@pdevine commented on GitHub (May 17, 2024):

We use llama.cpp as the backend runner, and it unfortunately dropped support for the original falcon model. There is now the [falcon2](https://ollama.com/library/falcon2) model, which does work.
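
For anyone landing here with the same crash, switching models is just (a sketch; see the library page linked above):

```
ollama rm falcon          # the original falcon is no longer supported upstream
ollama pull falcon2
ollama run falcon2 "hello"
```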
