[GH-ISSUE #4950] Insufficient Memory Allocation on Android System #3128

Closed
opened 2026-04-12 13:35:18 -05:00 by GiteaMirror · 13 comments

Originally created by @pzzmyc on GitHub (Jun 9, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4950

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Ollama is experiencing insufficient memory allocation on Android devices, resulting in slow inference speeds. Despite having ample swap space (40GB) and available physical memory (7GB), Ollama only allocates around 3GB, and sometimes as low as a few hundred megabytes. This limited memory allocation hinders the model's performance and responsiveness.
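
For context, the reported numbers are easy to double-check from a shell inside the same container ollama runs in. A minimal sketch using standard Linux tools (nothing ollama-specific):

```
# Physical memory and swap as seen inside the container
free -h
# The kernel's view, including MemAvailable and swap totals
grep -E 'MemTotal|MemAvailable|SwapTotal|SwapFree' /proc/meminfo
```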

![Screenshot_20240609_213342](https://github.com/ollama/ollama/assets/43562427/72cfe960-758b-4283-af56-76a63068266e)
![Screenshot_20240609_214005](https://github.com/ollama/ollama/assets/43562427/ffdf1c18-77ee-4684-8cfc-78e96dda66c7)

-- bug report written with the help of qwen2

OS

Linux

GPU

Other

CPU

Other

Ollama version

client version is 0.1.41

GiteaMirror added the linux, needs more info, bug labels 2026-04-12 13:35:18 -05:00

@luojiyin1987 commented on GitHub (Jun 9, 2024):

What is your phone's CPU? It should be an 8 Gen 2 or 8 Gen 3.


@pzzmyc commented on GitHub (Jun 9, 2024):

> What is your phone's CPU? It should be an 8 Gen 2 or 8 Gen 3.

Yes, it is an 8 Gen 3.


@luojiyin1987 commented on GitHub (Jun 9, 2024):

I use version 0.1.42; it now uses all of the CPU, whereas before it could only use about 50%.
https://soulteary.com/2024/02/29/run-large-ai-models-on-android-phones-with-snapdragon-8-gen-3.html


@pzzmyc commented on GitHub (Jun 10, 2024):

> I use version 0.1.42; it now uses all of the CPU, whereas before it could only use about 50%. https://soulteary.com/2024/02/29/run-large-ai-models-on-android-phones-with-snapdragon-8-gen-3.html

Looks good. This MLC framework may be a better fit than ollama for running on a phone; I'm just not sure whether the model format is locked down.


@luojiyin1987 commented on GitHub (Jun 10, 2024):

> > I use version 0.1.42; it now uses all of the CPU, whereas before it could only use about 50%. https://soulteary.com/2024/02/29/run-large-ai-models-on-android-phones-with-snapdragon-8-gen-3.html
>
> Looks good. This MLC framework may be a better fit than ollama for running on a phone; I'm just not sure whether the model format is locked down.

The models are all in a generic format that can be converted with a tool.


@TonyBlur commented on GitHub (Jun 10, 2024):

Piggybacking on this thread: how did the OP get cross-origin requests working in termux? Does it require root?


@pzzmyc commented on GitHub (Jun 12, 2024):

> Piggybacking on this thread: how did the OP get cross-origin requests working in termux? Does it require root?

It doesn't seem to involve cross-origin requests at all. A proot container should also work, though there are some restrictions on port usage. I use Linux Deploy + termux, which gives a chroot container, and I run ollama inside miniconda3.


@luojiyin1987 commented on GitHub (Jun 13, 2024):

> Piggybacking on this thread: how did the OP get cross-origin requests working in termux? Does it require root?

Cross-origin requests? Isn't that a front-end concept?


@TonyBlur commented on GitHub (Jun 13, 2024):

> > Piggybacking on this thread: how did the OP get cross-origin requests working in termux? Does it require root?
>
> Cross-origin requests? Isn't that a front-end concept?

But doesn't ollama need the environment variable set to

```
Environment="OLLAMA_ORIGINS=*"
```

to allow requests from all origins? I'd like to know how to configure that in termux on Android without root.


@luojiyin1987 commented on GitHub (Jun 13, 2024):

> > > Piggybacking on this thread: how did the OP get cross-origin requests working in termux? Does it require root?
> >
> > Cross-origin requests? Isn't that a front-end concept?
>
> But doesn't ollama need the environment variable set to
>
> ```
> Environment="OLLAMA_ORIGINS=*"
> ```
>
> to allow requests from all origins? I'd like to know how to configure that in termux on Android without root.

`Environment="OLLAMA_HOST=0.0.0.0"` is how I configure it. The server is reachable from both the LAN and the public internet, and I haven't run into any cross-origin problems.


@TonyBlur commented on GitHub (Jun 13, 2024):

> > > > Piggybacking on this thread: how did the OP get cross-origin requests working in termux? Does it require root?
> > >
> > > Cross-origin requests? Isn't that a front-end concept?
> >
> > But doesn't ollama need the environment variable set to
> >
> > ```
> > Environment="OLLAMA_ORIGINS=*"
> > ```
> >
> > to allow requests from all origins? I'd like to know how to configure that in termux on Android without root.
>
> `Environment="OLLAMA_HOST=0.0.0.0"` is how I configure it. The server is reachable from both the LAN and the public internet, and I haven't run into any cross-origin problems.

OK, thanks for the explanation.


@knyipab commented on GitHub (Sep 15, 2024):

I packaged `ollama` for [TUR (with the OpenBLAS CPU backend)](https://github.com/termux-user-repository/tur/pull/1198). There is no need to use proot anymore, and the package will be updated along with `pkg update` going forward. First make sure you have `tur-repo` installed by running `pkg install tur-repo`. Then run:

```
pkg update
pkg install -y ollama
```

My 8 Gen 2 runs qwen2:1.5b at 7 tokens/s. You can test yours with `ollama serve` and then this command:

```
ollama run --verbose qwen2:1.5b
```

I'm not sure, but it seems the amount of memory allocated is proportional to the model size and specs.
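
For a quick sanity check over HTTP, the server's standard API on its default port can also be queried directly; a sketch (the model name and prompt here are just examples):

```
# POST a short non-streaming generation request to the local server
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen2:1.5b", "prompt": "Say hello.", "stream": false}'
```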


@dhiltgen commented on GitHub (Oct 23, 2024):

Is this still a problem? If so, please upgrade to the latest version, and share a server log with OLLAMA_DEBUG=1 set. We should take swap space into consideration.
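
For anyone reproducing this, one way to capture such a log when launching the server by hand (ollama writes its log to stderr; the redirection below is just one option):

```
# Enable debug logging and keep a copy of the server output
OLLAMA_DEBUG=1 ollama serve 2>&1 | tee server.log
```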

Reference: github-starred/ollama#3128