[GH-ISSUE #7287] Version v0.3.14 impacted CPU inference performance #30391

Closed
opened 2026-04-22 09:57:46 -05:00 by GiteaMirror · 9 comments

Originally created by @closesim on GitHub (Oct 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7287

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Hi, I just updated the Docker container where I run my small models to the latest version, as I do every 15 days or so. I'm using a quad-core CPU (no GPU), and with this new version I noticed that Llama 3.1 8B was very slow. I initially thought it was a hardware issue, like overheating, but after checking htop I saw that Ollama was using 2 threads out of 8 (2 fewer than normal), which means 2 cores out of 4. After manually setting the number of threads for the model, performance returned to what it was before.

I see in the changelog that the thread behavior has changed, so I don't know if this is intended or a bug. Is there an environment variable for setting this in the meantime, instead of telling every model to use 4 threads manually?

For context:

  • Main machine is Windows
  • The Linux OS (Ubuntu) with docker runs on Hyper-V with 8 "CPUs" allocated
  • I use Open WebUI to interact with the models
  • Ollama used to use 4 threads.

![image](https://github.com/user-attachments/assets/e5f7526b-52a0-4fb5-ac68-15bf938b8037)

OS

Docker

GPU

No response

CPU

AMD

Ollama version

v0.3.14
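
For readers looking for the per-request form of the manual workaround described above: Ollama's generate API accepts an `options` object, and `num_thread` is one of its fields. The sketch below only builds the request body. The helper name `build_generate_payload`, the model tag `llama3.1:8b`, and the value 4 are illustrative assumptions taken from the setup in this issue, not something the thread prescribes.

```python
import json


def build_generate_payload(model: str, prompt: str, num_thread: int) -> dict:
    """Build a body for POST /api/generate that overrides the thread count."""
    if num_thread < 1:
        raise ValueError("num_thread must be at least 1")
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_thread is passed per request inside "options".
        "options": {"num_thread": num_thread},
    }


if __name__ == "__main__":
    # Hypothetical values: model tag and thread count mirror the reporter's setup.
    payload = build_generate_payload("llama3.1:8b", "Hello", num_thread=4)
    # Send this JSON with any HTTP client to the Ollama server's /api/generate endpoint.
    print(json.dumps(payload, indent=2))
```

The same value can also be stored with a model through a Modelfile line such as `PARAMETER num_thread 4`, which avoids repeating it on every request.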

GiteaMirror added the docker, bug labels 2026-04-22 09:57:46 -05:00

@dhiltgen commented on GitHub (Oct 21, 2024):

Can you share the output of the following (substitute a different image if you prefer):

docker run --rm ubuntu cat /proc/cpuinfo

@closesim commented on GitHub (Oct 21, 2024):

Hi, thanks for replying.

Here's the output


processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7585.49
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7585.49
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7585.49
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7585.49
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7634.51
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7634.51
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3792.747
cache size      : 512 KB
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7634.51
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 96
model name      : AMD Ryzen 3 4100 4-Core Processor
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 3688.684
cache size      : 512 KB
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips        : 7634.51
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

@dhiltgen commented on GitHub (Oct 22, 2024):

This looks like it's the same underlying root cause as the now reopened #5554. In these virtualization systems /proc/cpuinfo is behaving as if the system had multiple physical sockets, so we're just using the number of cores in the first socket we find, which is leading to under-allocating threads. I don't believe these host systems are actually multi-socket servers, so I believe the fix will be to try to discover if the system is actually NUMA, and if not, then ~ignore socket count and pool the cores we find across these mock sockets introduced by the virtualization layer.
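
To make the under-counting concrete, here is a minimal sketch (not Ollama's actual detection code) that groups `/proc/cpuinfo` entries by `physical id` and counts distinct `core id` values per socket. The function names and the synthetic `SAMPLE` text are assumptions for illustration; the sample reproduces only the three fields that matter from the output posted above.

```python
from collections import defaultdict


def cores_per_socket(cpuinfo_text: str) -> dict:
    """Map each 'physical id' to the set of distinct 'core id' values under it."""
    sockets = defaultdict(set)
    # /proc/cpuinfo separates logical processors with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            sockets[fields["physical id"]].add(fields["core id"])
    return dict(sockets)


def make_entry(processor: int, physical_id: int, core_id: int) -> str:
    """Build a trimmed cpuinfo entry with only the fields used here."""
    return (
        f"processor\t: {processor}\n"
        f"physical id\t: {physical_id}\n"
        f"core id\t\t: {core_id}"
    )


# Topology reported in this issue: 8 logical CPUs, 2 sockets, core ids 0 and 1 each.
SAMPLE = "\n\n".join(make_entry(cpu, cpu // 4, (cpu % 4) // 2) for cpu in range(8))

if __name__ == "__main__":
    sockets = cores_per_socket(SAMPLE)
    first_socket_only = len(sockets["0"])
    pooled = sum(len(cores) for cores in sockets.values())
    print("cores in first socket:", first_socket_only)
    print("cores pooled across sockets:", pooled)
```

On the reported topology, counting only the first socket yields 2 cores while pooling across both yields 4, which lines up with the 2 threads observed versus the 4 expected.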


@dhiltgen commented on GitHub (Oct 22, 2024):

@closesim could you verify that cat /sys/devices/system/node/online, both on your host and inside the VMs/containers, reports just 0 and not something like 0-1?
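
The check requested above can be scripted. This is a small sketch, assuming the file uses the usual kernel list format (single ids, ranges such as `0-1`, and comma-separated combinations); the helper names are hypothetical.

```python
from pathlib import Path

NODE_ONLINE = Path("/sys/devices/system/node/online")


def count_online_nodes(node_list: str) -> int:
    """Count entries in a kernel-style list such as '0', '0-1', or '0,2-3'."""
    total = 0
    for part in node_list.strip().split(","):
        if not part:
            continue
        start, sep, end = part.partition("-")
        # A range 'a-b' covers b - a + 1 nodes; a bare id covers one.
        total += (int(end) - int(start) + 1) if sep else 1
    return total


def is_multi_node(node_list: str) -> bool:
    """True when more than one NUMA node is reported online."""
    return count_online_nodes(node_list) > 1


if __name__ == "__main__":
    if NODE_ONLINE.exists():
        text = NODE_ONLINE.read_text()
        print(f"online nodes: {text.strip()} (multi-node: {is_multi_node(text)})")
    else:
        print(f"{NODE_ONLINE} not found; this check only applies to Linux")
```

A result of `0` means a single node; `0-1` means the guest sees two NUMA nodes, whatever the physical hardware looks like.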


@closesim commented on GitHub (Oct 22, 2024):

Both the Docker host and the containers report 0-1.


@dhiltgen commented on GitHub (Oct 22, 2024):

OK, so I was mistaken, and this is a NUMA system, making this a variation on #2929, where we don't currently support running across NUMA nodes. If you can get your virtualization system to allocate cores on a single socket (NUMA node), then things will work optimally until we can resolve #2929.

I may be able to adjust the default thread count to match the largest number of cores we detect, but I'm a little concerned we could wind up pinned to the wrong NUMA node and performance would suffer as a result, since we're not currently NUMA-node aware.


@closesim commented on GitHub (Oct 22, 2024):

No, it's a basic consumer desktop CPU and motherboard (Socket AM4). A desktop PC.

Edit:
Checking the main machine, here's what Hyper-V is reporting:
![image](https://github.com/user-attachments/assets/90a7a249-7035-473a-ac4b-65c35ad34b8d)


@dhiltgen commented on GitHub (Oct 22, 2024):

@closesim oh, I misunderstood your statement "docker host" reporting 0-1 to be the host system.

If you can change the configuration to have a single socket, then you'll see better performance.


@closesim commented on GitHub (Oct 22, 2024):

You are right. For some reason the VM was set like that for quite some time. Looks like the default configuration was set to two NUMA nodes from the start.

For reference, the VM configuration has an option to reset the topology to match the currently installed hardware.

This has fixed the problem.

Thank you so much. I apologize for the confusion.

Reference: github-starred/ollama#30391