[GH-ISSUE #14273] glm-4.7-flash emitting gibberish after a while #9294

Open
opened 2026-04-12 22:09:39 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @MitchTalmadge on GitHub (Feb 16, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14273

What is the issue?

I have just installed Ollama on my Windows desktop and chose the [glm-4.7-flash](https://ollama.com/library/glm-4.7-flash) model. While having it write a poem, it started outputting complete gibberish with no end in sight, so I eventually stopped it. After that, every prompt, even after running `/clear`, produced the same gibberish. After running `/bye` and starting it again, it works fine.

  • CPU: Ryzen 9 5950X
  • GPU: RTX 4070 Ti
  • RAM: 128 GB DDR4
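For anyone who wants to reproduce this over the HTTP API instead of the CLI, here is a rough sketch. The `looks_garbled` helper and its threshold are my own invention (the corrupted output below is dominated by punctuation and digits, while healthy English output is mostly letters); the endpoint and request shape follow the standard Ollama `/api/generate` API on the default host.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default Ollama endpoint

def looks_garbled(text: str, min_letter_ratio: float = 0.5) -> bool:
    """Crude heuristic: flag output whose visible characters are mostly
    punctuation/digits rather than letters, like the corruption below."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return False
    letters = sum(c.isalpha() for c in visible)
    return letters / len(visible) < min_letter_ratio

def generate(prompt: str, model: str = "glm-4.7-flash") -> str:
    """Send a non-streaming generate request and return the response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Same long-form prompt that triggered the corruption in my session.
    out = generate("Write a poem about chocolate")
    print("garbled!" if looks_garbled(out) else "output looks normal")
```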

Here is where it started; see the line after "Stanza 5: Conclusion":

>>> /clear
Cleared session context
>>> Write a poem about chocolate
Thinking...
1.  **Analyze the Request:**
    *   **Topic:** Chocolate.
    *   **Format:** Poem.
    *   **Tone:** Open, but usually implies appreciation, indulgence, or sensory description.

2.  **Brainstorming Imagery & Themes:**
    *   *Visual:* Dark, glossy, liquid, swirling, melting, dust of cocoa, velvety.
    *   *Tactile:* Smooth, silky, melting on the tongue, cool in the hand, warm in the mouth.
    *   *Taste:* Bitter, sweet, rich, earthy, decadent, forbidden, comforting.
    *   *Emotional:* Joy, warmth, escapism, love, comfort, ritual.
    *   *Associations:* Cacao beans, melting pot, gift, afternoon treat, dessert.

3.  **Drafting - Stanza by Stanza:**

    *   *Stanza 1: Introduction/The Origin (Earthiness).*
        *   Start with the raw material. Beans from the tropics.
        *   *Drafting ideas:* It comes from the sun, dark beans. Earthy soul.
        *   *Refining:* "The soul of the earth in a dusty pod," "Forged in the heat."

    *   *Stanza 2: The Transformation (Processing).*
        *   Roasting and grinding.
        *   *Drafting ideas:* Roasted to perfection, crushed into dust, liquid gold.
        *   *Refining:* "Turned to liquid gold," "The roast and the crush."

    *   *Stanza 3: The Sensory Experience (Taste/Texture).*
        *   Eating it. The contrast of sweet and bitter.
        *   *Drafting ideas:* Melts on the tongue, velvety coat, sweet rush.
        *   *Refining:* "Velvet on the tongue," "The sweet that bites back," "Decadence."

    *   *Stanza 4: The Emotional Connection (Comfort).*
        *   Why we eat it. The feeling of warmth.
        *   *Drafting ideas:* A hug for the soul, rainy day comfort, little piece of joy.
        *   *Refining:* "A hug for the soul," "A little piece of the sun."

    *   *Stanza 5: Conclusion.*
        *   Summary%1(EC"2))068;E%,DF0.+8-3<2=#F*#752*;*@7""67.)*-+)+C719?*6FA$/#B/24-''?C"<@-*'-#6#%-?(7"*#>-049E?@B-<?2E&.$,@*=C&$4%>%.0.8/(2<'4@94C(;6,+-<@(78==-19(9%"DC=;#>DEEF03>*3=$.C?=(?,978$>"),=(@8.&<+,1E?'=8>+6?$9(.0C=5?$%-3-2=,72%*00)"69.$-@$,'3)D$D9"A$5@B&71@53&#:1A7CC-7%62>5-C(E=D8-3;,D3;@'B*">*%FE#B0BA-,9<36+64?)2..=:*5F,)0=6B/2%9,-E"(:??%+':1A1)F;44@D>+>"F.#6+9*2041$#>=54E$<=;:+*,<$<2):?.5,:31+%$C&C*46-(#E4><E=%==863CCF*E:)/$9C;B$96$E#?*751*+-(*".BD+4D(+)@7(;$.129E5F&)"'-A&.C;2;1,/B$E&#;1>E3>=<@3(F4$;0(+'#+E70$)6'D?BD'4@92A+<A=?CE&A4<@D>,8D,B7F/-.,?%%&D#50A"<:(&@9096/%20">/<><6>)56AB-;',83A'5=+93%7-4,:0,';-3C761@/=8#FC7**+(<=)@)9D/4,E87*7=43>-3F&95;8D5$>D2%C:FD5$(2'#A*-6=?.,3=?=4C;-9D9>@>;7%2$-32DB9<8./%:F5($7C?A$71%13.,6,>56*A;&1%2>,64BC"3,0A/4->87)7/8D=;/D37?<:2,7(B#/F;",3"2B(#A="D>23:5'#)C.)9"D>E480)"8A2(***&45816C830,(CF/2A)=A-'FE><2"487./5?&'&%)85>2BBA"9C9&'71>#=0&E;8CA&>E&=2$&.B<(/#2;@9(/94#=7(9(9.7,/(;+>E:4@*C/-*+E*0:)%@(7B.C:>E>2$-5-/2)%/-=/:5*"@>19)BD98*+<.E=:15%*5*26A-F/E/+$;7*4/-*(97$BE65&01C=>%<"1:6B0?5+&?C"&'B@E5<,).%(:=5/E*1""1=.1C;D2-9CC:>*3;7&7->@?&@C2E98%)CC.'>>*,%#,)@A4/%D,(A<-4==9;$*&5070(;+@E"D:+@45=9<.+AD21<-;DE&&?-*:="62.8@3)%;,=,:A91;D#>04"?3:8E/,6F:2=:$@+9)</>.)2?&<@>&$9;7(6;A'CF<.5FB5'<''<.3@*>%.6EEF>.;'DC8=%3"7-=%7B'3#.;-3C)0895C7"'527:+AD%3*<F7)7',-;==<4F-#'.-;:6A;*&6+=1-%<9;.0/1B(.<*(<69&=$"&'0)8$$80>,:18>..)1A.328=C.3"C%)/7#DC?+:(:726/2,E"71<.A-8/F<6?=DE00;&$E8/C/)3D#AA;)1=;/,+E:4#>6E82"86//2A44;A/8*?)3->/A0@2<:A-;:(*DA=5((:%=F+>:/7/&DE*B8"&%*B2E$(?3"26+/@;;'+0/3>86A7C6"'8/>>6$9?4$0"%B&)C?A%%/B?0/'+1-3?9D>3?9C,F%*::5F&48*.'/+45BB=,6A%="+2B?>;8D"C9>C&E/164=@=;;=-"8D<&>8;'5?%(<1F&2<4(#=.26,'D&$35(*=+)F:E<)=>*=3A=(5*@-F.#&&$3?%(?%.+9:;(C?+;7<;5+F,)6A1#4EFD#1@?>=/99E.8.D/FEF96=C<2@->3(8+$$8>',:>77#8>)4)(<*C?<F"F519*6<-,"&E+5&4A'#41AB?#27/FF>C?89>#$F=+C)':FD%>'B""<<@8D%;.@9/F'4?8#C18<9<#(6/&:FB=5+3$8$%)?4=(FFC/)*31'=&?#AF$;8D&==;:(B>/<C+62(33;',D48/,:4/:EA=35')0#D8:1%2,F$.319.F5??4*A<4(='$&08*#-1(&,F((";#D0"),#"1$%E62;(4:(&5;7D
%B&B8B.,E"/=A8<(.49:),A?"374::":1EA5&*46D.5?%,3$:<#6-8>3*="9C+0D?&>-;0<;0@9:,*7D"6D7F3%=B,&183%9:/?B@6)8;(:+D6+2-(#"9<(.,#C9$;.&A#*:*/+A$:*:C'4D&/&<%4*7,548.#+A@5E=?9B8&C5=+.8&0E6%:<<F*CE1'%04(C6B4,?&#+8%$F1C;;(($:9*,44'%,4),&(>-14F-2%>+',(8@++-942DC@*0-'>A-A)1B00%597.<:6-95A3$E1*%*A="')>,2"F0:.4&':C=/0@E/DB+"5&4?7.)?/FA4E>+9/444-),'"+37%)7741,/.A74+CD8A7.A%)59?5C#1:50F,84@F85A;5:&D15;D=%7?&74'58'9)1F;6*$E)?+F&=D-4(1"'/B?1D522#D5)<4%D:C@(>"2+,FA.+A8;9C868-0&+."73?AF&B./=79E<A1"'*5=">DB)>3C5&9&.9&6=?(0'618->%7$&AB'$<7#3'67"AA-+?2'0B15/3=3,3>F5#:#@#@*1D*878#FE.F)8&7F;.2&F<D)7*3?>#??6B9-B'A/:<.&*@<7'*,=$"A-)71)F$D9.8F:=;'*#-#+&.C@)";2,4.0.=*E&,4+($7%,:36'D3/01<8,44"5:9#$/3;@,)(61$E7/@%8()9+51?-5B(/BB0:&4>9?D$&$B53:17#CAB,4&3..;A0,+C<$0A<@6"D@'AD+/@.(C.0D85-.$3>$D,)B*@"4-"'FEAD'B>9*>3=11"4(9'+1E=6%'1&76*&:.>C82DD:36?#CC97F9%D5A#90)2:3/,CB)79==6:D9A99+7'3B:D)&7C:-$0@-B.*(2'=-E51$454@%6%0*6#C'*.3'#/4<<,"19>E)<3$=<*=0A$BDE+?.*.B-7>%4+1D#?&E<+0#F2.;)0'.9<*"?*@4;?*%=468;>?E;;&EF-C+:+*.?3$?E%.BDCE5($&><,C+76<:/@9+"+2$/>*6814&:7-,E8/F#)894<$=:C("C'3'4,/%(9>";E$+6BF"4086=(,0$#"(+$":<=C"=,7&E<E8;A$602&08".9E;)+/7"26;'+/3E8**.+)@%'-0&4>9#6;/7)&'&,7(%"+$4AB3A27=<C$*A(B5')<:)&??(C(+3.3@03)/428FF&B*'&(D4+@/+-(+26(/A7DE+*3@*+*=9';3299@">4F=-8-6/6'A,'+E4-E);"-5?F&/+48)',2?C&AE<%A<.6-#)0B>A1;<".=/>9,'"5*<E5,8<6'@'D;76)F9)05(%3A-E3.>FE:5(D#$5410,:52=0;20:<A,B<%$:.C/F0/D(/;4D+8?*?-+*<"A03DF#)(083%BC##C6;E770;58,<-:)=/2=7A3'81F>)=(,30;@2,'2$1::/65D41B173,9<?0*,-<A1*0$>>9&?:'AB8B">A4"81@9%F;0<2D#)2>001B"31<&(=?3E8@&:-"B0)*#9"?.A97#:D%250:(5"0AA#5,"E7=(B#(1*?';6DEC;;;1(4=&=5)1$-))1$>3"7%,A(E:/7)&*$7**F6)D4&2B7,)A3B961*'=$1()2#EC/A###C@+*6B"BBE,$)(A<;64;&1>797CD)553+$$F&C3@.#62)<10*F3:>&6%&,E;84<74(0'=05%#.54/.,DE(D"4>6E/<BB"($$/$%#E,3',326=%,*4++3@9=A:/C;$%=($$*%E@D0<(#3B&D'78</@%?+8'C>>9/A@B?<)7:58<(0&-.06F)8?C<7D0)+88EFF'"(.D,'-><#*AA7?69(@(F/'?5<+9@42&F9=B.9#@0.0<"-$+$&#:-5%??B8FC*712%)/32A(&/A(9D=5*#688""6@22A)5BB)&C-93@;>-$E3:33),@%7%35/2/E'-(*>C"5/+?F=@/-*"=0%F&53;?@)A875")2C"$-40:;C)
B9B-9,+8426"E*8*,-/E6%F"#;@9?1>-4A%2"-$B>E;F$'@DA874$()47>@>9%C%@&)0#,B:05#)2>40"A71#<74<$A$.C@++<3-%4?7D)5;,1>D=<C8E+=#B.*7=%0?3,/38)AA3;6(*"#.DC%*8:A();6DA>A2&?&6$=A/04#0=11F8.&:-)3;D2$*3.8:/C.=<<&*;7)+>14(<,3:B::%";5B&48+-,<:%9(3E5+3@;59&CD?1?"=4"46">==.,3(-898?-B-7@#A>'9(**%6.*6-A70B</'8''1*64/8/4648?6?.E((FC"6E%%=83#28=009))C->@@C1<6;F%>3/#.<F.(2/=C.<'&,7D.)5B<((9))9D';/EA9A3F3)'5$D->D,1B$?)59BB/'8./B91B)(.51;;(?2<-4/F0*2$,9$-9:433)D5D6#$##(+$:+"CF,78?=>)EC&65.;D,50(+3:<%&04;?4-%+*%D"A:F,E&?8A:.+'';7,B.=$#0:2B?+*,(/(C20&$<D8C>"6*@7#+.&<01*E,$)F$+=-%-)2C6D;C8'C<%9@?"%B3C3F@&2E>8%:97A&%/>&,1:;C1;)B>+:A1)856'"3(@:>E0?'5&5#*,-F$$/),>.A*72'#;F##+%F)A&0B)<7*+E=?E"%&"/""3B2>E783*:"52'<.@%+91@//37)?-33/*0&<D&&<=%#B#*.C8E%7:&-%*(@&;103/3DF3;"+%B'.1.>269=3F)876A9)2F#/>=19)6?#?67.)?<&2+B?*@"8/#9;2B9F$)*D)5207')&7-3&A/B//*'%"#F*$7&-%(=$:5%)%E/'4AD25%B8<<05.%3#68$B7&-1C,#;+.;D;B&2,+8E=/;416B??,0*/7F@A<>?<;'E7-19"8,A*3;.$=$?7',E:08+$5;'"0215%88AE5/2A.#.%/E3-C-:85-,%?%8.1CE>->=966>1897*:;A94#(:;/:9:.4-D1"<2A;B:F%/#@F60DD<A(8.(6?,19?6#?7+=;+5*44D-16453,./=?<388B-8:40$0.B:7-$&.A56'.%5%#4',>4(1'@=>"E:+3=:1$),&=&$&2.),:6-D1?D(3<0&>?/7?++$87F5A#9#9*DC>4"+7$AC>"1@97*12:8=,BCC9D;0(%(<B030:03:5-<1%>#6@'57E7%&:#A/89(8>$(@80EB+.F6,:390EF95A2&4B.+$2;D)A5/<5)>1.4D1B$=A/$7F=;$6,<(*@7092)D$E2/$10#:,#7%A)"A66F+##(3:831C0D$6C8:F-EC7B6.,>$/'D+71;=(:B+/1<%"3#C-A1)*E-#,?5,+7AF$+>CB#C$63E/=(FD+2?E1@63@$4.06'&1*50/&'-%3<.;AF:9:8'@7D,>*%A8-'=@/5';(7>3;;,@%6+F7/,@$65<B%5C?4/:32#/*%<'-'8/)'B/#.3*A>>B5CB<;?CB#B'0@%.AC?&D9<@@"$9%7"C3:@B%CAE3&1D%F4%040&$14BE36.$A6<D(F2;F>$=))?AB%<;CC7+5B9$8-$1A.A%.-5/$,F40&>3F'F%>E7D/3.95-)-8=FF>9&F65)"-*;&(C*')A06025$8"6E/EB2:B8FA6;(;-=9)<"',846=7..7D#70C?42F7@B-?6?C'5/"8'9##C"(+2;:*.B&<&-@@41).F(,'<'$6?F-B6)+5,"5D63.'D2A#)10(410<(/&9-2+.@>/<@-+$@>E1C-C0:B;"B?6?/$#00A.,?BE()*33:)<=F94%=7%-B%10E=C"+.FB/5-567E+*"A.$7F=)B(FD'<F%DB;=C?:C(C17)6=+9#79.;A):3-):.E20?D,(1B(#E:":>#;>3',1#>&'80?C113<*/?/9<.29C)9+'D0+4'DA4'6=

>>> Hello
Thinking...
=E<+;4*A3C@CD*..*5F0"(2$@.*"6A(@,#;>52F65;:632E"/9;B;2*F31?#'2;>2+E/?"DE+;0'$25.:C;)D&42#)<8B=&651EC)1%=F97)3,7'#-$'@%,406C)*67>%4@

>>> /clear
Cleared session context
>>> Hello
Thinking...
7=>81*<6FE@33<D7,5%7E4;,'0@+&(.8,3$0

>>> Send a message (/? for help)
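The `/bye`-and-restart workaround should have a programmatic equivalent: per the Ollama API docs, a generate request with `"keep_alive": 0` and no prompt asks the server to unload the model immediately, which forces a fresh runner on the next request. A sketch, assuming the default `127.0.0.1:11434` host:

```shell
# Programmatic equivalent of /bye for the stuck runner: keep_alive of 0
# with no prompt asks Ollama to unload the model immediately.
UNLOAD_PAYLOAD='{"model": "glm-4.7-flash", "keep_alive": 0}'
curl -s http://127.0.0.1:11434/api/generate -d "$UNLOAD_PAYLOAD" \
  || echo "Ollama is not reachable on 127.0.0.1:11434"
```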

Relevant log output

server.log:

time=2026-02-15T19:55:36.330-07:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\mitcht\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:473 msg="total blobs: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.16.1)"
time=2026-02-15T19:55:36.334-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-15T19:55:36.363-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64278"
time=2026-02-15T19:55:36.991-07:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-02-15T19:55:36.992-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64285"
time=2026-02-15T19:55:45.615-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64297"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64303"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64302"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d filter_id="" library=CUDA compute=8.9 name=CUDA0 description="NVIDIA GeForce RTX 4070 Ti" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:54:00.0 type=discrete total="12.0 GiB" available="10.6 GiB"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/02/15 - 19:55:46 | 200 |       526.9µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/02/15 - 19:55:46 | 200 |       526.9µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/02/15 - 19:55:46 | 200 |       524.8µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/02/15 - 19:56:10 | 200 |            0s |       127.0.0.1 | GET      "/api/tags"
time=2026-02-15T20:04:22.276-07:00 level=INFO source=download.go:179 msg="downloading 9eba2761cf0b in 20 1 GB part(s)"
time=2026-02-15T20:07:27.157-07:00 level=INFO source=download.go:179 msg="downloading b1bca6ec8117 in 1 1.1 KB part(s)"
time=2026-02-15T20:07:28.497-07:00 level=INFO source=download.go:179 msg="downloading d8ba2f9a17b3 in 1 18 B part(s)"
time=2026-02-15T20:07:29.814-07:00 level=INFO source=download.go:179 msg="downloading 49d4bd6d5a04 in 1 486 B part(s)"
[GIN] 2026/02/15 - 20:07:51 | 200 |         3m30s |       127.0.0.1 | POST     "/api/pull"
[GIN] 2026/02/15 - 20:07:52 | 200 |    201.2473ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/02/15 - 20:07:52 | 200 |     204.184ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-15T20:07:52.562-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 6655"
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:07:52.848-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:07:52.849-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 6661"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.9 GiB" free_swap="107.6 GiB"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:07:52.889-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:07:52.890-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:6661"
time=2026-02-15T20:07:52.896-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:07:52.931-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:07:53.023-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:08:43.805-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:43.863-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.394-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:08:47.404-07:00 level=INFO source=server.go:1388 msg="llama runner started in 54.55 seconds"
[GIN] 2026/02/15 - 20:08:47 | 200 |   55.0765122s |       127.0.0.1 | POST     "/api/generate"
ggml_backend_cuda_device_get_memory device GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d utilizing NVML memory reporting free: 312586240 total: 12878610432
time=2026-02-15T20:13:47.792-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54417"
time=2026-02-15T20:17:20.986-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 9549"
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:17:21.232-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:17:21.233-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 9554"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.2 GiB" free_swap="106.8 GiB"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:17:21.271-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:17:21.272-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:9554"
time=2026-02-15T20:17:21.280-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.314-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:17:21.401-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:17:21.681-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.733-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:17:24.752-07:00 level=INFO source=server.go:1388 msg="llama runner started in 3.52 seconds"
[GIN] 2026/02/15 - 20:17:44 | 200 |   23.4476178s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:19:38 | 200 |         1m25s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:22:40 | 200 |         1m52s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:26:43 | 200 |         1m46s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:32:41 | 200 |         1m42s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:33:11 | 500 |    2.6082429s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:42:08 | 200 |         8m43s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:42:29 | 200 |   13.4512325s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:42:41 | 200 |    2.3295111s |       127.0.0.1 | POST     "/api/chat"

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.16.1

Originally created by @MitchTalmadge on GitHub (Feb 16, 2026). Original GitHub issue: https://github.com/ollama/ollama/issues/14273 ### What is the issue? I have just installed Ollama on my Windows desktop and chose the [glm-4.7-flash](https://ollama.com/library/glm-4.7-flash) model. While having it write a poem, it started outputting complete gibberish with no end in sight. I eventually stopped it. Then, all prompts, even after running `/clear`, would output this gibberish. After running `/bye` and starting it again, it works fine again. - CPU: Ryzen 9 5950x - GPU: RTX 4070Ti - RAM: 128GB DDR4 Here is when it started; see the line after "Stanza 5: Conclusion" ``` >>> /clear Cleared session context >>> Write a poem about chocolate Thinking... 1. **Analyze the Request:** * **Topic:** Chocolate. * **Format:** Poem. * **Tone:** Open, but usually implies appreciation, indulgence, or sensory description. 2. **Brainstorming Imagery & Themes:** * *Visual:* Dark, glossy, liquid, swirling, melting, dust of cocoa, velvety. * *Tactile:* Smooth, silky, melting on the tongue, cool in the hand, warm in the mouth. * *Taste:* Bitter, sweet, rich, earthy, decadent, forbidden, comforting. * *Emotional:* Joy, warmth, escapism, love, comfort, ritual. * *Associations:* Cacao beans, melting pot, gift, afternoon treat, dessert. 3. **Drafting - Stanza by Stanza:** * *Stanza 1: Introduction/The Origin (Earthiness).* * Start with the raw material. Beans from the tropics. * *Drafting ideas:* It comes from the sun, dark beans. Earthy soul. * *Refining:* "The soul of the earth in a dusty pod," "Forged in the heat." * *Stanza 2: The Transformation (Processing).* * Roasting and grinding. * *Drafting ideas:* Roasted to perfection, crushed into dust, liquid gold. * *Refining:* "Turned to liquid gold," "The roast and the crush." * *Stanza 3: The Sensory Experience (Taste/Texture).* * Eating it. The contrast of sweet and bitter. * *Drafting ideas:* Melts on the tongue, velvety coat, sweet rush. 
* *Refining:* "Velvet on the tongue," "The sweet that bites back," "Decadence." * *Stanza 4: The Emotional Connection (Comfort).* * Why we eat it. The feeling of warmth. * *Drafting ideas:* A hug for the soul, rainy day comfort, little piece of joy. * *Refining:* "A hug for the soul," "A little piece of the sun." * *Stanza 5: Conclusion.* * Summary%1(EC"2))068;E%,DF0.+8-3<2=#F*#752*;*@7""67.)*-+)+C719?*6FA$/#B/24-''?C"<@-*'-#6#%-?(7"*#>-049E?@B-<?2E&.$,@*=C&$4%>%.0.8/(2<'4@94C(;6,+-<@(78==-19(9%"DC=;#>DEEF03>*3=$.C?=(?,978$>"),=(@8.&<+,1E?'=8>+6?$9(.0C=5?$%-3-2=,72%*00)"69.$-@$,'3)D$D9"A$5@B&71@53&#:1A7CC-7%62>5-C(E=D8-3;,D3;@'B*">*%FE#B0BA-,9<36+64?)2..=:*5F,)0=6B/2%9,-E"(:??%+':1A1)F;44@D>+>"F.#6+9*2041$#>=54E$<=;:+*,<$<2):?.5,:31+%$C&C*46-(#E4><E=%==863CCF*E:)/$9C;B$96$E#?*751*+-(*".BD+4D(+)@7(;$.129E5F&)"'-A&.C;2;1,/B$E&#;1>E3>=<@3(F4$;0(+'#+E70$)6'D?BD'4@92A+<A=?CE&A4<@D>,8D,B7F/-.,?%%&D#50A"<:(&@9096/%20">/<><6>)56AB-;',83A'5=+93%7-4,:0,';-3C761@/=8#FC7**+(<=)@)9D/4,E87*7=43>-3F&95;8D5$>D2%C:FD5$(2'#A*-6=?.,3=?=4C;-9D9>@>;7%2$-32DB9<8./%:F5($7C?A$71%13.,6,>56*A;&1%2>,64BC"3,0A/4->87)7/8D=;/D37?<:2,7(B#/F;",3"2B(#A="D>23:5'#)C.)9"D>E480)"8A2(***&45816C830,(CF/2A)=A-'FE><2"487./5?&'&%)85>2BBA"9C9&'71>#=0&E;8CA&>E&=2$&.B<(/#2;@9(/94#=7(9(9.7,/(;+>E:4@*C/-*+E*0:)%@(7B.C:>E>2$-5-/2)%/-=/:5*"@>19)BD98*+<.E=:15%*5*26A-F/E/+$;7*4/-*(97$BE65&01C=>%<"1:6B0?5+&?C"&'B@E5<,).%(:=5/E*1""1=.1C;D2-9CC:>*3;7&7->@?&@C2E98%)CC.'>>*,%#,)@A4/%D,(A<-4==9;$*&5070(;+@E"D:+@45=9<.+AD21<-;DE&&?-*:="62.8@3)%;,=,:A91;D#>04"?3:8E/,6F:2=:$@+9)</>.)2?&<@>&$9;7(6;A'CF<.5FB5'<''<.3@*>%.6EEF>.;'DC8=%3"7-=%7B'3#.;-3C)0895C7"'527:+AD%3*<F7)7',-;==<4F-#'.-;:6A;*&6+=1-%<9;.0/1B(.<*(<69&=$"&'0)8$$80>,:18>..)1A.328=C.3"C%)/7#DC?+:(:726/2,E"71<.A-8/F<6?=DE00;&$E8/C/)3D#AA;)1=;/,+E:4#>6E82"86//2A44;A/8*?)3->/A0@2<:A-;:(*DA=5((:%=F+>:/7/&DE*B8"&%*B2E$(?3"26+/@;;'+0/3>86A7C6"'8/>>6$9?4$0"%B&)C?A%%/B?0/'+1-3?9D>3?9C,F%*::5F&48*.'/+45BB=,6A%="+2B?>;8D"C9>C&E/164=@=;;=-"8D<&>8;'5?%(<1F&2<4(#=.26,'D&$35(*=+
)F:E<)=>*=3A=(5*@-F.#&&$3?%(?%.+9:;(C?+;7<;5+F,)6A1#4EFD#1@?>=/99E.8.D/FEF96=C<2@->3(8+$$8>',:>77#8>)4)(<*C?<F"F519*6<-,"&E+5&4A'#41AB?#27/FF>C?89>#$F=+C)':FD%>'B""<<@8D%;.@9/F'4?8#C18<9<#(6/&:FB=5+3$8$%)?4=(FFC/)*31'=&?#AF$;8D&==;:(B>/<C+62(33;',D48/,:4/:EA=35')0#D8:1%2,F$.319.F5??4*A<4(='$&08*#-1(&,F((";#D0"),#"1$%E62;(4:(&5;7D%B&B8B.,E"/=A8<(.49:),A?"374::":1EA5&*46D.5?%,3$:<#6-8>3*="9C+0D?&>-;0<;0@9:,*7D"6D7F3%=B,&183%9:/?B@6)8;(:+D6+2-(#"9<(.,#C9$;.&A#*:*/+A$:*:C'4D&/&<%4*7,548.#+A@5E=?9B8&C5=+.8&0E6%:<<F*CE1'%04(C6B4,?&#+8%$F1C;;(($:9*,44'%,4),&(>-14F-2%>+',(8@++-942DC@*0-'>A-A)1B00%597.<:6-95A3$E1*%*A="')>,2"F0:.4&':C=/0@E/DB+"5&4?7.)?/FA4E>+9/444-),'"+37%)7741,/.A74+CD8A7.A%)59?5C#1:50F,84@F85A;5:&D15;D=%7?&74'58'9)1F;6*$E)?+F&=D-4(1"'/B?1D522#D5)<4%D:C@(>"2+,FA.+A8;9C868-0&+."73?AF&B./=79E<A1"'*5=">DB)>3C5&9&.9&6=?(0'618->%7$&AB'$<7#3'67"AA-+?2'0B15/3=3,3>F5#:#@#@*1D*878#FE.F)8&7F;.2&F<D)7*3?>#??6B9-B'A/:<.&*@<7'*,=$"A-)71)F$D9.8F:=;'*#-#+&.C@)";2,4.0.=*E&,4+($7%,:36'D3/01<8,44"5:9#$/3;@,)(61$E7/@%8()9+51?-5B(/BB0:&4>9?D$&$B53:17#CAB,4&3..;A0,+C<$0A<@6"D@'AD+/@.(C.0D85-.$3>$D,)B*@"4-"'FEAD'B>9*>3=11"4(9'+1E=6%'1&76*&:.>C82DD:36?#CC97F9%D5A#90)2:3/,CB)79==6:D9A99+7'3B:D)&7C:-$0@-B.*(2'=-E51$454@%6%0*6#C'*.3'#/4<<,"19>E)<3$=<*=0A$BDE+?.*.B-7>%4+1D#?&E<+0#F2.;)0'.9<*"?*@4;?*%=468;>?E;;&EF-C+:+*.?3$?E%.BDCE5($&><,C+76<:/@9+"+2$/>*6814&:7-,E8/F#)894<$=:C("C'3'4,/%(9>";E$+6BF"4086=(,0$#"(+$":<=C"=,7&E<E8;A$602&08".9E;)+/7"26;'+/3E8**.+)@%'-0&4>9#6;/7)&'&,7(%"+$4AB3A27=<C$*A(B5')<:)&??(C(+3.3@03)/428FF&B*'&(D4+@/+-(+26(/A7DE+*3@*+*=9';3299@">4F=-8-6/6'A,'+E4-E);"-5?F&/+48)',2?C&AE<%A<.6-#)0B>A1;<".=/>9,'"5*<E5,8<6'@'D;76)F9)05(%3A-E3.>FE:5(D#$5410,:52=0;20:<A,B<%$:.C/F0/D(/;4D+8?*?-+*<"A03DF#)(083%BC##C6;E770;58,<-:)=/2=7A3'81F>)=(,30;@2,'2$1::/65D41B173,9<?0*,-<A1*0$>>9&?:'AB8B">A4"81@9%F;0<2D#)2>001B"31<&(=?3E8@&:-"B0)*#9"?.A97#:D%250:(5"0AA#5,"E7=(B#(1*?';6DEC;;;1(4=&=5)1$-))1$>3"7%,A(E:/7)&*$7**F6)D4&2B7,)A3B961*'=$1()2#EC/A###C@+*6B"BBE,$)(A<;64;&1>797CD)553+
$$F&C3@.#62)<10*F3:>&6%&,E;84<74(0'=05%#.54/.,DE(D"4>6E/<BB"($$/$%#E,3',326=%,*4++3@9=A:/C;$%=($$*%E@D0<(#3B&D'78</@%?+8'C>>9/A@B?<)7:58<(0&-.06F)8?C<7D0)+88EFF'"(.D,'-><#*AA7?69(@(F/'?5<+9@42&F9=B.9#@0.0<"-$+$&#:-5%??B8FC*712%)/32A(&/A(9D=5*#688""6@22A)5BB)&C-93@;>-$E3:33),@%7%35/2/E'-(*>C"5/+?F=@/-*"=0%F&53;?@)A875")2C"$-40:;C)B9B-9,+8426"E*8*,-/E6%F"#;@9?1>-4A%2"-$B>E;F$'@DA874$()47>@>9%C%@&)0#,B:05#)2>40"A71#<74<$A$.C@++<3-%4?7D)5;,1>D=<C8E+=#B.*7=%0?3,/38)AA3;6(*"#.DC%*8:A();6DA>A2&?&6$=A/04#0=11F8.&:-)3;D2$*3.8:/C.=<<&*;7)+>14(<,3:B::%";5B&48+-,<:%9(3E5+3@;59&CD?1?"=4"46">==.,3(-898?-B-7@#A>'9(**%6.*6-A70B</'8''1*64/8/4648?6?.E((FC"6E%%=83#28=009))C->@@C1<6;F%>3/#.<F.(2/=C.<'&,7D.)5B<((9))9D';/EA9A3F3)'5$D->D,1B$?)59BB/'8./B91B)(.51;;(?2<-4/F0*2$,9$-9:433)D5D6#$##(+$:+"CF,78?=>)EC&65.;D,50(+3:<%&04;?4-%+*%D"A:F,E&?8A:.+'';7,B.=$#0:2B?+*,(/(C20&$<D8C>"6*@7#+.&<01*E,$)F$+=-%-)2C6D;C8'C<%9@?"%B3C3F@&2E>8%:97A&%/>&,1:;C1;)B>+:A1)856'"3(@:>E0?'5&5#*,-F$$/),>.A*72'#;F##+%F)A&0B)<7*+E=?E"%&"/""3B2>E783*:"52'<.@%+91@//37)?-33/*0&<D&&<=%#B#*.C8E%7:&-%*(@&;103/3DF3;"+%B'.1.>269=3F)876A9)2F#/>=19)6?#?67.)?<&2+B?*@"8/#9;2B9F$)*D)5207')&7-3&A/B//*'%"#F*$7&-%(=$:5%)%E/'4AD25%B8<<05.%3#68$B7&-1C,#;+.;D;B&2,+8E=/;416B??,0*/7F@A<>?<;'E7-19"8,A*3;.$=$?7',E:08+$5;'"0215%88AE5/2A.#.%/E3-C-:85-,%?%8.1CE>->=966>1897*:;A94#(:;/:9:.4-D1"<2A;B:F%/#@F60DD<A(8.(6?,19?6#?7+=;+5*44D-16453,./=?<388B-8:40$0.B:7-$&.A56'.%5%#4',>4(1'@=>"E:+3=:1$),&=&$&2.),:6-D1?D(3<0&>?/7?++$87F5A#9#9*DC>4"+7$AC>"1@97*12:8=,BCC9D;0(%(<B030:03:5-<1%>#6@'57E7%&:#A/89(8>$(@80EB+.F6,:390EF95A2&4B.+$2;D)A5/<5)>1.4D1B$=A/$7F=;$6,<(*@7092)D$E2/$10#:,#7%A)"A66F+##(3:831C0D$6C8:F-EC7B6.,>$/'D+71;=(:B+/1<%"3#C-A1)*E-#,?5,+7AF$+>CB#C$63E/=(FD+2?E1@63@$4.06'&1*50/&'-%3<.;AF:9:8'@7D,>*%A8-'=@/5';(7>3;;,@%6+F7/,@$65<B%5C?4/:32#/*%<'-'8/)'B/#.3*A>>B5CB<;?CB#B'0@%.AC?&D9<@@"$9%7"C3:@B%CAE3&1D%F4%040&$14BE36.$A6<D(F2;F>$=))?AB%<;CC7+5B9$8-$1A.A%.-5/$,F40&>3F'F%>E7D/3.95-)-8=FF>9&F65)"-*;&(C*')A06025$8"6E/EB2:B8FA6;(;-=9)<"',8
46=7..7D#70C?42F7@B-?6?C'5/"8'9##C"(+2;:*.B&<&-@@41).F(,'<'$6?F-B6)+5,"5D63.'D2A#)10(410<(/&9-2+.@>/<@-+$@>E1C-C0:B;"B?6?/$#00A.,?BE()*33:)<=F94%=7%-B%10E=C"+.FB/5-567E+*"A.$7F=)B(FD'<F%DB;=C?:C(C17)6=+9#79.;A):3-):.E20?D,(1B(#E:":>#;>3',1#>&'80?C113<*/?/9<.29C)9+'D0+4'DA4'6=

>>> Hello
Thinking...
=E<+;4*A3C@CD*..*5F0"(2$@.*"6A(@,#;>52F65;:632E"/9;B;2*F31?#'2;>2+E/?"DE+;0'$25.:C;)D&42#)<8B=&651EC)1%=F97)3,7'#-$'@%,406C)*67>%4@

>>> /clear
Cleared session context

>>> Hello
Thinking...
7=>81*<6FE@33<D7,5%7E4;,'0@+&(.8,3$0

>>> Send a message (/? for help)
```

### Relevant log output

```shell
server.log:
time=2026-02-15T19:55:36.330-07:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\mitcht\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:473 msg="total blobs: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.16.1)"
time=2026-02-15T19:55:36.334-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-15T19:55:36.363-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64278"
time=2026-02-15T19:55:36.991-07:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-02-15T19:55:36.992-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64285"
time=2026-02-15T19:55:45.615-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64297"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64303"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64302"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d filter_id="" library=CUDA compute=8.9 name=CUDA0 description="NVIDIA GeForce RTX 4070 Ti" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:54:00.0 type=discrete total="12.0 GiB" available="10.6 GiB"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/02/15 - 19:55:46 | 200 | 526.9µs | 127.0.0.1 | HEAD "/"
[GIN] 2026/02/15 - 19:55:46 | 200 | 526.9µs | 127.0.0.1 | GET "/api/version"
[GIN] 2026/02/15 - 19:55:46 | 200 | 524.8µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/02/15 - 19:56:10 | 200 | 0s | 127.0.0.1 | GET "/api/tags"
time=2026-02-15T20:04:22.276-07:00 level=INFO source=download.go:179 msg="downloading 9eba2761cf0b in 20 1 GB part(s)"
time=2026-02-15T20:07:27.157-07:00 level=INFO source=download.go:179 msg="downloading b1bca6ec8117 in 1 1.1 KB part(s)"
time=2026-02-15T20:07:28.497-07:00 level=INFO source=download.go:179 msg="downloading d8ba2f9a17b3 in 1 18 B part(s)"
time=2026-02-15T20:07:29.814-07:00 level=INFO source=download.go:179 msg="downloading 49d4bd6d5a04 in 1 486 B part(s)"
[GIN] 2026/02/15 - 20:07:51 | 200 | 3m30s | 127.0.0.1 | POST "/api/pull"
[GIN] 2026/02/15 - 20:07:52 | 200 | 201.2473ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/02/15 - 20:07:52 | 200 | 204.184ms | 127.0.0.1 | POST "/api/show"
time=2026-02-15T20:07:52.562-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 6655"
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:07:52.848-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:07:52.849-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 6661"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.9 GiB" free_swap="107.6 GiB"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:07:52.889-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:07:52.890-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:6661"
time=2026-02-15T20:07:52.896-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:07:52.931-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:07:53.023-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:08:43.805-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:43.863-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.394-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:08:47.404-07:00 level=INFO source=server.go:1388 msg="llama runner started in 54.55 seconds"
[GIN] 2026/02/15 - 20:08:47 | 200 | 55.0765122s | 127.0.0.1 | POST "/api/generate"
ggml_backend_cuda_device_get_memory device GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d utilizing NVML memory reporting free: 312586240 total: 12878610432
time=2026-02-15T20:13:47.792-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54417"
time=2026-02-15T20:17:20.986-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 9549"
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:17:21.232-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:17:21.233-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 9554"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.2 GiB" free_swap="106.8 GiB"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:17:21.271-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:17:21.272-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:9554"
time=2026-02-15T20:17:21.280-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.314-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:17:21.401-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:17:21.681-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.733-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:17:24.752-07:00 level=INFO source=server.go:1388 msg="llama runner started in 3.52 seconds"
[GIN] 2026/02/15 - 20:17:44 | 200 | 23.4476178s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:19:38 | 200 | 1m25s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:22:40 | 200 | 1m52s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:26:43 | 200 | 1m46s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:32:41 | 200 | 1m42s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:33:11 | 500 | 2.6082429s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:42:08 | 200 | 8m43s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:42:29 | 200 | 13.4512325s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:42:41 | 200 | 2.3295111s | 127.0.0.1 | POST "/api/chat"
```

### OS

Windows

### GPU

Nvidia

### CPU

AMD

### Ollama version

0.16.1
GiteaMirror added the bug label 2026-04-12 22:09:39 -05:00

@rick-github commented on GitHub (Feb 16, 2026):

Before you ran `/clear`, had there been a lot of input/output?


@MitchTalmadge commented on GitHub (Feb 17, 2026):

Not a ton. I had it generate a bit of code but it wasn't excessive. It hasn't done this again though.


@3abdelazim commented on GitHub (Feb 18, 2026):

What context size do you see when running `ollama ps`?

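[Mirror note] The server log above shows `OLLAMA_CONTEXT_LENGTH:4096` and `default_num_ctx=4096`, so a context-window check is a reasonable first triage step for gibberish that appears only after long generations. A minimal sketch of the relevant commands, assuming a recent Ollama CLI; the `8192` value is just an example, not a recommendation from this thread:

```shell
# Show loaded models; recent Ollama versions include a CONTEXT column
ollama ps

# Restart the server with a larger default context window
# (Windows PowerShell equivalent: $env:OLLAMA_CONTEXT_LENGTH = "8192"; ollama serve)
OLLAMA_CONTEXT_LENGTH=8192 ollama serve

# Or raise it for the current REPL session only:
#   /set parameter num_ctx 8192
```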

@MitchTalmadge commented on GitHub (Mar 18, 2026):

I'm sorry guys, I tried to reproduce this but couldn't get it to happen again. I'll close this for now; if anyone else experiences it, maybe it can be reopened.


@NoMansPC commented on GitHub (Apr 10, 2026):

I know this issue is closed, but it keeps happening to me and I don't know why. When it works, it's so great, but randomly, it just stops working.


@MitchTalmadge commented on GitHub (Apr 10, 2026):

Good to see I'm not alone! I'll try using the model some more to see if it happens again

Reference: github-starred/ollama#9294