[GH-ISSUE #14273] glm-4.7-flash emitting gibberish after a while #9294

Open
opened 2026-04-12 22:09:39 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @MitchTalmadge on GitHub (Feb 16, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14273

What is the issue?

I have just installed Ollama on my Windows desktop and chose the [glm-4.7-flash](https://ollama.com/library/glm-4.7-flash) model. While having it write a poem, it started outputting complete gibberish with no end in sight, so I eventually stopped it. After that, every prompt, even after running `/clear`, produced the same gibberish. After running `/bye` and starting it again, it works fine.

  • CPU: Ryzen 9 5950X
  • GPU: RTX 4070 Ti
  • RAM: 128 GB DDR4
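For anyone who wants to reproduce this over the HTTP API instead of the CLI, here is a rough sketch. The `looks_garbled` helper and its threshold are my own invention (the corrupted output below is dominated by punctuation and digits, while healthy English output is mostly letters); the endpoint and request shape follow the standard Ollama `/api/generate` API on the default host.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default Ollama endpoint

def looks_garbled(text: str, min_letter_ratio: float = 0.5) -> bool:
    """Crude heuristic: flag output whose visible characters are mostly
    punctuation/digits rather than letters, like the corruption below."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return False
    letters = sum(c.isalpha() for c in visible)
    return letters / len(visible) < min_letter_ratio

def generate(prompt: str, model: str = "glm-4.7-flash") -> str:
    """Send a non-streaming generate request and return the response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Same long-form prompt that triggered the corruption in my session.
    out = generate("Write a poem about chocolate")
    print("garbled!" if looks_garbled(out) else "output looks normal")
```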

Here is where it started; see the line after "Stanza 5: Conclusion":

>>> /clear
Cleared session context
>>> Write a poem about chocolate
Thinking...
1.  **Analyze the Request:**
    *   **Topic:** Chocolate.
    *   **Format:** Poem.
    *   **Tone:** Open, but usually implies appreciation, indulgence, or sensory description.

2.  **Brainstorming Imagery & Themes:**
    *   *Visual:* Dark, glossy, liquid, swirling, melting, dust of cocoa, velvety.
    *   *Tactile:* Smooth, silky, melting on the tongue, cool in the hand, warm in the mouth.
    *   *Taste:* Bitter, sweet, rich, earthy, decadent, forbidden, comforting.
    *   *Emotional:* Joy, warmth, escapism, love, comfort, ritual.
    *   *Associations:* Cacao beans, melting pot, gift, afternoon treat, dessert.

3.  **Drafting - Stanza by Stanza:**

    *   *Stanza 1: Introduction/The Origin (Earthiness).*
        *   Start with the raw material. Beans from the tropics.
        *   *Drafting ideas:* It comes from the sun, dark beans. Earthy soul.
        *   *Refining:* "The soul of the earth in a dusty pod," "Forged in the heat."

    *   *Stanza 2: The Transformation (Processing).*
        *   Roasting and grinding.
        *   *Drafting ideas:* Roasted to perfection, crushed into dust, liquid gold.
        *   *Refining:* "Turned to liquid gold," "The roast and the crush."

    *   *Stanza 3: The Sensory Experience (Taste/Texture).*
        *   Eating it. The contrast of sweet and bitter.
        *   *Drafting ideas:* Melts on the tongue, velvety coat, sweet rush.
        *   *Refining:* "Velvet on the tongue," "The sweet that bites back," "Decadence."

    *   *Stanza 4: The Emotional Connection (Comfort).*
        *   Why we eat it. The feeling of warmth.
        *   *Drafting ideas:* A hug for the soul, rainy day comfort, little piece of joy.
        *   *Refining:* "A hug for the soul," "A little piece of the sun."

    *   *Stanza 5: Conclusion.*
        *   Summary%1(EC"2))068;E%,DF0.+8-3<2=#F*#752*;*@7""67.)*-+)+C719?*6FA$/#B/24-''?C"<@-*'-#6#%-?(7"*#>-049E?@B-<?2E&.$,@*=C&$4%>%.0.8/(2<'4@94C(;6,+-<@(78==-19(9%"DC=;#>DEEF03>*3=$.C?=(?,978$>"),=(@8.&<+,1E?'=8>+6?$9(.0C=5?$%-3-2=,72%*00)"69.$-@$,'3)D$D9"A$5@B&71@53&#:1A7CC-7%62>5-C(E=D8-3;,D3;@'B*">*%FE#B0BA-,9<36+64?)2..=:*5F,)0=6B/2%9,-E"(:??%+':1A1)F;44@D>+>"F.#6+9*2041$#>=54E$<=;:+*,<$<2):?.5,:31+%$C&C*46-(#E4><E=%==863CCF*E:)/$9C;B$96$E#?*751*+-(*".BD+4D(+)@7(;$.129E5F&)"'-A&.C;2;1,/B$E&#;1>E3>=<@3(F4$;0(+'#+E70$)6'D?BD'4@92A+<A=?CE&A4<@D>,8D,B7F/-.,?%%&D#50A"<:(&@9096/%20">/<><6>)56AB-;',83A'5=+93%7-4,:0,';-3C761@/=8#FC7**+(<=)@)9D/4,E87*7=43>-3F&95;8D5$>D2%C:FD5$(2'#A*-6=?.,3=?=4C;-9D9>@>;7%2$-32DB9<8./%:F5($7C?A$71%13.,6,>56*A;&1%2>,64BC"3,0A/4->87)7/8D=;/D37?<:2,7(B#/F;",3"2B(#A="D>23:5'#)C.)9"D>E480)"8A2(***&45816C830,(CF/2A)=A-'FE><2"487./5?&'&%)85>2BBA"9C9&'71>#=0&E;8CA&>E&=2$&.B<(/#2;@9(/94#=7(9(9.7,/(;+>E:4@*C/-*+E*0:)%@(7B.C:>E>2$-5-/2)%/-=/:5*"@>19)BD98*+<.E=:15%*5*26A-F/E/+$;7*4/-*(97$BE65&01C=>%<"1:6B0?5+&?C"&'B@E5<,).%(:=5/E*1""1=.1C;D2-9CC:>*3;7&7->@?&@C2E98%)CC.'>>*,%#,)@A4/%D,(A<-4==9;$*&5070(;+@E"D:+@45=9<.+AD21<-;DE&&?-*:="62.8@3)%;,=,:A91;D#>04"?3:8E/,6F:2=:$@+9)</>.)2?&<@>&$9;7(6;A'CF<.5FB5'<''<.3@*>%.6EEF>.;'DC8=%3"7-=%7B'3#.;-3C)0895C7"'527:+AD%3*<F7)7',-;==<4F-#'.-;:6A;*&6+=1-%<9;.0/1B(.<*(<69&=$"&'0)8$$80>,:18>..)1A.328=C.3"C%)/7#DC?+:(:726/2,E"71<.A-8/F<6?=DE00;&$E8/C/)3D#AA;)1=;/,+E:4#>6E82"86//2A44;A/8*?)3->/A0@2<:A-;:(*DA=5((:%=F+>:/7/&DE*B8"&%*B2E$(?3"26+/@;;'+0/3>86A7C6"'8/>>6$9?4$0"%B&)C?A%%/B?0/'+1-3?9D>3?9C,F%*::5F&48*.'/+45BB=,6A%="+2B?>;8D"C9>C&E/164=@=;;=-"8D<&>8;'5?%(<1F&2<4(#=.26,'D&$35(*=+)F:E<)=>*=3A=(5*@-F.#&&$3?%(?%.+9:;(C?+;7<;5+F,)6A1#4EFD#1@?>=/99E.8.D/FEF96=C<2@->3(8+$$8>',:>77#8>)4)(<*C?<F"F519*6<-,"&E+5&4A'#41AB?#27/FF>C?89>#$F=+C)':FD%>'B""<<@8D%;.@9/F'4?8#C18<9<#(6/&:FB=5+3$8$%)?4=(FFC/)*31'=&?#AF$;8D&==;:(B>/<C+62(33;',D48/,:4/:EA=35')0#D8:1%2,F$.319.F5??4*A<4(='$&08*#-1(&,F((";#D0"),#"1$%E62;(4:(&5;7D
%B&B8B.,E"/=A8<(.49:),A?"374::":1EA5&*46D.5?%,3$:<#6-8>3*="9C+0D?&>-;0<;0@9:,*7D"6D7F3%=B,&183%9:/?B@6)8;(:+D6+2-(#"9<(.,#C9$;.&A#*:*/+A$:*:C'4D&/&<%4*7,548.#+A@5E=?9B8&C5=+.8&0E6%:<<F*CE1'%04(C6B4,?&#+8%$F1C;;(($:9*,44'%,4),&(>-14F-2%>+',(8@++-942DC@*0-'>A-A)1B00%597.<:6-95A3$E1*%*A="')>,2"F0:.4&':C=/0@E/DB+"5&4?7.)?/FA4E>+9/444-),'"+37%)7741,/.A74+CD8A7.A%)59?5C#1:50F,84@F85A;5:&D15;D=%7?&74'58'9)1F;6*$E)?+F&=D-4(1"'/B?1D522#D5)<4%D:C@(>"2+,FA.+A8;9C868-0&+."73?AF&B./=79E<A1"'*5=">DB)>3C5&9&.9&6=?(0'618->%7$&AB'$<7#3'67"AA-+?2'0B15/3=3,3>F5#:#@#@*1D*878#FE.F)8&7F;.2&F<D)7*3?>#??6B9-B'A/:<.&*@<7'*,=$"A-)71)F$D9.8F:=;'*#-#+&.C@)";2,4.0.=*E&,4+($7%,:36'D3/01<8,44"5:9#$/3;@,)(61$E7/@%8()9+51?-5B(/BB0:&4>9?D$&$B53:17#CAB,4&3..;A0,+C<$0A<@6"D@'AD+/@.(C.0D85-.$3>$D,)B*@"4-"'FEAD'B>9*>3=11"4(9'+1E=6%'1&76*&:.>C82DD:36?#CC97F9%D5A#90)2:3/,CB)79==6:D9A99+7'3B:D)&7C:-$0@-B.*(2'=-E51$454@%6%0*6#C'*.3'#/4<<,"19>E)<3$=<*=0A$BDE+?.*.B-7>%4+1D#?&E<+0#F2.;)0'.9<*"?*@4;?*%=468;>?E;;&EF-C+:+*.?3$?E%.BDCE5($&><,C+76<:/@9+"+2$/>*6814&:7-,E8/F#)894<$=:C("C'3'4,/%(9>";E$+6BF"4086=(,0$#"(+$":<=C"=,7&E<E8;A$602&08".9E;)+/7"26;'+/3E8**.+)@%'-0&4>9#6;/7)&'&,7(%"+$4AB3A27=<C$*A(B5')<:)&??(C(+3.3@03)/428FF&B*'&(D4+@/+-(+26(/A7DE+*3@*+*=9';3299@">4F=-8-6/6'A,'+E4-E);"-5?F&/+48)',2?C&AE<%A<.6-#)0B>A1;<".=/>9,'"5*<E5,8<6'@'D;76)F9)05(%3A-E3.>FE:5(D#$5410,:52=0;20:<A,B<%$:.C/F0/D(/;4D+8?*?-+*<"A03DF#)(083%BC##C6;E770;58,<-:)=/2=7A3'81F>)=(,30;@2,'2$1::/65D41B173,9<?0*,-<A1*0$>>9&?:'AB8B">A4"81@9%F;0<2D#)2>001B"31<&(=?3E8@&:-"B0)*#9"?.A97#:D%250:(5"0AA#5,"E7=(B#(1*?';6DEC;;;1(4=&=5)1$-))1$>3"7%,A(E:/7)&*$7**F6)D4&2B7,)A3B961*'=$1()2#EC/A###C@+*6B"BBE,$)(A<;64;&1>797CD)553+$$F&C3@.#62)<10*F3:>&6%&,E;84<74(0'=05%#.54/.,DE(D"4>6E/<BB"($$/$%#E,3',326=%,*4++3@9=A:/C;$%=($$*%E@D0<(#3B&D'78</@%?+8'C>>9/A@B?<)7:58<(0&-.06F)8?C<7D0)+88EFF'"(.D,'-><#*AA7?69(@(F/'?5<+9@42&F9=B.9#@0.0<"-$+$&#:-5%??B8FC*712%)/32A(&/A(9D=5*#688""6@22A)5BB)&C-93@;>-$E3:33),@%7%35/2/E'-(*>C"5/+?F=@/-*"=0%F&53;?@)A875")2C"$-40:;C)
B9B-9,+8426"E*8*,-/E6%F"#;@9?1>-4A%2"-$B>E;F$'@DA874$()47>@>9%C%@&)0#,B:05#)2>40"A71#<74<$A$.C@++<3-%4?7D)5;,1>D=<C8E+=#B.*7=%0?3,/38)AA3;6(*"#.DC%*8:A();6DA>A2&?&6$=A/04#0=11F8.&:-)3;D2$*3.8:/C.=<<&*;7)+>14(<,3:B::%";5B&48+-,<:%9(3E5+3@;59&CD?1?"=4"46">==.,3(-898?-B-7@#A>'9(**%6.*6-A70B</'8''1*64/8/4648?6?.E((FC"6E%%=83#28=009))C->@@C1<6;F%>3/#.<F.(2/=C.<'&,7D.)5B<((9))9D';/EA9A3F3)'5$D->D,1B$?)59BB/'8./B91B)(.51;;(?2<-4/F0*2$,9$-9:433)D5D6#$##(+$:+"CF,78?=>)EC&65.;D,50(+3:<%&04;?4-%+*%D"A:F,E&?8A:.+'';7,B.=$#0:2B?+*,(/(C20&$<D8C>"6*@7#+.&<01*E,$)F$+=-%-)2C6D;C8'C<%9@?"%B3C3F@&2E>8%:97A&%/>&,1:;C1;)B>+:A1)856'"3(@:>E0?'5&5#*,-F$$/),>.A*72'#;F##+%F)A&0B)<7*+E=?E"%&"/""3B2>E783*:"52'<.@%+91@//37)?-33/*0&<D&&<=%#B#*.C8E%7:&-%*(@&;103/3DF3;"+%B'.1.>269=3F)876A9)2F#/>=19)6?#?67.)?<&2+B?*@"8/#9;2B9F$)*D)5207')&7-3&A/B//*'%"#F*$7&-%(=$:5%)%E/'4AD25%B8<<05.%3#68$B7&-1C,#;+.;D;B&2,+8E=/;416B??,0*/7F@A<>?<;'E7-19"8,A*3;.$=$?7',E:08+$5;'"0215%88AE5/2A.#.%/E3-C-:85-,%?%8.1CE>->=966>1897*:;A94#(:;/:9:.4-D1"<2A;B:F%/#@F60DD<A(8.(6?,19?6#?7+=;+5*44D-16453,./=?<388B-8:40$0.B:7-$&.A56'.%5%#4',>4(1'@=>"E:+3=:1$),&=&$&2.),:6-D1?D(3<0&>?/7?++$87F5A#9#9*DC>4"+7$AC>"1@97*12:8=,BCC9D;0(%(<B030:03:5-<1%>#6@'57E7%&:#A/89(8>$(@80EB+.F6,:390EF95A2&4B.+$2;D)A5/<5)>1.4D1B$=A/$7F=;$6,<(*@7092)D$E2/$10#:,#7%A)"A66F+##(3:831C0D$6C8:F-EC7B6.,>$/'D+71;=(:B+/1<%"3#C-A1)*E-#,?5,+7AF$+>CB#C$63E/=(FD+2?E1@63@$4.06'&1*50/&'-%3<.;AF:9:8'@7D,>*%A8-'=@/5';(7>3;;,@%6+F7/,@$65<B%5C?4/:32#/*%<'-'8/)'B/#.3*A>>B5CB<;?CB#B'0@%.AC?&D9<@@"$9%7"C3:@B%CAE3&1D%F4%040&$14BE36.$A6<D(F2;F>$=))?AB%<;CC7+5B9$8-$1A.A%.-5/$,F40&>3F'F%>E7D/3.95-)-8=FF>9&F65)"-*;&(C*')A06025$8"6E/EB2:B8FA6;(;-=9)<"',846=7..7D#70C?42F7@B-?6?C'5/"8'9##C"(+2;:*.B&<&-@@41).F(,'<'$6?F-B6)+5,"5D63.'D2A#)10(410<(/&9-2+.@>/<@-+$@>E1C-C0:B;"B?6?/$#00A.,?BE()*33:)<=F94%=7%-B%10E=C"+.FB/5-567E+*"A.$7F=)B(FD'<F%DB;=C?:C(C17)6=+9#79.;A):3-):.E20?D,(1B(#E:":>#;>3',1#>&'80?C113<*/?/9<.29C)9+'D0+4'DA4'6=

>>> Hello
Thinking...
=E<+;4*A3C@CD*..*5F0"(2$@.*"6A(@,#;>52F65;:632E"/9;B;2*F31?#'2;>2+E/?"DE+;0'$25.:C;)D&42#)<8B=&651EC)1%=F97)3,7'#-$'@%,406C)*67>%4@

>>> /clear
Cleared session context
>>> Hello
Thinking...
7=>81*<6FE@33<D7,5%7E4;,'0@+&(.8,3$0

>>> Send a message (/? for help)
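The `/bye`-and-restart workaround should have a programmatic equivalent: per the Ollama API docs, a generate request with `"keep_alive": 0` and no prompt asks the server to unload the model immediately, which forces a fresh runner on the next request. A sketch, assuming the default `127.0.0.1:11434` host:

```shell
# Programmatic equivalent of /bye for the stuck runner: keep_alive of 0
# with no prompt asks Ollama to unload the model immediately.
UNLOAD_PAYLOAD='{"model": "glm-4.7-flash", "keep_alive": 0}'
curl -s http://127.0.0.1:11434/api/generate -d "$UNLOAD_PAYLOAD" \
  || echo "Ollama is not reachable on 127.0.0.1:11434"
```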

Relevant log output

server.log:

time=2026-02-15T19:55:36.330-07:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\mitcht\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:473 msg="total blobs: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.16.1)"
time=2026-02-15T19:55:36.334-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-15T19:55:36.363-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64278"
time=2026-02-15T19:55:36.991-07:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-02-15T19:55:36.992-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64285"
time=2026-02-15T19:55:45.615-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64297"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64303"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64302"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d filter_id="" library=CUDA compute=8.9 name=CUDA0 description="NVIDIA GeForce RTX 4070 Ti" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:54:00.0 type=discrete total="12.0 GiB" available="10.6 GiB"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/02/15 - 19:55:46 | 200 |       526.9µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/02/15 - 19:55:46 | 200 |       526.9µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/02/15 - 19:55:46 | 200 |       524.8µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/02/15 - 19:56:10 | 200 |            0s |       127.0.0.1 | GET      "/api/tags"
time=2026-02-15T20:04:22.276-07:00 level=INFO source=download.go:179 msg="downloading 9eba2761cf0b in 20 1 GB part(s)"
time=2026-02-15T20:07:27.157-07:00 level=INFO source=download.go:179 msg="downloading b1bca6ec8117 in 1 1.1 KB part(s)"
time=2026-02-15T20:07:28.497-07:00 level=INFO source=download.go:179 msg="downloading d8ba2f9a17b3 in 1 18 B part(s)"
time=2026-02-15T20:07:29.814-07:00 level=INFO source=download.go:179 msg="downloading 49d4bd6d5a04 in 1 486 B part(s)"
[GIN] 2026/02/15 - 20:07:51 | 200 |         3m30s |       127.0.0.1 | POST     "/api/pull"
[GIN] 2026/02/15 - 20:07:52 | 200 |    201.2473ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/02/15 - 20:07:52 | 200 |     204.184ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-15T20:07:52.562-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 6655"
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:07:52.848-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:07:52.849-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 6661"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.9 GiB" free_swap="107.6 GiB"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:07:52.889-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:07:52.890-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:6661"
time=2026-02-15T20:07:52.896-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:07:52.931-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:07:53.023-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:08:43.805-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:43.863-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.394-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:08:47.404-07:00 level=INFO source=server.go:1388 msg="llama runner started in 54.55 seconds"
[GIN] 2026/02/15 - 20:08:47 | 200 |   55.0765122s |       127.0.0.1 | POST     "/api/generate"
ggml_backend_cuda_device_get_memory device GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d utilizing NVML memory reporting free: 312586240 total: 12878610432
time=2026-02-15T20:13:47.792-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54417"
time=2026-02-15T20:17:20.986-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 9549"
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:17:21.232-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:17:21.233-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 9554"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.2 GiB" free_swap="106.8 GiB"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:17:21.271-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:17:21.272-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:9554"
time=2026-02-15T20:17:21.280-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.314-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:17:21.401-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:17:21.681-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.733-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:17:24.752-07:00 level=INFO source=server.go:1388 msg="llama runner started in 3.52 seconds"
[GIN] 2026/02/15 - 20:17:44 | 200 |   23.4476178s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:19:38 | 200 |         1m25s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:22:40 | 200 |         1m52s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:26:43 | 200 |         1m46s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:32:41 | 200 |         1m42s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:33:11 | 500 |    2.6082429s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:42:08 | 200 |         8m43s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:42:29 | 200 |   13.4512325s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/15 - 20:42:41 | 200 |    2.3295111s |       127.0.0.1 | POST     "/api/chat"

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.16.1

Originally created by @MitchTalmadge on GitHub (Feb 16, 2026). Original GitHub issue: https://github.com/ollama/ollama/issues/14273 ### What is the issue? I have just installed Ollama on my Windows desktop and chose the [glm-4.7-flash](https://ollama.com/library/glm-4.7-flash) model. While having it write a poem, it started outputting complete gibberish with no end in sight. I eventually stopped it. Then, all prompts, even after running `/clear`, would output this gibberish. After running `/bye` and starting it again, it works fine again. - CPU: Ryzen 9 5950x - GPU: RTX 4070Ti - RAM: 128GB DDR4 Here is when it started; see the line after "Stanza 5: Conclusion" ``` >>> /clear Cleared session context >>> Write a poem about chocolate Thinking... 1. **Analyze the Request:** * **Topic:** Chocolate. * **Format:** Poem. * **Tone:** Open, but usually implies appreciation, indulgence, or sensory description. 2. **Brainstorming Imagery & Themes:** * *Visual:* Dark, glossy, liquid, swirling, melting, dust of cocoa, velvety. * *Tactile:* Smooth, silky, melting on the tongue, cool in the hand, warm in the mouth. * *Taste:* Bitter, sweet, rich, earthy, decadent, forbidden, comforting. * *Emotional:* Joy, warmth, escapism, love, comfort, ritual. * *Associations:* Cacao beans, melting pot, gift, afternoon treat, dessert. 3. **Drafting - Stanza by Stanza:** * *Stanza 1: Introduction/The Origin (Earthiness).* * Start with the raw material. Beans from the tropics. * *Drafting ideas:* It comes from the sun, dark beans. Earthy soul. * *Refining:* "The soul of the earth in a dusty pod," "Forged in the heat." * *Stanza 2: The Transformation (Processing).* * Roasting and grinding. * *Drafting ideas:* Roasted to perfection, crushed into dust, liquid gold. * *Refining:* "Turned to liquid gold," "The roast and the crush." * *Stanza 3: The Sensory Experience (Taste/Texture).* * Eating it. The contrast of sweet and bitter. * *Drafting ideas:* Melts on the tongue, velvety coat, sweet rush. 
* *Refining:* "Velvet on the tongue," "The sweet that bites back," "Decadence." * *Stanza 4: The Emotional Connection (Comfort).* * Why we eat it. The feeling of warmth. * *Drafting ideas:* A hug for the soul, rainy day comfort, little piece of joy. * *Refining:* "A hug for the soul," "A little piece of the sun." * *Stanza 5: Conclusion.* * Summary%1(EC"2))068;E%,DF0.+8-3<2=#F*#752*;*@7""67.)*-+)+C719?*6FA$/#B/24-''?C"<@-*'-#6#%-?(7"*#>-049E?@B-<?2E&.$,@*=C&$4%>%.0.8/(2<'4@94C(;6,+-<@(78==-19(9%"DC=;#>DEEF03>*3=$.C?=(?,978$>"),=(@8.&<+,1E?'=8>+6?$9(.0C=5?$%-3-2=,72%*00)"69.$-@$,'3)D$D9"A$5@B&71@53&#:1A7CC-7%62>5-C(E=D8-3;,D3;@'B*">*%FE#B0BA-,9<36+64?)2..=:*5F,)0=6B/2%9,-E"(:??%+':1A1)F;44@D>+>"F.#6+9*2041$#>=54E$<=;:+*,<$<2):?.5,:31+%$C&C*46-(#E4><E=%==863CCF*E:)/$9C;B$96$E#?*751*+-(*".BD+4D(+)@7(;$.129E5F&)"'-A&.C;2;1,/B$E&#;1>E3>=<@3(F4$;0(+'#+E70$)6'D?BD'4@92A+<A=?CE&A4<@D>,8D,B7F/-.,?%%&D#50A"<:(&@9096/%20">/<><6>)56AB-;',83A'5=+93%7-4,:0,';-3C761@/=8#FC7**+(<=)@)9D/4,E87*7=43>-3F&95;8D5$>D2%C:FD5$(2'#A*-6=?.,3=?=4C;-9D9>@>;7%2$-32DB9<8./%:F5($7C?A$71%13.,6,>56*A;&1%2>,64BC"3,0A/4->87)7/8D=;/D37?<:2,7(B#/F;",3"2B(#A="D>23:5'#)C.)9"D>E480)"8A2(***&45816C830,(CF/2A)=A-'FE><2"487./5?&'&%)85>2BBA"9C9&'71>#=0&E;8CA&>E&=2$&.B<(/#2;@9(/94#=7(9(9.7,/(;+>E:4@*C/-*+E*0:)%@(7B.C:>E>2$-5-/2)%/-=/:5*"@>19)BD98*+<.E=:15%*5*26A-F/E/+$;7*4/-*(97$BE65&01C=>%<"1:6B0?5+&?C"&'B@E5<,).%(:=5/E*1""1=.1C;D2-9CC:>*3;7&7->@?&@C2E98%)CC.'>>*,%#,)@A4/%D,(A<-4==9;$*&5070(;+@E"D:+@45=9<.+AD21<-;DE&&?-*:="62.8@3)%;,=,:A91;D#>04"?3:8E/,6F:2=:$@+9)</>.)2?&<@>&$9;7(6;A'CF<.5FB5'<''<.3@*>%.6EEF>.;'DC8=%3"7-=%7B'3#.;-3C)0895C7"'527:+AD%3*<F7)7',-;==<4F-#'.-;:6A;*&6+=1-%<9;.0/1B(.<*(<69&=$"&'0)8$$80>,:18>..)1A.328=C.3"C%)/7#DC?+:(:726/2,E"71<.A-8/F<6?=DE00;&$E8/C/)3D#AA;)1=;/,+E:4#>6E82"86//2A44;A/8*?)3->/A0@2<:A-;:(*DA=5((:%=F+>:/7/&DE*B8"&%*B2E$(?3"26+/@;;'+0/3>86A7C6"'8/>>6$9?4$0"%B&)C?A%%/B?0/'+1-3?9D>3?9C,F%*::5F&48*.'/+45BB=,6A%="+2B?>;8D"C9>C&E/164=@=;;=-"8D<&>8;'5?%(<1F&2<4(#=.26,'D&$35(*=+
)F:E<)=>*=3A=(5*@-F.#&&$3?%(?%.+9:;(C?+;7<;5+F,)6A1#4EFD#1@?>=/99E.8.D/FEF96=C<2@->3(8+$$8>',:>77#8>)4)(<*C?<F"F519*6<-,"&E+5&4A'#41AB?#27/FF>C?89>#$F=+C)':FD%>'B""<<@8D%;.@9/F'4?8#C18<9<#(6/&:FB=5+3$8$%)?4=(FFC/)*31'=&?#AF$;8D&==;:(B>/<C+62(33;',D48/,:4/:EA=35')0#D8:1%2,F$.319.F5??4*A<4(='$&08*#-1(&,F((";#D0"),#"1$%E62;(4:(&5;7D%B&B8B.,E"/=A8<(.49:),A?"374::":1EA5&*46D.5?%,3$:<#6-8>3*="9C+0D?&>-;0<;0@9:,*7D"6D7F3%=B,&183%9:/?B@6)8;(:+D6+2-(#"9<(.,#C9$;.&A#*:*/+A$:*:C'4D&/&<%4*7,548.#+A@5E=?9B8&C5=+.8&0E6%:<<F*CE1'%04(C6B4,?&#+8%$F1C;;(($:9*,44'%,4),&(>-14F-2%>+',(8@++-942DC@*0-'>A-A)1B00%597.<:6-95A3$E1*%*A="')>,2"F0:.4&':C=/0@E/DB+"5&4?7.)?/FA4E>+9/444-),'"+37%)7741,/.A74+CD8A7.A%)59?5C#1:50F,84@F85A;5:&D15;D=%7?&74'58'9)1F;6*$E)?+F&=D-4(1"'/B?1D522#D5)<4%D:C@(>"2+,FA.+A8;9C868-0&+."73?AF&B./=79E<A1"'*5=">DB)>3C5&9&.9&6=?(0'618->%7$&AB'$<7#3'67"AA-+?2'0B15/3=3,3>F5#:#@#@*1D*878#FE.F)8&7F;.2&F<D)7*3?>#??6B9-B'A/:<.&*@<7'*,=$"A-)71)F$D9.8F:=;'*#-#+&.C@)";2,4.0.=*E&,4+($7%,:36'D3/01<8,44"5:9#$/3;@,)(61$E7/@%8()9+51?-5B(/BB0:&4>9?D$&$B53:17#CAB,4&3..;A0,+C<$0A<@6"D@'AD+/@.(C.0D85-.$3>$D,)B*@"4-"'FEAD'B>9*>3=11"4(9'+1E=6%'1&76*&:.>C82DD:36?#CC97F9%D5A#90)2:3/,CB)79==6:D9A99+7'3B:D)&7C:-$0@-B.*(2'=-E51$454@%6%0*6#C'*.3'#/4<<,"19>E)<3$=<*=0A$BDE+?.*.B-7>%4+1D#?&E<+0#F2.;)0'.9<*"?*@4;?*%=468;>?E;;&EF-C+:+*.?3$?E%.BDCE5($&><,C+76<:/@9+"+2$/>*6814&:7-,E8/F#)894<$=:C("C'3'4,/%(9>";E$+6BF"4086=(,0$#"(+$":<=C"=,7&E<E8;A$602&08".9E;)+/7"26;'+/3E8**.+)@%'-0&4>9#6;/7)&'&,7(%"+$4AB3A27=<C$*A(B5')<:)&??(C(+3.3@03)/428FF&B*'&(D4+@/+-(+26(/A7DE+*3@*+*=9';3299@">4F=-8-6/6'A,'+E4-E);"-5?F&/+48)',2?C&AE<%A<.6-#)0B>A1;<".=/>9,'"5*<E5,8<6'@'D;76)F9)05(%3A-E3.>FE:5(D#$5410,:52=0;20:<A,B<%$:.C/F0/D(/;4D+8?*?-+*<"A03DF#)(083%BC##C6;E770;58,<-:)=/2=7A3'81F>)=(,30;@2,'2$1::/65D41B173,9<?0*,-<A1*0$>>9&?:'AB8B">A4"81@9%F;0<2D#)2>001B"31<&(=?3E8@&:-"B0)*#9"?.A97#:D%250:(5"0AA#5,"E7=(B#(1*?';6DEC;;;1(4=&=5)1$-))1$>3"7%,A(E:/7)&*$7**F6)D4&2B7,)A3B961*'=$1()2#EC/A###C@+*6B"BBE,$)(A<;64;&1>797CD)553+
$$F&C3@.#62)<10*F3:>&6%&,E;84<74(0'=05%#.54/.,DE(D"4>6E/<BB"($$/$%#E,3',326=%,*4++3@9=A:/C;$%=($$*%E@D0<(#3B&D'78</@%?+8'C>>9/A@B?<)7:58<(0&-.06F)8?C<7D0)+88EFF'"(.D,'-><#*AA7?69(@(F/'?5<+9@42&F9=B.9#@0.0<"-$+$&#:-5%??B8FC*712%)/32A(&/A(9D=5*#688""6@22A)5BB)&C-93@;>-$E3:33),@%7%35/2/E'-(*>C"5/+?F=@/-*"=0%F&53;?@)A875")2C"$-40:;C)B9B-9,+8426"E*8*,-/E6%F"#;@9?1>-4A%2"-$B>E;F$'@DA874$()47>@>9%C%@&)0#,B:05#)2>40"A71#<74<$A$.C@++<3-%4?7D)5;,1>D=<C8E+=#B.*7=%0?3,/38)AA3;6(*"#.DC%*8:A();6DA>A2&?&6$=A/04#0=11F8.&:-)3;D2$*3.8:/C.=<<&*;7)+>14(<,3:B::%";5B&48+-,<:%9(3E5+3@;59&CD?1?"=4"46">==.,3(-898?-B-7@#A>'9(**%6.*6-A70B</'8''1*64/8/4648?6?.E((FC"6E%%=83#28=009))C->@@C1<6;F%>3/#.<F.(2/=C.<'&,7D.)5B<((9))9D';/EA9A3F3)'5$D->D,1B$?)59BB/'8./B91B)(.51;;(?2<-4/F0*2$,9$-9:433)D5D6#$##(+$:+"CF,78?=>)EC&65.;D,50(+3:<%&04;?4-%+*%D"A:F,E&?8A:.+'';7,B.=$#0:2B?+*,(/(C20&$<D8C>"6*@7#+.&<01*E,$)F$+=-%-)2C6D;C8'C<%9@?"%B3C3F@&2E>8%:97A&%/>&,1:;C1;)B>+:A1)856'"3(@:>E0?'5&5#*,-F$$/),>.A*72'#;F##+%F)A&0B)<7*+E=?E"%&"/""3B2>E783*:"52'<.@%+91@//37)?-33/*0&<D&&<=%#B#*.C8E%7:&-%*(@&;103/3DF3;"+%B'.1.>269=3F)876A9)2F#/>=19)6?#?67.)?<&2+B?*@"8/#9;2B9F$)*D)5207')&7-3&A/B//*'%"#F*$7&-%(=$:5%)%E/'4AD25%B8<<05.%3#68$B7&-1C,#;+.;D;B&2,+8E=/;416B??,0*/7F@A<>?<;'E7-19"8,A*3;.$=$?7',E:08+$5;'"0215%88AE5/2A.#.%/E3-C-:85-,%?%8.1CE>->=966>1897*:;A94#(:;/:9:.4-D1"<2A;B:F%/#@F60DD<A(8.(6?,19?6#?7+=;+5*44D-16453,./=?<388B-8:40$0.B:7-$&.A56'.%5%#4',>4(1'@=>"E:+3=:1$),&=&$&2.),:6-D1?D(3<0&>?/7?++$87F5A#9#9*DC>4"+7$AC>"1@97*12:8=,BCC9D;0(%(<B030:03:5-<1%>#6@'57E7%&:#A/89(8>$(@80EB+.F6,:390EF95A2&4B.+$2;D)A5/<5)>1.4D1B$=A/$7F=;$6,<(*@7092)D$E2/$10#:,#7%A)"A66F+##(3:831C0D$6C8:F-EC7B6.,>$/'D+71;=(:B+/1<%"3#C-A1)*E-#,?5,+7AF$+>CB#C$63E/=(FD+2?E1@63@$4.06'&1*50/&'-%3<.;AF:9:8'@7D,>*%A8-'=@/5';(7>3;;,@%6+F7/,@$65<B%5C?4/:32#/*%<'-'8/)'B/#.3*A>>B5CB<;?CB#B'0@%.AC?&D9<@@"$9%7"C3:@B%CAE3&1D%F4%040&$14BE36.$A6<D(F2;F>$=))?AB%<;CC7+5B9$8-$1A.A%.-5/$,F40&>3F'F%>E7D/3.95-)-8=FF>9&F65)"-*;&(C*')A06025$8"6E/EB2:B8FA6;(;-=9)<"',8
46=7..7D#70C?42F7@B-?6?C'5/"8'9##C"(+2;:*.B&<&-@@41).F(,'<'$6?F-B6)+5,"5D63.'D2A#)10(410<(/&9-2+.@>/<@-+$@>E1C-C0:B;"B?6?/$#00A.,?BE()*33:)<=F94%=7%-B%10E=C"+.FB/5-567E+*"A.$7F=)B(FD'<F%DB;=C?:C(C17)6=+9#79.;A):3-):.E20?D,(1B(#E:":>#;>3',1#>&'80?C113<*/?/9<.29C)9+'D0+4'DA4'6=

>>> Hello
Thinking...
=E<+;4*A3C@CD*..*5F0"(2$@.*"6A(@,#;>52F65;:632E"/9;B;2*F31?#'2;>2+E/?"DE+;0'$25.:C;)D&42#)<8B=&651EC)1%=F97)3,7'#-$'@%,406C)*67>%4@

>>> /clear
Cleared session context

>>> Hello
Thinking...
7=>81*<6FE@33<D7,5%7E4;,'0@+&(.8,3$0

>>> Send a message (/? for help)
```

### Relevant log output

```shell
server.log:
time=2026-02-15T19:55:36.330-07:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\mitcht\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:473 msg="total blobs: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-15T19:55:36.332-07:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.16.1)"
time=2026-02-15T19:55:36.334-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-15T19:55:36.363-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64278"
time=2026-02-15T19:55:36.991-07:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-02-15T19:55:36.992-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64285"
time=2026-02-15T19:55:45.615-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64297"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64303"
time=2026-02-15T19:55:45.918-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 64302"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d filter_id="" library=CUDA compute=8.9 name=CUDA0 description="NVIDIA GeForce RTX 4070 Ti" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:54:00.0 type=discrete total="12.0 GiB" available="10.6 GiB"
time=2026-02-15T19:55:46.104-07:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/02/15 - 19:55:46 | 200 | 526.9µs | 127.0.0.1 | HEAD "/"
[GIN] 2026/02/15 - 19:55:46 | 200 | 526.9µs | 127.0.0.1 | GET "/api/version"
[GIN] 2026/02/15 - 19:55:46 | 200 | 524.8µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/02/15 - 19:56:10 | 200 | 0s | 127.0.0.1 | GET "/api/tags"
time=2026-02-15T20:04:22.276-07:00 level=INFO source=download.go:179 msg="downloading 9eba2761cf0b in 20 1 GB part(s)"
time=2026-02-15T20:07:27.157-07:00 level=INFO source=download.go:179 msg="downloading b1bca6ec8117 in 1 1.1 KB part(s)"
time=2026-02-15T20:07:28.497-07:00 level=INFO source=download.go:179 msg="downloading d8ba2f9a17b3 in 1 18 B part(s)"
time=2026-02-15T20:07:29.814-07:00 level=INFO source=download.go:179 msg="downloading 49d4bd6d5a04 in 1 486 B part(s)"
[GIN] 2026/02/15 - 20:07:51 | 200 | 3m30s | 127.0.0.1 | POST "/api/pull"
[GIN] 2026/02/15 - 20:07:52 | 200 | 201.2473ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/02/15 - 20:07:52 | 200 | 204.184ms | 127.0.0.1 | POST "/api/show"
time=2026-02-15T20:07:52.562-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 6655"
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:07:52.747-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:07:52.848-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:07:52.849-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 6661"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.9 GiB" free_swap="107.6 GiB"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:07:52.852-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:07:52.889-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:07:52.890-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:6661"
time=2026-02-15T20:07:52.896-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:07:52.931-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:07:53.023-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:08:43.805-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:43.863-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.394-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:08:44.395-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:08:47.404-07:00 level=INFO source=server.go:1388 msg="llama runner started in 54.55 seconds"
[GIN] 2026/02/15 - 20:08:47 | 200 | 55.0765122s | 127.0.0.1 | POST "/api/generate"
ggml_backend_cuda_device_get_memory device GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d utilizing NVML memory reporting free: 312586240 total: 12878610432
time=2026-02-15T20:13:47.792-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54417"
time=2026-02-15T20:17:20.986-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 9549"
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-15T20:17:21.145-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-02-15T20:17:21.232-07:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-15T20:17:21.233-07:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\mitcht\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\mitcht\\.ollama\\models\\blobs\\sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 9554"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:463 msg="system memory" total="127.9 GiB" free="101.2 GiB" free_swap="106.8 GiB"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d library=CUDA available="9.9 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-15T20:17:21.237-07:00 level=INFO source=server.go:757 msg="loading model" "model layers"=48 requested=-1
time=2026-02-15T20:17:21.271-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-15T20:17:21.272-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:9554"
time=2026-02-15T20:17:21.280-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.314-07:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
load_backend: loaded CPU backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, ID: GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d
load_backend: loaded CUDA backend from C:\Users\mitcht\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-02-15T20:17:21.401-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-02-15T20:17:21.681-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.733-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:24[ID:GPU-dfbb6591-91db-65e5-ea99-49e51d733d1d Layers:24(23..46)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=ggml.go:494 msg="offloaded 24/48 layers to GPU"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.3 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="8.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="204.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="195.5 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="272.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=device.go:272 msg="total memory" size="18.4 GiB"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-15T20:17:21.994-07:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-15T20:17:24.752-07:00 level=INFO source=server.go:1388 msg="llama runner started in 3.52 seconds"
[GIN] 2026/02/15 - 20:17:44 | 200 | 23.4476178s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:19:38 | 200 | 1m25s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:22:40 | 200 | 1m52s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:26:43 | 200 | 1m46s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:32:41 | 200 | 1m42s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:33:11 | 500 | 2.6082429s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:42:08 | 200 | 8m43s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:42:29 | 200 | 13.4512325s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/15 - 20:42:41 | 200 | 2.3295111s | 127.0.0.1 | POST "/api/chat"
```

### OS

Windows

### GPU

Nvidia

### CPU

AMD

### Ollama version

0.16.1
GiteaMirror added the bug label 2026-04-12 22:09:39 -05:00

@rick-github commented on GitHub (Feb 16, 2026):

Before you ran `/clear`, had there been a lot of input/output?


@MitchTalmadge commented on GitHub (Feb 17, 2026):

Not a ton. I had it generate a bit of code but it wasn't excessive. It hasn't done this again though.


@3abdelazim commented on GitHub (Feb 18, 2026):

What context size do you see when running `ollama ps`?

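[Mirror note] The server log above shows `OLLAMA_CONTEXT_LENGTH:4096` and `default_num_ctx=4096`, so a context-window check is a reasonable first triage step for gibberish that appears only after long generations. A minimal sketch of the relevant commands, assuming a recent Ollama CLI; the `8192` value is just an example, not a recommendation from this thread:

```shell
# Show loaded models; recent Ollama versions include a CONTEXT column
ollama ps

# Restart the server with a larger default context window
# (Windows PowerShell equivalent: $env:OLLAMA_CONTEXT_LENGTH = "8192"; ollama serve)
OLLAMA_CONTEXT_LENGTH=8192 ollama serve

# Or raise it for the current REPL session only:
#   /set parameter num_ctx 8192
```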

@MitchTalmadge commented on GitHub (Mar 18, 2026):

I'm sorry guys, I tried to reproduce this but couldn't get it to happen again. I'll close this for now; if anyone else experiences it, maybe it can be reopened.


@NoMansPC commented on GitHub (Apr 10, 2026):

I know this issue is closed, but it keeps happening to me and I don't know why. When it works, it's so great, but randomly, it just stops working.


@MitchTalmadge commented on GitHub (Apr 10, 2026):

Good to see I'm not alone! I'll try using the model some more to see if it happens again

Reference: github-starred/ollama#9294