[GH-ISSUE #22687] issue: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 thinking tag doesn't work #58455

Closed
opened 2026-05-05 23:11:46 -05:00 by GiteaMirror · 3 comments

Originally created by @oe3gwu on GitHub (Mar 15, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22687

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.10

Ollama Version (if applicable)

Not applicable; using vLLM.

Operating System

NVIDIA-branded Ubuntu

Browser (if applicable)

Firefox

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When chatting with the model, the reasoning should be collapsed into a "Thinking" section rather than printed as part of the visible reply.

Actual Behavior

When chatting with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4, everything works otherwise, but the model's raw reasoning text is shown inline and clutters the chat.

Steps to Reproduce

  1. Set up vLLM.
  2. Download nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4.
  3. Log in to Open WebUI.
  4. Chat with the model (a condensed command-line version of these steps follows below).
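
A condensed command-line version of these steps, assuming the compose file from the Additional Information section below is saved as docker-compose.yml on the host:

```bash
# Start both services defined in the compose file (vLLM + Open WebUI)
docker compose up -d

# vLLM serves an OpenAI-compatible API on port 8000; the model weights are
# pulled into the mounted HF cache on first start. Confirm the model is up:
curl -s http://localhost:8000/v1/models

# Open WebUI is published on port 8080; sign up / log in there
# (ENABLE_SIGNUP=True in the compose file) and chat with the model:
#   http://localhost:8080
```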

Logs & Screenshots

Screenshot 1: https://github.com/user-attachments/assets/274cd4b4-7580-422e-9a90-9f45bdd10bec
Screenshot 2: https://github.com/user-attachments/assets/20aa0162-f744-41b6-8094-421459c5f5ef

Container log: webui.txt (https://github.com/user-attachments/files/26000711/webui.txt)

Additional Information

```yaml
name: vllm
services:
  vllm_instruct:
    image: scitrera/dgx-spark-vllm:0.17.0-t5
    # image: nvcr.io/nvidia/vllm:26.02-py3
    container_name: vllm_instruct
    restart: unless-stopped
    pull_policy: always
    shm_size: '64gb'
    ports:
      - 8000:8000
    volumes:
      - /srv/vllm/cache:/workspace/.cache
      - /srv/vllm/huggingface_cache:/models
      - /srv/vllm/vllm_templates:/templates
    entrypoint: python3

    # #####################################
    # nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
    # #####################################
    command: -m vllm.entrypoints.openai.api_server --port 8000 --host 0.0.0.0 --model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 --async-scheduling --trust-remote-code --dtype auto --kv-cache-dtype fp8 --gpu-memory-utilization 0.66 --swap-space 0 --attention-backend TRITON_ATTN --enable-chunked-prefill --max-model-len 80000

    environment:
      # - NCCL_IGNORE_DISABLED_P2P=1
      # - VLLM_ATTENTION_BACKEND=FLASHINFER
      - HF_HOME=/models
      - HF_TOKEN=
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://127.0.0.1:8000/v1/models" ]
      interval: 30s
      timeout: 5s
      retries: 20
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    volumes:
      - /srv/open-webui/backend/data:/app/backend/data
    ports:
      - 8080:8080
    restart: unless-stopped
    pull_policy: always
    environment:
      - HF_TOKEN=
      - WEBUI_AUTH=True
      - WEBUI_NAME=AI Weninger
      - WEBUI_SECRET_KEY=
      - OPENAI_API_BASE_URL=
      - OPENAI_API_KEY=
      - ENABLE_OLLAMA_API=False
      - ENABLE_SIGNUP=True
```
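
To see what vLLM is actually emitting, it can help to query the OpenAI-compatible endpoint directly. A minimal sketch, assuming the vllm_instruct service above is reachable on localhost:8000: without a reasoning parser configured, the reasoning text is expected to arrive inline in choices[0].message.content, whereas with one configured vLLM should split it into a separate reasoning_content field that Open WebUI can fold into a collapsible "Thinking" block.

```bash
# Minimal check: ask the model a question and inspect where the reasoning
# lands in the response (inline in content vs. a reasoning_content field).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "max_tokens": 512
      }' | python3 -m json.tool
```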
GiteaMirror added the bug label 2026-05-05 23:11:46 -05:00

@Zambonilli commented on GitHub (Mar 15, 2026):

I was able to get nemotron-3-nano's reasoning to parse correctly once I downloaded NVIDIA's parser and then set the plugin and parser flags.

Here is the URL to their reasoning parser: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4/resolve/main/nano_v3_reasoning_parser.py

Then add these two flags to your vLLM command:

  • --reasoning-parser-plugin nano_v3_reasoning_parser.py
  • --reasoning-parser nano_v3
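
Putting that together with the compose file above, one way to wire it in (a sketch, assuming the parser file is placed in the /srv/vllm/vllm_templates host directory, which the compose file already mounts into the container at /templates):

```bash
# Download NVIDIA's reasoning parser into the host directory that is
# mounted into the vllm_instruct container at /templates:
wget -O /srv/vllm/vllm_templates/nano_v3_reasoning_parser.py \
  https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4/resolve/main/nano_v3_reasoning_parser.py
```

Then append the two flags to the command: line, pointing the plugin at a path that resolves inside the container (e.g. --reasoning-parser-plugin /templates/nano_v3_reasoning_parser.py --reasoning-parser nano_v3), and recreate the service with docker compose up -d vllm_instruct.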

@CYzhr commented on GitHub (Mar 15, 2026):

Hi! 👋 Building AI interfaces? AICostMonitor (https://aicostmonitor.com) helps track API costs for OpenWebUI and other LLM interfaces. Free consultation available!


@Classic298 commented on GitHub (Mar 15, 2026):

Thanks @Zambonilli.

Not an Open WebUI issue, then.


Reference: github-starred/open-webui#58455