[GH-ISSUE #22687] issue: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 thinking tag doesn't work #58455

Closed
opened 2026-05-05 23:11:46 -05:00 by GiteaMirror · 3 comments

Originally created by @oe3gwu on GitHub (Mar 15, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22687

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.10

Ollama Version (if applicable)

Not applicable; using vLLM.

Operating System

NVIDIA-branded Ubuntu

Browser (if applicable)

Firefox

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When chatting with the model, the reasoning should be collapsed into a "Thinking" section rather than printed as part of the visible reply.

Actual Behavior

When chatting with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4, everything works otherwise, but the model's raw reasoning text is shown inline and clutters the chat.

Steps to Reproduce

  1. Set up vLLM.
  2. Download nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4.
  3. Log in to Open WebUI.
  4. Chat with the model (a condensed command-line version of these steps follows below).
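
A condensed command-line version of these steps, assuming the compose file from the Additional Information section below is saved as docker-compose.yml on the host:

```bash
# Start both services defined in the compose file (vLLM + Open WebUI)
docker compose up -d

# vLLM serves an OpenAI-compatible API on port 8000; the model weights are
# pulled into the mounted HF cache on first start. Confirm the model is up:
curl -s http://localhost:8000/v1/models

# Open WebUI is published on port 8080; sign up / log in there
# (ENABLE_SIGNUP=True in the compose file) and chat with the model:
#   http://localhost:8080
```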

Logs & Screenshots

Screenshot 1: https://github.com/user-attachments/assets/274cd4b4-7580-422e-9a90-9f45bdd10bec
Screenshot 2: https://github.com/user-attachments/assets/20aa0162-f744-41b6-8094-421459c5f5ef

Container log: webui.txt (https://github.com/user-attachments/files/26000711/webui.txt)

Additional Information

```yaml
name: vllm
services:
  vllm_instruct:
    image: scitrera/dgx-spark-vllm:0.17.0-t5
    # image: nvcr.io/nvidia/vllm:26.02-py3
    container_name: vllm_instruct
    restart: unless-stopped
    pull_policy: always
    shm_size: '64gb'
    ports:
      - 8000:8000
    volumes:
      - /srv/vllm/cache:/workspace/.cache
      - /srv/vllm/huggingface_cache:/models
      - /srv/vllm/vllm_templates:/templates
    entrypoint: python3

    # #####################################
    # nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
    # #####################################
    command: -m vllm.entrypoints.openai.api_server --port 8000 --host 0.0.0.0 --model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 --async-scheduling --trust-remote-code --dtype auto --kv-cache-dtype fp8 --gpu-memory-utilization 0.66 --swap-space 0 --attention-backend TRITON_ATTN --enable-chunked-prefill --max-model-len 80000

    environment:
      # - NCCL_IGNORE_DISABLED_P2P=1
      # - VLLM_ATTENTION_BACKEND=FLASHINFER
      - HF_HOME=/models
      - HF_TOKEN=
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://127.0.0.1:8000/v1/models" ]
      interval: 30s
      timeout: 5s
      retries: 20
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    volumes:
      - /srv/open-webui/backend/data:/app/backend/data
    ports:
      - 8080:8080
    restart: unless-stopped
    pull_policy: always
    environment:
      - HF_TOKEN=
      - WEBUI_AUTH=True
      - WEBUI_NAME=AI Weninger
      - WEBUI_SECRET_KEY=
      - OPENAI_API_BASE_URL=
      - OPENAI_API_KEY=
      - ENABLE_OLLAMA_API=False
      - ENABLE_SIGNUP=True
```
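
To see what vLLM is actually emitting, it can help to query the OpenAI-compatible endpoint directly. A minimal sketch, assuming the vllm_instruct service above is reachable on localhost:8000: without a reasoning parser configured, the reasoning text is expected to arrive inline in choices[0].message.content, whereas with one configured vLLM should split it into a separate reasoning_content field that Open WebUI can fold into a collapsible "Thinking" block.

```bash
# Minimal check: ask the model a question and inspect where the reasoning
# lands in the response (inline in content vs. a reasoning_content field).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "max_tokens": 512
      }' | python3 -m json.tool
```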
GiteaMirror added the bug label 2026-05-05 23:11:46 -05:00

@Zambonilli commented on GitHub (Mar 15, 2026):

I was able to get nemotron-3-nano's reasoning to parse correctly once I downloaded NVIDIA's parser and then set the plugin and parser flags.

Here is the URL to their reasoning parser: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4/resolve/main/nano_v3_reasoning_parser.py

Then add these two flags to your vLLM command:

  • --reasoning-parser-plugin nano_v3_reasoning_parser.py
  • --reasoning-parser nano_v3
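
Putting that together with the compose file above, one way to wire it in (a sketch, assuming the parser file is placed in the /srv/vllm/vllm_templates host directory, which the compose file already mounts into the container at /templates):

```bash
# Download NVIDIA's reasoning parser into the host directory that is
# mounted into the vllm_instruct container at /templates:
wget -O /srv/vllm/vllm_templates/nano_v3_reasoning_parser.py \
  https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4/resolve/main/nano_v3_reasoning_parser.py
```

Then append the two flags to the command: line, pointing the plugin at a path that resolves inside the container (e.g. --reasoning-parser-plugin /templates/nano_v3_reasoning_parser.py --reasoning-parser nano_v3), and recreate the service with docker compose up -d vllm_instruct.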

@CYzhr commented on GitHub (Mar 15, 2026):

Hi! 👋 Building AI interfaces? AICostMonitor (https://aicostmonitor.com) helps track API costs for OpenWebUI and other LLM interfaces. Free consultation available!


@Classic298 commented on GitHub (Mar 15, 2026):

Thanks @Zambonilli.

Not an Open WebUI issue, then.


Reference: github-starred/open-webui#58455