feat: Limit the number of tokens when sending requests directly #4790

Closed
opened 2025-11-11 16:03:14 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @shentong0722 on GitHub (Apr 10, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

I'm using Groq's free plan, and Groq limits the maximum single input to 6000 tokens for most models. When my context is too long, sending it to Groq will result in an error.
I tried limiting the context refresh tokens and context tokens in the settings, but neither had any effect.

Desired Solution you'd like

Is it possible to truncate the context directly before the request is sent to the upstream server? I really need this.

Alternatives Considered

No response

Additional Context

No response

Author
Owner

@shentong0722 commented on GitHub (Apr 10, 2025):

Found it:
"""
title: Token Clip Filter
author: houxin
author_url: https://github.com/hx173149
funding_url: https://github.com/hx173149
version: 0.1
"""

from pydantic import BaseModel, Field
from typing import Optional

import tiktoken

class Filter:
    class Valves(BaseModel):
        priority: int = Field(
            default=0, description="Priority level for the filter operations."
        )
        n_token_limit: int = Field(
            default=7000, description="Maximum number of tokens to retain."
        )

    class UserValves(BaseModel):
        pass

    def __init__(self):
        self.valves = self.Valves()
        self.encoding = tiktoken.get_encoding("cl100k_base")

    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        messages = body["messages"]
        # Keep the first system message; fall back to an empty one.
        sys_msgs = [m for m in messages if m.get("role") == "system"]
        if len(sys_msgs) > 0:
            sys_msg = sys_msgs[0]
        else:
            sys_msg = {"role": "system", "content": ""}
        token_len = len(self.encoding.encode(sys_msg["content"]))
        filter_messages = []
        remain_messages = [m for m in messages if m.get("role") != "system"]
        # Walk the conversation newest-first, keeping messages until the
        # token budget would be exceeded.
        for msg in remain_messages[::-1]:
            if (
                len(self.encoding.encode(msg["content"])) + token_len
                > self.valves.n_token_limit
            ):
                break
            filter_messages.append(msg)
            token_len += len(self.encoding.encode(msg["content"]))
        body["messages"] = [sys_msg] + filter_messages[::-1]
        return body
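To see what the filter's truncation logic does, here is a minimal, self-contained sketch of the same idea. It uses a naive whitespace word count in place of tiktoken's `cl100k_base` encoding (purely for illustration; real token counts differ), and the `truncate_messages` helper name is mine, not part of the filter above:

```python
def truncate_messages(messages, n_token_limit, count_tokens=lambda s: len(s.split())):
    # Keep the first system message; fall back to an empty one.
    sys_msgs = [m for m in messages if m.get("role") == "system"]
    sys_msg = sys_msgs[0] if sys_msgs else {"role": "system", "content": ""}
    token_len = count_tokens(sys_msg["content"])
    kept = []
    # Walk non-system messages newest-first, keeping as many as fit the budget.
    for msg in [m for m in messages if m.get("role") != "system"][::-1]:
        cost = count_tokens(msg["content"])
        if token_len + cost > n_token_limit:
            break
        kept.append(msg)
        token_len += cost
    # Restore chronological order, system message first.
    return [sys_msg] + kept[::-1]


messages = [
    {"role": "system", "content": "be brief"},                   # 2 "tokens"
    {"role": "user", "content": "one two three four five"},      # 5 "tokens"
    {"role": "assistant", "content": "six seven"},               # 2 "tokens"
    {"role": "user", "content": "eight nine ten"},               # 3 "tokens"
]

# With a budget of 7, only the two newest messages fit alongside the
# system message; the oldest user message is dropped.
result = truncate_messages(messages, 7)
print([m["content"] for m in result])
# → ['be brief', 'six seven', 'eight nine ten']
```

Note that this drops the *oldest* messages first, which is usually what you want for chat, but it can silently discard context the model still needs.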

Reference: github-starred/open-webui#4790