Incorrect "slow" response message in Cache Configuration test #13947

Closed
opened 2025-11-02 10:58:04 -06:00 by GiteaMirror · 14 comments
Owner

Originally created by @H0llyW00dzZ on GitHub (Jan 10, 2025).

Description

When testing the Cache Configuration with Redis in the admin panel, the cache test keeps displaying the message: `Cache test successful, but response is slow: 6.725281ms.`

However, a response time of around 6ms should not be considered slow. The message seems to be misleading and may cause confusion for administrators.

Expected behavior:
The cache test should provide an accurate assessment of the response time and not label a response time of around 6ms as `slow`.

Gitea Version

v1.23.0

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

![Screenshot from 2025-01-10 09-34-01](https://github.com/user-attachments/assets/fd47477e-12e7-43ab-9460-8bb112c435ca)

Git Version

2.47.1

Operating System

Linux

How are you running Gitea?

Running on Kubernetes

Database

SQLite

GiteaMirror added the type/bug label 2025-11-02 10:58:04 -06:00

@wxiaoguang commented on GitHub (Jan 10, 2025):

Related: Add cache test for admins #31265 @6543

You can safely ignore that message, and I still think that test should be removed from admin panel.


@H0llyW00dzZ commented on GitHub (Jan 10, 2025):

Related: Add cache test for admins #31265 @6543

You can safely ignore that message, and I still think that test should be removed from admin panel.

@wxiaoguang It seems that the implementation in PR #31265 is incorrect. The test measures performance in microseconds (μs), which is a local measurement. However, in real-world scenarios, it should start from milliseconds (ms). Basically, any response time under 30ms, even if it extends to microseconds, can still be considered fast or low latency.


@TheFox0x7 commented on GitHub (Jan 10, 2025):

I can fix this, but the question remains how:

  1. Remove the test entirely per PR comments (Gitea won't start if the cache doesn't work so testing if it works is pointless)
  2. Replace it with pure information about the timing (so no warning).
  3. Change time to pass to 100ms so the test won't fail

Any other suggestions or ideas? I'd add a Prometheus histogram there, but then there's no way to get that information by default without enabling metrics and scraping (which would be an issue if switched to otel). And self-querying the endpoint feels weird.


@lunny commented on GitHub (Jan 10, 2025):

I prefer the solution 3.


@H0llyW00dzZ commented on GitHub (Jan 11, 2025):

I can fix this, but the question remains how:

  1. Remove the test entirely per PR comments (Gitea won't start if the cache doesn't work so testing if it works is pointless)
  2. Replace it with pure information about the timing (so no warning).
  3. Change time to pass to 100ms so the test won't fail

Any other suggestions or ideas? I'd add a prometheus histogram there but then there's no way to get that information by default without enabling metrics and scarping (which would be an issue if switched to otel). And self querying the endpoint feels weird.

@TheFox0x7
Nah, don't remove it. It's pretty useful for testing.

Another idea is to change the current implementation by adjusting the ratio. For example, if the latency is under 30ms, it can still be considered fast or low latency. However, if the latency goes around 50ms to 100ms, then it can be considered slow or high latency. This is because under 30ms is considered an average stable network latency.


@wxiaoguang commented on GitHub (Jan 11, 2025):

Removing it should be fine because:

  • Gitea instance already does cache check during startup, it doesn't make sense to "test" it again and again
  • For the latency, it should be handled by something like "tracing" and "profiling", but not just test once and report a delay.

@H0llyW00dzZ commented on GitHub (Jan 11, 2025):

Removing it should be fine because:

  • Gitea instance already does cache check during startup, it doesn't make sense to "test" it again and again
  • For the latency, it should be handled by something like "tracing" and "profiling", but not just test once and report a delay.

Cache check during startup is different from cache testing repeatedly. When you press test again and again, the results will always vary, sometimes ranging from low to high latency.

For example:

![image](https://github.com/user-attachments/assets/fe4039b7-6522-4ece-8f59-68e512d06346)
![image](https://github.com/user-attachments/assets/867ca482-4c77-4b38-93ff-5b12e73c59eb)
![image](https://github.com/user-attachments/assets/6054be9a-d04e-4079-891e-79b1dda7aa5d)

This test provides a real-time check of the latency, which can be useful for monitoring the system's performance.


@wxiaoguang commented on GitHub (Jan 11, 2025):

This test provides a real-time check of the latency, which can be useful for monitoring the system's performance.

But:

For the latency, it should be handled by something like "tracing" and "profiling", but not just test once and report a delay.


@H0llyW00dzZ commented on GitHub (Jan 11, 2025):

This test provides a real-time check of the latency, which can be useful for monitoring the system's performance.

But:

For the latency, it should be handled by something like "tracing" and "profiling", but not just test once and report a delay.

No, you don't really need something like tracing and profiling in this case.

In PR https://github.com/go-gitea/gitea/pull/31265, the code is measuring latency, not just reporting a single delay. The Test() function performs a series of cache operations and measures the total time taken to complete them:

```go
const (
	testCacheKey       = "DefaultCache.TestKey"
	SlowCacheThreshold = 100 * time.Microsecond
)

func Test() (time.Duration, error) {
	if defaultCache == nil {
		return 0, fmt.Errorf("default cache not initialized")
	}

	testData := fmt.Sprintf("%x", make([]byte, 500))

	start := time.Now()

	if err := defaultCache.Delete(testCacheKey); err != nil {
		return 0, fmt.Errorf("expect cache to delete data based on key if exist but got: %w", err)
	}
	if err := defaultCache.Put(testCacheKey, testData, 10); err != nil {
		return 0, fmt.Errorf("expect cache to store data but got: %w", err)
	}
	testVal, hit := defaultCache.Get(testCacheKey)
	if !hit {
		return 0, fmt.Errorf("expect cache hit but got none")
	}
	if testVal != testData {
		return 0, fmt.Errorf("expect cache to return same value as stored but got other")
	}

	return time.Since(start), nil
}
```

@wxiaoguang commented on GitHub (Jan 11, 2025):

No, you don't really need something like tracing and profiling in this case.

In PR #31265, the code is measuring latency, not just reporting a single delay. The Test() function performs a series of cache operations and measures the total time taken to complete them:

Does it make sense? This time, you do a test, it shows that "the cache is fast", then some time later, there is some cache performance problem, then users report that "the response is slow", then you go to admin panel click "test", at that time the cache recovers, you still see "the cache is fast". Does it resolve any real world problem?


@H0llyW00dzZ commented on GitHub (Jan 11, 2025):

No, you don't really need something like tracing and profiling in this case.
In PR #31265, the code is measuring latency, not just reporting a single delay. The Test() function performs a series of cache operations and measures the total time taken to complete them:

Does it make sense? This time, you do a test, it shows that "the cache is fast", then some time later, there is some cache performance problem, then users report that "the response is slow", then you go to admin panel click "test", at that time the cache recovers, you still see "the cache is fast". Does it resolve any real world problem?

Of course, if the response times are accurate. However, the message `Cache test successful, but response is slow: 6.725281ms` can be misleading.

A latency of 6ms is generally considered fast or low latency, and it's well within acceptable performance standards. Labeling a response time under 30ms as "slow" might not be appropriate in most cases, as it can create confusion.


@H0llyW00dzZ commented on GitHub (Jan 11, 2025):

So the PR #31265 only needs to improve this part:

```go
elapsed, err := cache.Test()
if err != nil {
    ctx.Data["CacheError"] = err
} else if elapsed > cache.SlowCacheThreshold {
    ctx.Data["CacheSlow"] = fmt.Sprint(elapsed)
}
```

Change the `SlowCacheThreshold` to:

```go
const SlowCacheThreshold = 30 * time.Millisecond
```

Then it might look like this:

```go
elapsed, err := cache.Test()
if err != nil {
    ctx.Data["CacheError"] = err
} else if elapsed > cache.SlowCacheThreshold {
    ctx.Data["CacheSlow"] = fmt.Sprintf("Cache response is slow: %v", elapsed)
} else {
    ctx.Data["CacheFast"] = fmt.Sprintf("Cache response is fast: %v", elapsed)
}
```

@TheFox0x7 commented on GitHub (Jan 11, 2025):

Does it make sense? This time, you do a test, it shows that "the cache is fast", then some time later, there is some cache performance problem, then users report that "the response is slow", then you go to admin panel click "test", at that time the cache recovers, you still see "the cache is fast". Does it resolve any real world problem

Of course, if the response times are accurate

But they aren't really. You measure a set of 3 operations in a one off time measurement and that's prone to outliers. To have it accurate, you'd need to measure the operations for a bit (say 10-100 times) and take an average from them, at which point you're just re-implementing metrics but without all the details and benefits of it.

@H0llyW00dzZ Maybe you can explain your use case better? Right now, the longer I look at the feature, the more I agree with @wxiaoguang that it's kind of pointless, so a real user scenario would show its usefulness better. In particular:

  • Do you believe the test is accurate when you use it?
  • How do you use it? once or refreshing the page/clicking the button multiple times?
  • Why wouldn't histogram in metrics endpoint cover this instead?

@H0llyW00dzZ commented on GitHub (Jan 11, 2025):

Does it make sense? This time, you do a test, it shows that "the cache is fast", then some time later, there is some cache performance problem, then users report that "the response is slow", then you go to admin panel click "test", at that time the cache recovers, you still see "the cache is fast". Does it resolve any real world problem

Of course, if the response times are accurate

But they aren't really. You measure a set of 3 operations in a one off time measurement and that's prone to outliers. To have it accurate, you'd need to measure the operations for a bit (say 10-100 times) and take an average from them, at which point you're just re-implementing metrics but without all the details and benefits of it.

@H0llyW00dzZ Maybe you can explain your usecase better? Right now the longer I look at the feature the more I agree with @wxiaoguang that it's kind of pointless, so a real user scenario would show the usefulness of it better. In particular:

  • Do you believe the test is accurate when you use it?
  • How do you use it? once or refreshing the page/clicking the button multiple times?
  • Why wouldn't histogram in metrics endpoint cover this instead?
  • Do you believe the test is accurate when you use it?

Yes, because the code in the `Test()` function is correct for checking the latency of cache operations:

```go
func Test() (time.Duration, error) {
	if defaultCache == nil {
		return 0, fmt.Errorf("default cache not initialized")
	}

	testData := fmt.Sprintf("%x", make([]byte, 500))

	start := time.Now()

	if err := defaultCache.Delete(testCacheKey); err != nil {
		return 0, fmt.Errorf("expect cache to delete data based on key if exist but got: %w", err)
	}
	if err := defaultCache.Put(testCacheKey, testData, 10); err != nil {
		return 0, fmt.Errorf("expect cache to store data but got: %w", err)
	}
	testVal, hit := defaultCache.Get(testCacheKey)
	if !hit {
		return 0, fmt.Errorf("expect cache hit but got none")
	}
	if testVal != testData {
		return 0, fmt.Errorf("expect cache to return same value as stored but got other")
	}

	return time.Since(start), nil
}
```

The function performs a series of cache operations (delete, put, and get) and measures the total time taken to complete them, providing an accurate measurement of the cache latency.

  • How do you use it?

I only use it when there are many crawlers viewing the source code.

  • Why wouldn't histogram in metrics endpoint cover this instead?

Do you mean the metrics endpoint from the Gitea configuration?
Reference: github-starred/gitea#13947