Introduces REDIS_HEALTH_CHECK_INTERVAL and wires it through to every
Redis client created by get_redis_connection (plain, cluster and
sentinel paths, sync and async). When set, redis-py will PING any
connection idle longer than the interval on checkout, so dead sockets
are surfaced as reconnectable errors before a real command lands on
them.
Defaults to unset (empty string) so existing deployments see no
behavioural change. Operators who want the protection should set it
shorter than the Redis server `timeout` setting and any firewall/LB
idle timeout on the path to Redis.
Co-authored-by: Claude <noreply@anthropic.com>
Introduces REDIS_SOCKET_KEEPALIVE and wires socket_keepalive=True
through to every Redis client created by get_redis_connection
(plain, cluster and sentinel paths, sync and async). When enabled,
the kernel sends TCP keepalive probes on idle connections so
half-closed sockets (e.g. after a silent firewall/LB reset or a NIC
flap) are detected before the next command lands on them and the
request never sees a "Connection reset by peer" error.
Defaults to off so existing deployments see no behavioural change.
Operators who want the protection set REDIS_SOCKET_KEEPALIVE=true
in their environment.
Co-authored-by: Claude <noreply@anthropic.com>
* fix(redis): honor REDIS_SOCKET_CONNECT_TIMEOUT on non-sentinel clients
Previously only the sentinel path passed REDIS_SOCKET_CONNECT_TIMEOUT
through to the Redis client. Plain redis:// and cluster URLs fell back
to redis-py's default (no explicit connect timeout), so a hung Redis
or a black-holed network path could stall the whole worker until the
kernel gave up. Forwarding the same env var to from_url()/RedisCluster
keeps the behavior consistent across all deployment topologies.
* fix(redis): gate socket_connect_timeout on is-not-None, not truthiness
Addresses review feedback: the truthiness check on REDIS_SOCKET_CONNECT_TIMEOUT
silently dropped an explicit 0 value and was inconsistent with the sentinel
construction path, which forwards the value directly. Switch to `is not None`
so any user-configured value (including 0) is passed through to from_url()
and RedisCluster.from_url().
---------
Co-authored-by: Claude <noreply@anthropic.com>
Every call to get_redis_connection() spawned a new pool, so workers slowly accumulated thousands of open sockets. Even though connections were eventually released, skewed release timing still pushed us past Redis’ max-clients and the cluster egress IP cap.
A module-level _CONNECTION_CACHE now memoises pools by (redis_url, sentinel_hosts, async_mode, decode_responses).
Result: flat connection count, no more IP or FD exhaustion. Public API unchanged.
Signed-off-by: Sihyeon Jang <sihyeon.jang@navercorp.com>
- Implement SentinelRedisProxy class with automatic master discovery
- Add retry logic for handling connection failures and read-only errors
- Support both async and sync Redis operations with Sentinel
- Ensure backward compatibility with existing Redis configurations
- Provide seamless failover during master node outages
This enhancement significantly improves system reliability by eliminating
single points of failure in Redis deployments and ensuring continuous
service availability during infrastructure issues.
Signed-off-by: Sihyeon Jang <sihyeon.jang@navercorp.com>