Commit Graph

12712 Commits

Author SHA1 Message Date
Sarthak Aggarwal faad628398
[Backport 8.0] Allow TCL 9.0 for tests (#1673) (#2813)
Makes our tests possible to run with TCL 9.

The latest Fedora now has TCL 9.0 and it's working now, including the
TCL TLS package. (This wasn't working earlier due to some packaging
errors for TCL packages in Fedora, which have been fixed now.)

This PR also removes the custom compilation of TCL 8 used in our Daily
jobs and uses the system default TCL version instead. The TCL version
depends on the OS. For the latest Fedora, you get 9.0, for macOS you get
8.5 and for most other OSes you get 8.6.

The checks for TCL 8.7 are removed, because 8.7 doesn't exist. It was
never released.

Backport of #1673.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-06 23:44:25 +01:00
Rain Valentine 1cac48f92d
Release notes for 8.0.6
updated release notes and version file

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
2025-10-03 13:27:36 -07:00
Madelyn Olson 23a0c5ecf7
Fix format issues with CVE fix (#2682)
The CVE fixes had a formatting and external test issue that wasn't
caught because private branches don't run those CI steps. 

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-10-03 10:33:18 -07:00
Madelyn Olson 5782873058
Merge commit from fork
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-10-03 06:32:24 -07:00
Binbin d5bf5bf8dd Minor fix for dual rdb channel connection conn error log (#2658)
This should be server.repl_rdb_transfer_s

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-10-01 16:04:34 +02:00
Zeroday BYTE 28228eaaa4 Fix unsigned difference expression compared to zero (#2101)
daea05b1e2/src/networking.c (L886-L886)

Fix the issue need to ensure that the subtraction `prev->size -
prev->used` does not underflow. This can be achieved by explicitly
checking that `prev->used` is less than `prev->size` before performing
the subtraction. This approach avoids relying on unsigned arithmetic and
ensures the logic is clear and robust.

The specific changes are:
1. Replace the condition `prev->size - prev->used > 0` with `prev->used
< prev->size`.
2. This change ensures that the logic checks whether there is remaining
space in the buffer without risking underflow.

**References**
[INT02-C. Understand integer conversion
rules](https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules)
[CWE-191](https://cwe.mitre.org/data/definitions/191.html)


---

Signed-off-by: Zeroday BYTE <github@zerodaysec.org>
2025-10-01 16:04:34 +02:00
Sarthak Aggarwal 4d7d157702
Fix accounting for dual channel RDB bytes in replication stats (#2602) (backport in #2616)
Backport of #2602. Resolves #2545.

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-09-23 12:59:19 +02:00
Nikhil Manglore 8310135e69 Add release notes for 8.0.5 and update version file
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 983e4866ad Fix pre-size hashtables per slot when reading RDB files (#2466)
When reading RDB files with information about the number of keys per cluster slot,
we need to create the dicts if they don't exist.

Currently, when processing RDB slot-info, our expand has no effect because the dict
does not exist (we initialize it only when we need it).

We also update kvstoreExpand to use the kvstoreDictExpand to make
sure there is only one code path. Also see #1199 for more details.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-08-22 15:30:45 +02:00
Ted Lyngmo ca45554a85 Fix assumptions that pthread functions set errno (#2526)
pthread functions return the error instead of setting errno.

Fixes #2525

Signed-off-by: Ted Lyngmo <ted@lyncon.se>
2025-08-22 15:30:45 +02:00
Harkrishn Patro 3dfcc504ac Converge divergent shard-id persisted in nodes.conf to primary's shard id (#2174)
Fixes #2171

Handle divergent shard-id across primary and replica from nodes.conf and
reconcile all the nodes in the shard to the primary node's shard-id.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin d462c7d977 Fix client tracking memory overhead calculation (#2360)
This should be + instread of *, otherwise it does not make any sense.
Otherwise we would have to calculate 20 more bytes for each prefix rax
node in 64 bits build.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Katie Holly d7bf1b57c5 Ensure empty error tables in scripts don't crash Valkey (#2229)
When calling the command `EVAL error{} 0`, Valkey crashes with the
following stack trace. This patch ensures we never leave the
`err_info.msg` field null when we fail to extract a proper error
message.

```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
2595901:M 18 Jun 2025 01:20:12.917 # valkey 8.1.2 crashed by signal: 11, si_code: 1
2595901:M 18 Jun 2025 01:20:12.917 # Accessing address: (nil)
2595901:M 18 Jun 2025 01:20:12.917 # Crashed running the instruction at: 0x726f8e57ed1d

------ STACK TRACE ------
EIP:
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]

2595905 bio_aof
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]

2595904 bio_close_file
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]

2595901 valkey-server *
/usr/lib/libc.so.6(+0x3def0) [0x726f8e44def0]
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
valkey-server *:6379(sdscatfmt+0x894) [0x6530abaa24a4]
valkey-server *:6379(luaCallFunction+0x39a) [0x6530abbc66ea]
valkey-server *:6379(+0x1a0992) [0x6530abbc6992]
valkey-server *:6379(scriptingEngineCallFunction+0x98) [0x6530abbc1298]
valkey-server *:6379(+0x11ff55) [0x6530abb45f55]
valkey-server *:6379(call+0x174) [0x6530aba94454]
valkey-server *:6379(processCommand+0x93d) [0x6530aba958dd]
valkey-server *:6379(processCommandAndResetClient+0x21) [0x6530abaa9d11]
valkey-server *:6379(processInputBuffer+0xe3) [0x6530abaaee83]
valkey-server *:6379(readQueryFromClient+0x65) [0x6530abaaef55]
valkey-server *:6379(+0x18e31a) [0x6530abbb431a]
valkey-server *:6379(aeProcessEvents+0x24a) [0x6530aba790ca]
valkey-server *:6379(aeMain+0x2d) [0x6530aba7938d]
valkey-server *:6379(main+0x3f6) [0x6530aba6e7b6]
/usr/lib/libc.so.6(+0x276b5) [0x726f8e4376b5]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x726f8e437769]
valkey-server *:6379(_start+0x25) [0x6530aba70235]

2595906 bio_lazy_free
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]

4/4 expected stacktraces.

------ STACK TRACE DONE ------

------ REGISTERS ------
2595901:M 18 Jun 2025 01:20:12.920 #
RAX:0000000000000000 RBX:0000726f8dd35663
RCX:0000000000000000 RDX:0000000000000000
RDI:0000000000000000 RSI:0000000000000010
RBP:00007ffc2b821a80 RSP:00007ffc2b821938
R8 :000000000000000c R9 :00006530abc111b8
R10:0000000000000001 R11:0000000000000003
R12:00006530abc49adc R13:00006530abc111b7
R14:0000000000000001 R15:0000000000000001
RIP:0000726f8e57ed1d EFL:0000000000010283
CSGSFS:002b000000000033
2595901:M 18 Jun 2025 01:20:12.921 * hide-user-data-from-log is on, skip logging stack content to avoid spilling user data.

------ INFO OUTPUT ------
redis_version:7.2.4
server_name:valkey
valkey_version:8.1.2
valkey_release_stage:ga
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:38d65aa7b4148d2c
server_mode:standalone
os:Linux 6.14.6-arch1-1 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
gcc_version:15.1.1
process_id:2595901
process_supervised:no
run_id:a0b75f67a217a81142f17553028c010e86c1ee80
tcp_port:6379
server_time_usec:1750209612917634
uptime_in_seconds:16
uptime_in_days:0
hz:10
configured_hz:10
clients_hz:10
lru_clock:5379148
executable:/home/fusl/valkey-server
config_file:
io_threads_active:0
availability_zone:
listener0:name=tcp,bind=*,bind=-::*,port=6379

connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
paused_reason:none
paused_actions:none
paused_timeout_milliseconds:0

used_memory:911824
used_memory_human:890.45K
used_memory_rss:15323136
used_memory_rss_human:14.61M
used_memory_peak:911824
used_memory_peak_human:890.45K
used_memory_peak_perc:100.29%
used_memory_overhead:892232
used_memory_startup:891824
used_memory_dataset:19592
used_memory_dataset_perc:97.96%
allocator_allocated:1845952
allocator_active:1986560
allocator_resident:6672384
allocator_muzzy:0
total_system_memory:67323842560
total_system_memory_human:62.70G
used_memory_lua:34816
used_memory_vm_eval:34816
used_memory_lua_human:34.00K
used_memory_scripts_eval:184
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:68608
used_memory_vm_total_human:67.00K
used_memory_functions:224
used_memory_scripts:408
used_memory_scripts_human:408B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:0
allocator_rss_ratio:3.36
allocator_rss_bytes:4685824
rss_overhead_ratio:2.30
rss_overhead_bytes:8650752
mem_fragmentation_ratio:17.18
mem_fragmentation_bytes:14431168
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1750209596
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

total_connections_received:1
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:34
total_net_output_bytes:0
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:1
total_writes_processed:0
io_threaded_reads_processed:0
io_threaded_writes_processed:0
io_threaded_freed_objects:0
io_threaded_accept_processed:0
io_threaded_poll_processed:0
io_threaded_total_prefetch_batches:0
io_threaded_total_prefetch_entries:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:0
reply_buffer_expands:0
eventloop_cycles:170
eventloop_duration_sum:17739
eventloop_duration_cmd_sum:0
instantaneous_eventloop_cycles_per_sec:9
instantaneous_eventloop_duration_usec:99
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

role:master
connected_slaves:0
replicas_waiting_psync:0
master_failover_state:no-failover
master_replid:d35a0bb7979f490a60174bb363524431d7eb2428
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:10485760
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

used_cpu_sys:0.012543
used_cpu_user:0.016853
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.012440
used_cpu_user_main_thread:0.016714

cluster_enabled:0

------ CLIENT LIST OUTPUT ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0

------ CURRENT CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes

------ EXECUTING CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
repl-diskless-load disabled
debug-context ""
sanitize-dump-payload no
lazyfree-lazy-user-del yes
lazyfree-lazy-server-del yes
import-mode no
lazyfree-lazy-user-flush yes
list-compress-depth 0
dual-channel-replication-enabled no
repl-diskless-sync yes
activedefrag no
lazyfree-lazy-expire yes
io-threads 1
replica-read-only yes
client-query-buffer-limit 1gb
slave-read-only yes
lazyfree-lazy-eviction yes
proto-max-bulk-len 512mb

------ FAST MEMORY TEST ------
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #0 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #1 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #2 terminated
*** Preparing to test memory region 6530abce2000 (212992 bytes)
*** Preparing to test memory region 726f8af7f000 (2621440 bytes)
*** Preparing to test memory region 726f8b200000 (8388608 bytes)
*** Preparing to test memory region 726f8ba00000 (4194304 bytes)
*** Preparing to test memory region 726f8bffe000 (8388608 bytes)
*** Preparing to test memory region 726f8c7ff000 (8388608 bytes)
*** Preparing to test memory region 726f8d000000 (8388608 bytes)
*** Preparing to test memory region 726f8dc00000 (4194304 bytes)
*** Preparing to test memory region 726f8e290000 (16384 bytes)
*** Preparing to test memory region 726f8e3d2000 (20480 bytes)
*** Preparing to test memory region 726f8e5f8000 (32768 bytes)
*** Preparing to test memory region 726f8eb58000 (12288 bytes)
*** Preparing to test memory region 726f8eb5c000 (16384 bytes)
*** Preparing to test memory region 726f8ed63000 (4096 bytes)
*** Preparing to test memory region 726f8eef2000 (397312 bytes)
*** Preparing to test memory region 726f8efc7000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/libc.so.6 (base 0x726f8e410000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------

=== VALKEY BUG REPORT END. Make sure to include from START to END. ===
```

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Viktor Söderqvist 9199dc1848 Fix defrag timeouts in 8.0
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Ran Shidlansik 85c2286f6f update build-debian-old to use Bullseye instead of EOL buster (#2345)
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Viktor Söderqvist 21cfd5427d Disable clang-format in 8.0 to simplify backporting
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Sarthak Aggarwal 599dd5daee Try to stabilize aof test (#2399)
Based on @enjoy-binbin's suggestion on #1611, I made the change to find
the available port. The test has been passing in the daily tests in my
local repo.

Resolved #1611

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Viktor Söderqvist c69b819d02 Test coverage for ECHO for reply schema validation (#1549)
After #1545 disabled some tests for reply schema validation, we now have
another issue that ECHO is not covered.

```
WARNING! The following commands were not hit at all:
  echo
ERROR! at least one command was not hit by the tests
```

This patch adds a test case for ECHO in the unit/other test suite. I
haven't checked if there are more commands that aren't covered.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin c7d08ad674 Avoid timing issue in diskless-load-swapdb test (#1077)
Since we paused the primary node earlier, the replica may enter
cluster down due to primary node pfail. Here set allow read to
prevent subsequent read errors.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Madelyn Olson 50388d8c9b Fix undefined behavior defined by ASAN (#1451)
Asan now supports making sure you are passing in the correct pointer
type, which seems useful but we can't support it since we pass in an
incorrect pointer in several places. This is most commonly done with
generic free functions, where we simply cast it to the correct type.

It's not a lot of code to clean up, so it seems appropriate to cleanup
instead of disabling the check.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
zvi-code 7fa6df671d Disable lazy free in defrag test to fix 32bit daily failure (#1370)
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Viktor Söderqvist 3fd70e6c12 Skip CLI tests with reply schema validation (#1545)
The commands used in valkey-cli tests are not important the reply schema
validation. Skip them to avoid the problem if tests hanging. This has
failed lately in the daily job:

```
[TIMEOUT]: clients state report follows.
sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription
Killing still running Valkey server 33357
```

These test cases use a special valkey-cli command `:get pubsub` command,
which is an internal command to valkey-cli rather than a Valkey server
command. This command hangs when compiled with with logreqres enabled.
Easy solution is to skip the tests in this setup.

The test cases were introduced in #1432.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 7589a18b02 Fix the election was reset wrongly before failover epoch was obtained (#1339)
After #1009, we will reset the election when we received
a claim with an equal or higher epoch since a node can win
an election in the past.

But we need to consider the time before the node actually
obtains the failover_auth_epoch. The failover_auth_epoch
default is 0, so before the node actually get the failover
epoch, we might wrongly reset the election.

This is probably harmless, but will produce misleading log
output and may delay election by a cron cycle or beforesleep.
Now we will only reset the election when a node is actually
obtains the failover epoch.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 6a7b806090 Fix empty primary may have dirty slots data due to bad migration (#1285)
If we become an empty primary for some reason, we still need to
check if we need to delete dirty slots, because we may have dirty
slots data left over from a bad migration. Like the target node forcibly
executes CLUSTER SETSLOT NODE to take over the slot without
performing key migration.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Vitaly Arbuzov 21946e6ec0 Add retries on CLUSTERDOWN in tests and fix sanitizer error
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Vitaly Arbuzov 2f636a0169 fix uninitialized variable warning
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Vitaly Arbuzov 84f2fb96da formatting
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Vitaly Arbuzov adf7328bbc Fixes
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 18dd441018 Make manual failover reset the on-going election to promote failover (#1274)
If a manual failover got timed out, like the election don't get the
enough votes, since we have a auth_timeout and a auth_retry_time, a
new manual failover will not be able to proceed on the replica side.

Like if we initiate a new manual failover after a election timed out,
we will pause the primary, but on the replica side, due to retry_time,
replica does not trigger the new election and the manual failover will
eventually time out.

In this case, if we initiate manual failover again and there is an
ongoing election, we will reset it so that the replica can initiate
a new election at the manual failover's request.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 056615de22 Trigger the election as soon as possible when doing a forced manual failover (#1067)
In CLUSTER FAILOVER FORCE case, we will set mf_can_start to
1 and wait for a cron to trigger the election. We can also set a
CLUSTER_TODO_HANDLE_MANUALFAILOVER flag so that we
can start the election as soon as possible instead of waiting for
the cron, so that we won't have a 100ms delay (clusterCron).

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin d1ddd6a812 Minor log fixes when failover auth denied due to slot epoch (#1341)
The old reqEpoch mainly refers to requestCurrentEpoch, see:
```
    if (requestCurrentEpoch < server.cluster->currentEpoch) {
        serverLog(LL_WARNING, "Failover auth denied to %.40s (%s): reqEpoch (%llu) < curEpoch(%llu)", node->name,
                  node->human_nodename, (unsigned long long)requestCurrentEpoch,
                  (unsigned long long)server.cluster->currentEpoch);
        return;
    }
```

And in here we refer to requestConfigEpoch, it's a bit misleading,
so change it to reqConfigEpoch to make it clear.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Madelyn Olson 3021252575 Add a filter option to drop all cluster packets (#1252)
A minor debugging change that helped in the investigation of
https://github.com/valkey-io/valkey/issues/1251. Basically there are
some edge cases where we want to fully isolate a note from receiving
packets, but can't suspend the process because we need it to continue
sending outbound traffic. So, added a filter for that.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 092c9657cc Add packet-drop to fix the new flaky failover test (#2196)
The new test was added in #2178, obviously there may be
pending reads in the connection, so there may be a race
in the DROP-CLUSTER-PACKET-FILTER part causing the test
to fail. Add CLOSE-CLUSTER-LINK-ON-PACKET-DROP to ensure
that the replica does not process the packet.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 2019337e74)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 5420531da4 Fix cluster myself CLUSTER SLOTS/NODES wrong port after updating port/tls-port (#2186)
When modifying port or tls-port through config set, we need to call
clusterUpdateMyselfAnnouncedPorts to update myself's port, otherwise
CLUSTER SLOTS/NODES will be old information from myself's perspective.

In addition, in some places, such as clusterUpdateMyselfAnnouncedPorts
and clusterUpdateMyselfIp, beforeSleep save is added so that the
new ip info can be updated to nodes.conf.

Remove clearCachedClusterSlotsResponse in updateClusterAnnouncedPort
since now we add beforeSleep save in clusterUpdateMyselfAnnouncedPorts,
and it will call clearCachedClusterSlotsResponse.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 5cc2b25753)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin d72d0bc34c Fix replica can't finish failover when config epoch is outdated (#2178)
When the primary changes the config epoch and then down immediately,
the replica may not update the config epoch in time. Although we will
broadcast the change in cluster (see #1813), there may be a race in
the network or in the code. In this case, the replica will never finish
the failover since other primaries will refuse to vote because the
replica's slot config epoch is old.

We need a way to allow the replica can finish the failover in this case.

When the primary refuses to vote because the replica's config epoch is
less than the dead primary's config epoch, it can send an UPDATE packet
to the replica to inform the replica about the dead primary. The UPDATE
message contains information about the dead primary's config epoch and
owned slots. The failover will time out, but later the replica can try
again with the updated config epoch and it can succeed.

Fixes #2169.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
(cherry picked from commit 476671be19)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Madelyn Olson 242501ebeb Incorporate Redis CVE for CVE-2025-27151 (#2146)
Resolves https://github.com/valkey-io/valkey/issues/2145

Incorporate the CVE patch that was sent to us by Redis Ltd.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
(cherry picked from commit 73696bf6e2)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Madelyn Olson 6af03a0309 Correctly cast the extension lengths (#2144)
Correctly use a 32 bit integer for accumulating the length of ping
extensions.

The current code may accidentally truncate the length of an
extension that is greater than 64kb and fail the validation check. We
don't currently emit any extensions that are this length, but if we were
to do so in the future we might have issues with older nodes (without
this fix) will silently drop packets from newer nodes. We should
backport this to all versions.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
(cherry picked from commit 30d7f08a4e)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Viktor Söderqvist 5082ec6ae5 Detect SSL_new() returning NULL in outgoing connections (#2140)
When creating an outgoing TLS connection, we don't check if `SSL_new()`
returned NULL.

Without this patch, the check was done only for incoming connections in
`connCreateAcceptedTLS()`. This patch moves the check to
`createTLSConnection()` which is used both for incoming and outgoing
connections.

This check makes sure we fail the connection before going any further,
e.g. when `connCreate()` is followed by `connConnect()`, the latter
returns `C_ERR` which is commonly detected where outgoing connections
are established, such where a replica connects to a primary.

```c
int connectWithPrimary(void) {
    server.repl_transfer_s = connCreate(connTypeOfReplication());
    if (connConnect(server.repl_transfer_s, server.primary_host, server.primary_port, server.bind_source_addr,
                    server.repl_mptcp, syncWithPrimary) == C_ERR) {
        serverLog(LL_WARNING, "Unable to connect to PRIMARY: %s", connGetLastError(server.repl_transfer_s));
        connClose(server.repl_transfer_s);
        server.repl_transfer_s = NULL;
        return C_ERR;
    }
```

For a more thorough explanation, see
https://github.com/valkey-io/valkey/issues/1939#issuecomment-2912177877.

Might fix #1939.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
(cherry picked from commit 17e66863a5)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Binbin 3463206d4a CLIENT UNBLOCK should't be able to unpause paused clients (#2117)
When a client is blocked by something like `CLIENT PAUSE`, we should not
allow `CLIENT UNBLOCK timeout` to unblock it, since some blocking types
does not has the timeout callback, it will trigger a panic in the core,
people should use `CLIENT UNPAUSE` to unblock it.

Also using `CLIENT UNBLOCK error` is not right, it will return a UNBLOCKED
error to the command, people don't expect a `SET` command to get an error.

So in this commit, in these cases, we will return 0 to `CLIENT UNBLOCK`
to indicate the unblock is fail. The reason is that we assume that if
a command doesn't expect to be timedout, it also doesn't expect to be
unblocked by `CLIENT UNBLOCK`.

The old behavior of the following command will trigger panic in timeout
and get UNBLOCKED error in error. Under the new behavior, client unblock
will get the result of 0.
```
client 1> client pause 100000 write
client 2> set x x

client 1> client unblock 2 timeout
or
client 1> client unblock 2 error
```

Potentially breaking change, previously allowed `CLIENT UNBLOCK error`.
Fixes #2111.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 3bc40be6cd)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Jacob Murphy 73e817d578 Free module context even if there was no content written in auxsave2 (#2132)
When a module saves to the Aux metadata with aux_save2 callback, the
module context is not freed if the module didn't save anything in the
callback.

https://github.com/valkey-io/valkey/blob/8.1.1/src/rdb.c#L1277-L1284.
Note that we return 0, however we should be doing the cleanup done on
https://github.com/valkey-io/valkey/blob/8.1.1/src/rdb.c#L1300-L1303
still.

Fixes #2125

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
(cherry picked from commit 258213ff7e)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
uriyage 6c5b0a02b6 Fix bad slot used in sharded pubsub unsubscribe (#2137)
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
(cherry picked from commit bd5dcb2819)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Ran Shidlansik 02d5f6cb70 Only mark the client reprocessing flag when unblocked on keys (#2109)
When we refactored the blocking framework we introduced the client
reprocessing infrastructure. In cases the client was blocked on keys, it
will attempt to reprocess the command. One challenge was to keep track
of the command timeout, since we are reprocessing and do not want to
re-register the client with a fresh timeout each time. The solution was
to consider the client reprocessing flag when the client is
blockedOnKeys:

```
if (!c->flag.reprocessing_command) {
        /* If the client is re-processing the command, we do not set the timeout
         * because we need to retain the client's original timeout. */
        c->bstate->timeout = timeout;
    }
```

However, this introduced a new issue. There are cases where the client
will consecutive blocking of different types for example:
```
CLIENT PAUSE 10000 ALL
BZPOPMAX zset 1
```
would have the client blocked on the zset endlessly if nothing will be
written to it.

**Credits to @uriyage for locating this with his fuzzer testing**

The suggested solution is to only flag the client when it is
specifically unblocked on keys.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit d00fb8e713)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-08-22 15:30:45 +02:00
Ran Shidlansik f77cbfbe11 Release notes for 8.0.4
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2025-07-07 10:22:40 +03:00
Ran Shidlansik 25d3c19ffa retry accept on transient errors (CVE-2025-48367) (#2315)
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2025-07-07 10:22:40 +03:00
Madelyn Olson 8235d12277 Apply fixed for CVE-2025-32023 (#2314)
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2025-07-07 10:22:40 +03:00
Jacob Murphy 6917ea8f8d
Release notes for 8.0.3 (#1998)
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2025-04-23 13:41:26 -07:00
uriyage 507c9eb73e
Add output buffer limiting for unauthenticated clients (#1993)
This commit introduces a mechanism to track client authentication state
with a new `ever_authenticated` flag. It refactors client authentication
handling by adding a `clientSetUser` function that properly sets both
the authenticated and `ever_authenticated` flags.

The implementation limits output buffer size for clients that have never
been authenticated.

Added tests to verify the output buffer limiting behavior for
unauthenticated clients.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-04-23 22:30:56 +02:00
Madelyn Olson ddca1c457e Remove readability refactor for failover auth to fix clang warning (#1481)
As part of #1463, I made a small refactor between the PR and the daily
test I submitted to try to improve readability by adding a function to
abstract the extraction of the message types. However, that change
apparently caused GCC to throw another warning, so reverting the
abstraction on just one line.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2025-04-23 20:59:38 +02:00
Harkrishn Patro a3a137bf45 Update upload artifacts to v4 (#1539)
Fixes #1538

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2025-04-23 20:59:38 +02:00
Vitah Lin ddbd1c2052 Fix CI latest fedora: install awk and tcl8, build tcltls from source (#1965)
Fix two problems in fedora CI jobs:

1. Install awk where missing. It's required for building jemalloc.
2. Fix problems with TCL, required for running the tests. Fedora comes
   with TCL 9 by default, but the TLS package 'tcltls' isn't built for TCL 9.
   Install 'tcl8' and build 'tcltls' from source.

---------

Signed-off-by: vitah <vitahlin@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2025-04-23 20:59:38 +02:00