Makes it possible to run our tests with TCL 9.
The latest Fedora now ships TCL 9.0, and it works, including the
TCL TLS package. (This wasn't working earlier due to packaging
errors for TCL packages in Fedora, which have since been fixed.)
This PR also removes the custom compilation of TCL 8 used in our Daily
jobs and uses the system default TCL version instead. The TCL version
depends on the OS. For the latest Fedora, you get 9.0, for macOS you get
8.5 and for most other OSes you get 8.6.
The checks for TCL 8.7 are removed, because 8.7 doesn't exist. It was
never released.
Backport of #1673.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The CVE fixes had a formatting and external test issue that wasn't
caught because private branches don't run those CI steps.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
daea05b1e2/src/networking.c (L886-L886)
To fix the issue, we need to ensure that the subtraction `prev->size -
prev->used` does not underflow. This can be achieved by explicitly
checking that `prev->used` is less than `prev->size` before performing
the subtraction. This approach avoids relying on unsigned arithmetic and
keeps the logic clear and robust.
The specific changes (illustrated by the sketch after this list) are:
1. Replace the condition `prev->size - prev->used > 0` with `prev->used
< prev->size`.
2. This change ensures that the logic checks whether there is remaining
space in the buffer without risking underflow.
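For illustration, a minimal self-contained example (the struct below is a made-up stand-in, not the actual definition in networking.c) showing why the rewritten check is safer if `used` ever exceeds `size`:
```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reply-buffer block; field names mirror the text above,
 * not the real networking.c struct. */
struct reply_block {
    size_t size; /* total allocated bytes */
    size_t used; /* bytes already written */
};

int main(void) {
    struct reply_block prev = {.size = 16, .used = 32}; /* corrupted/impossible state */

    /* Old form: the unsigned subtraction wraps around instead of going
     * negative, so the check claims there is free space. */
    printf("size - used > 0  -> %d\n", prev.size - prev.used > 0); /* prints 1 */

    /* New form: no subtraction, no wrap-around; correctly reports no space. */
    printf("used < size      -> %d\n", prev.used < prev.size);     /* prints 0 */
    return 0;
}
```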
**References**
[INT02-C. Understand integer conversion
rules](https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules)
[CWE-191](https://cwe.mitre.org/data/definitions/191.html)
---
Signed-off-by: Zeroday BYTE <github@zerodaysec.org>
When reading RDB files with information about the number of keys per cluster slot,
we need to create the dicts if they don't exist.
Currently, when processing RDB slot-info, our expand has no effect because the dict
does not exist (we initialize it only when we need it).
We also update kvstoreExpand to use kvstoreDictExpand, to make
sure there is only one code path. Also see #1199 for more details.
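For illustration, a minimal self-contained sketch of the intended flow; the names and signatures below are made up and do not match the real kvstore.c API:
```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kvstore/dict types; this only
 * illustrates the create-before-expand flow described above. */
typedef struct dict { unsigned long buckets; } dict;
typedef struct kvstore { dict **dicts; int num_dicts; } kvstore;

static dict *dictCreateStub(void) { return calloc(1, sizeof(dict)); }
static void dictExpandStub(dict *d, unsigned long size) { if (d->buckets < size) d->buckets = size; }

/* Expand one per-slot dict, creating it first if it does not exist yet.
 * Before the fix, the expand was silently a no-op for missing dicts. */
static void kvstoreDictExpandStub(kvstore *kvs, int didx, unsigned long size) {
    if (kvs->dicts[didx] == NULL) kvs->dicts[didx] = dictCreateStub();
    dictExpandStub(kvs->dicts[didx], size);
}

/* The kvstore-wide expand funnels through the same helper, keeping a single code path. */
static void kvstoreExpandStub(kvstore *kvs, unsigned long size) {
    for (int didx = 0; didx < kvs->num_dicts; didx++) kvstoreDictExpandStub(kvs, didx, size);
}
```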
Signed-off-by: Binbin <binloveplay1314@qq.com>
Fixes #2171
Handle divergent shard-id across primary and replica from nodes.conf and
reconcile all the nodes in the shard to the primary node's shard-id.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This should be + instead of *, otherwise it does not make any sense.
Otherwise we would be counting 20 more bytes for each prefix rax
node in a 64-bit build.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Based on @enjoy-binbin's suggestion on #1611, I made the change to find
an available port. The test has been passing in the daily tests in my
local repo.
Resolves #1611
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
After #1545 disabled some tests for reply schema validation, we now have
another issue: ECHO is not covered.
```
WARNING! The following commands were not hit at all:
echo
ERROR! at least one command was not hit by the tests
```
This patch adds a test case for ECHO in the unit/other test suite. I
haven't checked if there are more commands that aren't covered.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Since we paused the primary node earlier, the replica may enter a
cluster-down state due to the primary being marked as pfail. Allow
reads here to prevent subsequent read errors.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
ASan now supports making sure you are passing in the correct pointer
type, which seems useful, but we can't support it since we pass in an
incorrect pointer in several places. This is most commonly done with
generic free functions, where we simply cast it to the correct type.
It's not a lot of code to clean up, so it seems appropriate to clean it up
instead of disabling the check.
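For illustration, a self-contained sketch of the general pattern being cleaned up (names are hypothetical and the real call sites differ): calling a typed destructor through a generic `void (*)(void *)` pointer is what the pointer-type check flags, while a thin wrapper that casts inside is fine:
```c
#include <stdlib.h>

/* Illustrative container callback type; names are hypothetical. */
typedef void (*free_fn)(void *);

typedef struct entry {
    char *payload;
} entry;

/* Typed destructor. */
static void entryFree(entry *e) {
    free(e->payload);
    free(e);
}

/* Flagged pattern: the typed destructor is cast to the generic callback type
 * and later invoked through a mismatched function pointer type, which is what
 * the pointer-type check described above reports. */
static free_fn flagged_callback = (free_fn)entryFree;

/* Clean pattern: a thin wrapper takes void * and casts the data pointer
 * inside, so every call goes through a correctly typed function pointer. */
static void entryFreeWrapper(void *ptr) {
    entryFree((entry *)ptr);
}
static free_fn clean_callback = entryFreeWrapper;

int main(void) {
    entry *e = calloc(1, sizeof(*e));
    clean_callback(e);      /* fine under the check */
    (void)flagged_callback; /* kept only to show the pattern; not called */
    return 0;
}
```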
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The commands used in the valkey-cli tests are not important for reply schema
validation. Skip them to avoid the problem of tests hanging. This has
failed lately in the daily job:
```
[TIMEOUT]: clients state report follows.
sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription
Killing still running Valkey server 33357
```
These test cases use a special valkey-cli command, `:get pubsub`,
which is internal to valkey-cli rather than a Valkey server
command. This command hangs when compiled with logreqres enabled.
The easy solution is to skip the tests in this setup.
The test cases were introduced in #1432.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
After #1009, we reset the election when we receive a claim
with an equal or higher epoch, since a node can have won an
election in the past.
But we need to consider the time before the node actually
obtains the failover_auth_epoch. The failover_auth_epoch
defaults to 0, so before the node actually gets the failover
epoch, we might wrongly reset the election.
This is probably harmless, but it produces misleading log
output and may delay the election by a cron cycle or a beforesleep.
Now we only reset the election once a node has actually
obtained the failover epoch.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
If we become an empty primary for some reason, we still need to
check whether we need to delete dirty slots, because we may have dirty
slot data left over from a bad migration. For example, the target node
may forcibly execute CLUSTER SETSLOT NODE to take over the slot without
performing key migration.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
If a manual failover times out, for example because the election did not
get enough votes, then, since we have an auth_timeout and an auth_retry_time,
a new manual failover will not be able to proceed on the replica side.
For example, if we initiate a new manual failover after an election timed out,
we pause the primary, but on the replica side, due to retry_time, the
replica does not trigger a new election and the manual failover
eventually times out.
In this case, if we initiate a manual failover again and there is an
ongoing election, we reset it so that the replica can initiate
a new election at the manual failover's request.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
In the CLUSTER FAILOVER FORCE case, we set mf_can_start to
1 and wait for a cron to trigger the election. We can also set a
CLUSTER_TODO_HANDLE_MANUALFAILOVER flag so that we
can start the election as soon as possible instead of waiting for
the cron, avoiding a delay of up to 100ms (clusterCron).
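A rough sketch of the idea, using the field and flag names mentioned above but otherwise illustrative only (not the actual cluster_legacy.c code):
```c
/* Rough sketch only: the flag name comes from the text above, everything else
 * (bit value, struct layout) is illustrative. */
#define CLUSTER_TODO_HANDLE_MANUALFAILOVER (1 << 4) /* illustrative bit value */

struct cluster_state_sketch {
    int mf_can_start;      /* manual failover is allowed to start */
    int todo_before_sleep; /* work scheduled for beforeSleep */
};

/* Instead of waiting for the next clusterCron() tick (up to ~100ms), mark the
 * manual failover as ready and schedule it to be handled in beforeSleep. */
static void startForcedManualFailoverSketch(struct cluster_state_sketch *cs) {
    cs->mf_can_start = 1;
    cs->todo_before_sleep |= CLUSTER_TODO_HANDLE_MANUALFAILOVER;
}
```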
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The old reqEpoch mainly refers to requestCurrentEpoch, see:
```
if (requestCurrentEpoch < server.cluster->currentEpoch) {
    serverLog(LL_WARNING, "Failover auth denied to %.40s (%s): reqEpoch (%llu) < curEpoch(%llu)", node->name,
              node->human_nodename, (unsigned long long)requestCurrentEpoch,
              (unsigned long long)server.cluster->currentEpoch);
    return;
}
```
But here we refer to requestConfigEpoch, which is a bit misleading,
so change it to reqConfigEpoch to make it clear.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
A minor debugging change that helped in the investigation of
https://github.com/valkey-io/valkey/issues/1251. Basically there are
some edge cases where we want to fully isolate a node from receiving
packets, but we can't suspend the process because we need it to continue
sending outbound traffic. So, added a filter for that.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The new test was added in #2178. Obviously there may be
pending reads on the connection, so there may be a race
in the DROP-CLUSTER-PACKET-FILTER part, causing the test
to fail. Add CLOSE-CLUSTER-LINK-ON-PACKET-DROP to ensure
that the replica does not process the packet.
Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 2019337e74)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When modifying port or tls-port through config set, we need to call
clusterUpdateMyselfAnnouncedPorts to update myself's ports, otherwise
CLUSTER SLOTS/NODES will return stale information from myself's perspective.
In addition, in some places, such as clusterUpdateMyselfAnnouncedPorts
and clusterUpdateMyselfIp, a beforeSleep save is added so that the
new IP info can be written to nodes.conf.
Remove clearCachedClusterSlotsResponse from updateClusterAnnouncedPort,
since we now add the beforeSleep save in clusterUpdateMyselfAnnouncedPorts,
and it will call clearCachedClusterSlotsResponse.
Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 5cc2b25753)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When the primary changes the config epoch and then goes down immediately,
the replica may not update the config epoch in time. Although we
broadcast the change in the cluster (see #1813), there may be a race in
the network or in the code. In this case, the replica will never finish
the failover, since the other primaries refuse to vote because the
replica's slot config epoch is old.
We need a way to allow the replica to finish the failover in this case.
When a primary refuses to vote because the replica's config epoch is
less than the dead primary's config epoch, it can send an UPDATE packet
to the replica to inform it about the dead primary. The UPDATE
message contains the dead primary's config epoch and owned slots.
The failover will time out, but the replica can later try again with
the updated config epoch and succeed.
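A rough sketch of the voting-side idea described above; all names below are illustrative stand-ins, not the actual cluster code:
```c
/* Rough sketch only: types and names are made up. */
typedef struct node_stub {
    unsigned long long configEpoch; /* config epoch of the dead primary */
} node_stub;

/* The UPDATE packet is assumed to carry the config epoch and owned slots. */
typedef void (*send_update_fn)(node_stub *dead_primary);

/* When denying the vote because the replica's claimed config epoch is older
 * than the dead primary's, also push an UPDATE about that primary so the
 * replica can retry with a fresh config epoch and succeed next time. */
static int denyVoteButInformReplica(unsigned long long requestConfigEpoch,
                                    node_stub *dead_primary,
                                    send_update_fn sendUpdateToReplica) {
    if (requestConfigEpoch < dead_primary->configEpoch) {
        sendUpdateToReplica(dead_primary); /* new behavior: inform before refusing */
        return 0;                          /* vote still denied this round */
    }
    return 1; /* otherwise continue with normal vote handling */
}
```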
Fixes #2169.
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
(cherry picked from commit 476671be19)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Resolves https://github.com/valkey-io/valkey/issues/2145
Incorporate the CVE patch that was sent to us by Redis Ltd.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
(cherry picked from commit 73696bf6e2)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Correctly use a 32-bit integer for accumulating the length of ping
extensions.
The current code may accidentally truncate the length of an
extension that is greater than 64kb and fail the validation check. We
don't currently emit any extensions of this length, but if we were
to do so in the future, older nodes (without this fix) would silently
drop packets from newer nodes. We should backport this to all versions.
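For illustration, a self-contained example of the truncation (not the actual packet-validation code):
```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* An extension slightly larger than 64kb. */
    uint32_t extension_len = 70 * 1024;

    uint16_t narrow_total = (uint16_t)extension_len; /* wraps: 71680 - 65536 = 6144 */
    uint32_t wide_total = extension_len;             /* keeps the full value */

    printf("16-bit accumulator: %u\n", (unsigned)narrow_total); /* 6144: validation is fooled */
    printf("32-bit accumulator: %u\n", (unsigned)wide_total);   /* 71680: correct */
    return 0;
}
```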
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
(cherry picked from commit 30d7f08a4e)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When creating an outgoing TLS connection, we don't check if `SSL_new()`
returned NULL.
Without this patch, the check was done only for incoming connections in
`connCreateAcceptedTLS()`. This patch moves the check to
`createTLSConnection()` which is used both for incoming and outgoing
connections.
This check makes sure we fail the connection before going any further.
E.g. when `connCreate()` is followed by `connConnect()`, the latter
returns `C_ERR`, which is commonly detected where outgoing connections
are established, such as where a replica connects to a primary.
```c
int connectWithPrimary(void) {
    server.repl_transfer_s = connCreate(connTypeOfReplication());
    if (connConnect(server.repl_transfer_s, server.primary_host, server.primary_port, server.bind_source_addr,
                    server.repl_mptcp, syncWithPrimary) == C_ERR) {
        serverLog(LL_WARNING, "Unable to connect to PRIMARY: %s", connGetLastError(server.repl_transfer_s));
        connClose(server.repl_transfer_s);
        server.repl_transfer_s = NULL;
        return C_ERR;
    }
```
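For illustration, a minimal sketch of the added check, using a hypothetical connection wrapper rather than the actual tls.c code:
```c
#include <openssl/ssl.h>
#include <stdlib.h>

/* Minimal sketch, not the actual tls.c code: a hypothetical connection wrapper
 * that records an error state when SSL_new() fails, so a later connect attempt
 * can return an error instead of dereferencing a NULL SSL handle. */
typedef struct tls_conn {
    SSL *ssl;
    int broken; /* set when creation failed; connect/read/write then bail out */
} tls_conn;

static tls_conn *tlsConnCreateSketch(SSL_CTX *ctx) {
    tls_conn *conn = calloc(1, sizeof(*conn));
    if (conn == NULL) return NULL;
    conn->ssl = SSL_new(ctx); /* may return NULL, e.g. on allocation failure */
    if (conn->ssl == NULL) conn->broken = 1; /* fail early, before any I/O is attempted */
    return conn;
}
```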
For a more thorough explanation, see
https://github.com/valkey-io/valkey/issues/1939#issuecomment-2912177877.
Might fix #1939.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
(cherry picked from commit 17e66863a5)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When a client is blocked by something like `CLIENT PAUSE`, we should not
allow `CLIENT UNBLOCK timeout` to unblock it, since some blocking types
do not have a timeout callback and it would trigger a panic in the core;
people should use `CLIENT UNPAUSE` to unblock it.
Using `CLIENT UNBLOCK error` is not right either: it returns an UNBLOCKED
error to the blocked command, and people don't expect a `SET` command to
get an error.
So in this commit, in these cases, we return 0 to `CLIENT UNBLOCK`
to indicate that the unblock failed. The reasoning is that if
a command doesn't expect to be timed out, it also doesn't expect to be
unblocked by `CLIENT UNBLOCK`.
With the old behavior, the following commands would trigger a panic for
`timeout` and return an UNBLOCKED error for `error`. Under the new
behavior, CLIENT UNBLOCK returns 0.
```
client 1> client pause 100000 write
client 2> set x x
client 1> client unblock 2 timeout
or
client 1> client unblock 2 error
```
Potentially breaking change: `CLIENT UNBLOCK error` was previously allowed.
Fixes #2111.
Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 3bc40be6cd)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:
1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.
2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.
The fix (sketched below):
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL
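A rough sketch of the two changes, using made-up stand-ins for keyHashSlot() and the hashtable API (not the actual pubsub.c code):
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; they only exist to make the two
 * changes above concrete. */
#define SLOT_COUNT 16384

static unsigned int slotForChannel(const char *name, size_t len) {
    /* Placeholder hash, NOT the real CRC16 hash-tag slot computation. */
    unsigned int h = 5381;
    for (size_t i = 0; i < len; i++) h = h * 33 + (unsigned char)name[i];
    return h % SLOT_COUNT;
}

typedef struct hashtable hashtable;
static int hashtableFindStub(hashtable *ht, const char *key, void **found) {
    (void)ht; (void)key; (void)found;
    return 0; /* stub: pretend the lookup failed */
}

static void unsubscribeShardChannelSketch(hashtable **slot_channels, const char *channel) {
    /* 1. Derive the slot from the channel name itself, never from the current
     *    client's cached slot (during CLIENT KILL that would be the killer's
     *    slot, not the victim's). */
    unsigned int slot = slotForChannel(channel, strlen(channel));

    /* 2. Initialize 'found' so a failed lookup cannot leave garbage behind
     *    that lets an assertion pass and corrupts memory later. */
    void *found = NULL;
    if (hashtableFindStub(slot_channels[slot], channel, &found) && found != NULL) {
        /* ... remove the client from the channel's subscriber set ... */
    }
}
```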
Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.
Crash log (after initializing the variable 'found' to NULL; without the
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true
------ STACK TRACE ------
Backtrace:
0 valkey-server 0x0000000104974054 _serverAssertWithInfo + 112
1 valkey-server 0x000000010496c7fc pubsubUnsubscribeChannel + 268
2 valkey-server 0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3 valkey-server 0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4 valkey-server 0x000000010496c1d4 freeClientPubSubData + 60
5 valkey-server 0x00000001048f3cbc freeClient + 792
6 valkey-server 0x0000000104900870 clientKillCommand + 356
7 valkey-server 0x00000001048d1790 call + 428
8 valkey-server 0x000000010496ef4c execCommand + 872
9 valkey-server 0x00000001048d1790 call + 428
10 valkey-server 0x00000001048d3a44 processCommand + 5056
11 valkey-server 0x00000001048fdc20 processCommandAndResetClient + 64
12 valkey-server 0x00000001048fdeac processInputBuffer + 276
13 valkey-server 0x00000001048f2ff0 readQueryFromClient + 148
14 valkey-server 0x0000000104a182e8 callHandler + 60
15 valkey-server 0x0000000104a1731c connSocketEventHandler + 488
16 valkey-server 0x00000001048b5e80 aeProcessEvents + 820
17 valkey-server 0x00000001048b6598 aeMain + 64
18 valkey-server 0x00000001048dcecc main + 4084
19 dyld 0x0000000186b34274 start + 2840
```
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
(cherry picked from commit bd5dcb2819)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When we refactored the blocking framework we introduced the client
reprocessing infrastructure. In case the client was blocked on keys, it
will attempt to reprocess the command. One challenge was to keep track
of the command timeout, since we are reprocessing and do not want to
re-register the client with a fresh timeout each time. The solution was
to consider the client reprocessing flag when the client is
blockedOnKeys:
```
if (!c->flag.reprocessing_command) {
    /* If the client is re-processing the command, we do not set the timeout
     * because we need to retain the client's original timeout. */
    c->bstate->timeout = timeout;
}
```
However, this introduced a new issue. There are cases where the client
goes through consecutive blocks of different types, for example:
```
CLIENT PAUSE 10000 ALL
BZPOPMAX zset 1
```
would leave the client blocked on the zset endlessly if nothing is
written to it.
**Credits to @uriyage for locating this with his fuzzer testing**
The suggested solution is to only flag the client when it is
specifically unblocked on keys.
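A rough sketch of that direction, with made-up types and field names (not the actual blocked.c code):
```c
/* Rough sketch only: enum and field names are made up. */
typedef enum { BLOCKED_NONE, BLOCKED_KEYS, BLOCKED_PAUSE, BLOCKED_WAIT } block_type;

typedef struct blocked_client_sketch {
    block_type btype;
    int reprocessing_command;
} blocked_client_sketch;

/* Only a client that was blocked on keys is re-queued with the reprocessing
 * flag (so its original timeout is retained). A client coming out of a pause
 * or any other block type is not flagged, so a later key-based block gets a
 * fresh timeout instead of none at all. */
static void unblockClientSketch(blocked_client_sketch *c) {
    c->reprocessing_command = (c->btype == BLOCKED_KEYS);
    c->btype = BLOCKED_NONE;
}
```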
---------
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit d00fb8e713)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This commit introduces a mechanism to track client authentication state
with a new `ever_authenticated` flag. It refactors client authentication
handling by adding a `clientSetUser` function that properly sets both
the authenticated and `ever_authenticated` flags.
The implementation limits output buffer size for clients that have never
been authenticated.
Added tests to verify the output buffer limiting behavior for
unauthenticated clients.
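A rough sketch of the described refactor, with made-up struct layouts (not the actual server.h/acl.c code):
```c
/* Rough sketch only: struct layouts are made up. */
typedef struct user user; /* opaque ACL user */

typedef struct client_auth_sketch {
    user *user;
    int authenticated;      /* currently authenticated */
    int ever_authenticated; /* sticky: set once, never cleared */
} client_auth_sketch;

/* One place to attach a user to a client so both flags stay in sync. Clients
 * with ever_authenticated == 0 get their output buffer size capped. */
static void clientSetUserSketch(client_auth_sketch *c, user *u, int authenticated) {
    c->user = u;
    c->authenticated = authenticated;
    if (authenticated) c->ever_authenticated = 1;
}
```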
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
As part of #1463, I made a small refactor between the PR and the daily
test I submitted to try to improve readability by adding a function to
abstract the extraction of the message types. However, that change
apparently caused GCC to throw another warning, so reverting the
abstraction on just one line.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Fix two problems in fedora CI jobs:
1. Install awk where missing. It's required for building jemalloc.
2. Fix problems with TCL, required for running the tests. Fedora comes
with TCL 9 by default, but the TLS package 'tcltls' isn't built for TCL 9.
Install 'tcl8' and build 'tcltls' from source.
---------
Signed-off-by: vitah <vitahlin@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>