valkey

Commit Graph

Author	SHA1	Message	Date
Sarthak Aggarwal	410976d509	Add test failure template to contributing guide (#2491 ) We recently introduced a new template to create `test failures` issues from a template. This change makes this template visible in the `CONTRIBUTING.md` file. Also, added a tip to paste the stack trace since outputs of CI links can expire. --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-08-17 11:58:32 +02:00
Ran Shidlansik	e364c57e3d	simplify COPY Preserves TTLs hashexpire test (#2495 ) Simplifies a test case seen to be flaky. fixes: https://github.com/valkey-io/valkey/issues/2482 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-17 11:53:41 +02:00
Hanxi Zhang	f36cc20836	Fix cluster test module to pass null terminated node-id to SendClusterMessage (#2484 ) Fix https://github.com/valkey-io/valkey/issues/2438 Modified `DingReceiver` function in `tests/modules/cluster.c` by adding null-termination logic for cross-version compatibility --------- Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>	2025-08-15 20:40:34 +02:00
yzc-yzc	e7bb2354da	Don't call SSL_write() with num=0 (#2490 ) `aefed3d363/src/networking.c (L2279-L2293)` From above code, we can see that `c->repl_data->ref_block_pos` could be equal to `o->used`. When `o->used == o->size`, we may call SSL_write() with num=0 which does not comply with the openSSL specification. (ref: https://docs.openssl.org/master/man3/SSL_write/#warnings) What's worse is that it's still the case after the reconnection. See `aefed3d363/src/replication.c (L756-L769)`. So in this case the replica will keep reconnecting again and again until it doesn't meet the requirements for partial synchronization. Resolves #2119 --------- Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>	2025-08-15 20:37:15 +02:00
Viktor Söderqvist	87d2330c22	Fix timeout in defrag tests (#2483 ) * Use pipelines of length 1000 instead of up to 200000. * Use CLIENT REPLY OFF instead of reading and discarding the replies. Fixes #2205 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-08-15 20:28:14 +02:00
Binbin	aefed3d363	Don't allow resize hashtable if rehashing is ongoing (#2465 ) Similar to dicts, we disallow resizing while the hashtable is rehashing. In the previous code, if a resize was triggered during rehashing, like if the rehashing wasn't fast enough, we would do a while loop until the rehashing was complete, which could be a potential issue when doing resize. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-15 11:55:57 +02:00
Sarthak Aggarwal	dcbaecddce	Ensures presence of slots on the node before test is run (#2486 ) The change will ensure that the slot is present on the node before the slot is populated. This will avoid the errors during populating the slot. Resolves #2480 --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-08-15 11:23:34 +02:00
Sarthak Aggarwal	de0e581432	Add bug / test-failure / enhancement label to issue template (#2273 ) Automatically attach respective label to newly filled issues. --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Sarthak Aggarwal <sarthakaggarwal97@gmail.com> Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>	2025-08-14 19:47:38 -07:00
Sarthak Aggarwal	287fbcce33	Fixing Slot Migration Test Failure (#2485 ) Make sure slot migration has finished before moving on to the next test. Resolves #2479 Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-08-14 15:20:13 -07:00
Ping Xie	0b4352c98c	Consolidate slot migration logs by grouping consecutive slot migrations into a single log entry (#2481 ) Previously, each slot migration was logged individually, which could lead to log spam in scenarios where many slots are migrated at once. This commit enhances the logging mechanism to group consecutive slot migrations into a single log entry, improving log readability and reducing noise. Log snippets ``` 1661951:S 13 Aug 2025 15:47:10.132 * Slot range [16383, 16383] is migrated from node c3926da75f7c3a0a1bcd07e088b0bde09d48024c () in shard 7746b693330c0814178b90b757e2711ebb8c6609 to node 2465c29c8afb9231525e281e5825684d0bb79f7b () in shard 39342c039d2a6c7ef0ff96314b230dfd7737d646. 1661951:S 13 Aug 2025 15:47:10.289 * Slot range [10924, 16383] is migrated from node 2465c29c8afb9231525e281e5825684d0bb79f7b () in shard 39342c039d2a6c7ef0ff96314b230dfd7737d646 to node c3926da75f7c3a0a1bcd07e088b0bde09d48024c () in shard 7746b693330c0814178b90b757e2711ebb8c6609. 1661951:S 13 Aug 2025 15:47:10.524 * Slot range [10924, 16383] is migrated from node c3926da75f7c3a0a1bcd07e088b0bde09d48024c () in shard 7746b693330c0814178b90b757e2711ebb8c6609 to node 2465c29c8afb9231525e281e5825684d0bb79f7b () in shard 39342c039d2a6c7ef0ff96314b230dfd7737d646. ``` --------- Signed-off-by: Ping Xie <pingxie@google.com>	2025-08-14 13:15:52 -07:00
Binbin	3d1ff2a38c	Remove if condition and disable the new failover test (#2477 ) In #2431 we changed the assert to a if condition, and the test cause some trouble, now we just remove the assert (if condition) and disable the test for now due to #2441. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-13 17:01:00 +08:00
Ran Shidlansik	15355124d4	HSETEX 'hset' notification should only be generated if not expired (#2475 ) Currently HSETEX always generate `hset` notification. In order to align with generic `set` command, it should only generate `hset` if the provided time-to-live is a valid future time. --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-13 10:22:44 +03:00
Ran Shidlansik	c93b2971e6	Increment expired_fields stat when assigned TTL is in the past (#2474 ) fixes: https://github.com/valkey-io/valkey/issues/2461 --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-13 09:37:54 +03:00
ruihong123	58f5562d22	Fix duplicate Acks for RDMA events and fix extremely large max latency for RDMA benchmark. (#2430 ) (1) The old logic may result in the RDMA event being acknowledged unexpectly in the following two scenarios. * ibv_get_cq_event get an EAGAIN error. * ibv_get_cq_event get one event but it may ack multiple times in the pollcq loop. (2) In the benchmark result of valkey over RDMA, the tail latency is as high as 177 milliseconds(almost 80x of TCP). This results from incorrect benchmark client setup which includes the connection setup time into the benchmark latency recording. This patch fixes this crazy tail latency issue by modifying the valkey-benchmark.c. This change only affects benchmark over RDMA as updates are regulated under Macro USE_RDMA. There are following updates on valkey RDMA but I am willing to create separated pull requests. --------- Signed-off-by: Ruihong Wang <ruihong@google.com>	2025-08-13 09:28:44 +03:00
Ran Shidlansik	6c2e4f9553	Deflake hashexpire tests (#2473 ) 1. Better separation of test steps in `Chain Replication (Primary -> R1 -> R2) preserves TTL` This can help prevent or provide better understanding of a flakey fail: https://github.com/valkey-io/valkey/actions/runs/16867976482/job/47777607814 2. increase the millisecond short timeout to 1 second since some tests are failing because of it. It also better matches the second timeout. example fail: https://github.com/valkey-io/valkey/actions/runs/16867976482/job/47777607746 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-12 13:32:19 +03:00
Binbin	7a9ef29f1a	Change the same shard failover assert to if condition to avoid crash (#2431 ) The assert was added in #2301 and we found that there are some situations would trigger assert and crash the server. The reason we added the assert is because, in the code: 1. sender_claimed_primary and sender are in the same shard 2. and sender is the old primary, sender_claimed_primary is the old replica 3. and now sender become a replica, sender_claimed_primary become a primary That means a failover happend in the shard, and sender should be the primary of sender_claimed_primary. But obviously this assumption may be wrong, we rely on shard_id to determine whether it is in a same shard, and assume that a shard can only have one primary. But this is wrong, from #2279 we can know there will be a case that we can create two primaries in the same shard due to the untimely update of shard_id. So we can create a test that trigger the assert in this way: 1. pre condition: two primaries in the same shard, one has slots and one is empty. 2. replica doing a cluster failover 3. the empty primary doing a cluster replicate with the replica (new primary) We change the assert to an if condition to fix it. Closes #2423. Note that the test written here also exposes the issue in #2441, so these two may need to be addressed together. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-12 10:37:11 +08:00
Binbin	8131c2b07b	Add test for failover sub-replica replication loop case (#2456 ) When we added the safeguard against sub-replicas logic, we didn't add much tests for it. As time has shown, it can happen in different scenarios. Here's a test case that used to happen in a failover scenario. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-12 10:21:06 +08:00
Jacob Murphy	d7993b78d8	Introduce atomic slot migration (#1949 ) Introduces a new family of commands for migrating slots via replication. The procedure is driven by the source node which pushes an AOF formatted snapshot of the slots to the target, followed by a replication stream of changes on that slot (a la manual failover). This solution is an adaptation of the solution provided by @enjoy-binbin, combined with the solution I previously posted at #1591, modified to meet the designs we had outlined in #23. ## New commands * `CLUSTER MIGRATESLOTS SLOTSRANGE start end [start end]... NODE node-id`: Begin sending the slot via replication to the target. Multiple targets can be specified by repeating `SLOTSRANGE ... NODE ...` * `CLUSTER CANCELMIGRATION ALL`: Cancel all slot migrations * `CLUSTER GETSLOTMIGRATIONS`: See a recent log of migrations This PR only implements "one shot" semantics with an asynchronous model. Later, "two phase" (e.g. slot level replicate/failover commands) can be added with the same core. ## Slot migration jobs Introduces the concept of a slot migration job. While active, a job tracks a connection created by the source to the target over which the contents of the slots are sent. This connection is used for control messages as well as replicated slot data. Each job is given a 40 character random name to help uniquely identify it. All jobs, including those that finished recently, can be observed using the `CLUSTER GETSLOTMIGRATIONS` command. ## Replication * Since the snapshot uses AOF, the snapshot can be replayed verbatim to any replicas of the target node. * We use the same proxying mechanism used for chaining replication to copy the content sent by the source node directly to the replica nodes. ## `CLUSTER SYNCSLOTS` To coordinate the state machine transitions across the two nodes, a new command is added, `CLUSTER SYNCSLOTS`, that performs this control flow. Each end of the slot migration connection is expected to install a read handler in order to handle `CLUSTER SYNCSLOTS` commands: * `ESTABLISH`: Begins a slot migration. Provides slot migration information to the target and authorizes the connection to write to unowned slots. * `SNAPSHOT-EOF`: appended to the end of the snapshot to signal that the snapshot is done being written to the target. * `PAUSE`: informs the source node to pause whenever it gets the opportunity * `PAUSED`: added to the end of the client output buffer when the pause is performed. The pause is only performed after the buffer shrinks below a configurable size * `REQUEST-FAILOVER`: request the source to either grant or deny a failover for the slot migration. The grant is only granted if the target is still paused. Once a failover is granted, the paused is refreshed for a short duration * `FAILOVER-GRANTED`: sent to the target to inform that REQUEST-FAILOVER is granted * `ACK`: heartbeat command used to ensure liveness ## Interactions with other commands * FLUSHDB on the source node (which flushes the migrating slot) will result in the source dropping the connection, which will flush the slot on the target and reset the state machine back to the beginning. The subsequent retry should very quickly succeed (it is now empty) * FLUSHDB on the target will fail the slot migration. We can iterate with better handling, but for now it is expected that the operator would retry. * Genearlly, FLUSHDB is expected to be executed cluster wide, so preserving partially migrated slots doesn't make much sense * SCAN and KEYS are filtered to avoid exposing importing slot data ## Error handling * For any transient connection drops, the migration will be failed and require the user to retry. * If there is an OOM while reading from the import connection, we will fail the import, which will drop the importing slot data * If there is a client output buffer limit reached on the source node, it will drop the connection, which will cause the migration to fail * If at any point the export loses ownership or either node is failed over, a callback will be triggered on both ends of the migration to fail the import. The import will not reattempt with a new owner * The two ends of the migration are routinely pinging each other with SYNCSLOTS ACK messages. If at any point there is no interaction on the connection for longer than `repl-timeout`, the connection will be dropped, resulting in migration failure * If a failover happens, we will drop keys in all unowned slots. The migration does not persist through failovers and would need to be retried on the new source/target. ## State machine ``` Target/Importing Node State Machine ───────────────────────────────────────────────────────────── ┌────────────────────┐ │SLOT_IMPORT_WAIT_ACK┼──────┐ └──────────┬─────────┘ │ ACK│ │ ┌──────────────▼─────────────┐ │ │SLOT_IMPORT_RECEIVE_SNAPSHOT┼──┤ └──────────────┬─────────────┘ │ SNAPSHOT-EOF│ │ ┌───────────────▼──────────────┐ │ │SLOT_IMPORT_WAITING_FOR_PAUSED┼─┤ └───────────────┬──────────────┘ │ PAUSED│ │ ┌───────────────▼──────────────┐ │ Error Conditions: │SLOT_IMPORT_FAILOVER_REQUESTED┼─┤ 1. OOM └───────────────┬──────────────┘ │ 2. Slot Ownership Change FAILOVER-GRANTED│ │ 3. Demotion to replica ┌──────────────▼─────────────┐ │ 4. FLUSHDB │SLOT_IMPORT_FAILOVER_GRANTED┼──┤ 5. Connection Lost └──────────────┬─────────────┘ │ 6. No ACK from source (timeout) Takeover Performed│ │ ┌──────────────▼───────────┐ │ │SLOT_MIGRATION_JOB_SUCCESS┼────┤ └──────────────────────────┘ │ │ ┌─────────────────────────────────────▼─┐ │SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP│ └────────────────────┬──────────────────┘ Unowned Slots Cleaned Up│ ┌─────────────▼───────────┐ │SLOT_MIGRATION_JOB_FAILED│ └─────────────────────────┘ Source/Exporting Node State Machine ───────────────────────────────────────────────────────────── ┌──────────────────────┐ │SLOT_EXPORT_CONNECTING├─────────┐ └───────────┬──────────┘ │ Connected│ │ ┌─────────────▼────────────┐ │ │SLOT_EXPORT_AUTHENTICATING┼───────┤ └─────────────┬────────────┘ │ Authenticated│ │ ┌─────────────▼────────────┐ │ │SLOT_EXPORT_SEND_ESTABLISH┼───────┤ └─────────────┬────────────┘ │ ESTABLISH command written│ │ ┌─────────────────────▼─────────────┐ │ │SLOT_EXPORT_READ_ESTABLISH_RESPONSE┼──────┤ └─────────────────────┬─────────────┘ │ Full response read (+OK)│ │ ┌────────────────▼──────────────┐ │ Error Conditions: │SLOT_EXPORT_WAITING_TO_SNAPSHOT┼─────┤ 1. User sends CANCELMIGRATION └────────────────┬──────────────┘ │ 2. Slot ownership change No other child process│ │ 3. Demotion to replica ┌────────────▼───────────┐ │ 4. FLUSHDB │SLOT_EXPORT_SNAPSHOTTING┼────────┤ 5. Connection Lost └────────────┬───────────┘ │ 6. AUTH failed Snapshot done│ │ 7. ERR from ESTABLISH command ┌───────────▼─────────┐ │ 8. Unpaused before failover completed │SLOT_EXPORT_STREAMING┼──────────┤ 9. Snapshot failed (e.g. Child OOM) └───────────┬─────────┘ │ 10. No ack from target (timeout) PAUSE│ │ 11. Client output buffer overrun ┌──────────────▼─────────────┐ │ │SLOT_EXPORT_WAITING_TO_PAUSE┼──────┤ └──────────────┬─────────────┘ │ Buffer drained│ │ ┌──────────────▼────────────┐ │ │SLOT_EXPORT_FAILOVER_PAUSED┼───────┤ └──────────────┬────────────┘ │ Failover request granted│ │ ┌───────────────▼────────────┐ │ │SLOT_EXPORT_FAILOVER_GRANTED┼───────┤ └───────────────┬────────────┘ │ New topology received│ │ ┌──────────────▼───────────┐ │ │SLOT_MIGRATION_JOB_SUCCESS│ │ └──────────────────────────┘ │ │ ┌─────────────────────────┐ │ │SLOT_MIGRATION_JOB_FAILED│◄────────┤ └─────────────────────────┘ │ │ ┌────────────────────────────┐ │ │SLOT_MIGRATION_JOB_CANCELLED│◄──────┘ └────────────────────────────┘ ``` Co-authored-by: Binbin <binloveplay1314@qq.com> --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Jacob Murphy <jkmurphy@google.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@outlook.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2025-08-11 18:02:37 -07:00
Madelyn Olson	725c3608f0	Throw a useful error when underlying operations failed with tls (#2469 ) Right now, if a TLS connect fails, you get an unhelpful error message in the log since it prints out NULL. This change makes sure that report error always returns a string (never null) as well as tries to print out underlying errors. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2025-08-11 15:49:36 -07:00
VanessaTang	a89e529932	Redact user data when hide-user-data-from-log enabled (#2274 ) ### Description User data logged when crash caused by moduleRDBLoadError. ### Change Redact user data when hide-user-data-from-log enabled. Signed-off-by: VanessaTang <yuetan@amazon.com>	2025-08-11 11:33:23 -07:00
cxljs	ee305ce291	Replace generic C assert with serverAssert (#2467 ) Based on the review suggestion https://github.com/valkey-io/valkey/pull/2432#discussion_r2258249206 --------- Signed-off-by: Xiaolong Chen <fukua95@gmail.com>	2025-08-11 11:03:12 -07:00
Ran Shidlansik	ecca213311	Fix out-of-bound memory access when num-fields is not provided (#2464 ) Following new API presented in https://github.com/valkey-io/valkey/pull/2089, we might access out of bound memory in case of some illegal command input Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-11 12:24:50 +03:00
cxljs	d32f9be04a	Use bool for return types in hashtable functions (#2432 ) closes #2352 --------- Signed-off-by: Xiaolong Chen <fukua95@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-11 10:13:26 +03:00
Ran Shidlansik	bddc9db087	Fix HGETEX to return array response when the hash object is empty (#2462 ) In the current implementation `HGETEX`, when applied on a non existing object, will simply return null value instead of an array (like in the `HMGET` case). --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-11 07:16:18 +03:00
Jacob Murphy	e606fcbaff	Fix AOF rewrite behavior for hashes with expirations (#2454 ) The bug caused many invalid HMSET entries to be added to the AOF during rewriting. This now properly skips emitting HMSET until we find an entry with no expiry. Signed-off-by: Jacob Murphy <jkmurphy@google.com>	2025-08-10 10:05:02 +03:00
Sarthak Aggarwal	cc465383c4	Adding backup directory check for valkey-cli --cluster backup (#2452 ) Adds a simple check for backup directory, remove the todo. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-08-09 00:12:40 +08:00
amanosme	de7bb614f1	Fix expectations for manual failover tests (#2453 ) Test `Instance #5 is still a slave after some time (no failover)` is supposed to verify that command `CLUSTER FAILOVER` will not promote a replica without quorum from the primary; later in the file (`Instance 5 is a master after some time`), we verify that `CLUSTER FAILOVER FORCE` does promote a replica under the same conditions. There's a couple issues with the tests: 1. `Instance #5 is still a slave after some time (no failover)` should verify that instance 5 is a replica (i.e. that there's no failover), but we call `assert {[s -5 role] eq {master}}`. 2. The reason why the above assert works is that we previously send `DEBUG SLEEP 10` to the primary, which pauses the primary for longer than the configured 3 seconds for`cluster-node-timeout`. The primary is marked as failed from the perspective of the rest of the cluster, so quorum can be established and instance 5 is promoted as primary. This commit fixes the two by shortening the sleep to less than 3 seconds, and then asserting the role is still replica. Test `Instance #5 is a master after some time` is updated to sleep for a shorter duration to ensure that `FAILOVER FORCE` succeeds under the exact same conditions. ### Testing `./runtest --single unit/cluster/manual-failover --loop --fastfail` Signed-off-by: Tyler Amano-Smerling <amanosme@amazon.com>	2025-08-08 16:11:15 +08:00
Hanxi Zhang	7bbf5233c8	Fix and remove conflicting paths from clang-format workflow (#2455 ) Fixes the GitHub Actions error where both `paths` and `paths-ignore` were defined for the same event, which is not allowed. Resolves the error: "you may only define one of `paths` and `paths-ignore` for a single event" Removed the conflicting `paths` section from the `pull_request` trigger, keeping only `paths-ignore` to skip documentation changes while allowing the workflow to run on all other changes. This is a follow-up fix to address the issue identified in the previous PR. Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>	2025-08-08 14:34:19 +08:00
Hanxi Zhang	c6794fd2cc	Add path filters to skip CI on documentation changes (#2393 ) Fixes #626 - Skip CI tests when only documentation files are modified --------- Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>	2025-08-07 15:50:22 -07:00
Seungmin Lee	70d336d892	Increase latency for big list test in defrag (#2421 ) Across multiple runs of the big list test in defrag, the latency check is tripping because the maximum observed defrag cycle latency occasionally spikes above our 5 ms limit. While most cycles complete in just a few milliseconds, rare slowdowns push some cycles into the double digit millisecond range, so a 5 ms hard cap is too aggressive for stable testing. ``` /runtest --verbose --tls --single unit/memefficiency --only '/big list' --accurate --loop --fastfail ``` ``` [err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl Expected 12 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1) (Fast fail: test will exit now) [err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl Expected 21 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1) (Fast fail: test will exit now) [err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl Expected 25 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1) (Fast fail: test will exit now) [err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl Expected 17 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1) (Fast fail: test will exit now) ``` --------- Signed-off-by: Seungmin Lee <sungming@amazon.com> Co-authored-by: Seungmin Lee <sungming@amazon.com>	2025-08-07 09:24:47 -07:00
Ran Shidlansik	ba19ec861a	Fix hashexpire test toggle active expire (#2447 ) 1. Fix the way we toggle active expire. 2. reduce the number of spawned servers during the test Just merged some start_server blocks to reduce the time spent on starting and shutting down the server. --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-07 16:38:52 +03:00
Ran Shidlansik	2b9fb7514e	make sure replica and primary are in sync during hfe test Promotion to primary (#2446 ) fixes: https://github.com/valkey-io/valkey/issues/2445 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-07 11:54:41 +03:00
cxljs	2f1b25067e	Remove unused includes (#2344 ) Remove unused includes Signed-off-by: Xiaolong Chen <fukua95@gmail.com>	2025-08-06 13:42:09 -07:00
Ran Shidlansik	772d12ed02	verify expiration is set on hashexpire active expire test (#2440 ) This might help shed some light on the flakey failure e.g: https://github.com/valkey-io/valkey/actions/runs/16754537665/job/47433388425?pr=2431 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-06 20:31:34 +03:00
Binbin	094c80b9ce	Print error context when test fails (#2437 ) We used to did print the context but after #2276, we lost the context. unstable: ``` * Extract version and sha1 details from info command and print in tests/unit/info-command.tcl ``` now: ``` * [err]: Extract version and sha1 details from info command and print in tests/unit/info-command.tcl Expected '0' to be equal to '1' (context: type source line 7 file /xxx/info-command.tcl cmd {assert_equal 0 1} proc ::test) ``` We can see the different, we have provided enough context when asserting fail. Otherwise we need to scroll back (which is usually a lot of server logs) to see the context. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-07 00:20:50 +08:00
uriyage	fd8270a0aa	[CRASH] Fix missing check for executing client (#2347 ) Fix missing check for executing client in `lookupKey` function ### Issue The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in #1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` ### Fix Added a null check for `server.executing_client` before attempting to dereference it: ### Tests Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-06 17:02:10 +03:00
Ran Shidlansik	ba7334c7c6	Fix hfe no malloc size unit (#2436 ) test_vest uses a mock_defrag function in order to simulate defrag flow. In the scenario of no system malloc_size we need to use zmalloc_usable_size in order to prevent stepping over the "to-be-replaced" buffer will partially fix: https://github.com/valkey-io/valkey/issues/2435 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-06 12:33:55 +03:00
Binbin	db11e41dfc	Add STALE command flag to SCRIPT-EXISTS, SCRIPT-SHOW and SCRIPT-FLUSH (#2419 ) We marked SCRIPT-LOAD/EVAL* with STALE in `7eadc5ee70`, it is odd that we can load but won't be able to exists or show it. Also it is technically ok since these commands doesn't relate directly to the server's dataset. Also since now we don't have the script replication, flush also seems safe to add the flag. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-06 10:37:30 +08:00
Ran Shidlansik	33a43f2a8a	bump the RDB version for Valkey 9.0(#2422 ) This is needed due to changes presented in https://github.com/valkey-io/valkey/pull/2089 --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-05 18:32:18 +03:00
Ran Shidlansik	5a5848a44c	Add ACTIVE-EXPIRY and ACTIVE-DEFRAG for hash objects with volatile items This change adds support for active expiration of hash fields with TTLs (Hash Field Expiration), building on the existing key-level expiry system. Field TTL metadata is tracked in volatile sets associated with each hash key. Expired fields are reclaimed incrementally by the active expiration loop, using a new job type to alternate between key expiry and field expiry within the same logic and effort budget. Both key and field expiration now share the same scheduler infrastructure. Alternating job types ensures fairness and avoids starvation, while keeping CPU usage predictable. +-----------------+ \| DB \| +-----------------+ \| v +---------------------+ \| myhash \| (key with TTL) +---------------------+ \| v +------------------------------------+ \| fields (hashType) \| \| - field1 \| \| - field2 \| \| - fieldN \| +------------------------------------+ \| v +------------------------------------+ \| volatile set (field-level TTL) \| \| - field1 expires at T1 \| \| - field5 expires at T5 \| +------------------------------------+ No new configuration was introduced; the existing active-expire-effort and time budget are reused for both key and field expiry. Also active defrag for volatile sets is added. Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-05 18:28:47 +03:00
Ran Shidlansik	c3b1b0317d	Introduce volatile set ------------- Overview: --------- This PR introduces a complete redesign of the 'vset' (stands for volatile set) data structure, creating an adaptive container for expiring entries. The new design is memory-efficient, scalable, and dynamically promotes/demotes its internal representation depending on runtime behavior and volume. The core concept uses a single tagged pointer (`expiry_buckets`) that encodes one of several internal structures: - NONE (-1): Empty set - SINGLE (0x1): One entry - VECTOR (0x2): Sorted vector of entry pointers - HT (0x4): Hash table for larger buckets with many entries - RAX (0x6): Radix tree (keyed by aligned expiry timestamps) This allows the set to grow and shrink seamlessly while optimizing for both space and performance. Motivation: ----------- The previous design lacked flexibility in high-churn environments or workloads with skewed expiry distributions. This redesign enables dynamic layout adjustment based on the time distribution and volume of the inserted entries, while maintaining fast expiry checks and minimal memory overhead. Key Concepts: ------------- - All pointers stored in the structure must be odd-aligned to preserve 3 bits for tagging. This is safe with SDS strings (which set the LSB). - Buckets evolve automatically: - Start as NONE. - On first insert → become SINGLE. - If another entry with similar expiry → promote to VECTOR. - If VECTOR exceeds 127 entries → convert to RAX. - If a RAX bucket's vector fills and cannot split → promote to HT. - Each vector bucket is kept sorted by `entry->getExpiry()`. - Binary search is used for efficient insertion and splitting. # Coarse Buckets Expiration System for Hash Fields This PR introduces coarse-grained expiration buckets to support per-field expirations in hash types — a feature known as volatile fields. It enables scalable expiration tracking by grouping fields into time-aligned buckets instead of individually tracking exact timestamps. ## Motivation Valkey traditionally supports key-level expiration. However, in many applications, there's a strong need to expire individual fields within a hash (e.g., session keys, token caches, etc.). Tracking these at fine granularity is expensive and potentially unscalable, so this implementation introduces bucketed expirations to batch expirations together. ## Bucket Granularity and Timestamp Handling - Each expiration bucket represents a time slice of fixed width (e.g., 8192 ms). - Expiring fields are mapped to the end of a time slice (not the floor). - This design facilitates: - Efficient splitting of large buckets when needed - Downgrading buckets when fields permit tighter packing - Coalescing during lazy cleanup or memory pressure ### Example Calculation Suppose a field has an expiration time of `1690000123456` ms and the max bucket interval is 8192 ms: ``` BUCKET_INTERVAL_MAX = 8192; expiry = 1690000123456; bucket_ts = (expiry & ~(BUCKET_INTERVAL_MAX - 1LL)) + BUCKET_INTERVAL_MAX; = (1690000123456 & ~8191) + 8192 = 1690000122880 + 8192 = 1690000131072 ``` The field is stored in a bucket that ends at `1690000131072` ms. ### Bucket Alignment Diagram ``` Time (ms) → \|----------------\|----------------\|----------------\| 128ms buckets → 1690000122880 1690000131072 ^ ^ \| \| expiry floor assigned bucket end ``` ## Bucket Placement Logic - If a suitable bucket already exists (i.e., its `end_ts > expiry`), the field is added. - If no bucket covers the `expiry`, a new bucket is created at the computed `end_ts`. ## Bucket Downgrade Conditions Buckets are downgraded to smaller intervals when overpopulated (>127 fields). This happens when all fields fit into a tighter bucket. Downgrade rule: ``` (max_expiry & ~(BUCKET_INTERVAL_MIN - 1LL)) + BUCKET_INTERVAL_MIN < current_bucket_ts ``` If the above holds, all fields can be moved to a tighter bucket interval. ### Downgrade Bucket — Diagram ``` Before downgrade: Current Bucket (8192 ms) \|----------------------------------------\| \| Field A \| Field B \| Field C \| Field D \| \| exp=+30 \| +200 \| +500 \| +1500 \| \|----------------------------------------\| ↑ All expiries fall before tighter boundary After downgrade to 1024 ms: New Bucket (1024 ms) \|------------------\| \| A \| B \| C \| D \| \|------------------\| ``` ### Bucket Split Strategy If downgrade is not possible, the bucket is split: - Fields are sorted by expiration time. - A subset that fits in an earlier bucket is moved out. - Remaining fields stay in the original bucket. ### Split Bucket — Diagram ``` Before split: Large Bucket (8192 ms) \|--------------------------------------------------\| \| A \| B \| C \| D \| E \| F \| G \| H \| I \| J \| ... \| Z \| \|---------------- Sorted by expiry ---------------\| ↑ Fields A–L can be moved to an earlier bucket After split: Bucket 1 (end=1690000129024) Bucket 2 (end=1690000131072) \|------------------------\| \|------------------------\| \| A \| B \| C \| ... \| L \| \| M \| N \| O \| ... \| Z \| \|------------------------\| \|------------------------\| ``` ## Summary of Bucket Behavior \| Scenario \| Action Taken \| \|--------------------------------\|------------------------------\| \| No bucket covers expiry \| New bucket is created \| \| Existing bucket fits \| Field is added \| \| Bucket overflows (>127 fields) \| Downgrade or split attempted \| API Changes: ------------ Create/Free: void vsetInit(vset set); void vsetClear(vset set); Mutation: bool vsetAddEntry(vset set, vsetGetExpiryFunc getExpiry, void entry); bool vsetRemoveEntry(vset set, vsetGetExpiryFunc getExpiry, void entry); bool vsetUpdateEntry(vset set, vsetGetExpiryFunc getExpiry, void old_entry, void new_entry, long long old_expiry, long long new_expiry); Expiry Retrieval: long long vsetEstimatedEarliestExpiry(vset set, vsetGetExpiryFunc getExpiry); size_t vsetPopExpired(vset set, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void ctx); Utilities: bool vsetIsEmpty(vset set); size_t vsetMemUsage(vset set); Iteration: void vsetStart(vset set, vsetIterator it); bool vsetNext(vsetIterator it, void entryptr); void vsetStop(vsetIterator it); Entry Requirements: ------------------- All entries must conform to the following interface via `volatileEntryType`: sds entryGetKey(const void entry); // for deduplication long long getExpiry(const void entry); // used for bucketing int expire(void db, void o, void entry); // used for expiration callbacks Diagrams: --------- 1. Tagged Pointer Representation ----------------------------- Lower 3 bits of `expiry_buckets` encode bucket type: +------------------------------+ \| pointer \| TAG (3b) \| +------------------------------+ ↑ masked via VSET_PTR_MASK TAG values: 0x1 → SINGLE 0x2 → VECTOR 0x4 → HT 0x6 → RAX 2. Evolution of the Bucket ------------------------ Volatile set top-level structure: ``` +--------+ +--------+ +--------+ +--------+ \| NONE \| --> \| SINGLE \| --> \| VECTOR \| --> \| RAX \| +--------+ +--------+ +--------+ +--------+ ``` If the top-level element is a RAX, it has child buckets of type: ``` +--------+ +--------+ +-----------+ \| SINGLE \| --> \| VECTOR \| --> \| HASHTABLE \| +--------+ +--------+ +-----------+ ``` Vectors can split into multiple vectors and shrink into SINGLE buckets. A RAX with only one element is collapsed by replacing the RAX with its single element on the top level (except for HASHTABLE buckets which are not allowed on the top level). 3. RAX Structure with Expiry-Aligned Keys -------------------------------------- Buckets in RAX are indexed by aligned expiry timestamps: +------------------------------+ \| RAX key (bucket_ts) → Bucket\| +------------------------------+ \| 0x00000020 → VECTOR \| \| 0x00000040 → VECTOR \| \| 0x00000060 → HT \| +------------------------------+ 4. Bucket Splitting (Inside RAX) ----------------------------- If a vector bucket in a RAX fills: - Binary search for best split point. - Use `getExpiry(entry)` + `get_bucket_ts()` to find transition. - Create 2 new buckets and update RAX. Original: [entry1, entry2, ..., entryN] ← bucket_ts = 64ms After split: [entry1, ..., entryK] → bucket_ts = 32ms [entryK+1, ..., entryN] → bucket_ts = 64ms If all entries share same bucket_ts → promote to HT. 5. Shrinking Behavior ------------------ On deletion: - HT may shrink to VECTOR. - VECTOR with 1 item → becomes SINGLE. - If RAX has only one key left, it’s promoted up. Summary: -------- This redesign provides: ✓ Fine-grained memory control ✓ High scalability for bursty TTL data ✓ Fast expiry checks via windowed organization ✓ Minimal overhead for sparse sets ✓ Flexible binary-search-based sorting and bucketing It also lays the groundwork for future enhancements, including metrics, prioritized expiry policies, or segmented cleaning. Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-05 18:28:47 +03:00
Ran Shidlansik	65215e5378	Introduce HASH items expiration Closes https://github.com/valkey-io/valkey/issues/640 This PR introduces support for field-level expiration in Valkey hash types, making it possible for individual fields inside a hash to expire independently — creating what we call volatile fields. This is just the first out of 3 PRs. The content of this PR focus on enabling the basic ability to set and modify hash fields expiration as well as persistency (AOF+RDB) and defrag. [The second PR](https://github.com/ranshid/valkey/pull/5) introduces the new algorithm (volatile-set) to track volatile hash fields is in the last stages of review. The current implementation in this PR (in volatile-set.h/c) is just s tub implementation and will be replaced by [The second PR](https://github.com/ranshid/valkey/pull/5) [The third PR](https://github.com/ranshid/valkey/pull/4/) which introduces the active expiration and defragmentation jobs. For more highlevel design details you can track the RFC PR: https://github.com/valkey-io/valkey-rfc/pull/22. --- Some highlevel major decisions which are taken as part of this work: 1. We decided to copy the existing Redis API in order to maintain compatibility with existing clients. 2. We decided to avoid introducing lazy-expiration at this point, in order to reduce complexity and rely only on active-expiration for memory reclamation. This will require us to continue to work on improving the active expiration job and potentially consider introduce lazy-expiration support later on. 3. Although different commands which are adding expiration on hash fields are influencing the memory utilization (by allocating more memory for expiration time and metadata) we decided to avoid adding the DENYOOM for these commands (an exception is HSETEX) in order to be better aligned with highlevel keys commands like `expire` 4. Some hash type commands will produce unexpected results: - HLEN - will still reflect the number of fields which exists in the hash object (either actually expired or not). - HRANDFIELD - in some cases we will not be able to randomly select a field which was not already expired. this case happen in 2 cases: 1/ when we are asked to provide a non-uniq fields (i.e negative count) 2/ when the size of the hash is much bigger than the count and we need to provide uniq results. In both cases it is possible that an empty response will be returned to the caller, even in case there are fields in the hash which are either persistent or not expired. 5. For the case were a field is provided with a zero (0) expiration time or expiration time in the past, it is immediately deleted. We decided that, in order to be aligned with how high level keys are handled, we will emit hexpired keyspace event for that case (instead of hdel). For example: for the case: 6. We will ALWAYS load hash fields during rdb load. This means that when primary is rebooting with an old snapshot, it will take time to reclaim all the expired fields. However this simplifies the current logic and avoid major refactoring that I suspect will be needed. ``` HSET myhash f1 v1 > 0 HGETEX myhash EX 0 FIELDS 1 f1 > "v1" HTTL myhash FIELDS 1 f1 > -2 ``` The reported events are: ``` 1) "psubscribe" 2) "__keyevent@0__" 3) (integer) 1 1) "pmessage" 2) "__keyevent@0__" 3) "__keyevent@0__:hset" 4) "myhash" 1) "pmessage" 2) "__keyevent@0__" 3) "__keyevent@0__:hexpired" <---------------- note this 4) "myhash" 1) "pmessage" 2) "__keyevent@0__" 3) "__keyevent@0__:del" 4) "myhash" ``` --- This PR also modularizes and exposes the internal `hashTypeEntry` logic as a new standalone `entry.c/h` module. This new abstraction handles all aspects of field–value–expiry encoding using multiple memory layouts optimized for performance and memory efficiency. An `entry` is an abstraction that represents a single field–value pair with optional expiration. Internally, Valkey uses different memory layouts for compactness and efficiency, chosen dynamically based on size and encoding constraints. The entry pointer is the field sds. Which make us use an entry just like any sds. We encode the entry layout type in the field SDS header. Field type SDS_TYPE_5 doesn't have any spare bits to encode this so we use it only for the first layout type. Entry with embedded value, used for small sizes. The value is stored as SDS_TYPE_8. The field can use any SDS type. Entry can also have expiration timestamp, which is the UNIX timestamp for it to be expired. For aligned fast access, we keep the expiry timestamp prior to the start of the sds header. +----------------+--------------+---------------+ \| Expiration \| field \| value \| \| 1234567890LL \| hdr "foo" \0 \| hdr8 "bar" \0 \| +-----------------------^-------+---------------+ \| \| entry pointer (points to field sds content) Entry with value pointer, used for larger fields and values. The field is SDS type 8 or higher. +--------------+-------+--------------+ \| Expiration \| value \| field \| \| 1234567890LL \| ptr \| hdr "foo" \0 \| +--------------+--^----+------^-------+ \| \| \| \| \| entry pointer (points to field sds content) \| value pointer = value sds The `entry.c/h` API provides methods to: - Create, read, and write and Update field/value/expiration - Set or clear expiration - Check expiration state - Clone or delete an entry --- This PR introduces new commands and extends existing ones to support field expiration: The proposed API is very much identical to the Redis provided API (Redis 7.4 + 8.0). This is intentionally proposed in order to avoid breaking client applications already opted to use hash items TTL. Synopsis ``` HSETEX key [NX \| XX] [FNX \| FXX] [EX seconds \| PX milliseconds \| EXAT unix-time-seconds \| PXAT unix-time-milliseconds \| KEEPTTL] FIELDS numfields field value [field value ...] ``` Set the value of one or more fields of a given hash key, and optionally set their expiration time or time-to-live (TTL). The HSETEX command supports the following set of options: * `NX` — Only set the fields if the hash object does NOT exist. * `XX` — Only set the fields if if the hash object doesx exist. * `FNX` — Only set the fields if none of them already exist. * `FXX` — Only set the fields if all of them already exist. * `EX seconds` — Set the specified expiration time in seconds. * `PX milliseconds` — Set the specified expiration time in milliseconds. * `EXAT unix-time-seconds` — Set the specified Unix time in seconds at which the fields will expire. * `PXAT unix-time-milliseconds` — Set the specified Unix time in milliseconds at which the fields will expire. * `KEEPTTL` — Retain the TTL associated with the fields. The `EX`, `PX`, `EXAT`, `PXAT`, and `KEEPTTL` options are mutually exclusive. Synopsis ``` HGETEX key [EX seconds \| PX milliseconds \| EXAT unix-time-seconds \| PXAT unix-time-milliseconds \| PERSIST] FIELDS numfields field [field ...] ``` Get the value of one or more fields of a given hash key and optionally set their expiration time or time-to-live (TTL). The `HGETEX` command supports a set of options: * `EX seconds` — Set the specified expiration time, in seconds. * `PX milliseconds` — Set the specified expiration time, in milliseconds. * `EXAT unix-time-seconds` — Set the specified Unix time at which the fields will expire, in seconds. * `PXAT unix-time-milliseconds` — Set the specified Unix time at which the fields will expire, in milliseconds. * `PERSIST` — Remove the TTL associated with the fields. The `EX`, `PX`, `EXAT`, `PXAT`, and `PERSIST` options are mutually exclusive. Synopsis ``` HEXPIRE key seconds [NX \| XX \| GT \| LT] FIELDS numfields field [field ...] ``` Set an expiration (TTL or time to live) on one or more fields of a given hash key. You must specify at least one field. Field(s) will automatically be deleted from the hash key when their TTLs expire. Field expirations will only be cleared by commands that delete or overwrite the contents of the hash fields, including `HDEL` and `HSET` commands. This means that all the operations that conceptually alter the value stored at a hash key's field without replacing it with a new one will leave the TTL untouched. You can clear the TTL of a specific field by specifying 0 for the ‘seconds’ argument. Note that calling `HEXPIRE`/`HPEXPIRE` with a time in the past will result in the hash field being deleted immediately. The `HEXPIRE` command supports a set of options: * `NX` — For each specified field, set expiration only when the field has no expiration. * `XX` — For each specified field, set expiration only when the field has an existing expiration. * `GT` — For each specified field, set expiration only when the new expiration is greater than current one. * `LT` — For each specified field, set expiration only when the new expiration is less than current one. Synopsis ``` HEXPIREAT key unix-time-seconds [NX \| XX \| GT \| LT] FIELDS numfields field [field ...] ``` `HEXPIREAT` has the same effect and semantics as `HEXPIRE`, but instead of specifying the number of seconds for the TTL (time to live), it takes an absolute Unix timestamp in seconds since Unix epoch. A timestamp in the past will delete the field immediately. The `HEXPIREAT` command supports a set of options: * `NX` — For each specified field, set expiration only when the field has no expiration. * `XX` — For each specified field, set expiration only when the field has an existing expiration. * `GT` — For each specified field, set expiration only when the new expiration is greater than current one. * `LT` — For each specified field, set expiration only when the new expiration is less than current one. Synopsis ``` HPEXPIRE key milliseconds [NX \| XX \| GT \| LT] FIELDS numfields field [field ...] ``` This command works like `HEXPIRE`, but the expiration of a field is specified in milliseconds instead of seconds. The `HPEXPIRE` command supports a set of options: * `NX` — For each specified field, set expiration only when the field has no expiration. * `XX` — For each specified field, set expiration only when the field has an existing expiration. * `GT` — For each specified field, set expiration only when the new expiration is greater than current one. * `LT` — For each specified field, set expiration only when the new expiration is less than current one. Synopsis ``` HPEXPIREAT key unix-time-milliseconds [NX \| XX \| GT \| LT] FIELDS numfields field [field ...] ``` `HPEXPIREAT` has the same effect and semantics as `HEXPIREAT``,` but the Unix time at which the field will expire is specified in milliseconds since Unix epoch instead of seconds. Synopsis ``` HPERSIST key FIELDS numfields field [field ...] ``` Remove the existing expiration on a hash key's field(s), turning the field(s) from volatile (a field with expiration set) to persistent (a field that will never expire as no TTL (time to live) is associated). Synopsis ``` HSETEX key [NX] seconds field value [field value ...] ``` Similar to `HSET` but adds one or more hash fields that expire after specified number of seconds. By default, this command overwrites the values and expirations of specified fields that exist in the hash. If `NX` option is specified, the field data will not be overwritten. If `key` doesn't exist, a new Hash key is created. The HSETEX command supports a set of options: * `NX` — For each specified field, set expiration only when the field has no expiration. Synopsis ``` HTTL key FIELDS numfields field [field ...] ``` Returns the remaining TTL (time to live) of a hash key's field(s) that have a set expiration. This introspection capability allows you to check how many seconds a given hash field will continue to be part of the hash key. ``` HPTTL key FIELDS numfields field [field ...] ``` Like `HTTL`, this command returns the remaining TTL (time to live) of a field that has an expiration set, but in milliseconds instead of seconds. Synopsis ``` HEXPIRETIME key FIELDS numfields field [field ...] ``` Returns the absolute Unix timestamp in seconds since Unix epoch at which the given key's field(s) will expire. Synopsis ``` HPEXPIRETIME key FIELDS numfields field [field ...] ``` `HPEXPIRETIME` has the same semantics as `HEXPIRETIME`, but returns the absolute Unix expiration timestamp in milliseconds since Unix epoch instead of seconds. This PR introduces new notification events to support field-level expiration: \| Event \| Trigger \| \|-------------\|-------------------------------------------\| \| `hexpire` \| Field expiration was set \| \| `hexpired` \| Field was deleted due to expiration \| \| `hpersist` \| Expiration was removed from a field \| \| `del` \| Key was deleted after all fields expired \| Note that we diverge from Redis in the cases we emit hexpired event. For example: given the following usecase: ``` HSET myhash f1 v1 (integer) 0 HGETEX myhash EX 0 FIELDS 1 f1 1) "v1" HTTL myhash FIELDS 1 f1 1) (integer) -2 ``` regarding the keyspace-notifications: Redis reports: ``` 1) "psubscribe" 2) "__keyevent@0__:" 3) (integer) 1 1) "pmessage" 2) "__keyevent@0__:" 3) "__keyevent@0__:hset" 4) "myhash2" 1) "pmessage" 2) "__keyevent@0__:" 3) "__keyevent@0__:hdel" <---------------- note this 4) "myhash2" 1) "pmessage" 2) "__keyevent@0__:" 3) "__keyevent@0__:del" 4) "myhash2" ``` However In our current suggestion, Valkey will emit: ``` 1) "psubscribe" 2) "__keyevent@0__" 3) (integer) 1 1) "pmessage" 2) "__keyevent@0__" 3) "__keyevent@0__:hset" 4) "myhash" 1) "pmessage" 2) "__keyevent@0__" 3) "__keyevent@0__:hexpired" <---------------- note this 4) "myhash" 1) "pmessage" 2) "__keyevent@0__" 3) "__keyevent@0__:del" 4) "myhash" ``` --- - Expiration-aware commands (`HSETEX`, `HGETEX`, etc.) are not propagated as-is. - Instead, Valkey rewrites them into equivalent commands like: - `HDEL` (for expired fields) - `HPEXPIREAT` (for setting absolute expiration) - `HPERSIST` (for removing expiration) This ensures compatibility with replication and AOF while maintaining consistent field-level expiry behavior. --- \| Command Name \| QPS Standard \| QPS HFE \| QPS Diff % \| Latency Standard (ms) \| Latency HFE (ms) \| Latency Diff % \| \|--------------\|-------------\|---------\|------------\|----------------------\|------------------\|----------------\| \| One Large Hash Table \| \| HGET \| 137988.12 \| 138484.97 \| +0.36% \| 0.951 \| 0.949 \| -0.21% \| \| HSET \| 138561.73 \| 137343.77 \| -0.87% \| 0.948 \| 0.956 \| +0.84% \| \| HEXISTS \| 139431.12 \| 138677.02 \| -0.54% \| 0.942 \| 0.946 \| +0.42% \| \| HDEL \| 140114.89 \| 138966.09 \| -0.81% \| 0.938 \| 0.945 \| +0.74% \| \| Many Hash Tables (100 fields) \| \| HGET \| 136798.91 \| 137419.27 \| +0.45% \| 0.959 \| 0.956 \| -0.31% \| \| HEXISTS \| 138946.78 \| 139645.31 \| +0.50% \| 0.946 \| 0.941 \| -0.52% \| \| HGETALL \| 42194.09 \| 42016.80 \| -0.42% \| 0.621 \| 0.625 \| +0.64% \| \| HSET \| 137230.69 \| 137249.53 \| +0.01% \| 0.959 \| 0.958 \| -0.10% \| \| HDEL \| 138985.41 \| 138619.34 \| -0.26% \| 0.948 \| 0.949 \| +0.10% \| \| Many Hash Tables (1000 fields) \| \| HGET \| 135795.77 \| 139256.36 \| +2.54% \| 0.965 \| 0.943 \| -2.27% \| \| HEXISTS \| 138121.55 \| 137950.06 \| -0.12% \| 0.951 \| 0.952 \| +0.10% \| \| HGETALL \| 5885.81 \| 5633.80 \| -4.28% \| 2.690 \| 2.841 \| +5.61% \| \| HSET \| 137005.08 \| 137400.39 \| +0.28% \| 0.959 \| 0.955 \| -0.41% \| \| HDEL \| 138293.45 \| 137381.52 \| -0.65% \| 0.948 \| 0.955 \| +0.73% \| [ ] Consider extending HSETEX with extra arguments: NX/XX so that it is possible to prevent adding/setting/mutating fields of a non-existent hash [ ] Avoid loading expired fields when non-preamble RDB is being loaded on primary. This is an optimization in order to reduce loading unnecessary fields (which are expired). This would also require us to propagate the HDEL to the replicas in case of RDBFLAGS_FEED_REPL. Note that it might have to require some refactoring: 1/ propagate the rdbflags and current time to rdbLoadObject. 2/ consider the case of restore and check_rdb etc... For this reason I would like to avoid this optimizationfor the first drop. Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-08-05 18:28:47 +03:00
cxljs	3c738f08a4	Remove the unused value duplicate API from dict of libvalkey (#2387 ) The commit (`0700c441c6`) removes the unused value duplicate API from dict, and libvalkey's dict needs to remain consistent with it. Signed-off-by: Xiaolong Chen <fukua95@gmail.com>	2025-08-05 14:43:45 +02:00
Katie Holly	b7fe0b6435	Strengthen undefined behavior prevention in checkSignedBitfieldOverflow (#2418 ) This one was found by [afl++](https://github.com/AFLplusplus/AFLplusplus). Executing `bitfield 0 set i64 0 1` triggers UBSan at the `int64_t minincr = min - value;` calculation. To fix the undefined behavior in the `minincr` calculation and strengthen the protection in the `maxincr` calculation, we cast both, the minuend and the subtrahend, to an unsigned int, do the calculation, and then cast the result back into a signed int. Signed-off-by: Fusl <fusl@meo.ws>	2025-08-05 14:26:38 +02:00
Katie Holly	52b5519c5a	Use unsigned long for `maxiterations`, fixing undefined behavior in scan with extremely large count (#2414 ) This one was found by [afl++](https://github.com/AFLplusplus/AFLplusplus). Executing `scan 0 count n` with a count that is within 10% of `LONG_MAX`, `count * 10` would cause `maxiterations` to overflow. This is technically not a real problem since the way `maxiterations` is used would eventually cause it to underflow back to `LONG_MAX` again and continue counting down from there but I figured we may want to fix this regardless for expected behavior correctness? Signed-off-by: Fusl <fusl@meo.ws>	2025-08-05 13:17:11 +03:00
Binbin	fa8659d309	Print the complete log instead of just the crash log in log_crashes (#2428 ) We used to, for example in runtest-cluster, we will only print the crash log since in here, when we match the pattern, we start setting found and then print the log starting from the current line. We can print the complete log in this case and it might help troubleshoot the crash. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-05 17:32:35 +08:00
Nicky-2000	aac3c76c71	Fix prefetchNextBucketEntries so it fetches the correct next 2 buckets (#2394 ) This PR fixes a bug in prefetchNextBucketEntries which is used in the hashtable iterator. The current version of prefetchNextBucketEntries does not correctly prefetch the next two buckets that will be iterated over. This is due next_index being incremented twice (once in prefetchNextBucketEntries and again in getNextBucket). Fix: 1. Add check in prefetchNextBucketEntries to ensure we pass the correct 'next_index' to getNextBucket. 2. The extra increment was removed from getNextBucket. Signed-off-by: Nicky Khorasani <nickykhorasani@google.com>	2025-08-05 11:22:57 +02:00
Binbin	dceb9f3b22	Make ./runtest --dump-logs dump logs on valgrind / sanitizer errors (#2427 ) Sometimes we want to print server logs even if valgrind or sanitizer errors occur, which might help troubleshoot the problem. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-05 16:32:58 +08:00
Sarthak Aggarwal	1d20026b93	Dictionary Implementation for Migrating and Importing slots (#2409 ) Resolves #2184 --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-08-05 09:51:56 +02:00
Binbin	a50690f02c	Update swapdb command comment to mention why cluster mode is not allowed (#2391 ) The comment has been outdated since #1671, update it. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-08-05 11:47:00 +08:00

1 2 3 4 5 ...

13280 Commits All Branches Search

13280 Commits

All Branches