Commit Graph

13441 Commits

Author SHA1 Message Date
Binbin 8ea7f1330c
Update dual channel replication conf to mention the local buffer is limited by COB (#2824)
After introducing the dual channel replication in #60, we decided in #915
not to add a new configuration item to limit the replica's local replication
buffer, just use "client-output-buffer-limit replica hard" to limit it.

We need to document this behavior and mention that once the limit is reached,
all future data will accumulate in the primary side.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-11-23 23:27:50 +08:00
Binbin 8189fe5c42
Add rdb_transmitted to replstateToString so that we can see it in INFO (#2833)
In dual channel replication, when the rdb channel client finishes
the RDB transfer, it enters the REPLICA_STATE_RDB_TRANSMITTED
state. During this time, there is a brief window in which we are
not able to see the connection in INFO REPLICATION.

In the worst case, we might not see the connection for
DEFAULT_WAIT_BEFORE_RDB_CLIENT_FREE seconds. There is no harm in
listing this state, and showing connected_slaves without showing
the connection is misleading when troubleshooting.

Note that this also affects the `valkey-cli --rdb` and `--functions-rdb`
options. Before the client is in the `rdb_transmitted` state and is
released, we will now see it in the info (see the example later).

Before, not showing the replica info
```
role:master
connected_slaves:1
```

After, for dual channel replication:
```
role:master
connected_slaves:1
slave0:ip=xxx,port=xxx,state=rdb_transmitted,offset=0,lag=0,type=rdb-channel
```

After, for valkey-cli --rdb and --functions-rdb:
```
role:master
connected_slaves:1
slave0:ip=xxx,port=xxx,state=rdb_transmitted,offset=0,lag=0,type=replica
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-11-21 18:31:31 +08:00
Ricardo Dias 05540af405
Add script function flags in the module API (#2836)
This commit adds script function flags to the module API, which allows
function scripts to specify the function flags programmatically.

When the scripting engine compiles the script code, it can extract the
flags from the code and set them on the compiled function objects.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-11-20 10:23:00 +00:00
Hanxi Zhang ed8856bdfc
Fix cluster slot migration flaky test (#2756)
The original test code only checks:

1. wait_for_cluster_size 4, which calls cluster_size_consistent for every node.
Inside that function, for each node, cluster_size_consistent queries cluster_known_nodes,
which is calculated as (unsigned long long)dictSize(server.cluster->nodes). However, when
a new node is added to the cluster, it is first created in the HANDSHAKE state, and
clusterAddNode adds it to the nodes hash table. Therefore, it is possible for the new
node to still be in HANDSHAKE status (processed asynchronously) even though it appears
that all nodes “know” there are 4 nodes in the cluster.

2. cluster_state for every node, but when a new node is added, server.cluster->state remains FAIL.


Some handshake processes may not have completed yet, which likely causes the flakiness.
To address this, added a --cluster check to ensure that the config state is consistent.
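
The gap described above can be illustrated with a small model. This is a hedged Python sketch, not the Tcl test code: the `Node` class, its `handshake` flag, and both helper functions are hypothetical names introduced for illustration. It shows that a size-only check can report 4 nodes while one entry is still handshaking.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    handshake: bool = False  # hypothetical flag: True while the handshake is pending


def known_nodes(nodes):
    """Size-only check: counts every table entry, including handshaking ones."""
    return len(nodes)


def fully_joined_nodes(nodes):
    """Stricter check: counts only nodes whose handshake has completed."""
    return sum(1 for n in nodes.values() if not n.handshake)


nodes = {name: Node(name) for name in ("a", "b", "c")}
nodes["d"] = Node("d", handshake=True)  # newly added node, still handshaking

size_only = known_nodes(nodes)
joined = fully_joined_nodes(nodes)
```

In this model the size-only check is satisfied even though only three nodes have finished joining, which is the window in which the test could proceed too early.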

Fixes #2693.

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-11-20 15:07:16 +08:00
aradz44 e19ceb7a6d
deflake "Hash field TTL and active expiry propagates correctly" (#2856)
Fix a small oversight in the "Hash field TTL and active expiry propagates
correctly through chain replication" test in `hashexpire.tcl`.
The test did not wait for the initial sync of the chained replica, which made it flaky.

Signed-off-by: Arad Zilberstein <aradz@amazon.com>
2025-11-19 11:33:55 +02:00
Venkat Pamulapati 3c3a1966ec
Perform data cleanup during RDB load on successful version/signature validation (#2600)
Addresses: https://github.com/valkey-io/valkey/issues/2588

## Overview
Previously we called `emptyData()` during a full sync before validating
that the RDB version is compatible.

This change adds an rdb flag that allows us to flush the database from
within `rdbLoadRioWithLoadingCtx`. This provides the option to only
flush the data if the rdb has a valid version and signature. In the case
where we do have an invalid version and signature, we don't emptyData,
so if a full sync fails for that reason a replica can still serve stale
data instead of clients experiencing cache misses.

## Changes
- Added a new flag `RDBFLAGS_EMPTY_DATA` that signals to flush the
database after rdb validation
- Added logic to call `emptyData` in `rdbLoadRioWithLoadingCtx` in
`rdb.c`
- Added logic to not clear data if the RDB validation fails in
`replication.c` using new return type `RDB_INCOMPATIBLE`
- Modified the signature of `rdbLoadRioWithLoadingCtx` to return RDB
success codes and updated all calling sites.
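
The ordering this change introduces (validate first, flush afterwards) can be sketched as follows. This is a simplified Python model under stated assumptions, not the code in `rdb.c`: the `DEMO` signature, the payload layout, `SUPPORTED_VERSION`, the function name, and the string return values are all made up for illustration.

```python
SUPPORTED_VERSION = 11  # assumed value, for illustration only
MAGIC = b"DEMO"         # made-up signature, not the real RDB magic


def load_snapshot(db, payload, empty_data_first):
    """Validate the header, then optionally flush, then load.

    `payload` is MAGIC + 4 ASCII digits (version) + comma-separated keys.
    Returns "ok" or "incompatible".
    """
    header, body = payload[:8], payload[8:]
    if header[:4] != MAGIC or not header[4:].isdigit():
        return "incompatible"  # bad signature: existing data is untouched
    if int(header[4:]) > SUPPORTED_VERSION:
        return "incompatible"  # future version: existing data is untouched
    if empty_data_first:
        db.clear()             # flush only after validation succeeded
    for key in body.decode().split(","):
        if key:
            db[key] = True
    return "ok"


db = {"stale": True}
bad = load_snapshot(db, b"DEMO9999k1,k2", empty_data_first=True)
kept_after_bad = dict(db)
good = load_snapshot(db, b"DEMO0011k1,k2", empty_data_first=True)
```

In the sketch, a payload with a future version leaves the stale key in place, while a valid payload replaces the data set.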

## Testing
Added a tcl test that uses the debug command `reload nosave` to load
from an RDB that has a future version number. This triggers the same
code path that full syncs will use, and verifies that we don't flush
the data until after the validation is complete.

A test already exists that checks that the data is flushed:
https://github.com/valkey-io/valkey/blob/unstable/tests/integration/replication.tcl#L1504

---------

Signed-off-by: Venkat Pamulapati <pamuvenk@amazon.com>
Signed-off-by: Venkat Pamulapati <33398322+ChiliPaneer@users.noreply.github.com>
Co-authored-by: Venkat Pamulapati <pamuvenk@amazon.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
2025-11-18 17:08:10 -08:00
yzc-yzc 57892663be
Fix SCAN consistency test to only test what we guarantee (#2853)
Test the SCAN consistency by alternating SCAN
calls to primary and replica.
We cannot rely on the exact order of the elements and the returned
cursor number.
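
A minimal sketch of such a test, assuming an in-memory stand-in for the two servers. `FakeServer` and `scan_alternating` are hypothetical names, and the cursor here is just a list offset, which is not how the real hashtable cursor works. The point is only the shape of the assertion: it checks that every key is returned, not the order or the cursor values.

```python
class FakeServer:
    """In-memory stand-in for a server; the cursor is an offset into a key list."""

    def __init__(self, keys):
        self.keys = list(keys)

    def scan(self, cursor, count=3):
        chunk = self.keys[cursor:cursor + count]
        nxt = cursor + count
        return (0 if nxt >= len(self.keys) else nxt), chunk


def scan_alternating(primary, replica):
    """Run one full iteration, sending each SCAN call to the other server."""
    seen, cursor, use_primary = [], 0, True
    while True:
        server = primary if use_primary else replica
        cursor, chunk = server.scan(cursor)
        seen.extend(chunk)
        use_primary = not use_primary
        if cursor == 0:
            return seen


keys = [f"k:{i}" for i in range(10)]
primary, replica = FakeServer(keys), FakeServer(keys)
seen = scan_alternating(primary, replica)
# Assert only on the guarantee: every key is returned at least once.
all_returned = set(seen) == set(keys)
```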

---------

Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-18 16:06:20 +01:00
chzhoo 33bfac37ba
Optimize zset memory usage by embedding element in skiplist (#2508)
By default, when the number of elements in a zset exceeds 128, the
underlying data structure adopts a skiplist. We can reduce memory usage
by embedding elements into the skiplist nodes. Change the `zskiplistNode`
memory layout as follows:

```
Before
                 +-------------+
         +-----> | element-sds |
         |       +-------------+
         |
 +------------------+-------+------------------+---------+-----+---------+
 | element--pointer | score | backward-pointer | level-0 | ... | level-N |
 +------------------+-------+------------------+---------+-----+---------+



After
 +-------+------------------+---------+-----+---------+-------------+
 | score | backward-pointer | level-0 | ... | level-N | element-sds |
 +-------+------------------+---------+-----+---------+-------------+
```

Before the embedded SDS representation, we include one byte representing
the size of the SDS header, i.e. the offset into the SDS representation
where that actual string starts.

The memory saving is therefore one pointer minus one byte = 7 bytes per
element, regardless of other factors such as element size or number of
elements.
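
The per-element saving can be restated as simple arithmetic. The sketch below assumes a 64-bit build (8-byte pointers); the function names and the example workload of 100,000 zsets with 129 elements each are illustrative choices. It ignores allocator rounding and per-allocation overhead, so it need not match measured numbers.

```python
POINTER_SIZE = 8        # bytes, assuming a 64-bit build
SDS_HDR_SIZE_BYTE = 1   # the extra byte stored before the embedded SDS


def saving_per_element(pointer_size=POINTER_SIZE):
    """One pointer removed, one header-size byte added."""
    return pointer_size - SDS_HDR_SIZE_BYTE


def estimated_saving(zsets, elements_per_zset):
    """Total bytes saved for a workload, from the per-element figure alone."""
    return zsets * elements_per_zset * saving_per_element()


per_element = saving_per_element()
total = estimated_saving(100_000, 129)
```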

### Benchmark step

I generated the test data using the following Lua script and CLI command,
and checked memory usage using the `info` command.

**lua script**
```
local start_idx = tonumber(ARGV[1])
local end_idx = tonumber(ARGV[2])
local elem_count = tonumber(ARGV[3])

for i = start_idx, end_idx do
    local key = "zset:" .. string.format("%012d", i)
    local members = {}

    for j = 0, elem_count - 1 do
        table.insert(members, j)
        table.insert(members, "member:" .. j)
    end

    redis.call("ZADD", key, unpack(members))
end

return "OK: Created " .. (end_idx - start_idx + 1) .. " zsets"
```

**valkey-cli command**
`valkey-cli EVAL "$(cat create_zsets.lua)" 0 0 100000 ${ZSET_ELEMENT_NUM}`

### Benchmark result
| number of elements in a zset | memory usage before optimization | memory usage after optimization | change |
|-------|-------|-------|-------|
| 129 | 1047MB | 943MB | -9.9% |
| 256 | 2010MB | 1803MB | -10.3% |
| 512 | 3904MB | 3483MB | -10.8% |
---------

Signed-off-by: chzhoo <czawyx@163.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-18 14:27:15 +01:00
Roshan Khatri 616fccb4c5
Fix the failing warmup and duration are cumulative (#2854)
We only need to verify that the total duration was at least 2 seconds;
the elapsed time is too variable to check an upper bound.

Fixes https://github.com/valkey-io/valkey/issues/2843

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2025-11-17 21:26:12 +01:00
Binbin aef56e52f5
Fix timing issue in dual channel replication COB test (#2847)
After #2829, valgrind reported a test failure. It seems that there is
not enough time to reach the COB limit under valgrind.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-11-17 17:25:19 +08:00
Binbin a06cf15b20
Allow dual channel full sync in plain failover (#2659)
PSYNC_FULLRESYNC_DUAL_CHANNEL is also a full sync, so, as the comment says,
we need to allow it. We have not yet identified the exact edge case
that leads to this line, but during a failover there should be no difference
between sync strategies.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-11-15 12:57:27 +08:00
Harkrishn Patro 86db609219
Print node name on a best effort basis if light weight message is received before link stabilization (#2825)
fixes: #2803

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-11-14 14:33:16 -08:00
yzc-yzc b93cfcc332
Attempt to fix flaky SCAN consistency test (#2834)
Related test failures:

https://github.com/valkey-io/valkey/actions/runs/19282092345/job/55135193394

https://github.com/valkey-io/valkey/actions/runs/19200556305/job/54887767594

> *** [err]: scan family consistency with configured hash seed in
tests/integration/scan-family-consistency.tcl
> Expected '5 {k:1 k:25 z k:11 k:18 k:27 k:45 k:7 k:12 k:19 k:29 k:40
k:41 k:43}' to be equal to '5 {k:1 k:25 k:11 z k:18 k:27 k:45 k:7 k:12
k:19 k:29 k:40 k:41 k:43}' (context: type eval line 26 cmd {assert_equal
$primary_cursor_next $replica_cursor_next} proc ::start_server)

The reason is that the RDB part of the primary-replica synchronization
affects the resize policy of the hashtable.
See
b835463a73/src/server.c (L807-L818)

Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
2025-11-14 10:55:05 +01:00
Binbin 331a852821
Change DEFAULT_WAIT_BEFORE_RDB_CLIENT_FREE from 60s to 5s (#2829)
Consider this scenario:
1. Replica starts loading the RDB using the rdb connection
2. Replica finishes loading the RDB before the replica main connection has
   initiated the PSYNC request
3. Replica stops replicating after receiving replicaof no one
4. Primary can't know that the replica main connection will never ask for
   PSYNC, so it keeps the reference to the replica's replication buffer block
5. Primary has a shutdown-timeout configured and must wait for the rdb
   connection to close before it can shut down.

The current 60-second wait time (DEFAULT_WAIT_BEFORE_RDB_CLIENT_FREE) is excessive
and leads to prolonged resource retention in edge cases. Reducing this timeout to
5 seconds would provide adequate time for legitimate PSYNC requests while mitigating
the issue described above.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-11-14 11:32:29 +08:00
Ricardo Dias 8e0b375da4
Fix cluster slot stats for scripts with cross-slot keys (#2835)
This commit fixes the cluster slot stats for scripts executed by
scripting engines when the scripts access cross-slot keys.

This was not a bug in Lua scripting engine, but `VM_Call` was missing a
call to invalidate the script caller client slot to prevent the
accumulation of stats.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-11-13 12:05:52 -08:00
Rain Valentine 01a7657b83
Add --warmup and --duration parameters to valkey-benchmark (#2581)
It's handy to be able to automatically do a warmup and/or test by
duration rather than request count. 🙂

I changed the real-time output a bit - not sure if that's wanted or not.
(Like, would it break people's weird scripts? It'll break my weird
scripts, but I know the price of writing weird fragile scripts.)

```
Prepended "Warming up " when in warmup phase:
Warming up SET: rps=69211.2 (overall: 69747.5) avg_msec=0.425 (overall: 0.393) 3.8 seconds
^^^^^^^^^^

Appended running request counter when based on -n:
SET: rps=70892.0 (overall: 69878.1) avg_msec=0.385 (overall: 0.398) 612482 requests
                                                                    ^^^^^^^^^^^^^^^

Appended running second counter when in warmup or based on --duration:
SET: rps=61508.0 (overall: 61764.2) avg_msec=0.430 (overall: 0.426) 4.8 seconds
                                                                    ^^^^^^^^^^^
```

To be clear, the report at the end remains unchanged.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-13 12:57:46 +01:00
Sarthak Aggarwal b835463a73
Fixes test-freebsd workflow in daily (package lang/tclX) (#2832)
This PR fixes the freebsd daily job that has been failing consistently
for the last days with the error "pkg: No packages available to install
matching 'lang/tclx' have been found in the repositories".

The package name is corrected from `lang/tclx` to `lang/tclX`. The
lowercase version worked previously but appears to have stopped working
in an update of freebsd's pkg tool to 2.4.x.

Example of failed job:

https://github.com/valkey-io/valkey/actions/runs/19282092345/job/55135193499

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2025-11-13 08:24:37 +01:00
Binbin 7ffe4dcec4
Remove the EXAT and PXAT from some HFE notifications tests (#2831)
As the failure below shows, the test expected hexpire but got hexpired instead,
which means that the expiration time had already passed during execution.
```
*** [err]: HGETEX EXAT keyspace notifications for active expiry in tests/unit/hashexpire.tcl
Expected 'pmessage __keyevent@* __keyevent@9__:hexpired myhash' to match 'pmessage __keyevent@* __keyevent@*:hexpire myhash'
```

We should remove EXAT and PXAT from these fixtures. We already have
dedicated tests that verify that we get 'expired' when EX and PX are set
to 0 or EXAT and PXAT are in the past.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-11-12 14:32:13 +02:00
eifrah-aws 1b0b5c0cfd
New module API to perform prefix‑aware ACL permission check (#2796)
## Description

This change introduces the ability for modules to check ACL permissions
against key prefix. The update adds a dedicated `prefixmatchlen` helper
and extends the core ACL selector logic to support a prefix‑matching
mode.

The new API `ValkeyModule_ACLCheckPrefixPermissions` is registered and
exposed to modules, and a corresponding implementation is added in
`module.c`. Existing internal callers that already perform prefix checks
(e.g., `VM_ACLCheckKeyPermissions`) are updated to use the new flag,
while all legacy paths remain unchanged.

The change also modifies the `aclcheck` test module that exercises the
new prefix‑checking API, ensuring that read/write operations are
correctly allowed or denied based on the ACL configuration.
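
The idea of a prefix-aware check can be sketched in a few lines. This is a deliberately simplified Python model, not the logic in the ACL selector code: `pattern_covers_prefix` and `prefix_allowed` are hypothetical names, and only literal patterns with an optional trailing `*` are handled (no `?`, character classes, or escapes).

```python
def pattern_covers_prefix(pattern, prefix):
    """True if `pattern` matches every key that starts with `prefix`.

    Simplified: only literal text with an optional trailing '*' is supported.
    """
    if not pattern.endswith("*"):
        return False  # a literal pattern matches one key, never a whole prefix
    literal = pattern[:-1]
    return prefix.startswith(literal)


def prefix_allowed(patterns, prefix):
    """A prefix is allowed if at least one key pattern covers all of it."""
    return any(pattern_covers_prefix(p, prefix) for p in patterns)
```

Under these assumptions, `app:*` grants the prefix `app:idx:`, while `app:idx:foo*` does not, because keys such as `app:idx:bar` fall outside it.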

Key areas touched:

* ACL logic
* Module API
* Testing

## Motivation

The search module presently makes costly calls to verify index permissions
(see https://github.com/valkey-io/valkey-search/blob/main/src/acl.cc#L295).
This PR introduces a more efficient approach for that.

---------

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-11-12 10:51:58 +02:00
Daniil Kashapov 3c378862c3
Cluster: Avoid usage of light weight messages to nodes with not ready bidirectional links (#2817)
After network failure nodes that come back to cluster do not always send
and/or receive messages from other nodes in shard, this fix avoids usage
of light weight messages to nodes with not ready bidirectional links.
When a light message comes before any normal message, freeing of cluster
link is happening because on the just established connection link->node
is not assigned yet. It is assigned in getNodeFromLinkAndMsg right after
the condition if (is_light).
On a cluster with heavy pubsub load, this can lead to a long loop of
disconnects, which is what we observed:
1. node A establishes cluster link to node B
2. node A propagates PUBLISH to node B
3. node B frees cluster link because of link->node == null as it has not
received non-light messages yet
4. go to 1.
During this loop, subscribers of node B do not receive any messages
published to node A.

So here we want to make sure that PING was sent (and link->node was
initialized) on this connection before using lightweight messages.
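
The ordering problem can be modelled in a few lines. This is a hedged Python sketch of the receiver-side behavior described above, not the cluster bus code: `Link` and `deliver` are hypothetical names, and the model tracks only whether `node` has been assigned.

```python
class Link:
    """Hypothetical model of one cluster link as seen by the receiver."""

    def __init__(self):
        self.node = None  # assigned once a normal (non-light) message is processed


def deliver(link, sender, is_light):
    """Return False when the receiver would free the link."""
    if is_light:
        return link.node is not None  # light message on an unidentified link
    link.node = sender                # a normal message such as PING identifies the peer
    return True


# Light message first: the receiver does not know the peer and drops the link.
fresh = Link()
dropped = not deliver(fresh, "A", is_light=True)

# PING first, then the light message: the link is kept.
ready = Link()
deliver(ready, "A", is_light=False)
accepted = deliver(ready, "A", is_light=True)
```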

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
2025-11-11 20:03:24 -08:00
Jim Brunner 047080a622
shared zadd for geoadd (#2828)
GEOADD is allocating/destroying a string object for "ZADD"
each time it is called. Created a shared string instead.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2025-11-11 15:26:53 -08:00
Roshan Khatri b7a3fc988a
Fix Test dual-channel: primary tracking replica backlog refcount (#2827)
This increases the times we check for the logs from 20 to 40.

I found that every `wait-for` check takes about 1.5 to 1.57 milliseconds,
so when we were checking 2000 times with a 1 ms delay we were actually spending
(2000 * 1) + (2000 * 1.75) = 5500 ms waiting.
This can be seen in the failure below: for 20 checks we took about 35 ms more,
so that's around 1.75 ms per check.
```
Execution time: 2034 ms (failed)
[err]: 20 100 - Test dual-channel: primary tracking replica backlog refcount - start with empty backlog in tests/integration/dual-channel-replication-flaky.tcl
```

That is why increasing it to 40 100 will check for approx 4,070 ms, which
is still less than the original 5500 ms but should pass every single
time, as seen here:
https://github.com/roshkhatri/valkey/actions/runs/19279424967/job/55126976882
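
The waiting-time estimates in this message follow one formula: retries times delay, plus retries times per-check overhead. A small Python sketch, where `total_wait_ms` is an illustrative helper and the 1.75 ms overhead is the figure quoted above:

```python
PER_CHECK_OVERHEAD_MS = 1.75  # per-check overhead quoted in the commit message


def total_wait_ms(retries, delay_ms, overhead_ms=PER_CHECK_OVERHEAD_MS):
    """Configured delay plus the overhead of running each check."""
    return retries * delay_ms + retries * overhead_ms


original = total_wait_ms(2000, 1)   # 2000 checks, 1 ms apart
failing = total_wait_ms(20, 100)    # the failing "20 100" setting
proposed = total_wait_ms(40, 100)   # the new "40 100" setting
```

The estimate for the failing setting, 2035 ms, is close to the 2034 ms reported in the log.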

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2025-11-12 00:03:50 +01:00
Arthur Lee 2da21d9def
Allow partial sync after loading AOF with preamble (#2366)
The AOF preamble mechanism replaces the traditional AOF base file with
an RDB snapshot during rewrite operations, which reduces I/O overhead
and improves loading performance.
However, when valkey loads the RDB-formatted preamble from the base AOF
file, it does not process the replication ID (replid) information within
the RDB AUX fields. This omission has two limitations:

* On a primary, it prevents the primary from accepting PSYNC continue
  requests after restarting with a preamble-enabled AOF file.
* On a replica, it prevents the replica from successfully performing
  partial sync requests (avoiding full sync) after restarting with a
  preamble-enabled AOF file.

To resolve this, this commit aligns the AOF preamble handling with the
logic used for standalone RDB files, by storing the replication ID and
replication offset in the AOF preamble and restoring them when loading
the AOF file.

Resolves #2677

---------

Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-11 12:41:27 +01:00
Ricardo Dias 7fbd4cb260
Expose SIMPLE_STRING and ARRAY_NULL reply type to the Module API (#2804)
This commit extends the Module API to expose the `SIMPLE_STRING` and
`ARRAY_NULL` reply types to modules, by passing the new flag `X` to
the `ValkeyModule_Call` function.

By only exposing the new reply types behind a flag, we keep
backward compatibility with existing module implementations and
allow new modules to work with these reply types, which are
required for scripts to correctly process the reply type of commands
called inside scripts.

Before this change, commands like `PING` or `SET`, which return `"OK"`
as a simple string reply, would be returned as string replies to
scripts.
To allow the support of the Lua engine as an external module, we need to
distinguish between simple string and string replies to keep backward
compatibility.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-11-10 15:05:26 +00:00
Ricardo Dias bb8989cfde
Adds new module context flag `VALKEYMODULE_CTX_SCRIPT_EXECUTION` (#2818)
The new module context flag `VALKEYMODULE_CTX_SCRIPT_EXECUTION` denotes
that the module API function is being called in the context of a
scripting engine execution.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-11-10 10:29:40 +00:00
Vadym Khoptynets 65ab07dde7
Leverage zfree_with_size for client reply blocks (#2624)
clientReplyBlock stores the size of the actual allocation in its size
field (minus the header size). This can be used for more effective
deallocation with zfree_with_size.

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
2025-11-09 20:46:27 +02:00
Roshan Khatri 2288657a05
[DEFLAKE] Psync established after rdb load - beyond grace period (#2748)
Resolves: https://github.com/valkey-io/valkey/issues/2695

Increase the wait time for the periodic log check during RDB load. Also
increase the delay between log checks.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Signed-off-by: Roshan Khatri <117414976+roshkhatri@users.noreply.github.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
2025-11-07 15:11:37 -08:00
Harkrishn Patro 7f8c5b6f0c
[flaky-failure-fix] Increase the cluster-node-timeout to have longer delay between failover of each shard (#2793)
2025-11-06 16:14:45 -08:00
yzc-yzc 37d08d3866
Fix flaky DBSIZE test for atomic slot migration (#2805)
Related test failures: 

    *** [err]: Replica importing key containment (slot 0 from node 0 to 2) - DBSIZE command excludes importing keys in tests/unit/cluster/cluster-migrateslots.tcl
    Expected '1' to match '0' (context: type eval line 2 cmd {assert_match "0" [R $node_idx DBSIZE]} proc ::test)

The reason is that we don't wait for the primary-replica synchronization
to complete before starting the next testcase.

---------

Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-06 18:02:27 +01:00
Ricardo Dias 7a1d989696
Add "script" context to ACL log entries (#2798)
In this commit we add a new context for the ACL log entries that is used
to log ACL failures that occur during scripts execution. To maintain
backward compatibility we still maintain the "lua" context for failures
that happen when running Lua scripts. For other scripting engines the
context description will be just "script".

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-11-06 09:46:22 +00:00
hieu2102 cf7a628ada
Add instruction to build Valkey with fast_float (#2810)
The `README.md` file is currently missing a section to build Valkey with
`fast_float`, which was introduced in Valkey 8.1 as an optional
dependency (#1260)

Signed-off-by: hieu2102 <hieund2102@gmail.com>
2025-11-06 09:45:12 +00:00
Sarthak Aggarwal 32844b8b0a
Configurable DB hash seed for SCAN family commands consistency (#2608)
Introduce a new config `hash-seed` which can be set only at startup and
controls the hash seed for the server. This includes all hash tables.
This change makes it so that both primaries and replicas will return the
same results for SCAN/HSCAN/ZSCAN/SSCAN cursors. This is useful in order
to make sure SCAN behaves correctly after a failover.

Resolves #4

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthakaggarwal97@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-11-05 08:45:52 -08:00
xbasel c88c94e326
Reuse dbHasNoKeys() inside dbsHaveNoKeys() to remove duplicate logic (#2800)
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
2025-11-04 11:28:39 -08:00
Sarthak Aggarwal a49d469f48
Reverts hashHashtableTypeValidate signature (#2799)
Fixes
https://github.com/valkey-io/valkey/actions/runs/19053371057/job/54418411647#step:6:202

Matched hashHashtableTypeValidate to the
[generic hashtable callback signature](https://github.com/valkey-io/valkey/blob/unstable/src/hashtable.h#L62)
and performed the entry cast internally to preserve expiry checks.

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Jim Brunner <brunnerj@amazon.com>
2025-11-04 20:07:57 +02:00
Jim Brunner a99c636321
Improve header comment and strengthen type checking for entry (#2794)
In `entry.c`, the `entry` is a block of memory with variable contents.
The structure can be difficult to understand. A new header comment more
clearly documents the contents/layout of the `entry`.

Also, in `entry.h`, the `entry` was defined by `typedef void entry`.
This allows blind casting to the `entry` type. It defeats compiler type
checking.

Even though the `entry` has a variable definition, we can define entry
as a legitimate type which allows the compiler to perform type checking.
By performing `typedef struct _entry entry`, now the `entry` is
understood to be a pointer to some type of undefined structure. We can
pass a pointer and the compiler can typecheck the pointer. (Of course we
can't dereference it, because we haven't actually defined the struct.)

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2025-11-03 19:39:36 +02:00
Hanxi Zhang 4d78d36bff
HSETEX: Support NX/XX Flags (#2668)
### Summary
Addresses https://github.com/valkey-io/valkey/issues/2619.  
This PR extends the `HSETEX` command to support optional key-level `NX`
and `XX` flags, allowing operations conditional on the existence of the
hash key.

### Changes
- Updated `hsetex.json` and regenerated `commands.def`.
- Extended argument parsing for NX/XX.
- Added key-level `NX`/`XX` support in `HSETEX`.
- Added tests covering all four NX/XX scenarios.
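
The four NX/XX scenarios can be sketched with a dictionary standing in for the keyspace. This is a hedged Python model of the key-level condition only: the function signature, the `(value, ttl)` storage, and the 0/1 return values are assumptions made for illustration, not the command's actual implementation.

```python
def hsetex(db, key, fields, ttl, condition=None):
    """Set `fields` with a TTL on hash `key`, honoring a key-level NX/XX condition.

    Returns 1 if the fields were written, 0 if the condition blocked the write.
    """
    exists = key in db
    if condition == "NX" and exists:
        return 0  # NX: only write when the hash key does not exist yet
    if condition == "XX" and not exists:
        return 0  # XX: only write when the hash key already exists
    target = db.setdefault(key, {})
    for field, value in fields.items():
        target[field] = (value, ttl)
    return 1


db = {}
xx_on_missing = hsetex(db, "h", {"f": "v"}, ttl=10, condition="XX")
nx_on_missing = hsetex(db, "h", {"f": "v"}, ttl=10, condition="NX")
nx_on_existing = hsetex(db, "h", {"g": "w"}, ttl=10, condition="NX")
xx_on_existing = hsetex(db, "h", {"g": "w"}, ttl=10, condition="XX")
```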

---------

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
2025-11-03 09:43:48 +02:00
Simon Baatz 6cbc1a31d7
Sentinel: fix regression requiring "+failover" ACL in failover path (#2780)
Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires adapting the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes #2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
2025-10-31 14:46:53 -04:00
harrylin98 189c69e315
Fix: ltrim should not call signalModifiedKey when no elements are removed (#2787)
There’s an issue with the LTRIM command. When LTRIM does not actually
modify the key — for example, with `LTRIM key 0 -1` — the server.dirty
counter is not updated because both ltrim and rtrim values are 0. As a
result, the command is not propagated. However, `signalModifiedKey` is
still called regardless of whether server.dirty changes. This behavior
is unexpected and can cause a mismatch between the source and target
during propagation, since the LTRIM command is not sent.
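
The intended behavior can be sketched as follows. This is a Python model of the trimming arithmetic under stated assumptions, not the code in the list implementation: `ltrim` operates on a plain list and `on_modified` is a hypothetical callback standing in for `signalModifiedKey`.

```python
def ltrim(values, start, end, on_modified):
    """Trim `values` in place to [start, end]; notify only if something was removed."""
    llen = len(values)
    if start < 0:
        start = max(llen + start, 0)
    if end < 0:
        end = llen + end
    if start > end or start >= llen:
        left, right = llen, 0            # the result is an empty list
    else:
        end = min(end, llen - 1)
        left, right = start, llen - end - 1
    removed = left + right
    if removed > 0:
        del values[llen - right:]        # drop the tail first, then the head
        del values[:left]
        on_modified()                    # called only when the key really changed
    return removed


calls = []
values = [1, 2, 3]
noop_removed = ltrim(values, 0, -1, lambda: calls.append("modified"))
calls_after_noop = list(calls)
real_removed = ltrim(values, 1, -1, lambda: calls.append("modified"))
```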

Signed-off-by: Harry Lin <harrylhl@amazon.com>
Co-authored-by: Harry Lin <harrylhl@amazon.com>
2025-10-31 14:46:36 -04:00
Jacob Murphy 43ee46da33
Authenticate slot migration client on source node to internal user (#2785)
Just setting the authenticated flag actually authenticates to the
default user in this case. The default user may be granted no permission
to use CLUSTER SYNCSLOTS.

Instead, we now authenticate to the NULL/internal user, which grants
access to all commands. This is the same as what we do for replication:


864de555ce/src/replication.c (L4717)

Add a test for this case as well.

Closes #2783

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2025-10-31 10:57:05 -07:00
Ricardo Dias 84eb459cd4
Add ValkeyModule_ReplyWithCustomErrorFormat to module API (#2791)
Note: these changes are part of the effort to run Lua engine as an
external scripting engine module.

The new function `ValkeyModule_ReplyWithCustomErrorFormat` is being
added to the module API to allow scripting engines to return errors that
originated from running commands within the script code, without
counting twice in the error stats counters.

More details on why this is needed by scripting engines can be read in
the message of an older commit, aa856b39f2.

This PR also adds a new test to ensure the correctness of the newly
added function.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-10-31 16:02:27 +00:00
xbasel f54818cc60
Bug fix: reset io_last_written on c->buf resize to prevent stale pointers (#2786)
Fixes an assert crash in _writeToClient():

    serverAssert(c->io_last_written.data_len == 0 ||
                 c->io_last_written.buf == c->buf);

The issue occurs when clientsCronResizeOutputBuffer() grows or
reallocates c->buf while io_last_written still points to the old buffer
and data_len is non-zero. On the next write, both conditions in the
assertion become false.

Reset io_last_written when resizing the output buffer to prevent stale
pointers and keep state consistent.
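
The stale-pointer condition can be modelled with object identity. This is a hedged Python sketch, not the networking code: the `Client` class and its attribute names are hypothetical stand-ins for `c->buf`, `io_last_written.buf`, and `io_last_written.data_len`.

```python
class Client:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.last_written_buf = None  # stand-in for io_last_written.buf
        self.last_written_len = 0     # stand-in for io_last_written.data_len

    def record_write(self, nbytes):
        self.last_written_buf, self.last_written_len = self.buf, nbytes

    def invariant_holds(self):
        """Mirrors the assertion: no pending data, or it points at the current buffer."""
        return self.last_written_len == 0 or self.last_written_buf is self.buf

    def resize(self, new_size, reset_last_written):
        self.buf = bytearray(new_size)  # reallocation yields a new buffer object
        if reset_last_written:
            self.last_written_buf, self.last_written_len = None, 0


stale = Client(16)
stale.record_write(4)
stale.resize(32, reset_last_written=False)

fixed = Client(16)
fixed.record_write(4)
fixed.resize(32, reset_last_written=True)
```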

fixes https://github.com/valkey-io/valkey/issues/2769

Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
2025-10-30 13:51:01 -07:00
Ricardo Dias 864de555ce
Make ValkeyModule_Call compatible with calling commands from scripting engines (#2782)
Note: these changes are another step towards being able to run Lua
engine as an external scripting engine module.

In this commit we improve the `ValkeyModule_Call` API function code to
match the validations and behavior of the `scriptCall` function,
currently used by the Lua engine to run commands using `server.call` Lua
Valkey API.

The changes made are backward compatible. The new behavior/validations
are only enabled when calling `ValkeyModule_Call` while running a script
using `EVAL` or `FCALL`.

To test these changes, we improved the `HELLO` dummy scripting engine
module to support calling commands, and compare the behavior with
calling the same command from a Lua script.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-10-30 16:19:46 +00:00
Ricardo Dias ea103da5d6
New INFO section for scripting engines (#2738)
This commit adds a new `INFO` section called "Scripting Engines" that
shows the information of the current scripting engines available in the
server.

Here's an output example:

```
> info scriptingengines
# Scripting Engines
engines_count:2
engines_total_used_memory:68608
engines_total_memory_overhead:56
engine_0:name=LUA,module=built-in,abi_version=4,used_memory=68608,memory_overhead=16
engine_1:name=HELLO,module=helloengine,abi_version=4,used_memory=0,memory_overhead=40
```

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-10-30 16:18:01 +00:00
Diego Ciciani e381182297
Add IPv6 availability check to skip tests when unavailable (#2674)
Skip IPv6 tests automatically when IPv6 is not available.

This fixes the problem that tests fail when IPv6 is not available on the
system, which can worry users when they run `make test`.

IPv6 availability is detected by opening a dummy server socket and
trying to connect to it using a client socket over IPv6.
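
The detection approach can be sketched in Python with the standard `socket` module. This is an illustration of the technique, not the Tcl helper added by this commit; the function name, the use of the loopback address `::1`, and the timeout are assumptions.

```python
import socket


def ipv6_available(timeout=1.0):
    """Bind a throwaway IPv6 listener on loopback and try to connect to it."""
    if not socket.has_ipv6:
        return False
    server = client = None
    try:
        server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        server.bind(("::1", 0))          # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        client = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        client.settimeout(timeout)
        client.connect(("::1", port))
        return True
    except OSError:
        return False                     # any socket error means IPv6 is not usable
    finally:
        for sock in (client, server):
            if sock is not None:
                sock.close()
```

A test runner could call such a helper once at startup and skip IPv6 tests when it returns `False`.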

Fixes #2643

---------

Signed-off-by: diego-ciciani01 <diego.ciciani@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-10-30 11:20:56 +01:00
Sarthak Aggarwal 10281becaf
Adds a summary for tests (#2745)
```
Test Summary: 100 passed, 2 failed

!!! WARNING The following tests failed:
...
```

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2025-10-29 13:36:37 -07:00
Ken f3b2dee3b7
Add monotonic clock calibration handling if clock speed is not found (#2776)
Currently, monotonic clock initialization relies on the model name field
from /proc/cpuinfo to retrieve the clock speed. However, this is not
always present. In case it is not present, measure the clock tick and
use it instead.

Before fix:
```
monotonic: x86 linux, unable to determine clock rate
```

After fix:
```
21695:M 25 Oct 2025 20:16:23.168 * monotonic clock: X86 TSC @ 2649 ticks/us
```

Fixes #2774

---------

Signed-off-by: Ken Nam <otherscase@gmail.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
2025-10-28 22:20:12 +02:00
Ritoban Dutta 909d082cd0
Reorder valkey.conf: move configs to correct sections (#2737)
- Moved `server-cpulist`, `bio-cpulist`, `aof-rewrite-cpulist`,
  `bgsave-cpulist` configurations to ADVANCED CONFIG.
- Moved `ignore-warnings` configuration to ADVANCED CONFIG.
- Moved `availability-zone` configuration to GENERAL.

These configs were incorrectly placed at the end of the file in the
ACTIVE DEFRAGMENTATION section.

Fixes #2736

---------

Signed-off-by: ritoban23 <ankudutt101@gmail.com>
2025-10-28 10:36:23 +01:00
Sarthak Aggarwal 2c92a6072d
Reverts rdb-key-save-delay value to fix dual channel replication test in macos (#2771)
Resolves #2696 

Set `rdb-key-save-delay` to 200 microseconds to reduce the overall RDB load time.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2025-10-27 13:08:40 -07:00
Zhijun Liao 861d0794b7
Sentinel: Skip IS-PRIMARY-DOWN-BY-ADDR requests when primary not SDOWN (#2763)
A super tiny change to optimize the function
`sentinelAskPrimaryStateToOtherSentinels` to early return when the
sentinel does not observe the primary as subjectively down.

Signed-off-by: Zhijun <dszhijun@gmail.com>
2025-10-24 16:15:54 -04:00
Zhijun Liao baf2d572f7
Ensure the server executable exists before running tests (#2762)
Previously, running ./runtest without src/valkey-server would hang. Now it
throws an error.
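
A fail-fast check of this kind can be sketched as below. This is a Python illustration of the idea, not the Tcl code in the test runner; `require_executable` and the error message are hypothetical.

```python
import os


def require_executable(path):
    """Return `path` if it is an executable file, otherwise raise immediately."""
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(f"server executable not found or not executable: {path}")
    return path


try:
    require_executable("/nonexistent/valkey-server")
    raised = False
except FileNotFoundError:
    raised = True
```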

Signed-off-by: Zhijun <dszhijun@gmail.com>
2025-10-23 19:29:50 +08:00