The CVE fixes had a formatting and external test issue that wasn't
caught because private branches don't run those CI steps.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
This should be + instead of *, otherwise it does not make any sense.
Otherwise we would calculate 20 more bytes for each prefix rax node in
a 64-bit build.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
daea05b1e2/src/networking.c (L886-L886)
To fix the issue, we need to ensure that the subtraction `prev->size -
prev->used` does not underflow. This can be achieved by explicitly
checking that `prev->used` is less than `prev->size` before performing
the subtraction. This approach avoids relying on unsigned arithmetic and
ensures the logic is clear and robust.
The specific changes are:
1. Replace the condition `prev->size - prev->used > 0` with `prev->used
< prev->size`.
2. This change ensures that the logic checks whether there is remaining
space in the buffer without risking underflow.
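To illustrate the difference, here is a minimal, self-contained sketch; the `bufnode` struct and its values are hypothetical stand-ins for the client buffer node:
```c
#include <stdio.h>
#include <stddef.h>

/* Why `prev->used < prev->size` is the safer form: if `used` ever
 * exceeds `size`, the unsigned subtraction wraps around to a huge value
 * and `size - used > 0` is still "true". */
typedef struct { size_t size; size_t used; } bufnode;

int main(void) {
    bufnode prev = { .size = 16, .used = 18 }; /* inconsistent state */
    if (prev.size - prev.used > 0)
        printf("old check: space left (wrong, wrapped to %zu)\n",
               prev.size - prev.used);
    if (prev.used < prev.size)
        printf("new check: space left\n");
    else
        printf("new check: no space left (correct)\n");
    return 0;
}
```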
**References**
[INT02-C. Understand integer conversion
rules](https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules)
[CWE-191](https://cwe.mitre.org/data/definitions/191.html)
---
Signed-off-by: Zeroday BYTE <github@zerodaysec.org>
When calling the command `EVAL error{} 0`, Valkey crashes with the
following stack trace. This patch ensures we never leave the
`err_info.msg` field null when we fail to extract a proper error
message.
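A minimal sketch of the shape of the fix, assuming a simplified `errorInfo` struct and helper; the real field and extraction logic live in Valkey's scripting code:
```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the error-info struct. */
typedef struct { const char *msg; } errorInfo;

/* If we fail to extract a proper error message, fall back to a static
 * string instead of leaving msg NULL for the reply path to dereference. */
static void setErrorInfo(errorInfo *err_info, const char *extracted) {
    err_info->msg = extracted ? extracted : "unknown scripting error";
}

int main(void) {
    errorInfo err;
    setErrorInfo(&err, NULL); /* extraction failed, e.g. EVAL error{} 0 */
    printf("%s\n", err.msg);  /* safe: never NULL */
    return 0;
}
```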
---------
Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Warning:
```
postnotifications.c:216:77: warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
RedisModule_Log(ctx, "warning", "Got an unexpected subevent '%ld'", subevent);
~~~ ^~~~~~~~
%llu
```
CI:
https://github.com/redis/redis/actions/runs/6937308713/job/18871124342#step:6:115
Add `CFLAGS=-Werror` flag for module CI.
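The module code was changed to `%llu`; for reference, the fully portable way to print a `uint64_t` in C is the `PRIu64` macro from `<inttypes.h>`:
```c
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint64_t subevent = 42;
    /* PRIu64 expands to the correct length modifier on every platform,
     * so this never trips -Wformat regardless of how uint64_t is defined. */
    printf("Got an unexpected subevent '%" PRIu64 "'\n", subevent);
    return 0;
}
```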
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
When modifying port or tls-port through config set, we need to call
clusterUpdateMyselfAnnouncedPorts to update myself's port; otherwise
CLUSTER SLOTS/NODES will return stale information from myself's
perspective.
In addition, in some places, such as clusterUpdateMyselfAnnouncedPorts
and clusterUpdateMyselfIp, a beforeSleep save is added so that the
new IP info can be persisted to nodes.conf.
Remove clearCachedClusterSlotsResponse from updateClusterAnnouncedPort,
since we now add a beforeSleep save in clusterUpdateMyselfAnnouncedPorts,
and it will call clearCachedClusterSlotsResponse.
Signed-off-by: Binbin <binloveplay1314@qq.com>
When a client is blocked by something like `CLIENT PAUSE`, we should not
allow `CLIENT UNBLOCK timeout` to unblock it: some blocking types do not
have a timeout callback, so unblocking would trigger a panic in the core.
People should use `CLIENT UNPAUSE` to unblock it instead.
Using `CLIENT UNBLOCK error` is not right either: it returns an UNBLOCKED
error to the command, and people don't expect a `SET` command to get an
error.
So in this commit, in these cases, we return 0 to `CLIENT UNBLOCK`
to indicate the unblock failed. The reasoning is that if a command
doesn't expect to be timed out, it also doesn't expect to be unblocked
by `CLIENT UNBLOCK`.
With the old behavior, the following commands would trigger a panic for
`timeout` and return an UNBLOCKED error for `error`. Under the new
behavior, `CLIENT UNBLOCK` returns 0.
```
client 1> client pause 100000 write
client 2> set x x
client 1> client unblock 2 timeout
or
client 1> client unblock 2 error
```
This is a potentially breaking change: `CLIENT UNBLOCK error` was
previously allowed in these cases.
Fixes #2111.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Correctly use a 32-bit integer for accumulating the length of ping
extensions.
The current code may accidentally truncate the length of an extension
that is greater than 64KB and fail the validation check. We don't
currently emit any extensions of this length, but if we were to do so in
the future, older nodes (without this fix) would silently drop packets
from newer nodes. We should backport this to all versions.
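A self-contained illustration of the truncation; the cluster bus code is more involved, this only shows the integer-width effect:
```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t ext_len = 70 * 1024;       /* an extension larger than 64KB */
    uint16_t acc16 = (uint16_t)ext_len; /* old, too-narrow accumulator */
    uint32_t acc32 = ext_len;           /* fixed 32-bit accumulator */
    /* 71680 mod 65536 = 6144: the wrapped total no longer matches the
     * real packet length, so validation misjudges the packet. */
    printf("16-bit total: %u (wrapped)\n", (unsigned)acc16);
    printf("32-bit total: %u (correct)\n", (unsigned)acc32);
    return 0;
}
```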
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Resolves https://github.com/valkey-io/valkey/issues/2145
Incorporate the CVE patch that was sent to us by Redis Ltd.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
When creating an outgoing TLS connection, we don't check if `SSL_new()`
returned NULL.
Without this patch, the check was done only for incoming connections in
`connCreateAcceptedTLS()`. This patch moves the check to
`createTLSConnection()` which is used both for incoming and outgoing
connections.
This check makes sure we fail the connection before going any further,
e.g. when `connCreate()` is followed by `connConnect()`, the latter
returns `C_ERR`, which is commonly detected where outgoing connections
are established, such as where a replica connects to a primary.
```c
int connectWithPrimary(void) {
    server.repl_transfer_s = connCreate(connTypeOfReplication());
    if (connConnect(server.repl_transfer_s, server.primary_host, server.primary_port, server.bind_source_addr,
                    server.repl_mptcp, syncWithPrimary) == C_ERR) {
        serverLog(LL_WARNING, "Unable to connect to PRIMARY: %s", connGetLastError(server.repl_transfer_s));
        connClose(server.repl_transfer_s);
        server.repl_transfer_s = NULL;
        return C_ERR;
    }
```
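For the callee side, here is a hedged sketch assuming a simplified connection struct; the real check lands in `createTLSConnection()`:
```c
#include <openssl/ssl.h>
#include <stddef.h>

/* Simplified stand-in for the TLS connection object. */
typedef struct { SSL *ssl; int state_error; } tls_connection;

/* Create the SSL object for both incoming and outgoing connections in
 * one place, and fail early if SSL_new() returns NULL, so that a later
 * connConnect() observes the error and returns C_ERR instead of the
 * server crashing on a NULL SSL pointer. */
static int createTLSConnectionSketch(tls_connection *conn, SSL_CTX *ctx) {
    conn->ssl = SSL_new(ctx);
    if (conn->ssl == NULL) {
        conn->state_error = 1; /* mark the connection as errored */
        return -1;
    }
    return 0;
}
```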
For a more thorough explanation, see
https://github.com/valkey-io/valkey/issues/1939#issuecomment-2912177877.
Might fix #1939.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Example fail:
```
In function ‘LFUDecrAndReturn’,
inlined from ‘evictionPoolPopulate’ at evict.c:181:24:
evict.c:321:30: error: ‘o’ may be used uninitialized [-Werror=maybe-uninitialized]
321 | unsigned long counter = o->lru & 255;
| ~^~~~~
evict.c: In function ‘evictionPoolPopulate’:
evict.c:154:15: note: ‘o’ was declared here
154 | robj *o;
| ^
cc1: all warnings being treated as errors
```
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
At some point we added an extra concurrency group in the 7.2 branch,
which prevented the full test run from running.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
When the primary changes the config epoch and then goes down
immediately, the replica may not update the config epoch in time.
Although we will broadcast the change in the cluster (see #1813), there
may be a race in the network or in the code. In this case, the replica
will never finish the failover, since other primaries will refuse to
vote because the replica's slot config epoch is old.
We need a way to allow the replica to finish the failover in this case.
When the primary refuses to vote because the replica's config epoch is
less than the dead primary's config epoch, it can send an UPDATE packet
to the replica to inform the replica about the dead primary. The UPDATE
message contains information about the dead primary's config epoch and
owned slots. The failover will time out, but later the replica can try
again with the updated config epoch and it can succeed.
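A self-contained sketch of the flow, with stand-in types; the real logic lives in the cluster failover-auth handler, and names here are illustrative:
```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a cluster node; only the config epoch matters here. */
typedef struct { uint64_t configEpoch; } Node;

static void sendUpdatePacket(const Node *about) {
    /* The real UPDATE packet carries the dead primary's config epoch and
     * its owned slots, letting the replica refresh its stale view. */
    printf("UPDATE: epoch %llu + owned slots\n",
           (unsigned long long)about->configEpoch);
}

/* Refuse stale vote requests, but also tell the replica why, so its next
 * failover attempt uses the up-to-date config epoch and can succeed. */
static void handleFailoverAuthRequest(const Node *deadPrimary,
                                      uint64_t claimedEpoch) {
    if (claimedEpoch < deadPrimary->configEpoch) {
        sendUpdatePacket(deadPrimary);
        return; /* vote refused this round */
    }
    printf("vote granted\n");
}

int main(void) {
    Node dead = { .configEpoch = 7 };
    handleFailoverAuthRequest(&dead, 5); /* stale epoch: UPDATE, no vote */
    handleFailoverAuthRequest(&dead, 7); /* after refresh: vote granted */
    return 0;
}
```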
Fixes #2169.
---------
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
NOTE - this is a backport of
https://github.com/valkey-io/valkey/pull/2109
When we refactored the blocking framework we introduced the client
reprocessing infrastructure. In cases where the client was blocked on keys, it
will attempt to reprocess the command. One challenge was to keep track
of the command timeout, since we are reprocessing and do not want to
re-register the client with a fresh timeout each time. The solution was
to consider the client reprocessing flag when the client is
blockedOnKeys:
```
if (!c->flag.reprocessing_command) {
    /* If the client is re-processing the command, we do not set the timeout
     * because we need to retain the client's original timeout. */
    c->bstate->timeout = timeout;
}
```
However, this introduced a new issue: there are cases where the client
undergoes consecutive blocking of different types, for example:
```
CLIENT PAUSE 10000 ALL
BZPOPMAX zset 1
```
would leave the client blocked on the zset endlessly if nothing is
written to it.
**Credits to @uriyage for locating this with his fuzzer testing**
The suggested solution is to only flag the client when it is
specifically unblocked on keys.
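A minimal sketch of the suggested fix, with stand-in types; the actual flag and block types are in Valkey's blocking framework:
```c
#include <stdbool.h>

/* Stand-ins for Valkey's block types and client state. */
typedef enum { BLOCKED_KEYS, BLOCKED_PAUSE, BLOCKED_OTHER } blockType;
typedef struct { bool reprocessing_command; blockType btype; } clientState;

/* Only key-based unblocks lead to command reprocessing, so only they
 * should suppress re-registering the timeout. A later block of a
 * different type (e.g. after CLIENT PAUSE) then gets a fresh timeout. */
static void markUnblocked(clientState *c) {
    c->reprocessing_command = (c->btype == BLOCKED_KEYS);
}
```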
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
This commit introduces a mechanism to track client authentication state
with a new `ever_authenticated` flag.
It refactors client authentication handling by adding a `clientSetUser`
function that properly sets both the authenticated and
`ever_authenticated` flags.
The implementation limits output buffer size for clients that have never
been authenticated.
Added tests to verify the output buffer limiting behavior for
unauthenticated clients.
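A hedged sketch of the two flags with simplified types; the real `clientSetUser` also handles ACL bookkeeping:
```c
#include <stdbool.h>

typedef struct user user; /* opaque stand-in for the ACL user */
typedef struct {
    user *u;
    bool authenticated;
    bool ever_authenticated;
} clientAuthState;

/* Set the user and both flags in one place: `authenticated` tracks the
 * current state, while `ever_authenticated` latches once and never
 * resets; it gates the relaxed output buffer limit. */
static void clientSetUser(clientAuthState *c, user *u, bool authenticated) {
    c->u = u;
    c->authenticated = authenticated;
    if (authenticated) c->ever_authenticated = true;
}
```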
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Fix two problems in the Fedora CI jobs:
1. Install awk where missing. It's required for building jemalloc.
2. Fix problems with TCL, required for running the tests. Fedora comes
with TCL 9 by default, but the TLS package 'tcltls' isn't built for TCL 9.
Install 'tcl8' and build 'tcltls' from source.
---------
Signed-off-by: vitah <vitahlin@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
This started failing after
ee9a305b5b,
since processing of SHUTDOWN might happen after the CLIENT INFO is
retrieved. Now we use wait_for_condition to make sure CLIENT INFO is
retried in the case that SHUTDOWN is not processed yet.
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
When a tombstone is created after the last_id of a consumer group, the
consumer group lag is now reported in the reply of `XINFO GROUPS mystream`.
Closes #1951.
Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
When the previous timeout isn't zeroed out, SHUTDOWN will use the
previously executed blocking command's timeout. Since SHUTDOWN is not
expected to have a timeout, this causes a serverPanic.
Without this fix, this scenario leads to a server panic:
1. Call any command that uses blockClient with a timeout (this could be
BLPOP or something that blocks implicitly like CLUSTER SETSLOT)
2. On the same client, call SHUTDOWN with some pending replication data,
which will trigger shutdown to be blocked while we try to catch the
replica up
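A minimal sketch of the fix's shape, with a stand-in type; the real blocking state is `c->bstate` in Valkey:
```c
/* Stand-in for the client's blocking state. */
typedef struct { long long timeout; } blockingState;

/* SHUTDOWN blocks without a timeout, so clear any timeout left over from
 * an earlier blocking command (BLPOP, CLUSTER SETSLOT, ...). Otherwise
 * the timeout path fires for a block type that has no timeout callback
 * and the server panics. */
static void blockClientForShutdown(blockingState *bstate) {
    bstate->timeout = 0;
    /* ... block until replicas catch up, then proceed with shutdown ... */
}
```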
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Use sampling when calculating the memory usage of consumers as part of
streams in MEMORY USAGE and MEMORY STATS.
Without this fix, all consumers are traversed, which is slow if there
are very many consumers.
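A self-contained sketch of the sampling estimate; the real code plugs into MEMORY USAGE's existing SAMPLES machinery:
```c
#include <stddef.h>

/* Sum the first `samples` consumers and extrapolate the average to all
 * of them, instead of traversing every consumer. */
static size_t estimateConsumersMemory(const size_t *consumer_sizes,
                                      size_t num_consumers, size_t samples) {
    if (num_consumers == 0) return 0;
    if (samples == 0 || samples > num_consumers) samples = num_consumers;
    size_t sampled = 0;
    for (size_t i = 0; i < samples; i++) sampled += consumer_sizes[i];
    return (sampled / samples) * num_consumers;
}
```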
Fixes #1824
Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
When `client pause write` is set and all the keys in the server are
expired, executing the `randomkey` command leads to an infinite loop.
The reason is that expired keys are not deleted in this case. Limit
the number of tries and return an expired key after the maximum number
of tries.
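A hedged sketch of the bounded loop; the names and the tries constant are illustrative, not the exact randomkeyCommand code:
```c
#include <stdbool.h>
#include <stddef.h>

#define RANDOMKEY_MAX_TRIES 100 /* illustrative bound */

typedef struct { const char *name; bool expired; } key;

/* Pick a random key, skipping expired ones; but when writes are paused,
 * expired keys cannot be deleted, so after enough tries return one
 * anyway rather than looping forever. */
static const key *randomKeySketch(const key *(*pick)(void)) {
    const key *k = NULL;
    for (int tries = 0; tries < RANDOMKEY_MAX_TRIES; tries++) {
        k = pick();
        if (k == NULL || !k->expired) return k;
    }
    return k; /* all samples expired: return the last one */
}
```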
Closes #1848.
---------
Signed-off-by: li-benson <1260437731@qq.com>
Signed-off-by: youngmore <youngmore1024@outlook.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: youngmore <youngmore1024@outlook.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Issue described in #1135.
When a call to `connTLSWrite` fails, the next attempt via `connTLSWritev`
should provide a buffer with at least the same number of bytes. But if
the first call used a buffer smaller than `NET_MAX_WRITES_PER_EVENT` and
the buffer then grows to exceed `NET_MAX_WRITES_PER_EVENT`,
`connTLSWritev` will not combine the `iov` into one buffer. There is then
a chance that the first element of the `iov` is smaller than the last
failed call to `connTLSWrite`, causing an `SSL routines::bad length`
error.
This change forces combining the `iov`s into one buffer if the first
element of the `iov` is smaller than the last failed write by
`connTLSWrite`.
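A hedged sketch of the flattening step; the real `connTLSWritev` also tracks the last failed write size, this only shows combining the `iov` into one buffer:
```c
#include <string.h>
#include <sys/uio.h>

/* OpenSSL requires a retried SSL_write() to present at least as many
 * bytes (and the same leading data) as the failed attempt. Flattening
 * the whole iov into one contiguous buffer guarantees that, even when
 * the first iov element alone is smaller than the last failed write. */
static size_t combineIov(const struct iovec *iov, int iovcnt,
                         char *buf, size_t bufcap) {
    size_t off = 0;
    for (int i = 0; i < iovcnt && off < bufcap; i++) {
        size_t n = iov[i].iov_len;
        if (n > bufcap - off) n = bufcap - off; /* clamp to the buffer */
        memcpy(buf + off, iov[i].iov_base, n);
        off += n;
    }
    return off; /* number of bytes staged for the retry */
}
```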
---------
Signed-off-by: Marek Zoremba <marek@janeasystems.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: ranshid <ranshid@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: ranshid <ranshid@amazon.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Fix a raxRemove crash at memcpy() (line 1181) caused by a key size
exceeding `RAX_NODE_MAX_SIZE`. Note that this could happen when the key
size is more than 512MB, if we allow that by increasing the default
`proto-max-bulk-len`. The crash could happen when we recompress the rax
after removing a key due to expiry or DEL, while memcpy() merges keys
that exceed the 512MB limit. While the counting phase has the size check,
the actual compress logic is missing it, which leads to this crash.
---------
Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
When latency-monitor-threshold is set to 0, it means the latency monitor
is disabled, but in VM_LatencyAddSample we wrote the condition
incorrectly, causing us to record latency while the monitor was turned
off.
This bug was introduced on the very first day; see e3b1d6d, merged
in 2019.
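The corrected condition, reduced to a self-contained predicate; the real check sits in VM_LatencyAddSample:
```c
/* A threshold of 0 disables the latency monitor, so record a sample only
 * when the threshold is non-zero and the sample reaches it. The old code
 * had this inverted and recorded samples while monitoring was off. */
static int shouldAddLatencySample(long long threshold_ms,
                                  long long latency_ms) {
    return threshold_ms != 0 && latency_ms >= threshold_ms;
}
```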
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
A few CI improvements which will reduce CI queue occupation and
eliminate stale runs.
1. Kill CI jobs on PRs once the PR branch gets a new push. This will
prevent the situation that happened today: a huge job was triggered
twice in less than an hour and occupied the entire **org** runner queue
(for all repositories) for the rest of the day (see pic). This
completely blocked the valkey-glide team.
2. Distribute nightly cron jobs in time to prevent them from running
together. Keep in mind that cron's TZ is UTC, so midnight tasks affect
developers located in other timezones.
This must be backported to all release branches (`valkey-x.y` and `x.y`)

---------
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
When processing a cluster bus PING extension, there is a memory leak
when adding a new key to the `nodes_black_list` dict. We now make sure
to free the key `sds` if the dict did not take ownership of it.
Backport of #1574 from unstable.
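A self-contained analogue of the pattern; Valkey's real code uses `dictAdd()` and `sdsfree()`, the table here is a stand-in:
```c
#include <stdlib.h>
#include <string.h>

/* Tiny stand-in for the dict: refuses duplicate keys and takes ownership
 * of the key pointer only on success. */
typedef struct { char *keys[16]; int n; } table;

static int tableAdd(table *t, char *key) { /* 0 = ok, -1 = duplicate */
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->keys[i], key) == 0) return -1;
    t->keys[t->n++] = key; /* table owns the key from here on */
    return 0;
}

static void blacklistNode(table *black_list, const char *node_id) {
    char *key = strdup(node_id); /* like sdsnew() */
    if (tableAdd(black_list, key) != 0)
        free(key); /* the fix: insert refused, we still own the key */
}
```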
Signed-off-by: Pierre Turin <pieturin@amazon.com>
This trigger-build-release.yml file will be used to automate the release
process as described in #1397. It will trigger the post-release task
after a new release.
---------
Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
Signed-off-by: Nikhil Manglore <nikhilmanglore9@gmail.com>
Co-authored-by: Roshan Khatri <117414976+roshkhatri@users.noreply.github.com>
There is a crash report in #1872:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
69308:M 22 Mar 2025 08:35:10.242 # valkey 7.2.8 crashed by signal: 11, si_code: 1
69308:M 22 Mar 2025 08:35:10.245 # Accessing address: 0x50
69308:M 22 Mar 2025 08:35:10.246 # Crashed running the instruction at: 0x109e82510
------ STACK TRACE ------
EIP:
0 valkey-server 0x0000000109e82510 dbDictAfterReplaceEntry + 576
Backtrace:
0 libsystem_platform.dylib 0x00007ff801d12e1d _sigtramp + 29
1 ??? 0x0000000000000000 0x0 + 0
2 valkey-server 0x0000000109e81782 dictDefragBucket + 354
3 valkey-server 0x0000000109e813ec dictScanDefrag + 732
4 valkey-server 0x0000000109faa520 activeDefragCycle + 592
5 valkey-server 0x0000000109e86b8e serverCron + 4974
6 valkey-server 0x0000000109e7e466 aeProcessEvents + 1094
7 valkey-server 0x0000000109ea29ad main + 24781
8 dyld 0x00007ff80194d2cd start + 1805
```
The reason is that when doing FLUSHDB async, in emptyDbAsync, after we
create the new dict, we did not call slotToKeyInit to init the
clusterDictMetadata. Then, in slotToKeyReplaceEntry, we get a wrong
pointer and crash.
This issue only occurs in 7.2, since the code structure changed in the
unstable branch. Fixes #1872.
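A hedged illustration of the bug class, with stand-in types; the real structures are Valkey's dict and `clusterDictMetadata`:
```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins: a table whose metadata must be initialized right after
 * creation, or later callbacks dereference garbage. */
typedef struct { void *slot_to_key; } clusterDictMetadata;
typedef struct { clusterDictMetadata meta; /* ... entries ... */ } dictSketch;

static dictSketch *dictCreateSketch(void) {
    /* malloc() leaves meta.slot_to_key uninitialized. */
    return malloc(sizeof(dictSketch));
}

static void slotToKeyInitSketch(dictSketch *d) {
    memset(&d->meta, 0, sizeof(d->meta)); /* set up slot-to-key state */
}

static void emptyDbAsyncSketch(dictSketch **dbdict) {
    dictSketch *newd = dictCreateSketch();
    slotToKeyInitSketch(newd); /* the missing call: without it, a later
                                * slotToKeyReplaceEntry() reads a wild
                                * pointer and crashes */
    *dbdict = newd;
    /* the old dict is freed lazily in a background thread ... */
}
```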
Signed-off-by: Binbin <binloveplay1314@qq.com>
Backport - Fix Valkey binary build workflow, version support changes.
(#1429)
This change makes the binary build run on the target Ubuntu version.
This PR also deprecates Ubuntu 18, which Valkey will no longer support.
- X86:
- Ubuntu 20
- Ubuntu 22
- Ubuntu 24
- ARM:
- Ubuntu 20
- Ubuntu 22
Removed ARM ubuntu 24 as the action we are using for ARM builds does not
support Ubuntu 24.
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
- Moves `build-config.json` to workflow dir to build old versions with
new configs.
- Enables contributors to test the release workflow on a private repo by
adding `github.event_name == 'workflow_dispatch' ||`
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>