Currently, when parsing querybuf, we do not check for CRLF;
instead, we assume by default that the last two characters are CRLF,
as shown in the following example:
```
telnet 127.0.0.1 6379
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
*3
$3
set
$3
key
$5
value12
+OK
get key
$5
value
*3
$3
set
$3
key
$5
value12345
+OK
-ERR unknown command '345', with args beginning with:
```
This should actually be considered a protocol error. When a bug
occurs in a client-side implementation, we may execute incorrect
requests (writing incorrect data being the most serious of these).
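As a minimal sketch (not the actual Valkey code), the parser can verify the terminator instead of assuming it:
```c
/* Validate that a bulk argument of length `bulklen` at `p` is terminated
 * by CRLF before accepting it. `remaining` is the number of unparsed
 * bytes available at `p`. Returns 1 if valid, 0 otherwise. */
static int bulkHasValidCRLF(const char *p, long bulklen, long remaining) {
    if (remaining < bulklen + 2) return 0; /* need more data first */
    return p[bulklen] == '\r' && p[bulklen + 1] == '\n';
}
```
If the check fails, the connection should be rejected with a protocol error rather than silently consuming the malformed input.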
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
The CLUSTER SLOTS reply depends on whether the client is connected over
IPv6, but a fake client has no connection. When this command is called
from a module timer callback, or in another scenario where no real
client is involved, there is no connection on which to check IPv6
support. This fix handles the missing case by returning the reply for
IPv4-connected clients.
Fixes #2912.
---------
Signed-off-by: Su Ko <rhtn1128@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Su Ko <rhtn1128@gmail.com>
Co-authored-by: KarthikSubbarao <karthikrs2021@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
In #2078, we did not report large replies when copy avoidance is
allowed. This results in replies larger than 16384 bytes not being
recorded in the commandlog large-reply. This 16384 threshold is
controlled by the hidden config min-string-size-avoid-copy-reply.
Signed-off-by: Binbin <binloveplay1314@qq.com>
We have the same settings for the hard limit, and we should apply them to the soft
limit as well. When the `repl-backlog-size` value is larger, all replication buffers
can be handled by the replication backlog, so there's no need to worry about the
client output buffer soft limit here. Furthermore, when `soft_seconds` is 0, the
soft limit behaves (mostly) the same as the hard limit.
Signed-off-by: Binbin <binloveplay1314@qq.com>
The Fedora Rawhide CI reports these warnings:
```
networking.c: In function 'afterErrorReply':
networking.c:821:30: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
821 | char *spaceloc = memchr(s, ' ', len < 32 ? len : 32);
```
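The fix, sketched below, stores the result in a const-qualified pointer so it matches the const buffer being searched (assuming that is all the warning requires):
```c
/* memchr() is applied to the const buffer `s`, so the result must not
 * drop the const qualifier. */
const char *spaceloc = memchr(s, ' ', len < 32 ? len : 32);
```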
Signed-off-by: Binbin <binloveplay1314@qq.com>
clientReplyBlock stores the size of the actual allocation in its `size`
field (minus the header size). This can be used for more effective
deallocation with zfree_with_size.
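A sketch of how such a free might look, assuming the block's `size` field holds the usable size excluding the header:
```c
/* Free a clientReplyBlock using the stored size, sparing zfree() an
 * allocation-size lookup. */
void freeClientReplyValue(void *o) {
    clientReplyBlock *b = o;
    zfree_with_size(b, b->size + sizeof(clientReplyBlock));
}
```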
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
The race condition causes the client to be used and subsequently double
freed by the slot migration read pipe handler. The order of events is:
1. We kill the slot migration child process during CANCELSLOTMIGRATIONS
1. We then free the associated client to the target node
1. Although we kill the child process, it is not guaranteed that the
pipe will be empty from child to parent
1. If the pipe is not empty, we later will read that out in the
slotMigrationPipeReadHandler
1. In the pipe read handler, we attempt to write to the connection. If
writing to the connection fails, we attempt to free the client
1. However, the client was already freed, so this is a double free
Notably, the slot migration being aborted doesn't need to be triggered
by `CANCELSLOTMIGRATIONS`, it can be any failure.
To solve this, we simply:
1. Set the slot migration pipe connection to NULL whenever it is
unlinked
2. Bail out early in slot migration pipe read handler if the connection
is NULL
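A sketch of the guard, using illustrative names rather than the actual Valkey symbols:
```c
/* Bail out if the connection was already unlinked; the client it
 * belonged to may have been freed. */
void slotMigrationPipeReadHandler(struct aeEventLoop *el, int fd,
                                  void *privdata, int mask) {
    slotMigrationJob *job = privdata; /* hypothetical job struct */
    if (job->conn == NULL) return; /* client freed; nothing to write to */
    /* ... read from the child pipe and forward to job->conn ... */
}
```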
I also consolidate the killSlotMigrationChild call into one code path,
which is executed on client unlink. Before, there were two code paths
that would do this twice (once on slot migration job finish, and once on
client unlink). Sending the signal twice is fine, but inefficient.
Also, add a test to cancel during the slot migration snapshot to make
sure this case is covered (we only caught it during the module test).
---------
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
With #1401, we introduced additional filters to the CLIENT LIST/KILL
subcommands. The intended behavior was to pick the last value of each
filter. However, we introduced a memory leak for all the preceding
filter values.
Before this change:
```
> CLIENT LIST IP 127.0.0.1 IP 127.0.0.1
id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0
```
Leak:
```
Direct leak of 11 byte(s) in 1 object(s) allocated from:
#0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d)
#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156
#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200
#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113
#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264
#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600
#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772
#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434
#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571
#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702
#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812
#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79
#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301
#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486
#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543
#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319
#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139)
```
Note: For the ID / NOT-ID filters we group all the options and perform
filtering, whereas for the remaining filters we only pick the last
filter option.
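A sketch of the fix, with a hypothetical `filter` struct: release the previous value before overwriting it with the last occurrence.
```c
/* Free any value stored by an earlier occurrence of the same filter
 * before keeping the last one, so earlier sds strings don't leak.
 * `addrArg` is an illustrative name for the current argument. */
if (filter->addr) sdsfree(filter->addr);
filter->addr = sdsnew(addrArg);
```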
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
New client flags reported by CLIENT INFO and CLIENT LIST:
* `i` for atomic slot migration importing client
* `E` for atomic slot migration exporting client
New flags in return value of `ValkeyModule_GetContextFlags`:
* `VALKEYMODULE_CTX_FLAGS_SLOT_IMPORT_CLIENT`: Indicates that the client
attached to this context is the slot import client.
* `VALKEYMODULE_CTX_FLAGS_SLOT_EXPORT_CLIENT`: Indicates that the client
attached to this context is the slot export client.
Users could use this to monitor the underlying client info of the slot
migration, and more clearly understand why they see extra clients during
the migration.
Modules can use these to detect keyspace notifications on import
clients. I am also adding export flags for symmetry, although there
should not be keyspace notifications on the export client. They would,
however, potentially be visible in command filters or in server events
triggered by that client.
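As an illustrative sketch (the callback shape is an assumption; the flag names come from this change), a module could skip events generated by the slot import client:
```c
int onKeyEvent(ValkeyModuleCtx *ctx, int type, const char *event,
               ValkeyModuleString *key) {
    int flags = ValkeyModule_GetContextFlags(ctx);
    /* Mutations replayed by the slot migration import client can be
     * told apart from regular client traffic. */
    if (flags & VALKEYMODULE_CTX_FLAGS_SLOT_IMPORT_CLIENT)
        return VALKEYMODULE_OK;
    /* ... handle ordinary keyspace notifications ... */
    return VALKEYMODULE_OK;
}
```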
---------
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Set a free method for the deferred_reply list to properly clean up
ClientReplyValue objects when the list is destroyed.
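A minimal sketch of the change, assuming the adlist API and a destructor for ClientReplyValue objects:
```c
/* Ensure list nodes' values are released when the list is destroyed. */
c->deferred_reply = listCreate();
listSetFreeMethod(c->deferred_reply, freeClientReplyValue);
```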
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Try to fix the failures seen for `test "PSYNC2 #3899 regression: verify
consistency"`.
This change resets the query buffer parser state in
`replicationCachePrimary()` which is called when the connection to the
primary is lost. Before #2092, this was done by `resetClient()`.
The solution was inspired by the discussion about the regression
mentioned (discussion from 2017) and the related commits from that time:
6bc6bd4c38,
469d6e2b37,
c180bc7d98.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This gets rid of the need to use a void* as a carrier for the worker
number. Instead a pointer to the relevant worker data is passed to the
started thread.
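A sketch of the pattern, with illustrative names:
```c
#include <pthread.h>

typedef struct worker {
    pthread_t tid;
    int id;
    /* ... per-worker state ... */
} worker;

static void *workerMain(void *arg) {
    worker *w = arg; /* no integer-to-void* smuggling needed */
    /* ... use w->id and the rest of the worker's data ... */
    (void)w;
    return NULL;
}

static void spawnWorker(worker *w) {
    /* Pass a pointer to the worker's own data to the started thread. */
    pthread_create(&w->tid, NULL, workerMain, w);
}
```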
Fixes #2529
---------
Signed-off-by: Ted Lyngmo <ted@lyncon.se>
Instead of parsing only one command per client before executing it,
parse multiple commands from the query buffer and batch-prefetch the
keys accessed by the commands in the queue before executing them.
This is an optimization for pipelined commands, both with and without
I/O threads. The optimization is currently disabled for the replication
stream, due to failures (probably caused by how the replication offset
is calculated based on the query buffer offset).
* When parsing commands from the input buffer, multiple commands are
parsed and stored in a command queue per client.
* In single-threaded mode (I/O threads off) keys are batch-prefetched
before the commands in the queue are executed. Multi-key commands like
MGET, MSET and DEL benefit from this even if pipelining is not used.
* Prefetching when I/O threads are used does prefetching for multiple
clients in parallel. This code takes client command queues into account,
improving prefetching when pipelining is used. The batch size is
controlled by the existing config `prefetch-batch-max-size` (default
16), which so far only was used together with I/O threads. The config is
moved to a different section in `valkey.conf`.
* When I/O threads are used and the maximum number of keys are
prefetched, a client's command is executed, then the next one in the
queue, etc. If there are more commands in the queue for which the keys
have not been prefetched (say the client sends 16 pipelined MGET with 16
keys in each), keys for the next few commands in the queue are prefetched
before those commands are executed, if prefetching has not been done for
the next command. (This utilizes the code path used in single-threaded
mode.)
Code improvements:
* Decoupling of command parser state and command execution state:
* The variables reqtype, multibulklen and bulklen refer to the current
position in the query buffer. These are no longer reset in resetClient
(which runs after each command being executed). Instead, they are
reset in the parser code after each completely parsed command.
* The command parser code is partially decoupled from the client struct.
The query buffer is still one per client, but the resulting argument
vector is stored in caller-defined variables.
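An illustrative sketch of the decoupling (struct names are hypothetical; the field names come from the description above):
```c
/* Parser position state lives with the parser and is reset after each
 * fully parsed command, rather than in resetClient(). */
typedef struct commandParser {
    int reqtype;       /* inline or multibulk */
    long multibulklen; /* remaining bulk args of the current command */
    long long bulklen; /* length of the bulk currently being read */
} commandParser;

/* Each completely parsed command is appended to a per-client queue. */
typedef struct queuedCommand {
    robj **argv;
    int argc;
} queuedCommand;
```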
Fixes #2044
---------
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
aefed3d363/src/networking.c (L2279-L2293)
From the above code, we can see that `c->repl_data->ref_block_pos` could
be equal to `o->used`.
When `o->used == o->size`, we may call SSL_write() with num=0, which does
not comply with the OpenSSL specification.
(ref: https://docs.openssl.org/master/man3/SSL_write/#warnings)
What's worse is that it's still the case after the reconnection. See
aefed3d363/src/replication.c (L756-L769).
So in this case the replica will keep reconnecting again and again until
it no longer meets the requirements for partial synchronization.
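A sketch of the guard (illustrative; `o->buf` and the surrounding function context are assumptions):
```c
/* Never hand a zero-length write to the TLS layer: SSL_write() with
 * num=0 is explicitly warned against in the OpenSSL documentation. */
size_t len = o->used - c->repl_data->ref_block_pos;
if (len == 0) return C_OK; /* nothing pending in this block */
ssize_t nwritten = connWrite(c->conn,
                             o->buf + c->repl_data->ref_block_pos, len);
```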
Resolves #2119
---------
Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
Introduces a new family of commands for migrating slots via replication.
The procedure is driven by the source node which pushes an AOF formatted
snapshot of the slots to the target, followed by a replication stream of
changes on that slot (a la manual failover).
This solution is an adaptation of the solution provided by
@enjoy-binbin, combined with the solution I previously posted at #1591,
modified to meet the designs we had outlined in #23.
## New commands
* `CLUSTER MIGRATESLOTS SLOTSRANGE start end [start end]... NODE
node-id`: Begin sending the slot via replication to the target. Multiple
targets can be specified by repeating `SLOTSRANGE ... NODE ...`
* `CLUSTER CANCELMIGRATION ALL`: Cancel all slot migrations
* `CLUSTER GETSLOTMIGRATIONS`: See a recent log of migrations
This PR only implements "one shot" semantics with an asynchronous model.
Later, "two phase" (e.g. slot level replicate/failover commands) can be
added with the same core.
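For example (illustrative; `<target-node-id>` stands for the 40-character node ID):
```
127.0.0.1:6379> CLUSTER MIGRATESLOTS SLOTSRANGE 0 99 NODE <target-node-id>
OK
127.0.0.1:6379> CLUSTER GETSLOTMIGRATIONS
... recent migration log ...
127.0.0.1:6379> CLUSTER CANCELMIGRATION ALL
OK
```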
## Slot migration jobs
Introduces the concept of a slot migration job. While active, a job
tracks a connection created by the source to the target over which the
contents of the slots are sent. This connection is used for control
messages as well as replicated slot data. Each job is given a 40
character random name to help uniquely identify it.
All jobs, including those that finished recently, can be observed using
the `CLUSTER GETSLOTMIGRATIONS` command.
## Replication
* Since the snapshot uses AOF, the snapshot can be replayed verbatim to
any replicas of the target node.
* We use the same proxying mechanism used for chaining replication to
copy the content sent by the source node directly to the replica nodes.
## `CLUSTER SYNCSLOTS`
To coordinate the state machine transitions across the two nodes, a new
command is added, `CLUSTER SYNCSLOTS`, that performs this control flow.
Each end of the slot migration connection is expected to install a read
handler in order to handle `CLUSTER SYNCSLOTS` commands:
* `ESTABLISH`: Begins a slot migration. Provides slot migration
information to the target and authorizes the connection to write to
unowned slots.
* `SNAPSHOT-EOF`: appended to the end of the snapshot to signal that the
snapshot is done being written to the target.
* `PAUSE`: informs the source node to pause whenever it gets the
opportunity
* `PAUSED`: added to the end of the client output buffer when the pause
is performed. The pause is only performed after the buffer shrinks below
a configurable size
* `REQUEST-FAILOVER`: request the source to either grant or deny a
failover for the slot migration. The request is only granted if the
target is still paused. Once a failover is granted, the pause is
refreshed for a short duration
* `FAILOVER-GRANTED`: sent to the target to inform that REQUEST-FAILOVER
is granted
* `ACK`: heartbeat command used to ensure liveness
## Interactions with other commands
* FLUSHDB on the source node (which flushes the migrating slot) will
result in the source dropping the connection, which will flush the slot
on the target and reset the state machine back to the beginning. The
subsequent retry should very quickly succeed (it is now empty)
* FLUSHDB on the target will fail the slot migration. We can iterate
with better handling, but for now it is expected that the operator would
retry.
* Generally, FLUSHDB is expected to be executed cluster wide, so
preserving partially migrated slots doesn't make much sense
* SCAN and KEYS are filtered to avoid exposing importing slot data
## Error handling
* For any transient connection drops, the migration will be failed and
require the user to retry.
* If there is an OOM while reading from the import connection, we will
fail the import, which will drop the importing slot data
* If there is a client output buffer limit reached on the source node,
it will drop the connection, which will cause the migration to fail
* If at any point the export loses ownership or either node is failed
over, a callback will be triggered on both ends of the migration to fail
the import. The import will not reattempt with a new owner
* The two ends of the migration are routinely pinging each other with
SYNCSLOTS ACK messages. If at any point there is no interaction on the
connection for longer than `repl-timeout`, the connection will be
dropped, resulting in migration failure
* If a failover happens, we will drop keys in all unowned slots. The
migration does not persist through failovers and would need to be
retried on the new source/target.
## State machine
```
Target/Importing Node State Machine
─────────────────────────────────────────────────────────────
┌────────────────────┐
│SLOT_IMPORT_WAIT_ACK┼──────┐
└──────────┬─────────┘ │
ACK│ │
┌──────────────▼─────────────┐ │
│SLOT_IMPORT_RECEIVE_SNAPSHOT┼──┤
└──────────────┬─────────────┘ │
SNAPSHOT-EOF│ │
┌───────────────▼──────────────┐ │
│SLOT_IMPORT_WAITING_FOR_PAUSED┼─┤
└───────────────┬──────────────┘ │
PAUSED│ │
┌───────────────▼──────────────┐ │ Error Conditions:
│SLOT_IMPORT_FAILOVER_REQUESTED┼─┤ 1. OOM
└───────────────┬──────────────┘ │ 2. Slot Ownership Change
FAILOVER-GRANTED│ │ 3. Demotion to replica
┌──────────────▼─────────────┐ │ 4. FLUSHDB
│SLOT_IMPORT_FAILOVER_GRANTED┼──┤ 5. Connection Lost
└──────────────┬─────────────┘ │ 6. No ACK from source (timeout)
Takeover Performed│ │
┌──────────────▼───────────┐ │
│SLOT_MIGRATION_JOB_SUCCESS┼────┤
└──────────────────────────┘ │
│
┌─────────────────────────────────────▼─┐
│SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP│
└────────────────────┬──────────────────┘
Unowned Slots Cleaned Up│
┌─────────────▼───────────┐
│SLOT_MIGRATION_JOB_FAILED│
└─────────────────────────┘
Source/Exporting Node State Machine
─────────────────────────────────────────────────────────────
┌──────────────────────┐
│SLOT_EXPORT_CONNECTING├─────────┐
└───────────┬──────────┘ │
Connected│ │
┌─────────────▼────────────┐ │
│SLOT_EXPORT_AUTHENTICATING┼───────┤
└─────────────┬────────────┘ │
Authenticated│ │
┌─────────────▼────────────┐ │
│SLOT_EXPORT_SEND_ESTABLISH┼───────┤
└─────────────┬────────────┘ │
ESTABLISH command written│ │
┌─────────────────────▼─────────────┐ │
│SLOT_EXPORT_READ_ESTABLISH_RESPONSE┼──────┤
└─────────────────────┬─────────────┘ │
Full response read (+OK)│ │
┌────────────────▼──────────────┐ │ Error Conditions:
│SLOT_EXPORT_WAITING_TO_SNAPSHOT┼─────┤ 1. User sends CANCELMIGRATION
└────────────────┬──────────────┘ │ 2. Slot ownership change
No other child process│ │ 3. Demotion to replica
┌────────────▼───────────┐ │ 4. FLUSHDB
│SLOT_EXPORT_SNAPSHOTTING┼────────┤ 5. Connection Lost
└────────────┬───────────┘ │ 6. AUTH failed
Snapshot done│ │ 7. ERR from ESTABLISH command
┌───────────▼─────────┐ │ 8. Unpaused before failover completed
│SLOT_EXPORT_STREAMING┼──────────┤ 9. Snapshot failed (e.g. Child OOM)
└───────────┬─────────┘ │ 10. No ack from target (timeout)
PAUSE│ │ 11. Client output buffer overrun
┌──────────────▼─────────────┐ │
│SLOT_EXPORT_WAITING_TO_PAUSE┼──────┤
└──────────────┬─────────────┘ │
Buffer drained│ │
┌──────────────▼────────────┐ │
│SLOT_EXPORT_FAILOVER_PAUSED┼───────┤
└──────────────┬────────────┘ │
Failover request granted│ │
┌───────────────▼────────────┐ │
│SLOT_EXPORT_FAILOVER_GRANTED┼───────┤
└───────────────┬────────────┘ │
New topology received│ │
┌──────────────▼───────────┐ │
│SLOT_MIGRATION_JOB_SUCCESS│ │
└──────────────────────────┘ │
│
┌─────────────────────────┐ │
│SLOT_MIGRATION_JOB_FAILED│◄────────┤
└─────────────────────────┘ │
│
┌────────────────────────────┐ │
│SLOT_MIGRATION_JOB_CANCELLED│◄──────┘
└────────────────────────────┘
```
Co-authored-by: Binbin <binloveplay1314@qq.com>
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Introduce negative filters for the `CLIENT LIST` and `CLIENT KILL` commands:
1. `NOT-ID`: Excludes clients in the IDs set
2. `NOT-TYPE`: Excludes clients of the specified type
3. `NOT-ADDR`: Excludes clients of the specified address and port
4. `NOT-LADDR`: Excludes clients connected to the specified local
address and port
5. `NOT-USER`: Excludes clients of the specified user
6. `NOT-FLAGS`: Excludes clients with the specified flag string
7. `NOT-NAME`: Excludes clients with the specified name
8. `NOT-LIB-NAME`: Excludes clients using the specified library name
9. `NOT-LIB-VER`: Excludes clients with the specified library version
10. `NOT-DB`: Excludes clients with the specified database ID
11. `NOT-CAPA`: Excludes clients with the specified capabilities
12. `NOT-IP`: Excludes clients with the specified IP address
Closes #1936 and fixes the matching algorithm for flag 'N'.
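For example (illustrative):
```
CLIENT LIST NOT-TYPE replica NOT-ID 7 42
CLIENT KILL NOT-ADDR 127.0.0.1:50000 NOT-USER default
```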
---------
Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
This should be + instead of *, otherwise it does not make any sense;
we would otherwise have to account for 20 more bytes for each prefix rax
node in a 64-bit build.
Signed-off-by: Binbin <binloveplay1314@qq.com>
This PR implements support for automatic client authentication based on
a field in the client's TLS certificate.
API Changes:
* New configuration directive `tls-auth-clients-user`, values `CN` |
`off`, default `off`. CN means take username from the CommonName field
in the client's certificate.
* New INFO field `acl_access_denied_tls_cert` under the `Stats` section,
indicating the number of failed authentications using this feature, i.e.
client certificates for which no matching username was found.
* New reason "tcl-cert" in the ACL log, logged when a client
certificate's CommonName fails to match any existing username.
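An example configuration snippet enabling the feature:
```
# valkey.conf: derive the ACL username from the client certificate's
# CommonName field instead of requiring AUTH.
tls-auth-clients-user CN
```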
Closes #1866
---------
Signed-off-by: Omkar Mestry <om.m.mestry@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Omkar Mestry <omanges@google.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When displaying a client's addr / laddr using catClientInfoString, for
example, if the client is connected via a unix domain socket and the unix
socket path exceeds NET_ADDR_STR_LEN, the output will be truncated because
NET_ADDR_STR_LEN is not long enough.
Currently NET_ADDR_STR_LEN is 46+32, that is 78, while the maximum length
of a unix domain socket path is typically 108 bytes on Linux and 104 bytes
on macOS, both numbers including the null terminator. In this fix, we use
CONN_ADDR_STR_LEN instead, which is 128 and should be long enough for most
cases.
It affects the output of CLIENT INFO / CLIENT LIST commands, as well as the
printing of some logs.
Other cleanup:
1. In acceptCommonHandler, when calling connFormatAddr on a unix socket
client, the connFormatAddr function would always return /unixsocket, since
the string is hardcoded in anetFdToString. We changed it to use the same display.
2. We changed the return value of connSocketAddr from C_OK/C_ERR to
0 and -1, which is consistent with the return values of the other types.
One change worth mentioning is that connSocketAddr used to call
anetFdToString, which would return the hardcoded `/unixsocket` in this case.
Now we use server.unixsocket, and the format is `path:0`.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Renamed processMultiBulkBuffer to processMultibulkBuffer.
processMultibulkBuffer is the real function name, but it was typoed
as processMultiBulkBuffer; the B should be lowercase.
Signed-off-by: charsyam <charsyam@naver.com>
When a client is blocked by something like `CLIENT PAUSE`, we should not
allow `CLIENT UNBLOCK timeout` to unblock it: some blocking types do not
have a timeout callback, so this would trigger a panic in the core.
People should use `CLIENT UNPAUSE` to unblock it instead.
Using `CLIENT UNBLOCK error` is not right either; it would return an
UNBLOCKED error to the command, and people don't expect a `SET` command
to get an error.
So in this commit, in these cases, we return 0 to `CLIENT UNBLOCK`
to indicate that the unblock failed. The reasoning is that if a command
doesn't expect to be timed out, it also doesn't expect to be unblocked
by `CLIENT UNBLOCK`.
The old behavior of the following commands was to trigger a panic for
timeout and to return an UNBLOCKED error for error. Under the new
behavior, CLIENT UNBLOCK returns 0.
```
client 1> client pause 100000 write
client 2> set x x
client 1> client unblock 2 timeout
or
client 1> client unblock 2 error
```
This is a potentially breaking change; `CLIENT UNBLOCK error` was
previously allowed.
Fixes #2111.
Signed-off-by: Binbin <binloveplay1314@qq.com>
## Introduction
In a production environment, it's quite challenging to figure out why a
Valkey server is under high load. Right now, tools like INFO or slowlog
can offer some clues, but if the server can't respond, we might not get
any information at all.
Usually, we have to rely on tools like `strace` or `perf` to find the
root cause. If we set up trace points in advance during development, we
can quickly pinpoint performance issues.
This PR adds support for all latency sampling points, as well as
information reporting for command execution. It also supports
dynamically turning the information reporting on or off as required.
The trace feature is implemented based on LTTng, a capability also used
in projects like QEMU and Ceph.
## How to use
Building Valkey with LTTng support:
```
USE_LTTNG=yes make
```
Enable event reporting:
```
config set trace-events "sys server db cluster aof commands"
```
Events are classified as follows:
- sys (System-level operations)
- server (Server core logic)
- db (Database core operations)
- cluster (Cluster configuration operations)
- aof (AOF persistence operations)
- commands (Command execution information)
## How to trace
Enable lttng trace events dynamically:
```
~# lttng destroy valkey
~# lttng create valkey
~# lttng enable-event -u valkey:*
~# lttng track -u -p `pidof valkey-server`
~# lttng start
~# lttng stop
~# lttng view
```
Examples (one client runs 'SET', another runs 'keys'):
```
[15:30:19.334463706] (+0.000001243) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 0 }
[15:30:19.334465183] (+0.000001477) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 1 }
[15:30:19.334466516] (+0.000001333) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 0 }
[15:30:19.334467738] (+0.000001222) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 0 }
[15:30:19.334469105] (+0.000001367) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 1 }
[15:30:19.334470327] (+0.000001222) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 0 }
[15:30:19.369348485] (+0.034878158) libai valkey:command_call: { cpu_id = 15 }, { name = "keys", duration = 34874 }
[15:30:19.369698322] (+0.000349837) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 4 }
[15:30:19.369702327] (+0.000004005) libai valkey:command_call: { cpu_id = 15 }, { name = "set", duration = 2 }
```
Then we can use another script to analyze the top N slow commands and
other system-level events.
About performance overhead (valkey-benchmark -t get -n 1000000 --threads 4):
1> no lttng built in: 285632.69 requests per second
2> lttng built in, no trace: 285551.09 requests per second (almost 0
overhead)
3> lttng built in, trace commands: 266595.59 requests per second (about
~6.6% overhead)
Generally valkey-server does not run at full utilization, so the
overhead is acceptable.
## Problem analysis
Added prot and conn fields to the command trace events.
Run benchmark tool:
```
GET: rps=227428.0 (overall: 222756.2) avg_msec=0.114 (overall: 0.117)
GET: rps=225248.0 (overall: 223005.2) avg_msec=0.115 (overall: 0.117)
GET: rps=167474.1 (overall: 217942.2) avg_msec=0.193 (overall: 0.122) --> performance drop
GET: rps=220192.0 (overall: 218129.5) avg_msec=0.118 (overall: 0.122)
GET: rps=222868.0 (overall: 218493.7) avg_msec=0.117 (overall: 0.121)
```
Running another 'keys *' command on a separate connection leads to a
drop in benchmark performance.
At the same time, lttng traces events:
```
[21:16:30.420997167] (+0.000004064) zhenwei valkey:command_call: { cpu_id = 6 }, { prot = "tcp", conn = "127.0.0.1:6379-127.0.0.1:54668", name = "get", duration = 1 }
[21:16:30.421001262] (+0.000004095) zhenwei valkey:command_call: { cpu_id = 6 }, { prot = "tcp", conn = "127.0.0.1:6379-127.0.0.1:54782", name = "get", duration = 1 }
[21:16:30.485562459] (+0.064561197) zhenwei valkey:command_call: { cpu_id = 6 }, { prot = "tcp", conn = "127.0.0.1:6379-127.0.0.1:54386", name = "keys", duration = 64551 } --> root cause
[21:16:30.485583101] (+0.000020642) zhenwei valkey:command_call: { cpu_id = 6 }, { prot = "tcp", conn = "127.0.0.1:6379-127.0.0.1:54522", name = "get", duration = 1 }
[21:16:30.485763891] (+0.000180790) zhenwei valkey:command_call: { cpu_id = 6 }, { prot = "tcp", conn = "127.0.0.1:6379-127.0.0.1:54542", name = "get", duration = 1 }
[21:16:30.485766451] (+0.000002560) zhenwei valkey:command_call: { cpu_id = 6 }, { prot = "tcp", conn = "127.0.0.1:6379-127.0.0.1:54438", name = "get", duration = 1 }
```
From this trace, we can see that connection
127.0.0.1:6379-127.0.0.1:54386 affects the other connections.
---------
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: skyfirelee <739609084@qq.com>
Co-authored-by: zhenwei pi <pizhenwei@bytedance.com>
Refactors `getNodeByQuery()` to be able to move the CRC16 slot
calculations in I/O threads and skip many checks if slot migrations are
not ongoing.
The slot calculation for a command is moved out of `getNodeByQuery()`
into a new function `clusterSlotByCommand()` which is safe to call from
I/O threads.
For MULTI-EXEC transactions, the slot is stored per command in the
multi-state, to be able to detect cross-slot transactions on EXEC
without computing the slots again.
Additionally, cross-slot detection and the arity check are offloaded to
I/O threads. To unify the code paths for commands parsed by I/O threads
and commands parsed by the main thread, the command lookup, arity check,
and slot lookup are moved out of `processCommand()` into a new
`prepareCommand()` function that needs to be called before
`processCommand()`.
The client's read flags are used for passing information about bad arity
and cross-slot. These flags are already used to convey information per
command between I/O thread and main thread.
Fixes #2077
Related to #632
---------
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Item from #761
This PR has the following changes
1. Bug fix where calling `pthread_join()` from the main thread for an IO
thread would hang indefinitely. This is because `IOThreadMain()` doesn't
have a cancellation point, so `pthread_cancel()` from the main thread is
not honored.
Can be reproduced by calling `shutdownIOThread()` from the main thread
for any active thread with an empty job queue.
Fixed by adding a cancellation point in `IOThreadMain()` (see the sketch
after this list).
2. Makes `io-threads` config runtime modifiable.
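A sketch of such a cancellation point (illustrative; the actual `IOThreadMain()` loop differs):
```c
#include <pthread.h>

static void *IOThreadMain(void *arg) {
    while (1) {
        /* pthread_testcancel() is a cancellation point, so a pending
         * pthread_cancel() takes effect even when the job queue is
         * empty and no other cancellation point is reached. */
        pthread_testcancel();
        /* ... poll the job queue and process I/O jobs ... */
    }
    return NULL;
}
```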
Signed-off-by: Ayush Sharma <mrayushs933@gmail.com>
## Overview
In high-load scenarios, a replica might not consume replication data
fast enough, leading to backpressure on the primary. When the primary’s
buffer overflows, replication lag increases and it can drop the replica
connection, triggering a full sync, a costly operation that impacts
system performance.
The solution is to read from replication clients until there is no
longer pending data, up to 25 iterations.
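A minimal sketch of the bounded drain loop, with illustrative names (not the actual Valkey implementation):
```c
#define REPL_DRAIN_MAX_ITERATIONS 25

/* Keep reading from the replication client while data is pending, but
 * bound the iterations so one client cannot starve the event loop. */
char buf[16 * 1024];
for (int i = 0; i < REPL_DRAIN_MAX_ITERATIONS; i++) {
    ssize_t nread = connRead(conn, buf, sizeof(buf));
    if (nread <= 0) break; /* no more pending data, or an error */
    processReplicationData(buf, nread); /* hypothetical handler */
}
```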
## Performance Impact ##
Test setup:
1. Bombard the replica with expensive commands, leading to high CPU
utilization
2. Write to the main database to trigger replication traffic
| Metric | Before (repl-flow-control Disabled) | After (repl-flow-control Enabled) |
| -- | -- | -- |
| Throughput (requests/sec) | 941.71 | 760.98 |
| Avg Latency (ms) | 52.865 | 65.534 |
| p50 Latency (ms) | 59.743 | 68.543 |
| p95 Latency (ms) | 79.231 | 106.687 |
| p99 Latency (ms) | 90.303 | 126.527 |
| Max Latency (ms) | 188.031 | 385.535 |
- Replication stability improves, no full syncs were observed after
enabling flow control.
- Higher latency for normal clients due to increased resource allocation
for replication.
- CPU and memory usage remain stable, with no major overhead.
- Replica throughput slightly decreases as replication takes priority.
Implements https://github.com/valkey-io/valkey/issues/1596
---------
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
daea05b1e2/src/networking.c (L886-L886)
To fix the issue, we need to ensure that the subtraction `prev->size -
prev->used` does not underflow. This can be achieved by explicitly
checking that `prev->used` is less than `prev->size` before performing
the subtraction. This approach avoids relying on unsigned arithmetic and
keeps the logic clear and robust.
The specific changes are:
1. Replace the condition `prev->size - prev->used > 0` with `prev->used
< prev->size`.
2. This change ensures that the logic checks whether there is remaining
space in the buffer without risking underflow.
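The change in a nutshell (a sketch; the fields are assumed to be unsigned, per the CWE-191 reference below):
```c
/* Before: relies on unsigned subtraction never going negative. */
if (prev->size - prev->used > 0) { /* ... append into prev ... */ }

/* After: a plain comparison that cannot underflow. */
if (prev->used < prev->size) { /* ... append into prev ... */ }
```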
**References**
[INT02-C. Understand integer conversion
rules](https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules)
[CWE-191](https://cwe.mitre.org/data/definitions/191.html)
---
Signed-off-by: Zeroday BYTE <github@zerodaysec.org>
Two dicts are converted to hashtables:
1. On each client, the set of channels/patterns/shard-channels the
client is subscribed to
2. On each channel or pattern, the set of clients subscribed to it.
---------
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Rain Valentine <rsg000@gmail.com>
This commit introduces a mechanism to track client authentication state
with a new `ever_authenticated` flag. It refactors client authentication
handling by adding a `clientSetUser` function that properly sets both
the `authenticated` and `ever_authenticated` flags.
The implementation limits output buffer size for clients that have never
been authenticated.
Added tests to verify the output buffer limiting behavior for
unauthenticated clients.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This change enhances user experience and consistency by allowing a
module to block a client on keyspace event notifications. Consistency is
improved by allowing that reads after writes on the same connection
yield expected results. For example, in ValkeySearch, mutations
processed earlier on the same connection will be available for search.
The implementation extends `VM_BlockClient` to support blocking clients
on keyspace event notifications. Internal clients, LUA clients, clients
issueing multi exec and those with the `deny_blocking` flag set are not
blocked. Once blocked, a client’s reply is withheld until it is
explicitly unblocked.
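An illustrative sketch (the callback shape and helper names are assumptions) of a module blocking the notifying client until its mutation has been processed:
```c
int onKeyspaceEvent(ValkeyModuleCtx *ctx, int type, const char *event,
                    ValkeyModuleString *key) {
    /* Withhold the client's reply until the mutation is indexed. */
    ValkeyModuleBlockedClient *bc = ValkeyModule_BlockClient(
        ctx, onReply, onTimeout, NULL, 1000 /* timeout ms */);
    scheduleAsyncIndexing(key, bc); /* hypothetical; must eventually call
                                     * ValkeyModule_UnblockClient(bc, ...) */
    return VALKEYMODULE_OK;
}
```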
---------
Signed-off-by: yairgott <yairgott@gmail.com>
In this PR, we introduce support for new filters for `CLIENT
LIST` and `CLIENT KILL` commands. The new filters are:
1. FLAGS `Client must include this flag. This can be a string with a
bunch of flags present one after the other.`
2. NAME `client name`
3. IDLE `minimum idle time of the client`
4. LIB-NAME `clients with the specified lib name.`
5. LIB-VER `clients with the specified lib version.`
6. DB `clients currently operating on the specified database ID`
7. IP `client ip address`
8. CAPA `client capabilities`
Partly Addresses: #668
---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
In this code logic:
https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L2767-L2773,
`c->querybuf + c->qb_pos` may also include user data.
Update the log message when config `hide-user-data-from-log` is enabled.
---------
Signed-off-by: VanessaTang <yuetan@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
The current assertion introduced in #11220:
```c
serverAssert(&c->clients_pending_write_node.next != NULL || &c->clients_pending_write_node.prev != NULL);
```
is incorrect for two reasons:
1. Using &pointer.next would always be non-NULL since it's the address
of the field.
2. The check is incorrect even without the & because in a single-node
list, both pointers can be NULL.
Fix:
1. Remove the always-true assertion
2. Add proper assertions in listUnlinkNode to ensure the node's
membership in the list, covering all list cases.
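A sketch of what such membership assertions could look like (illustrative; the actual checks may differ):
```c
void listUnlinkNode(list *list, listNode *node) {
    /* A detached prev/next pointer is only legal for the head/tail. */
    serverAssert(node->prev != NULL || list->head == node);
    serverAssert(node->next != NULL || list->tail == node);
    /* ... unlink the node ... */
}
```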
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Binbin <binloveplay1314@qq.com>
The metric `net_input_bytes_curr_cmd` is now computed by aggregating its components separately.
Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
The special deferred reply is ignored in the current command's
`net_output_bytes_curr_cmd` counting.
Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
After #1405, `client trackinginfo` crashes when tracking is off:
```
/lib64/libpthread.so.0(+0xf630)[0x7fab74931630]
./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db]
./src/valkey-server *:6379(call+0x5ba)[0x5a791a]
./src/valkey-server *:6379(processCommand+0x968)[0x5a8938]
./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d]
./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9]
./src/valkey-server *:6379[0x46fa4d]
./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9]
./src/valkey-server *:6379(main+0x4e1)[0x455821]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555]
./src/valkey-server *:6379[0x4564f2]
```
The reason is that we do not initialize pubsub_data by default; we only
initialize it when tracking is turned on.
Fixes #1683.
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
This PR offloads the write to replica clients to IO threads.
## Main Changes
* Replica writes will be offloaded, but only after the replica is in
online mode.
* Replica reads will still be done in the main thread to reduce
complexity and because read traffic from replicas is negligible.
### Implementation Details
In order to offload the writes, `writeToReplica` has been split into 2
parts:
1. The write itself made by the IO thread or by the main thread
2. The post write where we update the replication buffers refcount will
be done in the main-thread after the write-job is done in the IO thread
(similar to what we do with a regular client)
### Additional Changes
* In `writeToReplica` we now use `writev` in case more than one buffer
exists (see the sketch after this list).
* Changed client `nwritten` field to `ssize_t` since with a replica the
`nwritten` can theoretically exceed `int` size (not subject to
`NET_MAX_WRITES_PER_EVENT` limit).
* Changed parsing code to use `memchr` instead of `strchr`:
* While parsing a command, ASAN got stuck for an unknown reason when
`strchr` was called to look for the next `\r`
* Adding an assert for a null-terminated querybuf didn't resolve the issue
* Switched to `memchr`, as it's more secure and resolves the issue
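A sketch of the gathered write (illustrative, not the actual `writeToReplica` code; buffer names are placeholders):
```c
#include <sys/uio.h>

/* Send two replication buffers with one syscall instead of two. */
struct iovec iov[2];
iov[0].iov_base = firstbuf;  iov[0].iov_len = firstlen;
iov[1].iov_base = secondbuf; iov[1].iov_len = secondlen;
ssize_t nwritten = writev(fd, iov, 2);
```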
### Testing
* Added integration tests
* Added unit tests
**Related issue:** #761
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
In #1519, we added paused_actions and paused_timeout_milliseconds;
it would be helpful to also add paused_purpose, since users
want to know the purpose of the pause.
Currently available options:
- client_pause: triggered by the CLIENT PAUSE command.
- shutdown_in_progress: during shutdown, the primary waits for the
replicas to catch up on the offset.
- failover_in_progress: during failover, the primary waits for the
replica to catch up on the offset.
- none
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
As discussed in PR #336.
We have different types of resources like CPU, memory, network, etc. The
`slowlog` can only record commands that consume lots of CPU during the
processing phase (it doesn't include read/write network time), but it
cannot record commands that consume too much memory or network
bandwidth. For example:
1. Running a "SET key value(10 megabytes)" command would not be recorded
in the slowlog, since when processing it the SET command only inserts
the value's pointer into the db dict. But that command consumes huge
memory in the query buffer and bandwidth from the network. In this case,
just 1000 TPS can cause 10GB/s of network flow.
2. Running a "GET key" command where the key's value length is 10
megabytes consumes huge memory in the output buffer and bandwidth to the
network.
This PR introduces a new command `COMMANDLOG`, to log commands that
consume significant network bandwidth, including both input and output.
Users can retrieve the results using `COMMANDLOG get <count>
large-request` and `COMMANDLOG get <count> large-reply`, all subcommands
for `COMMANDLOG` are:
* `COMMANDLOG HELP`
* `COMMANDLOG GET <count> <slow|large-request|large-reply>`
* `COMMANDLOG LEN <slow|large-request|large-reply>`
* `COMMANDLOG RESET <slow|large-request|large-reply>`
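For example (illustrative):
```
127.0.0.1:6379> COMMANDLOG GET 10 large-reply
127.0.0.1:6379> COMMANDLOG LEN slow
127.0.0.1:6379> COMMANDLOG RESET large-request
```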
And the slowlog is also incorporated into the commandlog.
For each of these three types, additional configs have been added for
control:
* `commandlog-request-larger-than` and
`commandlog-large-request-max-len` represent the threshold for large
requests (the unit is bytes) and the maximum number of commands that can
be recorded.
* `commandlog-reply-larger-than` and `commandlog-large-reply-max-len`
represent the threshold for large replies (the unit is bytes) and the
maximum number of commands that can be recorded.
* `commandlog-execution-slower-than` and
`commandlog-slow-execution-max-len` represent the threshold for slow
executions (the unit is microseconds) and the maximum number of commands
that can be recorded.
* Additionally, `slowlog-log-slower-than` and `slowlog-max-len` are now
set as aliases for these two new configs.
---------
Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Adds filter options to CLIENT LIST:
* USER <username>
Return clients authenticated by <username>.
* ADDR <ip:port>
Return clients connected from the specified address.
* LADDR <ip:port>
Return clients connected to the specified local address.
* SKIPME (YES|NO)
Exclude the current client from the list (default: no).
* MAXAGE <maxage>
Only list connections older than the specified age.
Modifies the ID filter to CLIENT KILL to allow multiple IDs
* ID <client-id> [<client-id>...]
Kill connections by client ids.
This makes CLIENT LIST and CLIENT KILL accept the same options.
For backward compatibility, the default value for SKIPME is NO for
CLIENT LIST and YES for CLIENT KILL.
The MAXAGE option comes from CLIENT KILL, where it *keeps* clients
younger than the given max age and kills the older ones. This logic
becomes weird for CLIENT LIST, but is kept for similarity with CLIENT
KILL, for the use case of first testing manually using CLIENT LIST, and
then running CLIENT KILL with the same filters.
The `ID client-id [client-id ...]` no longer needs to be the last
filter. The parsing logic determines if an argument is an ID or not
based on whether it can be parsed as an integer or not.
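For example (illustrative):
```
CLIENT LIST USER default MAXAGE 60 SKIPME yes
CLIENT KILL ID 3 5 8 LADDR 127.0.0.1:6379
```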
Partly addresses: #668
---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Add `paused_actions` and `paused_timeout_milliseconds` to INFO Clients
to inform users whether clients are paused.
---------
Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
# Refactor client structure to use modular data components
## Current State
The client structure allocates memory for replication / pubsub /
multi-keys / module / blocked data for every client, despite these
features being used by only a small subset of clients. In addition the
current field layout in the client struct is suboptimal, with poor
alignment and unnecessary padding between fields, leading to a larger
than necessary memory footprint of 896 bytes per client. Furthermore,
fields that are frequently accessed together during operations are
scattered throughout the struct, resulting in poor cache locality.
## This PR's Change
1. Lazy Initialization
- **Components are only allocated when first used:**
- PubSubData: Created on first SUBSCRIBE/PUBLISH operation
- ReplicationData: Initialized only for replica connections
- ModuleData: Allocated when module interaction begins
- BlockingState: Created when first blocking command is issued
- MultiState: Initialized on MULTI command
2. Memory Layout Optimization:
- Grouped related fields for better locality
- Moved rarely accessed fields (e.g., client->name) to struct end
- Optimized field alignment to eliminate padding
3. Additional changes:
- Moved watched_keys to be statically allocated in the `mstate` struct
- Relocated replication init logic to replication.c
### Key Benefits
- **Efficient Memory Usage:**
- 45% smaller base client structure - Basic clients now use 528 bytes
(down from 896).
- Better memory locality for related operations
- Performance improvement in high throughput scenarios. No performance
regressions in other cases.
### Performance Impact
Tested with 650 clients and 512 bytes values.
#### Single Thread Performance
| Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET | 1 key | 261,799 | 258,261 | +1.37% |
| SET | 3M keys | 209,134 | ~209,000 | ~0% |
| GET | 1 key | 281,564 | 277,965 | +1.29% |
| GET | 3M keys | 231,158 | 228,410 | +1.20% |
#### 8 IO Threads Performance
| Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET | 1 key | 1,331,578 | 1,331,626 | -0.00% |
| SET | 3M keys | 1,254,441 | 1,152,645 | +8.83% |
| GET | 1 key | 1,293,149 | 1,289,503 | +0.28% |
| GET | 3M keys | 1,152,898 | 1,101,791 | +4.64% |
#### Pipeline Performance (3M keys)
| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|--------------|---------------|---------------|-----------|
| SET | 10 | 548,964 | 538,498 | +1.94% |
| SET | 20 | 606,148 | 594,872 | +1.89% |
| SET | 30 | 631,122 | 616,606 | +2.35% |
| GET | 10 | 628,482 | 624,166 | +0.69% |
| GET | 20 | 687,371 | 681,659 | +0.84% |
| GET | 30 | 725,855 | 721,102 | +0.66% |
### Observations:
1. Single-threaded operations show consistent improvements (1-1.4%)
2. Multi-threaded performance shows significant gains for large
datasets:
- SET with 3M keys: +8.83% improvement
- GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
- SET operations: +1.89% to +2.35%
- GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario
Related issue: https://github.com/valkey-io/valkey/issues/761
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
It's inconvenient for client implementations to extract the
`availability_zone` information from the `INFO` response. The `INFO`
response contains a lot of information that a client implementation
typically doesn't need.
This PR adds the availability zone to the `HELLO` response. Clients
usually already use the `HELLO` command for protocol negotiation and
also get the server `version` and `role` from its response. To keep the
`HELLO` response small, the field is only added if availability zone is
configured.
---------
Signed-off-by: Rueian <rueiancsie@gmail.com>