Commit Graph

6439 Commits

Author SHA1 Message Date
Jakub Kicinski
d07b7b32da Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:

====================
pull-request: bpf-next 2023-08-03

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).

The main changes are:

1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
   Daniel Borkmann

2) Support new insns from cpu v4 from Yonghong Song

3) Non-atomically allocate freelist during prefill from YiFei Zhu

4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu

5) Add tracepoint to xdp attaching failure from Leon Hwang

6) struct netdev_rx_queue and xdp.h reshuffling to reduce
   rebuild time from Jakub Kicinski

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
  net: invert the netdevice.h vs xdp.h dependency
  net: move struct netdev_rx_queue out of netdevice.h
  eth: add missing xdp.h includes in drivers
  selftests/bpf: Add testcase for xdp attaching failure tracepoint
  bpf, xdp: Add tracepoint to xdp attaching failure
  selftests/bpf: fix static assert compilation issue for test_cls_*.c
  bpf: fix bpf_probe_read_kernel prototype mismatch
  riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
  libbpf: fix typos in Makefile
  tracing: bpf: use struct trace_entry in struct syscall_tp_t
  bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
  bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
  netfilter: bpf: Only define get_proto_defrag_hook() if necessary
  bpf: Fix an array-index-out-of-bounds issue in disasm.c
  net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
  docs/bpf: Fix malformed documentation
  bpf: selftests: Add defrag selftests
  bpf: selftests: Support custom type and proto for client sockets
  bpf: selftests: Support not connecting client socket
  netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
  ...
====================

Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03 15:34:36 -07:00
Jakub Kicinski
35b1b1fd96 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/dsa/port.c
  9945c1fb03 ("net: dsa: fix older DSA drivers using phylink")
  a88dd75384 ("net: dsa: remove legacy_pre_march2020 detection")
https://lore.kernel.org/all/20230731102254.2c9868ca@canb.auug.org.au/

net/xdp/xsk.c
  3c5b4d69c3 ("net: annotate data-races around sk->sk_mark")
  b7f72a30e9 ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
https://lore.kernel.org/all/20230731102631.39988412@canb.auug.org.au/

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  37b61cda9c ("bnxt: don't handle XDP in netpoll")
  2b56b3d992 ("eth: bnxt: handle invalid Tx completions more gracefully")
https://lore.kernel.org/all/20230801101708.1dc7faac@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
  62da08331f ("net/mlx5e: Set proper IPsec source port in L4 selector")
  fbd517549c ("net/mlx5e: Add function to get IPsec offload namespace")

drivers/net/ethernet/sfc/selftest.c
  55c1528f9b ("sfc: fix field-spanning memcpy in selftest")
  ae9d445cd4 ("sfc: Miscellaneous comment removals")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03 14:34:37 -07:00
Jakub Kicinski
680ee0456a net: invert the netdevice.h vs xdp.h dependency
xdp.h is far more specific and is included in only 67 other
files vs netdevice.h's 1538 include sites.
Make xdp.h include netdevice.h, instead of the other way around.
This decreases the incremental allmodconfig builds size when
xdp.h is touched from 5947 to 662 objects.

Move bpf_prog_run_xdp() to xdp.h, seems appropriate and filter.h
is a mega-header in its own right so it's nice to avoid xdp.h
getting included there as well.

The only unfortunate part is that the typedef for xdp_features_t
has to move to netdevice.h, since its embedded in struct netdevice.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-03 08:38:07 -07:00
Daniel Xu
81584c23f2 netfilter: bpf: Only define get_proto_defrag_hook() if necessary
Before, we were getting this warning:

  net/netfilter/nf_bpf_link.c:32:1: warning: 'get_proto_defrag_hook' defined but not used [-Wunused-function]

Guard the definition with CONFIG_NF_DEFRAG_IPV[4|6].

Fixes: 91721c2d02 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307291213.fZ0zDmoG-lkp@intel.com/
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/b128b6489f0066db32c4772ae4aaee1480495929.1690840454.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-31 17:51:13 -07:00
Ratheesh Kannoth
2b3082c6ef net: flow_dissector: Use 64bits for used_keys
As 32bits of dissector->used_keys are exhausted,
increase the size to 64bits.

This is base change for ESP/AH flow dissector patch.
Please find patch and discussions at
https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Tested-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-31 09:11:24 +01:00
Eric Dumazet
3c5b4d69c3 net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.

Fixes: 4a19ec5800 ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Daniel Xu
91721c2d02 netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link the active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Daniel Xu
9abddac583 netfilter: defrag: Add glue hooks for enabling/disabling defrag
We want to be able to enable/disable IP packet defrag from core
bpf/netfilter code. In other words, execute code from core that could
possibly be built as a module.

To help avoid symbol resolution errors, use glue hooks that the modules
will register callbacks with during module init.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28 16:52:08 -07:00
Jakub Kicinski
5908a4c47c Merge tag 'nf-next-23-07-27' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
netfilter updates for net-next

1.  silence a harmless warning for CONFIG_NF_CONNTRACK_PROCFS=n builds,
 from Zhu Wang.

2, 3:
Allow NLA_POLICY_MASK to be used with BE16/BE32 types, and replace a few
manual checks with nla_policy based one in nf_tables, from myself.

4: cleanup in ctnetlink to validate while parsing rather than
   using two steps, from Lin Ma.

5: refactor boyer-moore textsearch by moving a small chunk to
   a helper function, rom Jeremy Sowden.

* tag 'nf-next-23-07-27' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  lib/ts_bm: add helper to reduce indentation and improve readability
  netfilter: conntrack: validate cta_ip via parsing
  netfilter: nf_tables: use NLA_POLICY_MASK to test for valid flag options
  netlink: allow be16 and be32 types in all uint policy checks
  nf_conntrack: fix -Wunused-const-variable=
====================

Link: https://lore.kernel.org/r/20230727133604.8275-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 20:25:43 -07:00
Jakub Kicinski
014acf2668 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 15:22:46 -07:00
Lin Ma
0c805e80e3 netfilter: conntrack: validate cta_ip via parsing
In current ctnetlink_parse_tuple_ip() function, nested parsing and
validation is splitting as two parts,  which could be cleanup to a
simplified form. As the nla_parse_nested_deprecated function
supports validation in the fly. These two finially reach same place
__nla_validate_parse with same validate flag.

nla_parse_nested_deprecated
  __nla_parse(.., NL_VALIDATE_LIBERAL, ..)
    __nla_validate_parse

nla_validate_nested_deprecated
  __nla_validate_nested(.., NL_VALIDATE_LIBERAL, ..)
    __nla_validate
      __nla_validate_parse

This commit removes the call to nla_validate_nested_deprecated and pass
cta_ip_nla_policy when do parsing.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-27 13:45:51 +02:00
Florian Westphal
100a11b698 netfilter: nf_tables: use NLA_POLICY_MASK to test for valid flag options
nf_tables relies on manual test of netlink attributes coming from userspace
even in cases where this could be handled via netlink policy.

Convert a bunch of 'flag' attributes to use NLA_POLICY_MASK checks.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-27 13:45:51 +02:00
Zhu Wang
a927d77778 nf_conntrack: fix -Wunused-const-variable=
When building with W=1, the following warning occurs.

net/netfilter/nf_conntrack_proto_dccp.c:72:27: warning: ‘dccp_state_names’ defined but not used [-Wunused-const-variable=]
 static const char * const dccp_state_names[] = {

We include dccp_state_names in the macro
CONFIG_NF_CONNTRACK_PROCFS, since it is only used in the place
which is included in the macro CONFIG_NF_CONNTRACK_PROCFS.

Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-27 13:45:51 +02:00
Pablo Neira Ayuso
0ebc1064e4 netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
Bail out with EOPNOTSUPP when adding rule to bound chain via
NFTA_RULE_CHAIN_ID. The following warning splat is shown when
adding a rule to a deleted bound chain:

 WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
 CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1
 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-26 16:48:49 +02:00
Pablo Neira Ayuso
0a771f7b26 netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
On error when building the rule, the immediate expression unbinds the
chain, hence objects can be deactivated by the transaction records.

Otherwise, it is possible to trigger the following warning:

 WARNING: CPU: 3 PID: 915 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
 CPU: 3 PID: 915 Comm: chain-bind-err- Not tainted 6.1.39 #1
 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: 4bedf9eee0 ("netfilter: nf_tables: fix chain binding transaction logic")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-26 16:48:49 +02:00
Florian Westphal
f718863aca netfilter: nft_set_rbtree: fix overlap expiration walk
The lazy gc on insert that should remove timed-out entries fails to release
the other half of the interval, if any.

Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0
in nftables.git and kmemleak enabled kernel.

Second bug is the use of rbe_prev vs. prev pointer.
If rbe_prev() returns NULL after at least one iteration, rbe_prev points
to element that is not an end interval, hence it should not be removed.

Lastly, check the genmask of the end interval if this is active in the
current generation.

Fixes: c9e6978e27 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-26 16:48:13 +02:00
Jakub Kicinski
59be3baa8d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-20 15:52:55 -07:00
Pablo Neira Ayuso
6eaf41e87a netfilter: nf_tables: skip bound chain on rule flush
Skip bound chain when flushing table rules, the rule that owns this
chain releases these objects.

Otherwise, the following warning is triggered:

  WARNING: CPU: 2 PID: 1217 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
  CPU: 2 PID: 1217 Comm: chain-flush Not tainted 6.1.39 #1
  RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-20 17:21:11 +02:00
Pablo Neira Ayuso
751d460ccf netfilter: nf_tables: skip bound chain in netns release path
Skip bound chain from netns release path, the rule that owns this chain
releases these objects.

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-20 17:02:45 +02:00
Florian Westphal
87b5a5c209 netfilter: nft_set_pipapo: fix improper element removal
end key should be equal to start unless NFT_SET_EXT_KEY_END is present.

Its possible to add elements that only have a start key
("{ 1.0.0.0 . 2.0.0.0 }") without an internval end.

Insertion treats this via:

if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
   end = (const u8 *)nft_set_ext_key_end(ext)->data;
else
   end = start;

but removal side always uses nft_set_ext_key_end().
This is wrong and leads to garbage remaining in the set after removal
next lookup/insert attempt will give:

BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90
Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399
Call Trace:
 kasan_report+0x105/0x140
 pipapo_get+0x8eb/0xb90
 nft_pipapo_insert+0x1dc/0x1710
 nf_tables_newsetelem+0x31f5/0x4e00
 ..

Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: lonial con <kongln9170@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-20 12:27:34 +02:00
Florian Westphal
314c828416 netfilter: nf_tables: can't schedule in nft_chain_validate
Can be called via nft set element list iteration, which may acquire
rcu and/or bh read lock (depends on set type).

BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by nft/1232:
 #0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid
 #1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire
Call Trace:
 nft_chain_validate
 nft_lookup_validate_setelem
 nft_pipapo_walk
 nft_lookup_validate
 nft_chain_validate
 nft_immediate_validate
 nft_chain_validate
 nf_tables_validate
 nf_tables_abort

No choice but to move it to nf_tables_validate().

Fixes: 81ea010667 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-20 11:59:25 +02:00
Florian Westphal
ddbd8be689 netfilter: nf_tables: fix spurious set element insertion failure
On some platforms there is a padding hole in the nft_verdict
structure, between the verdict code and the chain pointer.

On element insertion, if the new element clashes with an existing one and
NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
the data associated with duplicated element is the same as the existing
one.  The data equality check uses memcmp.

For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
padding area leads to spurious failure even if the verdict data is the
same.

This then makes the insertion fail with 'already exists' error, even
though the new "key : data" matches an existing entry and userspace
told the kernel that it doesn't want to receive an error indication.

Fixes: c016c7e45d ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-20 11:56:50 +02:00
Xin Long
4914109a8e netfilter: allow exp not to be removed in nf_ct_find_expectation
Currently nf_conntrack_in() calling nf_ct_find_expectation() will
remove the exp from the hash table. However, in some scenario, we
expect the exp not to be removed when the created ct will not be
confirmed, like in OVS and TC conntrack in the following patches.

This patch allows exp not to be removed by setting IPS_CONFIRMED
in the status of the tmpl.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-20 10:06:36 +02:00
Paolo Abeni
ceb20a3cc5 Merge tag 'nf-23-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix missing overflow use refcount checks in nf_tables.

2) Do not set IPS_ASSURED for IPS_NAT_CLASH entries in GRE tracker,
   from Florian Westphal.

3) Bail out if nf_ct_helper_hash is NULL before registering helper,
   from Florent Revest.

4) Use siphash() instead siphash_4u64() to fix performance regression,
   also from Florian.

5) Do not allow to add rules to removed chains via ID,
   from Thadeu Lima de Souza Cascardo.

6) Fix oob read access in byteorder expression, also from Thadeu.

netfilter pull request 23-07-06

====================

Link: https://lore.kernel.org/r/20230705230406.52201-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-06 11:02:58 +02:00
Thadeu Lima de Souza Cascardo
caf3ef7468 netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[   23.095215] ==================================================================
[   23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[   23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[   23.096358]
[   23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[   23.096770] Call Trace:
[   23.096910]  <IRQ>
[   23.097030]  dump_stack_lvl+0x60/0xc0
[   23.097218]  print_report+0xcf/0x630
[   23.097388]  ? nft_byteorder_eval+0x13c/0x320
[   23.097577]  ? kasan_addr_to_slab+0xd/0xc0
[   23.097760]  ? nft_byteorder_eval+0x13c/0x320
[   23.097949]  kasan_report+0xc9/0x110
[   23.098106]  ? nft_byteorder_eval+0x13c/0x320
[   23.098298]  __asan_load2+0x83/0xd0
[   23.098453]  nft_byteorder_eval+0x13c/0x320
[   23.098659]  nft_do_chain+0x1c8/0xc50
[   23.098852]  ? __pfx_nft_do_chain+0x10/0x10
[   23.099078]  ? __kasan_check_read+0x11/0x20
[   23.099295]  ? __pfx___lock_acquire+0x10/0x10
[   23.099535]  ? __pfx___lock_acquire+0x10/0x10
[   23.099745]  ? __kasan_check_read+0x11/0x20
[   23.099929]  nft_do_chain_ipv4+0xfe/0x140
[   23.100105]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
[   23.100327]  ? lock_release+0x204/0x400
[   23.100515]  ? nf_hook.constprop.0+0x340/0x550
[   23.100779]  nf_hook_slow+0x6c/0x100
[   23.100977]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
[   23.101223]  nf_hook.constprop.0+0x334/0x550
[   23.101443]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[   23.101677]  ? __pfx_nf_hook.constprop.0+0x10/0x10
[   23.101882]  ? __pfx_ip_rcv_finish+0x10/0x10
[   23.102071]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[   23.102291]  ? rcu_read_lock_held+0x4b/0x70
[   23.102481]  ip_local_deliver+0xbb/0x110
[   23.102665]  ? __pfx_ip_rcv+0x10/0x10
[   23.102839]  ip_rcv+0x199/0x2a0
[   23.102980]  ? __pfx_ip_rcv+0x10/0x10
[   23.103140]  __netif_receive_skb_one_core+0x13e/0x150
[   23.103362]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[   23.103647]  ? mark_held_locks+0x48/0xa0
[   23.103819]  ? process_backlog+0x36c/0x380
[   23.103999]  __netif_receive_skb+0x23/0xc0
[   23.104179]  process_backlog+0x91/0x380
[   23.104350]  __napi_poll.constprop.0+0x66/0x360
[   23.104589]  ? net_rx_action+0x1cb/0x610
[   23.104811]  net_rx_action+0x33e/0x610
[   23.105024]  ? _raw_spin_unlock+0x23/0x50
[   23.105257]  ? __pfx_net_rx_action+0x10/0x10
[   23.105485]  ? mark_held_locks+0x48/0xa0
[   23.105741]  __do_softirq+0xfa/0x5ab
[   23.105956]  ? __dev_queue_xmit+0x765/0x1c00
[   23.106193]  do_softirq.part.0+0x49/0xc0
[   23.106423]  </IRQ>
[   23.106547]  <TASK>
[   23.106670]  __local_bh_enable_ip+0xf5/0x120
[   23.106903]  __dev_queue_xmit+0x789/0x1c00
[   23.107131]  ? __pfx___dev_queue_xmit+0x10/0x10
[   23.107381]  ? find_held_lock+0x8e/0xb0
[   23.107585]  ? lock_release+0x204/0x400
[   23.107798]  ? neigh_resolve_output+0x185/0x350
[   23.108049]  ? mark_held_locks+0x48/0xa0
[   23.108265]  ? neigh_resolve_output+0x185/0x350
[   23.108514]  neigh_resolve_output+0x246/0x350
[   23.108753]  ? neigh_resolve_output+0x246/0x350
[   23.109003]  ip_finish_output2+0x3c3/0x10b0
[   23.109250]  ? __pfx_ip_finish_output2+0x10/0x10
[   23.109510]  ? __pfx_nf_hook+0x10/0x10
[   23.109732]  __ip_finish_output+0x217/0x390
[   23.109978]  ip_finish_output+0x2f/0x130
[   23.110207]  ip_output+0xc9/0x170
[   23.110404]  ip_push_pending_frames+0x1a0/0x240
[   23.110652]  raw_sendmsg+0x102e/0x19e0
[   23.110871]  ? __pfx_raw_sendmsg+0x10/0x10
[   23.111093]  ? lock_release+0x204/0x400
[   23.111304]  ? __mod_lruvec_page_state+0x148/0x330
[   23.111567]  ? find_held_lock+0x8e/0xb0
[   23.111777]  ? find_held_lock+0x8e/0xb0
[   23.111993]  ? __rcu_read_unlock+0x7c/0x2f0
[   23.112225]  ? aa_sk_perm+0x18a/0x550
[   23.112431]  ? filemap_map_pages+0x4f1/0x900
[   23.112665]  ? __pfx_aa_sk_perm+0x10/0x10
[   23.112880]  ? find_held_lock+0x8e/0xb0
[   23.113098]  inet_sendmsg+0xa0/0xb0
[   23.113297]  ? inet_sendmsg+0xa0/0xb0
[   23.113500]  ? __pfx_inet_sendmsg+0x10/0x10
[   23.113727]  sock_sendmsg+0xf4/0x100
[   23.113924]  ? move_addr_to_kernel.part.0+0x4f/0xa0
[   23.114190]  __sys_sendto+0x1d4/0x290
[   23.114391]  ? __pfx___sys_sendto+0x10/0x10
[   23.114621]  ? __pfx_mark_lock.part.0+0x10/0x10
[   23.114869]  ? lock_release+0x204/0x400
[   23.115076]  ? find_held_lock+0x8e/0xb0
[   23.115287]  ? rcu_is_watching+0x23/0x60
[   23.115503]  ? __rseq_handle_notify_resume+0x6e2/0x860
[   23.115778]  ? __kasan_check_write+0x14/0x30
[   23.116008]  ? blkcg_maybe_throttle_current+0x8d/0x770
[   23.116285]  ? mark_held_locks+0x28/0xa0
[   23.116503]  ? do_syscall_64+0x37/0x90
[   23.116713]  __x64_sys_sendto+0x7f/0xb0
[   23.116924]  do_syscall_64+0x59/0x90
[   23.117123]  ? irqentry_exit_to_user_mode+0x25/0x30
[   23.117387]  ? irqentry_exit+0x77/0xb0
[   23.117593]  ? exc_page_fault+0x92/0x140
[   23.117806]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   23.118081] RIP: 0033:0x7f744aee2bba
[   23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[   23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[   23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[   23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[   23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[   23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[   23.121617]  </TASK>
[   23.121749]
[   23.121845] The buggy address belongs to the virtual mapping at
[   23.121845]  [ffffc90000000000, ffffc90000009000) created by:
[   23.121845]  irq_init_percpu_irqstack+0x1cf/0x270
[   23.122707]
[   23.122803] The buggy address belongs to the physical page:
[   23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[   23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[   23.123998] page_type: 0xffffffff()
[   23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[   23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   23.125023] page dumped because: kasan: bad access detected
[   23.125326]
[   23.125421] Memory state around the buggy address:
[   23.125682]  ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.126072]  ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[   23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[   23.126840]                                               ^
[   23.127138]  ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[   23.127522]  ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[   23.127906] ==================================================================
[   23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-06 00:53:14 +02:00
Thadeu Lima de Souza Cascardo
515ad53079 netfilter: nf_tables: do not ignore genmask when looking up chain by id
When adding a rule to a chain referring to its ID, if that chain had been
deleted on the same batch, the rule might end up referring to a deleted
chain.

This will lead to a WARNING like following:

[   33.098431] ------------[ cut here ]------------
[   33.098678] WARNING: CPU: 5 PID: 69 at net/netfilter/nf_tables_api.c:2037 nf_tables_chain_destroy+0x23d/0x260
[   33.099217] Modules linked in:
[   33.099388] CPU: 5 PID: 69 Comm: kworker/5:1 Not tainted 6.4.0+ #409
[   33.099726] Workqueue: events nf_tables_trans_destroy_work
[   33.100018] RIP: 0010:nf_tables_chain_destroy+0x23d/0x260
[   33.100306] Code: 8b 7c 24 68 e8 64 9c ed fe 4c 89 e7 e8 5c 9c ed fe 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7 c3 cc cc cc cc <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7
[   33.101271] RSP: 0018:ffffc900004ffc48 EFLAGS: 00010202
[   33.101546] RAX: 0000000000000001 RBX: ffff888006fc0a28 RCX: 0000000000000000
[   33.101920] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   33.102649] RBP: ffffc900004ffc78 R08: 0000000000000000 R09: 0000000000000000
[   33.103018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880135ef500
[   33.103385] R13: 0000000000000000 R14: dead000000000122 R15: ffff888006fc0a10
[   33.103762] FS:  0000000000000000(0000) GS:ffff888024c80000(0000) knlGS:0000000000000000
[   33.104184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.104493] CR2: 00007fe863b56a50 CR3: 00000000124b0001 CR4: 0000000000770ee0
[   33.104872] PKRU: 55555554
[   33.104999] Call Trace:
[   33.105113]  <TASK>
[   33.105214]  ? show_regs+0x72/0x90
[   33.105371]  ? __warn+0xa5/0x210
[   33.105520]  ? nf_tables_chain_destroy+0x23d/0x260
[   33.105732]  ? report_bug+0x1f2/0x200
[   33.105902]  ? handle_bug+0x46/0x90
[   33.106546]  ? exc_invalid_op+0x19/0x50
[   33.106762]  ? asm_exc_invalid_op+0x1b/0x20
[   33.106995]  ? nf_tables_chain_destroy+0x23d/0x260
[   33.107249]  ? nf_tables_chain_destroy+0x30/0x260
[   33.107506]  nf_tables_trans_destroy_work+0x669/0x680
[   33.107782]  ? mark_held_locks+0x28/0xa0
[   33.107996]  ? __pfx_nf_tables_trans_destroy_work+0x10/0x10
[   33.108294]  ? _raw_spin_unlock_irq+0x28/0x70
[   33.108538]  process_one_work+0x68c/0xb70
[   33.108755]  ? lock_acquire+0x17f/0x420
[   33.108977]  ? __pfx_process_one_work+0x10/0x10
[   33.109218]  ? do_raw_spin_lock+0x128/0x1d0
[   33.109435]  ? _raw_spin_lock_irq+0x71/0x80
[   33.109634]  worker_thread+0x2bd/0x700
[   33.109817]  ? __pfx_worker_thread+0x10/0x10
[   33.110254]  kthread+0x18b/0x1d0
[   33.110410]  ? __pfx_kthread+0x10/0x10
[   33.110581]  ret_from_fork+0x29/0x50
[   33.110757]  </TASK>
[   33.110866] irq event stamp: 1651
[   33.111017] hardirqs last  enabled at (1659): [<ffffffffa206a209>] __up_console_sem+0x79/0xa0
[   33.111379] hardirqs last disabled at (1666): [<ffffffffa206a1ee>] __up_console_sem+0x5e/0xa0
[   33.111740] softirqs last  enabled at (1616): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[   33.112094] softirqs last disabled at (1367): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[   33.112453] ---[ end trace 0000000000000000 ]---

This is due to the nft_chain_lookup_byid ignoring the genmask. After this
change, adding the new rule will fail as it will not find the chain.

Fixes: 837830a4b4 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Cc: stable@vger.kernel.org
Reported-by: Mingi Cho of Theori working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05 14:42:16 +02:00
Florian Westphal
eaf9e7192e netfilter: conntrack: don't fold port numbers into addresses before hashing
Originally this used jhash2() over tuple and folded the zone id,
the pernet hash value, destination port and l4 protocol number into the
32bit seed value.

When the switch to siphash was done, I used an on-stack temporary
buffer to build a suitable key to be hashed via siphash().

But this showed up as performance regression, so I got rid of
the temporary copy and collected to-be-hashed data in 4 u64 variables.

This makes it easy to build tuples that produce the same hash, which isn't
desirable even though chain lengths are limited.

Switch back to plain siphash, but just like with jhash2(), take advantage
of the fact that most of to-be-hashed data is already in a suitable order.

Use an empty struct as annotation in 'struct nf_conntrack_tuple' to mark
last member that can be used as hash input.

The only remaining data that isn't present in the tuple structure are the
zone identifier and the pernet hash: fold those into the key.

Fixes: d2c806abcf ("netfilter: conntrack: use siphash_4u64")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05 14:42:16 +02:00
Florent Revest
6eef7a2b93 netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
If nf_conntrack_init_start() fails (for example due to a
register_nf_conntrack_bpf() failure), the nf_conntrack_helper_fini()
clean-up path frees the nf_ct_helper_hash map.

When built with NF_CONNTRACK=y, further netfilter modules (e.g:
netfilter_conntrack_ftp) can still be loaded and call
nf_conntrack_helpers_register(), independently of whether nf_conntrack
initialized correctly. This accesses the nf_ct_helper_hash dangling
pointer and causes a uaf, possibly leading to random memory corruption.

This patch guards nf_conntrack_helper_register() from accessing a freed
or uninitialized nf_ct_helper_hash pointer and fixes possible
uses-after-free when loading a conntrack module.

Cc: stable@vger.kernel.org
Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05 14:42:15 +02:00
Florian Westphal
8a9dc07ba9 netfilter: conntrack: gre: don't set assured flag for clash entries
Now that conntrack core is allowd to insert clashing entries, make sure
GRE won't set assured flag on NAT_CLASH entries, just like UDP.

Doing so prevents early_drop logic for these entries.

Fixes: d671fd82ea ("netfilter: conntrack: allow insertion clash of gre protocol")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05 14:42:15 +02:00
Pablo Neira Ayuso
1689f25924 netfilter: nf_tables: report use refcount overflow
Overflow use refcount checks are not complete.

Add helper function to deal with object reference counter tracking.
Report -EMFILE in case UINT_MAX is reached.

nft_use_dec() splats in case that reference counter underflows,
which should not ever happen.

Add nft_use_inc_restore() and nft_use_dec_restore() which are used
to restore reference counter from error and abort paths.

Use u32 in nft_flowtable and nft_object since helper functions cannot
work on bitfields.

Remove the few early incomplete checks now that the helper functions
are in place and used to check for refcount overflow.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05 14:42:15 +02:00
Linus Torvalds
c156d4af43 Merge tag 'leds-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
 "New Drivers:
   - Add support for Intel Cherry Trail Whiskey Cove PMIC LEDs
   - Add support for Awinic AW20036/AW20054/AW20072 LEDs

  New Device Support:
   - Add support for PMI632 LPG to QCom LPG
   - Add support for PMI8998 to QCom Flash
   - Add support for MT6331, WLEDs and MT6332 to Mediatek MT6323 PMIC

  New Functionality:
   - Implement the LP55xx Charge Pump
   - Add support for suspend / resume to Intel Cherry Trail Whiskey Cove PMIC
   - Add support for breathing mode to Intel Cherry Trail Whiskey Cove PMIC
   - Enable per-pin resolution Pinctrl in LEDs GPIO

  Fix-ups:
   - Allow thread to sleep by switching from spinlock to mutex
   - Add lots of Device Tree bindings / support
   - Adapt relationships / dependencies driven by Kconfig
   - Switch I2C drivers from .probe_new() to .probe()
   - Remove superfluous / duplicate code
   - Replace strlcpy() with strscpy() for efficiency and overflow prevention
   - Staticify various functions
   - Trivial: Fixing coding style
   - Simplify / reduce code

  Bug Fixes:
   - Prevent NETDEV_LED_MODE_LINKUP from being cleared on rename
   - Repair race between led_set_brightness(LED_{OFF,FULL})
   - Fix Oops relating to sleeping in critical sections
   - Clear LED_INIT_DEFAULT_TRIGGER flag when clearing the current trigger
   - Do not leak resources in error handling paths
   - Fix unsigned comparison which can never be negative
   - Provide missing NULL terminating entries in tables
   - Fix misnaming issues"

* tag 'leds-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (53 commits)
  leds: leds-mt6323: Adjust return/parameter types in wled get/set callbacks
  leds: sgm3140: Add richtek,rt5033-led compatible
  dt-bindings: leds: sgm3140: Document richtek,rt5033 compatible
  dt-bindings: backlight: kinetic,ktz8866: Add missing type for "current-num-sinks"
  dt-bindings: leds: Drop unneeded quotes
  leds: Fix config reference for AW200xx driver
  leds: leds-mt6323: Add support for WLEDs and MT6332
  leds: leds-mt6323: Add support for MT6331 leds
  leds: leds-mt6323: Open code and drop MT6323_CAL_HW_DUTY macro
  leds: leds-mt6323: Drop MT6323_ prefix from macros and defines
  leds: leds-mt6323: Specify registers and specs in platform data
  dt-bindings: leds: leds-mt6323: Document mt6332 compatible
  dt-bindings: leds: leds-mt6323: Document mt6331 compatible
  leds: simatic-ipc-leds-gpio: Introduce more Kconfig switches
  leds: simatic-ipc-leds-gpio: Split up into multiple drivers
  leds: simatic-ipc-leds-gpio: Move two extra gpio pins into another table
  leds: simatic-ipc-leds-gpio: Add terminating entries to gpio tables
  leds: flash: leds-qcom-flash: Fix an unsigned comparison which can never be negative
  leds: cht-wcove: Remove unneeded semicolon
  leds: cht-wcove: Fix an unsigned comparison which can never be negative
  ...
2023-07-03 11:26:05 -07:00
Linus Torvalds
3a8a670eee Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
 "WiFi 7 and sendpage changes are the biggest pieces of work for this
  release. The latter will definitely require fixes but I think that we
  got it to a reasonable point.

  Core:

   - Rework the sendpage & splice implementations

     Instead of feeding data into sockets page by page extend sendmsg
     handlers to support taking a reference on the data, controlled by a
     new flag called MSG_SPLICE_PAGES

     Rework the handling of unexpected-end-of-file to invoke an
     additional callback instead of trying to predict what the right
     combination of MORE/NOTLAST flags is

     Remove the MSG_SENDPAGE_NOTLAST flag completely

   - Implement SCM_PIDFD, a new type of CMSG type analogous to
     SCM_CREDENTIALS, but it contains pidfd instead of plain pid

   - Enable socket busy polling with CONFIG_RT

   - Improve reliability and efficiency of reporting for ref_tracker

   - Auto-generate a user space C library for various Netlink families

  Protocols:

   - Allow TCP to shrink the advertised window when necessary, prevent
     sk_rcvbuf auto-tuning from growing the window all the way up to
     tcp_rmem[2]

   - Use per-VMA locking for "page-flipping" TCP receive zerocopy

   - Prepare TCP for device-to-device data transfers, by making sure
     that payloads are always attached to skbs as page frags

   - Make the backoff time for the first N TCP SYN retransmissions
     linear. Exponential backoff is unnecessarily conservative

   - Create a new MPTCP getsockopt to retrieve all info
     (MPTCP_FULL_INFO)

   - Avoid waking up applications using TLS sockets until we have a full
     record

   - Allow using kernel memory for protocol ioctl callbacks, paving the
     way to issuing ioctls over io_uring

   - Add nolocalbypass option to VxLAN, forcing packets to be fully
     encapsulated even if they are destined for a local IP address

   - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
     in-kernel ECMP implementation (e.g. Open vSwitch) select the same
     link for all packets. Support L4 symmetric hashing in Open vSwitch

   - PPPoE: make number of hash bits configurable

   - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
     (ipconfig)

   - Add layer 2 miss indication and filtering, allowing higher layers
     (e.g. ACL filters) to make forwarding decisions based on whether
     packet matched forwarding state in lower devices (bridge)

   - Support matching on Connectivity Fault Management (CFM) packets

   - Hide the "link becomes ready" IPv6 messages by demoting their
     printk level to debug

   - HSR: don't enable promiscuous mode if device offloads the proto

   - Support active scanning in IEEE 802.15.4

   - Continue work on Multi-Link Operation for WiFi 7

  BPF:

   - Add precision propagation for subprogs and callbacks. This allows
     maintaining verification efficiency when subprograms are used, or
     in fact passing the verifier at all for complex programs,
     especially those using open-coded iterators

   - Improve BPF's {g,s}setsockopt() length handling. Previously BPF
     assumed the length is always equal to the amount of written data.
     But some protos allow passing a NULL buffer to discover what the
     output buffer *should* be, without writing anything

   - Accept dynptr memory as memory arguments passed to helpers

   - Add routing table ID to bpf_fib_lookup BPF helper

   - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands

   - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
     maps as read-only)

   - Show target_{obj,btf}_id in tracing link fdinfo

   - Addition of several new kfuncs (most of the names are
     self-explanatory):
      - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
        bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
        and bpf_dynptr_clone().
      - bpf_task_under_cgroup()
      - bpf_sock_destroy() - force closing sockets
      - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs

  Netfilter:

   - Relax set/map validation checks in nf_tables. Allow checking
     presence of an entry in a map without using the value

   - Increase ip_vs_conn_tab_bits range for 64BIT builds

   - Allow updating size of a set

   - Improve NAT tuple selection when connection is closing

  Driver API:

   - Integrate netdev with LED subsystem, to allow configuring HW
     "offloaded" blinking of LEDs based on link state and activity
     (i.e. packets coming in and out)

   - Support configuring rate selection pins of SFP modules

   - Factor Clause 73 auto-negotiation code out of the drivers, provide
     common helper routines

   - Add more fool-proof helpers for managing lifetime of MDIO devices
     associated with the PCS layer

   - Allow drivers to report advanced statistics related to Time Aware
     scheduler offload (taprio)

   - Allow opting out of VF statistics in link dump, to allow more VFs
     to fit into the message

   - Split devlink instance and devlink port operations

  New hardware / drivers:

   - Ethernet:
      - Synopsys EMAC4 IP support (stmmac)
      - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
      - Marvell 88E6250 7 port switches
      - Microchip LAN8650/1 Rev.B0 PHYs
      - MediaTek MT7981/MT7988 built-in 1GE PHY driver

   - WiFi:
      - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
      - Realtek RTL8723DS (SDIO variant)
      - Realtek RTL8851BE

   - CAN:
      - Fintek F81604

  Drivers:

   - Ethernet NICs:
      - Intel (100G, ice):
         - support dynamic interrupt allocation
         - use meta data match instead of VF MAC addr on slow-path
      - nVidia/Mellanox:
         - extend link aggregation to handle 4, rather than just 2 ports
         - spawn sub-functions without any features by default
      - OcteonTX2:
         - support HTB (Tx scheduling/QoS) offload
         - make RSS hash generation configurable
         - support selecting Rx queue using TC filters
      - Wangxun (ngbe/txgbe):
         - add basic Tx/Rx packet offloads
         - add phylink support (SFP/PCS control)
      - Freescale/NXP (enetc):
         - report TAPRIO packet statistics
      - Solarflare/AMD:
         - support matching on IP ToS and UDP source port of outer
           header
         - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
         - add devlink dev info support for EF10

   - Virtual NICs:
      - Microsoft vNIC:
         - size the Rx indirection table based on requested
           configuration
         - support VLAN tagging
      - Amazon vNIC:
         - try to reuse Rx buffers if not fully consumed, useful for ARM
           servers running with 16kB pages
      - Google vNIC:
         - support TCP segmentation of >64kB frames

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - enable USXGMII (88E6191X)
      - Microchip:
         - lan966x: add support for Egress Stage 0 ACL engine
         - lan966x: support mapping packet priority to internal switch
           priority (based on PCP or DSCP)

   - Ethernet PHYs:
      - Broadcom PHYs:
         - support for Wake-on-LAN for BCM54210E/B50212E
         - report LPI counter
      - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
      - Micrel PHYs: receive timestamp in the frame (LAN8841)
      - Realtek PHYs: support optional external PHY clock
      - Altera TSE PCS: merge the driver into Lynx PCS which it is a
        variant of

   - CAN: Kvaser PCIEcan:
      - support packet timestamping

   - WiFi:
      - Intel (iwlwifi):
         - major update for new firmware and Multi-Link Operation (MLO)
         - configuration rework to drop test devices and split the
           different families
         - support for segmented PNVM images and power tables
         - new vendor entries for PPAG (platform antenna gain) feature
      - Qualcomm 802.11ax (ath11k):
         - Multiple Basic Service Set Identifier (MBSSID) and Enhanced
           MBSSID Advertisement (EMA) support in AP mode
         - support factory test mode
      - RealTek (rtw89):
         - add RSSI based antenna diversity
         - support U-NII-4 channels on 5 GHz band
      - RealTek (rtl8xxxu):
         - AP mode support for 8188f
         - support USB RX aggregation for the newer chips"

* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
  net: scm: introduce and use scm_recv_unix helper
  af_unix: Skip SCM_PIDFD if scm->pid is NULL.
  net: lan743x: Simplify comparison
  netlink: Add __sock_i_ino() for __netlink_diag_dump().
  net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
  Revert "af_unix: Call scm_recv() only after scm_set_cred()."
  phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
  libceph: Partially revert changes to support MSG_SPLICE_PAGES
  net: phy: mscc: fix packet loss due to RGMII delays
  net: mana: use vmalloc_array and vcalloc
  net: enetc: use vmalloc_array and vcalloc
  ionic: use vmalloc_array and vcalloc
  pds_core: use vmalloc_array and vcalloc
  gve: use vmalloc_array and vcalloc
  octeon_ep: use vmalloc_array and vcalloc
  net: usb: qmi_wwan: add u-blox 0x1312 composition
  perf trace: fix MSG_SPLICE_PAGES build error
  ipvlan: Fix return value of ipvlan_queue_xmit()
  netfilter: nf_tables: fix underflow in chain reference counter
  netfilter: nf_tables: unbind non-anonymous set if rule construction fails
  ...
2023-06-28 16:43:10 -07:00
Linus Torvalds
582c161cf3 Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "There are three areas of note:

  A bunch of strlcpy()->strscpy() conversions ended up living in my tree
  since they were either Acked by maintainers for me to carry, or got
  ignored for multiple weeks (and were trivial changes).

  The compiler option '-fstrict-flex-arrays=3' has been enabled
  globally, and has been in -next for the entire devel cycle. This
  changes compiler diagnostics (though mainly just -Warray-bounds which
  is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
  coverage. In other words, there are no new restrictions, just
  potentially new warnings. Any new FORTIFY warnings we've seen have
  been fixed (usually in their respective subsystem trees). For more
  details, see commit df8fc4e934.

  The under-development compiler attribute __counted_by has been added
  so that we can start annotating flexible array members with their
  associated structure member that tracks the count of flexible array
  elements at run-time. It is possible (likely?) that the exact syntax
  of the attribute will change before it is finalized, but GCC and Clang
  are working together to sort it out. Any changes can be made to the
  macro while we continue to add annotations.

  As an example of that last case, I have a treewide commit waiting with
  such annotations found via Coccinelle:

    https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b

  Also see commit dd06e72e68 for more details.

  Summary:

   - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)

   - Convert strreplace() to return string start (Andy Shevchenko)

   - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)

   - Add missing function prototypes seen with W=1 (Arnd Bergmann)

   - Fix strscpy() kerndoc typo (Arne Welzel)

   - Replace strlcpy() with strscpy() across many subsystems which were
     either Acked by respective maintainers or were trivial changes that
     went ignored for multiple weeks (Azeem Shaikh)

   - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)

   - Add KUnit tests for strcat()-family

   - Enable KUnit tests of FORTIFY wrappers under UML

   - Add more complete FORTIFY protections for strlcat()

   - Add missed disabling of FORTIFY for all arch purgatories.

   - Enable -fstrict-flex-arrays=3 globally

   - Tightening UBSAN_BOUNDS when using GCC

   - Improve checkpatch to check for strcpy, strncpy, and fake flex
     arrays

   - Improve use of const variables in FORTIFY

   - Add requested struct_size_t() helper for types not pointers

   - Add __counted_by macro for annotating flexible array size members"

* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
  netfilter: ipset: Replace strlcpy with strscpy
  uml: Replace strlcpy with strscpy
  um: Use HOST_DIR for mrproper
  kallsyms: Replace all non-returning strlcpy with strscpy
  sh: Replace all non-returning strlcpy with strscpy
  of/flattree: Replace all non-returning strlcpy with strscpy
  sparc64: Replace all non-returning strlcpy with strscpy
  Hexagon: Replace all non-returning strlcpy with strscpy
  kobject: Use return value of strreplace()
  lib/string_helpers: Change returned value of the strreplace()
  jbd2: Avoid printing outside the boundary of the buffer
  checkpatch: Check for 0-length and 1-element arrays
  riscv/purgatory: Do not use fortified string functions
  s390/purgatory: Do not use fortified string functions
  x86/purgatory: Do not use fortified string functions
  acpi: Replace struct acpi_table_slit 1-element array with flex-array
  clocksource: Replace all non-returning strlcpy with strscpy
  string: use __builtin_memcpy() in strlcpy/strlcat
  staging: most: Replace all non-returning strlcpy with strscpy
  drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
  ...
2023-06-27 21:24:18 -07:00
Jakub Kicinski
3674fbf045 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.5 net-next PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27 09:45:22 -07:00
Jakub Kicinski
61dc651cdf Merge tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Allow slightly larger IPVS connection table size from Kconfig for
   64-bit arch, from Abhijeet Rastogi.

2) Since IPVS connection table might be larger than 2^20 after previous
   patch, allow to limit it depending on the available memory.
   Moreover, use kvmalloc. From Julian Anastasov.

3) Do not rebuild VLAN header in nft_payload when matching source and
   destination MAC address.

4) Remove nested rcu read lock side in ip_set_test(), from Florian Westphal.

5) Allow to update set size, also from Florian.

6) Improve NAT tuple selection when connection is closing,
   from Florian Westphal.

7) Support for resetting set element stateful expression, from Phil Sutter.

8) Use NLA_POLICY_MAX to narrow down maximum attribute value in nf_tables,
   from Florian Westphal.

* tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: limit allowed range via nla_policy
  netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET
  netfilter: snat: evict closing tcp entries on reply tuple collision
  netfilter: nf_tables: permit update of set size
  netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test
  netfilter: nft_payload: rebuild vlan header when needed
  ipvs: dynamically limit the connection hash table
  ipvs: increase ip_vs_conn_tab_bits range for 64BIT
====================

Link: https://lore.kernel.org/r/20230626064749.75525-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-26 12:59:18 -07:00
Pablo Neira Ayuso
b389139f12 netfilter: nf_tables: fix underflow in chain reference counter
Set element addition error path decrements reference counter on chains
twice: once on element release and again via nft_data_release().

Then, d6b478666f ("netfilter: nf_tables: fix underflow in object
reference counter") incorrectly fixed this by removing the stateful
object reference count decrement.

Restore the stateful object decrement as in b91d903688 ("netfilter:
nf_tables: fix leaking object reference count") and let
nft_data_release() decrement the chain reference counter, so this is
done only once.

Fixes: d6b478666f ("netfilter: nf_tables: fix underflow in object reference counter")
Fixes: 628bd3e49c ("netfilter: nf_tables: drop map element references from preparation phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 17:18:55 +02:00
Pablo Neira Ayuso
3e70489721 netfilter: nf_tables: unbind non-anonymous set if rule construction fails
Otherwise a dangling reference to a rule object that is gone remains
in the set binding list.

Fixes: 26b5a5712e ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 17:18:55 +02:00
Ilia.Gavrilov
f188d30087 netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
ct_sip_parse_numerical_param() returns only 0 or 1 now.
But process_register_request() and process_register_response() imply
checking for a negative value if parsing of a numerical header parameter
failed.
The invocation in nf_nat_sip() looks correct:
 	if (ct_sip_parse_numerical_param(...) > 0 &&
 	    ...) { ... }

Make the return value of the function ct_sip_parse_numerical_param()
a tristate to fix all the cases
a) return 1 if value is found; *val is set
b) return 0 if value is not found; *val is unchanged
c) return -1 on error; *val is undefined

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: 0f32a40fc9 ("[NETFILTER]: nf_conntrack_sip: create signalling expectations")
Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 17:18:48 +02:00
Florian Westphal
ff0a3a7d52 netfilter: conntrack: dccp: copy entire header to stack buffer, not just basic one
Eric Dumazet says:
  nf_conntrack_dccp_packet() has an unique:

  dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);

  And nothing more is 'pulled' from the packet, depending on the content.
  dh->dccph_doff, and/or dh->dccph_x ...)
  So dccp_ack_seq() is happily reading stuff past the _dh buffer.

BUG: KASAN: stack-out-of-bounds in nf_conntrack_dccp_packet+0x1134/0x11c0
Read of size 4 at addr ffff000128f66e0c by task syz-executor.2/29371
[..]

Fix this by increasing the stack buffer to also include room for
the extra sequence numbers and all the known dccp packet type headers,
then pull again after the initial validation of the basic header.

While at it, mark packets invalid that lack 48bit sequence bit but
where RFC says the type MUST use them.

Compile tested only.

v2: first skb_header_pointer() now needs to adjust the size to
    only pull the generic header. (Eric)

Heads-up: I intend to remove dccp conntrack support later this year.

Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 13:26:39 +02:00
Florian Westphal
a412dbf40f netfilter: nf_tables: limit allowed range via nla_policy
These NLA_U32 types get stored in u8 fields, reject invalid values
instead of silently casting to u8.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:57 +02:00
Phil Sutter
079cd63321 netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET
Analogous to NFT_MSG_GETOBJ_RESET, but for set elements with a timeout
or attached stateful expressions like counters or quotas - reset them
all at once. Respect a per element timeout value if present to reset the
'expires' value to.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:57 +02:00
Florian Westphal
4589725502 netfilter: snat: evict closing tcp entries on reply tuple collision
When all tried source tuples are in use, the connection request (skb)
and the new conntrack will be dropped in nf_confirm() due to the
non-recoverable clash.

Make it so that the last 32 attempts are allowed to evict a colliding
entry if this connection is already closing and the new sequence number
has advanced past the old one.

Such "all tuples taken" secenario can happen with tcp-rpc workloads where
same dst:dport gets queried repeatedly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:57 +02:00
Florian Westphal
96b2ef9b16 netfilter: nf_tables: permit update of set size
Now that set->nelems is always updated permit update of the sets max size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:57 +02:00
Florian Westphal
78aa23d008 netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test
Callers already hold rcu_read_lock.

Prior to RCU conversion this used to be a read_lock_bh(), but now the
bh-disable isn't needed anymore.

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:56 +02:00
Pablo Neira Ayuso
de6843be30 netfilter: nft_payload: rebuild vlan header when needed
Skip rebuilding the vlan header when accessing destination and source
mac address.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:45 +02:00
Jakub Kicinski
a7384f3918 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

tools/testing/selftests/net/fcnal-test.sh
  d7a2fc1437 ("selftests: net: fcnal-test: check if FIPS mode is enabled")
  dd017c72dd ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.")
https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/
https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22 18:40:38 -07:00
Phil Sutter
42e344f016 netfilter: nf_tables: Fix for deleting base chains with payload
When deleting a base chain, iptables-nft simply submits the whole chain
to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code
added by fixed commit then turned this into a chain update, destroying
the hook but not the chain itself. Detect the situation by checking if
the chain type is either netdev or inet/ingress.

Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20 22:43:42 +02:00
Pablo Neira Ayuso
62f9a68a36 netfilter: nfnetlink_osf: fix module autoload
Move the alias from xt_osf to nfnetlink_osf.

Fixes: f932495208 ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20 22:43:42 +02:00
Pablo Neira Ayuso
043d2acf57 netfilter: nf_tables: drop module reference after updating chain
Otherwise the module reference counter is leaked.

Fixes b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20 22:43:42 +02:00
Pablo Neira Ayuso
e26d3009ef netfilter: nf_tables: disallow timeout for anonymous sets
Never used from userspace, disallow these parameters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20 22:43:41 +02:00