Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Allow slightly larger IPVS connection table size from Kconfig for
64-bit arch, from Abhijeet Rastogi.
2) Since IPVS connection table might be larger than 2^20 after previous
patch, allow to limit it depending on the available memory.
Moreover, use kvmalloc. From Julian Anastasov.
3) Do not rebuild VLAN header in nft_payload when matching source and
destination MAC address.
4) Remove nested rcu read lock side in ip_set_test(), from Florian Westphal.
5) Allow to update set size, also from Florian.
6) Improve NAT tuple selection when connection is closing,
from Florian Westphal.
7) Support for resetting set element stateful expression, from Phil Sutter.
8) Use NLA_POLICY_MAX to narrow down maximum attribute value in nf_tables,
from Florian Westphal.
* tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: limit allowed range via nla_policy
netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET
netfilter: snat: evict closing tcp entries on reply tuple collision
netfilter: nf_tables: permit update of set size
netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test
netfilter: nft_payload: rebuild vlan header when needed
ipvs: dynamically limit the connection hash table
ipvs: increase ip_vs_conn_tab_bits range for 64BIT
====================
Link: https://lore.kernel.org/r/20230626064749.75525-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edward Cree says:
====================
sfc: fix unaligned access in loopback selftests
Arnd reported that the sfc drivers each define a packed loopback_payload
structure with an ethernet header followed by an IP header, whereas the
kernel definition of iphdr specifies that this is 4-byte aligned,
causing a W=1 warning.
Fix this in each case by adding two bytes of leading padding to the
struct, taking care that these are not sent on the wire.
Tested on EF10; build-tested on Siena and Falcon.
Changed in v2:
* added __aligned(4) to payload struct definitions (Arnd)
* fixed dodgy whitespace (checkpatch)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two bytes of padding to the start of struct ef4_loopback_payload,
which are not sent on the wire. This ensures the 'ip' member is
4-byte aligned, preventing the following W=1 warning:
net/ethernet/sfc/falcon/selftest.c:43:15: error: field ip within 'struct ef4_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct ef4_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct iphdr ip;
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two bytes of padding to the start of struct efx_loopback_payload,
which are not sent on the wire. This ensures the 'ip' member is
4-byte aligned, preventing the following W=1 warning:
net/ethernet/sfc/siena/selftest.c:46:15: error: field ip within 'struct efx_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct efx_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct iphdr ip;
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two bytes of padding to the start of struct efx_loopback_payload,
which are not sent on the wire. This ensures the 'ip' member is
4-byte aligned, preventing the following W=1 warning:
net/ethernet/sfc/selftest.c:46:15: error: field ip within 'struct efx_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct efx_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct iphdr ip;
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These NLA_U32 types get stored in u8 fields, reject invalid values
instead of silently casting to u8.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Analogous to NFT_MSG_GETOBJ_RESET, but for set elements with a timeout
or attached stateful expressions like counters or quotas - reset them
all at once. Respect a per element timeout value if present to reset the
'expires' value to.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When all tried source tuples are in use, the connection request (skb)
and the new conntrack will be dropped in nf_confirm() due to the
non-recoverable clash.
Make it so that the last 32 attempts are allowed to evict a colliding
entry if this connection is already closing and the new sequence number
has advanced past the old one.
Such "all tuples taken" secenario can happen with tcp-rpc workloads where
same dst:dport gets queried repeatedly.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that set->nelems is always updated permit update of the sets max size.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Callers already hold rcu_read_lock.
Prior to RCU conversion this used to be a read_lock_bh(), but now the
bh-disable isn't needed anymore.
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ocfs2 uses kzalloc() to allocate buffers for o2net_hand, o2net_keep_req and
o2net_keep_resp and then passes these to sendpage. This isn't really
allowed as the lifetime of slab objects is not controlled by page ref -
though in this case it will probably work. sendmsg() with MSG_SPLICE_PAGES
will, however, print a warning and give an error.
Fix it to use folio_alloc() instead to allocate a buffer for the handshake
message, keepalive request and reply messages.
Fixes: 98211489d4 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mark Fasheh <mark@fasheh.com>
cc: Kurt Hackel <kurt.hackel@oracle.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: Joseph Qi <joseph.qi@linux.alibaba.com>
cc: ocfs2-devel@oss.oracle.com
Link: https://lore.kernel.org/r/20230623225513.2732256-14-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage in
skb_send_sock(). This causes pages to be spliced from the source iterator
if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Note that this could perhaps be improved to fill out a bvec array with all
the frags and then make a single sendmsg call, possibly sticking the header
on the front also.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/20230623225513.2732256-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5-updates-2023-06-21
mlx5 driver minor cleanup and fixes to net-next
* tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Remove pointless vport lookup from mlx5_esw_check_port_type()
net/mlx5: Remove redundant check from mlx5_esw_query_vport_vhca_id()
net/mlx5: Remove redundant is_mdev_switchdev_mode() check from is_ib_rep_supported()
net/mlx5: Remove redundant MLX5_ESWITCH_MANAGER() check from is_ib_rep_supported()
net/mlx5e: E-Switch, Fix shared fdb error flow
net/mlx5e: Remove redundant comment
net/mlx5e: E-Switch, Pass other_vport flag if vport is not 0
net/mlx5e: E-Switch, Use xarray for devcom paired device index
net/mlx5e: E-Switch, Add peer fdb miss rules for vport manager or ecpf
net/mlx5e: Use vhca_id for device index in vport rx rules
net/mlx5: Lag, Remove duplicate code checking lag is supported
net/mlx5: Fix error code in mlx5_is_reset_now_capable()
net/mlx5: Fix reserved at offset in hca_cap register
net/mlx5: Fix SFs kernel documentation error
net/mlx5: Fix UAF in mlx5_eswitch_cleanup()
====================
Link: https://lore.kernel.org/r/20230623192907.39033-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter says:
====================
netlink: add display-hint to ynl
Add a display-hint property to the netlink schema, to be used by generic
netlink clients as hints about how to display attribute values.
A display-hint on an attribute definition is intended for letting a
client such as ynl know that, for example, a u32 should be rendered as
an ipv4 address. The display-hint enumeration includes a small number of
networking domain-specific value types.
====================
Link: https://lore.kernel.org/r/20230623201928.14275-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a display-hint property to the netlink schema that is for providing
optional hints to generic netlink clients about how to display attribute
values. A display-hint on an attribute definition is intended for
letting a client such as ynl know that, for example, a u32 should be
rendered as an ipv4 address. The display-hint enumeration includes a
small number of networking domain-specific value types.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230623201928.14275-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Miquel Raynal says:
====================
Core WPAN changes:
- Support for active scans
- Support for answering BEACON_REQ
- Specific MLME handling for limited devices
WPAN driver changes:
- ca8210:
- Flag the devices as limited
- Remove stray gpiod_unexport() call
* tag 'ieee802154-for-net-next-2023-06-23' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next:
ieee802154: ca8210: Remove stray gpiod_unexport() call
ieee802154: ca8210: Flag the driver as being limited
net: ieee802154: Handle limited devices with only datagram support
mac802154: Handle received BEACON_REQ
ieee802154: Add support for allowing to answer BEACON_REQ
mac802154: Handle active scanning
ieee802154: Add support for user active scan requests
====================
Link: https://lore.kernel.org/r/20230623195506.40b87b5f@xps-13
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau says:
====================
selftests: mptcp: Refactoring and minor fixes
Patch 1 moves code around for clarity and improved code reuse.
Patch 2 makes use of new MPTCP info that consolidates MPTCP-level and
subflow-level information.
Patches 3-7 refactor code to favor limited-scope environment vars over
optional parameters.
Patch 8: typo fix
====================
Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-0-a883213c8ba9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
run_tests() accepts too many optional parameters. Before this modification,
it was required to set all of then when only the last one had to be
changed. That's not clear to see all these 0 and it makes the maintenance
harder:
run_tests $ns1 $ns2 10.0.1.1 1 2 3 slow
Instead, the parameter can be set as an env var with a limited scope:
foo=1 bar=2 next=3 \
run_tests $ns1 $ns2 10.0.1.1 slow
This patch switches to key/value "sflags=*" instead of positional parameter
sflags of do_transfer() and run_tests().
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-6-a883213c8ba9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
run_tests() accepts too many optional parameters. Before this modification,
it was required to set all of then when only the last one had to be
changed. That's not clear to see all these 0 and it makes the maintenance
harder:
run_tests $ns1 $ns2 10.0.1.1 1 2 3 slow
Instead, the parameter can be set as an env var with a limited scope:
foo=1 bar=2 next=3 \
run_tests $ns1 $ns2 10.0.1.1 slow
This patch switches to key/value "addr_nr_ns1=*, addr_nr_ns2=*" instead
of positional parameters addr_nr_ns1 and addr_nr_ns2 of do_transfer()
and run_tests().
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-5-a883213c8ba9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
run_tests() accepts too many optional parameters. Before this modification,
it was required to set all of then when only the last one had to be
changed. That's not clear to see all these 0 and it makes the maintenance
harder:
run_tests $ns1 $ns2 10.0.1.1 1 2 3 slow
Instead, the parameter can be set as an env var with a limited scope:
foo=1 bar=2 next=3 \
run_tests $ns1 $ns2 10.0.1.1 slow
This patch switches to key/value "test_linkfail=*" instead of positional
parameter test_linkfail of do_transfer() and run_tests().
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-4-a883213c8ba9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
New MPTCP info are being checked in multiple places to improve the code
coverage when using the userspace PM.
This patch makes chk_mptcp_info() more generic to be able to check
subflows, add_addr_signal and add_addr_accepted info (and even more
later). New arguments are now required to get different infos from the
two namespaces because some counters are specific to the client or the
server.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-2-a883213c8ba9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bartosz Golaszewski says:
====================
net: stmmac: introduce devres helpers for stmmac platform drivers
The goal of this series is two-fold: to make the API for stmmac platforms more
logically correct (by providing functions that acquire resources with release
counterparts that undo only their actions and nothing more) and to provide
devres variants of commonly use registration functions that allows to
significantly simplify the platform drivers.
The current pattern for stmmac platform drivers is to call
stmmac_probe_config_dt(), possibly the platform's init() callback and then
call stmmac_drv_probe(). The resources allocated by these calls will then
be released by calling stmmac_pltfr_remove(). This goes against the commonly
accepted way of providing each function that allocated a resource with a
function that frees it.
First: provide wrappers around platform's init() and exit() callbacks that
allow users to skip checking if the callbacks exist manually.
Second: provide stmmac_pltfr_probe() which calls the platform init() callback
and then calls stmmac_drv_probe() together with a variant of
stmmac_pltfr_remove() that DOES NOT call stmmac_remove_config_dt(). For now
this variant is called stmmac_pltfr_remove_no_dt() but once all users of
the old stmmac_pltfr_remove() are converted to the devres helper, it will be
renamed back to stmmac_pltfr_remove() and the no_dt function removed.
Finally use the devres helpers in dwmac-qco-ethqos to show how much simplier
the driver's probe() becomes.
This series obviously just starts the conversion process and other platform
drivers will need to be converted once the helpers land in net/.
====================
Link: https://lore.kernel.org/r/20230623100417.93592-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>