Current GRO stack only supports incoming packets containing
one frame/MSS.
This patch changes GRO to accept packets that are already GRO.
HW-GRO (aka RSC for some vendors) is very often limited in presence
of interleaved packets. Linux SW GRO stack can complete the job
and provide larger GRO packets, thus reducing rate of ACK packets
and cpu overhead.
This also means BIG TCP can still be used, even if HW-GRO/RSC was
able to cook ~64 KB GRO packets.
v2: fix logic in tcp_gro_receive()
Only support TCP for the moment (Paolo)
Co-Developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MPTCP data path is quite complex and hard to understend even
without some foggy comments referring to modified code and/or
completely misleading from the beginning.
Update a few of them to more accurately describing the current
status.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daire reported a user-space application hang-up when the
peer is forcibly closed before the data transfer completion.
The relevant application expects the peer to either
do an application-level clean shutdown or a transport-level
connection reset.
We can accommodate a such user by extending the fastclose
usage: at fd close time, if the msk socket has some unread
data, and at FIN_WAIT timeout.
Note that at MPTCP close time we must ensure that the TCP
subflows will reset: set the linger socket option to a suitable
value.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an mptcp socket is closed due to an incoming FASTCLOSE
option, so specific sk_err is set and later syscall will
fail usually with EPIPE.
Align the current fastclose error handling with TCP reset,
properly setting the socket error according to the current
msk state and propagating such error.
Additionally sendmsg() is currently not handling properly
the sk_err, always returning EPIPE.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
Reduces duplicate code, no actual functional changes.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
Reduces duplicate code, no actual functional changes.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Refactor selftests to use an array of structs in xfrm_fill_key().
From Gautam Menghani.
2) Drop an unused argument from xfrm_policy_match.
From Hongbin Wang.
3) Support collect metadata mode for xfrm interfaces.
From Eyal Birger.
4) Add netlink extack support to xfrm.
From Sabrina Dubroca.
Please note, there is a merge conflict in:
include/net/dst_metadata.h
between commit:
0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
from the net-next tree and commit:
5182a5d48c ("net: allow storing xfrm interface metadata in metadata_dst")
from the ipsec-next tree.
Can be solved as done in linux-next.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
All bind_class callbacks are directly returned when n arg is empty.
Therefore, bind_class is invoked only when n arg is not empty.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since dsa_port_devlink_setup() and dsa_port_devlink_teardown() are
already called from code paths which only execute once per port (due to
the existing bool dp->setup), keeping another dp->devlink_port_setup is
redundant, because we can already manage to balance the calls properly
(and not call teardown when setup was never called, or call setup twice,
or things like that).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 3122433eb5 ("net: dsa: Register devlink ports before calling DSA driver setup()")
moved devlink port setup to be done early before driver setup()
was called. That is no longer needed, so move the devlink port
initialization back to dsa_port_setup(), as the first thing done there.
Note there is no longer needed to reinit port as unused if
dsa_port_setup() fails, as it unregisters the devlink port instance on
the error path.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a desire to simplify the dsa_port registration path with
devlink, and this involves reworking a bit how user ports which fail to
connect to their PHY (because it's missing) get reinitialized as UNUSED
devlink ports.
The desire is for the change to look something like this; basically
dsa_port_setup() has failed, we just change dp->type and call
dsa_port_setup() again.
-/* Destroy the current devlink port, and create a new one which has the UNUSED
- * flavour.
- */
-static int dsa_port_reinit_as_unused(struct dsa_port *dp)
+static int dsa_port_setup_as_unused(struct dsa_port *dp)
{
- dsa_port_devlink_teardown(dp);
dp->type = DSA_PORT_TYPE_UNUSED;
- return dsa_port_devlink_setup(dp);
+ return dsa_port_setup(dp);
}
For an UNUSED port, dsa_port_setup() mostly only calls dsa_port_devlink_setup()
anyway, so we could get away with calling just that. But if we call the
full blown dsa_port_setup(dp) (which will be needed to properly set
dp->setup = true), the callee will have the tendency to go through this
code block too, and call dsa_port_disable(dp):
switch (dp->type) {
case DSA_PORT_TYPE_UNUSED:
dsa_port_disable(dp);
break;
That is not very good, because dsa_port_disable() has this hidden inside
of it:
if (dp->pl)
phylink_stop(dp->pl);
Fact is, we are not prepared to handle a call to dsa_port_disable() with
a struct dsa_port that came from a previous (and failed) call to
dsa_port_setup(). We do not clean up dp->pl, and this will make the
second call to dsa_port_setup() call phylink_stop() on a dangling dp->pl
pointer.
Solve this by creating an API for phylink destruction which is symmetric
to the phylink creation, and never leave dp->pl set to anything except
NULL or a valid phylink structure.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move port_setup() op to be called before devlink_port_register() and
port_teardown() after devlink_port_unregister().
Note it makes sense to move this alongside the rest of the devlink port
code, the reinit() function also gets much nicer, as clearly the fact that
port_setup()->devlink_port_region_create() was called in dsa_port_setup
did not fit the flow.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lifetime of some of the devlink objects, like regions, is currently
forced to be different for devlink instance and devlink port instance
(per-port regions). The reason is that for devlink ports, the internal
structures initialization happens only after devlink_port_register() is
called.
To resolve this inconsistency, introduce new set of helpers to allow
driver to initialize devlink pointer and region list before
devlink_register() is called. That allows port regions to be created
before devlink port registration and destroyed after devlink
port unregistration.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of relying on devlink pointer not being initialized, introduce
an extra flag to indicate if devlink port is registered. This is needed
as later on devlink pointer is going to be initialized even in case
devlink port is not registered yet.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of checking devlink_port->devlink pointer for not being NULL
which indicates that devlink port is registered, put this check to new
pair of helpers similar to what we have for devlink and use them in
other functions.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next
- Add RTL8761BUV device (Edimax BT-8500)
- Add a new PID/VID 13d3/3583 for MT7921
- Add Realtek RTL8852C support ID 0x13D3:0x3592
- Add VID/PID 0489/e0e0 for MediaTek MT7921
- Add a new VID/PID 0e8d/0608 for MT7921
- Add a new PID/VID 13d3/3578 for MT7921
- Add BT device 0cb8:c549 from RTW8852AE
- Add support for Intel Magnetor
* tag 'for-net-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (49 commits)
Bluetooth: hci_sync: Fix not indicating power state
Bluetooth: L2CAP: Fix user-after-free
Bluetooth: Call shutdown for HCI_USER_CHANNEL
Bluetooth: Prevent double register of suspend
Bluetooth: hci_core: Fix not handling link timeouts propertly
Bluetooth: hci_event: Make sure ISO events don't affect non-ISO connections
Bluetooth: hci_debugfs: Fix not checking conn->debugfs
Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
Bluetooth: MGMT: fix zalloc-simple.cocci warnings
Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure
Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/release
Bluetooth: hci_sync: allow advertise when scan without RPA
Bluetooth: btusb: Add a new VID/PID 0e8d/0608 for MT7921
Bluetooth: btusb: Add a new PID/VID 13d3/3583 for MT7921
Bluetooth: avoid hci_dev_test_and_set_flag() in mgmt_init_hdev()
Bluetooth: btintel: Mark Intel controller to support LE_STATES quirk
Bluetooth: btintel: Add support for Magnetor
Bluetooth: btusb: Add a new PID/VID 13d3/3578 for MT7921
...
====================
Link: https://lore.kernel.org/r/20221001004602.297366-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When setting power state using legacy/non-mgmt API
(e.g hcitool hci0 up) the likes of mgmt_set_powered_complete won't be
called causing clients of the MGMT API to not be notified of the change
of the state.
Fixes: cf75ad8b41 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Kalle Valo says:
====================
wireless-next patches for v6.1
Few stack changes and lots of driver changes in this round. brcmfmac
has more activity as usual and it gets new hardware support. ath11k
improves WCN6750 support and also other smaller features. And of
course changes all over.
Note: in early September wireless tree was merged to wireless-next to
avoid some conflicts with mac80211 patches, this shouldn't cause any
problems but wanted to mention anyway.
Major changes:
mac80211
- refactoring and preparation for Wi-Fi 7 Multi-Link Operation (MLO)
feature continues
brcmfmac
- support CYW43439 SDIO chipset
- support BCM4378 on Apple platforms
- support CYW89459 PCIe chipset
rtw89
- more work to get rtw8852c supported
- P2P support
- support for enabling and disabling MSDU aggregation via nl80211
mt76
- tx status reporting improvements
ath11k
- cold boot calibration support on WCN6750
- Target Wake Time (TWT) debugfs support for STA interface
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- implement SRAM dump debugfs interface
- enable threaded NAPI on all hardware
- WoW support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
wcn36xx
- add SNR from a received frame as a source of system entropy
* tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits)
wifi: rtl8xxxu: Improve rtl8xxxu_queue_select
wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
wifi: rtl8xxxu: gen2: Enable 40 MHz channel width
wifi: rtw89: 8852b: configure DLE mem
wifi: rtw89: check DLE FIFO size with reserved size
wifi: rtw89: mac: correct register of report IMR
wifi: rtw89: pci: set power cut closed for 8852be
wifi: rtw89: pci: add to do PCI auto calibration
wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf
wifi: rtw89: add DMA busy checking bits to chip info
wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels
wifi: rtw89: pci: mask out unsupported TX channels
iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
wifi: iwlwifi: Track scan_cmd allocation size explicitly
brcmfmac: Remove the call to "dtim_assoc" IOVAR
brcmfmac: increase dcmd maximum buffer size
brcmfmac: Support 89459 pcie
brcmfmac: increase default max WOWL patterns to 16
cw1200: fix incorrect check to determine if no element is found in list
...
====================
Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As 2.5G, 5G ethernet ports are more common and affordable,
these ports are being used in LAN bridge devices.
STP port_cost() is missing path_cost assignment for these link speeds,
causes highest cost 100 being used.
This result in lower speed port being picked
when there is loop between 5G and 1G ports.
Original path_cost: 10G=2, 1G=4, 100m=19, 10m=100
Adjusted path_cost: 10G=2, 5G=3, 2.5G=4, 1G=5, 100m=19, 10m=100
speed greater than 10G = 1
Signed-off-by: Steven Hsieh <steven.hsieh@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull already contains all of the checks performed by
pskb_pull.
Use pskb_may_pull for validation in pskb_pull, eliminating the
duplication and making __pskb_pull obsolete.
Replace __pskb_pull with pskb_pull where applicable.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data
types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the
existence of a per traffic class limitation of maximum frame sizes, with
a fallback on the port-based MTU.
As far as I am able to understand, the 802.1Q Service Data Unit (SDU)
represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding
any number of prepended VLAN headers which may be otherwise present in
the MSDU. Therefore, the queueMaxSDU is directly comparable to the
device MTU (1500 means L2 payload sizes are accepted, or frame sizes of
1518 octets, or 1522 plus one VLAN header). Drivers which offload this
are directly responsible of translating into other units of measurement.
To keep the fast path checks optimized, we keep 2 arrays in the qdisc,
one for max_sdu translated into frame length (so that it's comparable to
skb->len), and another for offloading and for dumping back to the user.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When adding optional new features to Qdisc offloads, existing drivers
must reject the new configuration until they are coded up to act on it.
Since modifying all drivers in lockstep with the changes in the Qdisc
can create problems of its own, it would be nice if there existed an
automatic opt-in mechanism for offloading optional features.
Jakub proposes that we multiplex one more kind of call through
ndo_setup_tc(): one where the driver populates a Qdisc-specific
capability structure.
First user will be taprio in further changes. Here we are introducing
the definitions for the base functionality.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 3226b158e6 ("net: avoid 32 x truesize under-estimation
for tiny skbs") we are observing 10-20% regressions in performance
tests with small packets. The perf trace points to high pressure on
the slab allocator.
This change tries to improve the allocation schema for small packets
using an idea originally suggested by Eric: a new per CPU page frag is
introduced and used in __napi_alloc_skb to cope with small allocation
requests.
To ensure that the above does not lead to excessive truesize
underestimation, the frag size for small allocation is inflated to 1K
and all the above is restricted to build with 4K page size.
Note that we need to update accordingly the run-time check introduced
with commit fd9ea57f4e ("net: add napi_get_frags_check() helper").
Alex suggested a smart page refcount schema to reduce the number
of atomic operations and deal properly with pfmemalloc pages.
Under small packet UDP flood, I measure a 15% peak tput increases.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Alexander H Duyck <alexanderduyck@fb.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This uses l2cap_chan_hold_unless_zero() after calling
__l2cap_get_chan_blah() to prevent the following trace:
Bluetooth: l2cap_core.c:static void l2cap_chan_destroy(struct kref
*kref)
Bluetooth: chan 0000000023c4974d
Bluetooth: parent 00000000ae861c08
==================================================================
BUG: KASAN: use-after-free in __mutex_waiter_is_first
kernel/locking/mutex.c:191 [inline]
BUG: KASAN: use-after-free in __mutex_lock_common
kernel/locking/mutex.c:671 [inline]
BUG: KASAN: use-after-free in __mutex_lock+0x278/0x400
kernel/locking/mutex.c:729
Read of size 8 at addr ffff888006a49b08 by task kworker/u3:2/389
Link: https://lore.kernel.org/lkml/20220622082716.478486-1-lee.jones@linaro.org
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Pull networking fixes from Paolo Abeni:
"Including fixes from wifi and can.
Current release - regressions:
- phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()
- wifi: fix locking in mac80211 mlme
- eth:
- revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
- mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
Previous releases - regressions:
- wifi: fix regression with non-QoS drivers
Previous releases - always broken:
- mptcp: fix unreleased socket in accept queue
- wifi:
- don't start TX with fq->lock to fix deadlock
- fix memory corruption in minstrel_ht_update_rates()
- eth:
- macb: fix ZynqMP SGMII non-wakeup source resume failure
- mt7531: only do PLL once after the reset
- usbnet: fix memory leak in usbnet_disconnect()
Misc:
- usb: qmi_wwan: add new usb-id for Dell branded EM7455"
* tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
mptcp: fix unreleased socket in accept queue
mptcp: factor out __mptcp_close() without socket lock
net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2}
net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge
can: c_can: don't cache TX messages for C_CAN cores
ice: xsk: drop power of 2 ring size restriction for AF_XDP
ice: xsk: change batched Tx descriptor cleaning
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
selftests: Fix the if conditions of in test_extra_filter()
net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
net: stmmac: power up/down serdes in stmmac_open/release
wifi: mac80211: mlme: Fix double unlock on assoc success handling
wifi: mac80211: mlme: Fix missing unlock on beacon RX
wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
wifi: mac80211: fix regression with non-QoS drivers
wifi: mac80211: ensure vif queues are operational after start
wifi: mac80211: don't start TX with fq->lock to fix deadlock
wifi: cfg80211: fix MCS divisor value
net: hippi: Add missing pci_disable_device() in rr_init_one()
net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
...
The mptcp socket and its subflow sockets in accept queue can't be
released after the process exit.
While the release of a mptcp socket in listening state, the
corresponding tcp socket will be released too. Meanwhile, the tcp
socket in the unaccept queue will be released too. However, only init
subflow is in the unaccept queue, and the joined subflow is not in the
unaccept queue, which makes the joined subflow won't be released, and
therefore the corresponding unaccepted mptcp socket will not be released
to.
This can be reproduced easily with following steps:
1. create 2 namespace and veth:
$ ip netns add mptcp-client
$ ip netns add mptcp-server
$ sysctl -w net.ipv4.conf.all.rp_filter=0
$ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
$ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
$ ip link add red-client netns mptcp-client type veth peer red-server \
netns mptcp-server
$ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
$ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
$ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
$ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
$ ip -n mptcp-server link set red-server up
$ ip -n mptcp-client link set red-client up
2. configure the endpoint and limit for client and server:
$ ip -n mptcp-server mptcp endpoint flush
$ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
$ ip -n mptcp-client mptcp endpoint flush
$ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
$ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
1 subflow
3. listen and accept on a port, such as 9999. The nc command we used
here is modified, which makes it use mptcp protocol by default.
$ ip netns exec mptcp-server nc -l -k -p 9999
4. open another *two* terminal and use each of them to connect to the
server with the following command:
$ ip netns exec mptcp-client nc 10.0.0.1 9999
Input something after connect to trigger the connection of the second
subflow. So that there are two established mptcp connections, with the
second one still unaccepted.
5. exit all the nc command, and check the tcp socket in server namespace.
And you will find that there is one tcp socket in CLOSE_WAIT state
and can't release forever.
Fix this by closing all of the unaccepted mptcp socket in
mptcp_subflow_queue_clean() with __mptcp_close().
Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
__mptcp_close() before they are released, so mptcp_sock_destruct(), which
is used to clean the unaccepted mptcp socket, is not needed anymore.
The selftests for mptcp is ran for this commit, and no new failures.
Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The v6_rcv_saddr and rcv_saddr are inside a union in the
'struct inet_bind2_bucket'. When searching a bucket by following the
bhash2 hashtable chain, eg. inet_bind2_bucket_match, it is only using
the sk->sk_family and there is no way to check if the inet_bind2_bucket
has a v6 or v4 address in the union. This leads to an uninit-value
KMSAN report in [0] and also potentially incorrect matches.
This patch fixes it by adding a family member to the inet_bind2_bucket
and then tests 'sk->sk_family != tb->family' before matching
the sk's address to the tb's address.
Cc: Joanne Koong <joannelkoong@gmail.com>
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20220927002544.3381205-1-kafai@fb.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can benefit from a smaller struct ubuf_info, so leave only mandatory
fields and let users to decide how they want to extend it. Convert
MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields.
This reduces the size from 48 bytes to just 16.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some drivers depend on shutdown being called for proper operation.
Unset HCI_USER_CHANNEL and call the full close routine since shutdown is
complementary to setup.
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Suspend notifier should only be registered and unregistered once per
hdev. Simplify this by only registering during driver registration and
simply exiting early when HCI_USER_CHANNEL is set.
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 359ee4f834 (Bluetooth: Unregister suspend with userchannel)
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>