The driver does not call tasklet_kill in several places.
Add the calls to fix it.
Fixes: 85b85c8534 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Jiguang Xiao <jiguang.xiao@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the PF checks whether a VF is alive by means of the
KEEP_ALIVE mailbox message from the VF. The VF sends this message
every 2 seconds. Once the PF has not received it for more than 8
seconds, it regards the VF as abnormal and stops notifying the VF
of state changes, including link state, VF MAC and reset, even
after it receives the KEEP_ALIVE mailbox again. This is unreasonable.
This patch fixes it. The PF records the state changes that need to
be reported to the VF while the VF's KEEP_ALIVE mailbox is missing,
and notifies the VF once the mailbox is received again. Introduce a
new flag, HCLGE_VPORT_STATE_INITED, to distinguish whether the VF
driver is loaded or not. The VF queries these states when it
initializes, so it is unnecessary to notify it in that case.
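The deferred notification described above can be sketched in user-space C.
This is an illustration only: the struct, its fields and the function names
are hypothetical and are not the driver's code. Only the 8 second timeout
and the role of HCLGE_VPORT_STATE_INITED are taken from the text.

```c
#include <stdbool.h>

#define KEEP_ALIVE_TIMEOUT_SEC 8 /* timeout quoted in the commit text */

/* Hypothetical per-VF state kept by the PF; not the driver's real struct. */
struct vport_sketch {
	bool inited;          /* models HCLGE_VPORT_STATE_INITED */
	bool alive;           /* KEEP_ALIVE seen within the timeout */
	long last_keep_alive; /* time of the last KEEP_ALIVE, in seconds */
	bool need_notify;     /* a change happened while the VF was silent */
	int notified;         /* number of notifications actually sent */
};

/* Periodic check: mark the VF as not alive once the timeout has passed. */
static void vport_check_alive(struct vport_sketch *vp, long now)
{
	if (now - vp->last_keep_alive > KEEP_ALIVE_TIMEOUT_SEC)
		vp->alive = false;
}

/* A state change (link, MAC, reset) occurred on the PF side. */
static void vport_state_changed(struct vport_sketch *vp)
{
	if (!vp->inited)
		return;                 /* the VF queries state when it loads */
	if (vp->alive)
		vp->notified++;         /* deliver immediately */
	else
		vp->need_notify = true; /* remember it for later */
}

/* KEEP_ALIVE mailbox received from the VF. */
static void vport_keep_alive(struct vport_sketch *vp, long now)
{
	vp->last_keep_alive = now;
	vp->alive = true;
	if (vp->need_notify) {
		vp->need_notify = false;
		vp->notified++;         /* replay the deferred notification */
	}
}
```

A single pending flag is used here for brevity; a real implementation would
presumably track each kind of state change separately.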
Fixes: aa5c4f175b ("net: hns3: add reset handling for VF when doing PF reset")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima says:
===================
tcp: Fix bhash2 and TIME_WAIT regression.
We forgot to add twsk to bhash2. Therefore TIME_WAIT sockets cannot
prevent bind() to the same local address and port.
Changes:
v1:
* Patch 1:
* Add tw_bind2_node in inet_timewait_sock instead of
moving sk_bind2_node from struct sock to struct
sock_common.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bhash2 split the bind() validation logic into wildcard and non-wildcard
cases. Let's add a test to catch future regression.
Before the previous patch:
# ./bind_timewait
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN bind_timewait.localhost.1 ...
# bind_timewait.c:87:1:Expected ret (0) == -1 (-1)
# 1: Test terminated by assertion
# FAIL bind_timewait.localhost.1
not ok 1 bind_timewait.localhost.1
# RUN bind_timewait.addrany.1 ...
# OK bind_timewait.addrany.1
ok 2 bind_timewait.addrany.1
# FAILED: 1 / 2 tests passed.
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
After:
# ./bind_timewait
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN bind_timewait.localhost.1 ...
# OK bind_timewait.localhost.1
ok 1 bind_timewait.localhost.1
# RUN bind_timewait.addrany.1 ...
# OK bind_timewait.addrany.1
ok 2 bind_timewait.addrany.1
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby reported regression of bind() with a simple repro. [0]
The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port. Before commit 28044fc1d4 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.
The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.
The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.
To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().
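A heavily simplified sketch of the idea, in user-space C. The struct and
function names are hypothetical, counters stand in for the kernel's hash
lists, and reuse options such as SO_REUSEADDR are ignored, so this is not
the logic of inet_bhash2_conflict() itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bucket for one (address, port) pair. */
struct bind2_bucket_sketch {
	size_t n_owners;   /* regular sockets bound to this address/port */
	size_t n_timewait; /* TIME_WAIT sockets for the same address/port */
};

/* Regressed behaviour: only regular sockets are considered. */
static bool conflict_before_fix(const struct bind2_bucket_sketch *b)
{
	return b->n_owners > 0;
}

/* Fixed behaviour: the separate TIME_WAIT list is consulted as well. */
static bool conflict_after_fix(const struct bind2_bucket_sketch *b)
{
	return b->n_owners > 0 || b->n_timewait > 0;
}
```

With a bucket that holds only a TIME_WAIT socket, the first check reports no
conflict (bind() succeeds) while the second one reports a conflict.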
[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cited patch added support for matching on a geneve option by setting
the geneve_tlv_option_0_data mask and key, but it didn't set the
geneve_tlv_option_0_exist bit, which some HW requires when matching on
the geneve_tlv_option_0_data parameter. As a result, packets may in some
cases wrongly match rules with a different geneve option.
For example, a packet with geneve_tlv_object class=789 and data=456
will wrongly match a rule that matches geneve_tlv_object class=123 and
data=456.
Fix it by setting the geneve_tlv_option_0_exist bit, when supported by
the HW, when matching on the geneve_tlv_option_0_data parameter.
Fixes: 9272e3df30 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The current XDP xmit functions (mlx5e_xmit_xdp_frame_mpwqe or
mlx5e_xmit_xdp_frame) validate the XDP packet length by comparing it to
the hw mtu (configured at XDP SQ allocation) before transmitting it.
This check does not account for the ethernet FCS length (calculated and
filled in by the NIC). Hence, when we try to send packets with length >
(hw-mtu - ethernet-fcs-size), the device port drops them and
tx_errors_phy is incremented. The desired behavior is for the driver to
catch these packets and drop them.
Fix this behavior in XDP SQ allocation function (mlx5e_alloc_xdpsq) by
subtracting ethernet FCS header size (4 Bytes) from current hw mtu
value, since ethernet FCS is calculated and written to ethernet frames
by the nic.
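The length check described above amounts to the following arithmetic. The
function and macro names are hypothetical, and the sketch assumes hw_mtu is
at least 4; only the 4 byte FCS size comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_FCS_LEN_SKETCH 4u /* Ethernet FCS size in bytes, per the text */

/* Largest frame the driver should accept for a given HW MTU. */
static uint32_t xdp_max_frame_len(uint32_t hw_mtu)
{
	return hw_mtu - ETH_FCS_LEN_SKETCH;
}

/* True when the driver should drop the frame instead of posting it. */
static bool xdp_frame_too_long(uint32_t frame_len, uint32_t hw_mtu)
{
	return frame_len > xdp_max_frame_len(hw_mtu);
}
```

With a HW MTU of 1522, for instance, a 1520 byte frame passes a check
against the raw MTU but is rejected once the FCS is accounted for.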
Fixes: d8bec2b29a ("net/mlx5e: Support bpf_xdp_adjust_head()")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit introduced a bug for flows with multiple encapsulations.
If one dest encap becomes invalid, the slow path flag is set on the flow.
But when the encaps of other dests become invalid, they are not cleared
because the flow's slow path flag is already set. When neigh-update-add
runs, it will then use an invalid encap.
Fix it by checking slow path flag after clearing dest encap.
Fixes: 9a5f9cc794 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
sprintf must be used to build a string, not sscanf. Otherwise
dirname is null and neither "ct_nic" nor "ct_fdb" is created.
But it's redundant anyway, as the driver could be in switchdev mode but
still add nic rules. So use "ct" as the folder name.
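The difference between the two calls: sscanf() parses its first argument
and never writes to it, while sprintf()/snprintf() format into it. A small
user-space sketch of building such a name follows; build_dirname() is a
hypothetical helper, not a function in the driver.

```c
#include <stdio.h>

/* Builds "ct_<suffix>" into buf; returns 0 on success, -1 on truncation. */
static int build_dirname(char *buf, size_t len, const char *suffix)
{
	int n = snprintf(buf, len, "ct_%s", suffix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

snprintf() is used rather than sprintf() so that a too-small buffer is
reported instead of being overrun.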
Fixes: 77422a8f6f ("net/mlx5e: CT: Add ct driver counters")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
RX reporter mistakenly reads from the regular (inactive) RQ
when XSK RQ is active. Fix it here.
Fixes: 3db4c85cde ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_build_nic_params will turn CQE compression on if the hardware
capability is enabled and the slow_pci_heuristic condition is detected.
As IPoIB doesn't support CQE compression, make sure to disable the
feature in the IPoIB profile init.
Please note that the feature is not exposed to the user for IPoIB
interfaces, so it can't be subsequently turned on.
Fixes: b797a684b0 ("net/mlx5e: Enable CQE compression when PCI is slower than link")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 PF can disable RoCE for its VFs and SFs. In such case RoCE is
marked as unsupported on those VFs/SFs.
The cited patch added an option to disable (and enable) RoCE at HCA
level. However, that commit didn't check whether RoCE is supported on
the HCA and allowed the user to try to turn RoCE on.
Fix it by checking whether the HCA supports RoCE.
Fixes: fbfa97b4d7 ("net/mlx5: Disable roce at HCA level")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery running before the device has finished
probing successfully, e.g. while mlx5_init_one() is running. The
recovery flow uses functionality that is loaded only by
mlx5_init_one(), and there is no point in running recovery before
mlx5_init_one() has finished successfully.
Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.
Fixes: 51d138c261 ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
io_eq_size and event_eq_size params are of param type
DEVLINK_PARAM_TYPE_U32. But, the validation callback is addressing them
as DEVLINK_PARAM_TYPE_U16.
This causes a validation mismatch on big-endian systems, where
values in range were rejected while 268500991 was accepted.
Fix it by checking the U32 value in the validation callback.
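The big-endian effect can be reproduced portably by modelling what a u16
read of a stored u32 returns on such a CPU: the two most significant bytes.
The 64..4096 range is an assumption made for illustration, since the commit
text does not state the limits, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range assumed for illustration; not stated in the commit text. */
#define EQ_DEPTH_MIN 64u
#define EQ_DEPTH_MAX 4096u

/*
 * On a big-endian CPU, reading the u16 member of a union that was written
 * through its u32 member yields the upper 16 bits of the value.
 */
static uint16_t u16_view_on_big_endian(uint32_t v)
{
	return (uint16_t)(v >> 16);
}

/* Buggy validation: looks at the u16 view of a u32 parameter. */
static bool validate_as_u16_on_be(uint32_t v)
{
	uint16_t d = u16_view_on_big_endian(v);

	return d >= EQ_DEPTH_MIN && d <= EQ_DEPTH_MAX;
}

/* Fixed validation: looks at the full u32 value. */
static bool validate_as_u32(uint32_t v)
{
	return v >= EQ_DEPTH_MIN && v <= EQ_DEPTH_MAX;
}
```

268500991 is 0x1000FFFF, whose upper half is 0x1000 (4096), so the u16 view
falls inside the assumed range, while an ordinary value such as 1024 has an
upper half of 0 and is rejected.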
Fixes: 0844fa5f7b ("net/mlx5: Let user configure io_eq_size param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There are two cleanup calls missing in the mlx5_init_once() error path.
Add them so that the error path flow is the same as
mlx5_cleanup_once().
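The general pattern, an error path that unwinds in the same reverse order
as the regular cleanup function, can be sketched as follows. The three
resources and all names are hypothetical and do not correspond to the
actual steps of mlx5_init_once().

```c
#include <stdbool.h>

/* Hypothetical resources; each field is 1 while the resource is held. */
struct res_sketch {
	int a, b, c;
};

static int init_step(int *slot, bool fail)
{
	if (fail)
		return -1;
	*slot = 1;
	return 0;
}

static void cleanup_step(int *slot)
{
	*slot = 0;
}

/* Acquire a, b, c in order; on failure release in reverse order. */
static int init_once_sketch(struct res_sketch *r, bool fail_c)
{
	int err;

	err = init_step(&r->a, false);
	if (err)
		return err;
	err = init_step(&r->b, false);
	if (err)
		goto err_a;
	err = init_step(&r->c, fail_c);
	if (err)
		goto err_b;
	return 0;

err_b:
	cleanup_step(&r->b);
err_a:
	cleanup_step(&r->a);
	return err;
}

/* Regular teardown: the same calls in the same reverse order. */
static void cleanup_once_sketch(struct res_sketch *r)
{
	cleanup_step(&r->c);
	cleanup_step(&r->b);
	cleanup_step(&r->a);
}
```

If a label is missing from the unwind chain, the corresponding resource
stays allocated after a failed init, which is the class of bug fixed here.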
Fixes: 52ec462eca ("net/mlx5: Add reserved-gids support")
Fixes: 7c39afb394 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.
In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.
Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rudi reports a compilation failure on x86_64 when CONFIG_NET_CLS or
CONFIG_NET_CLS_ACT is not set but CONFIG_RETPOLINE is set.
A misplaced '#endif' was causing the issue.
Fixes: 7f0e810220 ("net/sched: add retpoline wrapper for tc")
Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the advice of Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at()
when formatting the value to be returned to user space.
Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunhao Lin says:
====================
r8169: fix dmar pte write access is not set error
This series fixes dmar pte write access is not set error.
Chunhao Lin (2):
r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
r8169: fix dmar pte write access is not set error
v2:
-update commit message
-adjust the code according to current kernel code
v3:
-update title and commit message
-split the patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the device is closed with WoL enabled, rx is left enabled. When the
device is opened again, this causes received packets to be DMAed to the
wrong memory address after pci_set_master(), and the system log shows
the messages below.
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [02:00.0] PASID ffffffff fault addr
ffdd4000 [fault reason 05] PTE Write access is not set
In this patch, the driver disables tx/rx when the device is closed. If
WoL is enabled, it only enables the rx filter and disables rxdv_gate (if
supported), so that the hardware receives packets into the FIFO but does
not DMA them.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no functional change. Move these two functions in preparation
for the following patch, "r8169: fix dmar pte write access is not set
error".
Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniil Tatianin says:
====================
net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers
This series fixes a potential NULL dereference in ethtool_get_phy_stats
while also attempting to refactor/split said function into multiple
helpers so that it's easier to reason about what's going on.
I've taken Andrew Lunn's suggestions on the previous version of this
patch and added a bit of my own.
Changes since v1:
- Remove an extra newline in the first patch
- Move WARN_ON_ONCE into the if check as it already returns the
result of the comparison
- Actually split ethtool_get_phy_stats instead of attempting to
refactor it
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it's easier to follow and make sense of the branching and
various conditions.
Stats retrieval has been split into two separate functions
ethtool_get_phy_stats_phydev & ethtool_get_phy_stats_ethtool.
The former attempts to retrieve the stats using phydev & phy_ops, while
the latter uses ethtool_ops.
Actual n_stats validation & array allocation has been moved into a new
ethtool_vzalloc_stats_array helper.
This also fixes a potential NULL dereference of
ops->get_ethtool_phy_stats where it was getting called in an else branch
unconditionally without making sure it was actually present.
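The NULL check in question follows the usual pattern for optional
callbacks. A user-space sketch, with a hypothetical ops struct, callback
names and error constant (none of them the real ethtool_ops layout):

```c
#include <stddef.h>
#include <stdint.h>

#define ERR_NOT_SUPPORTED (-95) /* stand-in for -EOPNOTSUPP */

/* Hypothetical ops table; both callbacks are optional. */
struct phy_ops_sketch {
	int (*get_stats_count)(void);
	void (*get_stats)(uint64_t *data);
};

/* Demo callbacks used to exercise the helper. */
static int demo_count(void)
{
	return 2;
}

static void demo_fill(uint64_t *data)
{
	data[0] = 10;
	data[1] = 20;
}

/* Returns the number of stats written, or a negative error code. */
static int get_phy_stats_sketch(const struct phy_ops_sketch *ops,
				uint64_t *data, int capacity)
{
	int n;

	/* Verify both callbacks exist before calling either one. */
	if (!ops->get_stats_count || !ops->get_stats)
		return ERR_NOT_SUPPORTED;

	n = ops->get_stats_count();
	if (n <= 0 || n > capacity)
		return ERR_NOT_SUPPORTED;

	ops->get_stats(data);
	return n;
}
```

An ops table that provides the count callback but not the stats callback is
rejected up front instead of being dereferenced.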
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we always early return if we don't have any stats we can remove
these checks as they're no longer necessary.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not very useful to copy back an empty ethtool_stats struct and
return 0 if we didn't actually have any stats. This also allows for
further simplification of this function in the future commits.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Bug fixes
This series fixes a devlink bug and several XDP related bugs. The
devlink bug causes a kernel crash on VF devices. The XDP driver
patches fix and clean up the RX XDP path and re-enable header-data
split that was disabled by mistake when adding the XDP multi-buffer
support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent XDP multi-buffer feature has introduced regressions in the
setting of HDS and jumbo thresholds. HDS was accidentally disabled in
the normal mode without XDP. This patch restores jumbo HDS placement
when not in XDP mode. In XDP multi-buffer mode, HDS should be disabled
and the jumbo threshold should be set to the usable page size in the
first page buffer.
Fixes: 3286123619 ("bnxt: change receive ring space parameters")
Reviewed-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of the first buffer is always page size, and the useable
space is the page size minus the offset and the skb_shared_info size.
Make sure SKB and XDP buf sizes match so that the skb_shared_info
is at the same offset seen from the SKB and XDP_BUF.
build_skb() should be passed PAGE_SIZE. xdp_init_buff() should
be passed PAGE_SIZE as well. xdp_get_shared_info_from_buff() will
automatically deduct the skb_shared_info size if the XDP buffer
has frags. There is no need to keep bp->xdp_has_frags.
Change BNXT_PAGE_MODE_BUF_SIZE to BNXT_MAX_PAGE_MODE_MTU_SBUF
since this constant is really the MTU with ethernet header size
subtracted.
Also fix the BNXT_MAX_PAGE_MODE_MTU macro with proper parentheses.
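The commit text does not say how the macro was mis-parenthesized, so the
following is only a generic illustration of why an expression macro needs
outer parentheses. The macro names and values are hypothetical.

```c
/* Hypothetical values, chosen only to make the arithmetic visible. */
#define PAGE_SZ_SKETCH  4096
#define HEADROOM_SKETCH 256

/* Expands textually, so surrounding operators can regroup the terms. */
#define MAX_MTU_BAD  PAGE_SZ_SKETCH - HEADROOM_SKETCH
/* Parenthesized, so the macro always behaves as one value. */
#define MAX_MTU_GOOD (PAGE_SZ_SKETCH - HEADROOM_SKETCH)

/* Evaluates 4096 - 256 / 2, because division binds tighter. */
static int half_bad(void)
{
	return MAX_MTU_BAD / 2;
}

/* Evaluates (4096 - 256) / 2, as intended. */
static int half_good(void)
{
	return MAX_MTU_GOOD / 2;
}
```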
Fixes: 3286123619 ("bnxt: change receive ring space parameters")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The XDP program can change the starting address of the RX data buffer and
this information needs to be passed back from bnxt_rx_xdp() to
bnxt_rx_pkt() for the XDP_PASS case so that the SKB can point correctly
to the modified buffer address. Add back the data_ptr parameter to
bnxt_rx_xdp() to make this work.
Fixes: b231c3f341 ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_xdp_buff_init() does not modify the data_ptr or the len parameters,
so no need to pass in the addresses of these parameters.
Fixes: b231c3f341 ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda says:
====================
net: ethernet: renesas: rswitch: Fix minor issues
This patch series is based on v6.2-rc2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To get the MAC address from each ethernet-port node in the device tree,
fix the first argument of of_get_ethdev_address().
Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rswitch_init() returns non-zero and this driver is re-probed,
the following error happens:
renesas_eth_sw e6880000.ethernet: Unbalanced pm_runtime_enable!
So, fix error path in renesas_eth_sw_probe().
Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
netdev doc de-FAQization
We have outgrown the FAQ format for our process doc.
I often find myself struggling to locate information in this doc,
because the questions do not serve well as section headers.
Reformat the document.
v2: update the headers
v1: https://lore.kernel.org/all/20221221184007.1170384-1-kuba@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netdev-FAQ document has grown over the years to the point
where finding information in it is somewhat challenging.
The length of the questions prevents readers from locating
content that's relevant at a glance.
Convert to a more standard documentation format with sections
and sub-sections rather than questions and answers.
The content edits are limited to what's necessary to change
the format, and very minor clarifications.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent changes will reformat the doc away from FAQ.
To make that more readable perform the pure section moves now.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of rxrpc_recvmsg(), if a call is found, the call is put and then
a trace line is emitted referencing that call in a couple of places - but
the call may have been deallocated by the time those traces happen.
Fix this by stashing the call debug_id in a variable and passing that to
the tracepoint rather than the call pointer.
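The pattern is to copy the identifier while the reference is still held and
to use only the copy afterwards. A user-space sketch with hypothetical
types and functions (the real code uses rxrpc's own call structure and
tracepoints):

```c
#include <stdlib.h>

/* Hypothetical refcounted object; freed when the last reference drops. */
struct call_sketch {
	int refcount;
	unsigned int debug_id;
};

static void call_put(struct call_sketch *call)
{
	if (--call->refcount == 0)
		free(call);
}

/* Stand-in for a tracepoint that takes an id instead of a pointer. */
static unsigned int last_traced_id;

static void trace_recvmsg_sketch(unsigned int debug_id)
{
	last_traced_id = debug_id;
}

static void recvmsg_tail_sketch(struct call_sketch *call)
{
	unsigned int debug_id = call->debug_id; /* copy while still valid */

	call_put(call);                 /* may free the object */
	trace_recvmsg_sketch(debug_id); /* uses the copy, not the pointer */
}
```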
Fixes: 849979051c ("rxrpc: Add a tracepoint to follow what recvmsg does")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix for uninitialized variable warning.
Addresses-Coverity: ("Uninitialized scalar variable")
Signed-off-by: Anuradha Weeraman <anuradha@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfc_get_device() takes a reference on the device; add the missing
nfc_put_device() to release it when it is no longer needed.
Also fix the style warning by using the error code EOPNOTSUPP instead
of ENOTSUPP.
Fixes: 5ce3f32b52 ("NFC: netlink: SE API implementation")
Fixes: 29e76924cf ("nfc: netlink: Add capability to reply to vendor_cmd with data")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTP hardware timestamping related objects are not linked when PTP
support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore
NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL
regardless of NET_DSA_MV88E6XXX_PTP.
Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's
dependencies are met, making sure that it cannot be enabled when
NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m.
In other words, this commit allows NET_DSA_MV88E6XXX to be built-in
while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is
prevented from being enabled.
Fixes: e5f3155267 ("ethernet: fix PTP_1588_CLOCK dependencies")
Signed-off-by: Johnny S. Lee <foss@jsl.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
adapter->dcb would get silently freed inside qlcnic_dcb_enable() in
case qlcnic_dcb_attach() would return an error, which always happens
under OOM conditions. This would lead to use-after-free because both
of the existing callers invoke qlcnic_dcb_get_info() on the obtained
pointer, which is potentially freed at that point.
Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
pointer at callsite using qlcnic_dcb_free(). This also removes the now
unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around
kfree() also causing memory leaks for partially initialized dcb.
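The ownership rule after the change, where the enable step only reports
failure and the caller frees, can be sketched like this. All names are
hypothetical stand-ins and the failure is injected through a parameter
purely for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dcb object. */
struct dcb_sketch {
	int attached;
};

/* Fails on request to simulate an allocation failure. */
static int dcb_attach_sketch(struct dcb_sketch *dcb, bool fail)
{
	if (fail)
		return -1;
	dcb->attached = 1;
	return 0;
}

/* Propagates the error and does not free dcb behind the caller's back. */
static int dcb_enable_sketch(struct dcb_sketch *dcb, bool fail_attach)
{
	return dcb_attach_sketch(dcb, fail_attach);
}

/* Caller owns dcb: it frees the object on error and never touches it again. */
static struct dcb_sketch *setup_dcb_sketch(bool fail_attach)
{
	struct dcb_sketch *dcb = calloc(1, sizeof(*dcb));

	if (!dcb)
		return NULL;
	if (dcb_enable_sketch(dcb, fail_attach)) {
		free(dcb);
		return NULL;
	}
	return dcb;
}
```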
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Fixes: 3c44bba1d2 ("qlcnic: Disable DCB operations from SR-IOV VFs")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
[<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
[<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
[<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
[<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
[<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
[<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
[<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
[<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
[<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
[<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
[<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
[<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
[<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
[<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
[<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
[<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
[<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
[<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
[<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
[<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================
The kernel uses tcindex_change() to change the properties
of an existing filter.
Yet the problem is that, during the process of changing,
if `old_r` is retrieved from `p->perfect`, then
kernel uses tcindex_alloc_perfect_hash() to newly
allocate filter results, uses tcindex_filter_result_init()
to clear the old filter result, without destroying
its tcf_exts structure, which triggers the above memory leak.
To be more specific, there are only two sources for `old_r`,
according to tcindex_lookup(): `old_r` is retrieved either from
`p->perfect` or from `p->h`.
* If `old_r` is retrieved from `p->perfect`, kernel uses
tcindex_alloc_perfect_hash() to newly allocate the
filter results. Then `r` is assigned with `cp->perfect + handle`,
which is newly allocated. So condition `old_r && old_r != r` is
true in this situation, and kernel uses tcindex_filter_result_init()
to clear the old filter result, without destroying
its tcf_exts structure
* If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
according to the tcindex_lookup(). Considering that `cp->h`
is directly copied from `p->h` and `p->perfect` is NULL,
`r` is assigned with `tcindex_lookup(cp, handle)`, whose value
should be the same as `old_r`, so condition `old_r && old_r != r`
is false in this situation, kernel ignores using
tcindex_filter_result_init() to clear the old filter result.
So only when `old_r` is retrieved from `p->perfect` does kernel use
tcindex_filter_result_init() to clear the old filter result, which
triggers the above memory leak.
Considering that there already exists a tc_filter_wq workqueue
to destroy the old tcindex_data by tcindex_partial_destroy_work()
at the end of tcindex_set_parms(), this patch solves
this memory leak bug by removing this old filter result
clearing part and delegating it to the tc_filter_wq workqueue.
Note that this patch doesn't introduce any other issues. If
`old_r` is retrieved from `p->perfect`, this patch just
delegates old filter result clearing part to the
tc_filter_wq workqueue; If `old_r` is retrieved from `p->h`,
kernel doesn't reach the old filter result clearing part, so
removing this part has no effect.
[Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
and Dmitry Vyukov]
Fixes: b9a24bb76b ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 231 insertions(+), 3 deletions(-).
The main changes are:
1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to
misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case
from Martin Lau.
2) Fix BPF verifier's nullness propagation when registers are of
type PTR_TO_BTF_ID, from Hao Sun.
3) Fix bpftool build for JIT disassembler under statically built
libllvm, from Anton Protopopov.
4) Fix warnings reported by resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled, from Hou Tao.
5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shows up when cross-compiling:
HOST_SCRATCH_DIR := $(OUTPUT)/host-tools
vs
SCRATCH_DIR := $(OUTPUT)/tools
HOST_SCRATCH_DIR := $(SCRATCH_DIR)
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221222213958.2302320-1-sdf@google.com
Hao Lan says:
====================
net: hns3: fix some bug for hns3
There are some bugfixes for the HNS3 ethernet driver. patch#1 fixes a
missing check for rx packets. patch#2 fixes the VF promisc mode not
being updated when the mac table is full, and patch#3 fixes interrupts
not being initialized during VF FLR.
====================
Link: https://lore.kernel.org/r/20221222064343.61537-1-lanhao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>