Commit Graph

1136096 Commits

Author SHA1 Message Date
Shenwei Wang 95698ff617 net: fec: using page pool to manage RX buffers
This patch optimizes the RX buffer management by using the page
pool. The purpose for this change is to prepare for the following
XDP support. The current driver uses one frame per page for easy
management.

Added __maybe_unused attribute to the following functions to avoid
the compiling warning. Those functions will be removed by a separate
patch once this page pool solution is accepted.
 - fec_enet_new_rxbdp
 - fec_enet_copybreak

The following are the comparing result between page pool implementation
and the original implementation (non page pool).

 --- small packet (64 bytes) testing are almost the same
 --- no matter what the implementation is
 --- on both i.MX8 and i.MX6SX platforms.

shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[  1] local 10.81.17.20 port 39728 connected with 10.81.16.245 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-1.0000 sec  37.0 MBytes   311 Mbits/sec
[  1] 1.0000-2.0000 sec  36.6 MBytes   307 Mbits/sec
[  1] 2.0000-3.0000 sec  37.2 MBytes   312 Mbits/sec
[  1] 3.0000-4.0000 sec  37.1 MBytes   312 Mbits/sec
[  1] 4.0000-5.0000 sec  37.2 MBytes   312 Mbits/sec
[  1] 5.0000-6.0000 sec  37.2 MBytes   312 Mbits/sec
[  1] 6.0000-7.0000 sec  37.2 MBytes   312 Mbits/sec
[  1] 7.0000-8.0000 sec  37.2 MBytes   312 Mbits/sec
[  1] 0.0000-8.0943 sec   299 MBytes   310 Mbits/sec

 --- Page Pool implementation on i.MX8 ----

shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[  1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-1.0000 sec   111 MBytes   933 Mbits/sec
[  1] 1.0000-2.0000 sec   111 MBytes   934 Mbits/sec
[  1] 2.0000-3.0000 sec   112 MBytes   935 Mbits/sec
[  1] 3.0000-4.0000 sec   111 MBytes   933 Mbits/sec
[  1] 4.0000-5.0000 sec   111 MBytes   934 Mbits/sec
[  1] 5.0000-6.0000 sec   111 MBytes   933 Mbits/sec
[  1] 6.0000-7.0000 sec   111 MBytes   931 Mbits/sec
[  1] 7.0000-8.0000 sec   112 MBytes   935 Mbits/sec
[  1] 8.0000-9.0000 sec   111 MBytes   933 Mbits/sec
[  1] 9.0000-10.0000 sec   112 MBytes   935 Mbits/sec
[  1] 0.0000-10.0077 sec  1.09 GBytes   933 Mbits/sec

 --- Non Page Pool implementation on i.MX8 ----

shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[  1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-1.0000 sec   104 MBytes   868 Mbits/sec
[  1] 1.0000-2.0000 sec   105 MBytes   878 Mbits/sec
[  1] 2.0000-3.0000 sec   105 MBytes   881 Mbits/sec
[  1] 3.0000-4.0000 sec   105 MBytes   879 Mbits/sec
[  1] 4.0000-5.0000 sec   105 MBytes   878 Mbits/sec
[  1] 5.0000-6.0000 sec   105 MBytes   878 Mbits/sec
[  1] 6.0000-7.0000 sec   104 MBytes   875 Mbits/sec
[  1] 7.0000-8.0000 sec   104 MBytes   875 Mbits/sec
[  1] 8.0000-9.0000 sec   104 MBytes   873 Mbits/sec
[  1] 9.0000-10.0000 sec   104 MBytes   875 Mbits/sec
[  1] 0.0000-10.0073 sec  1.02 GBytes   875 Mbits/sec

 --- Page Pool implementation on i.MX6SX ----

shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[  1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-1.0000 sec  78.8 MBytes   661 Mbits/sec
[  1] 1.0000-2.0000 sec  82.5 MBytes   692 Mbits/sec
[  1] 2.0000-3.0000 sec  82.4 MBytes   691 Mbits/sec
[  1] 3.0000-4.0000 sec  82.4 MBytes   691 Mbits/sec
[  1] 4.0000-5.0000 sec  82.5 MBytes   692 Mbits/sec
[  1] 5.0000-6.0000 sec  82.4 MBytes   691 Mbits/sec
[  1] 6.0000-7.0000 sec  82.5 MBytes   692 Mbits/sec
[  1] 7.0000-8.0000 sec  82.4 MBytes   691 Mbits/sec
[  1] 8.0000-9.0000 sec  82.4 MBytes   691 Mbits/sec
[  1] 9.0000-9.5506 sec  45.0 MBytes   686 Mbits/sec
[  1] 0.0000-9.5506 sec   783 MBytes   688 Mbits/sec

 --- Non Page Pool implementation on i.MX6SX ----

shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[  1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-1.0000 sec  70.5 MBytes   591 Mbits/sec
[  1] 1.0000-2.0000 sec  64.5 MBytes   541 Mbits/sec
[  1] 2.0000-3.0000 sec  73.6 MBytes   618 Mbits/sec
[  1] 3.0000-4.0000 sec  73.6 MBytes   618 Mbits/sec
[  1] 4.0000-5.0000 sec  72.9 MBytes   611 Mbits/sec
[  1] 5.0000-6.0000 sec  73.4 MBytes   616 Mbits/sec
[  1] 6.0000-7.0000 sec  73.5 MBytes   617 Mbits/sec
[  1] 7.0000-8.0000 sec  73.4 MBytes   616 Mbits/sec
[  1] 8.0000-9.0000 sec  73.4 MBytes   616 Mbits/sec
[  1] 9.0000-10.0000 sec  73.9 MBytes   620 Mbits/sec
[  1] 0.0000-10.0174 sec   723 MBytes   605 Mbits/sec

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 12:43:59 +01:00
Guillaume Nault 9bc61c04ff net: Remove DECnet leftovers from flow.h.
DECnet was removed by commit 1202cdd665 ("Remove DECnet support from
kernel"). Let's also revome its flow structure.

Compile-tested only (allmodconfig).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 12:41:59 +01:00
Jianglei Nie b43f9acbb8 bnx2x: fix potential memory leak in bnx2x_tpa_stop()
bnx2x_tpa_stop() allocates a memory chunk from new_data with
bnx2x_frag_alloc(). The new_data should be freed when gets some error.
But when "pad + len > fp->rx_buf_size" is true, bnx2x_tpa_stop() returns
without releasing the new_data, which will lead to a memory leak.

We should free the new_data with bnx2x_frag_free() when "pad + len >
fp->rx_buf_size" is true.

Fixes: 07b0f00964 ("bnx2x: fix possible panic under memory stress")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 12:41:07 +01:00
Raju Lakkaraju cb4b12071a eth: lan743x: reject extts for non-pci11x1x devices
Remove PTP_PF_EXTTS support for non-PCI11x1x devices since they do not support
the PTP-IO Input event triggered timestamping mechanisms added

Fixes: 60942c397a ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 12:39:39 +01:00
Coco Li 5eddb24901 gro: add support of (hw)gro packets to gro stack
Current GRO stack only supports incoming packets containing
one frame/MSS.

This patch changes GRO to accept packets that are already GRO.

HW-GRO (aka RSC for some vendors) is very often limited in presence
of interleaved packets. Linux SW GRO stack can complete the job
and provide larger GRO packets, thus reducing rate of ACK packets
and cpu overhead.

This also means BIG TCP can still be used, even if HW-GRO/RSC was
able to cook ~64 KB GRO packets.

v2: fix logic in tcp_gro_receive()

    Only support TCP for the moment (Paolo)

Co-Developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 12:38:34 +01:00
Jiasheng Jiang 9e6fd874c7 net: prestera: acl: Add check for kmemdup
As the kemdup could return NULL, it should be better to check the return
value and return error if fails.
Moreover, the return value of prestera_acl_ruleset_keymask_set() should
be checked by cascade.

Fixes: 604ba23090 ("net: prestera: flower template support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Taras Chornyi<tchornyi@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 12:35:21 +01:00
David S. Miller 197060c155 Merge branch 'mptcp-fastclose'
Mat Martineau says:

====================
mptcp: Fastclose edge cases and error handling

MPTCP has existing code to use the MP_FASTCLOSE option header, which
works like a RST for the MPTCP-level connection (regular RSTs only
affect specific subflows in MPTCP). This series has some improvements
for fastclose.

Patch 1 aligns fastclose socket error handling with TCP RST behavior on
TCP sockets.

Patch 2 adds use of MP_FASTCLOSE in some more edge cases, like file
descriptor close, FIN_WAIT timeout, and when the socket has unread data.

Patch 3 updates the fastclose self tests.

Patch 4 does not change any code, just fixes some outdated comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:18:53 +01:00
Paolo Abeni d89e3ed76b mptcp: update misleading comments.
The MPTCP data path is quite complex and hard to understend even
without some foggy comments referring to modified code and/or
completely misleading from the beginning.

Update a few of them to more accurately describing the current
status.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:18:53 +01:00
Paolo Abeni 6bf41020b7 selftests: mptcp: update and extend fastclose test-cases
After the previous patches, the MPTCP protocol can generate
fast-closes on both ends of the connection. Rework the relevant
test-case to carefully trigger the fast-close code-path on a
single end at the time, while ensuring than a predictable amount
of data is spooled on both ends.

Additionally add another test-cases for the passive socket
fast-close.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:18:53 +01:00
Paolo Abeni d21f834855 mptcp: use fastclose on more edge scenarios
Daire reported a user-space application hang-up when the
peer is forcibly closed before the data transfer completion.

The relevant application expects the peer to either
do an application-level clean shutdown or a transport-level
connection reset.

We can accommodate a such user by extending the fastclose
usage: at fd close time, if the msk socket has some unread
data, and at FIN_WAIT timeout.

Note that at MPTCP close time we must ensure that the TCP
subflows will reset: set the linger socket option to a suitable
value.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:18:53 +01:00
Paolo Abeni 69800e516e mptcp: propagate fastclose error
When an mptcp socket is closed due to an incoming FASTCLOSE
option, so specific sk_err is set and later syscall will
fail usually with EPIPE.

Align the current fastclose error handling with TCP reset,
properly setting the socket error according to the current
msk state and propagating such error.

Additionally sendmsg() is currently not handling properly
the sk_err, always returning EPIPE.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:18:53 +01:00
David S. Miller 7171e8a1a4 Merge branch 'RollBall-Hilink-Turris-10G-copper-SFP-support'
Marek Behún says:

====================
RollBall / Hilink / Turris 10G copper SFP support

I am resurrecting my attempt to add support for RollBall / Hilink /
Turris 10G copper SFPs modules.

The modules contain Marvell 88X3310 PHY, which can communicate with
the system via sgmii, 2500base-x, 5gbase-r, 10gbase-r or usxgmii mode.

Some of the patches I've taken from Russell King's net-queue [1]
(with some rebasing).

The important change from my previous attempts are:
- I am including the changes needed to phylink and marvell10g driver,
  so that the 88X3310 PHY is configured to use PHY modes supported by
  the host (the PHY defaults to use 10gbase-r only on host's side)
- I have changed the patch that informs phylib about the interfaces
  supported by the host (patch 5 of this series): it now fills in the
  phydev->host_interfaces member only when connecting a PHY that is
  inside a SFP module. This may change in the future.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:33 +01:00
Marek Behún 324e88cbe3 net: sfp: add support for multigig RollBall transceivers
This adds support for multigig copper SFP modules from RollBall/Hilink.
These modules have a specific way to access clause 45 registers of the
internal PHY.

We also need to wait at least 22 seconds after deasserting TX disable
before accessing the PHY. The code waits for 25 seconds just to be sure.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:33 +01:00
Marek Behún 09bbedac72 net: phy: mdio-i2c: support I2C MDIO protocol for RollBall SFP modules
Some multigig SFPs from RollBall and Hilink do not expose functional
MDIO access to the internal PHY of the SFP via I2C address 0x56
(although there seems to be read-only clause 22 access on this address).

Instead these SFPs PHY can be accessed via I2C via the SFP Enhanced
Digital Diagnostic Interface - I2C address 0x51. The SFP_PAGE has to be
selected to 3 and the password must be filled with 0xff bytes for this
PHY communication to work.

This extends the mdio-i2c driver to support this protocol by adding a
special parameter to mdio_i2c_alloc function via which this RollBall
protocol can be selected.

Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:33 +01:00
Marek Behún e85b1347ac net: sfp: create/destroy I2C mdiobus before PHY probe/after PHY release
Instead of configuring the I2C mdiobus when SFP driver is probed,
create/destroy the mdiobus before the PHY is probed for/after it is
released.

This way we can tell the mdio-i2c code which protocol to use for each
SFP transceiver.

Move the code that determines MDIO I2C protocol from
sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID
parsing is done. Don't allocate I2C bus if no PHY is expected.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:33 +01:00
Marek Behún 13c8adcf22 net: sfp: Add and use macros for SFP quirks definitions
Add macros SFP_QUIRK(), SFP_QUIRK_M() and SFP_QUIRK_F() for defining SFP
quirk table entries. Use them to deduplicate the code a little bit.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:33 +01:00
Marek Behún 31eb8907aa net: phylink: allow attaching phy for SFP modules on 802.3z mode
Some SFPs may contain an internal PHY which may in some cases want to
connect with the host interface in 1000base-x/2500base-x mode.
Do not fail if such PHY is being attached in one of these PHY interface
modes.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:33 +01:00
Russell King d6d2929264 net: phy: marvell10g: select host interface configuration
Select the host interface configuration according to the capabilities of
the host if the host provided them. This is currently provided only when
connecting PHY that is inside a SFP.

The PHY supports several configurations of host communication:
- always communicate with host in 10gbase-r, even if copper speed is
  lower (rate matching mode),
- the same as above but use xaui/rxaui instead of 10gbase-r,
- switch host SerDes mode between 10gbase-r, 5gbase-r, 2500base-x and
  sgmii according to copper speed,
- the same as above but use xaui/rxaui instead of 10gbase-r.

This mode of host communication, called MACTYPE, is by default selected
by strapping pins, but it can be changed in software.

This adds support for selecting this mode according to which modes are
supported by the host.

This allows the kernel to:
- support SFP modules with 88X33X0 or 88E21X0 inside them

Note: we use mv3310_select_mactype() for both 88X3310 and 88X3340,
although 88X3340 does not support XAUI. This is not a problem because
88X3340 does not declare XAUI in it's supported_interfaces, and so this
function will never choose that MACTYPE.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[ rebase, updated, also added support for 88E21X0 ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Marek Behún 3891569b2f net: phy: marvell10g: Use tabs instead of spaces for indentation
Some register definitions were defined with spaces used for indentation.
Change them to tabs.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Marek Behún eca68a3c7d net: phylink: pass supported host PHY interface modes to phylib for SFP's PHYs
Pass the supported PHY interface types to phylib if the PHY we are
connecting is inside a SFP, so that the PHY driver can select an
appropriate host configuration mode for their interface according to
the host capabilities.

For example the Marvell 88X3310 PHY inside RollBall SFP modules
defaults to 10gbase-r mode on host's side, and the marvell10g
driver currently does not change this setting. But a host may not
support 10gbase-r. For example Turris Omnia only supports sgmii,
1000base-x and 2500base-x modes. The PHY can be configured to use
those modes, but in order for the PHY driver to do that, it needs
to know which modes are supported.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Russell King (Oracle) e60846370c net: phylink: rename phylink_sfp_config()
phylink_sfp_config() now only deals with configuring the MAC for a
SFP containing a PHY. Rename it to be specific.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Russell King f81fa96d8a net: phylink: use phy_interface_t bitmaps for optical modules
Where a MAC provides a phy_interface_t bitmap, use these bitmaps to
select the operating interface mode for optical SFP modules, rather
than using the linkmode bitmaps.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Russell King fd580c9830 net: sfp: augment SFP parsing with phy_interface_t bitmap
We currently parse the SFP EEPROM to a bitmap of ethtool link modes,
and then attempt to convert the link modes to a PHY interface mode.
While this works at present, there are cases where this is sub-optimal.
For example, where a module can operate with several different PHY
interface modes.

To start addressing this, arrange for the SFP EEPROM parsing to also
provide a bitmap of the possible PHY interface modes.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Russell King (Oracle) 1645f44dd5 net: phylink: add ability to validate a set of interface modes
Rather than having the ability to validate all supported interface
modes or a single interface mode, introduce the ability to validate
a subset of supported modes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[ rebased on current net-next ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 11:08:32 +01:00
Zhao Liu 154fb14df7 x86/hyperv: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page()[1].

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap's pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

In the fuction hyperv_init() of hyperv/hv_init.c, the mapping is used in a
single thread and is short live. So, in this case, it's safe to simply use
kmap_local_page() to create mapping, and this avoids the wasted cost of
kmap() for global synchronization.

In addtion, the fuction hyperv_init() checks if kmap() fails by BUG_ON().
From the original discussion[2], the BUG_ON() here is just used to
explicitly panic NULL pointer. So still keep the BUG_ON() in place to check
if kmap_local_page() fails. Based on this consideration, memcpy_to_page()
is not selected here but only kmap_local_page() is used.

Therefore, replace kmap() with kmap_local_page() in hyperv/hv_init.c.

[1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com
[2]: https://lore.kernel.org/lkml/20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2/

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20220928095640.626350-1-zhao1.liu@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-10-03 08:49:48 +00:00
Ville Syrjälä 54b978e03a drm/i915: Round to closest in g4x+ HDMI clock readout
On pre-ddi platforms we have slightly different code being
used for HDMI TMDS clock to dotclock conversion between the
state computation and state readout. Both of these need to
round the same way in order to not get a mismatch between
the computed and read out states. Fix up the rounding
direction in the readout path to match what is used during
state computation.

Another option would to just use intel_crtc_dotclock()
in the readout path as well, but I don't really want to
do that as the current code more accurately represents
how the hardware really works; The HDMI port register
defines whether we're actually outputting 8bpc or 12bpc
over HDMI, and the PIPECONF bpc setting just defines what
goes over FDI between the CPU and PCH. The fact that we
try to cram all that into a single pipe_bpp during state
computation is perhaps not entirely great...

Fixes: f2c9df1010 ("drm/i915: Round TMDS clock to nearest")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220926193021.23287-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 86b972ef10)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2022-10-03 09:46:03 +01:00
Manivannan Sadhasivam 9cf4843e1a PCI: qcom-ep: Make use of the cached dev pointer
In the qcom_pcie_ep_get_resources() function, dev pointer is already
cached in a local variable. So let's make use of it instead of getting
the dev pointer again from pdev struct.

Link: https://lore.kernel.org/r/20220914075350.7992-4-manivannan.sadhasivam@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
2022-10-03 10:38:16 +02:00
Manivannan Sadhasivam e2efd31465 PCI: qcom-ep: Rely on the clocks supplied by devicetree
Generally, device drivers should just rely on the platform data like
devicetree to supply the clocks required for the functioning of the
peripheral. There is no need to hardcode the clk info in the driver.
So get rid of the static clk info and obtain the platform supplied
clks.

The total number of clocks supplied is obtained using the
devm_clk_bulk_get_all() API and used for the rest of the clk_bulk_ APIs.

Link: https://lore.kernel.org/r/20220914075350.7992-3-manivannan.sadhasivam@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
2022-10-03 10:38:16 +02:00
Manivannan Sadhasivam f1bfbd000f PCI: qcom-ep: Add kernel-doc for qcom_pcie_ep structure
Add kernel-doc for qcom_pcie_ep structure.

Link: https://lore.kernel.org/r/20220914075350.7992-2-manivannan.sadhasivam@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
2022-10-03 10:38:16 +02:00
Richard Zhu cbcf8722b5 phy: freescale: imx8m-pcie: Fix the wrong order of phy_init() and phy_power_on()
Refer to phy_core driver, phy_init() must be called before phy_power_on().
Fix the wrong order of phy_init() and phy_power_on() here.

Link: https://lore.kernel.org/r/1662344583-18874-1-git-send-email-hongxing.zhu@nxp.com
Fixes: 1aa97b0022 ("phy: freescale: pcie: Initialize the imx8 pcie standalone phy driver")
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
2022-10-03 10:34:46 +02:00
Richard Zhu 3db1e531e4 PCI: imx6: Add i.MX8MP PCIe support
Add i.MX8MP PCIe support.
To avoid codes duplication when find the syscon regmap, add the iomux
gpr syscon compatible into drvdata.

Link: https://lore.kernel.org/r/1662109086-15881-8-git-send-email-hongxing.zhu@nxp.com
Tested-by: Marek Vasut <marex@denx.de>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2022-10-03 10:34:46 +02:00
Andy Shevchenko 0dbc45241d PCI: dwc: Replace of_gpio_named_count() by gpiod_count()
As a preparation to unexport of_gpio_named_count(), convert the
driver to use gpiod_count() instead.

Link: https://lore.kernel.org/r/20220830183310.48541-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
2022-10-03 10:34:46 +02:00
Barnabás Pőcze 8d05fc0394 platform/x86: use PLATFORM_DEVID_NONE instead of -1
Use the `PLATFORM_DEVID_NONE` constant instead of
hard-coding -1 when creating a platform device.

No functional changes are intended.

Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
Link: https://lore.kernel.org/r/20220930104857.2796923-1-pobrn@protonmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-10-03 09:40:04 +02:00
Mario Limonciello c928df03bd platform/x86/amd: pmc: Dump idle mask during "check" stage instead
The idle mask is dumped during the "prepare" and "restore" stage
right now, which helps to demonstrate issues only related to the
first s2idle entry.

If the system has entered s2idle once, but was woken up never
breaking the s2idle loop but also never went back to sleep we
might still have another issue to deal with however.

Move the dynamic debugging message here so that we'll catch it on
each iteration.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216516
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220929215042.745-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-10-03 09:39:55 +02:00
Nathan Huckleberry 73ea735073 net: sparx5: Fix return type of sparx5_port_xmit_impl
The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of sparx5_port_xmit_impl should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 08:07:40 +01:00
Kuniyuki Iwashima 7a62ed6136 af_unix: Fix memory leaks of the whole sk due to OOB skb.
syzbot reported a sequence of memory leaks, and one of them indicated we
failed to free a whole sk:

  unreferenced object 0xffff8880126e0000 (size 1088):
    comm "syz-executor419", pid 326, jiffies 4294773607 (age 12.609s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00  ........}.......
      01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
    backtrace:
      [<000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970
      [<0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029
      [<00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928
      [<00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997
      [<0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516
      [<00000000da1521e1>] sock_create net/socket.c:1566 [inline]
      [<00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698
      [<000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline]
      [<000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline]
      [<000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748
      [<000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
      [<000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets,
send()ing an OOB skb to each other, and close()ing them without consuming
the OOB skbs.

  int skpair[2];

  socketpair(AF_UNIX, SOCK_STREAM, 0, skpair);

  send(skpair[0], "x", 1, MSG_OOB);
  send(skpair[1], "x", 1, MSG_OOB);

  close(skpair[0]);
  close(skpair[1]);

Currently, we free an OOB skb in unix_sock_destructor() which is called via
__sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb
is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is
called only when sk->sk_wmem_alloc is 0.

In the repro sequences, we do not consume the OOB skb, so both two sk's
sock_put() never reach __sk_free() due to the positive sk->sk_wmem_alloc.
Then, no one can consume the OOB skb nor call __sk_free(), and we finally
leak the two whole sk.

Thus, we must free the unconsumed OOB skb earlier when close()ing the
socket.

Fixes: 314001f0bf ("af_unix: Add OOB support")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 08:00:50 +01:00
David S. Miller 3735264d72 Merge branch 'ip_tunnel-netlink-parms'
Liu Jian says:

====================
Add helper functions to parse netlink msg of ip_tunnel

v1->v2: Move the implementation of the helper function to ip_tunnel_core.c
v2->v3: Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:59:06 +01:00
Liu Jian b86fca800a net: Add helper function to parse netlink msg of ip_tunnel_parm
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
Reduces duplicate code, no actual functional changes.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:59:06 +01:00
Liu Jian 537dd2d9fb net: Add helper function to parse netlink msg of ip_tunnel_encap
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
Reduces duplicate code, no actual functional changes.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:59:06 +01:00
Tetsuo Handa a91b750fd6 net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
syzbot is reporting lockdep warning at rds_tcp_reset_callbacks() [1], for
commit ac3615e7f3 ("RDS: TCP: Reduce code duplication in
rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() into a section
protected by lock_sock() without realizing that rds_send_xmit() might call
lock_sock().

We don't need to protect cancel_delayed_work_sync() using lock_sock(), for
even if rds_{send,recv}_worker() re-queued this work while __flush_work()
 from cancel_delayed_work_sync() was waiting for this work to complete,
retried rds_{send,recv}_worker() is no-op due to the absence of RDS_CONN_UP
bit.

Link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2 [1]
Reported-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Fixes: ac3615e7f3 ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:56:02 +01:00
David S. Miller 42e8e6d906 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Refactor selftests to use an array of structs in xfrm_fill_key().
   From Gautam Menghani.

2) Drop an unused argument from xfrm_policy_match.
   From Hongbin Wang.

3) Support collect metadata mode for xfrm interfaces.
   From Eyal Birger.

4) Add netlink extack support to xfrm.
   From Sabrina Dubroca.

Please note, there is a merge conflict in:

include/net/dst_metadata.h

between commit:

0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")

from the net-next tree and commit:

5182a5d48c ("net: allow storing xfrm interface metadata in metadata_dst")

from the ipsec-next tree.

Can be solved as done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:52:13 +01:00
Takashi Iwai 02f2e785c4 Merge branch 'for-next' into for-linus 2022-10-03 08:48:26 +02:00
Wilken Gottwalt 0cf46a653b hwmon: (corsair-psu) add USB id of new revision of the HX1000i psu
Also updates the documentation accordingly.

Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
Link: https://lore.kernel.org/r/YznOUQ7Pijedu0NW@monster.localdomain
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-10-02 14:38:55 -07:00
Linus Torvalds 4fe89d07dc Linux 6.0 2022-10-02 14:09:07 -07:00
Masahiro Yamada cff6fdf0b2 ia64: simplify esi object addition in Makefile
CONFIG_IA64_ESI is a bool option. I do not know why the Makefile was
written like this, but this should not have any functional change.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2022-10-03 03:53:12 +09:00
Masahiro Yamada 0f1fe9d616 Revert "kbuild: Check if linker supports the -X option"
This reverts commit d79a27195a.

According to the commit description, this ld-option test was added for
the gold linker at that time.

Commit 75959d44f9 ("kbuild: Fail if gold linker is detected") gave
up the gold linker support after all.

I tested the BFD linker from binutils 2.23 and LLD from LLVM 11.0.0.
Both of them support the -X option.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-03 03:52:58 +09:00
Masahiro Yamada 5d4aeffbf7 kbuild: rebuild .vmlinux.export.o when its prerequisite is updated
When include/linux/export-internal.h is updated, .vmlinux.export.o
must be rebuilt, but it does not happen because its rule is hidden
behind scripts/link-vmlinux.sh.

Move it out of the shell script, so that Make can see the dependency
between vmlinux and .vmlinux.export.o.

Move the vmlinux rule to scripts/Makefile.vmlinux.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-03 03:52:58 +09:00
Masahiro Yamada 7a342e6c77 kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o
Do not build modules.builtin(.modinfo) as a side-effect of vmlinux.

There are no good reason to rebuild them just because any of vmlinux's
prerequistes (vmlinux.lds, .vmlinux.export.c, etc.) has been updated.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-03 03:52:58 +09:00
Alexey Kardashevskiy 637a642f5c zstd: Fixing mixed module-builtin objects
With CONFIG_ZSTD_COMPRESS=m and CONFIG_ZSTD_DECOMPRESS=y we end up in
a situation when files from lib/zstd/common/ are compiled once to be
linked later for ZSTD_DECOMPRESS (build-in) and ZSTD_COMPRESS (module)
even though CFLAGS are different for builtins and modules.
So far somehow this was not a problem but enabling LLVM LTO exposes
the problem as:

ld.lld: error: linking module flags 'Code Model': IDs have conflicting values in 'lib/built-in.a(zstd_common.o at 5868)' and 'ld-temp.o'

This particular conflict is caused by KBUILD_CFLAGS=-mcmodel=medium vs.
KBUILD_CFLAGS_MODULE=-mcmodel=large , modules use the large model on
POWERPC as explained at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Makefile?h=v5.18-rc4#n127
but the current use of common files is wrong anyway.

This works around the issue by introducing a zstd_common module with
shared code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-03 03:52:58 +09:00
Masahiro Yamada d32b55f4bb kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols
Every EXPORT_SYMBOL creates __kstrtab_* and __kstrtabns_*, which
consumes 15-20% of the kallsyms entries.

For example, on the system built from the x86_64 defconfig,

  $ cat /proc/kallsyms | wc
     129527    388581   5685465
  $ cat /proc/kallsyms | grep __kstrtab | wc
      23489     70467   1187932

We already ignore __crc_* symbols populated by EXPORT_SYMBOL, so it
should be fine to ignore __kstrtab_* and __kstrtabns_* as well.

This makes vmlinux a bit smaller.

  $ size vmlinux.before vmlinux.after
     text    data     bss     dec     hex filename
  22785374        8559694 1413328 32758396        1f3da7c vmlinux.before
  22785374        8137806 1413328 32336508        1ed6a7c vmlinux.after

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-03 03:51:58 +09:00