Compare commits

...

966 Commits

Author SHA1 Message Date
Linus Torvalds
d31e86ef63 arm64: access_ok() optimization
The TBI setup on arm64 is very strange: HW is set up to always do TBI,
but the kernel enforcement for system calls is purely a software
contract, and user space is supposed to mask off the top bits before the
system call.

Except all the actual brk/mmap/etc() system calls then mask it in kernel
space anyway, and accept any TBI address.

This basically unifies things and makes access_ok() also ignore it.

This is an ABI change, but the current situation is very odd, and this
change avoids the current mess and makes the kernel more permissive, and
as such is unlikely to break anything.

The way forward - for some possible future situation when people want to
use more bits - is probably to introduce a new "I actually want the full
64-bit address space" prctl.  But we should make sure that the software
and hardware rules actually match at that point.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-19 12:33:38 -07:00
Linus Torvalds
7fd298d4b3 arm64: start using 'asm goto' for put_user()
This generates noticeably better code since we don't need to test the
error register etc, the exception just jumps to the error handling
directly.

Unlike get_user(), there's no need to worry about old compilers. All
supported compilers support the regular non-output 'asm goto', as
pointed out by Nathan Chancellor.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-19 12:33:38 -07:00
Linus Torvalds
86a6a68feb arm64: start using 'asm goto' for get_user() when available
This generates noticeably better code with compilers that support it,
since we don't need to test the error register etc, the exception just
jumps to the error handling directly.

Note that this also marks SW_TTBR0_PAN incompatible with KCSAN support,
since KCSAN wants to save and restore the user access state.

KCSAN and SW_TTBR0_PAN were probably always incompatible, but it became
obvious only when implementing the unsafe user access functions.  At
that point the default empty user_access_save/restore() functions
weren't provided by the default fallback functions.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-19 12:33:38 -07:00
Linus Torvalds
6ba59ff422 Linux 6.10-rc4 2024-06-16 13:40:16 -07:00
Linus Torvalds
6456c4256d Merge tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
 "On parisc we have suffered since years from random segfaults which
  seem to have been triggered due to cache inconsistencies. Those
  segfaults happened more often on machines with PA8800 and PA8900 CPUs,
  which have much bigger caches than the earlier machines.

  Dave Anglin has worked over the last few weeks to fix this bug. His
  patch has been successfully tested by various people on various
  machines and with various kernels (6.6, 6.8 and 6.9), and the debian
  buildd servers haven't shown a single random segfault with this patch.

  Since the cache handling has been reworked, the patch is slightly
  bigger than I would like in this stage, but the greatly improved
  stability IMHO justifies the inclusion now"

* tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Try to fix random segmentation faults in package builds
2024-06-16 11:50:16 -07:00
Linus Torvalds
4301487e6b Merge tag 'i2c-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Two fixes to correctly report i2c functionality, ensuring that
  I2C_FUNC_SLAVE is reported when a device operates solely as a slave
  interface"

* tag 'i2c-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: designware: Fix the functionality flags of the slave-only interface
  i2c: at91: Fix the functionality flags of the slave-only interface
2024-06-16 11:37:38 -07:00
Linus Torvalds
b5beaa4474 Merge tag 'usb-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
 "Here are some small USB and Thunderbolt driver fixes for 6.10-rc4.
  Included in here are:

   - thunderbolt debugfs bugfix

   - USB typec bugfixes

   - kcov usb bugfix

   - xhci bugfixes

   - usb-storage bugfix

   - dt-bindings bugfix

   - cdc-wdm log message spam bugfix

  All of these, except for the last cdc-wdm log level change, have been
  in linux-next for a while with no reported problems. The cdc-wdm
  bugfix has been tested by syzbot and proved to fix the reported cpu
  lockup issues when the log is constantly spammed by a broken device"

* tag 'usb-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
  xhci: Handle TD clearing for multiple streams case
  xhci: Apply broken streams quirk to Etron EJ188 xHCI host
  xhci: Apply reset resume quirk to Etron EJ188 xHCI host
  xhci: Set correct transferred length for cancelled bulk transfers
  usb-storage: alauda: Check whether the media is initialized
  usb: typec: ucsi: Ack also failed Get Error commands
  kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
  dt-bindings: usb: realtek,rts5411: Add missing "additionalProperties" on child nodes
  usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
  usb: typec: tcpm: fix use-after-free case in tcpm_register_source_caps
  USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected
  usb: typec: ucsi: glink: increase max ports for x1e80100
  Revert "usb: chipidea: move ci_ulpi_init after the phy initialization"
  thunderbolt: debugfs: Fix margin debugfs node creation condition
2024-06-16 11:20:26 -07:00
Linus Torvalds
6efc63a843 Merge tag 'tty-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes that resolve som
  reported problems. Included in here are:

   - n_tty lookahead buffer bugfix

   - WARN_ON() removal where it was not needed

   - 8250_dw driver bugfixes

   - 8250_pxa bugfix

   - sc16is7xx Kconfig fixes for reported build issues

  All of these have been in linux-next for over a week with no reported
  problems"

* tag 'tty-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: drop debugging WARN_ON_ONCE() from uart_write()
  serial: sc16is7xx: re-add Kconfig SPI or I2C dependency
  serial: sc16is7xx: rename Kconfig CONFIG_SERIAL_SC16IS7XX_CORE
  serial: port: Don't block system suspend even if bytes are left to xmit
  serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level
  serial: 8250_dw: Revert "Move definitions to the shared header"
  serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw
  tty: n_tty: Fix buffer offsets when lookahead is used
2024-06-16 11:05:47 -07:00
Linus Torvalds
d3e6dc4ff0 Merge tag 'staging-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
 "Here is a single staging driver fix, for the vc04 driver. It resolves
  a reported problem that showed up in the merge window set of changes.

  It's been in linux-next for over a week with no reported problems"

* tag 'staging-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vchiq_debugfs: Fix NPD in vchiq_dump_state
2024-06-16 10:57:05 -07:00
Linus Torvalds
e12fa4dd64 Merge tag 'driver-core-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and sysfs fixes from Greg KH:
 "Here are three small changes for 6.10-rc4 that resolve reported
  problems, and finally drop an unused api call. These are:

   - removal of devm_device_add_groups(), all the callers of this are
     finally gone after the 6.10-rc1 merge (changes came in through
     different trees), so it's safe to remove.

   - much reported sysfs build error fixed up for systems that did not
     have sysfs enabled

   - driver core sync issue fix for a many reported issue over the years
     that no one really paid much attention to, until Dirk finally
     tracked down the real issue and made the "obviously correct and
     simple" fix for it.

  All of these have been in linux-next for over a week with no reported
  problems"

* tag 'driver-core-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers: core: synchronize really_probe() and dev_uevent()
  sysfs: Unbreak the build around sysfs_bin_attr_simple_read()
  driver core: remove devm_device_add_groups()
2024-06-16 10:43:04 -07:00
Linus Torvalds
33f855cbb7 Merge tag 'char-misc-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are a number of small char/misc and iio driver fixes for
  6.10-rc4. Included in here are the following:

   - iio driver fixes for a bunch of reported problems.

   - mei driver fixes for a number of reported issues.

   - amiga parport driver build fix.

   - .editorconfig fix that was causing lots of unintended whitespace
     changes to happen to files when they were being edited. Unless we
     want to sweep the whole tree and remove all trailing whitespace at
     once, this is needed for the .editorconfig file to be able to be
     used at all. This change is required because the original
     submitters never touched older files in the tree.

   - jfs bugfix for a buffer overflow

  The jfs bugfix is in here as I didn't know where else to put it, and
  it's been ignored for a while as the filesystem seems to be abandoned
  and I'm tired of seeing the same issue reported in multiple places.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (25 commits)
  .editorconfig: remove trim_trailing_whitespace option
  jfs: xattr: fix buffer overflow for invalid xattr
  misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe()
  misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe()
  parport: amiga: Mark driver struct with __refdata to prevent section mismatch
  mei: vsc: Fix wrong invocation of ACPI SID method
  mei: vsc: Don't stop/restart mei device during system suspend/resume
  mei: me: release irq in mei_me_pci_resume error path
  mei: demote client disconnect warning on suspend to debug
  iio: inkern: fix channel read regression
  iio: imu: inv_mpu6050: stabilized timestamping in interrupt
  iio: adc: ad7173: Fix sampling frequency setting
  iio: adc: ad7173: Clear append status bit
  iio: imu: inv_icm42600: delete unneeded update watermark call
  iio: imu: inv_icm42600: stabilized timestamp in interrupt
  iio: invensense: fix odr switching to same value
  iio: adc: ad7173: Remove index from temp channel
  iio: adc: ad7173: Add ad7173_device_info names
  iio: adc: ad7173: fix buffers enablement for ad7176-2
  iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
  ...
2024-06-16 10:29:37 -07:00
Linus Torvalds
e8b0264d6f Merge tag 'ata-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
 "Fix a bug where the SCSI Removable Media Bit (RMB) was incorrectly set
  for hot-plug capable (and eSATA) ports.

  The RMB bit means that the media is removable (e.g. floppy or CD-ROM),
  not that the device server is removable. If the RMB bit is set, SCSI
  will set the removable media sysfs attribute.

  If the removable media sysfs attribute is set on a device,
  GNOME/udisks will automatically mount the device on boot.

  We only want to set the SCSI RMB bit (and thus the removable media
  sysfs attribute) for devices where the ATA removable media device bit
  is set"

* tag 'ata-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-scsi: Set the RMB bit only for removable media devices
2024-06-16 10:20:18 -07:00
Linus Torvalds
e39388e430 Merge tag 'edac_urgent_for_v6.10_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:

 - Fix two issues with MI300 address translation logic

* tag 'edac_urgent_for_v6.10_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation
  RAS/AMD/ATL: Fix MI300 bank hash
2024-06-16 10:11:11 -07:00
Linus Torvalds
be2fa8865c Merge tag 'firewire-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:

 - Update tracepoints events introduced in v6.10-rc1 so that it includes
   the numeric identifier of host card in which the event happens

 - replace wiki URL with the current website URL in Kconfig

* tag 'firewire-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: record card index in bus_reset_handle tracepoints event
  firewire: core: record card index in tracepoinrts events derived from bus_reset_arrange_template
  firewire: core: record card index in async_phy_inbound tracepoints event
  firewire: core: record card index in async_phy_outbound_complete tracepoints event
  firewire: core: record card index in async_phy_outbound_initiate tracepoints event
  firewire: core: record card index in tracepoinrts events derived from async_inbound_template
  firewire: core: record card index in tracepoinrts events derived from async_outbound_initiate_template
  firewire: core: record card index in tracepoinrts events derived from async_outbound_complete_template
  firewire: fix website URL in Kconfig
2024-06-16 09:58:02 -07:00
Hans de Goede
fcf2a9970e leds: class: Revert: "If no default trigger is given, make hw_control trigger the default trigger"
Commit 66601a29bb ("leds: class: If no default trigger is given, make
hw_control trigger the default trigger") causes ledtrig-netdev to get
set as default trigger on various network LEDs.

This causes users to hit a pre-existing AB-BA deadlock issue in
ledtrig-netdev between the LED-trigger locks and the rtnl mutex,
resulting in hung tasks in kernels >= 6.9.

Solving the deadlock is non trivial, so for now revert the change to
set the hw_control trigger as default trigger, so that ledtrig-netdev
no longer gets activated automatically for various network LEDs.

The netdev trigger is not needed because the network LEDs are usually under
hw-control and the netdev trigger tries to leave things that way so setting
it as the active trigger for the LED class device is a no-op.

Fixes: 66601a29bb ("leds: class: If no default trigger is given, make hw_control trigger the default trigger")
Reported-by: Genes Lists <lists@sapience.com>
Closes: https://lore.kernel.org/all/9d189ec329cfe68ed68699f314e191a10d4b5eda.camel@sapience.com/
Reported-by: Johannes Wüller <johanneswueller@gmail.com>
Closes: https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-16 09:33:28 -07:00
Wolfram Sang
7e9bb0cb50 Merge tag 'i2c-host-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Two fixes from Jean aim to correctly report i2c functionality,
specifically ensuring that I2C_FUNC_SLAVE is reported when a
device operates solely as a slave interface.
2024-06-16 12:48:30 +02:00
Yazen Ghannam
ba437905b4 RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation
The currently used normalized address format is not applicable to all
MI300 systems. This leads to incorrect results during address
translation.

Drop the fixed layout and construct the normalized address from system
settings.

Fixes: 87a6123753 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-2-2f11547a178c@amd.com
2024-06-16 11:22:57 +02:00
Linus Torvalds
a3e18a5405 Merge tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:
 "Ensure xfs incore superblock's allocated inode counter, free inode
  counter, and free data block counter are all zero or positive when
  they are copied over from xfs_mount->m_[icount,ifree,fdblocks]
  respectively"

* tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: make sure sb_fdblocks is non-negative
2024-06-15 12:03:32 -07:00
Linus Torvalds
62e1f3b3fd Merge tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
 "Two small smb3 server fixes:

   - set xatttr fix

   - pathname parsing check fix"

* tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix missing use of get_write in in smb2_set_ea()
  ksmbd: move leading slash check to smb2_get_name()
2024-06-15 12:00:25 -07:00
Linus Torvalds
08a6b55aa0 Merge tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix the 8 bytes get_user() logic on x86-32

 - Fix build bug that creates weird & mistaken target directory under
   arch/x86/

* tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Don't add the EFI stub to targets, again
  x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
2024-06-15 11:03:05 -07:00
Linus Torvalds
41d707222e Merge tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Fix boot-time warning in tick_setup_device()"

* tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
2024-06-15 10:54:24 -07:00
Takashi Sakamoto
893098b2af firewire: core: record card index in bus_reset_handle tracepoints event
The bus reset event occurs in the bus managed by one of 1394 OHCI
controller in Linux system, however the existing tracepoints events has
the lack of data about it to distinguish the issued hardware from the
others.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-9-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:26 +09:00
Takashi Sakamoto
7507dbc46b firewire: core: record card index in tracepoinrts events derived from bus_reset_arrange_template
The asynchronous transmission of phy packet is initiated on one of 1394
OHCI controller, however the existing tracepoints events has the lack of
data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-8-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
abbb4bd96d firewire: core: record card index in async_phy_inbound tracepoints event
The asynchronous transmission of phy packet is initiated on one of 1394
OHCI controller, however the existing tracepoints events has the lack of
data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-7-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
810f2aa835 firewire: core: record card index in async_phy_outbound_complete tracepoints event
The asynchronous transmission of phy packet is initiated on one of 1394
OHCI controller, however the existing tracepoints events has the lack of
data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
3cb44a72a3 firewire: core: record card index in async_phy_outbound_initiate tracepoints event
The asynchronous transaction is initiated on one of 1394 OHCI
controller, however the existing tracepoints events has the lack of data
about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-5-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
65ec7ebefe firewire: core: record card index in tracepoinrts events derived from async_inbound_template
The asynchronous transaction is initiated on one of 1394 OHCI controller,
however the existing tracepoints events has the lack of data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
64e02b64fb firewire: core: record card index in tracepoinrts events derived from async_outbound_initiate_template
The asynchronous transaction is initiated on one of 1394 OHCI controller,
however the existing tracepoints events has the lack of data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
e7da16abf0 firewire: core: record card index in tracepoinrts events derived from async_outbound_complete_template
The asynchronous transaction is initiated on one of 1394 OHCI controller,
however the existing tracepoints events has the lack of data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
e789523fe2 firewire: fix website URL in Kconfig
The wiki in kernel.org is no longer updated. This commit replaces the
website URL with the latest one.

Link: https://lore.kernel.org/r/20240613090343.416198-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:04 +09:00
Linus Torvalds
44ef20baed Merge tag 's390-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - A couple of fixes for regressions resulting from the uncoupling of
   physical vs virtual kernel address spaces: fix the mapping of the
   kernel image using large pages; enforce alignment checks on physical
   addresses before creating large pages

 - Update defconfigs

* tag 's390-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Restore mapping of kernel image using large pages
  s390/mm: Allow large pages only for aligned physical addresses
  s390: Update defconfigs
2024-06-14 19:27:02 -07:00
Linus Torvalds
d4332da0f2 Merge tag 'drm-fixes-2024-06-15' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Weekly fixes. Seems a little quieter than usual, but still a bunch of
  stuff across the board. Mostly xe, some exynos and nouveau fixes.

  core:
   - Werror Kconfig fix

  panel:
   - add orientation quirk for Aya Neo KUN
   - fix runtime warning on panel/bridge release

  nouveau:
   - remove unused struct
   - fix wq crash on cards with no display

  amdgpu:
   - fix bo release clear page warning

  xe:
   - update MAINTAINERS
   - Use correct forcewake assertions
   - Assert that VRAM provisioning is only done on DGFX
   - Flush render caches before user-fence signalling on all engines
   - Move the disable_c6 call since it was sometimes never called

  exynos:
   - fix regression with fallback mode
   - fix EDID related memory leak
   - remove redundant code

  komeda:
   - fix debugfs conditional compilations
   - check pointer error value

  renesas:
   - atomic shutdown fix

  mediatek:
   - atomic shutdown fix"

* tag 'drm-fixes-2024-06-15' of https://gitlab.freedesktop.org/drm/kernel:
  arm/komeda: Remove all CONFIG_DEBUG_FS conditional compilations
  drm/xe: move disable_c6 call
  drm/xe: flush engine buffers before signalling user fence on all engines
  drm/xe/pf: Assert LMEM provisioning is done only on DGFX
  drm/xe/xe_gt_idle: use GT forcewake domain assertion
  drm/mediatek: Call drm_atomic_helper_shutdown() at shutdown time
  drm: renesas: shmobile: Call drm_atomic_helper_shutdown() at shutdown time
  drm/nouveau: remove unused struct 'init_exec'
  drm/nouveau: don't attempt to schedule hpd_work on headless cards
  drm/amdgpu: Fix the BO release clear memory warning
  drm/bridge/panel: Fix runtime warning on panel bridge release
  drm/komeda: check for error-valued pointer
  drm: panel-orientation-quirks: Add quirk for Aya Neo KUN
  drm/exynos/vidi: fix memory leak in .get_modes()
  drm/exynos: dp: drop driver owner initialization
  drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found
  drm: have config DRM_WERROR depend on !WERROR
  MAINTAINERS: Update Xe driver maintainers
  MAINTAINERS: update Xe driver maintainers
2024-06-14 18:57:28 -07:00
Linus Torvalds
68132b3536 Merge tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
 "Fix long standing lockdep issue of using remap_pfn_range() from the
  vfio-pci fault handler for mapping device MMIO. Commit ba168b52bf
  ("mm: use rwsem assertion macros for mmap_lock") now exposes this as a
  warning forcing this to be addressed.

  remap_pfn_range() was used here to efficiently map the entire vma, but
  it really never should have been used in the fault handler and doesn't
  handle concurrency, which introduced complex locking. We also needed
  to track vmas mapping the device memory in order to zap those vmas
  when the memory is disabled resulting in a vma list.

  Instead of all that mess, setup an address space on the device fd
  such that we can use unmap_mapping_range() for zapping to avoid the
  tracking overhead and use the standard vmf_insert_pfn() to insert
  mappings on fault.

  For now we'll iterate the vma and opportunistically try to insert
  mappings for the entire vma. This aligns with typical use cases, but
  hopefully in the future we can drop the iterative approach and make
  use of huge_fault instead, once vmf_insert_pfn{pud,pmd}() learn to
  handle pfnmaps"

* tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Insert full vma on mmap'd MMIO fault
  vfio/pci: Use unmap_mapping_range()
  vfio: Create vfio_fs_type with inode per device
2024-06-14 18:46:53 -07:00
Dave Airlie
9f0a86492a Merge tag 'drm-misc-fixes-2024-06-14' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.10-rc4:
- Kconfig fix for WERROR.
- Add panel quirk for Aya Neo KUN
- Small bugfixes in komeda, bridge/panel, amdgpu, nouveau.
- Remove unused nouveau struct.
- Call drm_atomic_helper_shutdown for shmobile and mediatek on shutdown.
- Remove DEBUGFS ifdefs from komeda.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/941c0552-3614-4af1-b04a-0a62c99fd7fb@linux.intel.com
2024-06-15 06:52:56 +10:00
Linus Torvalds
c286c21ff9 Merge tag 'block-6.10-20240614' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Discard double free on error conditions (Chunguang)
     - Target Fixes (Daniel)
     - Namespace detachment regression fix (Keith)

 - Fix for an issue with flush requests and queuelist reuse (Chengming)

 - nbd sparse annotation fixes (Christoph)

 - unmap and free bio mapped data via submitter (Anuj)

 - loop discard/fallocate unsupported fix (Cyril)

 - Fix for the zoned write plugging added in this release (Damien)

 - sed-opal wrong address fix (Su)

* tag 'block-6.10-20240614' of git://git.kernel.dk/linux:
  loop: Disable fallocate() zero and discard if not supported
  nvme: fix namespace removal list
  nbd: Remove __force casts
  nvmet: always initialize cqe.result
  nvmet-passthru: propagate status from id override functions
  nvme: avoid double free special payload
  block: unmap and free user mapped integrity via submitter
  block: fix request.queuelist usage in flush
  block: Optimize disk zone resource cleanup
  block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()
2024-06-14 11:41:50 -07:00
Linus Torvalds
ac3cb72aea Merge tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
 "Two fixes from Pavel headed to stable:

   - Ensure that the task state is correct before attempting to grab a
     mutex

   - Split cancel sequence flag into a separate variable, as it can get
     set by someone not owning the request (but holding the ctx lock)"

* tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux:
  io_uring: fix cancellation overwriting req->flags
  io_uring/rsrc: don't lock while !TASK_RUNNING
2024-06-14 11:17:24 -07:00
Linus Torvalds
0b320c8601 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Three obvious driver fixes and two core fixes.

  The two core fixes are to disable Command Duration Limits by default
  to fix an inconsistency in SATA and some USB devices. The other is to
  change the default read size for block zero to follow the device
  preference (some USB bridges preferring 16 byte commands don't have a
  translation for READ(10) and thus don't scan properly)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpi3mr: Fix ATA NCQ priority support
  scsi: ufs: core: Quiesce request queues before checking pending cmds
  scsi: core: Disable CDL by default
  scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory
  scsi: sd: Use READ(16) when reading block zero on large capacity disks
2024-06-14 10:25:29 -07:00
Linus Torvalds
11100273f2 Merge tag 'iommu-fix-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
 "A single patch that fixes a regression which several people reported:

   - AMD-Vi: Fix regression causing panics"

* tag 'iommu-fix-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix panic accessing amd_iommu_enable_faulting
2024-06-14 10:06:29 -07:00
Linus Torvalds
0cac73eb38 Merge tag 'pm-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
 "Restore the behavior of the no_turbo sysfs attribute in the
  intel_pstate driver which allowed users to make the driver start using
  turbo P-states if they have been enabled on the fly by the firmware
  after OS initialization (Rafael Wysocki)"

* tag 'pm-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()
2024-06-14 09:52:51 -07:00
Linus Torvalds
94df82fe5b Merge tag 'acpi-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix a recent regression in the ACPI EC driver and make system
  suspend work on multiple platforms where StorageD3Enable _DSD is
  missing in the ACPI tables.

  Specifics:

   - Make the ACPI EC driver directly evaluate an "orphan" _REG method
     under the EC device, if present, which stopped being evaluated
     after the driver had started to install its EC address space
     handler at the root of the ACPI namespace (Rafael Wysocki)

   - Make more devices put NVMe storage devices into D3 at suspend to
     work around missing StorageD3Enable _DSD in the BIOS (Mario
     Limonciello)"

* tag 'acpi-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: EC: Evaluate orphan _REG under EC device
  ACPI: x86: Force StorageD3Enable on more products
2024-06-14 09:39:14 -07:00
Linus Torvalds
cee84c0b00 Merge tag 'thermal-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
 "These fix three issues introduced recently, two related to defects in
  ACPI tables supplied by the platform firmware and one cause by a
  thermal core change that went too far:

   - Prevent the thermal core from failing the registration of a cooling
     device if its .get_cur_state() reports an incorrect state to start
     with which may happen for fans handled through firmware-supplied
     AML in ACPI tables (Rafael Wysocki)

   - Make the ACPI thermal zone driver initialize all trip points with
     temperature of 0 centigrade and below as invalid because such trip
     point temperatures do not make sense on systems with ACPI thermal
     control and they cause performance regressions due to permanent
     thermal mitigations to occur (Rafael Wysocki)

   - Restore passive polling management in the Step-Wise thermal
     governor that uses it to ensure that all cooling devices used for
     thermal mitigation will go back to their initial states eventually
     (Rafael Wysocki)"

* tag 'thermal-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: gov_step_wise: Restore passive polling management
  thermal: ACPI: Invalidate trip points with temperature of 0 or below
  thermal: core: Do not fail cdev registration because of invalid initial state
2024-06-14 09:28:56 -07:00
Rafael J. Wysocki
04f82fbb86 Merge branch acpi-x86
Merge a fix for a suspend issue related to storage handling on multiple
systems based on AMD hardware:

 - Make more devices put NVMe storage devices into D3 at suspend to work
   around missing StorageD3Enable _DSD in the BIOS (Mario Limonciello).

* branch acpi-x86:
  ACPI: x86: Force StorageD3Enable on more products
2024-06-14 14:27:16 +02:00
Cyril Hrubis
5f75e081ab loop: Disable fallocate() zero and discard if not supported
If fallcate is implemented but zero and discard operations are not
supported by the filesystem the backing file is on we continue to fill
dmesg with errors from the blk_mq_end_request() since each time we call
fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
ends up propagated into the block layer. In the end syscall succeeds
since the blkdev_issue_zeroout() falls back to writing zeroes which
makes the errors even more misleading and confusing.

How to reproduce:

1. make sure /tmp is mounted as tmpfs
2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
3. losetup /dev/loop0 /tmp/disk.img
4. mkfs.ext2 /dev/loop0
5. dmesg |tail

[710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0

This patch changes the lo_fallocate() to clear the flags for zero and
discard operations if we get EOPNOTSUPP from the backing file fallocate
callback, that way we at least stop spewing errors after the first
unsuccessful try.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240613163817.22640-1-chrubis@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14 06:21:25 -06:00
Damien Le Moal
a6a75edc86 ata: libata-scsi: Set the RMB bit only for removable media devices
The SCSI Removable Media Bit (RMB) should only be set for removable media,
where the device stays and the media changes, e.g. CD-ROM or floppy.

The ATA removable media device bit is obsoleted since ATA-8 ACS (2006),
but before that it was used to indicate that the device can have its media
removed (while the device stays).

Commit 8a3e33cf92 ("ata: ahci: find eSATA ports and flag them as
removable") introduced a change to set the RMB bit if the port has either
the eSATA bit or the hot-plug capable bit set. The reasoning was that the
author wanted his eSATA ports to get treated like a USB stick.

This is however wrong. See "20-082r23SPC-6: Removable Medium Bit
Expectations" which has since been integrated to SPC, which states that:

"""
Reports have been received that some USB Memory Stick device servers set
the removable medium (RMB) bit to one. The rub comes when the medium is
actually removed, because... The device server is removed concurrently
with the medium removal. If there is no device server, then there is no
device server that is waiting to have removable medium inserted.

Sufficient numbers of SCSI analysts see such a device:
- not as a device that supports removable medium;
but
- as a removable, hot pluggable device.
"""

The definition of the RMB bit in the SPC specification has since been
clarified to match this.

Thus, a USB stick should not have the RMB bit set (and neither shall an
eSATA nor a hot-plug capable port).

Commit dc8b4afc4a ("ata: ahci: don't mark HotPlugCapable Ports as
external/removable") then changed so that the RMB bit is only set for the
eSATA bit (and not for the hot-plug capable bit), because of a lot of bug
reports of SATA devices were being automounted by udisks. However,
treating eSATA and hot-plug capable ports differently is not correct.

From the AHCI 1.3.1 spec:
Hot Plug Capable Port (HPCP): When set to '1', indicates that this port's
signal and power connectors are externally accessible via a joint signal
and power connector for blindmate device hot plug.

So a hot-plug capable port is an external port, just like commit
45b96d65ec ("ata: ahci: a hotplug capable port is an external port")
claims.

In order to not violate the SPC specification, modify the SCSI INQUIRY
data to only set the RMB bit if the ATA device can have its media removed.

This fixes a reported problem where GNOME/udisks was automounting devices
connected to hot-plug capable ports.

Fixes: 45b96d65ec ("ata: ahci: a hotplug capable port is an external port")
Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Tested-by: Thomas Weißschuh <linux@weissschuh.net>
Reported-by: Thomas Weißschuh <linux@weissschuh.net>
Closes: https://lore.kernel.org/linux-ide/c0de8262-dc4b-4c22-9fac-33432e5bddd3@t-8ch.de/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
[cassel: wrote commit message]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-06-14 14:18:46 +02:00
Maxime Ripard
14731a640e Merge drm/drm-fixes into drm-misc-fixes
Roll -rc3 and current drm/fixes in.

This will also unstuck our for-next branch.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-06-14 09:55:46 +02:00
pengfuyuan
41f590e31c arm/komeda: Remove all CONFIG_DEBUG_FS conditional compilations
Since the debugfs functions have no-op stubs for CONFIG_DEBUG_FS=n,
the compiler will optimize the rest away since they are no longer referenced.

The benefit of removing the conditional compilation is that the build
is actually tested for both CONFIG_DEBUG_FS configuration values.
Assuming most developers have it enabled, CONFIG_DEBUG_FS=n is not tested
much and may fail the build due to the conditional compilation.

Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: pengfuyuan <pengfuyuan@kylinos.cn>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240606120842.1377267-1-pengfuyuan@kylinos.cn
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-06-14 08:57:28 +02:00
Alan Stern
22f0081286 USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
The syzbot fuzzer found that the interrupt-URB completion callback in
the cdc-wdm driver was taking too long, and the driver's immediate
resubmission of interrupt URBs with -EPROTO status combined with the
dummy-hcd emulation to cause a CPU lockup:

cdc_wdm 1-1:1.0: nonzero urb status received: -71
cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625]
CPU#0 Utilization every 4s during lockup:
	#1:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#2:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#3:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#4:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#5:  98% system,	  1% softirq,	  3% hardirq,	  0% idle
Modules linked in:
irq event stamp: 73096
hardirqs last  enabled at (73095): [<ffff80008037bc00>] console_emit_next_record kernel/printk/printk.c:2935 [inline]
hardirqs last  enabled at (73095): [<ffff80008037bc00>] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994
hardirqs last disabled at (73096): [<ffff80008af10b00>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline]
hardirqs last disabled at (73096): [<ffff80008af10b00>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551
softirqs last  enabled at (73048): [<ffff8000801ea530>] softirq_handle_end kernel/softirq.c:400 [inline]
softirqs last  enabled at (73048): [<ffff8000801ea530>] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582
softirqs last disabled at (73043): [<ffff800080020de8>] __do_softirq+0x14/0x20 kernel/softirq.c:588
CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G        W          6.10.0-rc2-syzkaller-g8867bbd4a056 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024

Testing showed that the problem did not occur if the two error
messages -- the first two lines above -- were removed; apparently adding
material to the kernel log takes a surprisingly large amount of time.

In any case, the best approach for preventing these lockups and to
avoid spamming the log with thousands of error messages per second is
to ratelimit the two dev_err() calls.  Therefore we replace them with
dev_err_ratelimited().

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Reported-and-tested-by: syzbot+5f996b83575ef4058638@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/00000000000073d54b061a6a1c65@google.com/
Reported-and-tested-by: syzbot+1b2abad17596ad03dcff@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/000000000000f45085061aa9b37e@google.com/
Fixes: 9908a32e94 ("USB: remove err() macro from usb class drivers")
Link: https://lore.kernel.org/linux-usb/40dfa45b-5f21-4eef-a8c1-51a2f320e267@rowland.harvard.edu/
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/29855215-52f5-4385-b058-91f42c2bee18@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-14 08:47:59 +02:00
Pavel Begunkov
f4a1254f2a io_uring: fix cancellation overwriting req->flags
Only the current owner of a request is allowed to write into req->flags.
Hence, the cancellation path should never touch it. Add a new field
instead of the flag, move it into the 3rd cache line because it should
always be initialised. poll_refs can move further as polling is an
involved process anyway.

It's a minimal patch, in the future we can and should find a better
place for it and remove now unused REQ_F_CANCEL_SEQ.

Fixes: 521223d7c2 ("io_uring/cancel: don't default to setting req->work.cancel_seq")
Cc: stable@vger.kernel.org
Reported-by: Li Shi <sl1589472800@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6827b129f8f0ad76fa9d1f0a773de938b240ffab.1718323430.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-13 19:25:28 -06:00
Dave Airlie
f1909e8597 Merge tag 'drm-xe-fixes-2024-06-13' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Core Changes:
- Xe Maintainers update to MAINTAINERS file.

Driver Changes:
- Use correct forcewake assertions.
- Assert that VRAM provisioning is only done on DGFX.
- Flush render caches before user-fence signalling on all engines.
- Move the disable_c6 call since it was sometimes never called.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZmrXV0FoBb8M0c6J@fedora
2024-06-14 11:08:06 +10:00
Dave Airlie
ae1e782362 Merge tag 'exynos-drm-fixes-for-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Regression fix
- Fix an regression issue by adding 640x480 fallback mode
  for Exynos HDMI driver.

Bug fix
- Fix a memory leak by ensuring the duplicated EDID is properly freed in the get_modes function.

Code cleanup
- Remove redundant driver owner initialization since platform_driver_register() sets it automatically.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240610073839.37430-1-inki.dae@samsung.com
2024-06-14 10:12:27 +10:00
Jens Axboe
e3e53683cc Merge tag 'nvme-6.10-2024-06-13' of git://git.infradead.org/nvme into block-6.10
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.10

 - Discard double free on error conditions (Chunguang)
 - Target Fixes (Daniel)
 - Namespace detachment regression fix (Keith)"

* tag 'nvme-6.10-2024-06-13' of git://git.infradead.org/nvme:
  nvme: fix namespace removal list
  nvmet: always initialize cqe.result
  nvmet-passthru: propagate status from id override functions
  nvme: avoid double free special payload
2024-06-13 14:19:57 -06:00
Keith Busch
ff0ffe5b7c nvme: fix namespace removal list
This function wants to move a subset of a list from one element to the
tail into another list. It also needs to use the srcu synchronize
instead of the regular rcu version. Do this one element at a time
because that's the only to do it.

Fixes: be647e2c76 ("nvme: use srcu for iterating namespace list")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-13 11:47:40 -07:00
Linus Torvalds
d20f6b3d74 Merge tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Slim pickings this time, probably a combination of summer, DevConf.cz,
  and the end of first half of the year at corporations.

  Current release - regressions:

   - Revert "igc: fix a log entry using uninitialized netdev", it traded
     lack of netdev name in a printk() for a crash

  Previous releases - regressions:

   - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ

   - geneve: fix incorrectly setting lengths of inner headers in the
     skb, confusing the drivers and causing mangled packets

   - sched: initialize noop_qdisc owner to avoid false-positive
     recursion detection (recursing on CPU 0), which bubbles up to user
     space as a sendmsg() error, while noop_qdisc should silently drop

   - netdevsim: fix backwards compatibility in nsim_get_iflink()

  Previous releases - always broken:

   - netfilter: ipset: fix race between namespace cleanup and gc in the
     list:set type"

* tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
  af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
  bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
  gve: Clear napi->skb before dev_kfree_skb_any()
  ionic: fix use after netif_napi_del()
  Revert "igc: fix a log entry using uninitialized netdev"
  net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
  net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
  net/ipv6: Fix the RT cache flush via sysctl using a previous delay
  net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
  gve: ignore nonrelevant GSO type bits when processing TSO headers
  net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
  tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
  mailmap: map Geliang's new email address
  mptcp: pm: update add_addr counters after connect
  mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
  mptcp: ensure snd_una is properly initialized on connect
  ...
2024-06-13 11:11:53 -07:00
Linus Torvalds
fd88e181d8 Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
 "Bugfixes:
   - NFSv4.2: Fix a memory leak in nfs4_set_security_label
   - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
   - NFS: Add appropriate memory barriers to the sillyrename code
   - Propagate readlink errors in nfs_symlink_filler
   - NFS: don't invalidate dentries on transient errors
   - NFS: fix unnecessary synchronous writes in random write workloads
   - NFSv4.1: enforce rootpath check when deciding whether or not to trunk

  Other:
   - Change email address for Trond Myklebust due to email server concerns"

* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add barriers when testing for NFS_FSDATA_BLOCKED
  SUNRPC: return proper error from gss_wrap_req_priv
  NFSv4.1 enforce rootpath check in fs_location query
  NFS: abort nfs_atomic_open_v23 if name is too long.
  nfs: don't invalidate dentries on transient errors
  nfs: Avoid flushing many pages with NFS_FILE_SYNC
  nfs: propagate readlink errors in nfs_symlink_filler
  MAINTAINERS: Change email address for Trond Myklebust
  NFSv4: Fix memory leak in nfs4_set_security_label
2024-06-13 11:07:32 -07:00
Linus Torvalds
3572597ca8 Merge tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
 "Fix validation of NUMA coverage.

  memblock_validate_numa_coverage() was checking for a unset node ID
  using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was
  specified by buggy firmware.

  Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in
  memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()"

* tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
  memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
2024-06-13 10:09:29 -07:00
Aleksandr Mishin
a9b9741854 bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
In case of token is released due to token->state == BNXT_HWRM_DEFERRED,
released token (set to NULL) is used in log messages. This issue is
expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But
this error code is returned by recent firmware. So some firmware may not
return it. This may lead to NULL pointer dereference.
Adjust this issue by adding token pointer check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 8fa4219dba ("bnxt_en: add dynamic debug support for HWRM messages")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 08:05:46 -07:00
Rao Shoaib
a6736a0add af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
Read with MSG_PEEK flag loops if the first byte to read is an OOB byte.
commit 22dd70eb2c ("af_unix: Don't peek OOB data without MSG_OOB.")
addresses the loop issue but does not address the issue that no data
beyond OOB byte can be read.

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
>>>

Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 08:03:55 -07:00
Michael Chan
7d9df38c9c bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
Firmware interface 1.10.2.118 has increased the size of
HWRM_PORT_PHY_QCFG response beyond the maximum size that can be
forwarded.  When the VF's link state is not the default auto state,
the PF will need to forward the response back to the VF to indicate
the forced state.  This regression may cause the VF to fail to
initialize.

Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum
96 bytes.  The SPEEDS2_SUPPORTED flag needs to be cleared because the
new speeds2 fields are beyond the legacy structure.  Also modify
bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96
bytes to make this failure more obvious.

Fixes: 84a911db83 ("bnxt_en: Update firmware interface to 1.10.2.118")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 07:50:16 -07:00
Greg Kroah-Hartman
7da9dfdd5a .editorconfig: remove trim_trailing_whitespace option
Some editors (like the vim variants), when seeing "trim_whitespace"
decide to do just that for all of the whitespace in the file you are
saving, even if it is not on a line that you have modified.  This plays
havoc with diffs and is NOT something that should be intended.

As the "only trim whitespace on modified lines" is not part of the
editorconfig standard yet, just delete these lines from the
.editorconfig file so that we don't end up with diffs that are
automatically rejected by maintainers for containing things they
shouldn't.

Cc: Danny Lin <danny@kdrag0n.dev>
Cc: Íñigo Huguet <ihuguet@redhat.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Fixes: 5a602de997 ("Add .editorconfig file for basic formatting")
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/2024061137-jawless-dipped-e789@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-13 16:47:52 +02:00
Ziwei Xiao
6f4d93b78a gve: Clear napi->skb before dev_kfree_skb_any()
gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it
is freed with dev_kfree_skb_any(). This can result in a subsequent call
to napi_get_frags returning a dangling pointer.

Fix this by clearing napi->skb before the skb is freed.

Fixes: 9b8dd5e5ea ("gve: DQO: Add RX path")
Cc: stable@vger.kernel.org
Reported-by: Shailend Chand <shailend@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 07:37:35 -07:00
Taehee Yoo
79f18a41dd ionic: fix use after netif_napi_del()
When queues are started, netif_napi_add() and napi_enable() are called.
If there are 4 queues and only 3 queues are used for the current
configuration, only 3 queues' napi should be registered and enabled.
The ionic_qcq_enable() checks whether the .poll pointer is not NULL for
enabling only the using queue' napi. Unused queues' napi will not be
registered by netif_napi_add(), so the .poll pointer indicates NULL.
But it couldn't distinguish whether the napi was unregistered or not
because netif_napi_del() doesn't reset the .poll pointer to NULL.
So, ionic_qcq_enable() calls napi_enable() for the queue, which was
unregistered by netif_napi_del().

Reproducer:
   ethtool -L <interface name> rx 1 tx 1 combined 0
   ethtool -L <interface name> rx 0 tx 0 combined 1
   ethtool -L <interface name> rx 0 tx 0 combined 4

Splat looks like:
kernel BUG at net/core/dev.c:6666!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16
Workqueue: events ionic_lif_deferred_work [ionic]
RIP: 0010:napi_enable+0x3b/0x40
Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f
RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28
RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20
FS:  0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <TASK>
 ? die+0x33/0x90
 ? do_trap+0xd9/0x100
 ? napi_enable+0x3b/0x40
 ? do_error_trap+0x83/0xb0
 ? napi_enable+0x3b/0x40
 ? napi_enable+0x3b/0x40
 ? exc_invalid_op+0x4e/0x70
 ? napi_enable+0x3b/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? napi_enable+0x3b/0x40
 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 process_one_work+0x145/0x360
 worker_thread+0x2bb/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xcc/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 07:30:06 -07:00
Sasha Neftin
8eef5c3cea Revert "igc: fix a log entry using uninitialized netdev"
This reverts commit 86167183a1.

igc_ptp_init() needs to be called before igc_reset(), otherwise kernel
crash could be observed. Following the corresponding discussion [1] and
[2] revert this commit.

Link: https://lore.kernel.org/all/8fb634f8-7330-4cf4-a8ce-485af9c0a61a@intel.com/ [1]
Link: https://lore.kernel.org/all/87o78rmkhu.fsf@intel.com/ [2]
Fixes: 86167183a1 ("igc: fix a log entry using uninitialized netdev")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240611162456.961631-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13 07:24:52 -07:00
Riana Tauro
2470b141bf drm/xe: move disable_c6 call
disable c6 called in guc_pc_fini_hw is unreachable.

GuC PC init returns earlier if skip_guc_pc is true and never
registers the finish call thus making disable_c6 unreachable.

move this call to gt idle.

v2: rebase
v3: add fixes tag (Himal)

Fixes: 975e4a3795 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240606100842.956072-3-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 6800e63cf9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-06-13 12:35:13 +02:00
Andrzej Hajda
b5e3a9b83f drm/xe: flush engine buffers before signalling user fence on all engines
Tests show that user fence signalling requires kind of write barrier,
otherwise not all writes performed by the workload will be available
to userspace. It is already done for render and compute, we need it
also for the rest: video, gsc, copy.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted-v3-2-06e7932f784a@intel.com
(cherry picked from commit 3ad7d18c5d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-06-13 11:36:23 +02:00
Michal Wajdeczko
cd554e1e11 drm/xe/pf: Assert LMEM provisioning is done only on DGFX
The Local Memory (aka VRAM) is only available on DGFX platforms.
We shouldn't attempt to provision VFs with LMEM or attempt to
update the LMTT on non-DGFX platforms. Add missing asserts that
would enforce that and fix release code that could crash on iGFX
due to uninitialized LMTT.

Fixes: 0698ff57bf ("drm/xe/pf: Update the LMTT when freeing VF GT config")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607153155.1592-1-michal.wajdeczko@intel.com
(cherry picked from commit b321cb83a3)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-06-13 11:33:07 +02:00
Riana Tauro
7c877115da drm/xe/xe_gt_idle: use GT forcewake domain assertion
The rc6 registers used in disable_c6 function belong
to the GT forcewake domain. Hence change the forcewake
assertion to check GT forcewake domain.

v2: add fixes tag (Himal)

Fixes: 975e4a3795 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240606100842.956072-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 21b7085546)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-06-13 11:33:07 +02:00
Rafael J. Wysocki
0e6b6dedf1 ACPI: EC: Evaluate orphan _REG under EC device
After starting to install the EC address space handler at the ACPI
namespace root, if there is an "orphan" _REG method in the EC device's
scope, it will not be evaluated any more.  This breaks EC operation
regions on some systems, like Asus gu605.

To address this, use a wrapper around an existing ACPICA function to
look for an "orphan" _REG method in the EC device scope and evaluate
it if present.

Fixes: 60fa6ae6e6 ("ACPI: EC: Install address space handler at the namespace root")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218945
Reported-by: VitaliiT <vitaly.torshyn@gmail.com>
Tested-by: VitaliiT <vitaly.torshyn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-13 11:28:54 +02:00
Dimitri Sivanich
12243a8115 iommu/amd: Fix panic accessing amd_iommu_enable_faulting
This fixes a bug introduced by commit d74169ceb0 ("iommu/vt-d: Allocate
DMAR fault interrupts locally").  The panic happens when
amd_iommu_enable_faulting is called from CPUHP_AP_ONLINE_DYN context.

Fixes: d74169ceb0 ("iommu/vt-d: Allocate DMAR fault interrupts locally")
Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/ZljHE/R4KLzGU6vx@hpe.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-13 11:16:05 +02:00
Benjamin Segall
b2747f108b x86/boot: Don't add the EFI stub to targets, again
This is a re-commit of

  da05b143a3 ("x86/boot: Don't add the EFI stub to targets")

after the tagged patch incorrectly reverted it.

vmlinux-objs-y is added to targets, with an assumption that they are all
relative to $(obj); adding a $(objtree)/drivers/...  path causes the
build to incorrectly create a useless
arch/x86/boot/compressed/drivers/...  directory tree.

Fix this just by using a different make variable for the EFI stub.

Fixes: cb8bda8ad4 ("x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S")
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org # v6.1+
Link: https://lore.kernel.org/r/xm267ceukksz.fsf@bsegall.svl.corp.google.com
2024-06-13 10:32:36 +02:00
Jakub Kicinski
b60b1bdc18 Merge branch 'net-bridge-mst-fix-suspicious-rcu-usage-warning'
Nikolay Aleksandrov says:

====================
net: bridge: mst: fix suspicious rcu usage warning

This set fixes a suspicious RCU usage warning triggered by syzbot[1] in
the bridge's MST code. After I converted br_mst_set_state to RCU, I
forgot to update the vlan group dereference helper. Fix it by using
the proper helper, in order to do that we need to pass the vlan group
which is already obtained correctly by the callers for their respective
context. Patch 01 is a requirement for the fix in patch 02.

Note I did consider rcu_dereference_rtnl() but the churn is much bigger
and in every part of the bridge. We can do that as a cleanup in
net-next.

[1] https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
 =============================
 WARNING: suspicious RCU usage
 6.10.0-rc2-syzkaller-00235-g8a92980606e3 #0 Not tainted
 -----------------------------
 net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 4 locks held by syz-executor.1/5374:
  #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline]
  #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: __mm_populate+0x1b0/0x460 mm/gup.c:2111
  #1: ffffc90000a18c00 ((&p->forward_delay_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x650 kernel/time/timer.c:1789
  #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
  #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
  #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
  #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: br_mst_set_state+0x171/0x7a0 net/bridge/br_mst.c:105

 stack backtrace:
 CPU: 1 PID: 5374 Comm: syz-executor.1 Not tainted 6.10.0-rc2-syzkaller-00235-g8a92980606e3 #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
 Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
  lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
  nbp_vlan_group net/bridge/br_private.h:1599 [inline]
  br_mst_set_state+0x29e/0x7a0 net/bridge/br_mst.c:106
  br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47
  br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88
  call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
  expire_timers kernel/time/timer.c:1843 [inline]
  __run_timers kernel/time/timer.c:2417 [inline]
  __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
  run_timer_base kernel/time/timer.c:2437 [inline]
  run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
  handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
  __do_softirq kernel/softirq.c:588 [inline]
  invoke_softirq kernel/softirq.c:428 [inline]
  __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
  </IRQ>
  <TASK>
====================

Link: https://lore.kernel.org/r/20240609103654.914987-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 18:24:26 -07:00
Nikolay Aleksandrov
546ceb1dfd net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
I converted br_mst_set_state to RCU to avoid a vlan use-after-free
but forgot to change the vlan group dereference helper. Switch to vlan
group RCU deref helper to fix the suspicious rcu usage warning.

Fixes: 3a7c1661ae ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 18:24:24 -07:00
Nikolay Aleksandrov
36c92936e8 net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
instead of dereferencing it again. Each caller has already correctly
dereferenced it for their context. This change is required for the
following suspicious RCU dereference fix. No functional changes
intended.

Fixes: 3a7c1661ae ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 18:24:24 -07:00
Petr Pavlu
14a20e5b4a net/ipv6: Fix the RT cache flush via sysctl using a previous delay
The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes. The
written value is however not used in the currently requested flush and
instead utilized only in the next one.

A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and then
calls proc_dointvec() which actually updates the sysctl based on the
provided input.

Fix the problem by switching the order of the two operations.

Fixes: 4990509f19 ("[NETNS][IPV6]: Make sysctls route per namespace.")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 17:51:35 -07:00
Linus Torvalds
2ccbdf43d5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM and clkdev fixes from Russell King:

 - Fix clkdev - erroring out on long strings causes boot failures, so
   don't do this. Still warn about the over-sized strings (which will
   never match and thus their registration with clkdev is useless)

 - Fix for ftrace with frame pointer unwinder with recent GCC changing
   the way frames are stacked.

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9405/1: ftrace: Don't assume stack frames are contiguous in memory
  clkdev: don't fail clkdev_alloc() if over-sized
2024-06-12 16:58:05 -07:00
Jakub Kicinski
d92589f8fd Merge tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 fixes insufficient sanitization of netlink attributes for the
	 inner expression which can trigger nul-pointer dereference,
	 from Davide Ornaghi.

Patch #2 address a report that there is a race condition between
         namespace cleanup and the garbage collection of the list:set
         type. This patch resolves this issue with other minor issues
	 as well, from Jozsef Kadlecsik.

Patch #3 ip6_route_me_harder() ignores flowlabel/dsfield when ip dscp
	 has been mangled, this unbreaks ip6 dscp set $v,
	 from Florian Westphal.

All of these patches address issues that are present in several releases.

* tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
====================

Link: https://lore.kernel.org/r/20240611220323.413713-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12 16:29:00 -07:00
Linus Torvalds
0b4989ebe8 Merge tag 'bcachefs-2024-06-12' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:

 - fix kworker explosion, due to calling submit_bio() (which can block)
   from a multithreaded workqueue

 - fix error handling in btree node scan

 - forward compat fix: kill an old debug assert

 - key cache shrinker fixes

   This is a partial fix for stalls doing multithreaded creates - there
   were various O(n^2) issues the key cache shrinker was hitting [1].

   There's more work coming here; I'm working on a patch to delete the
   key cache lock, which initial testing shows to be a pretty drastic
   performance improvement

 - assorted syzbot fixes

Link: https://lore.kernel.org/linux-bcachefs/CAGudoHGenxzk0ZqPXXi1_QDbfqQhGHu+wUwzyS6WmfkUZ1HiXA@mail.gmail.com/ [1]

* tag 'bcachefs-2024-06-12' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas
  bcachefs: Add missing bch_inode_info.ei_flags init
  bcachefs: Add missing synchronize_srcu_expedited() call when shutting down
  bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()
  bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks
  bcachefs: Fix snapshot_create_lock lock ordering
  bcachefs: Fix refcount leak in check_fix_ptrs()
  bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing
  bcachefs: Fix reporting of freed objects from key cache shrinker
  bcachefs: set sb->s_shrinker->seeks = 0
  bcachefs: increase key cache shrinker batch size
  bcachefs: Enable automatic shrinking for rhashtables
  bcachefs: fix the display format for show-super
  bcachefs: fix stack frame size in fsck.c
  bcachefs: Delete incorrect BTREE_ID_NR assertion
  bcachefs: Fix incorrect error handling found_btree_node_is_readable()
  bcachefs: Split out btree_write_submit_wq
2024-06-12 15:08:23 -07:00
Alex Williamson
d71a989cf5 vfio/pci: Insert full vma on mmap'd MMIO fault
In order to improve performance of typical scenarios we can try to insert
the entire vma on fault.  This accelerates typical cases, such as when
the MMIO region is DMA mapped by QEMU.  The vfio_iommu_type1 driver will
fault in the entire DMA mapped range through fixup_user_fault().

In synthetic testing, this improves the time required to walk a PCI BAR
mapping from userspace by roughly 1/3rd.

This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain
support for pfnmaps.

Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20240607035213.2054226-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-06-12 15:40:39 -06:00
Christoph Hellwig
957df9af72 nbd: Remove __force casts
Make it again possible for sparse to verify that blk_status_t and Unix
error codes are used in the proper context by making nbd_send_cmd()
return a blk_status_t instead of an integer.

No functionality has been changed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[ bvanassche: added description and made two small formatting changes ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240604221531.327131-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-12 14:47:07 -06:00
Pavel Begunkov
54559642b9 io_uring/rsrc: don't lock while !TASK_RUNNING
There is a report of io_rsrc_ref_quiesce() locking a mutex while not
TASK_RUNNING, which is due to forgetting restoring the state back after
io_run_task_work_sig() and attempts to break out of the waiting loop.

do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff815d2494>] prepare_to_wait+0xa4/0x380
kernel/sched/wait.c:237
WARNING: CPU: 2 PID: 397056 at kernel/sched/core.c:10099
__might_sleep+0x114/0x160 kernel/sched/core.c:10099
RIP: 0010:__might_sleep+0x114/0x160 kernel/sched/core.c:10099
Call Trace:
 <TASK>
 __mutex_lock_common kernel/locking/mutex.c:585 [inline]
 __mutex_lock+0xb4/0x940 kernel/locking/mutex.c:752
 io_rsrc_ref_quiesce+0x590/0x940 io_uring/rsrc.c:253
 io_sqe_buffers_unregister+0xa2/0x340 io_uring/rsrc.c:799
 __io_uring_register io_uring/register.c:424 [inline]
 __do_sys_io_uring_register+0x5b9/0x2400 io_uring/register.c:613
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd8/0x270 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77

Reported-by: Li Shi <sl1589472800@gmail.com>
Fixes: 4ea15b56f0 ("io_uring/rsrc: use wq for quiescing")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/77966bc104e25b0534995d5dbb152332bc8f31c0.1718196953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-12 13:02:12 -06:00
Daniel Wagner
cd0c1b8e04 nvmet: always initialize cqe.result
The spec doesn't mandate that the first two double words (aka results)
for the command queue entry need to be set to 0 when they are not
used (not specified). Though, the target implemention returns 0 for TCP
and FC but not for RDMA.

Let's make RDMA behave the same and thus explicitly initializing the
result field. This prevents leaking any data from the stack.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-12 11:00:08 -07:00
Daniel Wagner
d76584e53f nvmet-passthru: propagate status from id override functions
The id override functions return a status which is not propagated to the
caller.

Fixes: c1fef73f79 ("nvmet: add passthru code to process commands")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-12 11:00:08 -07:00
Chunguang Xu
e5d574ab37 nvme: avoid double free special payload
If a discard request needs to be retried, and that retry may fail before
a new special payload is added, a double free will result. Clear the
RQF_SPECIAL_LOAD when the request is cleaned.

Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-12 10:56:50 -07:00
Anuj Gupta
e038ee6189 block: unmap and free user mapped integrity via submitter
The user mapped intergity is copied back and unpinned by
bio_integrity_free which is a low-level routine. Do it via the submitter
rather than doing it in the low-level block layer code, to split the
submitter side from the consumer side of the bio.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240610111144.14647-1-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-12 11:00:50 -06:00
Chengming Zhou
d0321c812d block: fix request.queuelist usage in flush
Friedrich Weber reported a kernel crash problem and bisected to commit
81ada09cc2 ("blk-flush: reuse rq queuelist in flush state machine").

The root cause is that we use "list_move_tail(&rq->queuelist, pending)"
in the PREFLUSH/POSTFLUSH sequences. But rq->queuelist.next == xxx since
it's popped out from plug->cached_rq in __blk_mq_alloc_requests_batch().
We don't initialize its queuelist just for this first request, although
the queuelist of all later popped requests will be initialized.

Fix it by changing to use "list_add_tail(&rq->queuelist, pending)" so
rq->queuelist doesn't need to be initialized. It should be ok since rq
can't be on any list when PREFLUSH or POSTFLUSH, has no move actually.

Please note the commit 81ada09cc2 ("blk-flush: reuse rq queuelist in
flush state machine") also has another requirement that no drivers would
touch rq->queuelist after blk_mq_end_request() since we will reuse it to
add rq to the post-flush pending list in POSTFLUSH. If this is not true,
we will have to revert that commit IMHO.

This updated version adds "list_del_init(&rq->queuelist)" in flush rq
callback since the dm layer may submit request of a weird invalid format
(REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH), which causes double list_add
if without this "list_del_init(&rq->queuelist)". The weird invalid format
problem should be fixed in dm layer.

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Closes: https://lore.kernel.org/lkml/14b89dfb-505c-49f7-aebb-01c54451db40@proxmox.com/
Closes: https://lore.kernel.org/lkml/c9d03ff7-27c5-4ebd-b3f6-5a90d96f35ba@proxmox.com/
Fixes: 81ada09cc2 ("blk-flush: reuse rq queuelist in flush state machine")
Cc: Christoph Hellwig <hch@lst.de>
Cc: ming.lei@redhat.com
Cc: bvanassche@acm.org
Tested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240608143115.972486-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-12 10:58:11 -06:00
Damien Le Moal
1933192a91 block: Optimize disk zone resource cleanup
For zoned block devices using zone write plugging, an rcu_barrier() call
is needed in disk_free_zone_resources() to synchronize freeing of zone
write plugs and the destrution of the mempool used to allocate the
plugs. The barrier call does slow down a little teardown of zoned block
devices but should not affect teardown of regular block devices or zoned
block devices that do not use zone write plugging (e.g. zoned DM devices
that do not require zone append emulation).

Modify disk_free_zone_resources() to return early if we do not have a
mempool to start with, that is, if the device does not use zone write
plugging. This avoids the costly rcu_barrier() and speeds up disk
teardown.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: dd291d77cc ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240607002126.104227-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-12 10:56:45 -06:00
Su Hui
9b1ebce6a1 block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()
Clang static checker (scan-build) warning:
block/sed-opal.c:line 317, column 3
Value stored to 'ret' is never read.

Fix this problem by returning the error code when keyring_search() failed.
Otherwise, 'key' will have a wrong value when 'kerf' stores the error code.

Fixes: 3bfeb61256 ("block: sed-opal: keyring support for SED keys")
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20240611073659.429582-1-suhui@nfschina.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-12 10:53:20 -06:00
Andy Shevchenko
cea2a26553 mailmap: Add my outdated addresses to the map file
There is a couple of outdated addresses that are still visible
in the Git history, add them to .mailmap.

While at it, replace one in the comment.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-12 09:28:04 -07:00
Jean Delvare
cbf3fb5b29 i2c: designware: Fix the functionality flags of the slave-only interface
When an I2C adapter acts only as a slave, it should not claim to
support I2C master capabilities.

Fixes: 5b6d721b26 ("i2c: designware: enable SLAVE in platform module")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Luis Oliveira <lolivei@synopsys.com>
Cc: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Jan Dabros <jsd@semihalf.com>
Cc: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-06-12 17:07:34 +01:00
Jean Delvare
d6d5645e5f i2c: at91: Fix the functionality flags of the slave-only interface
When an I2C adapter acts only as a slave, it should not claim to
support I2C master capabilities.

Fixes: 9d3ca54b55 ("i2c: at91: added slave mode support")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Juergen Fitschen <me@jue.yt>
Cc: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Cc: Andi Shyti <andi.shyti@kernel.org>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-06-12 17:07:33 +01:00
Rafael J. Wysocki
350cbb5d2f cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()
After recent changes in intel_pstate, global.turbo_disabled is only set
at the initialization time and never changed.  However, it turns out
that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE,
the initial state of which is reflected by global.turbo_disabled, can be
flipped later and there should be a way to take that into account (other
than checking that MSR every time the driver runs which is costly and
useless overhead on the vast majority of systems).

For this purpose, notice that before the changes in question,
store_no_turbo() contained a turbo_is_disabled() check that was used
for updating global.turbo_disabled if the "turbo disabled" bit in
MSR_IA32_MISC_ENABLE had been flipped and that functionality can be
restored.  Then, users will be able to reset global.turbo_disabled
by writing 0 to no_turbo which used to work before on systems with
flipping "turbo disabled" bit.

This guarantees the driver state to remain in sync, but READ_ONCE()
annotations need to be added in two places where global.turbo_disabled
is accessed locklessly, so modify the driver to make that happen.

Fixes: 0940f1a801 ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization")
Closes: https://lore.kernel.org/linux-pm/bf3ebf1571a4788e97daf861eb493c12d42639a3.camel@xry111.site
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reported-by: Xi Ruoyao <xry111@xry111.site>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-12 14:11:50 +02:00
Douglas Anderson
c38896ca63 drm/mediatek: Call drm_atomic_helper_shutdown() at shutdown time
Based on grepping through the source code this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown
time. Among other things, this means that if a panel is in use that it
won't be cleanly powered off at system shutdown time.

The fact that we should call drm_atomic_helper_shutdown() in the case
of OS shutdown/restart comes straight out of the kernel doc "driver
instance overview" in drm_drv.c.

This driver users the component model and shutdown happens in the base
driver. The "drvdata" for this driver will always be valid if
shutdown() is called and as of commit 2a07396828
("drm/atomic-helper: drm_atomic_helper_shutdown(NULL) should be a
noop") we don't need to confirm that "drm" is non-NULL.

Suggested-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Fei Shao <fshao@chromium.org>
Tested-by: Fei Shao <fshao@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611102744.v2.1.I2b014f90afc4729b6ecc7b5ddd1f6dedcea4625b@changeid
2024-06-12 09:54:23 +02:00
Douglas Anderson
0320ca14c6 drm: renesas: shmobile: Call drm_atomic_helper_shutdown() at shutdown time
Based on grepping through the source code, this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown time.
This is important because drm_atomic_helper_shutdown() will cause
panels to get disabled cleanly which may be important for their power
sequencing.  Future changes will remove any custom powering off in
individual panel drivers so the DRM drivers need to start getting this
right.

The fact that we should call drm_atomic_helper_shutdown() in the case of
OS shutdown comes straight out of the kernel doc "driver instance
overview" in drm_drv.c.

[geert: shmob_drm_remove() already calls drm_atomic_helper_shutdown]

Suggested-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230901164111.RFT.15.Iaf638a1d4c8b3c307a6192efabb4cbb06b195f15@changeid
[geert: s/drm_helper_force_disable_all/drm_atomic_helper_shutdown/]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/17c6a5a668e5975f871b77fb1fca6711a0799d9e.1718176895.git.geert+renesas@glider.be
2024-06-12 09:54:07 +02:00
Hector Martin
5ceac4402f xhci: Handle TD clearing for multiple streams case
When multiple streams are in use, multiple TDs might be in flight when
an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for
each, to ensure everything is reset properly and the caches cleared.
Change the logic so that any N>1 TDs found active for different streams
are deferred until after the first one is processed, calling
xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to
queue another command until we are done with all of them. Also change
the error/"should never happen" paths to ensure we at least clear any
affected TDs, even if we can't issue a command to clear the hardware
cache, and complain loudly with an xhci_warn() if this ever happens.

This problem case dates back to commit e9df17eb14 ("USB: xhci: Correct
assumptions about number of rings per endpoint.") early on in the XHCI
driver's life, when stream support was first added.
It was then identified but not fixed nor made into a warning in commit
674f8438c1 ("xhci: split handling halted endpoints into two steps"),
which added a FIXME comment for the problem case (without materially
changing the behavior as far as I can tell, though the new logic made
the problem more obvious).

Then later, in commit 94f339147f ("xhci: Fix failure to give back some
cached cancelled URBs."), it was acknowledged again.

[Mathias: commit 94f339147f ("xhci: Fix failure to give back some cached
cancelled URBs.") was a targeted regression fix to the previously mentioned
patch. Users reported issues with usb stuck after unmounting/disconnecting
UAS devices. This rolled back the TD clearing of multiple streams to its
original state.]

Apparently the commit author was aware of the problem (yet still chose
to submit it): It was still mentioned as a FIXME, an xhci_dbg() was
added to log the problem condition, and the remaining issue was mentioned
in the commit description. The choice of making the log type xhci_dbg()
for what is, at this point, a completely unhandled and known broken
condition is puzzling and unfortunate, as it guarantees that no actual
users would see the log in production, thereby making it nigh
undebuggable (indeed, even if you turn on DEBUG, the message doesn't
really hint at there being a problem at all).

It took me *months* of random xHC crashes to finally find a reliable
repro and be able to do a deep dive debug session, which could all have
been avoided had this unhandled, broken condition been actually reported
with a warning, as it should have been as a bug intentionally left in
unfixed (never mind that it shouldn't have been left in at all).

> Another fix to solve clearing the caches of all stream rings with
> cancelled TDs is needed, but not as urgent.

3 years after that statement and 14 years after the original bug was
introduced, I think it's finally time to fix it. And maybe next time
let's not leave bugs unfixed (that are actually worse than the original
bug), and let's actually get people to review kernel commits please.

Fixes xHC crashes and IOMMU faults with UAS devices when handling
errors/faults. Easiest repro is to use `hdparm` to mark an early sector
(e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop.
At least in the case of JMicron controllers, the read errors end up
having to cancel two TDs (for two queued requests to different streams)
and the one that didn't get cleared properly ends up faulting the xHC
entirely when it tries to access DMA pages that have since been unmapped,
referred to by the stale TDs. This normally happens quickly (after two
or three loops). After this fix, I left the `cat` in a loop running
overnight and experienced no xHC failures, with all read errors
recovered properly. Repro'd and tested on an Apple M1 Mac Mini
(dwc3 host).

On systems without an IOMMU, this bug would instead silently corrupt
freed memory, making this a security bug (even on systems with IOMMUs
this could silently corrupt memory belonging to other USB devices on the
same controller, so it's still a security bug). Given that the kernel
autoprobes partition tables, I'm pretty sure a malicious USB device
pretending to be a UAS device and reporting an error with the right
timing could deliberately trigger a UAF and write to freed memory, with
no user action.

[Mathias: Commit message and code comment edit, original at:]
https://lore.kernel.org/linux-usb/20240524-xhci-streams-v1-1-6b1f13819bea@marcan.st/

Fixes: e9df17eb14 ("USB: xhci: Correct assumptions about number of rings per endpoint.")
Fixes: 94f339147f ("xhci: Fix failure to give back some cached cancelled URBs.")
Fixes: 674f8438c1 ("xhci: split handling halted endpoints into two steps")
Cc: stable@vger.kernel.org
Cc: security@kernel.org
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240611120610.3264502-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12 09:43:36 +02:00
Kuangyi Chiang
91f7a1524a xhci: Apply broken streams quirk to Etron EJ188 xHCI host
As described in commit 8f873c1ff4 ("xhci: Blacklist using streams on the
Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams
do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188
as well.

Cc: stable@vger.kernel.org
Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240611120610.3264502-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12 09:43:36 +02:00
Kuangyi Chiang
17bd54555c xhci: Apply reset resume quirk to Etron EJ188 xHCI host
As described in commit c877b3b2ad ("xhci: Add reset on resume quirk for
asrock p67 host"), EJ188 have the same issue as EJ168, where completely
dies on resume. So apply XHCI_RESET_ON_RESUME quirk to EJ188 as well.

Cc: stable@vger.kernel.org
Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240611120610.3264502-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12 09:43:36 +02:00
Mathias Nyman
f0260589b4 xhci: Set correct transferred length for cancelled bulk transfers
The transferred length is set incorrectly for cancelled bulk
transfer TDs in case the bulk transfer ring stops on the last transfer
block with a 'Stop - Length Invalid' completion code.

length essentially ends up being set to the requested length:
urb->actual_length = urb->transfer_buffer_length

Length for 'Stop - Length Invalid' cases should be the sum of all
TRB transfer block lengths up to the one the ring stopped on,
_excluding_ the one stopped on.

Fix this by always summing up TRB lengths for 'Stop - Length Invalid'
bulk cases.

This issue was discovered by Alan Stern while debugging
https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not
solve that bug. Issue is older than 4.10 kernel but fix won't apply
to those due to major reworks in that area.

Tested-by: Pierre Tomon <pierretom+12@ik.me>
Cc: stable@vger.kernel.org # v4.10+
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240611120610.3264502-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12 09:43:36 +02:00
Namjae Jeon
2bfc4214c6 ksmbd: fix missing use of get_write in in smb2_set_ea()
Fix an issue where get_write is not used in smb2_set_ea().

Fixes: 6fc0a265e1 ("ksmbd: fix potential circular locking issue in smb2_set_ea()")
Cc: stable@vger.kernel.org
Reported-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-11 23:43:09 -05:00
Namjae Jeon
1cdeca6a72 ksmbd: move leading slash check to smb2_get_name()
If the directory name in the root of the share starts with
character like 镜(0x955c) or Ṝ(0x1e5c), it (and anything inside)
cannot be accessed. The leading slash check must be checked after
converting unicode to nls string.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-11 23:43:09 -05:00
Xiaolei Wang
be27b89652 net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
The current cbs parameter depends on speed after uplinking,
which is not needed and will report a configuration error
if the port is not initially connected. The UAPI exposed by
tc-cbs requires userspace to recalculate the send slope anyway,
because the formula depends on port_transmit_rate (see man tc-cbs),
which is not an invariant from tc's perspective. Therefore, we
use offload->sendslope and offload->idleslope to derive the
original port_transmit_rate from the CBS formula.

Fixes: 1f705bc61a ("net: stmmac: Add support for CBS QDISC")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240608143524.2065736-1-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-11 19:58:18 -07:00
Joshua Washington
1b9f756344 gve: ignore nonrelevant GSO type bits when processing TSO headers
TSO currently fails when the skb's gso_type field has more than one bit
set.

TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a
few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes
virtualization, such as QEMU, a real use-case.

The gso_type and gso_size fields as passed from userspace in
virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type
|= SKB_GSO_DODGY to force the packet to enter the software GSO stack
for verification.

This issue might similarly come up when the CWR bit is set in the TCP
header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit
to be set.

Fixes: a57e5de476 ("gve: DQO: Add TX path")
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@gmail.com>

v2 - Remove unnecessary comments, remove line break between fixes tag
and signoffs.

v3 - Add back unrelated empty line removal.

Link: https://lore.kernel.org/r/20240610225729.2985343-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-11 19:42:35 -07:00
Jakub Kicinski
f6b2f578df Merge tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_sync: fix not using correct handle
 - L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
 - L2CAP: fix connection setup in l2cap_connect

* tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: fix connection setup in l2cap_connect
  Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
  Bluetooth: hci_sync: Fix not using correct handle
====================

Link: https://lore.kernel.org/r/20240610135803.920662-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-11 19:40:27 -07:00
Kory Maincent
144ba8580b net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP as reported by
checkpatch script.

Fixes: 18ff0bcda6 ("ethtool: add interface to interact with Ethernet Power Equipment")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240610083426.740660-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-11 19:33:37 -07:00
Damien Le Moal
90e6f08915 scsi: mpi3mr: Fix ATA NCQ priority support
The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to
the HBA if a read or write command directed at an ATA device should be
translated to an NCQ read/write command with the high prioiryt bit set
when the request uses the RT priority class and the user has enabled NCQ
priority through sysfs.

However, unlike the mpt3sas driver, the mpi3mr driver does not define
the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so
the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never
actually set and NCQ Priority cannot ever be used.

Fix this by defining these missing atributes to allow a user to check if
an ATA device supports NCQ priority and to enable/disable the use of NCQ
priority. To do this, lift the function scsih_ncq_prio_supp() out of the
mpt3sas driver and make it the generic SCSI SAS transport function
sas_ata_ncq_prio_supported(). Nothing in that function is hardware
specific, so this function can be used in both the mpt3sas driver and
the mpi3mr driver.

Reported-by: Scott McCoy <scott.mccoy@wdc.com>
Fixes: 023ab2a9b4 ("scsi: mpi3mr: Add support for queue command processing")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240611083435.92961-1-dlemoal@kernel.org
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11 21:40:23 -04:00
Ziqi Chen
77691af484 scsi: ufs: core: Quiesce request queues before checking pending cmds
In ufshcd_clock_scaling_prepare(), after SCSI layer is blocked,
ufshcd_pending_cmds() is called to check whether there are pending
transactions or not. And only if there are no pending transactions can we
proceed to kickstart the clock scaling sequence.

ufshcd_pending_cmds() traverses over all SCSI devices and calls
sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down
to three steps:

 1. Calculate the nr outstanding bits set in the 'word' bitmap.

 2. Calculate the nr outstanding bits set in the 'cleared' bitmap.

 3. Subtract the result from step 1 by the result from step 2.

This can lead to a race condition as outlined below:

Assume there is one pending transaction in the request queue of one SCSI
device, say sda, and the budget token of this request is 0, the 'word' is
0x1 and the 'cleared' is 0x0.

 1. When step 1 executes, it gets the result as 1.

 2. Before step 2 executes, block layer tries to dispatch a new request to
    sda. Since the SCSI layer is blocked, the request cannot pass through
    SCSI but the block layer would do budget_get() and budget_put() to
    sda's budget map regardless, so the 'word' has become 0x3 and 'cleared'
    has become 0x2 (assume the new request got budget token 1).

 3. When step 2 executes, it gets the result as 1.

 4. When step 3 executes, it gets the result as 0, meaning there is no
    pending transactions, which is wrong.

    Thread A                        Thread B
    ufshcd_pending_cmds()           __blk_mq_sched_dispatch_requests()
    |                               |
    sbitmap_weight(word)            |
    |                               scsi_mq_get_budget()
    |                               |
    |                               scsi_mq_put_budget()
    |                               |
    sbitmap_weight(cleared)
    ...

When this race condition happens, the clock scaling sequence is started
with transactions still in flight, leading to subsequent hibernate enter
failure, broken link, task abort and back to back error recovery.

Fix this race condition by quiescing the request queues before calling
ufshcd_pending_cmds() so that block layer won't touch the budget map when
ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer
blocking/unblocking to reduce redundancies and latencies.

Fixes: 8d077ede48 ("scsi: ufs: Optimize the command queueing code")
Co-developed-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com>
Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11 21:22:33 -04:00
Damien Le Moal
52912ca87e scsi: core: Disable CDL by default
For SCSI devices supporting the Command Duration Limits feature set, the
user can enable/disable this feature use through the sysfs device attribute
"cdl_enable". This attribute modification triggers a call to
scsi_cdl_enable() to enable and disable the feature for ATA devices and set
the scsi device cdl_enable field to the user provided bool value.  For SCSI
devices supporting CDL, the feature set is always enabled and
scsi_cdl_enable() is reduced to setting the cdl_enable field.

However, for ATA devices, a drive may spin-up with the CDL feature enabled
by default. But the SCSI device cdl_enable field is always initialized to
false (CDL disabled), regardless of the actual device CDL feature
state. For ATA devices managed by libata (or libsas), libata-core always
disables the CDL feature set when the device is attached, thus syncing the
state of the CDL feature on the device and of the SCSI device cdl_enable
field. However, for ATA devices connected to a SAS HBA, the CDL feature is
not disabled on scan for ATA devices that have this feature enabled by
default, leading to an inconsistent state of the feature on the device with
the SCSI device cdl_enable field.

Avoid this inconsistency by adding a call to scsi_cdl_enable() in
scsi_cdl_check() to make sure that the device-side state of the CDL feature
set always matches the scsi device cdl_enable field state.  This implies
that CDL will always be disabled for ATA devices connected to SAS HBAs,
which is consistent with libata/libsas initialization of the device.

Reported-by: Scott McCoy <scott.mccoy@wdc.com>
Fixes: 1b22cfb141 ("scsi: core: Allow enabling and disabling command duration limits")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240607012507.111488-1-dlemoal@kernel.org
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11 21:14:54 -04:00
John David Anglin
72d95924ee parisc: Try to fix random segmentation faults in package builds
PA-RISC systems with PA8800 and PA8900 processors have had problems
with random segmentation faults for many years.  Systems with earlier
processors are much more stable.

Systems with PA8800 and PA8900 processors have a large L2 cache which
needs per page flushing for decent performance when a large range is
flushed. The combined cache in these systems is also more sensitive to
non-equivalent aliases than the caches in earlier systems.

The majority of random segmentation faults that I have looked at
appear to be memory corruption in memory allocated using mmap and
malloc.

My first attempt at fixing the random faults didn't work. On
reviewing the cache code, I realized that there were two issues
which the existing code didn't handle correctly. Both relate
to cache move-in. Another issue is that the present bit in PTEs
is racy.

1) PA-RISC caches have a mind of their own and they can speculatively
load data and instructions for a page as long as there is a entry in
the TLB for the page which allows move-in. TLBs are local to each
CPU. Thus, the TLB entry for a page must be purged before flushing
the page. This is particularly important on SMP systems.

In some of the flush routines, the flush routine would be called
and then the TLB entry would be purged. This was because the flush
routine needed the TLB entry to do the flush.

2) My initial approach to trying the fix the random faults was to
try and use flush_cache_page_if_present for all flush operations.
This actually made things worse and led to a couple of hardware
lockups. It finally dawned on me that some lines weren't being
flushed because the pte check code was racy. This resulted in
random inequivalent mappings to physical pages.

The __flush_cache_page tmpalias flush sets up its own TLB entry
and it doesn't need the existing TLB entry. As long as we can find
the pte pointer for the vm page, we can get the pfn and physical
address of the page. We can also purge the TLB entry for the page
before doing the flush. Further, __flush_cache_page uses a special
TLB entry that inhibits cache move-in.

When switching page mappings, we need to ensure that lines are
removed from the cache.  It is not sufficient to just flush the
lines to memory as they may come back.

This made it clear that we needed to implement all the required
flush operations using tmpalias routines. This includes flushes
for user and kernel pages.

After modifying the code to use tmpalias flushes, it became clear
that the random segmentation faults were not fully resolved. The
frequency of faults was worse on systems with a 64 MB L2 (PA8900)
and systems with more CPUs (rp4440).

The warning that I added to flush_cache_page_if_present to detect
pages that couldn't be flushed triggered frequently on some systems.

Helge and I looked at the pages that couldn't be flushed and found
that the PTE was either cleared or for a swap page. Ignoring pages
that were swapped out seemed okay but pages with cleared PTEs seemed
problematic.

I looked at routines related to pte_clear and noticed ptep_clear_flush.
The default implementation just flushes the TLB entry. However, it was
obvious that on parisc we need to flush the cache page as well. If
we don't flush the cache page, stale lines will be left in the cache
and cause random corruption. Once a PTE is cleared, there is no way
to find the physical address associated with the PTE and flush the
associated page at a later time.

I implemented an updated change with a parisc specific version of
ptep_clear_flush. It fixed the random data corruption on Helge's rp4440
and rp3440, as well as on my c8000.

At this point, I realized that I could restore the code where we only
flush in flush_cache_page_if_present if the page has been accessed.
However, for this, we also need to flush the cache when the accessed
bit is cleared in ptep_clear_flush_young to keep things synchronized.
The default implementation only flushes the TLB entry.

Other changes in this version are:

1) Implement parisc specific version of ptep_get. It's identical to
default but needed in arch/parisc/include/asm/pgtable.h.
2) Revise parisc implementation of ptep_test_and_clear_young to use
ptep_get (READ_ONCE).
3) Drop parisc implementation of ptep_get_and_clear. We can use default.
4) Revise flush_kernel_vmap_range and invalidate_kernel_vmap_range to
use full data cache flush.
5) Move flush_cache_vmap and flush_cache_vunmap to cache.c. Handle
VM_IOREMAP case in flush_cache_vmap.

At this time, I don't know whether it is better to always flush when
the PTE present bit is set or when both the accessed and present bits
are set. The later saves flushing pages that haven't been accessed,
but we need to flush in ptep_clear_flush_young. It also needs a page
table lookup to find the PTE pointer. The lpa instruction only needs
a page table lookup when the PTE entry isn't in the TLB.

We don't atomically handle setting and clearing the _PAGE_ACCESSED bit.
If we miss an update, we may miss a flush and the cache may get corrupted.
Whether the current code is effectively atomic depends on process control.

When CONFIG_FLUSH_PAGE_ACCESSED is set to zero, the page will eventually
be flushed when the PTE is cleared or in flush_cache_page_if_present. The
_PAGE_ACCESSED bit is not used, so the problem is avoided.

The flush method can be selected using the CONFIG_FLUSH_PAGE_ACCESSED
define in cache.c. The default is 0. I didn't see a large difference
in performance.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.6+
Signed-off-by: Helge Deller <deller@gmx.de>
2024-06-12 01:57:05 +02:00
Kees Cook
8c860ed825 x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
When reworking the range checking for get_user(), the get_user_8() case
on 32-bit wasn't zeroing the high register. (The jump to bad_get_user_8
was accidentally dropped.) Restore the correct error handling
destination (and rename the jump to using the expected ".L" prefix).

While here, switch to using a named argument ("size") for the call
template ("%c4" to "%c[size]") as already used in the other call
templates in this file.

Found after moving the usercopy selftests to KUnit:

      # usercopy_test_invalid: EXPECTATION FAILED at
      lib/usercopy_kunit.c:278
      Expected val_u64 == 0, but
          val_u64 == -60129542144 (0xfffffff200000000)

Closes: https://lore.kernel.org/all/CABVgOSn=tb=Lj9SxHuT4_9MTjjKVxsq-ikdXC4kGHO4CfKVmGQ@mail.gmail.com
Fixes: b19b74bc99 ("x86/mm: Rework address range check in get_user() and put_user()")
Reported-by: David Gow <davidgow@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/all/20240610210213.work.143-kees%40kernel.org
2024-06-11 16:08:43 -07:00
Kent Overstreet
f2736b9c79 bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-11 18:59:08 -04:00
Dr. David Alan Gilbert
017ed5e70c drm/nouveau: remove unused struct 'init_exec'
'init_exec' is unused since
commit cb75d97e9c ("drm/nouveau: implement devinit subdev, and new
init table parser")
Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517232617.230767-1-linux@treblig.org
2024-06-11 23:20:05 +02:00
Linus Torvalds
2ef5971ff3 Merge tag 'vfs-6.10-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "Misc:
   - Restore debugfs behavior of ignoring unknown mount options
   - Fix kernel doc for netfs_wait_for_oustanding_io()
   - Fix struct statx comment after new addition for this cycle
   - Fix a check in find_next_fd()

  iomap:
   - Fix data zeroing behavior when an extent spans the block that
     contains i_size
   - Restore i_size increasing in iomap_write_end() for now to avoid
     stale data exposure on xfs with a realtime device

  Cachefiles:
   - Remove unneeded fdtable.h include
   - Improve trace output for cachefiles_obj_{get,put}_ondemand_fd()
   - Remove requests from the request list to prevent accessing already
     freed requests
   - Fix UAF when issuing restore command while the daemon is still
     alive by adding an additional reference count to requests
   - Fix UAF by grabbing a reference during xarray lookup with xa_lock()
     held
   - Simplify error handling in cachefiles_ondemand_daemon_read()
   - Add consistency checks read and open requests to avoid crashes
   - Add a spinlock to protect ondemand_id variable which is used to
     determine whether an anonymous cachefiles fd has already been
     closed
   - Make on-demand reads killable allowing to handle broken cachefiles
     daemon better
   - Flush all requests after the kernel has been marked dead via
     CACHEFILES_DEAD to avoid hung-tasks
   - Ensure that closed requests are marked as such to avoid reusing
     them with a reopen request
   - Defer fd_install() until after copy_to_user() succeeded and thereby
     get rid of having to use close_fd()
   - Ensure that anonymous cachefiles on-demand fds are reused while
     they are valid to avoid pinning already freed cookies"

* tag 'vfs-6.10-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Fix iomap_adjust_read_range for plen calculation
  iomap: keep on increasing i_size in iomap_write_end()
  cachefiles: remove unneeded include of <linux/fdtable.h>
  fs/file: fix the check in find_next_fd()
  cachefiles: make on-demand read killable
  cachefiles: flush all requests after setting CACHEFILES_DEAD
  cachefiles: Set object to close if ondemand_id < 0 in copen
  cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  cachefiles: never get a new anonymous fd if ondemand_id is valid
  cachefiles: add spin_lock for cachefiles_ondemand_info
  cachefiles: add consistency check for copen/cread
  cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  cachefiles: remove requests from xarray during flushing requests
  cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
  statx: Update offset commentary for struct statx
  netfs: fix kernel doc for nets_wait_for_outstanding_io()
  debugfs: continue to ignore unknown mount options
2024-06-11 12:04:21 -07:00
Rafael J. Wysocki
b684682698 thermal: gov_step_wise: Restore passive polling management
Consider a thermal zone with one passive trip point, a cooling device
with 3 states (0, 1, 2) bound to it, passive polling enabled (nonzero
passive_delay_jiffies) and no regular polling (polling_delay_jiffies
equal to 0) that is managed by the Step-Wise governor.  Suppose that
the initial state of the cooling device is 0 and the zone temperature
is below the trip point to start with.

When the trip point is crossed, tz->passive is incremented by the
thermal core and the governor's .manage() callback is invoked.  It
sets 'throttle' to 'true' for the trip in question and
get_target_state() returns 1 for the instance corresponding to the
cooling device (say that 'upper' and 'lower' are set to 2 and 0 for
it, respectively), so its state changes to 1.

Passive polling is still active for the zone, so next time the
temperature is updated, the governor's .manage() callback will be
invoked again.  If the temperature is still rising, it will change
the state of the cooling device to 2.

Now suppose that next time the zone temperature is updated, it falls
below the trip point, so tz->passive is decremented for the zone (say
it becomes 0 then) and the governor's .manage() callbacks runs.

It finds that the temperature trend for the zone is 'falling' and
'throttle' will be set to 'false' for the trip in question, so the
cooling device's state will be changed to 1.  However, because
tz->polling is 0 for the zone, the governor's .manage() callback
may not be invoked again for a long time and the cooling device's
state will not be reset back to 0.

This can happen because commit 042a3d80f1 ("thermal: core: Move
passive polling management to the core") removed passive polling
management from the Step-Wise governor.

Before that change, thermal_zone_trip_update() would bump up
tz->passive when changing the target state for a thermal instance
from "no target" to a specific value and it would drop tz->passive
when changing it back to "no target" which would cause passive
polling to be active for the zone until the governor has reset the
states of all cooling devices.  In particular, in the example above
tz->passive would be incremented when changing the state of the
cooling device from 0 to 1 and then it would be still nonzero when
the state of the cooling device was changed from 2 to 1.

To prevent this problem from occurring, restore the passive polling
management in the Step-Wise governor by partially reverting the
commit in question and update the comment in the restored code
to explain its role more clearly.

Fixes: 042a3d80f1 ("thermal: core: Move passive polling management to the core")
Closes: https://lore.kernel.org/linux-pm/ZmVfcEOxmjUHZTSX@hovoldconsulting.com
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11 21:00:44 +02:00
Florian Westphal
6f8f132cc7 netfilter: Use flowlabel flow key when re-routing mangled packets
'ip6 dscp set $v' in an nftables outpute route chain has no effect.
While nftables does detect the dscp change and calls the reroute hook.
But ip6_route_me_harder never sets the dscp/flowlabel:
flowlabel/dsfield routing rules are ignored and no reroute takes place.

Thanks to Yi Chen for an excellent reproducer script that I used
to validate this change.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 18:46:04 +02:00
Jozsef Kadlecsik
4e7aaa6b82 netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
Lion Ackermann reported that there is a race condition between namespace cleanup
in ipset and the garbage collection of the list:set type. The namespace
cleanup can destroy the list:set type of sets while the gc of the set type is
waiting to run in rcu cleanup. The latter uses data from the destroyed set which
thus leads use after free. The patch contains the following parts:

- When destroying all sets, first remove the garbage collectors, then wait
  if needed and then destroy the sets.
- Fix the badly ordered "wait then remove gc" for the destroy a single set
  case.
- Fix the missing rcu locking in the list:set type in the userspace test
  case.
- Use proper RCU list handlings in the list:set type.

The patch depends on c1193d9bbb (netfilter: ipset: Add list flush to cancel_gc).

Fixes: 97f7cf1cd8 (netfilter: ipset: fix performance regression in swap operation)
Reported-by: Lion Ackermann <nnamrec@gmail.com>
Tested-by: Lion Ackermann <nnamrec@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 18:46:04 +02:00
Davide Ornaghi
c4ab9da85b netfilter: nft_inner: validate mandatory meta and payload
Check for mandatory netlink attributes in payload and meta expression
when used embedded from the inner expression, otherwise NULL pointer
dereference is possible from userspace.

Fixes: a150d122b6 ("netfilter: nft_meta: add inner match support")
Fixes: 3a07327d10 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 18:46:04 +02:00
Alexander Gordeev
693d41f7c9 s390/mm: Restore mapping of kernel image using large pages
Since physical and virtual kernel address spaces are uncoupled
the kernel image is not mapped using large segment pages anymore,
which is a regression.

Put the kernel image at the same large segment page offset in
physical memory as in virtual memory. Such approach preserves
the existing number of bits of entropy used for randomization
of the kernel location in virtual memory when KASLR is on.
As result, the kernel is mapped using large segment pages.

Fixes: c98d2ecae0 ("s390/mm: Uncouple physical vs virtual address spaces")
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-11 16:20:40 +02:00
Alexander Gordeev
d8073dc6bc s390/mm: Allow large pages only for aligned physical addresses
Do not allow creation of large pages against physical addresses,
which itself are not aligned on the correct boundary. Failure to
do so might lead to referencing wrong memory as result of the way
DAT works.

Fixes: c98d2ecae0 ("s390/mm: Uncouple physical vs virtual address spaces")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-11 16:20:40 +02:00
Heiko Carstens
b01b8151ef s390: Update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-11 16:20:40 +02:00
Vasily Khoruzhick
b96a225377 drm/nouveau: don't attempt to schedule hpd_work on headless cards
If the card doesn't have display hardware, hpd_work and hpd_lock are
left uninitialized which causes BUG when attempting to schedule hpd_work
on runtime PM resume.

Fix it by adding headless flag to DRM and skip any hpd if it's set.

Fixes: ae1aadb1eb ("nouveau: don't fail driver load if no display hw present.")
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/337
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607221032.25918-1-anarsoul@gmail.com
2024-06-11 12:04:59 +02:00
Eric Dumazet
36534d3c54 tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Due to timer wheel implementation, a timer will usually fire
after its schedule.

For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.

For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.

We need to make sure tcp_rtx_probe0_timed_out() handles this case.

Fixes: e89688e3e9 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:50:10 -07:00
Jakub Kicinski
70b3c88cec Merge branch 'mptcp-various-fixes'
Matthieu Baerts says:

====================
mptcp: various fixes

The different patches here are some unrelated fixes for MPTCP:

- Patch 1 ensures 'snd_una' is initialised on connect in case of MPTCP
  fallback to TCP followed by retransmissions before the processing of
  any other incoming packets. A fix for v5.9+.

- Patch 2 makes sure the RmAddr MIB counter is incremented, and only
  once per ID, upon the reception of a RM_ADDR. A fix for v5.10+.

- Patch 3 doesn't update 'add addr' related counters if the connect()
  was not possible. A fix for v5.7+.

- Patch 4 updates the mailmap file to add Geliang's new email address.
====================

Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-0-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:14 -07:00
Geliang Tang
74acb250e1 mailmap: map Geliang's new email address
Just like my other email addresses, map my new one to kernel.org
account too.

My new email address uses "last name, first name" format, which is
different from my other email addresses. This mailmap is also used
to indicate that it is actually the same person.

Suggested-by: Mat Martineau <martineau@kernel.org>
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-4-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:11 -07:00
YonglongLi
40eec1795c mptcp: pm: update add_addr counters after connect
The creation of new subflows can fail for different reasons. If no
subflow have been created using the received ADD_ADDR, the related
counters should not be updated, otherwise they will never be decremented
for events related to this ID later on.

For the moment, the number of accepted ADD_ADDR is only decremented upon
the reception of a related RM_ADDR, and only if the remote address ID is
currently being used by at least one subflow. In other words, if no
subflow can be created with the received address, the counter will not
be decremented. In this case, it is then important not to increment
pm.add_addr_accepted counter, and not to modify pm.accept_addr bit.

Note that this patch does not modify the behaviour in case of failures
later on, e.g. if the MP Join is dropped or rejected.

The "remove invalid addresses" MP Join subtest has been modified to
validate this case. The broadcast IP address is added before the "valid"
address that will be used to successfully create a subflow, and the
limit is decreased by one: without this patch, it was not possible to
create the last subflow, because:

- the broadcast address would have been accepted even if it was not
  usable: the creation of a subflow to this address results in an error,

- the limit of 2 accepted ADD_ADDR would have then been reached.

Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:10 -07:00
YonglongLi
6a09788c1a mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
The RmAddr MIB counter is supposed to be incremented once when a valid
RM_ADDR has been received. Before this patch, it could have been
incremented as many times as the number of subflows connected to the
linked address ID, so it could have been 0, 1 or more than 1.

The "RmSubflow" is incremented after a local operation. In this case,
it is normal to tied it with the number of subflows that have been
actually removed.

The "remove invalid addresses" MP Join subtest has been modified to
validate this case. A broadcast IP address is now used instead: the
client will not be able to create a subflow to this address. The
consequence is that when receiving the RM_ADDR with the ID attached to
this broadcast IP address, no subflow linked to this ID will be found.

Fixes: 7a7e52e38a ("mptcp: add RM_ADDR related mibs")
Cc: stable@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:10 -07:00
Paolo Abeni
8031b58c3a mptcp: ensure snd_una is properly initialized on connect
This is strictly related to commit fb7a0d3348 ("mptcp: ensure snd_nxt
is properly initialized on connect"). It turns out that syzkaller can
trigger the retransmit after fallback and before processing any other
incoming packet - so that snd_una is still left uninitialized.

Address the issue explicitly initializing snd_una together with snd_nxt
and write_seq.

Suggested-by: Mat Martineau <martineau@kernel.org>
Fixes: 8fd738049a ("mptcp: fallback in case of simultaneous connect")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:49:10 -07:00
Johannes Berg
44180feacc net/sched: initialize noop_qdisc owner
When the noop_qdisc owner isn't initialized, then it will be 0,
so packets will erroneously be regarded as having been subject
to recursion as long as only CPU 0 queues them. For non-SMP,
that's all packets, of course. This causes a change in what's
reported to userspace, normally noop_qdisc would drop packets
silently, but with this change the syscall returns -ENOBUFS if
RECVERR is also set on the socket.

Fix this by initializing the owner field to -1, just like it
would be for dynamically allocated qdiscs by qdisc_alloc().

Fixes: 0f022d32c3 ("net/sched: Fix mirred deadlock on device recursion")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240607175340.786bfb938803.I493bf8422e36be4454c08880a8d3703cea8e421a@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 19:36:49 -07:00
Kent Overstreet
7124a8982b bcachefs: Add missing bch_inode_info.ei_flags init
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 20:50:14 -04:00
Oleg Nesterov
07c54cc598 tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
After the recent commit 5097cbcb38 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.

In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().

Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
comment explains why this is safe (thanks Thomas!).

Fixes: 08ae95f4fd ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com
Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com
2024-06-10 20:18:13 +02:00
Arunpravin Paneer Selvam
31849bf07e drm/amdgpu: Fix the BO release clear memory warning
This happens when the amdgpu_bo_release_notify running
before amdgpu_ttm_set_buffer_funcs_status set the buffer
funcs to enabled.

check the buffer funcs enablement before calling the fill
buffer memory.

v2:(Christian)
  - Apply it only for GEM buffers and since GEM buffers are only
    allocated/freed while the driver is loaded we never run into
    the issue to clear with buffer funcs disabled.

v3:(Mario)
  - drop the stable tag as this will presumably go into a
    -fixes PR for 6.10

Log snip:
*ERROR* Trying to clear memory with ring turned off.
RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu]

Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Richard Gong <richard.gong@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240610180401.9540-1-Arunpravin.PaneerSelvam@amd.com
2024-06-10 23:46:33 +05:30
Kent Overstreet
b799220092 bcachefs: Add missing synchronize_srcu_expedited() call when shutting down
We use the polling interface to srcu for tracking pending frees; when
shutting down we don't need to wait for an srcu barrier to free them,
but SRCU still gets confused if we shutdown with an outstanding grace
period.

Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com
Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
9432e90df1 bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()
Turn more asserts into proper recoverable error paths.

Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
9c4acd19bb bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks
The bucket_gens array and gc_buckets array known their own size; we
should be using those members, and returning an error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
e0cb5722e1 bcachefs: Fix snapshot_create_lock lock ordering
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc2-ktest-00018-gebd1d148b278 #144 Not tainted
------------------------------------------------------
fio/1345 is trying to acquire lock:
ffff88813e200ab8 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_truncate+0x76/0xf0

but task is already holding lock:
ffff888105a1fa38 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: do_truncate+0x7b/0xc0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}:
       down_write+0x3d/0xd0
       bch2_write_iter+0x1c0/0x10f0
       vfs_write+0x24a/0x560
       __x64_sys_pwrite64+0x77/0xb0
       x64_sys_call+0x17e5/0x1ab0
       do_syscall_64+0x68/0x130
       entry_SYSCALL_64_after_hwframe+0x4b/0x53

-> #1 (sb_writers#10){.+.+}-{0:0}:
       mnt_want_write+0x4a/0x1d0
       filename_create+0x69/0x1a0
       user_path_create+0x38/0x50
       bch2_fs_file_ioctl+0x315/0xbf0
       __x64_sys_ioctl+0x297/0xaf0
       x64_sys_call+0x10cb/0x1ab0
       do_syscall_64+0x68/0x130
       entry_SYSCALL_64_after_hwframe+0x4b/0x53

-> #0 (&c->snapshot_create_lock){++++}-{3:3}:
       __lock_acquire+0x1445/0x25b0
       lock_acquire+0xbd/0x2b0
       down_read+0x40/0x180
       bch2_truncate+0x76/0xf0
       bchfs_truncate+0x240/0x3f0
       bch2_setattr+0x7b/0xb0
       notify_change+0x322/0x4b0
       do_truncate+0x8b/0xc0
       do_ftruncate+0x110/0x270
       __x64_sys_ftruncate+0x43/0x80
       x64_sys_call+0x1373/0x1ab0
       do_syscall_64+0x68/0x130
       entry_SYSCALL_64_after_hwframe+0x4b/0x53

other info that might help us debug this:

Chain exists of:
  &c->snapshot_create_lock --> sb_writers#10 --> &sb->s_type->i_mutex_key#13

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#13);
                               lock(sb_writers#10);
                               lock(&sb->s_type->i_mutex_key#13);
  rlock(&c->snapshot_create_lock);

 *** DEADLOCK ***

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
f9035b0ce6 bcachefs: Fix refcount leak in check_fix_ptrs()
fsck_err() does a goto fsck_err on error; factor out check_fix_ptr() so
that our error label can drop our device ref.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
bf2b356afd bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
2760bfe388 bcachefs: Fix reporting of freed objects from key cache shrinker
We count objects as freed when we move them to the srcu-pending lists
because we're doing the equivalent of a kfree_srcu(); the only
difference is managing the pending list ourself means we can allocate
from the pending list.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
9ac3e660ca bcachefs: set sb->s_shrinker->seeks = 0
inodes and dentries are still present in the btree node cache, in much
more compact form

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
bc65e98e68 bcachefs: increase key cache shrinker batch size
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
5ae67abcdf bcachefs: Enable automatic shrinking for rhashtables
Since the key cache shrinker walks the rhashtable, a mostly empty
rhashtable leads to really nasty reclaim performance issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Hongbo Li
26447d224a bcachefs: fix the display format for show-super
There are three keys displayed in non-uniform format.
Let's fix them.

[Before]
```
Label:	testbcachefs
Version:	1.9: (unknown version)
Version upgrade complete:	0.0: (unknown version)
```

[After]
```
Label:					testbcachefs
Version:				1.9: (unknown version)
Version upgrade complete:		0.0: (unknown version)
```

Fixes: 7423330e30 ("bcachefs: prt_printf() now respects \r\n\t")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
dab1870439 bcachefs: fix stack frame size in fsck.c
fsck.c always runs top of the stack so we're not too concerned here;
noinline_for_stack is sufficient

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
04f635ede8 bcachefs: Delete incorrect BTREE_ID_NR assertion
for forwards compat we now explicitly allow mounting and using
filesystems with unknown btrees, and we have to walk them for fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
1c8cc24eef bcachefs: Fix incorrect error handling found_btree_node_is_readable()
error handling here is slightly odd, which is why we were accidently
calling evict() on an error pointer

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:15 -04:00
Kent Overstreet
161f73c2c7 bcachefs: Split out btree_write_submit_wq
Split the workqueues for btree read completions and btree write
submissions; we don't want concurrency control on btree read
completions, but we do want concurrency control on write submissions,
else blocking in submit_bio() will cause a ton of kworkers to be
allocated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:15 -04:00
Pauli Virtanen
c695439d19 Bluetooth: fix connection setup in l2cap_connect
The amp_id argument of l2cap_connect() was removed in
commit 84a4bb6548 ("Bluetooth: HCI: Remove HCI_AMP support")

It was always called with amp_id == 0, i.e. AMP_ID_BREDR == 0x00 (ie.
non-AMP controller).  In the above commit, the code path for amp_id != 0
was preserved, although it should have used the amp_id == 0 one.

Restore the previous behavior of the non-AMP code path, to fix problems
with L2CAP connections.

Fixes: 84a4bb6548 ("Bluetooth: HCI: Remove HCI_AMP support")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-10 09:48:30 -04:00
Luiz Augusto von Dentz
806a5198c0 Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
This removes the bogus check for max > hcon->le_conn_max_interval since
the later is just the initial maximum conn interval not the maximum the
stack could support which is really 3200=4000ms.

In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values
of the following fields in IXIT that would cause hci_check_conn_params
to fail:

TSPX_conn_update_int_min
TSPX_conn_update_int_max
TSPX_conn_update_peripheral_latency
TSPX_conn_update_supervision_timeout

Link: https://github.com/bluez/bluez/issues/847
Fixes: e4b019515f ("Bluetooth: Enforce validation on max value of connection interval")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-10 09:48:27 -04:00
Luiz Augusto von Dentz
86fbd9f63a Bluetooth: hci_sync: Fix not using correct handle
When setting up an advertisement the code shall always attempt to use
the handle set by the instance since it may not be equal to the instance
ID.

Fixes: e77f43d531 ("Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-10 09:48:25 -04:00
David S. Miller
93792130a9 Merge branch 'geneve-fixes'
Tariq Toukan says:

====================
geneve fixes

This small patchset by Gal provides bug fixes to the geneve tunnels flows.

Patch 1 fixes an incorrect value returned by the inner network header
offset helper.
Patch 2 fixes an issue inside the mlx5e tunneling flow. It 'happened' to
be harmless so far, before applying patch 1.

Series generated against:
commit d30d0e49da ("Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:18:09 +01:00
Gal Pressman
791b4089e3 net/mlx5e: Fix features validation check for tunneled UDP (non-VXLAN) packets
Move the vxlan_features_check() call to after we verified the packet is
a tunneled VXLAN packet.

Without this, tunneled UDP non-VXLAN packets (for ex. GENENVE) might
wrongly not get offloaded.
In some cases, it worked by chance as GENEVE header is the same size as
VXLAN, but it is obviously incorrect.

Fixes: e3cfc7e6b7 ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:18:09 +01:00
Gal Pressman
c6ae073f59 geneve: Fix incorrect inner network header offset when innerprotoinherit is set
When innerprotoinherit is set, the tunneled packets do not have an inner
Ethernet header.
Change 'maclen' to not always assume the header length is ETH_HLEN, as
there might not be a MAC header.

This resolves issues with drivers (e.g. mlx5, in
mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset
to be correct, and use it for TX offloads.

Fixes: d8a6213d70 ("geneve: fix header validation in geneve[6]_xmit_skb")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:18:08 +01:00
Andy Shevchenko
d029edefed net dsa: qca8k: fix usages of device_get_named_child_node()
The documentation for device_get_named_child_node() mentions this
important point:

"
The caller is responsible for calling fwnode_handle_put() on the
returned fwnode pointer.
"

Add fwnode_handle_put() to avoid leaked references.

Fixes: 1e264f9d29 ("net: dsa: qca8k: add LEDs basic support")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:12:14 +01:00
Eric Dumazet
d37fe4255a tcp: fix race in tcp_v6_syn_recv_sock()
tcp_v6_syn_recv_sock() calls ip6_dst_store() before
inet_sk(newsk)->pinet6 has been set up.

This means ip6_dst_store() writes over the parent (listener)
np->dst_cookie.

This is racy because multiple threads could share the same
parent and their final np->dst_cookie could be wrong.

Move ip6_dst_store() call after inet_sk(newsk)->pinet6
has been changed and after the copy of parent ipv6_pinfo.

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:11:22 +01:00
Adam Miotk
ce62600c4d drm/bridge/panel: Fix runtime warning on panel bridge release
Device managed panel bridge wrappers are created by calling to
drm_panel_bridge_add_typed() and registering a release handler for
clean-up when the device gets unbound.

Since the memory for this bridge is also managed and linked to the panel
device, the release function should not try to free that memory.
Moreover, the call to devm_kfree() inside drm_panel_bridge_remove() will
fail in this case and emit a warning because the panel bridge resource
is no longer on the device resources list (it has been removed from
there before the call to release handlers).

Fixes: 67022227ff ("drm/bridge: Add a devm_ allocator for panel bridge.")
Signed-off-by: Adam Miotk <adam.miotk@arm.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240610102739.139852-1-adam.miotk@arm.com
2024-06-10 13:22:05 +02:00
Amjad Ouled-Ameur
b880018edd drm/komeda: check for error-valued pointer
komeda_pipeline_get_state() may return an error-valued pointer, thus
check the pointer for negative or null value before dereferencing.

Fixes: 502932a03f ("drm/komeda: Add the initial scaler support for CORE")
Signed-off-by: Amjad Ouled-Ameur <amjad.ouled-ameur@arm.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240610102056.40406-1-amjad.ouled-ameur@arm.com
2024-06-10 13:20:37 +02:00
Russell King (Oracle)
594ce0b8a9 Merge topic branches 'clkdev' and 'fixes' into for-linus 2024-06-10 12:03:21 +01:00
Ard Biesheuvel
e3cf20e5c6 ARM: 9405/1: ftrace: Don't assume stack frames are contiguous in memory
The frame pointer unwinder relies on a standard layout of the stack
frame, consisting of (in downward order)

   Calling frame:
     PC   <---------+
     LR             |
     SP             |
     FP             |
     .. locals ..   |
   Callee frame:    |
     PC             |
     LR             |
     SP             |
     FP   ----------+

where after storing its previous value on the stack, FP is made to point
at the location of PC in the callee stack frame, using the canonical
prologue:

   mov     ip, sp
   stmdb   sp!, {fp, ip, lr, pc}
   sub     fp, ip, #4

The ftrace code assumes that this activation record is pushed first, and
that any stack space for locals is allocated below this. Strict
adherence to this would imply that the caller's value of SP at the time
of the function call can always be obtained by adding 4 to FP (which
points to PC in the callee frame).

However, recent versions of GCC appear to deviate from this rule, and so
the only reliable way to obtain the caller's value of SP is to read it
from the activation record. Since this involves a read from memory
rather than simple arithmetic, we need to use the uaccess API here which
protects against inadvertent data aborts resulting from attempts to
dereference bogus FP values.

The plain uaccess API is ftrace instrumented itself, so to avoid
unbounded recursion, use the __get_kernel_nofault() primitive directly.

Closes: https://lore.kernel.org/all/alp44tukzo6mvcwl4ke4ehhmojrqnv6xfcdeuliybxfjfvgd3e@gpjvwj33cc76

Closes: https://lore.kernel.org/all/d870c149-4363-43de-b0ea-7125dec5608e@broadcom.com/

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: Justin Chen <justin.chen@broadcom.com>
Tested-by: Thorsten Scherer <t.scherer@eckelmann.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-06-10 12:00:27 +01:00
David Wei
5add2f7288 netdevsim: fix backwards compatibility in nsim_get_iflink()
The default ndo_get_iflink() implementation returns the current ifindex
of the netdev. But the overridden nsim_get_iflink() returns 0 if the
current nsim is not linked, breaking backwards compatibility for
userspace that depend on this behaviour.

Fix the problem by returning the current ifindex if not linked to a
peer.

Fixes: 8debcf5832 ("netdevsim: add ndo_get_iflink() implementation")
Reported-by: Yu Watanabe <watanabe.yu@gmail.com>
Suggested-by: Yu Watanabe <watanabe.yu@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 11:51:04 +01:00
Tobias Jakobi
f74fb5df42 drm: panel-orientation-quirks: Add quirk for Aya Neo KUN
Similar to the other Aya Neo devices this one features
again a portrait screen, here with a native resolution
of 1600x2560.

Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240310220401.895591-1-tjakobi@math.uni-bielefeld.de
2024-06-10 12:19:25 +02:00
Wengang Wang
58f880711f xfs: make sure sb_fdblocks is non-negative
A user with a completely full filesystem experienced an unexpected
shutdown when the filesystem tried to write the superblock during
runtime.
kernel shows the following dmesg:

[    8.176281] XFS (dm-4): Metadata corruption detected at xfs_sb_write_verify+0x60/0x120 [xfs], xfs_sb block 0x0
[    8.177417] XFS (dm-4): Unmount and run xfs_repair
[    8.178016] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
[    8.178703] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00  XFSB............
[    8.179487] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[    8.180312] 00000020: cf 12 dc 89 ca 26 45 29 92 e6 e3 8d 3b b8 a2 c3  .....&E)....;...
[    8.181150] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80  ................
[    8.182003] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
[    8.182004] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00  .....d..........
[    8.182004] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ..d.............
[    8.182005] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19  ................
[    8.182008] XFS (dm-4): Corruption of in-memory data detected.  Shutting down filesystem
[    8.182010] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)

When xfs_log_sb writes super block to disk, b_fdblocks is fetched from
m_fdblocks without any lock. As m_fdblocks can experience a positive ->
negative -> positive changing when the FS reaches fullness (see
xfs_mod_fdblocks). So there is a chance that sb_fdblocks is negative, and
because sb_fdblocks is type of unsigned long long, it reads super big.
And sb_fdblocks being bigger than sb_dblocks is a problem during log
recovery, xfs_validate_sb_write() complains.

Fix:
As sb_fdblocks will be re-calculated during mount when lazysbcount is
enabled, We just need to make xfs_validate_sb_write() happy -- make sure
sb_fdblocks is not nenative. This patch also takes care of other percpu
counters in xfs_log_sb.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-06-10 11:38:12 +05:30
Jani Nikula
38e3825631 drm/exynos/vidi: fix memory leak in .get_modes()
The duplicated EDID is never freed. Fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2024-06-10 15:05:43 +09:00
Yazen Ghannam
fe8a08973a RAS/AMD/ATL: Fix MI300 bank hash
Apply the SID bits to the correct offset in the Bank value. Do this in
the temporary value so they don't need to be masked off later.

Fixes: 87a6123753 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-1-2f11547a178c@amd.com
2024-06-10 07:56:33 +02:00
Krzysztof Kozlowski
1f3512cdf8 drm/exynos: dp: drop driver owner initialization
Core in platform_driver_register() already sets the .owner, so driver
does not need to.  Whatever is set here will be anyway overwritten by
main driver calling platform_driver_register().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2024-06-10 13:50:53 +09:00
Marek Szyprowski
799d4b3924 drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found
When reading EDID fails and driver reports no modes available, the DRM
core adds an artificial 1024x786 mode to the connector. Unfortunately
some variants of the Exynos HDMI (like the one in Exynos4 SoCs) are not
able to drive such mode, so report a safe 640x480 mode instead of nothing
in case of the EDID reading failure.

This fixes the following issue observed on Trats2 board since commit
13d5b04036 ("drm/exynos: do not return negative values from .get_modes()"):

[drm] Exynos DRM: using 11c00000.fimd device for DMA mapping operations
exynos-drm exynos-drm: bound 11c00000.fimd (ops fimd_component_ops)
exynos-drm exynos-drm: bound 12c10000.mixer (ops mixer_component_ops)
exynos-dsi 11c80000.dsi: [drm:samsung_dsim_host_attach] Attached s6e8aa0 device (lanes:4 bpp:24 mode-flags:0x10b)
exynos-drm exynos-drm: bound 11c80000.dsi (ops exynos_dsi_component_ops)
exynos-drm exynos-drm: bound 12d00000.hdmi (ops hdmi_component_ops)
[drm] Initialized exynos 1.1.0 20180330 for exynos-drm on minor 1
exynos-hdmi 12d00000.hdmi: [drm:hdmiphy_enable.part.0] *ERROR* PLL could not reach steady state
panel-samsung-s6e8aa0 11c80000.dsi.0: ID: 0xa2, 0x20, 0x8c
exynos-mixer 12c10000.mixer: timeout waiting for VSYNC
------------[ cut here ]------------
WARNING: CPU: 1 PID: 11 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8
[CRTC:70:crtc-1] vblank wait timed out
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.9.0-rc5-next-20240424 #14913
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x68/0x88
 dump_stack_lvl from __warn+0x7c/0x1c4
 __warn from warn_slowpath_fmt+0x11c/0x1a8
 warn_slowpath_fmt from drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8
 drm_atomic_helper_wait_for_vblanks.part.0 from drm_atomic_helper_commit_tail_rpm+0x7c/0x8c
 drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x184
 commit_tail from drm_atomic_helper_commit+0x168/0x190
 drm_atomic_helper_commit from drm_atomic_commit+0xb4/0xe0
 drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x27c
 drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1cc
 drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40
 drm_client_modeset_commit from __drm_fb_helper_restore_fbdev_mode_unlocked+0x9c/0xc4
 __drm_fb_helper_restore_fbdev_mode_unlocked from drm_fb_helper_set_par+0x2c/0x3c
 drm_fb_helper_set_par from fbcon_init+0x3d8/0x550
 fbcon_init from visual_init+0xc0/0x108
 visual_init from do_bind_con_driver+0x1b8/0x3a4
 do_bind_con_driver from do_take_over_console+0x140/0x1ec
 do_take_over_console from do_fbcon_takeover+0x70/0xd0
 do_fbcon_takeover from fbcon_fb_registered+0x19c/0x1ac
 fbcon_fb_registered from register_framebuffer+0x190/0x21c
 register_framebuffer from __drm_fb_helper_initial_config_and_unlock+0x350/0x574
 __drm_fb_helper_initial_config_and_unlock from exynos_drm_fbdev_client_hotplug+0x6c/0xb0
 exynos_drm_fbdev_client_hotplug from drm_client_register+0x58/0x94
 drm_client_register from exynos_drm_bind+0x160/0x190
 exynos_drm_bind from try_to_bring_up_aggregate_device+0x200/0x2d8
 try_to_bring_up_aggregate_device from __component_add+0xb0/0x170
 __component_add from mixer_probe+0x74/0xcc
 mixer_probe from platform_probe+0x5c/0xb8
 platform_probe from really_probe+0xe0/0x3d8
 really_probe from __driver_probe_device+0x9c/0x1e4
 __driver_probe_device from driver_probe_device+0x30/0xc0
 driver_probe_device from __device_attach_driver+0xa8/0x120
 __device_attach_driver from bus_for_each_drv+0x80/0xcc
 bus_for_each_drv from __device_attach+0xac/0x1fc
 __device_attach from bus_probe_device+0x8c/0x90
 bus_probe_device from deferred_probe_work_func+0x98/0xe0
 deferred_probe_work_func from process_one_work+0x240/0x6d0
 process_one_work from worker_thread+0x1a0/0x3f4
 worker_thread from kthread+0x104/0x138
 kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0895fb0 to 0xf0895ff8)
...
irq event stamp: 82357
hardirqs last  enabled at (82363): [<c01a96e8>] vprintk_emit+0x308/0x33c
hardirqs last disabled at (82368): [<c01a969c>] vprintk_emit+0x2bc/0x33c
softirqs last  enabled at (81614): [<c0101644>] __do_softirq+0x320/0x500
softirqs last disabled at (81609): [<c012dfe0>] __irq_exit_rcu+0x130/0x184
---[ end trace 0000000000000000 ]---
exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out
exynos-drm exynos-drm: [drm] *ERROR* [CRTC:70:crtc-1] commit wait timed out
exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out
exynos-drm exynos-drm: [drm] *ERROR* [CONNECTOR:74:HDMI-A-1] commit wait timed out
exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out
exynos-drm exynos-drm: [drm] *ERROR* [PLANE:56:plane-5] commit wait timed out
exynos-mixer 12c10000.mixer: timeout waiting for VSYNC

Cc: stable@vger.kernel.org
Fixes: 13d5b04036 ("drm/exynos: do not return negative values from .get_modes()")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2024-06-10 13:50:50 +09:00
Linus Torvalds
83a7eefedc Linux 6.10-rc3 2024-06-09 14:19:43 -07:00
Linus Torvalds
b8481381d4 Merge tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Update copies of kernel headers, which resulted in support for the
   new 'mseal' syscall, SUBVOL statx return mask bit, RISC-V and PPC
   prctls, fcntl's DUPFD_QUERY, POSTED_MSI_NOTIFICATION IRQ vector,
   'map_shadow_stack' syscall for x86-32.

 - Revert perf.data record memory allocation optimization that ended up
   causing a regression, work is being done to re-introduce it in the
   next merge window.

 - Fix handling of minimal vmlinux.h file used with BPF's CO-RE when
   interrupting the build.

* tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build
  Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event"
  tools headers arm64: Sync arm64's cputype.h with the kernel sources
  tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL
  tools headers UAPI: Update i915_drm.h with the kernel sources
  tools headers UAPI: Sync kvm headers with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall
  perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources to pick POSTED_MSI_NOTIFICATION
  perf beauty: Update copy of linux/socket.h with the kernel sources
  tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools include UAPI: Sync linux/stat.h with the kernel sources
2024-06-09 09:04:51 -07:00
Linus Torvalds
637c2dfcd9 Merge tag 'edac_urgent_for_v6.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:

 - Convert PCI core error codes to proper error numbers since latter get
   propagated all the way up to the module loading functions

* tag 'edac_urgent_for_v6.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/igen6: Convert PCIBIOS_* return codes to errnos
  EDAC/amd64: Convert PCIBIOS_* return codes to errnos
2024-06-09 08:49:13 -07:00
Sagar Cheluvegowda
0579f27249 net: stmmac: dwmac-qcom-ethqos: Configure host DMA width
Commit 070246e467 ("net: stmmac: Fix for mismatched host/device DMA
address width") added support in the stmmac driver for platform drivers
to indicate the host DMA width, but left it up to authors of the
specific platforms to indicate if their width differed from the addr64
register read from the MAC itself.

Qualcomm's EMAC4 integration supports only up to 36 bit width (as
opposed to the addr64 register indicating 40 bit width). Let's indicate
that in the platform driver to avoid a scenario where the driver will
allocate descriptors of size that is supported by the CPU which in our
case is 36 bit, but as the addr64 register is still capable of 40 bits
the device will use two descriptors as one address.

Fixes: 8c4d92e82d ("net: stmmac: dwmac-qcom-ethqos: add support for emac4 on sa8775p platforms")
Signed-off-by: Sagar Cheluvegowda <quic_scheluve@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-09 15:52:52 +01:00
Linus Torvalds
771ed66105 Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
 "One fix for the SiFive PRCI clocks so that the device boots again.

  This driver was registering clkdev lookups that were always going to
  be useless. This wasn't a problem until clkdev started returning an
  error in these cases, causing this driver to fail probe, and thus boot
  to fail because clks are essential for most drivers. The fix is
  simple, don't use clkdev because this is a DT based system where
  clkdev isn't used"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sifive: Do not register clkdevs for PRCI clocks
2024-06-08 19:14:02 -07:00
Linus Torvalds
c5dbc2ed00 Merge tag '6.10-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Two small smb3 client fixes:

   - fix deadlock in umount

   - minor cleanup due to netfs change"

* tag '6.10-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Don't advance the I/O iterator before terminating subrequest
  smb: client: fix deadlock in smb2_find_smb_tcon()
2024-06-08 19:07:18 -07:00
Linus Torvalds
061d1af7b0 Merge tag 'for-linus-2024060801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:

 - fix potential read out of bounds in hid-asus (Andrew Ballance)

 - fix endian-conversion on little endian systems in intel-ish-hid (Arnd
   Bergmann)

 - A couple of new input event codes (Aseda Aboagye)

 - errors handling fixes in hid-nvidia-shield (Chen Ni), hid-nintendo
   (Christophe JAILLET), hid-logitech-dj (José Expósito)

 - current leakage fix while the device is in suspend on a i2c-hid
   laptop (Johan Hovold)

 - other assorted smaller fixes and device ID / quirk entry additions

* tag 'for-linus-2024060801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: Ignore battery for ELAN touchscreens 2F2C and 4116
  HID: i2c-hid: elan: fix reset suspend current leakage
  dt-bindings: HID: i2c-hid: elan: add 'no-reset-on-power-off' property
  dt-bindings: HID: i2c-hid: elan: add Elan eKTH5015M
  dt-bindings: HID: i2c-hid: add dedicated Ilitek ILI2901 schema
  input: Add support for "Do Not Disturb"
  input: Add event code for accessibility key
  hid: asus: asus_report_fixup: fix potential read out of bounds
  HID: logitech-hidpp: add missing MODULE_DESCRIPTION() macro
  HID: intel-ish-hid: fix endian-conversion
  HID: nintendo: Fix an error handling path in nintendo_hid_probe()
  HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode()
  HID: core: remove unnecessary WARN_ON() in implement()
  HID: nvidia-shield: Add missing check for input_ff_create_memless
  HID: intel-ish-hid: Fix build error for COMPILE_TEST
2024-06-08 10:48:11 -07:00
Linus Torvalds
329f70c5be Merge tag 'kbuild-fixes-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Fix the initial state of the save button in 'make gconfig'

 - Improve the Kconfig documentation

 - Fix a Kconfig bug regarding property visibility

 - Fix build breakage for systems where 'sed' is not installed in /bin

 - Fix a false warning about missing MODULE_DESCRIPTION()

* tag 'kbuild-fixes-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  modpost: do not warn about missing MODULE_DESCRIPTION() for vmlinux.o
  kbuild: explicitly run mksysmap as sed script from link-vmlinux.sh
  kconfig: remove wrong expr_trans_bool()
  kconfig: doc: document behavior of 'select' and 'imply' followed by 'if'
  kconfig: doc: fix a typo in the note about 'imply'
  kconfig: gconf: give a proper initial state to the Save button
  kconfig: remove unneeded code for user-supplied values being out of range
2024-06-08 10:12:33 -07:00
Linus Torvalds
1e7ccdd325 Merge tag 'media/v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:

 - fixes for the new ipu6 driver (and related fixes to mei csi driver)

 - fix a double debugfs remove logic at mgb4 driver

 - a documentation fix

* tag 'media/v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: intel/ipu6: add csi2 port sanity check in notifier bound
  media: intel/ipu6: update the maximum supported csi2 port number to 6
  media: mei: csi: Warn less verbosely of a missing device fwnode
  media: mei: csi: Put the IPU device reference
  media: intel/ipu6: fix the buffer flags caused by wrong parentheses
  media: intel/ipu6: Fix an error handling path in isys_probe()
  media: intel/ipu6: Move isys_remove() close to isys_probe()
  media: intel/ipu6: Fix some redundant resources freeing in ipu6_pci_remove()
  media: Documentation: v4l: Fix ACTIVE route flag
  media: mgb4: Fix double debugfs remove
2024-06-08 09:57:09 -07:00
Linus Torvalds
36714d69b1 Merge tag 'irq-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:

 - Fix possible memory leak the riscv-intc irqchip driver load failures

 - Fix boot crash in the sifive-plic irqchip driver caused by recently
   changed boot initialization order

 - Fix race condition in the gic-v3-its irqchip driver

* tag 'irq-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()
  irqchip/sifive-plic: Chain to parent IRQ after handlers are ready
  irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails
2024-06-08 09:44:50 -07:00
Linus Torvalds
7cedb020d5 Merge tag 'x86-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Miscellaneous fixes:

   - Fix kexec() crash if call depth tracking is enabled

   - Fix SMN reads on inaccessible registers on certain AMD systems"

* tag 'x86-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_nb: Check for invalid SMN reads
  x86/kexec: Fix bug with call depth tracking
2024-06-08 09:36:08 -07:00
Linus Torvalds
7cec2e16cb Merge tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
 "Fix race between perf_event_free_task() and perf_event_release_kernel()
  that can result in missed wakeups and hung tasks"

* tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix missing wakeup when waiting for context reference
2024-06-08 09:26:59 -07:00
Linus Torvalds
bbc5332b8c Merge tag 'locking-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking doc fix from Ingo Molnar:
 "Fix typos in the kerneldoc of some of the atomic APIs"

* tag 'locking-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: scripts: fix ${atomic}_sub_and_test() kerneldoc
2024-06-08 09:03:46 -07:00
Linus Torvalds
dc772f8237 Merge tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "14 hotfixes, 6 of which are cc:stable.

  All except the nilfs2 fix affect MM and all are singletons - see the
  chagelogs for details"

* tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
  mm: fix xyz_noprof functions calling profiled functions
  codetag: avoid race at alloc_slab_obj_exts
  mm/hugetlb: do not call vma_add_reservation upon ENOMEM
  mm/ksm: fix ksm_zero_pages accounting
  mm/ksm: fix ksm_pages_scanned accounting
  kmsan: do not wipe out origin when doing partial unpoisoning
  vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
  mm: page_alloc: fix highatomic typing in multi-block buddies
  nilfs2: fix potential kernel bug due to lack of writeback flag waiting
  memcg: remove the lockdep assert from __mod_objcg_mlstate()
  mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
  mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
  mm: drop the 'anon_' prefix for swap-out mTHP counters
2024-06-07 17:01:10 -07:00
Linus Torvalds
e60721bf3c Merge tag 'gpio-fixes-for-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:

 - interrupt handling and Kconfig fixes for gpio-tqmx86

 - add a buffer for storing output values in gpio-tqmx86 as reading back
   the registers always returns the input values

 - add missing MODULE_DESCRIPTION()s to several GPIO drivers

* tag 'gpio-fixes-for-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: add missing MODULE_DESCRIPTION() macros
  gpio: tqmx86: fix broken IRQ_TYPE_EDGE_BOTH interrupt type
  gpio: tqmx86: store IRQ trigger type and unmask status separately
  gpio: tqmx86: introduce shadow register for GPIO output value
  gpio: tqmx86: fix typo in Kconfig label
2024-06-07 16:54:57 -07:00
Linus Torvalds
602079a0a1 Merge tag 'block-6.10-20240607' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Fix for null_blk block size validation (Andreas)

 - NVMe pull request via Keith:
      - Use reserved tags for special fabrics operations (Chunguang)
      - Persistent Reservation status masking fix (Weiwen)

* tag 'block-6.10-20240607' of git://git.kernel.dk/linux:
  null_blk: fix validation of block size
  nvme: fix nvme_pr_* status code parsing
  nvme-fabrics: use reserved tag for reg read/write command
2024-06-07 16:45:48 -07:00
Linus Torvalds
e33915892d Merge tag 'io_uring-6.10-20240607' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:

 - Fix a locking order issue with setting max async thread workers
   (Hagar)

 - Fix for a NULL pointer dereference for failed async flagged requests
   using ring provided buffers. This doesn't affect the current kernel,
   but it does affect older kernels, and is being queued up for 6.10
   just to make the stable process easier (me)

 - Fix for NAPI timeout calculations for how long to busy poll, and
   subsequently how much to sleep post that if a wait timeout is passed
   in (me)

 - Fix for a regression in this release cycle, where we could end up
   using a partially unitialized match value for io-wq (Su)

* tag 'io_uring-6.10-20240607' of git://git.kernel.dk/linux:
  io_uring: fix possible deadlock in io_register_iowq_max_workers()
  io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
  io_uring/napi: fix timeout calculation
  io_uring: check for non-NULL file pointer in io_file_can_poll()
2024-06-07 16:43:07 -07:00
Linus Torvalds
07978330e6 Merge tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix handling of folio private changes.

   The private value holds pointer to our extent buffer structure
   representing a metadata range. Release and create of the range was
   not properly synchronized when updating the private bit which ended
   up in double folio_put, leading to all sorts of breakage

 - fix a crash, reported as duplicate key in metadata, but caused by a
   race of fsync and size extending write. Requires prealloc target
   range + fsync and other conditions (log tree state, timing)

 - fix leak of qgroup extent records after transaction abort

* tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: protect folio::private when attaching extent buffer folios
  btrfs: fix leak of qgroup extent records after transaction abort
  btrfs: fix crash on racing fsync and size-extending write into prealloc
2024-06-07 15:13:12 -07:00
Linus Torvalds
eecba7c070 Merge tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:

 - Fix an occasional memory overwrite caused by a fix added in 6.10

* tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix loop termination condition in gss_free_in_token_pages()
2024-06-07 15:07:57 -07:00
Linus Torvalds
0a02756d91 Merge tag 'riscv-for-linus-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - Another fix to avoid allocating pages that overlap with ERR_PTR,
   which manifests on rv32

 - A revert for the badaccess patch I incorrectly picked up an early
   version of

* tag 'riscv-for-linus-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  Revert "riscv: mm: accelerate pagefault when badaccess"
  riscv: fix overlap of allocated page and PTR_ERR
2024-06-07 14:47:38 -07:00
Linus Torvalds
8d6b029e15 Merge tag 's390-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:

 - Do not create PT_LOAD program header for the kenel image when the
   virtual memory informaton in OS_INFO data is not available. That
   fixes stand-alone dump failures against kernels that do not provide
   the virtual memory informaton

 - Add KVM s390 shared zeropage selftest

* tag 's390-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  KVM: s390x: selftests: Add shared zeropage test
  s390/crash: Do not use VM info if os_info does not have it
2024-06-07 14:44:53 -07:00
Linus Torvalds
8d437867ba Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:

 - Fix spurious CPU hotplug warning message from SETEND emulation code

 - Fix the build when GCC wasn't inlining our I/O accessor internals

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/io: add constant-argument check
  arm64: armv8_deprecated: Fix warning in isndep cpuhp starting process
2024-06-07 14:36:57 -07:00
Linus Torvalds
96e09b8f81 Merge tag 'platform-drivers-x86-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:

 -  Default silead touchscreen driver to 10 fingers and drop 10 finger
    setting from all DMI quirks. More of a cleanup then a pure fix, but
    since the DMI quirks always get updated through the fixes branch
    this avoids conflicts.

 -  Kconfig fix for randconfig builds

 -  dell-smbios: Fix wrong token data in sysfs

 -  amd-hsmp: Fix driver poking unsupported hw when loaded manually

* tag 'platform-drivers-x86-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/amd/hsmp: Check HSMP support on AMD family of processors
  platform/x86: dell-smbios: Simplify error handling
  platform/x86: dell-smbios: Fix wrong token data in sysfs
  platform/x86: yt2-1380: add CONFIG_EXTCON dependency
  platform/x86: touchscreen_dmi: Use 2-argument strscpy()
  platform/x86: touchscreen_dmi: Drop "silead,max-fingers" property
  Input: silead - Always support 10 fingers
2024-06-07 14:13:46 -07:00
Linus Torvalds
f24b46ea10 Merge tag 'iommu-fixes-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
 "Core:

   - Make iommu-dma code recognize 'force_aperture' again

   - Fix for potential NULL-ptr dereference from iommu_sva_bind_device()
     return value

  AMD IOMMU fixes:

   - Fix lockdep splat for invalid wait context

   - Add feature bit check before enabling PPR

   - Make workqueue name fit into buffer

   - Fix memory leak in sysfs code"

* tag 'iommu-fixes-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix Invalid wait context issue
  iommu/amd: Check EFR[EPHSup] bit before enabling PPR
  iommu/amd: Fix workqueue name
  iommu: Return right value in iommu_sva_bind_device()
  iommu/dma: Fix domain init
  iommu/amd: Fix sysfs leak in iommu init
2024-06-07 13:34:53 -07:00
Linus Torvalds
e693c5026c Merge tag 'ata-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:

 - Fix a regression for the PATA MacIO driver were it would fail to
   probe because of the recent changes of initializing the limits in
   SCSI core

* tag 'ata-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K
2024-06-07 12:47:20 -07:00
Linus Torvalds
2e32d58075 Merge tag 'drm-fixes-2024-06-07' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Weekly fixes: vmwgfx leads the way this week, with minor changes in xe
  and amdgpu and a couple of other small fixes.

  Seems quiet enough.

  xe:
   - Update the LMTT when freeing VF GT config

  amdgpu:
   - Fix shutdown issues on some SMU 13.x platforms
   - Silence some UBSAN flexible array warnings

  panel:
   - sitronix-st7789v: handle of_drm_get_panel_orientation failing
     error

  vmwgfx:
   - filter modes greater than available graphics memory
   - fix 3D vs STDU enable
   - remove STDU logic from mode valid
   - logging fix
   - memcmp pointers fix
   - remove unused struct
   - screen target lifetime fix

  komeda:
   - unused struct removal"

* tag 'drm-fixes-2024-06-07' of https://gitlab.freedesktop.org/drm/kernel:
  drm/vmwgfx: Don't memcmp equivalent pointers
  drm/vmwgfx: remove unused struct 'vmw_stdu_dma'
  drm/vmwgfx: Don't destroy Screen Target when CRTC is enabled but inactive
  drm/vmwgfx: Standardize use of kibibytes when logging
  drm/vmwgfx: Remove STDU logic from generic mode_valid function
  drm/vmwgfx: 3D disabled should not effect STDU memory limits
  drm/vmwgfx: Filter modes which exceed graphics memory
  drm/amdgpu/pptable: Fix UBSAN array-index-out-of-bounds
  drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platforms
  drm/xe/pf: Update the LMTT when freeing VF GT config
  drm/panel: sitronix-st7789v: Add check for of_drm_get_panel_orientation
  drm/komeda: remove unused struct 'gamma_curve_segment'
2024-06-07 12:35:56 -07:00
Greg Kroah-Hartman
1092c4126b Merge tag 'thunderbolt-for-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next
Mika writes:

thunderbolt: Fix for v6.10-rc3

This includes one USB4/Thunderbolt fix for v6.10-rc3:

  - Fix lane margining debugfs node creation condition.

This has been in linux-next with no reported issues.
2024-06-07 21:08:20 +02:00
Greg Kroah-Hartman
8f40af3197 Merge tag 'iio-fixes-for-6.10a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus
Jonathan writes:

1st set of IIO fixes for the 6.10 cycle.

The usual mixed bag of old and new driver bugs plus one core issue that
highlighted we have some documentation issues that we need to fix
as a follow up action.

core in kernel interface
- Wrong return value documentation didn't help with an error in
  a cleanup. Result is that thermal is failing to read the temperature.

adi,ad3552r
- Fix DT binding output range sign error.
adi,ad5592r
- Wrong scaling on temperature channel.
adi,ad7173
- Driver assumed all supported devices had input buffers. Make sure not to
  enable them on the ad7176-2 which doesn't.
- Add some missing device names for recently added models.
- Drop an unneeded zero index on the single temperature channel.
- Clear buffered capture specific control bit when returning to on
  demand sampling which otherwise no longer works.
- Make sampling frequency per channel rather than just setting it for the
  first channel.
adi,ad9467
- Capital S for sign of channel whereas ABI is lowercase.
bosch,bmi323
- Make sure to release the trigger even on error paths in the trigger
  handler as otherwise there is no path to recover.
bosch,bmp280
- Avoid an overflow in calculating the temperature.
invensense,timestamp helper
- Fix case where ODR is being switched to the existing ODR an update that
  never finishes.
- Fix an issue with the timestamp being updated whilst still handling
  previous interrupt (icm42600 and mpu6050)
invensense,icm42600
- Don't update the watermark parameters twice.
melexis,mlx90635
- Fix variable returned as error code.

* tag 'iio-fixes-for-6.10a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
  iio: inkern: fix channel read regression
  iio: imu: inv_mpu6050: stabilized timestamping in interrupt
  iio: adc: ad7173: Fix sampling frequency setting
  iio: adc: ad7173: Clear append status bit
  iio: imu: inv_icm42600: delete unneeded update watermark call
  iio: imu: inv_icm42600: stabilized timestamp in interrupt
  iio: invensense: fix odr switching to same value
  iio: adc: ad7173: Remove index from temp channel
  iio: adc: ad7173: Add ad7173_device_info names
  iio: adc: ad7173: fix buffers enablement for ad7176-2
  iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
  iio: imu: bmi323: Fix trigger notification in case of error
  iio: dac: ad5592r: fix temperature channel scaling value
  iio: pressure: bmp280: Fix BMP580 temperature reading
  dt-bindings: iio: dac: fix ad354xr output range
  iio: adc: ad9467: fix scan type sign
2024-06-07 21:05:39 +02:00
Mario Limonciello
e79a10652b ACPI: x86: Force StorageD3Enable on more products
A Rembrandt-based HP thin client is reported to have problems where
the NVME disk isn't present after resume from s2idle.

This is because the NVME disk wasn't put into D3 at suspend, and
that happened because the StorageD3Enable _DSD was missing in the BIOS.

As AMD's architecture requires that the NVME is in D3 for s2idle, adjust
the criteria for force_storage_d3 to match *all* Zen SoCs when the FADT
advertises low power idle support.

This will ensure that any future products with this BIOS deficiency don't
need to be added to the allow list of overrides.

Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 20:24:07 +02:00
Louis Dalibard
a3a5a37efb HID: Ignore battery for ELAN touchscreens 2F2C and 4116
At least ASUS Zenbook 14 (2023) and ASUS Zenbook 14 Pro (2023) are affected.

The touchscreen reports a battery status of 0% and jumps to 1% when a
stylus is used.

The device ID was added and the battery ignore quirk was enabled for it.

[jkosina@suse.com: reformatted changelog a bit]
Signed-off-by: Louis Dalibard <ontake@ontake.dev>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-06-07 17:02:56 +02:00
Jani Nikula
0c76053e3f drm: have config DRM_WERROR depend on !WERROR
If WERROR is already enabled, there's no point in enabling DRM_WERROR or
asking users about it.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/r/CAHk-=whxT8D_0j=bjtrvj-O=VEOjn6GW8GK4j2V+BiDUntZKAQ@mail.gmail.com
Fixes: f89632a9e5 ("drm: Add CONFIG_DRM_WERROR")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516083343.1375687-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-06-07 16:28:34 +03:00
Aleksandr Mishin
c44711b786 liquidio: Adjust a NULL pointer handling path in lio_vf_rep_copy_packet
In lio_vf_rep_copy_packet() pg_info->page is compared to a NULL value,
but then it is unconditionally passed to skb_add_rx_frag() which looks
strange and could lead to null pointer dereference.

lio_vf_rep_copy_packet() call trace looks like:
	octeon_droq_process_packets
	 octeon_droq_fast_process_packets
	  octeon_droq_dispatch_pkt
	   octeon_create_recv_info
	    ...search in the dispatch_list...
	     ->disp_fn(rdisp->rinfo, ...)
	      lio_vf_rep_pkt_recv(struct octeon_recv_info *recv_info, ...)
In this path there is no code which sets pg_info->page to NULL.
So this check looks unneeded and doesn't solve potential problem.
But I guess the author had reason to add a check and I have no such card
and can't do real test.
In addition, the code in the function liquidio_push_packet() in
liquidio/lio_core.c does exactly the same.

Based on this, I consider the most acceptable compromise solution to
adjust this issue by moving skb_add_rx_frag() into conditional scope.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 1f233f3279 ("liquidio: switchdev support for LiquidIO NIC")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-07 14:22:19 +01:00
Rafael J. Wysocki
7f18bd49cb thermal: ACPI: Invalidate trip points with temperature of 0 or below
It is reported that commit 9502108876 ("thermal: core: Drop
trips_disabled bitmask") causes the maximum frequency of CPUs to drop
further down with every system sleep-wake cycle on Intel Core i7-4710HQ.

This turns out to be due to a trip point whose temperature is equal to 0
degrees Celsius which is acted on every time the system wakes from sleep.

Before commit 9502108876 this trip point would be disabled wia the
trips_disabled bitmask, but now it is treated as a valid one.

Since ACPI thermal control is generally about protection against
overheating, trip points with temperature of 0 centigrade or below are
not particularly useful there, so initialize them all as invalid which
fixes the problem at hand.

Fixes: 9502108876 ("thermal: core: Drop trips_disabled bitmask")
Closes: https://lore.kernel.org/linux-pm/3f71747b-f852-4ee0-b384-cf46b2aefa3f@gmx.com
Reported-by: Tibor Billes <tbilles@gmx.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Cc: 6.7+ <stable@vger.kernel.org> # 6.7+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 13:52:17 +02:00
Rafael J. Wysocki
1af89dedc8 thermal: core: Do not fail cdev registration because of invalid initial state
It is reported that commit 31a0fa0019 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan.  If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
as uses that callback after commit 31a0fa0019.

Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.

Accordingly, modify __thermal_cooling_device_register() to avoid
calling thermal_debug_cdev_add() instead of returning an error if the
initial .get_cur_state() callback invocation fails.

Fixes: 31a0fa0019 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Laura Nao <laura.nao@collabora.com>
2024-06-07 13:51:51 +02:00
David S. Miller
dbfb886465 Merge branch 'hns3-fixes'
Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver

There are some bugfix for the HNS3 ethernet driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-07 12:20:28 +01:00
Jie Wang
968fde8384 net: hns3: add cond_resched() to hns3 ring buffer init process
Currently hns3 ring buffer init process would hold cpu too long with big
Tx/Rx ring depth. This could cause soft lockup.

So this patch adds cond_resched() to the process. Then cpu can break to
run other tasks instead of busy looping.

Fixes: a723fb8efe ("net: hns3: refine for set ring parameters")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-07 12:20:28 +01:00
Yonglong Liu
12cda92021 net: hns3: fix kernel crash problem in concurrent scenario
When link status change, the nic driver need to notify the roce
driver to handle this event, but at this time, the roce driver
may uninit, then cause kernel crash.

To fix the problem, when link status change, need to check
whether the roce registered, and when uninit, need to wait link
update finish.

Fixes: 45e92b7e4e ("net: hns3: add calling roce callback function when link status change")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-07 12:20:28 +01:00
Udit Kumar
b472b996a4 dt-bindings: net: dp8386x: Add MIT license along with GPL-2.0
Modify license to include dual licensing as GPL-2.0-only OR MIT
license for TI specific phy header files. This allows for Linux
kernel files to be used in other Operating System ecosystems
such as Zephyr or FreeBSD.

While at this, update the GPL-2.0 to be GPL-2.0-only to be in sync
with latest SPDX conventions (GPL-2.0 is deprecated).

While at this, update the TI copyright year to sync with current year
to indicate license change.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Trent Piepho <tpiepho@impinj.com>
Cc: Wadim Egorov <w.egorov@phytec.de>
Cc: Kip Broadhurst <kbroadhurst@ti.com>
Signed-off-by: Udit Kumar <u-kumar1@ti.com>
Acked-by: Wadim Egorov <w.egorov@phytec.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-07 12:16:22 +01:00
Johan Hovold
0eafc58f21 HID: i2c-hid: elan: fix reset suspend current leakage
The Elan eKTH5015M touch controller found on the Lenovo ThinkPad X13s
shares the VCC33 supply with other peripherals that may remain powered
during suspend (e.g. when enabled as wakeup sources).

The reset line is also wired so that it can be left deasserted when the
supply is off.

This is important as it avoids holding the controller in reset for
extended periods of time when it remains powered, which can lead to
increased power consumption, and also avoids leaking current through the
X13s reset circuitry during suspend (and after driver unbind).

Use the new 'no-reset-on-power-off' devicetree property to determine
when reset needs to be asserted on power down.

Notably this also avoids wasting power on machine variants without a
touchscreen for which the driver would otherwise exit probe with reset
asserted.

Fixes: bd3cba00dc ("HID: i2c-hid: elan: Add support for Elan eKTH6915 i2c-hid touchscreens")
Cc: <stable@vger.kernel.org>	# 6.0
Cc: Douglas Anderson <dianders@chromium.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240507144821.12275-5-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:18:11 +02:00
Johan Hovold
e538d4b85b dt-bindings: HID: i2c-hid: elan: add 'no-reset-on-power-off' property
When the power supply is shared with other peripherals the reset line
can be wired in such a way that it can remain deasserted regardless of
whether the supply is on or not.

This is important as it can be used to avoid holding the controller in
reset for extended periods of time when it remains powered, something
which can lead to increased power consumption. Leaving reset deasserted
also avoids leaking current through the reset circuitry pull-up
resistors.

Add a new 'no-reset-on-power-off' devicetree property which can be used
by the OS to determine when reset needs to be asserted on power down.

Note that this property can also be used when the supply cannot be
turned off by the OS at all.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240507144821.12275-4-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:16:55 +02:00
Johan Hovold
07fc16fa55 dt-bindings: HID: i2c-hid: elan: add Elan eKTH5015M
Add a compatible string for the Elan eKTH5015M touch controller.

Judging from the current binding and commit bd3cba00dc ("HID: i2c-hid:
elan: Add support for Elan eKTH6915 i2c-hid touchscreens"), eKTH5015M
appears to be compatible with eKTH6915. Notably the power-on sequence is
the same.

While at it, drop a redundant label from the example.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240507144821.12275-3-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:16:55 +02:00
Johan Hovold
8d3ae46c64 dt-bindings: HID: i2c-hid: add dedicated Ilitek ILI2901 schema
The Ilitek ILI2901 touch screen controller was apparently incorrectly
added to the Elan eKTH6915 schema simply because it also has a reset
gpio and is currently managed by the Elan driver in Linux.

The two controllers are not related even if an unfortunate wording in
the commit message adding the Ilitek compatible made it sound like they
were.

Add a dedicated schema for the ILI2901 which does not specify the I2C
address (which is likely 0x41 rather than 0x10 as for other Ilitek touch
controllers) to avoid cluttering the Elan schema with unrelated devices
and to make it easier to find the correct schema when adding further
Ilitek controllers.

Fixes: d74ac6f60a ("dt-bindings: HID: i2c-hid: elan: Introduce Ilitek ili2901")
Cc: Zhengqiao Xia <xiazhengqiao@huaqin.corp-partner.google.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240507144821.12275-2-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:16:55 +02:00
Aseda Aboagye
22d6d060ac input: Add support for "Do Not Disturb"
HUTRR94 added support for a new usage titled "System Do Not Disturb"
which toggles a system-wide Do Not Disturb setting. This commit simply
adds a new event code for the usage.

Signed-off-by: Aseda Aboagye <aaboagye@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/Zl-gUHE70s7wCAoB@google.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:08:07 +02:00
Aseda Aboagye
0c7dd00de0 input: Add event code for accessibility key
HUTRR116 added support for a new usage titled "System Accessibility
Binding" which toggles a system-wide bound accessibility UI or command.
This commit simply adds a new event code for the usage.

Signed-off-by: Aseda Aboagye <aaboagye@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/Zl-e97O9nvudco5z@google.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:08:07 +02:00
Andrew Ballance
89e1ee118d hid: asus: asus_report_fixup: fix potential read out of bounds
syzbot reported a potential read out of bounds in asus_report_fixup.

this patch adds checks so that a read out of bounds will not occur

Signed-off-by: Andrew Ballance <andrewjballance@gmail.com>
Reported-by:  <syzbot+07762f019fd03d01f04c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=07762f019fd03d01f04c
Fixes: 59d2f5b739 ("HID: asus: fix more n-key report descriptors if n-key quirked")
Link: https://lore.kernel.org/r/20240602085023.1720492-1-andrewjballance@gmail.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-07 11:07:40 +02:00
Jeff Johnson
64054eb716 gpio: add missing MODULE_DESCRIPTION() macros
On x86, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpio/gpio-gw-pld.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpio/gpio-mc33880.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpio/gpio-pcf857x.o

Add the missing invocations of the MODULE_DESCRIPTION() macro,
including the one missing in gpio-pl061.c, which is not built for x86.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240606-md-drivers-gpio-v1-1-cb42d240ca5c@quicinc.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-06-07 10:05:21 +02:00
David Howells
a88d609036 cifs: Don't advance the I/O iterator before terminating subrequest
There's now no need to make sure subreq->io_iter is advanced to match
subreq->transferred before calling one of the netfs subrequest termination
functions as the check has been removed netfslib and the iterator is reset
prior to retrying a subreq.

Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-07 01:05:26 -05:00
Enzo Matsumiya
02c418774f smb: client: fix deadlock in smb2_find_smb_tcon()
Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such
deadlock.

Cc: stable@vger.kernel.org
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-07 01:05:07 -05:00
Csókás, Bence
e96b293315 net: sfp: Always call sfp_sm_mod_remove() on remove
If the module is in SFP_MOD_ERROR, `sfp_sm_mod_remove()` will
not be run. As a consequence, `sfp_hwmon_remove()` is not getting
run either, leaving a stale `hwmon` device behind. `sfp_sm_mod_remove()`
itself checks `sfp->sm_mod_state` anyways, so this check was not
really needed in the first place.

Fixes: d2e816c029 ("net: sfp: handle module remove outside state machine")
Signed-off-by: "Csókás, Bence" <csokas.bence@prolan.hu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240605084251.63502-1-csokas.bence@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 17:34:03 -07:00
Masahiro Yamada
9185afeac2 modpost: do not warn about missing MODULE_DESCRIPTION() for vmlinux.o
Building with W=1 incorrectly emits the following warning:

  WARNING: modpost: missing MODULE_DESCRIPTION() in vmlinux.o

This check should apply only to modules.

Fixes: 1fffe7a34c ("script: modpost: emit a warning when the description is missing")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2024-06-07 08:42:31 +09:00
Richard Acayan
96c965667b kbuild: explicitly run mksysmap as sed script from link-vmlinux.sh
In commit b18b047002 ("kbuild: change scripts/mksysmap into sed
script"), the mksysmap script was transformed into a sed script,
made directly executable with "#!/bin/sed -f". Apparently, the path to
sed is different on NixOS.

The shebang can't use the env command, otherwise the "sed -f" command
would be treated as a single argument. This can be solved with the -S
flag, but that is a GNU extension. Explicitly use sed instead of relying
on the executable shebang to fix NixOS builds without breaking build
environments using Busybox.

Fixes: b18b047002 ("kbuild: change scripts/mksysmap into sed script")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-06-07 08:42:14 +09:00
Dave Airlie
eb55943aab Merge tag 'drm-misc-next-fixes-2024-06-07' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-next-fixes for v6.10-rc3:
- Single unused struct removal that should have been in -fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0251b6ae-bffa-44b2-b698-955712c25a27@linux.intel.com
2024-06-07 08:40:58 +10:00
Dave Airlie
26033424ed Merge tag 'drm-misc-fixes-2024-06-07' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.10-rc3:
- Robustness fixes for vmwgfx.
- Error check for of_drm_get_panel_orientation failing in
  sitronix-st7789v.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d5645d00-a8cf-47d9-a2a0-4ff55842fc7d@linux.intel.com
2024-06-07 08:37:25 +10:00
Dave Airlie
2d42183110 Merge tag 'amd-drm-fixes-6.10-2024-06-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.10-2024-06-06:

amdgpu:
- Fix shutdown issues on some SMU 13.x platforms
- Silence some UBSAN flexible array warnings

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240606192348.3620805-1-alexander.deucher@amd.com
2024-06-07 08:22:59 +10:00
Linus Torvalds
8a92980606 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "The core change is to detect unusually large number of VPD pages
  (caused by device manufacturers having an endiannes issue) and reject
  them rather than trying to parse a huge non-existent array.

  The remaining fixes are in drivers the most user visible of which is
  the ALUA state transition recognition (leads to intermittent I/O
  errors in some situations otherwise)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort()
  scsi: core: Handle devices which return an unusually large VPD page count
  scsi: mpt3sas: Add missing kerneldoc parameter descriptions
  scsi: qedf: Set qed_slowpath_params to zero before use
  scsi: qedf: Wait for stag work during unload
  scsi: qedf: Don't process stag work during unload and recovery
  scsi: sr: Fix unintentional arithmetic wraparound
  scsi: core: alua: I/O errors for ALUA state transitions
  scsi: mpi3mr: Use proper format specifier in mpi3mr_sas_port_add()
2024-06-06 14:40:51 -07:00
Linus Torvalds
d91e656262 Merge tag 'pci-v6.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:

 - Revert lockdep checking on locking that protects device resets from
   user-space config accesses; it exposed issues for which fixes are in
   the works but are too risky for this cycle (Dan Williams)

* tag 'pci-v6.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Revert the cfg_access_lock lockdep mechanism
2024-06-06 14:28:11 -07:00
Thomas Hellström
a9f9b30e17 MAINTAINERS: Update Xe driver maintainers
Add Rodrigo Vivi as an Xe driver maintainer.

v2:
- Cc also Lucas De Marchi (Rodrigo vivi)
- Remove a blank line in commit the commit message (Lucas De Marchi)

Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240602190959.2981-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-06-06 12:47:47 -07:00
Oded Gabbay
bb93148ca8 MAINTAINERS: update Xe driver maintainers
Because I left Intel, I'm removing myself from the list
of Xe driver maintainers.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240515162222.12958-3-ogabbay@kernel.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 8de6625daf)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-06-06 12:47:47 -07:00
Qu Wenruo
f3a5367c67 btrfs: protect folio::private when attaching extent buffer folios
[BUG]
Since v6.8 there are rare kernel crashes reported by various people,
the common factor is bad page status error messages like this:

  BUG: Bad page state in process kswapd0  pfn:d6e840
  page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
  pfn:0xd6e840
  aops:btree_aops ino:1
  flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
  page_type: 0xffffffff()
  raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
  raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: non-NULL mapping

[CAUSE]
Commit 09e6cef19c ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changes the sequence when allocating a new
extent buffer.

Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to ensure the safety on modification on
folio::private (which is a pointer to extent buffer for regular
sectorsize).

This can lead to the following race:

Thread A is trying to allocate an extent buffer at bytenr X, with 4
4K pages, meanwhile thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).

           Thread A                |                 Thread B
-----------------------------------+-------------------------------------
                                   | btree_release_folio()
				   | | This is for the page at X + 4K,
				   | | Not page X.
				   | |
alloc_extent_buffer()              | |- release_extent_buffer()
|- filemap_add_folio() for the     | |  |- atomic_dec_and_test(eb->refs)
|  page at bytenr X (the first     | |  |
|  page).                          | |  |
|  Which returned -EEXIST.         | |  |
|                                  | |  |
|- filemap_lock_folio()            | |  |
|  Returned the first page locked. | |  |
|                                  | |  |
|- grab_extent_buffer()            | |  |
|  |- atomic_inc_not_zero()        | |  |
|  |  Returned false               | |  |
|  |- folio_detach_private()       | |  |- folio_detach_private() for X
|     |- folio_test_private()      | |     |- folio_test_private()
      |  Returned true             | |     |  Returned true
      |- folio_put()               |       |- folio_put()

Now there are two puts on the same folio at folio X, leading to refcount
underflow of the folio X, and eventually causing the BUG_ON() on the
page->mapping.

The condition is not that easy to hit:

- The release must be triggered for the middle page of an eb
  If the release is on the same first page of an eb, page lock would kick
  in and prevent the race.

- folio_detach_private() has a very small race window
  It's only between folio_test_private() and folio_clear_private().

That's exactly when mapping->i_private_lock is used to prevent such race,
and commit 09e6cef19c ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.

At that time, I thought the page lock would kick in as
filemap_release_folio() also requires the page to be locked, but forgot
the filemap_release_folio() only locks one page, not all pages of an
extent buffer.

[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.

Furthermore to prevent future problems, add an extra
lockdep_assert_locked() to ensure we're holding the proper lock.

To reproducer that is able to hit the race (takes a few minutes with
instrumented code inserting delays to alloc_extent_buffer()):

  #!/bin/sh
  drop_caches () {
	  while(true); do
		  echo 3 > /proc/sys/vm/drop_caches
		  echo 1 > /proc/sys/vm/compact_memory
	  done
  }

  run_tar () {
	  while(true); do
		  for x in `seq 1 80` ; do
			  tar cf /dev/zero /mnt > /dev/null &
		  done
		  wait
	  done
  }

  mkfs.btrfs -f -d single -m single /dev/vda
  mount -o noatime /dev/vda /mnt
  # create 200,000 files, 1K each
  ./simoop -n 200000 -E -f 1k /mnt
  drop_caches &
  (run_tar)

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-btrfs/CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com/
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Link: https://lore.kernel.org/lkml/CABXGCsPktcHQOvKTbPaTwegMExije=Gpgci5NW=hqORo-s7diA@mail.gmail.com/
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/linux-btrfs/e8b3311c-9a75-4903-907f-fc0f7a3fe423@gmx.de/
Reported-by: syzbot+f80b066392366b4af85e@syzkaller.appspotmail.com
Fixes: 09e6cef19c ("btrfs: refactor alloc_extent_buffer() to allocate-then-attach method")
CC: stable@vger.kernel.org # 6.8+
CC: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-06 21:42:22 +02:00
Jan Beulich
3ac36aa730 x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
memblock_set_node() warns about using MAX_NUMNODES, see

  e0eec24e2e ("memblock: make memblock_set_node() also warn about use of MAX_NUMNODES")

for details.

Reported-by: Narasimhan V <Narasimhan.V@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
[bp: commit message]
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240603141005.23261-1-bp@kernel.org
Link: https://lore.kernel.org/r/abadb736-a239-49e4-ab42-ace7acdd4278@suse.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-06-06 22:20:39 +03:00
Linus Torvalds
d30d0e49da Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from BPF and big collection of fixes for WiFi core and
  drivers.

  Current release - regressions:

   - vxlan: fix regression when dropping packets due to invalid src
     addresses

   - bpf: fix a potential use-after-free in bpf_link_free()

   - xdp: revert support for redirect to any xsk socket bound to the
     same UMEM as it can result in a corruption

   - virtio_net:
      - add missing lock protection when reading return code from
        control_buf
      - fix false-positive lockdep splat in DIM
      - Revert "wifi: wilc1000: convert list management to RCU"

   - wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config

  Previous releases - regressions:

   - rtnetlink: make the "split" NLM_DONE handling generic, restore the
     old behavior for two cases where we started coalescing those
     messages with normal messages, breaking sloppily-coded userspace

   - wifi:
      - cfg80211: validate HE operation element parsing
      - cfg80211: fix 6 GHz scan request building
      - mt76: mt7615: add missing chanctx ops
      - ath11k: move power type check to ASSOC stage, fix connecting to
        6 GHz AP
      - ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
      - rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
      - iwlwifi: mvm: fix a crash on 7265

  Previous releases - always broken:

   - ncsi: prevent multi-threaded channel probing, a spec violation

   - vmxnet3: disable rx data ring on dma allocation failure

   - ethtool: init tsinfo stats if requested, prevent unintentionally
     reporting all-zero stats on devices which don't implement any

   - dst_cache: fix possible races in less common IPv6 features

   - tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED

   - ax25: fix two refcounting bugs

   - eth: ionic: fix kernel panic in XDP_TX action

  Misc:

   - tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB"

* tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
  selftests: net: lib: set 'i' as local
  selftests: net: lib: avoid error removing empty netns name
  selftests: net: lib: support errexit with busywait
  net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
  ipv6: fix possible race in __fib6_drop_pcpu_from()
  af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
  af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
  af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
  af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
  af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
  af_unix: Annotate data-races around sk->sk_sndbuf.
  af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
  af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
  af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
  af_unix: Annotate data-race of sk->sk_state in unix_accept().
  af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
  af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
  af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
  af_unix: Annodate data-races around sk->sk_state for writers.
  af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
  ...
2024-06-06 09:55:27 -07:00
Linus Torvalds
2faf6332c5 Merge tag 'tomoyo-pr-20240606' of git://git.code.sf.net/p/tomoyo/tomoyo
Pull tomoyo fixlet from Tetsuo Handa:
 "Single patch to update project links, no behavior changes"

* tag 'tomoyo-pr-20240606' of git://git.code.sf.net/p/tomoyo/tomoyo:
  tomoyo: update project links
2024-06-06 09:48:57 -07:00
Linus Torvalds
a34adf6010 Merge tag 'efi-fixes-for-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Ensure that .discard sections are really discarded in the EFI zboot
   image build

 - Return proper error numbers from efi-pstore

 - Add __nocfi annotations to EFI runtime wrappers

* tag 'efi-fixes-for-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Add missing __nocfi annotations to runtime wrappers
  efi: pstore: Return proper errors on UEFI failures
  efi/libstub: zboot.lds: Discard .discard sections
2024-06-06 09:39:36 -07:00
Jakub Kicinski
27bc865408 Merge branch 'selftests-net-lib-small-fixes'
Matthieu Baerts says:

====================
selftests: net: lib: small fixes

While looking at using 'lib.sh' for the MPTCP selftests [1], we found
some small issues with 'lib.sh'. Here they are:

- Patch 1: fix 'errexit' (set -e) support with busywait. 'errexit' is
  supported in some functions, not all. A fix for v6.8+.

- Patch 2: avoid confusing error messages linked to the cleaning part
  when the netns setup fails. A fix for v6.8+.

- Patch 3: set a variable as local to avoid accidentally changing the
  value of a another one with the same name on the caller side. A fix
  for v6.10-rc1+.

Link: https://lore.kernel.org/mptcp/5f4615c3-0621-43c5-ad25-55747a4350ce@kernel.org/T/ [1]
====================

Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-0-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 08:29:07 -07:00
Matthieu Baerts (NGI0)
84a8bc3ec2 selftests: net: lib: set 'i' as local
Without this, the 'i' variable declared before could be overridden by
accident, e.g.

  for i in "${@}"; do
      __ksft_status_merge "${i}"  ## 'i' has been modified
      foo "${i}"                  ## using 'i' with an unexpected value
  done

After a quick look, it looks like 'i' is currently not used after having
been modified in __ksft_status_merge(), but still, better be safe than
sorry. I saw this while modifying the same file, not because I suspected
an issue somewhere.

Fixes: 596c8819cb ("selftests: forwarding: Have RET track kselftest framework constants")
Acked-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-3-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 08:29:07 -07:00
Matthieu Baerts (NGI0)
79322174bc selftests: net: lib: avoid error removing empty netns name
If there is an error to create the first netns with 'setup_ns()',
'cleanup_ns()' will be called with an empty string as first parameter.

The consequences is that 'cleanup_ns()' will try to delete an invalid
netns, and wait 20 seconds if the netns list is empty.

Instead of just checking if the name is not empty, convert the string
separated by spaces to an array. Manipulating the array is cleaner, and
calling 'cleanup_ns()' with an empty array will be a no-op.

Fixes: 25ae948b44 ("selftests/net: add lib.sh")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-2-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 08:29:07 -07:00
Matthieu Baerts (NGI0)
41b02ea4c0 selftests: net: lib: support errexit with busywait
If errexit is enabled ('set -e'), loopy_wait -- or busywait and others
using it -- will stop after the first failure.

Note that if the returned status of loopy_wait is checked, and even if
errexit is enabled, Bash will not stop at the first error.

Fixes: 25ae948b44 ("selftests/net: add lib.sh")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-1-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 08:29:07 -07:00
Michael Ellerman
09fe2bfa6b ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K
The pata_macio driver advertises a max_segment_size of 0xff00, because
the hardware doesn't cope with requests >= 64K.

However the SCSI core requires max_segment_size to be at least
PAGE_SIZE, which is a problem for pata_macio when the kernel is built
with 64K pages.

In older kernels the SCSI core would just increase the segment size to
be equal to PAGE_SIZE, however since the commit tagged below it causes a
warning and the device fails to probe:

  WARNING: CPU: 0 PID: 26 at block/blk-settings.c:202 .blk_validate_limits+0x2f8/0x35c
  CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 6.10.0-rc1 #1
  Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac
  ...
  NIP .blk_validate_limits+0x2f8/0x35c
  LR  .blk_alloc_queue+0xc0/0x2f8
  Call Trace:
    .blk_alloc_queue+0xc0/0x2f8
    .blk_mq_alloc_queue+0x60/0xf8
    .scsi_alloc_sdev+0x208/0x3c0
    .scsi_probe_and_add_lun+0x314/0x52c
    .__scsi_add_device+0x170/0x1a4
    .ata_scsi_scan_host+0x2bc/0x3e4
    .async_port_probe+0x6c/0xa0
    .async_run_entry_fn+0x60/0x1bc
    .process_one_work+0x228/0x510
    .worker_thread+0x360/0x530
    .kthread+0x134/0x13c
    .start_kernel_thread+0x10/0x14
  ...
  scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Although the hardware can't cope with a 64K segment, the driver
already deals with that internally by splitting large requests in
pata_macio_qc_prep(). That is how the driver has managed to function
until now on 64K kernels.

So fix the driver to advertise a max_segment_size of 64K, which avoids
the warning and keeps the SCSI core happy.

Fixes: afd53a3d85 ("scsi: core: Initialize scsi midlayer limits before allocating the queue")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/ce2bf6af-4382-4fe1-b392-cc6829f5ceb2@roeck-us.net/
Reported-by: Doru Iorgulescu <doru.iorgulescu1@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218858
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-06-06 14:53:34 +02:00
Su Hui
0dcc53abf5 net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
Clang static checker (scan-build) warning:
net/ethtool/ioctl.c:line 2233, column 2
Called function pointer is null (null dereference).

Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix
this typo error.

Fixes: 201ed315f9 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:34:33 +02:00
Masahiro Yamada
77a92660d8 kconfig: remove wrong expr_trans_bool()
expr_trans_bool() performs an incorrect transformation.

[Test Code]

    config MODULES
            def_bool y
            modules

    config A
            def_bool y
            select C if B != n

    config B
            def_tristate m

    config C
            tristate

[Result]

    CONFIG_MODULES=y
    CONFIG_A=y
    CONFIG_B=m
    CONFIG_C=m

This output is incorrect because CONFIG_C=y is expected.

Documentation/kbuild/kconfig-language.rst clearly explains the function
of the '!=' operator:

    If the values of both symbols are equal, it returns 'n',
    otherwise 'y'.

Therefore, the statement:

    select C if B != n

should be equivalent to:

    select C if y

Or, more simply:

    select C

Hence, the symbol C should be selected by the value of A, which is 'y'.

However, expr_trans_bool() wrongly transforms it to:

    select C if B

Therefore, the symbol C is selected by (A && B), which is 'm'.

The comment block of expr_trans_bool() correctly explains its intention:

  * bool FOO!=n => FOO
    ^^^^

If FOO is bool, FOO!=n can be simplified into FOO. This is correct.

However, the actual code performs this transformation when FOO is
tristate:

    if (e->left.sym->type == S_TRISTATE) {
                             ^^^^^^^^^^

While it can be fixed to S_BOOLEAN, there is no point in doing so
because expr_tranform() already transforms FOO!=n to FOO when FOO is
bool. (see the "case E_UNEQUAL" part)

expr_trans_bool() is wrong and unnecessary.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
2024-06-06 20:09:10 +09:00
Eric Dumazet
b01e1c0307 ipv6: fix possible race in __fib6_drop_pcpu_from()
syzbot found a race in __fib6_drop_pcpu_from() [1]

If compiler reads more than once (*ppcpu_rt),
second read could read NULL, if another cpu clears
the value in rt6_get_pcpu_route().

Add a READ_ONCE() to prevent this race.

Also add rcu_read_lock()/rcu_read_unlock() because
we rely on RCU protection while dereferencing pcpu_rt.

[1]

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: netns cleanup_net
 RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984
Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48
RSP: 0018:ffffc900040df070 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16
RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091
RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8
R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline]
  fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline]
  fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038
  fib6_del_route net/ipv6/ip6_fib.c:1998 [inline]
  fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043
  fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205
  fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127
  fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175
  fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255
  __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271
  rt6_sync_down_dev net/ipv6/route.c:4906 [inline]
  rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911
  addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855
  addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778
  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992
  call_netdevice_notifiers_extack net/core/dev.c:2030 [inline]
  call_netdevice_notifiers net/core/dev.c:2044 [inline]
  dev_close_many+0x333/0x6a0 net/core/dev.c:1585
  unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193
  unregister_netdevice_many net/core/dev.c:11276 [inline]
  default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759
  ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
  cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
  process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
  process_scheduled_works kernel/workqueue.c:3312 [inline]
  worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
  kthread+0x2c1/0x3a0 kernel/kthread.c:389
  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Fixes: d52d3997f8 ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:05:54 +02:00
Masahiro Yamada
45c7f555bf kconfig: doc: document behavior of 'select' and 'imply' followed by 'if'
Documentation/kbuild/kconfig-language.rst explains the behavior of
'select' as follows:

  reverse dependencies can be used to force a lower limit of
  another symbol. The value of the current menu symbol is used as the
  minimal value <symbol> can be set to.

This is not true when the 'select' property is followed by 'if'.

[Test Code]

    config MODULES
            def_bool y
            modules

    config A
            def_tristate y
            select C if B

    config B
            def_tristate m

    config C
            tristate

[Result]

    CONFIG_MODULES=y
    CONFIG_A=y
    CONFIG_B=m
    CONFIG_C=m

If "the value of A is used as the minimal value C can be set to",
C must be 'y'.

The actual behavior is "C is selected by (A && B)". The lower limit of
C is downgraded due to B being 'm'.

This behavior is kind of weird, and this has arisen several times in
the mailing list.

I do not know whether it is a bug or intended behavior. Anyway, it is
not feasible to change it now because many Kconfig files are written
based on this behavior. The same applies to 'imply'.

Document this (but reserve the possibility for a future change).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
2024-06-06 20:05:15 +09:00
Masahiro Yamada
bf83266a1e kconfig: doc: fix a typo in the note about 'imply'
This sentence does not make sense due to a typo. Fix it.

Fixes: def2fbffe6 ("kconfig: allow symbols implied by y to become m")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
2024-06-06 20:03:16 +09:00
Masahiro Yamada
46edf4372e kconfig: gconf: give a proper initial state to the Save button
Currently, the initial state of the "Save" button is always active.

If none of the CONFIG options are changed while loading the .config
file, the "Save" button should be greyed out.

This can be fixed by calling conf_read() after widget initialization.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-06-06 20:03:16 +09:00
Masahiro Yamada
c181689bc4 kconfig: remove unneeded code for user-supplied values being out of range
This is a leftover from commit ce1fc9345a ("kconfig: do not clear
SYMBOL_DEF_USER when the value is out of range").

This code is now redundant because if a user-supplied value is out
of range, the value adjusted by sym_validate_range() differs, and
conf_unsaved has already been incremented a few lines above.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-06-06 20:03:14 +09:00
Paolo Abeni
411c0ea696 Merge branch 'af_unix-fix-lockless-access-of-sk-sk_state-and-others-fields'
Kuniyuki Iwashima says:

====================
af_unix: Fix lockless access of sk->sk_state and others fields.

The patch 1 fixes a bug where SOCK_DGRAM's sk->sk_state is changed
to TCP_CLOSE even if the socket is connect()ed to another socket.

The rest of this series annotates lockless accesses to the following
fields.

  * sk->sk_state
  * sk->sk_sndbuf
  * net->unx.sysctl_max_dgram_qlen
  * sk->sk_receive_queue.qlen
  * sk->sk_shutdown

Note that with this series there is skb_queue_empty() left in
unix_dgram_disconnected() that needs to be changed to lockless
version, and unix_peer(other) access there should be protected
by unix_state_lock().

This will require some refactoring, so another series will follow.

Changes:
  v2:
    * Patch 1: Fix wrong double lock

  v1: https://lore.kernel.org/netdev/20240603143231.62085-1-kuniyu@amazon.com/
====================

Link: https://lore.kernel.org/r/20240604165241.44758-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:18 +02:00
Kuniyuki Iwashima
efaf24e30e af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().

Let's use READ_ONCE() to read sk->sk_shutdown.

Fixes: e4e541a848 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
5d915e584d af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.

If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.

Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().

Fixes: c9da99e647 ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
83690b82d2 af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().

However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock.  Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().

Thues, unix_state_lock() does not protect qlen.

Let's use skb_queue_empty_lockless() in unix_release_sock().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
45d872f0e6 af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.

It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().

Thus, we need to use unix_recvq_full_lockless() to avoid data-race.

Now we remove unix_recvq_full() as no one uses it.

Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:

  (1) For SOCK_DGRAM, it is a written-once field in unix_create1()

  (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
      listener's unix_state_lock() in unix_listen(), and we hold
      the lock in unix_stream_connect()

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
bd9f2d0573 af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.

Let's use READ_ONCE() in unix_create1().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
b0632e53e0 af_unix: Annotate data-races around sk->sk_sndbuf.
sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
not used in af_unix.c.

Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
unix_dgram_sendmsg(), and unix_stream_sendmsg().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
0aa3be7b3e af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.

Let's use READ_ONCE() there.

Note that the result could be inconsistent if the socket is dumped
during the state change.  This is common for other SOCK_DIAG and
similar interfaces.

Fixes: c9da99e647 ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6 ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
af4c733b6b af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
unix_stream_read_skb() is called from sk->sk_data_ready() context
where unix_state_lock() is not held.

Let's use READ_ONCE() there.

Fixes: 77462de14a ("af_unix: Add read_sock for stream socket types")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
8a34d4e8d9 af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
The following functions read sk->sk_state locklessly and proceed only if
the state is TCP_ESTABLISHED.

  * unix_stream_sendmsg
  * unix_stream_read_generic
  * unix_seqpacket_sendmsg
  * unix_seqpacket_recvmsg

Let's use READ_ONCE() there.

Fixes: a05d2ad1c1 ("af_unix: Only allow recv on connected seqpacket sockets.")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
1b536948e8 af_unix: Annotate data-race of sk->sk_state in unix_accept().
Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes the advantage and reads sk->sk_state without
holding unix_state_lock().

Let's use READ_ONCE() there.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
a9bf9c7dc6 af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
As small optimisation, unix_stream_connect() prefetches the client's
sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE.

Later, sk->sk_state is checked again under unix_state_lock().

Let's use READ_ONCE() for the first check and TCP_CLOSE directly for
the second check.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
eb0718fb3e af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and
calls unix_writable() which also reads sk->sk_state without holding
unix_state_lock().

Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass
it to unix_writable().

While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as
that state does not exist for AF_UNIX socket since the code was added.

Fixes: 1586a5877d ("af_unix: do not report POLLOUT on listeners")
Fixes: 3c73419c09 ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
3a0f38eb28 af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first
and returns -EINVAL if it's TCP_LISTEN.

Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of
bytes in recvq.

However, unix_inq_len() does not hold unix_state_lock(), and the
concurrent listen() might change the state after checking sk->sk_state.

If the race occurs, 0 is returned for the listener, instead of -EINVAL,
because the length of skb with embryo is 0.

We could hold unix_state_lock() in unix_inq_len(), but it's overkill
given the result is true for pre-listen() TCP_CLOSE state.

So, let's use READ_ONCE() for sk->sk_state in unix_inq_len().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
942238f973 af_unix: Annodate data-races around sk->sk_state for writers.
sk->sk_state is changed under unix_state_lock(), but it's read locklessly
in many places.

This patch adds WRITE_ONCE() on the writer side.

We will add READ_ONCE() to the lockless readers in the following patches.

Fixes: 83301b5367 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
26bfb8b570 af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
When a SOCK_DGRAM socket connect()s to another socket, the both sockets'
sk->sk_state are changed to TCP_ESTABLISHED so that we can register them
to BPF SOCKMAP.

When the socket disconnects from the peer by connect(AF_UNSPEC), the state
is set back to TCP_CLOSE.

Then, the peer's state is also set to TCP_CLOSE, but the update is done
locklessly and unconditionally.

Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects
from B.

After the first two connect()s, all three sockets' sk->sk_state are
TCP_ESTABLISHED:

  $ ss -xa
  Netid State  Recv-Q Send-Q  Local Address:Port  Peer Address:PortProcess
  u_dgr ESTAB  0      0       @A 641              * 642
  u_dgr ESTAB  0      0       @B 642              * 643
  u_dgr ESTAB  0      0       @C 643              * 0

And after the disconnect, B's state is TCP_CLOSE even though it's still
connected to C and C's state is TCP_ESTABLISHED.

  $ ss -xa
  Netid State  Recv-Q Send-Q  Local Address:Port  Peer Address:PortProcess
  u_dgr UNCONN 0      0       @A 641              * 0
  u_dgr UNCONN 0      0       @B 642              * 643
  u_dgr ESTAB  0      0       @C 643              * 0

In this case, we cannot register B to SOCKMAP.

So, when a socket disconnects from the peer, we should not set TCP_CLOSE to
the peer if the peer is connected to yet another socket, and this must be
done under unix_state_lock().

Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless
readers.  These data-races will be fixed in the following patches.

Fixes: 83301b5367 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Aleksandr Mishin
b0c9a26435 net: wwan: iosm: Fix tainted pointer delete is case of region creation fail
In case of region creation fail in ipc_devlink_create_region(), previously
created regions delete process starts from tainted pointer which actually
holds error code value.
Fix this bug by decreasing region index before delete.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 4dcd183fbd ("net: wwan: iosm: devlink registration")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240604082500.20769-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 10:15:14 +02:00
Ian Forbes
5703fc058e drm/vmwgfx: Don't memcmp equivalent pointers
These pointers are frequently the same and memcmp does not compare the
pointers before comparing their contents so this was wasting cycles
comparing 16 KiB of memory which will always be equal.

Fixes: bb6780aa5a ("drm/vmwgfx: Diff cursors when using cmds")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328190716.27367-1-ian.forbes@broadcom.com
2024-06-05 22:38:40 -04:00
Jakub Kicinski
5899c88513 Merge branch 'intel-wired-lan-driver-updates-2024-05-29-ice-igc'
Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2024-05-29 (ice, igc)

This series includes fixes for the ice driver as well as a fix for the igc
driver.

Jacob fixes two issues in the ice driver with reading the NVM for providing
firmware data via devlink info. First, fix an off-by-one error when reading
the Preserved Fields Area, resolving an infinite loop triggered on some
NVMs which lack certain data in the NVM. Second, fix the reading of the NVM
Shadow RAM on newer E830 and E825-C devices which have a variable sized CSS
header rather than assuming this header is always the same fixed size as in
the E810 devices.

Larysa fixes three issues with the ice driver XDP logic that could occur if
the number of queues is changed after enabling an XDP program. First, the
af_xdp_zc_qps bitmap is removed and replaced by simpler logic to track
whether queues are in zero-copy mode. Second, the reset and .ndo_bpf flows
are distinguished to avoid potential races with a PF reset occuring
simultaneously to .ndo_bpf callback from userspace. Third, the logic for
mapping XDP queues to vectors is fixed so that XDP state is restored for
XDP queues after a reconfiguration.

Sasha fixes reporting of Energy Efficient Ethernet support via ethtool in
the igc driver.

v1: https://lore.kernel.org/r/20240530-net-2024-05-30-intel-net-fixes-v1-0-8b11c8c9bff8@intel.com
====================

Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-0-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:28:06 -07:00
Sasha Neftin
7d67d11fbe igc: Fix Energy Efficient Ethernet support declaration
The commit 01cf893bf0 ("net: intel: i40e/igc: Remove setting Autoneg in
EEE capabilities") removed SUPPORTED_Autoneg field but left inappropriate
ethtool_keee structure initialization. When "ethtool --show <device>"
(get_eee) invoke, the 'ethtool_keee' structure was accidentally overridden.
Remove the 'ethtool_keee' overriding and add EEE declaration as per IEEE
specification that allows reporting Energy Efficient Ethernet capabilities.

Examples:
Before fix:
ethtool --show-eee enp174s0
EEE settings for enp174s0:
	EEE status: not supported

After fix:
EEE settings for enp174s0:
	EEE status: disabled
	Tx LPI: disabled
	Supported EEE link modes:  100baseT/Full
	                           1000baseT/Full
	                           2500baseT/Full

Fixes: 01cf893bf0 ("net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities")
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-6-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Larysa Zaremba
f3df404425 ice: map XDP queues to vectors in ice_vsi_map_rings_to_vectors()
ice_pf_dcb_recfg() re-maps queues to vectors with
ice_vsi_map_rings_to_vectors(), which does not restore the previous
state for XDP queues. This leads to no AF_XDP traffic after rebuild.

Map XDP queues to vectors in ice_vsi_map_rings_to_vectors().
Also, move the code around, so XDP queues are mapped independently only
through .ndo_bpf().

Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Larysa Zaremba
744d197162 ice: add flag to distinguish reset from .ndo_bpf in XDP rings config
Commit 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in
the rebuild process. The behaviour of the XDP rings config functions is
context-dependent, so the change of order has led to
ice_destroy_xdp_rings() doing additional work and removing XDP prog, when
it was supposed to be preserved.

Also, dependency on the PF state reset flags creates an additional,
fortunately less common problem:

* PFR is requested e.g. by tx_timeout handler
* .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(),
  but reset flag is set, so rings are destroyed without deleting the
  program
* ice_vsi_rebuild tries to delete non-existent XDP rings, because the
  program is still on the VSI
* system crashes

With a similar race, when requested to attach a program,
ice_prepare_xdp_rings() can actually skip setting the program in the VSI
and nevertheless report success.

Instead of reverting to the old order of function calls, add an enum
argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in
order to distinguish between calls from rebuild and .ndo_bpf().

Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Larysa Zaremba
adbf5a4234 ice: remove af_xdp_zc_qps bitmap
Referenced commit has introduced a bitmap to distinguish between ZC and
copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this
for us.

The bitmap would be especially useful when restoring previous state after
rebuild, if only it was not reallocated in the process. This leads to e.g.
xdpsock dying after changing number of queues.

Instead of preserving the bitmap during the rebuild, remove it completely
and distinguish between ZC and copy-mode queues based on the presence of
a device associated with the pool.

Fixes: e102db780e ("ice: track AF_XDP ZC enabled queues in bitmap")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Jacob Keller
cfa747a66e ice: fix reads from NVM Shadow RAM on E830 and E825-C devices
The ice driver reads data from the Shadow RAM portion of the NVM during
initialization, including data used to identify the NVM image and device,
such as the ETRACK ID used to populate devlink dev info fw.bundle.

Currently it is using a fixed offset defined by ICE_CSS_HEADER_LENGTH to
compute the appropriate offset. This worked fine for E810 and E822 devices
which both have CSS header length of 330 words.

Other devices, including both E825-C and E830 devices have different sizes
for their CSS header. The use of a hard coded value results in the driver
reading from the wrong block in the NVM when attempting to access the
Shadow RAM copy. This results in the driver reporting the fw.bundle as 0x0
in both the devlink dev info and ethtool -i output.

The first E830 support was introduced by commit ba20ecb1d1 ("ice: Hook up
4 E830 devices by adding their IDs") and the first E825-C support was
introducted by commit f64e189442 ("ice: introduce new E825C devices
family")

The NVM actually contains the CSS header length embedded in it. Remove the
hard coded value and replace it with logic to read the length from the NVM
directly. This is more resilient against all existing and future hardware,
vs looking up the expected values from a table. It ensures the driver will
read from the appropriate place when determining the ETRACK ID value used
for populating the fw.bundle_id and for reporting in ethtool -i.

The CSS header length for both the active and inactive flash bank is stored
in the ice_bank_info structure to avoid unnecessary duplicate work when
accessing multiple words of the Shadow RAM. Both banks are read in the
unlikely event that the header length is different for the NVM in the
inactive bank, rather than being different only by the overall device
family.

Fixes: ba20ecb1d1 ("ice: Hook up 4 E830 devices by adding their IDs")
Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-2-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:55 -07:00
Jacob Keller
03e4a092be ice: fix iteration of TLVs in Preserved Fields Area
The ice_get_pfa_module_tlv() function iterates over the Type-Length-Value
structures in the Preserved Fields Area (PFA) of the NVM. This is used by
the driver to access data such as the Part Board Assembly identifier.

The function uses simple logic to iterate over the PFA. First, the pointer
to the PFA in the NVM is read. Then the total length of the PFA is read
from the first word.

A pointer to the first TLV is initialized, and a simple loop iterates over
each TLV. The pointer is moved forward through the NVM until it exceeds the
PFA area.

The logic seems sound, but it is missing a key detail. The Preserved
Fields Area length includes one additional final word. This is documented
in the device data sheet as a dummy word which contains 0xFFFF. All NVMs
have this extra word.

If the driver tries to scan for a TLV that is not in the PFA, it will read
past the size of the PFA. It reads and interprets the last dummy word of
the PFA as a TLV with type 0xFFFF. It then reads the word following the PFA
as a length.

The PFA resides within the Shadow RAM portion of the NVM, which is
relatively small. All of its offsets are within a 16-bit size. The PFA
pointer and TLV pointer are stored by the driver as 16-bit values.

In almost all cases, the word following the PFA will be such that
interpreting it as a length will result in 16-bit arithmetic overflow. Once
overflowed, the new next_tlv value is now below the maximum offset of the
PFA. Thus, the driver will continue to iterate the data as TLVs. In the
worst case, the driver hits on a sequence of reads which loop back to
reading the same offsets in an endless loop.

To fix this, we need to correct the loop iteration check to account for
this extra word at the end of the PFA. This alone is sufficient to resolve
the known cases of this issue in the field. However, it is plausible that
an NVM could be misconfigured or have corrupt data which results in the
same kind of overflow. Protect against this by using check_add_overflow
when calculating both the maximum offset of the TLVs, and when calculating
the next_tlv offset at the end of each loop iteration. This ensures that
the driver will not get stuck in an infinite loop when scanning the PFA.

Fixes: e961b679fb ("ice: add board identifier info to devlink .info_get")
Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-1-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:55 -07:00
Dr. David Alan Gilbert
b91e05f1fc drm/vmwgfx: remove unused struct 'vmw_stdu_dma'
'vmw_stdu_dma' is unused since
commit 39985eea5a ("drm/vmwgfx: Abstract placement selection")
Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517232858.230860-1-linux@treblig.org
2024-06-05 22:20:25 -04:00
Ryusuke Konishi
7373a51e79 nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
The error handling in nilfs_empty_dir() when a directory folio/page read
fails is incorrect, as in the old ext2 implementation, and if the
folio/page cannot be read or nilfs_check_folio() fails, it will falsely
determine the directory as empty and corrupt the file system.

In addition, since nilfs_empty_dir() does not immediately return on a
failed folio/page read, but continues to loop, this can cause a long loop
with I/O if i_size of the directory's inode is also corrupted, causing the
log writer thread to wait and hang, as reported by syzbot.

Fix these issues by making nilfs_empty_dir() immediately return a false
value (0) if it fails to get a directory folio/page.

Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:27 -07:00
Suren Baghdasaryan
9415983599 mm: fix xyz_noprof functions calling profiled functions
Grepping /proc/allocinfo for "noprof" reveals several xyz_noprof
functions, which means internally they are calling profiled functions. 
This should never happen as such calls move allocation charge from a
higher level location where it should be accounted for into these lower
level helpers.  Fix this by replacing profiled function calls with noprof
ones.

Link: https://lkml.kernel.org/r/20240531205350.3973009-1-surenb@google.com
Fixes: b951aaff50 ("mm: enable page allocation tagging")
Fixes: e26d8769da ("mempool: hook up to memory allocation profiling")
Fixes: 88ae5fb755 ("mm: vmalloc: enable memory allocation profiling")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Thadeu Lima de Souza Cascardo
3f0c44c8c2 codetag: avoid race at alloc_slab_obj_exts
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
which did not have an obj_exts allocated for.

In such a case, the two threads will notice the NULL obj_exts and after
one assigns slab->obj_exts, the second one will happily do the exchange if
it reads this new assigned value.

In order to avoid that, verify that the read obj_exts does not point to an
allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/20240527183007.1595037-1-cascardo@igalia.com
Fixes: 09c46563ff ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Oscar Salvador
8daf9c702e mm/hugetlb: do not call vma_add_reservation upon ENOMEM
sysbot reported a splat [1] on __unmap_hugepage_range().  This is because
vma_needs_reservation() can return -ENOMEM if
allocate_file_region_entries() fails to allocate the file_region struct
for the reservation.

Check for that and do not call vma_add_reservation() if that is the case,
otherwise region_abort() and region_del() will see that we do not have any
file_regions.

If we detect that vma_needs_reservation() returned -ENOMEM, we clear the
hugetlb_restore_reserve flag as if this reservation was still consumed, so
free_huge_folio() will not increment the resv count.

[1] https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/T/#ma5983bc1ab18a54910da83416b3f89f3c7ee43aa

Link: https://lkml.kernel.org/r/20240528205323.20439-1-osalvador@suse.de
Fixes: df7a6d1f64 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-and-tested-by: syzbot+d3fe2dc5ffe9380b714b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/
Cc: Breno Leitao <leitao@debian.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Chengming Zhou
c2dc78b86e mm/ksm: fix ksm_zero_pages accounting
We normally ksm_zero_pages++ in ksmd when page is merged with zero page,
but ksm_zero_pages-- is done from page tables side, where there is no any
accessing protection of ksm_zero_pages.

So we can read very exceptional value of ksm_zero_pages in rare cases,
such as -1, which is very confusing to users.

Fix it by changing to use atomic_long_t, and the same case with the
mm->ksm_zero_pages.

Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.dev
Fixes: e2942062e0 ("ksm: count all zero pages placed by KSM")
Fixes: 6080d19f07 ("ksm: add ksm zero pages for each process")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Chengming Zhou
730cdc2c72 mm/ksm: fix ksm_pages_scanned accounting
Patch series "mm/ksm: fix some accounting problems", v3.

We encountered some abnormal ksm_pages_scanned and ksm_zero_pages during
some random tests.

1. ksm_pages_scanned unchanged even ksmd scanning has progress.
2. ksm_zero_pages maybe -1 in some rare cases.


This patch (of 2):

During testing, I found ksm_pages_scanned is unchanged although the
scan_get_next_rmap_item() did return valid rmap_item that is not NULL.

The reason is the scan_get_next_rmap_item() will return NULL after a full
scan, so ksm_do_scan() just return without accounting of the
ksm_pages_scanned.

Fix it by just putting ksm_pages_scanned accounting in that loop, and it
will be accounted more timely if that loop would last for a long time.

Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-0-34bb358fdc13@linux.dev
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-1-34bb358fdc13@linux.dev
Fixes: b348b5fe2b ("mm/ksm: add pages scanned metric")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Alexander Potapenko
2ef3cec44c kmsan: do not wipe out origin when doing partial unpoisoning
As noticed by Brian, KMSAN should not be zeroing the origin when
unpoisoning parts of a four-byte uninitialized value, e.g.:

    char a[4];
    kmsan_unpoison_memory(a, 1);

This led to false negatives, as certain poisoned values could receive zero
origins, preventing those values from being reported.

To fix the problem, check that kmsan_internal_set_shadow_origin() writes
zero origins only to slots which have zero shadow.

Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com
Fixes: f80be4571b ("kmsan: add KMSAN runtime core")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
  Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.com/T/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Cong Wang
0105eaabb2 vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
After commit 2c9e5d4a00 ("bpf: remove CONFIG_BPF_JIT dependency on
CONFIG_MODULES of") CONFIG_BPF_JIT does not depend on CONFIG_MODULES any
more and bpf jit also uses the [MODULES_VADDR, MODULES_END] memory region.
But is_vmalloc_or_module_addr() still checks CONFIG_MODULES, which then
returns false for a bpf jit memory region when CONFIG_MODULES is not
defined.  It leads to the following kernel BUG:

[    1.567023] ------------[ cut here ]------------
[    1.567883] kernel BUG at mm/vmalloc.c:745!
[    1.568477] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[    1.569367] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0+ #448
[    1.570247] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[    1.570786] RIP: 0010:vmalloc_to_page+0x48/0x1ec
[    1.570786] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84
[    1.570786] RSP: 0018:ffff888007787960 EFLAGS: 00010212
[    1.570786] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c
[    1.570786] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8
[    1.570786] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000
[    1.570786] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8
[    1.570786] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640
[    1.570786] FS:  0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
[    1.570786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.570786] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0
[    1.570786] Call Trace:
[    1.570786]  <TASK>
[    1.570786]  ? __die_body+0x1b/0x58
[    1.570786]  ? die+0x31/0x4b
[    1.570786]  ? do_trap+0x9d/0x138
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? do_error_trap+0xcd/0x102
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? handle_invalid_op+0x2f/0x38
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? exc_invalid_op+0x2b/0x41
[    1.570786]  ? asm_exc_invalid_op+0x16/0x20
[    1.570786]  ? vmalloc_to_page+0x26/0x1ec
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  __text_poke+0xb6/0x458
[    1.570786]  ? __pfx_text_poke_memcpy+0x10/0x10
[    1.570786]  ? __pfx___mutex_lock+0x10/0x10
[    1.570786]  ? __pfx___text_poke+0x10/0x10
[    1.570786]  ? __pfx_get_random_u32+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  text_poke_copy_locked+0x70/0x84
[    1.570786]  text_poke_copy+0x32/0x4f
[    1.570786]  bpf_arch_text_copy+0xf/0x27
[    1.570786]  bpf_jit_binary_pack_finalize+0x26/0x5a
[    1.570786]  bpf_int_jit_compile+0x576/0x8ad
[    1.570786]  ? __pfx_bpf_int_jit_compile+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? __kmalloc_node_track_caller+0x2b5/0x2e0
[    1.570786]  bpf_prog_select_runtime+0x7c/0x199
[    1.570786]  bpf_prepare_filter+0x1e9/0x25b
[    1.570786]  ? __pfx_bpf_prepare_filter+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? _find_next_bit+0x29/0x7e
[    1.570786]  bpf_prog_create+0xb8/0xe0
[    1.570786]  ptp_classifier_init+0x75/0xa1
[    1.570786]  ? __pfx_ptp_classifier_init+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? register_pernet_subsys+0x36/0x42
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  sock_init+0x99/0xa3
[    1.570786]  ? __pfx_sock_init+0x10/0x10
[    1.570786]  do_one_initcall+0x104/0x2c4
[    1.570786]  ? __pfx_do_one_initcall+0x10/0x10
[    1.570786]  ? parameq+0x25/0x2d
[    1.570786]  ? rcu_is_watching+0x1c/0x3c
[    1.570786]  ? trace_kmalloc+0x81/0xb2
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? __kmalloc+0x29c/0x2c7
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  do_initcalls+0xf9/0x123
[    1.570786]  kernel_init_freeable+0x24f/0x289
[    1.570786]  ? __pfx_kernel_init+0x10/0x10
[    1.570786]  kernel_init+0x19/0x13a
[    1.570786]  ret_from_fork+0x24/0x41
[    1.570786]  ? __pfx_kernel_init+0x10/0x10
[    1.570786]  ret_from_fork_asm+0x1a/0x30
[    1.570786]  </TASK>
[    1.570819] ---[ end trace 0000000000000000 ]---
[    1.571463] RIP: 0010:vmalloc_to_page+0x48/0x1ec
[    1.572111] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84
[    1.574632] RSP: 0018:ffff888007787960 EFLAGS: 00010212
[    1.575129] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c
[    1.576097] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8
[    1.577084] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000
[    1.578077] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8
[    1.578810] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640
[    1.579823] FS:  0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
[    1.580992] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.581869] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0
[    1.582800] Kernel panic - not syncing: Fatal exception
[    1.583765] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fix this by checking CONFIG_EXECMEM instead.

Link: https://lkml.kernel.org/r/20240528160838.102223-1-xiyou.wangcong@gmail.com
Fixes: 2c9e5d4a00 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Johannes Weiner
7cc5a5d650 mm: page_alloc: fix highatomic typing in multi-block buddies
Christoph reports a page allocator splat triggered by xfstests:

generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30
XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273
XFS (nvme0n1): Ending clean mount
XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
XFS (nvme1n1): Ending clean mount
XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9
XFS (nvme1n1): Ending clean mount
------------[ cut here ]------------
page type is 3, passed migratetype is 1 (nr=512)
WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0
Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug]
CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:expand+0x1c5/0x1f0
Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3
RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000
RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff
RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009
R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000
FS:  00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x7b/0x120
 ? expand+0x1c5/0x1f0
 ? report_bug+0x191/0x1c0
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? expand+0x1c5/0x1f0
 ? expand+0x1c5/0x1f0
 __rmqueue_pcplist+0x3a9/0x730
 get_page_from_freelist+0x7a0/0xf00
 __alloc_pages_noprof+0x153/0x2e0
 __folio_alloc_noprof+0x10/0xa0
 __filemap_get_folio+0x16b/0x370
 iomap_write_begin+0x496/0x680

While trying to service a movable allocation (page type 1), the page
allocator runs into a two-pageblock buddy on the movable freelist whose
second block is typed as highatomic (page type 3).

This inconsistency is caused by the highatomic reservation system
operating on single pageblocks, while MAX_ORDER can be bigger than that -
in this configuration, pageblock_order is 9 while MAX_PAGE_ORDER is 10. 
The test case is observed to make several adjacent order-3 requests with
__GFP_DIRECT_RECLAIM cleared, which marks the surrounding block as
highatomic.  Upon freeing, the blocks merge into an order-10 buddy.  When
the highatomic pool is drained later on, this order-10 buddy gets moved
back to the movable list, but only the first pageblock is marked movable
again.  A subsequent expand() of this buddy warns about the tail being of
a different type.

This is a long-standing bug that's surfaced by the recent block type
warnings added to the allocator.  The consequences seem mostly benign, it
just results in odd behavior: the highatomic tail blocks are not properly
drained, instead they end up on the movable list first, then go back to
the highatomic list after an alloc-free cycle.

To fix this, make the highatomic reservation code aware that
allocations/buddies can be larger than a pageblock.

While it's an old quirk, the recently added type consistency warnings seem
to be the most prominent consequence of it.  Set the Fixes: tag
accordingly to highlight this backporting dependency.

Link: https://lkml.kernel.org/r/20240530114203.GA1222079@cmpxchg.org
Fixes: e0932b6c1f ("mm: page_alloc: consolidate free page accounting")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Ryusuke Konishi
a4ca369ca2 nilfs2: fix potential kernel bug due to lack of writeback flag waiting
Destructive writes to a block device on which nilfs2 is mounted can cause
a kernel bug in the folio/page writeback start routine or writeback end
routine (__folio_start_writeback in the log below):

 kernel BUG at mm/page-writeback.c:3070!
 Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 ...
 RIP: 0010:__folio_start_writeback+0xbaa/0x10e0
 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff
  e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f>
  0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00
 ...
 Call Trace:
  <TASK>
  nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2]
  nilfs_segctor_construct+0x181/0x6b0 [nilfs2]
  nilfs_segctor_thread+0x548/0x11c0 [nilfs2]
  kthread+0x2f0/0x390
  ret_from_fork+0x4b/0x80
  ret_from_fork_asm+0x1a/0x30
  </TASK>

This is because when the log writer starts a writeback for segment summary
blocks or a super root block that use the backing device's page cache, it
does not wait for the ongoing folio/page writeback, resulting in an
inconsistent writeback state.

Fix this issue by waiting for ongoing writebacks when putting
folios/pages on the backing device into writeback state.

Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3 ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Sebastian Andrzej Siewior
36eef400c2 memcg: remove the lockdep assert from __mod_objcg_mlstate()
The assert was introduced in the commit cited below as an insurance that
the semantic is the same after the local_irq_save() has been removed and
the function has been made static.

The original requirement to disable interrupt was due the modification
of per-CPU counters which require interrupts to be disabled because the
counter update operation is not atomic and some of the counters are
updated from interrupt context.

All callers of __mod_objcg_mlstate() acquire a lock
(memcg_stock.stock_lock) which disables interrupts on !PREEMPT_RT and
the lockdep assert is satisfied. On PREEMPT_RT the interrupts are not
disabled and the assert triggers.

The safety of the counter update is already ensured by
VM_WARN_ON_IRQS_ENABLED() which is part of __mod_memcg_lruvec_state() and
does not require yet another check.

Remove the lockdep assert from __mod_objcg_mlstate().

Link: https://lkml.kernel.org/r/20240528141341.rz_rytN_@linutronix.de
Fixes: 91882c1617 ("memcg: simple cleanup of stats update functions")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Barry Song
6434e69814 mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
We are passing a huge nr to __clear_young_dirty_ptes() right now.  While
we should pass the number of pages, we are actually passing CONT_PTE_SIZE.
This is causing lots of crashes of MADV_FREE, panic oops could vary
everytime.

Link: https://lkml.kernel.org/r/20240524005444.135417-1-21cnbao@gmail.com
Fixes: 89e86854fb ("mm/arm64: override clear_young_dirty_ptes() batch helper")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Lance Yang <ioworker0@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Jeff Xie <xiehuan09@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Barry Song
94d46bf179 mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
if CONFIG_SYSFS is not enabled in config, we get the below error,

All errors (new ones prefixed by >>):

   s390-linux-ld: mm/memory.o: in function `count_mthp_stat':
>> include/linux/huge_mm.h:285:(.text+0x191c): undefined reference to `mthp_stats'
   s390-linux-ld: mm/huge_memory.o:(.rodata+0x10): undefined reference to `mthp_stats'

vim +285 include/linux/huge_mm.h

   279
   280  static inline void count_mthp_stat(int order, enum mthp_stat_item item)
   281  {
   282          if (order <= 0 || order > PMD_ORDER)
   283                  return;
   284
 > 285          this_cpu_inc(mthp_stats.stats[order][item]);
   286  }
   287

Link: https://lkml.kernel.org/r/20240523210045.40444-1-21cnbao@gmail.com
Fixes: ec33687c67 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405231728.tCAogiSI-lkp@intel.com/
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Baolin Wang
0d648dd5c8 mm: drop the 'anon_' prefix for swap-out mTHP counters
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback'
are confusing with an 'anon_' prefix, since the shmem can swap out
non-anonymous pages.  So drop the 'anon_' prefix to keep consistent with
the old swap counter names.

This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the
field.

Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com
Fixes: d0f048ac39 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
Fixes: 42248b9d34 ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:23 -07:00
Ian Forbes
7ef91dcba1 drm/vmwgfx: Don't destroy Screen Target when CRTC is enabled but inactive
drm_crtc_helper_funcs::atomic_disable can be called even when the CRTC is
still enabled. This can occur when the mode changes or the CRTC is set as
inactive.

In the case where the CRTC is being set as inactive we only want to
blank the screen. The Screen Target should remain intact as long as the
mode has not changed and CRTC is enabled.

This fixes a bug with GDM where locking the screen results in a permanent
black screen because the Screen Target is no longer defined.

Fixes: 7b0062036c ("drm/vmwgfx: Implement virtual crc generation")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240531203358.26677-1-ian.forbes@broadcom.com
2024-06-05 22:11:42 -04:00
Ian Forbes
a54a200f3d drm/vmwgfx: Standardize use of kibibytes when logging
Use the same standard abbreviation KiB instead of incorrect variants.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-5-ian.forbes@broadcom.com
2024-06-05 22:10:49 -04:00
Ian Forbes
dde1de06bd drm/vmwgfx: Remove STDU logic from generic mode_valid function
STDU has its own mode_valid function now so this logic can be removed from
the generic version.

Fixes: 935f795045 ("drm/vmwgfx: Refactor drm connector probing for display modes")

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-4-ian.forbes@broadcom.com
2024-06-05 22:10:48 -04:00
Ian Forbes
fb5e19d2dd drm/vmwgfx: 3D disabled should not effect STDU memory limits
This limit became a hard cap starting with the change referenced below.
Surface creation on the device will fail if the requested size is larger
than this limit so altering the value arbitrarily will expose modes that
are too large for the device's hard limits.

Fixes: 7ebb47c9f9 ("drm/vmwgfx: Read new register for GB memory when available")

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbes@broadcom.com
2024-06-05 22:10:47 -04:00
Ian Forbes
4268269331 drm/vmwgfx: Filter modes which exceed graphics memory
SVGA requires individual surfaces to fit within graphics memory
(max_mob_pages) which means that modes with a final buffer size that would
exceed graphics memory must be pruned otherwise creation will fail.

Additionally llvmpipe requires its buffer height and width to be a multiple
of its tile size which is 64. As a result we have to anticipate that
llvmpipe will round up the mode size passed to it by the compositor when
it creates buffers and filter modes where this rounding exceeds graphics
memory.

This fixes an issue where VMs with low graphics memory (< 64MiB) configured
with high resolution mode boot to a black screen because surface creation
fails.

Fixes: d947d1b71d ("drm/vmwgfx: Add and connect connector helper function")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-2-ian.forbes@broadcom.com
2024-06-05 22:10:46 -04:00
Jakub Kicinski
886bf9172d Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2024-06-05

We've added 8 non-merge commits during the last 6 day(s) which contain
a total of 9 files changed, 34 insertions(+), 35 deletions(-).

The main changes are:

1) Fix a potential use-after-free in bpf_link_free when the link uses
   dealloc_deferred to free the link object but later still tests for
   presence of link->ops->dealloc, from Cong Wang.

2) Fix BPF test infra to set the run context for rawtp test_run callback
   where syzbot reported a crash, from Jiri Olsa.

3) Fix bpf_session_cookie BTF_ID in the special_kfunc_set list to exclude
   it for the case of !CONFIG_FPROBE, also from Jiri Olsa.

4) Fix a Coverity static analysis report to not close() a link_fd of -1
   in the multi-uprobe feature detector, from Andrii Nakryiko.

5) Revert support for redirect to any xsk socket bound to the same umem
   as it can result in corrupted ring state which can lead to a crash when
   flushing rings. A different approach will be pursued for bpf-next to
   address it safely, from Magnus Karlsson.

6) Fix inet_csk_accept prototype in test_sk_storage_tracing.c which caused
   BPF CI failure after the last tree fast forwarding, from Andrii Nakryiko.

7) Fix a coccicheck warning in BPF devmap that iterator variable cannot
   be NULL, from Thorsten Blum.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  Revert "xsk: Document ability to redirect to any socket bound to the same umem"
  Revert "xsk: Support redirect to any socket bound to the same umem"
  bpf: Set run context for rawtp test_run callback
  bpf: Fix a potential use-after-free in bpf_link_free()
  bpf, devmap: Remove unnecessary if check in for loop
  libbpf: don't close(-1) in multi-uprobe feature detector
  bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list
  selftests/bpf: fix inet_csk_accept prototype in test_sk_storage_tracing.c
====================

Link: https://lore.kernel.org/r/20240605091525.22628-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:03:08 -07:00
Dave Airlie
1cfa043fc0 Merge tag 'drm-xe-fixes-2024-06-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- drm/xe/pf: Update the LMTT when freeing VF GT config

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zl8uFrQp0YjTtX4p@fedora
2024-06-06 11:38:51 +10:00
Breno Leitao
4254dfeda8 scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory
There is a potential out-of-bounds access when using test_bit() on a single
word. The test_bit() and set_bit() functions operate on long values, and
when testing or setting a single word, they can exceed the word
boundary. KASAN detects this issue and produces a dump:

	 BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas

	 Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965

For full log, please look at [1].

Make the allocation at least the size of sizeof(unsigned long) so that
set_bit() and test_bit() have sufficient room for read/write operations
without overwriting unallocated memory.

[1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/

Fixes: c696f7b83e ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path")
Cc: stable@vger.kernel.org
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240605085530.499432-1-leitao@debian.org
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-05 20:15:41 -04:00
Martin K. Petersen
7926d51f73 scsi: sd: Use READ(16) when reading block zero on large capacity disks
Commit 321da3dc1f ("scsi: sd: usb_storage: uas: Access media prior
to querying device properties") triggered a read to LBA 0 before
attempting to inquire about device characteristics. This was done
because some protocol bridge devices will return generic values until
an attached storage device's media has been accessed.

Pierre Tomon reported that this change caused problems on a large
capacity external drive connected via a bridge device. The bridge in
question does not appear to implement the READ(10) command.

Issue a READ(16) instead of READ(10) when a device has been identified
as preferring 16-byte commands (use_16_for_rw heuristic).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218890
Link: https://lore.kernel.org/r/70dd7ae0-b6b1-48e1-bb59-53b7c7f18274@rowland.harvard.edu
Link: https://lore.kernel.org/r/20240605022521.3960956-1-martin.petersen@oracle.com
Fixes: 321da3dc1f ("scsi: sd: usb_storage: uas: Access media prior to querying device properties")
Cc: stable@vger.kernel.org
Reported-by: Pierre Tomon <pierretom+12@ik.me>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Pierre Tomon <pierretom+12@ik.me>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-05 20:08:06 -04:00
Karol Kolacinski
323a359f9b ptp: Fix error message on failed pin verification
On failed verification of PTP clock pin, error message prints channel
number instead of pin index after "pin", which is incorrect.

Fix error message by adding channel number to the message and printing
pin number instead of channel number.

Fixes: 6092315dfd ("ptp: introduce programmable pins.")
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 16:01:16 -07:00
Eric Dumazet
f921a58ae2 net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP
If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided,
taprio_parse_mqprio_opt() must validate it, or userspace
can inject arbitrary data to the kernel, the second time
taprio_change() is called.

First call (with valid attributes) sets dev->num_tc
to a non zero value.

Second call (with arbitrary mqprio attributes)
returns early from taprio_parse_mqprio_opt()
and bad things can happen.

Fixes: a3d43c0d56 ("taprio: Add support adding an admin schedule")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 15:54:51 -07:00
Linus Torvalds
2df0193e62 Merge tag 'thermal-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
 "Fix issues related to the handling of invalid trip points in the
  thermal core and in the thermal debug code that have been overlooked
  by some recent thermal control core changes"

* tag 'thermal-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid
  thermal: core: Introduce thermal_trip_crossed()
  thermal/debugfs: Allow tze_seq_show() to print statistics for invalid trips
  thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()
2024-06-05 15:28:20 -07:00
Linus Torvalds
553352597d Merge tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix the ACPI EC and AC drivers, the ACPI APEI error injection
  driver and build issues related to the dev_is_pnp() macro referring to
  pnp_bus_type that is not exported to modules.

  Specifics:

   - Fix error handling during EC operation region accesses in the ACPI
     EC driver (Armin Wolf)

   - Fix a memory leak in the APEI error injection driver introduced
     during its converion to a platform driver (Dan Williams)

   - Fix build failures related to the dev_is_pnp() macro by redefining
     it as a proper function and exporting it to modules as appropriate
     and unexport pnp_bus_type which need not be exported any more (Andy
     Shevchenko)

   - Update the ACPI AC driver to use power_supply_changed() to let the
     power supply core handle configuration changes properly (Thomas
     Weißschuh)"

* tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: AC: Properly notify powermanagement core about changes
  PNP: Hide pnp_bus_type from the non-PNP code
  PNP: Make dev_is_pnp() to be a function and export it for modules
  ACPI: EC: Avoid returning AE_OK on errors in address space handler
  ACPI: EC: Abort address space access upon error
  ACPI: APEI: EINJ: Fix einj_dev release leak
2024-06-05 15:19:15 -07:00
Linus Torvalds
64c6a36d79 Merge tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix the intel_pstate and amd-pstate cpufreq drivers and the
  cpupower utility.

  Specifics:

   - Fix a recently introduced unchecked HWP MSR access in the
     intel_pstate driver (Srinivas Pandruvada)

   - Add missing conversion from MHz to KHz to amd_pstate_set_boost() to
     address sysfs inteface inconsistency and fix P-state frequency
     reporting on AMD Family 1Ah CPUs in the cpupower utility (Dhananjay
     Ugwekar)

   - Get rid of an excess global header file used by the amd-pstate
     cpufreq driver (Arnd Bergmann)"

* tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Fix unchecked HWP MSR access
  cpufreq: amd-pstate: Fix the inconsistency in max frequency units
  cpufreq: amd-pstate: remove global header file
  tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
2024-06-05 15:12:35 -07:00
Aleksandr Mishin
229bedbf62 net/mlx5: Fix tainted pointer delete is case of flow rules creation fail
In case of flow rule creation fail in mlx5_lag_create_port_sel_table(),
instead of previously created rules, the tainted pointer is deleted
deveral times.
Fix this bug by using correct flow rules pointers.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 352899f384 ("net/mlx5: Lag, use buckets in hash mode")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 12:51:25 -07:00
Yazen Ghannam
c625dabbf1 x86/amd_nb: Check for invalid SMN reads
AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.

SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated.

However, the PCI config access may succeed, but the return value may be
invalid. This is in similar fashion to PCI bad reads, i.e. return all
bits set.

Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.

However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.

Check for this condition, clear the return value, and set a proper error
code.

Fixes: ddfe43cdc0 ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com
2024-06-05 21:23:34 +02:00
Linus Torvalds
19ca0d8a43 Merge tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
 "A fix for fast fsync that needs to handle errors during writes after
  some COW failure so it does not lead to an inconsistent state"

* tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ensure fast fsync waits for ordered extents after a write failure
2024-06-05 11:28:25 -07:00
Linus Torvalds
e20b269d73 Merge tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
 "Just a few small fixes"

* tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix trans->locked assert
  bcachefs: Rereplicate now moves data off of durability=0 devices
  bcachefs: Fix GFP_KERNEL allocation in break_cycle()
2024-06-05 11:25:41 -07:00
Jens Axboe
27d024235b Merge tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme into block-6.10
Pull NVMe fixes from Keith:

"nvme fixes Linux 6.10

 - Use reserved tags for special fabrics operations (Chunguang)
 - Persistent Reservation status masking fix (Weiwen)"

* tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme:
  nvme: fix nvme_pr_* status code parsing
  nvme-fabrics: use reserved tag for reg read/write command
2024-06-05 12:13:00 -06:00
Andreas Hindborg
c462ecd659 null_blk: fix validation of block size
Block size should be between 512 and PAGE_SIZE and be a power of 2. The current
check does not validate this, so update the check.

Without this patch, null_blk would Oops due to a null pointer deref when
loaded with bs=1536 [1].

Link: https://lore.kernel.org/all/87wmn8mocd.fsf@metaspace.dk/

Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240603192645.977968-1-nmi@metaspace.dk
[axboe: remove unnecessary braces and != 0 check]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-05 12:12:54 -06:00
Tasos Sahanidis
c6c4dd5401 drm/amdgpu/pptable: Fix UBSAN array-index-out-of-bounds
Flexible arrays used [1] instead of []. Replace the former with the latter
to resolve multiple UBSAN warnings observed on boot with a BONAIRE card.

In addition, use the __counted_by attribute where possible to hint the
length of the arrays to the compiler and any sanitizers.

Signed-off-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 13:43:34 -04:00
Mario Limonciello
267cace556 drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platforms
commit cd94d1b182 ("dm/amd/pm: Fix problems with reboot/shutdown for
some SMU 13.0.4/13.0.11 users") attempted to fix shutdown issues
that were reported since commit 31729e8c21 ("drm/amd/pm: fixes a
random hang in S4 for SMU v13.0.4/11") but caused issues for some
people.

Adjust the workaround flow to properly only apply in the S4 case:
-> For shutdown go through SMU_MSG_PrepareMp1ForUnload
-> For S4 go through SMU_MSG_GfxDeviceDriverReset and
   SMU_MSG_PrepareMp1ForUnload

Reported-and-tested-by: lectrode <electrodexsnet@gmail.com>
Closes: https://github.com/void-linux/void-packages/issues/50417
Cc: stable@vger.kernel.org
Fixes: cd94d1b182 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users")
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 13:41:56 -04:00
Linus Torvalds
558dc49aac Merge tag 'i2c-for-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "This should have been my second pull request during the merge window
  but one dependency in the drm subsystem fell through the cracks and
  was only applied for rc2.

  Now we can finally remove I2C_CLASS_SPD"

* tag 'i2c-for-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: Remove I2C_CLASS_SPD
  i2c: synquacer: Remove a clk reference from struct synquacer_i2c
2024-06-05 10:32:20 -07:00
Linus Torvalds
208d9b65c0 Merge tag 'tpmdd-next-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
 "The bug fix for tpm_tis_core_init() is not that critical but still
  makes sense to get into release for the sake of better quality.

  I included the Intel CPU model define change mainly to help Tony just
  a bit, as for this subsystem it cannot realistically speaking cause
  any possible harm"

* tag 'tpmdd-next-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Switch to new Intel CPU model defines
  tpm_tis: Do *not* flush uninitialized work
2024-06-05 10:29:13 -07:00
Filipe Manana
fb33eb2ef0 btrfs: fix leak of qgroup extent records after transaction abort
Qgroup extent records are created when delayed ref heads are created and
then released after accounting extents at btrfs_qgroup_account_extents(),
called during the transaction commit path.

If a transaction is aborted we free the qgroup records by calling
btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
unless we don't have delayed references. We are incorrectly assuming
that no delayed references means we don't have qgroup extents records.

We can currently have no delayed references because we ran them all
during a transaction commit and the transaction was aborted after that
due to some error in the commit path.

So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
btrfs_destroy_delayed_refs() even if we don't have any delayed references.

Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
Fixes: 81f7eb00ff ("btrfs: destroy qgroup extent records on transaction abort")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05 18:06:54 +02:00
Omar Sandoval
9d274c19a7 btrfs: fix crash on racing fsync and size-extending write into prealloc
We have been seeing crashes on duplicate keys in
btrfs_set_item_key_safe():

  BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192)
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.c:2620!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs]

With the following stack trace:

  #0  btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4)
  #1  btrfs_drop_extents (fs/btrfs/file.c:411:4)
  #2  log_one_extent (fs/btrfs/tree-log.c:4732:9)
  #3  btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9)
  #4  btrfs_log_inode (fs/btrfs/tree-log.c:6626:9)
  #5  btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8)
  #6  btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8)
  #7  btrfs_sync_file (fs/btrfs/file.c:1933:8)
  #8  vfs_fsync_range (fs/sync.c:188:9)
  #9  vfs_fsync (fs/sync.c:202:9)
  #10 do_fsync (fs/sync.c:212:9)
  #11 __do_sys_fdatasync (fs/sync.c:225:9)
  #12 __se_sys_fdatasync (fs/sync.c:223:1)
  #13 __x64_sys_fdatasync (fs/sync.c:223:1)
  #14 do_syscall_x64 (arch/x86/entry/common.c:52:14)
  #15 do_syscall_64 (arch/x86/entry/common.c:83:7)
  #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)

So we're logging a changed extent from fsync, which is splitting an
extent in the log tree. But this split part already exists in the tree,
triggering the BUG().

This is the state of the log tree at the time of the crash, dumped with
drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py)
to get more details than btrfs_print_leaf() gives us:

  >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"])
  leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610
  leaf 33439744 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
          item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160
                  generation 7 transid 9 size 8192 nbytes 8473563889606862198
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 204 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417704.983333333 (2024-05-22 15:41:44)
                  mtime 1716417704.983333333 (2024-05-22 15:41:44)
                  otime 17592186044416.000000000 (559444-03-08 01:40:16)
          item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13
                  index 195 namelen 3 name: 193
          item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 4096 ram 12288
                  extent compression 0 (none)
          item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 4096 nr 8192
          item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096
  ...

So the real problem happened earlier: notice that items 4 (4k-12k) and 5
(8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and
item 5 starts at i_size.

Here is the state of the filesystem tree at the time of the crash:

  >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root
  >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0))
  >>> print_extent_buffer(nodes[0])
  leaf 30425088 level 0 items 184 generation 9 owner 5
  leaf 30425088 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
  	...
          item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160
                  generation 7 transid 7 size 4096 nbytes 12288
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 6 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417703.220000000 (2024-05-22 15:41:43)
                  mtime 1716417703.220000000 (2024-05-22 15:41:43)
                  otime 1716417703.220000000 (2024-05-22 15:41:43)
          item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13
                  index 195 namelen 3 name: 193
          item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 8192 ram 12288
                  extent compression 0 (none)
          item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096

Item 5 in the log tree corresponds to item 183 in the filesystem tree,
but nothing matches item 4. Furthermore, item 183 is the last item in
the leaf.

btrfs_log_prealloc_extents() is responsible for logging prealloc extents
beyond i_size. It first truncates any previously logged prealloc extents
that start beyond i_size. Then, it walks the filesystem tree and copies
the prealloc extent items to the log tree.

If it hits the end of a leaf, then it calls btrfs_next_leaf(), which
unlocks the tree and does another search. However, while the filesystem
tree is unlocked, an ordered extent completion may modify the tree. In
particular, it may insert an extent item that overlaps with an extent
item that was already copied to the log tree.

This may manifest in several ways depending on the exact scenario,
including an EEXIST error that is silently translated to a full sync,
overlapping items in the log tree, or this crash. This particular crash
is triggered by the following sequence of events:

- Initially, the file has i_size=4k, a regular extent from 0-4k, and a
  prealloc extent beyond i_size from 4k-12k. The prealloc extent item is
  the last item in its B-tree leaf.
- The file is fsync'd, which copies its inode item and both extent items
  to the log tree.
- An xattr is set on the file, which sets the
  BTRFS_INODE_COPY_EVERYTHING flag.
- The range 4k-8k in the file is written using direct I/O. i_size is
  extended to 8k, but the ordered extent is still in flight.
- The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this
  calls copy_inode_items_to_log(), which calls
  btrfs_log_prealloc_extents().
- btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the
  filesystem tree. Since it starts before i_size, it skips it. Since it
  is the last item in its B-tree leaf, it calls btrfs_next_leaf().
- btrfs_next_leaf() unlocks the path.
- The ordered extent completion runs, which converts the 4k-8k part of
  the prealloc extent to written and inserts the remaining prealloc part
  from 8k-12k.
- btrfs_next_leaf() does a search and finds the new prealloc extent
  8k-12k.
- btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into
  the log tree. Note that it overlaps with the 4k-12k prealloc extent
  that was copied to the log tree by the first fsync.
- fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k
  extent that was written.
- This tries to drop the range 4k-8k in the log tree, which requires
  adjusting the start of the 4k-12k prealloc extent in the log tree to
  8k.
- btrfs_set_item_key_safe() sees that there is already an extent
  starting at 8k in the log tree and calls BUG().

Fix this by detecting when we're about to insert an overlapping file
extent item in the log tree and truncating the part that would overlap.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05 18:06:30 +02:00
Linus Torvalds
71d7b52cc3 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "This is dominated by a couple large series for ARM and x86
  respectively, but apart from that things are calm.

  ARM:

   - Large set of FP/SVE fixes for pKVM, addressing the fallout from the
     per-CPU data rework and making sure that the host is not involved
     in the FP/SVE switching any more

   - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is
     completely supported

   - Fix for the respective priorities of Failed PAC, Illegal Execution
     state and Instruction Abort exceptions

   - Fix the handling of AArch32 instruction traps failing their
     condition code, which was broken by the introduction of
     ESR_EL2.ISS2

   - Allow vcpus running in AArch32 state to be restored in System mode

   - Fix AArch32 GPR restore that would lose the 64 bit state under some
     conditions

  RISC-V:

   - No need to use mask when hart-index-bits is 0

   - Fix incorrect reg_subtype labels in
     kvm_riscv_vcpu_set_reg_isa_ext()

  x86:

   - Fixes and debugging help for the #VE sanity check.

     Also disable it by default, even for CONFIG_DEBUG_KERNEL, because
     it was found to trigger spuriously (most likely a processor erratum
     as the exact symptoms vary by generation).

   - Avoid WARN() when two NMIs arrive simultaneously during an
     NMI-disabled situation (GIF=0 or interrupt shadow) when the
     processor supports virtual NMI.

     While generally KVM will not request an NMI window when virtual
     NMIs are supported, in this case it *does* have to single-step over
     the interrupt shadow or enable the STGI intercept, in order to
     deliver the latched second NMI.

   - Drop support for hand tuning APIC timer advancement from userspace.

     Since we have adaptive tuning, and it has proved to work well, drop
     the module parameter for manual configuration and with it a few
     stupid bugs that it had"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (32 commits)
  KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr
  KVM: arm64: Ensure that SME controls are disabled in protected mode
  KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format
  KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM
  KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM
  KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM
  KVM: arm64: Specialize handling of host fpsimd state on trap
  KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper
  KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state
  KVM: arm64: Reintroduce __sve_save_state
  KVM: x86: Drop support for hand tuning APIC timer advancement from userspace
  KVM: SEV-ES: Delegate LBR virtualization to the processor
  KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent
  KVM: SEV-ES: Prevent MSR access post VMSA encryption
  RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function
  RISC-V: KVM: No need to use mask when hart-index-bit is 0
  KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor
  KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx
  KVM: arm64: AArch32: Fix spurious trapping of conditional instructions
  KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
  ...
2024-06-05 08:43:41 -07:00
Ritesh Harjani (IBM)
f5ceb1bbc9 iomap: Fix iomap_adjust_read_range for plen calculation
If the extent spans the block that contains i_size, we need to handle
both halves separately so that we properly zero data in the page cache
for blocks that are entirely outside of i_size. But this is needed only
when i_size is within the current folio under processing.
"orig_pos + length > isize" can be true for all folios if the mapped
extent length is greater than the folio size. That is making plen to
break for every folio instead of only the last folio.

So use orig_plen for checking if "orig_pos + orig_plen > isize".

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/a32e5f9a4fcfdb99077300c4020ed7ae61d6e0f9.1715067055.git.ritesh.list@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-05 17:27:03 +02:00
Zhang Yi
0841ea4a3b iomap: keep on increasing i_size in iomap_write_end()
Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation")' breaks xfs with realtime device on generic/561, the problem
is when unaligned truncate down a xfs realtime inode with rtextsize > 1
fs block, xfs only zero out the EOF block but doesn't zero out the tail
blocks that aligned to rtextsize, so if we don't increase i_size in
iomap_write_end(), it could expose stale data after we do an append
write beyond the aligned EOF block.

xfs should zero out the tail blocks when truncate down, but before we
finish that, let's fix the issue by just revert the changes in
iomap_write_end().

Fixes: 943bc0882c ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240603112222.2109341-1-yi.zhang@huaweicloud.com
Tested-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-05 17:23:39 +02:00
Rafael J. Wysocki
9b7e7ff0fe Merge branch 'pm-cpufreq'
Merge cpufreq fixes for 6.10-rc3:

 - Fix a recently introduced unchecked HWP MSR access in the
   intel_pstate driver (Srinivas Pandruvada).

 - Add missing conversion from MHz to KHz to amd_pstate_set_boost()
   to address sysfs inteface inconsistency (Dhananjay Ugwekar).

 - Get rid of an excess global header file used by the amd-pstate
   cpufreq driver (Arnd Bergmann).

* pm-cpufreq:
  cpufreq: intel_pstate: Fix unchecked HWP MSR access
  cpufreq: amd-pstate: Fix the inconsistency in max frequency units
  cpufreq: amd-pstate: remove global header file
2024-06-05 17:11:47 +02:00
David Hildenbrand
01c51a32dc KVM: s390x: selftests: Add shared zeropage test
Let's test that we can have shared zeropages in our process as long as
storage keys are not getting used, that shared zeropages are properly
unshared (replaced by anonymous pages) once storage keys are enabled,
and that no new shared zeropages are populated after storage keys
were enabled.

We require the new pagemap interface to detect the shared zeropage.

On an old kernel (zeropages always disabled):
	# ./s390x/shared_zeropage_test
	TAP version 13
	1..3
	not ok 1 Shared zeropages should be enabled
	ok 2 Shared zeropage should be gone
	ok 3 Shared zeropages should be disabled
	# Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0

On a fixed kernel:
	# ./s390x/shared_zeropage_test
	TAP version 13
	1..3
	ok 1 Shared zeropages should be enabled
	ok 2 Shared zeropage should be gone
	ok 3 Shared zeropages should be disabled
	# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Testing of UFFDIO_ZEROPAGE can be added later.

[ agordeev: Fixed checkpatch complaint, added ucall_common.h include ]

Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20240412084329.30315-1-david@redhat.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05 17:03:24 +02:00
Alexander Gordeev
d38e48563c s390/crash: Do not use VM info if os_info does not have it
The virtual memory information stored in os_info area is
required for creation of the kernel image PT_LOAD program
header for kernels since commit a2ec5bec56dd ("s390/mm:
uncouple physical vs virtual address spaces").

By contrast, if such information in os_info is absent the
PT_LOAD program header should not be created.

Currently the proper PT_LOAD program header is created for
kernels that contain the virtual memory information, but
for kernels without one an invalid header of zero size is
created. That in turn leads to stand-alone dump failures.

Use OS_INFO_KASLR_OFFSET variable to check whether os_info
is present or not (same as crash and makedumpfile tools do)
and based on that create or do not create the kernel image
PT_LOAD program header.

Fixes: f4cac27dc0 ("s390/crash: Use old os_info to create PT_LOAD headers")
Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05 17:03:24 +02:00
Rafael J. Wysocki
1bfc0835d4 Merge branches 'acpi-ec', 'acpi-apei' and 'pnp'
Merge ACPI EC driver fixes, an ACPI APEI fix and PNP fixes for
6.10-rc3:

 - Fix error handling during EC operation region accesses in the ACPI EC
   driver (Armin Wolf).

 - Fix a memory leak in the APEI error injection driver introduced
   during its converion to a platform driver (Dan Williams).

 - Fix build failures related to the dev_is_pnp() macro by redefining it
   as a proper function and exporting it to modules as appropriate and
   unexport pnp_bus_type which need not be exported any more (Andy
   Shevchenko).

* acpi-ec:
  ACPI: EC: Avoid returning AE_OK on errors in address space handler
  ACPI: EC: Abort address space access upon error

* acpi-apei:
  ACPI: APEI: EINJ: Fix einj_dev release leak

* pnp:
  PNP: Hide pnp_bus_type from the non-PNP code
  PNP: Make dev_is_pnp() to be a function and export it for modules
2024-06-05 16:58:09 +02:00
Kent Overstreet
319fef29e9 bcachefs: Fix trans->locked assert
in bch2_move_data_btree, we might start with the trans unlocked from a
previous loop iteration - we need a trans_begin() before iter_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-05 10:44:08 -04:00
Kent Overstreet
fdccb24352 bcachefs: Rereplicate now moves data off of durability=0 devices
This fixes an issue where setting a device to durability=0 after it's
been used makes it impossible to remove.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-05 10:44:08 -04:00
Kent Overstreet
9a64e1bfd8 bcachefs: Fix GFP_KERNEL allocation in break_cycle()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-05 10:44:08 -04:00
Namhyung Kim
ca9680821d perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build
Ingo reported that he was seeing these when hitting Control+C during a
perf tools build:

  Makefile.perf:1149: *** Missing bpftool input for generating vmlinux.h. Stop.

The failure happens when you don't have vmlinux.h or vmlinux with BTF.

ifeq ($(VMLINUX_H),)
  ifeq ($(VMLINUX_BTF),)
    $(error Missing bpftool input for generating vmlinux.h)
  endif
endif

VMLINUX_BTF can be empty if you didn't build a kernel or it doesn't have
a BTF section and the current kernel also has no BTF.  This is totally
ok.

But VMLINUX_H should be set to the minimal version in the source tree
(unless you overwrite it manually) when you don't pass GEN_VMLINUX_H=1
(which requires VMLINUX_BTF should not be empty).  The problem is that
it's defined in Makefile.config which is not included for `make clean`.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lore.kernel.org/lkml/CAM9d7ch5HTr+k+_GpbMrX0HUo5BZ11byh1xq0Two7B7RQACuNw@mail.gmail.com
Link: http://lore.kernel.org/lkml/ZjssGrj+abyC6mYP@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-05 11:33:00 -03:00
Arnaldo Carvalho de Melo
5b3cde1988 Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event"
This reverts commit 7d1405c71d.

This causes segfaults in some cases, as reported by Milian:

  ```
  sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e
  raw_syscalls:sys_enter ls
  ...
  [ perf record: Woken up 3 times to write data ]
  malloc(): invalid next size (unsorted)
  Aborted
  ```

  Backtrace with GDB + debuginfod:

  ```
  malloc(): invalid next size (unsorted)

  Thread 1 "perf" received signal SIGABRT, Aborted.
  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
  no_tid=no_tid@entry=0) at pthread_kill.c:44
  Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
  44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO
  (ret) : 0;
  (gdb) bt
  #0  __pthread_kill_implementation (threadid=<optimized out>,
  signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  #1  0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>,
  signo=6) at pthread_kill.c:78
  #2  0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/
  raise.c:26
  #3  0x00007ffff6e384c3 in __GI_abort () at abort.c:79
  #4  0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea
  "%s\n") at ../sysdeps/posix/libc_fatal.c:132
  #5  0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850
  "malloc(): invalid next size (unsorted)") at malloc.c:5772
  #6  0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0
  <main_arena>, bytes=bytes@entry=368) at malloc.c:4081
  #7  0x00007ffff6eb877e in __libc_calloc (n=<optimized out>,
  elem_size=<optimized out>) at malloc.c:3754
  #8  0x000055555569bdb6 in perf_session.do_write_header ()
  #9  0x00005555555a373a in __cmd_record.constprop.0 ()
  #10 0x00005555555a6846 in cmd_record ()
  #11 0x000055555564db7f in run_builtin ()
  #12 0x000055555558ed77 in main ()
  ```

  Valgrind memcheck:
  ```
  ==45136== Invalid write of size 8
  ==45136==    at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf)
  ==45136==    by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf)
  ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
  ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
  ==45136==    by 0x142D76: main (in /usr/bin/perf)
  ==45136==  Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
  ==45136==    at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
  ==45136==    by 0x3574AB: zalloc (in /usr/bin/perf)
  ==45136==    by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
  ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
  ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
  ==45136==    by 0x142D76: main (in /usr/bin/perf)
  ==45136==
  ==45136== Syscall param write(buf) points to unaddressable byte(s)
  ==45136==    at 0x575953D: __libc_write (write.c:26)
  ==45136==    by 0x575953D: write (write.c:24)
  ==45136==    by 0x35761F: ion (in /usr/bin/perf)
  ==45136==    by 0x357778: writen (in /usr/bin/perf)
  ==45136==    by 0x1548F7: record__write (in /usr/bin/perf)
  ==45136==    by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf)
  ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
  ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
  ==45136==    by 0x142D76: main (in /usr/bin/perf)
  ==45136==  Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
  ==45136==    at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
  ==45136==    by 0x3574AB: zalloc (in /usr/bin/perf)
  ==45136==    by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
  ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
  ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
  ==45136==    by 0x142D76: main (in /usr/bin/perf)
  ==45136==
 -----

Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/
Reported-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@kernel.org # 6.8+
Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-05 11:12:36 -03:00
Carlos Llamas
f92a59f6d1 locking/atomic: scripts: fix ${atomic}_sub_and_test() kerneldoc
For ${atomic}_sub_and_test() the @i parameter is the value to subtract,
not add. Fix the typo in the kerneldoc template and generate the headers
with this update.

Fixes: ad8110706f ("locking/atomic: scripts: generate kerneldoc comments")
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240515133844.3502360-1-cmllamas@google.com
2024-06-05 15:52:34 +02:00
Haifeng Xu
74751ef5c1 perf/core: Fix missing wakeup when waiting for context reference
In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:

[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae

The task was waiting for the refcount become to 1, but from the vmcore,
we found the refcount has already been 1. It seems that the task didn't
get woken up by perf_event_release_kernel() and got stuck forever. The
below scenario may cause the problem.

Thread A					Thread B
...						...
perf_event_free_task				perf_event_release_kernel
						   ...
						   acquire event->child_mutex
						   ...
						   get_ctx
   ...						   release event->child_mutex
   acquire ctx->mutex
   ...
   perf_free_event (acquire/release event->child_mutex)
   ...
   release ctx->mutex
   wait_var_event
						   acquire ctx->mutex
						   acquire event->child_mutex
						   # move existing events to free_list
						   release event->child_mutex
						   release ctx->mutex
						   put_ctx
...						...

In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.

Fixes: 1cf8dfe8a6 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com
2024-06-05 15:52:33 +02:00
David S. Miller
f8f0de9d58 Merge branch 'mlx5-fixes'
Tariq Toukan says:

====================
mlx5 core fixes 20240603

This small patchset provides two bug fixes from the team to the mlx5 core driver.

Series generated against:
commit 33700a0c9b ("net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 14:07:17 +01:00
Shay Drory
c8b3f38d2d net/mlx5: Always stop health timer during driver removal
Currently, if teardown_hca fails to execute during driver removal, mlx5
does not stop the health timer. Afterwards, mlx5 continue with driver
teardown. This may lead to a UAF bug, which results in page fault
Oops[1], since the health timer invokes after resources were freed.

Hence, stop the health monitor even if teardown_hca fails.

[1]
mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: cleanup
mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup
BUG: unable to handle page fault for address: ffffa26487064230
PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE     -------  ---  6.7.0-68.fc38.x86_64 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
RIP: 0010:ioread32be+0x34/0x60
RSP: 0018:ffffa26480003e58 EFLAGS: 00010292
RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0
RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230
RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8
R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0
R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0
FS:  0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? __die+0x23/0x70
 ? page_fault_oops+0x171/0x4e0
 ? exc_page_fault+0x175/0x180
 ? asm_exc_page_fault+0x26/0x30
 ? __pfx_poll_health+0x10/0x10 [mlx5_core]
 ? __pfx_poll_health+0x10/0x10 [mlx5_core]
 ? ioread32be+0x34/0x60
 mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core]
 ? __pfx_poll_health+0x10/0x10 [mlx5_core]
 poll_health+0x42/0x230 [mlx5_core]
 ? __next_timer_interrupt+0xbc/0x110
 ? __pfx_poll_health+0x10/0x10 [mlx5_core]
 call_timer_fn+0x21/0x130
 ? __pfx_poll_health+0x10/0x10 [mlx5_core]
 __run_timers+0x222/0x2c0
 run_timer_softirq+0x1d/0x40
 __do_softirq+0xc9/0x2c8
 __irq_exit_rcu+0xa6/0xc0
 sysvec_apic_timer_interrupt+0x72/0x90
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:cpuidle_enter_state+0xcc/0x440
 ? cpuidle_enter_state+0xbd/0x440
 cpuidle_enter+0x2d/0x40
 do_idle+0x20d/0x270
 cpu_startup_entry+0x2a/0x30
 rest_init+0xd0/0xd0
 arch_call_rest_init+0xe/0x30
 start_kernel+0x709/0xa90
 x86_64_start_reservations+0x18/0x30
 x86_64_start_kernel+0x96/0xa0
 secondary_startup_64_no_verify+0x18f/0x19b
---[ end trace 0000000000000000 ]---

Fixes: 9b98d395b8 ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 14:07:16 +01:00
Moshe Shemesh
33afbfcc10 net/mlx5: Stop waiting for PCI if pci channel is offline
In case pci channel becomes offline the driver should not wait for PCI
reads during health dump and recovery flow. The driver has timeout for
each of these loops trying to read PCI, so it would fail anyway.
However, in case of recovery waiting till timeout may cause the pci
error_detected() callback fail to meet pci_dpc_recovered() wait timeout.

Fixes: b3bd076f75 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 14:07:16 +01:00
Frank Wunderlich
c57e558194 net: ethernet: mtk_eth_soc: handle dma buffer size soc specific
The mainline MTK ethernet driver suffers long time from rarly but
annoying tx queue timeouts. We think that this is caused by fixed
dma sizes hardcoded for all SoCs.

We suspect this problem arises from a low level of free TX DMADs,
the TX Ring alomost full.

The transmit timeout is caused by the Tx queue not waking up. The
Tx queue stops when the free counter is less than ring->thres, and
it will wake up once the free counter is greater than ring->thres.
If the CPU is too late to wake up the Tx queues, it may cause a
transmit timeout.
Therefore, we increased the TX and RX DMADs to improve this error
situation.

Use the dma-size implementation from SDK in a per SoC manner. In
difference to SDK we have no RSS feature yet, so all RX/TX sizes
should be raised from 512 to 2048 byte except fqdma on mt7988 to
avoid the tx timeout issue.

Fixes: 656e705243 ("net-next: mediatek: add support for MT7623 ethernet")
Suggested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 14:04:44 +01:00
Arnd Bergmann
5c40e428ae arm64/io: add constant-argument check
In some configurations __const_iowrite32_copy() does not get inlined
and gcc runs into the BUILD_BUG():

In file included from <command-line>:
In function '__const_memcpy_toio_aligned32',
    inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3,
    inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20:
include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed
  487 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                             ^
include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert'
  468 |                         prefix ## suffix();                             \
      |                         ^~~~~~
include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert'
  487 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |         ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
   59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
      |                     ^~~~~~~~~~~~~~~~
arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG'
  193 |                 BUILD_BUG();
      |                 ^~~~~~~~~

Move the check for constant arguments into the inline function to ensure
it is still constant if the compiler decides against inlining it, and
mark them as __always_inline to override the logic that sometimes leads
to the compiler not producing the simplified output.

Note that either the __always_inline annotation or the check for a
constant value are sufficient here, but combining the two looks cleaner
as it also avoids the macro. With clang-8 and older, the macro was still
needed, but all versions of gcc and clang can reliably perform constant
folding here.

Fixes: ead79118da ("arm64/io: Provide a WC friendly __iowriteXX_copy()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240604210006.668912-1-arnd@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-06-05 13:30:58 +01:00
Jakub Kicinski
5b4b62a169 rtnetlink: make the "split" NLM_DONE handling generic
Jaroslav reports Dell's OMSA Systems Management Data Engine
expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo()
and inet_dump_ifaddr(). We already added a similar fix previously in
commit 460b0d33cf ("inet: bring NLM_DONE out to a separate recv() again")

Instead of modifying all the dump handlers, and making them look
different than modern for_each_netdev_dump()-based dump handlers -
put the workaround in rtnetlink code. This will also help us move
the custom rtnl-locking from af_netlink in the future (in net-next).

Note that this change is not touching rtnl_dump_all(). rtnl_dump_all()
is different kettle of fish and a potential problem. We now mix families
in a single recvmsg(), but NLM_DONE is not coalesced.

Tested:

  ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \
           --dump getaddr --json '{"ifa-family": 2}'

  ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
           --dump getroute --json '{"rtm-family": 2}'

  ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \
           --dump getlink

Fixes: 3e41af9076 ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()")
Fixes: cdb2f80f1c ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 12:34:54 +01:00
David S. Miller
e137596ec1 Merge branch 'tcp-mptcp-close-wait'
Jason Xing says:

====================
tcp/mptcp: count CLOSE-WAIT for CurrEstab

Taking CLOSE-WAIT sockets into CurrEstab counters is in accordance with RFC
1213, as suggested by Eric and Neal.

v5
Link: https://lore.kernel.org/all/20240531091753.75930-1-kerneljasonxing@gmail.com/
1. add more detailed comment (Matthieu)

v4
Link: https://lore.kernel.org/all/20240530131308.59737-1-kerneljasonxing@gmail.com/
1. correct the Fixes: tag in patch [2/2]. (Eric)

Previous discussion
Link: https://lore.kernel.org/all/20240529033104.33882-1-kerneljasonxing@gmail.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 12:32:47 +01:00
Jason Xing
9633e9377e mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB
Like previous patch does in TCP, we need to adhere to RFC 1213:

  "tcpCurrEstab OBJECT-TYPE
   ...
   The number of TCP connections for which the current state
   is either ESTABLISHED or CLOSE- WAIT."

So let's consider CLOSE-WAIT sockets.

The logic of counting
When we increment the counter?
a) Only if we change the state to ESTABLISHED.

When we decrement the counter?
a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
b) if the socket leaves CLOSE-WAIT, say, on the server side, changing
from CLOSE-WAIT to LAST-ACK.

Fixes: d9cd27b8cd ("mptcp: add CurrEstab MIB counter support")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 12:32:47 +01:00
Jason Xing
a46d0ea5c9 tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB
According to RFC 1213, we should also take CLOSE-WAIT sockets into
consideration:

  "tcpCurrEstab OBJECT-TYPE
   ...
   The number of TCP connections for which the current state
   is either ESTABLISHED or CLOSE- WAIT."

After this, CurrEstab counter will display the total number of
ESTABLISHED and CLOSE-WAIT sockets.

The logic of counting
When we increment the counter?
a) if we change the state to ESTABLISHED.
b) if we change the state from SYN-RECEIVED to CLOSE-WAIT.

When we decrement the counter?
a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
b) if the socket leaves CLOSE-WAIT, say, on the server side, changing
from CLOSE-WAIT to LAST-ACK.

Please note: there are two chances that old state of socket can be changed
to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED.
So we have to take care of the former case.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 12:32:46 +01:00
Tao Su
db574f2f96 KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr
Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn().
Before checking the mismatch of private vs. shared, mmu_invalidate_seq is
saved to fault->mmu_seq, which can be used to detect an invalidation
related to the gfn occurred, i.e. KVM will not install a mapping in page
table if fault->mmu_seq != mmu_invalidate_seq.

Currently there is a second snapshot of mmu_invalidate_seq, which may not
be same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn attribute
may be changed between the two snapshots, but the gfn may be mapped in
page table without hindrance. Therefore, drop the second snapshot as it
has no obvious benefits.

Fixes: f6adeae81f ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()")
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05 06:45:06 -04:00
Paolo Bonzini
45ce0314bf Merge tag 'kvmarm-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.10, take #1

- Large set of FP/SVE fixes for pKVM, addressing the fallout
  from the per-CPU data rework and making sure that the host
  is not involved in the FP/SVE switching any more

- Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH
  is copletely supported

- Fix for the respective priorities of Failed PAC, Illegal
  Execution state and Instruction Abort exceptions

- Fix the handling of AArch32 instruction traps failing their
  condition code, which was broken by the introduction of
  ESR_EL2.ISS2

- Allow vpcus running in AArch32 state to be restored in
  System mode

- Fix AArch32 GPR restore that would lose the 64 bit state
  under some conditions
2024-06-05 06:32:18 -04:00
Wei Li
14951beaec arm64: armv8_deprecated: Fix warning in isndep cpuhp starting process
The function run_all_insn_set_hw_mode() is registered as startup callback
of 'CPUHP_AP_ARM64_ISNDEP_STARTING', it invokes set_hw_mode() methods of
all emulated instructions.

As the STARTING callbacks are not expected to fail, if one of the
set_hw_mode() fails, e.g. due to el0 mixed-endian is not supported for
'setend', it will report a warning:

```
CPU[2] cannot support the emulation of setend
CPU 2 UP state arm64/isndep:starting (136) failed (-22)
CPU2: Booted secondary processor 0x0000000002 [0x414fd0c1]
```

To fix it, add a check for INSN_UNAVAILABLE status and skip the process.

Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Huisong Li <lihuisong@huawei.com>
Link: https://lore.kernel.org/r/20240423093501.3460764-1-liwei391@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-06-05 11:16:55 +01:00
Hangbin Liu
712115a24b selftests: hsr: add missing config for CONFIG_BRIDGE
hsr_redbox.sh test need to create bridge for testing. Add the missing
config CONFIG_BRIDGE in config file.

Fixes: eafbf0574e ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 10:52:04 +01:00
Daniel Borkmann
1cd4bc987a vxlan: Fix regression when dropping packets due to invalid src addresses
Commit f58f45c1e5 ("vxlan: drop packets from invalid src-address")
has recently been added to vxlan mainly in the context of source
address snooping/learning so that when it is enabled, an entry in the
FDB is not being created for an invalid address for the corresponding
tunnel endpoint.

Before commit f58f45c1e5 vxlan was similarly behaving as geneve in
that it passed through whichever macs were set in the L2 header. It
turns out that this change in behavior breaks setups, for example,
Cilium with netkit in L3 mode for Pods as well as tunnel mode has been
passing before the change in f58f45c1e5 for both vxlan and geneve.
After mentioned change it is only passing for geneve as in case of
vxlan packets are dropped due to vxlan_set_mac() returning false as
source and destination macs are zero which for E/W traffic via tunnel
is totally fine.

Fix it by only opting into the is_valid_ether_addr() check in
vxlan_set_mac() when in fact source address snooping/learning is
actually enabled in vxlan. This is done by moving the check into
vxlan_snoop(). With this change, the Cilium connectivity test suite
passes again for both tunnel flavors.

Fixes: f58f45c1e5 ("vxlan: drop packets from invalid src-address")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Bauer <mail@david-bauer.net>
Cc: Ido Schimmel <idosch@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Bauer <mail@david-bauer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 10:51:10 +01:00
Hangyu Hua
affc18fdc6 net: sched: sch_multiq: fix possible OOB write in multiq_tune()
q->bands will be assigned to qopt->bands to execute subsequent code logic
after kmalloc. So the old q->bands should not be used in kmalloc.
Otherwise, an out-of-bounds write will occur.

Fixes: c2999f7fb0 ("net: sched: multiq: don't call qdisc_put() while holding tree lock")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 10:50:19 +01:00
Taehee Yoo
491aee894a ionic: fix kernel panic in XDP_TX action
In the XDP_TX path, ionic driver sends a packet to the TX path with rx
page and corresponding dma address.
After tx is done, ionic_tx_clean() frees that page.
But RX ring buffer isn't reset to NULL.
So, it uses a freed page, which causes kernel panic.

BUG: unable to handle page fault for address: ffff8881576c110c
PGD 773801067 P4D 773801067 PUD 87f086067 PMD 87efca067 PTE 800ffffea893e060
Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
CPU: 1 PID: 25 Comm: ksoftirqd/1 Not tainted 6.9.0+ #11
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
Code: 00 53 41 55 41 56 41 57 b8 01 00 00 00 48 8b 5f 08 4c 8b 77 00 4c 89 f7 48 83 c7 0e 48 39 d8
RSP: 0018:ffff888104e6fa28 EFLAGS: 00010283
RAX: 0000000000000002 RBX: ffff8881576c1140 RCX: 0000000000000002
RDX: ffffffffc0051f64 RSI: ffffc90002d33048 RDI: ffff8881576c110e
RBP: ffff888104e6fa88 R08: 0000000000000000 R09: ffffed1027a04a23
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881b03a21a8
R13: ffff8881589f800f R14: ffff8881576c1100 R15: 00000001576c1100
FS: 0000000000000000(0000) GS:ffff88881ae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8881576c110c CR3: 0000000767a90000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x254/0x790
? __pfx_page_fault_oops+0x10/0x10
? __pfx_is_prefetch.constprop.0+0x10/0x10
? search_bpf_extables+0x165/0x260
? fixup_exception+0x4a/0x970
? exc_page_fault+0xcb/0xe0
? asm_exc_page_fault+0x22/0x30
? 0xffffffffc0051f64
? bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
? do_raw_spin_unlock+0x54/0x220
ionic_rx_service+0x11ab/0x3010 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
? ionic_tx_clean+0x29b/0xc60 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
? __pfx_ionic_tx_clean+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
? ionic_tx_cq_service+0x25d/0xa00 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
ionic_cq_service+0x69/0x150 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
ionic_txrx_napi+0x11a/0x540 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
__napi_poll.constprop.0+0xa0/0x440
net_rx_action+0x7e7/0xc30
? __pfx_net_rx_action+0x10/0x10

Fixes: 8eeed8373e ("ionic: Add XDP_TX support")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 10:49:27 +01:00
Tristram Ha
0a8d3f2e3e net: phy: Micrel KSZ8061: fix errata solution not taking effect problem
KSZ8061 needs to write to a MMD register at driver initialization to fix
an errata.  This worked in 5.0 kernel but not in newer kernels.  The
issue is the main phylib code no longer resets PHY at the very beginning.
Calling phy resuming code later will reset the chip if it is already
powered down at the beginning.  This wipes out the MMD register write.
Solution is to implement a phy resume function for KSZ8061 to take care
of this problem.

Fixes: 232ba3a51c ("net: phy: Micrel KSZ8061: link failure after cable connect")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 10:03:15 +01:00
Wen Gu
fb0aa0781a net/smc: avoid overwriting when adjusting sock bufsizes
When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf
to sysctl_tcp_wmem[1], since this may overwrite the value set by
tcp_sndbuf_expand() in TCP connection establishment.

And the other setting sk_{snd|rcv}buf to sysctl value in
smc_adjust_sock_bufsizes() can also be omitted since the initialization
of smc sock and clcsock has set sk_{snd|rcv}buf to smc.sysctl_{w|r}mem
or ipv4_sysctl_tcp_{w|r}mem[1].

Fixes: 30c3c4a449 ("net/smc: Use correct buffer sizes when switching between TCP and SMC")
Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>, too.
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 09:42:57 +01:00
Subbaraya Sundeep
8b0f741094 octeontx2-af: Always allocate PF entries from low prioriy zone
PF mcam entries has to be at low priority always so that VF
can install longest prefix match rules at higher priority.
This was taken care currently but when priority allocation
wrt reference entry is requested then entries are allocated
from mid-zone instead of low priority zone. Fix this and
always allocate entries from low priority zone for PFs.

Fixes: 7df5b4b260 ("octeontx2-af: Allocate low priority entries for PF")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 09:40:02 +01:00
Ard Biesheuvel
99280413a5 efi: Add missing __nocfi annotations to runtime wrappers
The EFI runtime wrappers are a sandbox for calling into EFI runtime
services, which are invoked using indirect calls. When running with kCFI
enabled, the compiler will require the target of any indirect call to be
type annotated.

Given that the EFI runtime services prototypes and calling convention
are governed by the EFI spec, not the Linux kernel, adding such type
annotations for firmware routines is infeasible, and so the compiler
must be informed that prototype validation should be omitted.

Add the __nocfi annotation at the appropriate places in the EFI runtime
wrapper code to achieve this.

Note that this currently only affects 32-bit ARM, given that other
architectures that support both kCFI and EFI use an asm wrapper to call
EFI runtime services, and this hides the indirect call from the
compiler.

Fixes: 1a4fec49ef ("ARM: 9392/2: Support CLANG CFI")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-06-05 10:18:58 +02:00
Magnus Karlsson
03e38d315f Revert "xsk: Document ability to redirect to any socket bound to the same umem"
This reverts commit 968595a936.

Reported-by: Yuval El-Hanany <YuvalE@radware.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com
Link: https://lore.kernel.org/bpf/20240604122927.29080-3-magnus.karlsson@gmail.com
2024-06-05 09:43:05 +02:00
Magnus Karlsson
7fcf26b315 Revert "xsk: Support redirect to any socket bound to the same umem"
This reverts commit 2863d665ea.

This patch introduced a potential kernel crash when multiple napi instances
redirect to the same AF_XDP socket. By removing the queue_index check, it is
possible for multiple napi instances to access the Rx ring at the same time,
which will result in a corrupted ring state which can lead to a crash when
flushing the rings in __xsk_flush(). This can happen when the linked list of
sockets to flush gets corrupted by concurrent accesses. A quick and small fix
is not possible, so let us revert this for now.

Reported-by: Yuval El-Hanany <YuvalE@radware.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com
Link: https://lore.kernel.org/bpf/20240604122927.29080-2-magnus.karlsson@gmail.com
2024-06-05 09:42:30 +02:00
Jiri Olsa
d0d1df8ba1 bpf: Set run context for rawtp test_run callback
syzbot reported crash when rawtp program executed through the
test_run interface calls bpf_get_attach_cookie helper or any
other helper that touches task->bpf_ctx pointer.

Setting the run context (task->bpf_ctx pointer) for test_run
callback.

Fixes: 7adfc6c9b3 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
Reported-by: syzbot+3ab78ff125b7979e45f9@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9
Link: https://lore.kernel.org/bpf/20240604150024.359247-1-jolsa@kernel.org
2024-06-05 09:41:33 +02:00
Tony Luck
f071d02eca tpm: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Link: https://lore.kernel.org/all/20240520224620.9480-4-tony.luck@intel.com/
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-06-05 04:55:04 +03:00
Jan Beulich
0ea00e249c tpm_tis: Do *not* flush uninitialized work
tpm_tis_core_init() may fail before tpm_tis_probe_irq_single() is
called, in which case tpm_tis_remove() unconditionally calling
flush_work() is triggering a warning for .func still being NULL.

Cc: stable@vger.kernel.org # v6.5+
Fixes: 481c2d1462 ("tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-06-05 01:18:01 +03:00
Linus Torvalds
51214520ad Merge tag 'devicetree-fixes-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:

 - Fix regression in 'interrupt-map' handling affecting Apple M1 mini
   (at least)

 - Fix binding example warning in stm32 st,mlahb binding

 - Fix schema error in Allwinner platform binding causing lots of
   spurious warnings

 - Add missing MODULE_DESCRIPTION() to DT kunit tests

* tag 'devicetree-fixes-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: property: Fix fw_devlink handling of interrupt-map
  of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw()
  dt-bindings: arm: stm32: st,mlahb: Drop spurious "reg" property from example
  dt-bindings: arm: sunxi: Fix incorrect '-' usage
  of: of_test: add MODULE_DESCRIPTION()
2024-06-04 14:08:44 -07:00
Arnaldo Carvalho de Melo
dc6abbbde4 tools headers arm64: Sync arm64's cputype.h with the kernel sources
To get the changes in:

  0ce85db6c2 ("arm64: cputype: Add Neoverse-V3 definitions")
  02a0a04676 ("arm64: cputype: Add Cortex-X4 definitions")
  f4d9d9dcc7 ("arm64: Add Neoverse-V2 part")

That makes this perf source code to be rebuilt:

  CC      /tmp/build/perf-tools/util/arm-spe.o

The changes in the above patch add MIDR_NEOVERSE_V[23] and
MIDR_NEOVERSE_V1 is used in arm-spe.c, so probably we need to add those
and perhaps MIDR_CORTEX_X4 to that array? Or maybe we need to leave this
for later when this is all tested on those machines?

  static const struct midr_range neoverse_spe[] = {
          MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
          MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
          MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
          {},
  };

Mark Rutland recommended about arm-spe.c:

"I would not touch this for now -- someone would have to go audit the
TRMs to check that those other cores have the same encoding, and I think
it'd be better to do that as a follow-up."

That addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h

Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/Zl8cYk0Tai2fs7aM@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-04 16:46:40 -03:00
Linus Torvalds
32f88d65f0 Merge tag 'linux_kselftest-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
 "Fixes to build warnings in several tests and fixes to ftrace tests"

* tag 'linux_kselftest-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/futex: don't pass a const char* to asprintf(3)
  selftests/futex: don't redefine .PHONY targets (all, clean)
  selftests/tracing: Fix event filter test to retry up to 10 times
  selftests/futex: pass _GNU_SOURCE without a value to the compiler
  selftests/overlayfs: Fix build error on ppc64
  selftests/openat2: Fix build warnings on ppc64
  selftests: cachestat: Fix build warnings on ppc64
  tracing/selftests: Fix kprobe event name test for .isra. functions
  selftests/ftrace: Update required config
  selftests/ftrace: Fix to check required event file
  kselftest/alsa: Ensure _GNU_SOURCE is defined
2024-06-04 10:34:13 -07:00
Ard Biesheuvel
290be0a402 Merge branch 'efi/next' into efi/urgent 2024-06-04 19:31:03 +02:00
Dan Williams
c9d52fb313 PCI: Revert the cfg_access_lock lockdep mechanism
While the experiment did reveal that there are additional places that are
missing the lock during secondary bus reset, one of the places that needs
to take cfg_access_lock (pci_bus_lock()) is not prepared for lockdep
annotation.

Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is
currently dependent on the fact that the device_lock() is marked
lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that
annotation, pci_bus_lock() would need to use something like a new
pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in the
topology, and a hope that the depth of a PCI tree never exceeds the max
value for a lockdep subclass.

The alternative to ripping out the lockdep coverage would be to deploy a
dynamic lock key for every PCI device. Unfortunately, there is evidence
that increasing the number of keys that lockdep needs to track to be
per-PCI-device is prohibitively expensive for something like the
cfg_access_lock.

The main motivation for adding the annotation in the first place was to
catch unlocked secondary bus resets, not necessarily catch lock ordering
problems between cfg_access_lock and other locks. Solve that narrower
problem with follow-on patches, and just due to targeted revert for now.

Link: https://lore.kernel.org/r/171711746402.1628941.14575335981264103013.stgit@dwillia2-xfh.jf.intel.com
Fixes: 7e89efc6e9 ("PCI: Lock upstream bridge for pci_reset_function()")
Reported-by: Imre Deak <imre.deak@intel.com>
Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: Jani Saarinen <jani.saarinen@intel.com>
2024-06-04 12:10:05 -05:00
Dirk Behme
c0a40097f0 drivers: core: synchronize really_probe() and dev_uevent()
Synchronize the dev->driver usage in really_probe() and dev_uevent().
These can run in different threads, what can result in the following
race condition for dev->driver uninitialization:

Thread #1:
==========

really_probe() {
...
probe_failed:
...
device_unbind_cleanup(dev) {
    ...
    dev->driver = NULL;   // <= Failed probe sets dev->driver to NULL
    ...
    }
...
}

Thread #2:
==========

dev_uevent() {
...
if (dev->driver)
      // If dev->driver is NULLed from really_probe() from here on,
      // after above check, the system crashes
      add_uevent_var(env, "DRIVER=%s", dev->driver->name);
...
}

really_probe() holds the lock, already. So nothing needs to be done
there. dev_uevent() is called with lock held, often, too. But not
always. What implies that we can't add any locking in dev_uevent()
itself. So fix this race by adding the lock to the non-protected
path. This is the path where above race is observed:

 dev_uevent+0x235/0x380
 uevent_show+0x10c/0x1f0  <= Add lock here
 dev_attr_show+0x3a/0xa0
 sysfs_kf_seq_show+0x17c/0x250
 kernfs_seq_show+0x7c/0x90
 seq_read_iter+0x2d7/0x940
 kernfs_fop_read_iter+0xc6/0x310
 vfs_read+0x5bc/0x6b0
 ksys_read+0xeb/0x1b0
 __x64_sys_read+0x42/0x50
 x64_sys_call+0x27ad/0x2d30
 do_syscall_64+0xcd/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Similar cases are reported by syzkaller in

https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a

But these are regarding the *initialization* of dev->driver

dev->driver = drv;

As this switches dev->driver to non-NULL these reports can be considered
to be false-positives (which should be "fixed" by this commit, as well,
though).

The same issue was reported and tried to be fixed back in 2015 in

https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@samsung.com/

already.

Fixes: 239378f16a ("Driver core: add uevent vars for devices of a class")
Cc: stable <stable@kernel.org>
Cc: syzbot+ffa8143439596313a85a@syzkaller.appspotmail.com
Cc: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Link: https://lore.kernel.org/r/20240513050634.3964461-1-dirk.behme@de.bosch.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 18:14:51 +02:00
Russell King (Oracle)
616501eccb clkdev: don't fail clkdev_alloc() if over-sized
Don't fail clkdev_alloc() if the strings are over-sized. In this case,
the entry will not match during lookup, so its useless. However, since
code fails if we return NULL leading to boot failure, return a dummy
entry with the connection and device IDs set to "bad".

Leave the warning so these problems can be found, and the useless
wasteful clkdev registrations removed.

Reported-by: Ron Economos <re@w6rz.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 8d532528ff ("clkdev: report over-sized strings when creating clkdev entries")
Closes: https://lore.kernel.org/linux-clk/7eda7621-0dde-4153-89e4-172e4c095d01@roeck-us.net.
Link: https://lore.kernel.org/r/28114882-f8d7-21bf-4536-a186e8d7a22a@w6rz.net
Tested-by: Ron Economos <re@w6rz.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-06-04 17:12:01 +01:00
Greg Kroah-Hartman
7c55b78818 jfs: xattr: fix buffer overflow for invalid xattr
When an xattr size is not what is expected, it is printed out to the
kernel log in hex format as a form of debugging.  But when that xattr
size is bigger than the expected size, printing it out can cause an
access off the end of the buffer.

Fix this all up by properly restricting the size of the debug hex dump
in the kernel log.

Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com
Cc: Dave Kleikamp <shaggy@kernel.org>
Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 18:09:03 +02:00
Yongzhi Liu
77427e3d5c misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe()
There is a memory leak (forget to free allocated buffers) in a
memory allocation failure path.

Fix it to jump to the correct error handling code.

Fixes: 393fc2f594 ("misc: microchip: pci1xxxx: load auxiliary bus driver for the PIO function in the multi-function endpoint of pci1xxxx device.")
Signed-off-by: Yongzhi Liu <hyperlyzcs@gmail.com>
Reviewed-by: Kumaravel Thiagarajan <kumaravel.thiagarajan@microchip.com>
Link: https://lore.kernel.org/r/20240523121434.21855-4-hyperlyzcs@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 18:08:39 +02:00
Yongzhi Liu
086c6cbcc5 misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe()
When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), callback function
gp_auxiliary_device_release() calls ida_free() and
kfree(aux_device_wrapper) to free memory. We should't
call them again in the error handling path.

Fix this by skipping the redundant cleanup functions.

Fixes: 393fc2f594 ("misc: microchip: pci1xxxx: load auxiliary bus driver for the PIO function in the multi-function endpoint of pci1xxxx device.")
Signed-off-by: Yongzhi Liu <hyperlyzcs@gmail.com>
Link: https://lore.kernel.org/r/20240523121434.21855-3-hyperlyzcs@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 18:08:39 +02:00
Uwe Kleine-König
73fedc31fe parport: amiga: Mark driver struct with __refdata to prevent section mismatch
As described in the added code comment, a reference to .exit.text is ok
for drivers registered via module_platform_driver_probe(). Make this
explicit to prevent the following section mismatch warning

	WARNING: modpost: drivers/parport/parport_amiga: section mismatch in reference: amiga_parallel_driver+0x8 (section: .data) -> amiga_parallel_remove (section: .exit.text)

that triggers on an allmodconfig W=1 build.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20240513075206.2337310-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 18:08:31 +02:00
Hans de Goede
af076156ec mei: vsc: Fix wrong invocation of ACPI SID method
When using an initializer for a union only one of the union members
must be initialized. The initializer for the acpi_object union variable
passed as argument to the SID ACPI method was initializing both
the type and the integer members of the union.

Unfortunately rather then complaining about this gcc simply ignores
the first initializer and only used the second integer.value = 1
initializer. Leaving type set to 0 which leads to the argument being
skipped by acpi acpi_ns_evaluate() resulting in:

ACPI Warning: \_SB.PC00.SPI1.SPFD.CVFD.SID: Insufficient arguments -
Caller passed 0, method requires 1 (20240322/nsarguments-232)

Fix this by initializing only the integer struct part of the union
and initializing both members of the integer struct.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Fixes: 566f5ca976 ("mei: Add transport driver for IVSC device")
Reviewed-by: Wentong Wu <wentong.wu@intel.com>
Link: https://lore.kernel.org/r/20240603205050.505389-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 17:46:00 +02:00
Wentong Wu
9b5e045029 mei: vsc: Don't stop/restart mei device during system suspend/resume
The dynamically created mei client device (mei csi) is used as one V4L2
sub device of the whole video pipeline, and the V4L2 connection graph is
built by software node. The mei_stop() and mei_restart() will delete the
old mei csi client device and create a new mei client device, which will
cause the software node information saved in old mei csi device lost and
the whole video pipeline will be broken.

Removing mei_stop()/mei_restart() during system suspend/resume can fix
the issue above and won't impact hardware actual power saving logic.

Fixes: f6085a96c9 ("mei: vsc: Unregister interrupt handler for system suspend")
Cc: stable@vger.kernel.org # for 6.8+
Reported-by: Hao Yao <hao.yao@intel.com>
Signed-off-by: Wentong Wu <wentong.wu@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Tested-by: Jason Chen <jason.z.chen@intel.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20240527123835.522384-1-wentong.wu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 17:45:35 +02:00
Tomas Winkler
283cb234ef mei: me: release irq in mei_me_pci_resume error path
The mei_me_pci_resume doesn't release irq on the error path,
in case mei_start() fails.

Cc: <stable@kernel.org>
Fixes: 33ec082631 ("mei: revamp mei reset state machine")
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20240604090728.1027307-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 17:11:48 +02:00
Alexander Usyskin
1db5322b7e mei: demote client disconnect warning on suspend to debug
Change level for the "not connected" client message in the write
callback from error to debug.

The MEI driver currently disconnects all clients upon system suspend.
This behavior is by design and user-space applications with
open connections before the suspend are expected to handle errors upon
resume, by reopening their handles, reconnecting,
and retrying their operations.

However, the current driver implementation logs an error message every
time a write operation is attempted on a disconnected client.
Since this is a normal and expected flow after system resume
logging this as an error can be misleading.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20240530091415.725247-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 17:11:44 +02:00
Michal Wajdeczko
0698ff57bf drm/xe/pf: Update the LMTT when freeing VF GT config
The LMTT must be updated whenever we change the VF LMEM configuration.
We missed that step when freeing the whole VF GT config, which could
result in stale PTE in LMTT or LMTT PT object leaks. Fix that.

Fixes: ac6598aed1 ("drm/xe/pf: Add support to configure SR-IOV VFs")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527115408.1064-1-michal.wajdeczko@intel.com
(cherry picked from commit c063cce7df)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-06-04 16:31:24 +02:00
Jeff Johnson
9c8f05cf1d HID: logitech-hidpp: add missing MODULE_DESCRIPTION() macro
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hid/hid-logitech-hidpp.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-06-04 16:16:37 +02:00
Arnd Bergmann
9e438fe31e HID: intel-ish-hid: fix endian-conversion
The newly added file causes a ton of sparse warnings about the
incorrect use of __le32 and similar types:

drivers/hid/intel-ish-hid/ishtp/loader.h:41:23: error: invalid bitfield specifier for type restricted __le32.
drivers/hid/intel-ish-hid/ishtp/loader.h:42:27: error: invalid bitfield specifier for type restricted __le32.
drivers/hid/intel-ish-hid/ishtp/loader.h:43:24: error: invalid bitfield specifier for type restricted __le32.
drivers/hid/intel-ish-hid/ishtp/loader.h:44:24: error: invalid bitfield specifier for type restricted __le32.
drivers/hid/intel-ish-hid/ishtp/loader.h:45:22: error: invalid bitfield specifier for type restricted __le32.
drivers/hid/intel-ish-hid/ishtp/loader.c:172:33: warning: restricted __le32 degrades to integer
drivers/hid/intel-ish-hid/ishtp/loader.c:178:50: warning: incorrect type in assignment (different base types)
drivers/hid/intel-ish-hid/ishtp/loader.c:178:50:    expected restricted __le32 [usertype] length
drivers/hid/intel-ish-hid/ishtp/loader.c:178:50:    got unsigned long
drivers/hid/intel-ish-hid/ishtp/loader.c:179:50: warning: incorrect type in assignment (different base types)
drivers/hid/intel-ish-hid/ishtp/loader.c:179:50:    expected restricted __le32 [usertype] fw_off
drivers/hid/intel-ish-hid/ishtp/loader.c:179:50:    got unsigned int [usertype] offset
drivers/hid/intel-ish-hid/ishtp/loader.c:180:17: warning: cast from restricted __le32
drivers/hid/intel-ish-hid/ishtp/loader.c:183:24: warning: invalid assignment: +=
drivers/hid/intel-ish-hid/ishtp/loader.c:183:24:    left side has type unsigned int
drivers/hid/intel-ish-hid/ishtp/loader.c:183:24:    right side has type restricted __le32

Add the necessary conversions and use temporary variables where appropriate
to avoid converting back.

Fixes: 579a267e46 ("HID: intel-ish-hid: Implement loading firmware from host feature")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Zhang Lixu <lixu.zhang@intel.com>
Tested-by: Zhang Lixu <lixu.zhang@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-06-04 16:16:37 +02:00
Christophe JAILLET
655a8a7684 HID: nintendo: Fix an error handling path in nintendo_hid_probe()
joycon_leds_create() has a ida_alloc() call. So if an error occurs after
it, a corresponding ida_free() call is needed, as already done in the
.remove function.

This is not 100% perfect, because if ida_alloc() fails, then
'ctlr->player_id' will forced to be U32_MAX, and an error will be logged
when ida_free() is called.

Considering that this can't happen in real life, no special handling is
done to handle it.

Fixes: 5307de63d7 ("HID: nintendo: use ida for LED player id")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Silvan Jegen <s.jegen@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-06-04 16:16:37 +02:00
José Expósito
ce3af2ee95 HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode()
Fix a memory leak on logi_dj_recv_send_report() error path.

Fixes: 6f20d32612 ("HID: logitech-dj: Fix error handling in logi_dj_recv_switch_to_dj_mode()")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-06-04 16:16:37 +02:00
Fuad Tabba
afb91f5f8a KVM: arm64: Ensure that SME controls are disabled in protected mode
KVM (and pKVM) do not support SME guests. Therefore KVM ensures
that the host's SME state is flushed and that SME controls for
enabling access to ZA storage and for streaming are disabled.

pKVM needs to protect against a buggy/malicious host. Ensure that
it wouldn't run a guest when protected mode is enabled should any
of the SME controls be enabled.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
a69283ae1d KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format
When setting/clearing CPACR bits for EL0 and EL1, use the ELx
format of the bits, which covers both. This makes the code
clearer, and reduces the chances of accidentally missing a bit.

No functional change intended.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
1696fc2174 KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM
Now that we have introduced finalize_init_hyp_mode(), lets
consolidate the initializing of the host_data fpsimd_state and
sve state.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
b5b9955617 KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM
When running in protected mode we don't want to leak protected
guest state to the host, including whether a guest has used
fpsimd/sve. Therefore, eagerly restore the host state on guest
exit when running in protected mode, which happens only if the
guest has used fpsimd/sve.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
66d5b53e20 KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM
Protected mode needs to maintain (save/restore) the host's sve
state, rather than relying on the host kernel to do that. This is
to avoid leaking information to the host about guests and the
type of operations they are performing.

As a first step towards that, allocate memory mapped at hyp, per
cpu, for the host sve state. The following patch will use this
memory to save/restore the host state.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
e511e08a9f KVM: arm64: Specialize handling of host fpsimd state on trap
In subsequent patches, n/vhe will diverge on saving the host
fpsimd/sve state when taking a guest fpsimd/sve trap. Add a
specialized helper to handle it.

No functional change intended.

Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
6d8fb3cbf7 KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be
toggled in different parts of the code, but the exact bits and
their polarity differ between these two formats and the mode
(vhe/nvhe/hvhe).

To reduce the amount of duplicated code and the chance of getting
the wrong bit/polarity or missing a field, abstract the set/clear
of CPTR_EL2 bits behind a helper.

Since (h)VHE is the way of the future, use the CPACR_EL1 format,
which is a subset of the VHE CPTR_EL2, as a reference.

No functional change intended.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:33 +01:00
Fuad Tabba
45f4ea9bcf KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state
Since the prototypes for __sve_save_state/__sve_restore_state at
hyp were added, the underlying macro has acquired a third
parameter for saving/restoring ffr.

Fix the prototypes to account for the third parameter, and
restore the ffr for the guest since it is saved.

Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:32 +01:00
Fuad Tabba
87bb39ed40 KVM: arm64: Reintroduce __sve_save_state
Now that the hypervisor is handling the host sve state in
protected mode, it needs to be able to save it.

This reverts commit e66425fc9b ("KVM: arm64: Remove unused
__sve_save_state").

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04 15:06:32 +01:00
Lukas Wunner
44a45be57f sysfs: Unbreak the build around sysfs_bin_attr_simple_read()
Günter reports build breakage for m68k "m5208evb_defconfig" plus
CONFIG_BLK_DEV_INITRD=y caused by commit 66bc1a1733 ("treewide:
Use sysfs_bin_attr_simple_read() helper").

The defconfig disables CONFIG_SYSFS, so sysfs_bin_attr_simple_read()
is not compiled into the kernel.  But init/initramfs.c references
that function in the initializer of a struct bin_attribute.

Add an empty static inline to avoid the build breakage.

Fixes: 66bc1a1733 ("treewide: Use sysfs_bin_attr_simple_read() helper")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/r/e12b0027-b199-4de7-b83d-668171447ccc@roeck-us.net
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/05f4290439a58730738a15b0c99cd8576c4aa0d9.1716461752.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:56:45 +02:00
Greg Kroah-Hartman
9711873506 driver core: remove devm_device_add_groups()
There is no more in-kernel users of this function, and no driver should
ever be using it, so remove it from the kernel.

Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230704131715.44454-8-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:53:36 +02:00
Hagar Hemdan
73254a297c io_uring: fix possible deadlock in io_register_iowq_max_workers()
The io_register_iowq_max_workers() function calls io_put_sq_data(),
which acquires the sqd->lock without releasing the uring_lock.
Similar to the commit 009ad9f0c6 ("io_uring: drop ctx->uring_lock
before acquiring sqd->lock"), this can lead to a potential deadlock
situation.

To resolve this issue, the uring_lock is released before calling
io_put_sq_data(), and then it is re-acquired after the function call.

This change ensures that the locks are acquired in the correct
order, preventing the possibility of a deadlock.

Suggested-by: Maximilian Heyne <mheyne@amazon.de>
Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-04 07:39:17 -06:00
Su Hui
91215f70ea io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
Clang static checker (scan-build) warning:
o_uring/io-wq.c:line 1051, column 3
The expression is an uninitialized value. The computed value will
also be garbage.

'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is
not fully initialized. Change the order of assignment for 'match' to fix
this problem.

Fixes: 42abc95f05 ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-04 07:39:00 -06:00
Shichao Lai
16637fea00 usb-storage: alauda: Check whether the media is initialized
The member "uzonesize" of struct alauda_info will remain 0
if alauda_init_media() fails, potentially causing divide errors
in alauda_read_data() and alauda_write_lba().
- Add a member "media_initialized" to struct alauda_info.
- Change a condition in alauda_check_media() to ensure the
  first initialization.
- Add an error check for the return value of alauda_init_media().

Fixes: e80b0fade0 ("[PATCH] USB Storage: add alauda support")
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Shichao Lai <shichaorai@gmail.com>
Link: https://lore.kernel.org/r/20240526012745.2852061-1-shichaorai@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:36:28 +02:00
Heikki Krogerus
8bdf8a42bc usb: typec: ucsi: Ack also failed Get Error commands
It is possible that also the GET_ERROR command fails. If
that happens, the command completion still needs to be
acknowledged. Otherwise the interface will be stuck until
it's reset.

Reported-by: Ammy Yi <ammy.yi@intel.com>
Fixes: bdc62f2bae ("usb: typec: ucsi: Simplified registration and I/O API")
Cc: stable@vger.kernel.org
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240531104653.1303519-1-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:36:12 +02:00
Andrey Konovalov
f85d39dd7e kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
After commit 8fea0c8fda ("usb: core: hcd: Convert from tasklet to BH
workqueue"), usb_giveback_urb_bh() runs in the BH workqueue with
interrupts enabled.

Thus, the remote coverage collection section in usb_giveback_urb_bh()->
__usb_hcd_giveback_urb() might be interrupted, and the interrupt handler
might invoke __usb_hcd_giveback_urb() again.

This breaks KCOV, as it does not support nested remote coverage collection
sections within the same context (neither in task nor in softirq).

Update kcov_remote_start/stop_usb_softirq() to disable interrupts for the
duration of the coverage collection section to avoid nested sections in
the softirq context (in addition to such in the task context, which are
already handled).

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Closes: https://lore.kernel.org/linux-usb/0f4d1964-7397-485b-bc48-11c01e2fcbca@I-love.SAKURA.ne.jp/
Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 8fea0c8fda ("usb: core: hcd: Convert from tasklet to BH workqueue")
Cc: stable@vger.kernel.org
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20240527173538.4989-1-andrey.konovalov@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:34:44 +02:00
Rob Herring (Arm)
e4228cfd09 dt-bindings: usb: realtek,rts5411: Add missing "additionalProperties" on child nodes
All nodes need an explicit additionalProperties or unevaluatedProperties
unless a $ref has one that's false. As that is not the case with
usb-device.yaml, "additionalProperties" is needed here.

Fixes: c44d9dab31 ("dt-bindings: usb: Add downstream facing ports to realtek binding")
Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20240523194500.2958192-1-robh@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:34:28 +02:00
Kyle Tso
fc8fb9eea9 usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
Similar to what fixed in Commit a6fe37f428 ("usb: typec: tcpm: Skip
hard reset when in error recovery"), the handling of the received Hard
Reset has to be skipped during TOGGLING state.

[ 4086.021288] VBUS off
[ 4086.021295] pending state change SNK_READY -> SNK_UNATTACHED @ 650 ms [rev2 NONE_AMS]
[ 4086.022113] VBUS VSAFE0V
[ 4086.022117] state change SNK_READY -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.022447] VBUS off
[ 4086.022450] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.023060] VBUS VSAFE0V
[ 4086.023064] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS]
[ 4086.023070] disable BIST MODE TESTDATA
[ 4086.023766] disable vbus discharge ret:0
[ 4086.023911] Setting usb_comm capable false
[ 4086.028874] Setting voltage/current limit 0 mV 0 mA
[ 4086.028888] polarity 0
[ 4086.030305] Requesting mux state 0, usb-role 0, orientation 0
[ 4086.033539] Start toggling
[ 4086.038496] state change SNK_UNATTACHED -> TOGGLING [rev2 NONE_AMS]

// This Hard Reset is unexpected
[ 4086.038499] Received hard reset
[ 4086.038501] state change TOGGLING -> HARD_RESET_START [rev2 HARD_RESET]

Fixes: f0690a25a1 ("staging: typec: USB Type-C Port Manager (tcpm)")
Cc: stable@vger.kernel.org
Signed-off-by: Kyle Tso <kyletso@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240520154858.1072347-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:34:12 +02:00
Amit Sunil Dhamne
e7e921918d usb: typec: tcpm: fix use-after-free case in tcpm_register_source_caps
There could be a potential use-after-free case in
tcpm_register_source_caps(). This could happen when:
 * new (say invalid) source caps are advertised
 * the existing source caps are unregistered
 * tcpm_register_source_caps() returns with an error as
   usb_power_delivery_register_capabilities() fails

This causes port->partner_source_caps to hold on to the now freed source
caps.

Reset port->partner_source_caps value to NULL after unregistering
existing source caps.

Fixes: 230ecdf71a ("usb: typec: tcpm: unregister existing source caps before re-registration")
Cc: stable@vger.kernel.org
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
Reviewed-by: Ondrej Jirman <megi@xff.cz>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240514220134.2143181-1-amitsd@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:33:55 +02:00
John Ernberg
8475ffcfb3 USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected
If no other USB HCDs are selected when compiling a small pure virutal
machine, the Xen HCD driver cannot be built.

Fix it by traversing down host/ if CONFIG_USB_XEN_HCD is selected.

Fixes: 494ed3997d ("usb: Introduce Xen pvUSB frontend (xen hcd)")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: John Ernberg <john.ernberg@actia.se>
Link: https://lore.kernel.org/r/20240517114345.1190755-1-john.ernberg@actia.se
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:33:38 +02:00
Johan Hovold
fc3568f142 usb: typec: ucsi: glink: increase max ports for x1e80100
The new X Elite (x1e80100) platform has three ports so increase the
maximum so that all ports can be registered.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20240603100007.10236-1-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 15:33:19 +02:00
Jens Axboe
415ce0ea55 io_uring/napi: fix timeout calculation
Not quite sure what __io_napi_adjust_timeout() was attemping to do, it's
adjusting both the NAPI timeout and the general overall timeout, and
calculating a value that is never used. The overall timeout is a super
set of the NAPI timeout, and doesn't need adjusting. The only thing we
really need to care about is that the NAPI timeout doesn't exceed the
overall timeout. If a user asked for a timeout of eg 5 usec and NAPI
timeout is 10 usec, then we should not spin for 10 usec.

While in there, sanitize the time checking a bit. If we have a negative
value in the passed in timeout, discard it. Round up the value as well,
so we don't end up with a NAPI timeout for the majority of the wait,
with only a tiny sleep value at the end.

Hence the only case we need to care about is if the NAPI timeout is
larger than the overall timeout. If it is, cap the NAPI timeout at what
the overall timeout is.

Cc: stable@vger.kernel.org
Fixes: 8d0c12a80c ("io-uring: add napi busy poll support")
Reported-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-04 07:32:45 -06:00
Peter Chen
718d4a63c0 Revert "usb: chipidea: move ci_ulpi_init after the phy initialization"
This reverts commit 22ffd399e6.
People report this commit causes the driver defer probed, and never
back to work[1][2].

[1] https://lore.kernel.org/lkml/20240407011913.GA168730@nchen-desktop/T/#mc2b93bc11a8b01ec7cd0d0bf6b0b03951d9ef751
[2] https://lore.kernel.org/lkml/20240407011913.GA168730@nchen-desktop/T/#me87d9a2a76c07619d83b3879ea14780da89fbbbf

Cc: Michael Grzeschik <m.grzeschik@pengutronix.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Wouter Franken <wouter@franken-peeters.be>
Signed-off-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20240517023648.3459188-1-peter.chen@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:51:09 +02:00
Tetsuo Handa
ae01e52da2 serial: drop debugging WARN_ON_ONCE() from uart_write()
syzbot is reporting lockdep warning upon

  int disc = 7;
  ioctl(open("/dev/ttyS3", O_RDONLY), TIOCSETD, &disc);

sequence. Do like what commit 5f1149d2f4 ("serial: drop debugging
WARN_ON_ONCE() from uart_put_char()") does.

Reported-by: syzbot+f78380e4eae53c64125c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f78380e4eae53c64125c
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/d775ae2d-a2ac-439e-8e2b-134749f60f30@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:10:43 +02:00
Hugo Villeneuve
7a2e8e30ad serial: sc16is7xx: re-add Kconfig SPI or I2C dependency
Commit d492164381
("serial: sc16is7xx: split into core and I2C/SPI parts (core)")
removed Kconfig SPI_MASTER or I2C dependency for SERIAL_SC16IS7XX (core).
This removal was done because I inadvertently misinterpreted some review
comments.

Because of that, the driver question now pops up if both I2C and
SPI_MASTER are disabled.

Re-add Kconfig SPI_MASTER or I2C dependency to fix the problem.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: d492164381 ("serial: sc16is7xx: split into core and I2C/SPI parts (core)")
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240603152601.3689319-3-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:10:28 +02:00
Hugo Villeneuve
4e534ff4b6 serial: sc16is7xx: rename Kconfig CONFIG_SERIAL_SC16IS7XX_CORE
Commit d492164381
("serial: sc16is7xx: split into core and I2C/SPI parts (core)")
renamed SERIAL_SC16IS7XX_CORE by SERIAL_SC16IS7XX. This means that some
configs should have been updated when I submitted the original patch, but
unfortunately they were not. Geert mentioned for example:
    arch/mips/configs/cu1??0-neo_defconfig

Rename SERIAL_SC16IS7XX to SERIAL_SC16IS7XX_CORE so that existing configs
will still work correctly.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: d492164381 ("serial: sc16is7xx: split into core and I2C/SPI parts (core)")
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240603152601.3689319-2-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:10:28 +02:00
Douglas Anderson
ca84cd379b serial: port: Don't block system suspend even if bytes are left to xmit
Recently, suspend testing on sc7180-trogdor based devices has started
to sometimes fail with messages like this:

  port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0
  port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16
  port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs
  port a88000.serial:0.0: PM: failed to suspend: error -16

I could reproduce these problems by logging in via an agetty on the
debug serial port (which was _not_ used for kernel console) and
running:
  cat /var/log/messages
...and then (via an SSH session) forcing a few suspend/resume cycles.

Tracing through the code and doing some printf()-based debugging shows
that the -16 (-EBUSY) comes from the recently added
serial_port_runtime_suspend().

The idea of the serial_port_runtime_suspend() function is to prevent
the port from being _runtime_ suspended if it still has bytes left to
transmit. Having bytes left to transmit isn't a reason to block
_system_ suspend, though. If a serdev device in the kernel needs to
block system suspend it should block its own suspend and it can use
serdev_device_wait_until_sent() to ensure bytes are sent.

The DEFINE_RUNTIME_DEV_PM_OPS() used by the serial_port code means
that the system suspend function will be pm_runtime_force_suspend().
In pm_runtime_force_suspend() we can see that before calling the
runtime suspend function we'll call pm_runtime_disable(). This should
be a reliable way to detect that we're called from system suspend and
that we shouldn't look for busyness.

Fixes: 43066e3222 ("serial: port: Don't suspend if the port is still busy")
Cc: stable@vger.kernel.org
Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240531080914.v3.1.I2395e66cf70c6e67d774c56943825c289b9c13e4@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:09:47 +02:00
Doug Brown
5208e7ced5 serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level
The FIFO is 64 bytes, but the FCR is configured to fire the TX interrupt
when the FIFO is half empty (bit 3 = 0). Thus, we should only write 32
bytes when a TX interrupt occurs.

This fixes a problem observed on the PXA168 that dropped a bunch of TX
bytes during large transmissions.

Fixes: ab28f51c77 ("serial: rewrite pxa2xx-uart to use 8250_core")
Signed-off-by: Doug Brown <doug@schmorgal.com>
Link: https://lore.kernel.org/r/20240519191929.122202-1-doug@schmorgal.com
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:08:09 +02:00
Andy Shevchenko
2c94512055 serial: 8250_dw: Revert "Move definitions to the shared header"
This reverts commit d9666dfb31.

The container of the struct dw8250_port_data is private to the actual
driver. In particular, 8250_lpss and 8250_dw use different data types
that are assigned to the UART port private_data. Hence, it must not
be used outside the specific driver.

Fix the mistake made in the past by moving the respective definitions
to the specific driver.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240514190730.2787071-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:07:58 +02:00
Andy Shevchenko
87d80bfbd5 serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw
The container of the struct dw8250_port_data is private to the actual
driver. In particular, 8250_lpss and 8250_dw use different data types
that are assigned to the UART port private_data. Hence, it must not
be used outside the specific driver.

Currently the only cpr_val is required by the common code, make it
be available via struct dw8250_port_data.

This fixes the UART breakage on Intel Galileo boards.

Fixes: 593dea000b ("serial: 8250: dw: Allow to use a fallback CPR value if not synthesized")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240514190730.2787071-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:07:58 +02:00
Ilpo Järvinen
b19ab7ee2c tty: n_tty: Fix buffer offsets when lookahead is used
When lookahead has "consumed" some characters (la_count > 0),
n_tty_receive_buf_standard() and n_tty_receive_buf_closing() for
characters beyond the la_count are given wrong cp/fp offsets which
leads to duplicating and losing some characters.

If la_count > 0, correct buffer pointers and make count consistent too
(the latter is not strictly necessary to fix the issue but seems more
logical to adjust all variables immediately to keep state consistent).

Reported-by: Vadym Krevs <vkrevs@yahoo.com>
Fixes: 6bb6fa6908 ("tty: Implement lookahead to process XON/XOFF timely")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218834
Tested-by: Vadym Krevs <vkrevs@yahoo.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240514140429.12087-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 14:07:27 +02:00
Vasant Hegde
526606b0a1 iommu/amd: Fix Invalid wait context issue
With commit c4cb231111 ("iommu/amd: Add support for enable/disable IOPF")
we are hitting below issue. This happens because in IOPF enablement path
it holds spin lock with irq disable and then tries to take mutex lock.

dmesg:
-----
[    0.938739] =============================
[    0.938740] [ BUG: Invalid wait context ]
[    0.938742] 6.10.0-rc1+ #1 Not tainted
[    0.938745] -----------------------------
[    0.938746] swapper/0/1 is trying to lock:
[    0.938748] ffffffff8c9f01d8 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x78/0x4a0
[    0.938767] other info that might help us debug this:
[    0.938768] context-{5:5}
[    0.938769] 7 locks held by swapper/0/1:
[    0.938772]  #0: ffff888101a91310 (&group->mutex){+.+.}-{4:4}, at: bus_iommu_probe+0x70/0x160
[    0.938790]  #1: ffff888101d1f1b8 (&domain->lock){....}-{3:3}, at: amd_iommu_attach_device+0xa5/0x700
[    0.938799]  #2: ffff888101cc3d18 (&dev_data->lock){....}-{3:3}, at: amd_iommu_attach_device+0xc5/0x700
[    0.938806]  #3: ffff888100052830 (&iommu->lock){....}-{2:2}, at: amd_iommu_iopf_add_device+0x3f/0xa0
[    0.938813]  #4: ffffffff8945a340 (console_lock){+.+.}-{0:0}, at: _printk+0x48/0x50
[    0.938822]  #5: ffffffff8945a390 (console_srcu){....}-{0:0}, at: console_flush_all+0x58/0x4e0
[    0.938867]  #6: ffffffff82459f80 (console_owner){....}-{0:0}, at: console_flush_all+0x1f0/0x4e0
[    0.938872] stack backtrace:
[    0.938874] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc1+ #1
[    0.938877] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.39 04/16/2019

Fix above issue by re-arranging code in attach device path:
  - move device PASID/IOPF enablement outside lock in AMD IOMMU driver.
    This is safe as core layer holds group->mutex lock before calling
    iommu_ops->attach_dev.

Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Fixes: c4cb231111 ("iommu/amd: Add support for enable/disable IOPF")
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240530084801.10758-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-04 14:00:59 +02:00
Vasant Hegde
48dc345a23 iommu/amd: Check EFR[EPHSup] bit before enabling PPR
Check for EFR[EPHSup] bit before enabling PPR. This bit must be set
to enable PPR.

Reported-by: Borislav Petkov <bp@alien8.de>
Fixes: c4cb231111 ("iommu/amd: Add support for enable/disable IOPF")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218900
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Jean-Christophe Guillain <jean-christophe@guillain.net>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20240530071118.10297-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-04 13:59:52 +02:00
Vasant Hegde
998a0a362b iommu/amd: Fix workqueue name
Workqueue name length is crossing WQ_NAME_LEN limit. Fix it by changing
name format. New format : "iopf_queue/amdvi-<iommu-devid>"

kernel warning:
  [   11.146912] workqueue: name exceeds WQ_NAME_LEN. Truncating to: iopf_queue/amdiommu-0xc002-iopf

Reported-by: Borislav Petkov <bp@alien8.de>
Fixes: 61928bab9d ("iommu/amd: Define per-IOMMU iopf_queue")
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240529113900.5798-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-04 13:58:38 +02:00
Lu Baolu
89e8a2366e iommu: Return right value in iommu_sva_bind_device()
iommu_sva_bind_device() should return either a sva bond handle or an
ERR_PTR value in error cases. Existing drivers (idxd and uacce) only
check the return value with IS_ERR(). This could potentially lead to
a kernel NULL pointer dereference issue if the function returns NULL
instead of an error pointer.

In reality, this doesn't cause any problems because iommu_sva_bind_device()
only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA.
In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will
return an error, and the device drivers won't call iommu_sva_bind_device()
at all.

Fixes: 26b25a2b98 ("iommu: Bind process address spaces to devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240528042528.71396-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-04 13:54:31 +02:00
Robin Murphy
cc8d89d063 iommu/dma: Fix domain init
Despite carefully rewording the kerneldoc to describe the new direct
interaction with dma_range_map, it seems I managed to confuse myself in
removing the redundant force_aperture check and ended up making the code
not do that at all. This led to dma_range_maps inadvertently being able
to set iovad->start_pfn = 0, and all the nonsensical chaos which ensues
from there. Restore the correct behaviour of constraining base_pfn to
the domain aperture regardless of dma_range_map, and not trying to apply
dma_range_map constraints to the basic IOVA domain since they will be
properly handled with reserved regions later.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Fixes: ad4750b07d ("iommu/dma: Make limit checks self-contained")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/721fa6baebb0924aa40db0b8fb86bcb4538434af.1716232484.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-04 13:54:09 +02:00
Kun(llfl)
a295ec52c8 iommu/amd: Fix sysfs leak in iommu init
During the iommu initialization, iommu_init_pci() adds sysfs nodes.
However, these nodes aren't remove in free_iommu_resources() subsequently.

Fixes: 39ab9555c2 ("iommu: Add sysfs bindings for struct iommu_device")
Signed-off-by: Kun(llfl) <llfl@linux.alibaba.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/c8e0d11c6ab1ee48299c288009cf9c5dae07b42d.1715215003.git.llfl@linux.alibaba.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-04 13:50:15 +02:00
Stefan Wahren
c3552ab19a staging: vchiq_debugfs: Fix NPD in vchiq_dump_state
The commit 42a2f6664e ("staging: vc04_services: Move global g_state
to vchiq_state") falsely assumed that the debugfs entry vchiq/state
was created with vchiq_instance as data. This causes now a NULL
pointer derefence while trying to dump the vchiq state. So fix
this by passing vchiq_state as data, because this is the relevant
part here.

Fixes: 42a2f6664e ("staging: vc04_services: Move global g_state to vchiq_state")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>
Link: https://lore.kernel.org/r/20240524151542.19415-1-wahrenst@gmx.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-04 13:36:44 +02:00
Jakub Kicinski
a535d59432 net: tls: fix marking packets as decrypted
For TLS offload we mark packets with skb->decrypted to make sure
they don't escape the host without getting encrypted first.
The crypto state lives in the socket, so it may get detached
by a call to skb_orphan(). As a safety check - the egress path
drops all packets with skb->decrypted and no "crypto-safe" socket.

The skb marking was added to sendpage only (and not sendmsg),
because tls_device injected data into the TCP stack using sendpage.
This special case was missed when sendpage got folded into sendmsg.

Fixes: c5c37af6ec ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04 12:58:50 +02:00
Ilpo Järvinen
f8367a74ae EDAC/igen6: Convert PCIBIOS_* return codes to errnos
errcmd_enable_error_reporting() uses pci_{read,write}_config_word()
that return PCIBIOS_* codes. The return code is then returned all the
way into the probe function igen6_probe() that returns it as is. The
probe functions, however, should return normal errnos.

Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from errcmd_enable_error_reporting().

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
2024-06-04 11:29:52 +02:00
Ilpo Järvinen
3ec8ebd8a5 EDAC/amd64: Convert PCIBIOS_* return codes to errnos
gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The return code is then returned all the way into the module
init function amd64_edac_init() that returns it as is. The module init
functions, however, should return normal errnos.

Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from gpu_get_node_map().

For consistency, convert also the other similar cases which return
PCIBIOS_* codes even if they do not have any bugs at the moment.

Fixes: 4251566ebc ("EDAC/amd64: Cache and use GPU node map")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-1-ilpo.jarvinen@linux.intel.com
2024-06-04 11:24:16 +02:00
Nikita Zhandarovich
4aa2dcfbad HID: core: remove unnecessary WARN_ON() in implement()
Syzkaller hit a warning [1] in a call to implement() when trying
to write a value into a field of smaller size in an output report.

Since implement() already has a warn message printed out with the
help of hid_warn() and value in question gets trimmed with:
	...
	value &= m;
	...
WARN_ON may be considered superfluous. Remove it to suppress future
syzkaller triggers.

[1]
WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 implement drivers/hid/hid-core.c:1451 [inline]
WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863
Modules linked in:
CPU: 0 PID: 5084 Comm: syz-executor424 Not tainted 6.9.0-rc7-syzkaller-00183-gcf87f46fd34d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:implement drivers/hid/hid-core.c:1451 [inline]
RIP: 0010:hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863
...
Call Trace:
 <TASK>
 __usbhid_submit_report drivers/hid/usbhid/hid-core.c:591 [inline]
 usbhid_submit_report+0x43d/0x9e0 drivers/hid/usbhid/hid-core.c:636
 hiddev_ioctl+0x138b/0x1f00 drivers/hid/usbhid/hiddev.c:726
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:904 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...

Fixes: 95d1c8951e ("HID: simplify implement() a bit")
Reported-by: <syzbot+5186630949e3c55f0799@syzkaller.appspotmail.com>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-06-04 09:49:35 +02:00
Jakub Kicinski
d630180260 Merge tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:

====================
wireless fixes for v6.10-rc3

The first fixes for v6.10. And we have a big one, I suspect the
biggest wireless pull request we ever had. There are fixes all over,
both in stack and drivers. Likely the most important here are mt76 not
working on mt7615 devices, ath11k not being able to connect to 6 GHz
networks and rtlwifi suffering from packet loss. But of course there's
much more.

* tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (37 commits)
  wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
  wifi: mt76: mt7615: add missing chanctx ops
  wifi: wilc1000: document SRCU usage instead of SRCU
  Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section"
  Revert "wifi: wilc1000: convert list management to RCU"
  wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
  wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
  wifi: mac80211: fix Spatial Reuse element size check
  wifi: iwlwifi: mvm: don't read past the mfuart notifcation
  wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill
  wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
  wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option
  wifi: iwlwifi: mvm: handle BA session teardown in RF-kill
  wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd
  wifi: iwlwifi: mvm: remove stale STA link data during restart
  wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
  wifi: iwlwifi: mvm: set properly mac header
  wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
  wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup
  wifi: iwlwifi: mvm: fix a crash on 7265
  ...
====================

Link: https://lore.kernel.org/r/20240603115129.9494CC2BD10@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:52:24 -07:00
Jeff Johnson
c6cab01d7e lib/test_rhashtable: add missing MODULE_DESCRIPTION() macro
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_rhashtable.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240531-md-lib-test_rhashtable-v1-1-cd6d4138f1b6@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:51:18 -07:00
Jakub Kicinski
d730a42ca6 Merge branch 'dst_cache-fix-possible-races'
Eric Dumazet says:

====================
dst_cache: fix possible races

This series is inspired by various undisclosed syzbot
reports hinting at corruptions in dst_cache structures.

It seems at least four users of dst_cache are racy against
BH reentrancy.

Last patch is adding a DEBUG_NET check to catch future misuses.
====================

Link: https://lore.kernel.org/r/20240531132636.2637995-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:50:14 -07:00
Eric Dumazet
2fe6fb36c7 net: dst_cache: add two DEBUG_NET warnings
After fixing four different bugs involving dst_cache
users, it might be worth adding a check about BH being
blocked by dst_cache callers.

DEBUG_NET_WARN_ON_ONCE(!in_softirq());

It is not fatal, if we missed valid case where no
BH deadlock is to be feared, we might change this.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:50:09 -07:00
Eric Dumazet
cf28ff8e4c ila: block BH in ila_output()
As explained in commit 1378817486 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.

ila_output() is called from lwtunnel_output()
possibly from process context, and under rcu_read_lock().

We might be interrupted by a softirq, re-enter ila_output()
and corrupt dst_cache data structures.

Fix the race by using local_bh_disable().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:50:09 -07:00
Eric Dumazet
c0b98ac1cc ipv6: sr: block BH in seg6_output_core() and seg6_input_core()
As explained in commit 1378817486 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.

Disabling preemption in seg6_output_core() is not good enough,
because seg6_output_core() is called from process context,
lwtunnel_output() only uses rcu_read_lock().

We might be interrupted by a softirq, re-enter seg6_output_core()
and corrupt dst_cache data structures.

Fix the race by using local_bh_disable() instead of
preempt_disable().

Apply a similar change in seg6_input_core().

Fixes: fa79581ea6 ("ipv6: sr: fix several BUGs when preemption is enabled")
Fixes: 6c8702c60b ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Lebrun <dlebrun@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:50:08 -07:00
Eric Dumazet
db0090c6eb net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input()
As explained in commit 1378817486 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.

Disabling preemption in rpl_output() is not good enough,
because rpl_output() is called from process context,
lwtunnel_output() only uses rcu_read_lock().

We might be interrupted by a softirq, re-enter rpl_output()
and corrupt dst_cache data structures.

Fix the race by using local_bh_disable() instead of
preempt_disable().

Apply a similar change in rpl_input().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <aahringo@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:50:08 -07:00
Eric Dumazet
2fe40483ec ipv6: ioam: block BH from ioam6_output()
As explained in commit 1378817486 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.

Disabling preemption in ioam6_output() is not good enough,
because ioam6_output() is called from process context,
lwtunnel_output() only uses rcu_read_lock().

We might be interrupted by a softirq, re-enter ioam6_output()
and corrupt dst_cache data structures.

Fix the race by using local_bh_disable() instead of
preempt_disable().

Fixes: 8cb3bf8bff ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Justin Iurman <justin.iurman@uliege.be>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:50:08 -07:00
Matthias Stocker
ffbe335b8d vmxnet3: disable rx data ring on dma allocation failure
When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base,
the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset
rq->data_ring.desc_size for the data ring that failed, which presumably
causes the hypervisor to reference it on packet reception.

To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell
the hypervisor to disable this feature.

[   95.436876] kernel BUG at net/core/skbuff.c:207!
[   95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1
[   95.441558] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[   95.443481] RIP: 0010:skb_panic+0x4d/0x4f
[   95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50
ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9
ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24
[   95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246
[   95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f
[   95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[   95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60
[   95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000
[   95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0
[   95.455682] FS:  0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000
[   95.457178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0
[   95.459791] Call Trace:
[   95.460515]  <IRQ>
[   95.461180]  ? __die_body.cold+0x19/0x27
[   95.462150]  ? die+0x2e/0x50
[   95.462976]  ? do_trap+0xca/0x110
[   95.463973]  ? do_error_trap+0x6a/0x90
[   95.464966]  ? skb_panic+0x4d/0x4f
[   95.465901]  ? exc_invalid_op+0x50/0x70
[   95.466849]  ? skb_panic+0x4d/0x4f
[   95.467718]  ? asm_exc_invalid_op+0x1a/0x20
[   95.468758]  ? skb_panic+0x4d/0x4f
[   95.469655]  skb_put.cold+0x10/0x10
[   95.470573]  vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3]
[   95.471853]  vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3]
[   95.473185]  __napi_poll+0x2b/0x160
[   95.474145]  net_rx_action+0x2c6/0x3b0
[   95.475115]  handle_softirqs+0xe7/0x2a0
[   95.476122]  __irq_exit_rcu+0x97/0xb0
[   95.477109]  common_interrupt+0x85/0xa0
[   95.478102]  </IRQ>
[   95.478846]  <TASK>
[   95.479603]  asm_common_interrupt+0x26/0x40
[   95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20
[   95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[   95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246
[   95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000
[   95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001
[   95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3
[   95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260
[   95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000
[   95.495035]  acpi_safe_halt+0x14/0x20
[   95.496127]  acpi_idle_do_entry+0x2f/0x50
[   95.497221]  acpi_idle_enter+0x7f/0xd0
[   95.498272]  cpuidle_enter_state+0x81/0x420
[   95.499375]  cpuidle_enter+0x2d/0x40
[   95.500400]  do_idle+0x1e5/0x240
[   95.501385]  cpu_startup_entry+0x29/0x30
[   95.502422]  start_secondary+0x11c/0x140
[   95.503454]  common_startup_64+0x13e/0x141
[   95.504466]  </TASK>
[   95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback
vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock
sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev
loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel
sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi
ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
scsi_dh_alua ip6_tables ip_tables fuse
[   95.516536] ---[ end trace 0000000000000000 ]---

Fixes: 6f4833383e ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()")
Signed-off-by: Matthias Stocker <mstocker@barracuda.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com>
Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03 18:49:33 -07:00
Linus Torvalds
2ab7951410 Merge tag 'cxl-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:

 - Compile fix for cxl-test from missing linux/vmalloc.h

 - Fix for memregion leaks in devm_cxl_add_region()

* tag 'cxl-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/region: Fix memregion leaks in devm_cxl_add_region()
  cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c
2024-06-03 14:42:41 -07:00
Johan Hovold
78f0dfa64c iio: inkern: fix channel read regression
A recent "cleanup" broke IIO channel read outs and thereby thermal
mitigation on the Lenovo ThinkPad X13s by returning zero instead of the
expected IIO value type in iio_read_channel_processed_scale():

	thermal thermal_zone12: failed to read out thermal zone (-22)

Fixes: 3092bde731 ("iio: inkern: move to the cleanup.h magic")
Cc: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240530074416.13697-1-johan+linaro@kernel.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-03 20:29:31 +01:00
Jean-Baptiste Maneyrol
8844ed0a6e iio: imu: inv_mpu6050: stabilized timestamping in interrupt
Use IRQ ONESHOT flag to ensure the timestamp is not updated in the
hard handler during the thread handler. And use a fixed value of 1
sample that correspond to this first timestamp.

This way we can ensure the timestamp is always corresponding to the
value used by the timestamping mechanism. Otherwise, it is possible
that between FIFO count read and FIFO processing the timestamp is
overwritten in the hard handler.

Fixes: 111e1abd00 ("iio: imu: inv_mpu6050: use the common inv_sensors timestamp module")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Link: https://lore.kernel.org/r/20240527150117.608792-1-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-03 19:05:56 +01:00
Dumitru Ceclan
182bc496dc iio: adc: ad7173: Fix sampling frequency setting
This patch fixes two issues regarding the sampling frequency setting:
-The attribute was set as per device, not per channel. As such, when
 setting the sampling frequency, the configuration was always done for
 the slot 0, and the correct configuration was applied on the next
 channel configuration call by the LRU mechanism.
-The LRU implementation does not take into account external settings of
 the slot registers. When setting the sampling frequency directly to a
 slot register in write_raw(), there is no guarantee that other channels
 were not also using that slot and now incorrectly retain their config
 as live.

Set the sampling frequency attribute as separate in the channel templates.
Do not set the sampling directly to the slot register in write_raw(),
just mark the config as not live and let the LRU mechanism handle it.
As the reg variable is no longer used, remove it.

Fixes: 76a1e6a428 ("iio: adc: ad7173: add AD7173 driver")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com>
Link: https://lore.kernel.org/r/20240530-ad7173-fixes-v3-5-b85f33079e18@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-03 19:05:45 +01:00
Dumitru Ceclan
18befe4a28 iio: adc: ad7173: Clear append status bit
The previous value of the append status bit was not cleared before
setting the new value. This caused the bit to remain set after enabling
buffered mode for multiple channels and not permit further buffered
reads from a single channel after the fact.

Fixes: 76a1e6a428 ("iio: adc: ad7173: add AD7173 driver")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com>
Link: https://lore.kernel.org/r/20240530-ad7173-fixes-v3-4-b85f33079e18@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-03 19:05:19 +01:00
Arnaldo Carvalho de Melo
d6283b160a tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL
To pick the changes from:

  2a82bb0294 ("statx: stx_subvol")

This silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlnK2Fmx_gahzwZI@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-03 14:44:28 -03:00
Paolo Bonzini
b50788f7cd Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.10, take #1

- No need to use mask when hart-index-bits is 0
- Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
2024-06-03 13:18:18 -04:00
Paolo Bonzini
b3233c737e Merge branch 'kvm-fixes-6.10-1' into HEAD
* Fixes and debugging help for the #VE sanity check.  Also disable
  it by default, even for CONFIG_DEBUG_KERNEL, because it was found
  to trigger spuriously (most likely a processor erratum as the
  exact symptoms vary by generation).

* Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
  situation (GIF=0 or interrupt shadow) when the processor supports
  virtual NMI.  While generally KVM will not request an NMI window
  when virtual NMIs are supported, in this case it *does* have to
  single-step over the interrupt shadow or enable the STGI intercept,
  in order to deliver the latched second NMI.

* Drop support for hand tuning APIC timer advancement from userspace.
  Since we have adaptive tuning, and it has proved to work well,
  drop the module parameter for manual configuration and with it a
  few stupid bugs that it had.
2024-06-03 13:18:08 -04:00
Sean Christopherson
89a58812c4 KVM: x86: Drop support for hand tuning APIC timer advancement from userspace
Remove support for specifying a static local APIC timer advancement value,
and instead present a read-only boolean parameter to let userspace enable
or disable KVM's dynamic APIC timer advancement.  Realistically, it's all
but impossible for userspace to specify an advancement that is more
precise than what KVM's adaptive tuning can provide.  E.g. a static value
needs to be tuned for the exact hardware and kernel, and if KVM is using
hrtimers, likely requires additional tuning for the exact configuration of
the entire system.

Dropping support for a userspace provided value also fixes several flaws
in the interface.  E.g. KVM interprets a negative value other than -1 as a
large advancement, toggling between a negative and positive value yields
unpredictable behavior as vCPUs will switch from dynamic to static
advancement, changing the advancement in the middle of VM creation can
result in different values for vCPUs within a VM, etc.  Those flaws are
mostly fixable, but there's almost no justification for taking on yet more
complexity (it's minimal complexity, but still non-zero).

The only arguments against using KVM's adaptive tuning is if a setup needs
a higher maximum, or if the adjustments are too reactive, but those are
arguments for letting userspace control the absolute max advancement and
the granularity of each adjustment, e.g. similar to how KVM provides knobs
for halt polling.

Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com
Cc: Shuling Zhou <zhoushuling@huawei.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240522010304.1650603-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:08:05 -04:00
Ravi Bangoria
b7e4be0a22 KVM: SEV-ES: Delegate LBR virtualization to the processor
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. Although KVM currently enforces LBRV for SEV-ES guests, there
are multiple issues with it:

o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR
  interception is used to dynamically toggle LBRV for performance reasons,
  this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3:

  [guest ~]# wrmsr 0x1d9 0x4
  KVM: entry failed, hardware error 0xffffffff
  EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000

  Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
  No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
  is of swap type A.

o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the
  VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA
  encryption.

[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
     2023, Vol 2, 15.35.2 Enabling SEV-ES.
     https://bugzilla.kernel.org/attachment.cgi?id=304653

Fixes: 376c6d2850 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Co-developed-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:07:18 -04:00
Ravi Bangoria
d922056215 KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. So, prevent SEV-ES guests when LBRV support is missing.

[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
     2023, Vol 2, 15.35.2 Enabling SEV-ES.
     https://bugzilla.kernel.org/attachment.cgi?id=304653

Fixes: 376c6d2850 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:06:48 -04:00
Nikunj A Dadhania
27bd5fdc24 KVM: SEV-ES: Prevent MSR access post VMSA encryption
KVM currently allows userspace to read/write MSRs even after the VMSA is
encrypted. This can cause unintentional issues if MSR access has side-
effects. For ex, while migrating a guest, userspace could attempt to
migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on
the target. Fix this by preventing access to those MSRs which are context
switched via the VMSA, once the VMSA is encrypted.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03 13:06:48 -04:00
Linus Torvalds
f06ce44145 Merge tag 'loongarch-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
 "Some bootloader interface fixes, a dts fix, and a trivial cleanup"

* tag 'loongarch-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Fix GMAC's phy-mode definitions in dts
  LoongArch: Override higher address bits in JUMP_VIRT_ADDR
  LoongArch: Fix entry point in kernel image header
  LoongArch: Add all CPUs enabled by fdt to NUMA node 0
  LoongArch: Fix built-in DTB detection
  LoongArch: Remove CONFIG_ACPI_TABLE_UPGRADE in platform_init()
2024-06-03 09:27:45 -07:00
Hagar Hemdan
b97e8a2f71 irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()
its_vlpi_prop_update() calls lpi_write_config() which obtains the
mapping information for a VLPI without lock held. So it could race
with its_vlpi_unmap().

Since all calls from its_irq_set_vcpu_affinity() require the same
lock to be held, hoist the locking there instead of sprinkling the
locking all over the place.

This bug was discovered using Coverity Static Analysis Security Testing
(SAST) by Synopsys, Inc.

[ tglx: Use guard() instead of goto ]

Fixes: 015ec0386a ("irqchip/gic-v3-its: Add VLPI configuration handling")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240531162144.28650-1-hagarhem@amazon.com
2024-06-03 18:20:00 +02:00
Cong Wang
2884dc7d08 bpf: Fix a potential use-after-free in bpf_link_free()
After commit 1a80dbcb2d, bpf_link can be freed by
link->ops->dealloc_deferred, but the code still tests and uses
link->ops->dealloc afterward, which leads to a use-after-free as
reported by syzbot. Actually, one of them should be sufficient, so
just call one of them instead of both. Also add a WARN_ON() in case
of any problematic implementation.

Fixes: 1a80dbcb2d ("bpf: support deferring bpf_link dealloc to after RCU grace period")
Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com
2024-06-03 18:16:19 +02:00
Srinivas Pandruvada
1e24c31351 cpufreq: intel_pstate: Fix unchecked HWP MSR access
Fix unchecked MSR access error for processors with no HWP support. On
such processors, maximum frequency can be changed by the system firmware
using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results
in accessing HWP MSR 0x771.

Call Trace:
	<TASK>
	generic_exec_single+0x58/0x120
	smp_call_function_single+0xbf/0x110
	rdmsrl_on_cpu+0x46/0x60
	intel_pstate_get_hwp_cap+0x1b/0x70
	intel_pstate_update_limits+0x2a/0x60
	acpi_processor_notify+0xb7/0x140
	acpi_ev_notify_dispatch+0x3b/0x60

HWP MSR 0x771 can be only read on a CPU which supports HWP and enabled.
Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is
true.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/
Fixes: e8217b4bec ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently")
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03 18:00:23 +02:00
David Kaplan
93c1800b37 x86/kexec: Fix bug with call depth tracking
The call to cc_platform_has() triggers a fault and system crash if call depth
tracking is active because the GS segment has been reset by load_segments() and
GS_BASE is now 0 but call depth tracking uses per-CPU variables to operate.

Call cc_platform_has() earlier in the function when GS is still valid.

  [ bp: Massage. ]

Fixes: 5d8213864a ("x86/retbleed: Add SKL return thunk")
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240603083036.637-1-bp@kernel.org
2024-06-03 17:19:03 +02:00
Thorsten Blum
2317dc2c22 bpf, devmap: Remove unnecessary if check in for loop
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:

	ERROR: iterator variable bound on line 762 cannot be NULL

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com
2024-06-03 17:09:23 +02:00
Palmer Dabbelt
e2c79b4c5c Revert "riscv: mm: accelerate pagefault when badaccess"
I accidentally picked up an earlier version of this patch, which had
already landed via mm.  The patch  I picked up contains a bug, which I
kept as I thought it was a fix.  So let's just revert it.

This reverts commit 4c6c002042.

Fixes: 4c6c002042 ("riscv: mm: accelerate pagefault when badaccess")
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240530164451.21336-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-03 07:41:13 -07:00
Nam Cao
994af1825a riscv: fix overlap of allocated page and PTR_ERR
On riscv32, it is possible for the last page in virtual address space
(0xfffff000) to be allocated. This page overlaps with PTR_ERR, so that
shouldn't happen.

There is already some code to ensure memblock won't allocate the last page.
However, buddy allocator is left unchecked.

Fix this by reserving physical memory that would be mapped at virtual
addresses greater than 0xfffff000.

Reported-by: Björn Töpel <bjorn@kernel.org>
Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.to.us
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: <stable@vger.kernel.org>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240425115201.3044202-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-03 07:41:09 -07:00
Tetsuo Handa
c6144a2116 tomoyo: update project links
TOMOYO project has moved to SourceForge.net .

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2024-06-03 22:43:11 +09:00
Gao Xiang
3d117494e2 cachefiles: remove unneeded include of <linux/fdtable.h>
close_fd() has been killed, let's get rid of unneeded
<linux/fdtable.h> as Al Viro pointed out [1].

[1] https://lore.kernel.org/r/20240603034055.GI1629371@ZenIV

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240603062344.818290-1-hsiangkao@linux.alibaba.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-03 15:39:17 +02:00
Chuck Lever
4a77c3dead SUNRPC: Fix loop termination condition in gss_free_in_token_pages()
The in_token->pages[] array is not NULL terminated. This results in
the following KASAN splat:

  KASAN: maybe wild-memory-access in range [0x04a2013400000008-0x04a201340000000f]

Fixes: bafa6b4d95 ("SUNRPC: Fix gss_free_in_token_pages()")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-06-03 09:07:55 -04:00
Matthias Schiffer
90dd7de4ef gpio: tqmx86: fix broken IRQ_TYPE_EDGE_BOTH interrupt type
The TQMx86 GPIO controller only supports falling and rising edge
triggers, but not both. Fix this by implementing a software both-edge
mode that toggles the edge type after every interrupt.

Fixes: b868db94a6 ("gpio: tqmx86: Add GPIO from for this IO controller")
Co-developed-by: Gregor Herburger <gregor.herburger@tq-group.com>
Signed-off-by: Gregor Herburger <gregor.herburger@tq-group.com>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/r/515324f0491c4d44f4ef49f170354aca002d81ef.1717063994.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-06-03 14:13:13 +02:00
Matthias Schiffer
08af509efd gpio: tqmx86: store IRQ trigger type and unmask status separately
irq_set_type() should not implicitly unmask the IRQ.

All accesses to the interrupt configuration register are moved to a new
helper tqmx86_gpio_irq_config(). We also introduce the new rule that
accessing irq_type must happen while locked, which will become
significant for fixing EDGE_BOTH handling.

Fixes: b868db94a6 ("gpio: tqmx86: Add GPIO from for this IO controller")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/r/6aa4f207f77cb58ef64ffb947e91949b0f753ccd.1717063994.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-06-03 14:13:13 +02:00
Matthias Schiffer
9d6a811b52 gpio: tqmx86: introduce shadow register for GPIO output value
The TQMx86 GPIO controller uses the same register address for input and
output data. Reading the register will always return current inputs
rather than the previously set outputs (regardless of the current
direction setting). Therefore, using a RMW pattern does not make sense
when setting output values. Instead, the previously set output register
value needs to be stored as a shadow register.

As there is no reliable way to get the current output values from the
hardware, also initialize all channels to 0, to ensure that stored and
actual output values match. This should usually not have any effect in
practise, as the TQMx86 UEFI sets all outputs to 0 during boot.

Also prepare for extension of the driver to more than 8 GPIOs by using
DECLARE_BITMAP.

Fixes: b868db94a6 ("gpio: tqmx86: Add GPIO from for this IO controller")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/d0555933becd45fa92a85675d26e4d59343ddc01.1717063994.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-06-03 14:13:13 +02:00
Gregor Herburger
8c219e52ca gpio: tqmx86: fix typo in Kconfig label
Fix description for GPIO_TQMX86 from QTMX86 to TQMx86.

Fixes: b868db94a6 ("gpio: tqmx86: Add GPIO from for this IO controller")
Signed-off-by: Gregor Herburger <gregor.herburger@tq-group.com>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/e0e38c9944ad6d281d9a662a45d289b88edc808e.1717063994.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-06-03 14:13:13 +02:00
Samuel Holland
e306a894bd irqchip/sifive-plic: Chain to parent IRQ after handlers are ready
Now that the PLIC uses a platform driver, the driver is probed later in the
boot process, where interrupts from peripherals might already be pending.

As a result, plic_handle_irq() may be called as early as the call to
irq_set_chained_handler() completes. But this call happens before the
per-context handler is completely set up, so there is a window where
plic_handle_irq() can see incomplete per-context state and crash.

Avoid this by delaying the call to irq_set_chained_handler() until all
handlers from all PLICs are initialized.

Fixes: 8ec99b0331 ("irqchip/sifive-plic: Convert PLIC driver into a platform driver")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Anup Patel <anup@brainfault.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240529215458.937817-1-samuel.holland@sifive.com
Closes: https://lore.kernel.org/r/CAMuHMdVYFFR7K5SbHBLY-JHhb7YpgGMS_hnRWm8H0KD-wBo+4A@mail.gmail.com/
2024-06-03 13:53:12 +02:00
Tristram Ha
6149db4997 net: phy: micrel: fix KSZ9477 PHY issues after suspend/resume
When the PHY is powered up after powered down most of the registers are
reset, so the PHY setup code needs to be done again.  In addition the
interrupt register will need to be setup again so that link status
indication works again.

Fixes: 26dd2974c5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03 12:06:29 +01:00
Sunil V L
0110c4b110 irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails
When riscv_intc_init_common() fails, the firmware node allocated is not
freed. Add the missing free().

Fixes: 7023b9d83f ("irqchip/riscv-intc: Add ACPI support")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527081113.616189-1-sunilvl@ventanamicro.com
2024-06-03 12:29:35 +02:00
Suma Hegde
77f1972bdc platform/x86/amd/hsmp: Check HSMP support on AMD family of processors
HSMP interface is supported only on few x86 processors from AMD.
Accessing HSMP registers on rest of the platforms might cause
unexpected behaviour. So add a check.

Also unavailability of this interface on rest of the processors
is not an error. Hence, use pr_info() instead of the pr_err() to
log the message.

Signed-off-by: Suma Hegde <suma.hegde@amd.com>
Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Link: https://lore.kernel.org/r/20240603081512.142909-1-suma.hegde@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-06-03 11:57:28 +02:00
Armin Wolf
306aec7eea platform/x86: dell-smbios: Simplify error handling
When the allocation of value_name fails, the error handling code
uses two gotos for error handling, which is not necessary.

Simplify the error handling in this case by only using a single goto.

Tested on a Dell Inspiron 3505.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240528204903.445546-2-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-06-03 11:54:29 +02:00
Armin Wolf
1981b296f8 platform/x86: dell-smbios: Fix wrong token data in sysfs
When reading token data from sysfs on my Inspiron 3505, the token
locations and values are wrong. This happens because match_attribute()
blindly assumes that all entries in da_tokens have an associated
entry in token_attrs.

This however is not true as soon as da_tokens[] contains zeroed
token entries. Those entries are being skipped when initialising
token_attrs, breaking the core assumption of match_attribute().

Fix this by defining an extra struct for each pair of token attributes
and use container_of() to retrieve token information.

Tested on a Dell Inspiron 3050.

Fixes: 33b9ca1e53 ("platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240528204903.445546-1-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-06-03 11:53:54 +02:00
Arnd Bergmann
078fc56f5c platform/x86: yt2-1380: add CONFIG_EXTCON dependency
This driver uses the extcon subsystem and fails to build
when it cannot call into that subsystem:

x86_64-linux-ld: vmlinux.o: in function `yt2_1380_fc_worker':
lenovo-yoga-tab2-pro-1380-fastcharger.c:(.text+0xa9d819): undefined reference to `extcon_get_state'
x86_64-linux-ld: lenovo-yoga-tab2-pro-1380-fastcharger.c:(.text+0xa9d853): undefined reference to `extcon_get_state'
x86_64-linux-ld: vmlinux.o: in function `yt2_1380_fc_serdev_probe':
lenovo-yoga-tab2-pro-1380-fastcharger.c:(.text+0xa9da22): undefined reference to `extcon_get_extcon_dev'
x86_64-linux-ld: lenovo-yoga-tab2-pro-1380-fastcharger.c:(.text+0xa9dc0c): undefined reference to `devm_extcon_register_notifier_all'

Add a Kconfig dependency to make it it always builds correctly.

Fixes: b2ed33e8d4 ("platform/x86: Add lenovo-yoga-tab2-pro-1380-fastcharger driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240528115940.3169455-1-arnd@kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-06-03 11:43:15 +02:00
Andy Shevchenko
55624db051 platform/x86: touchscreen_dmi: Use 2-argument strscpy()
Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-8-andy.shevchenko@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-06-03 11:12:38 +02:00
Hans de Goede
84b26f509c platform/x86: touchscreen_dmi: Drop "silead,max-fingers" property
The silead touchscreen driver now defaults to 10 fingers, so it is no
longer necessary to have a "silead,max-fingers=10" property for each
silead touchscreen model. Drop this property from all the configs.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240525193854.39130-3-hdegoede@redhat.com
2024-06-03 11:10:46 +02:00
Hans de Goede
38a38f5a36 Input: silead - Always support 10 fingers
When support for Silead touchscreens was orginal added some touchscreens
with older firmware versions only supported 5 fingers and this was made
the default requiring the setting of a "silead,max-fingers=10" uint32
device-property for all touchscreen models which do support 10 fingers.

There are very few models with the old 5 finger fw, so in practice the
setting of the "silead,max-fingers=10" is boilerplate which needs to
be copy and pasted to every touchscreen config.

Reporting that 10 fingers are supported on devices which only support
5 fingers doesn't cause any problems for userspace in practice, since
at max 4 finger gestures are supported anyways. Drop the max_fingers
configuration and simply always assume 10 fingers.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/20240525193854.39130-2-hdegoede@redhat.com
2024-06-03 11:10:08 +02:00
Chen Ni
629f2b4e05 drm/panel: sitronix-st7789v: Add check for of_drm_get_panel_orientation
Add check for the return value of of_drm_get_panel_orientation() and
return the error if it fails in order to catch the error.

Fixes: b27c0f6d20 ("drm/panel: sitronix-st7789v: add panel orientation support")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Michael Riesch <michael.riesch@wolfvision.net>
Acked-by: Jessica Zhang <quic_jesszhan@quicinc.com>
Link: https://lore.kernel.org/r/20240528030832.2529471-1-nichen@iscas.ac.cn
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240528030832.2529471-1-nichen@iscas.ac.cn
2024-06-03 10:46:36 +02:00
Huacai Chen
eb36e520f4 LoongArch: Fix GMAC's phy-mode definitions in dts
The GMAC of Loongson chips cannot insert the correct 1.5-2ns delay. So
we need the PHY to insert internal delays for both transmit and receive
data lines from/to the PHY device. Fix this by changing the "phy-mode"
from "rgmii" to "rgmii-id" in dts.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Jiaxun Yang
1098efd299 LoongArch: Override higher address bits in JUMP_VIRT_ADDR
In JUMP_VIRT_ADDR we are performing an or calculation on address value
directly from pcaddi.

This will only work if we are currently running from direct 1:1 mapping
addresses or firmware's DMW is configured exactly same as kernel. Still,
we should not rely on such assumption.

Fix by overriding higher bits in address comes from pcaddi, so we can
get rid of or operator.

Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Jiaxun Yang
beb2800074 LoongArch: Fix entry point in kernel image header
Currently kernel entry in head.S is in DMW address range, firmware is
instructed to jump to this address after loading the kernel image.

However kernel should not make any assumption on firmware's DMW
setting, thus the entry point should be a physical address falls into
direct translation region.

Fix by converting entry address to physical and amend entry calculation
logic in libstub accordingly.

BTW, use ABSOLUTE() to calculate variables to make Clang/LLVM happy.

Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Jiaxun Yang
3de9c42d02 LoongArch: Add all CPUs enabled by fdt to NUMA node 0
NUMA enabled kernel on FDT based machine fails to boot because CPUs
are all in NUMA_NO_NODE and mm subsystem won't accept that.

Fix by adding them to default NUMA node at FDT parsing phase and move
numa_add_cpu(0) to a later point.

Cc: stable@vger.kernel.org
Fixes: 88d4d957ed ("LoongArch: Add FDT booting support from efi system table")
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Jiaxun Yang
b56f67a6c7 LoongArch: Fix built-in DTB detection
fdt_check_header(__dtb_start) will always success because kernel
provides a dummy dtb, and by coincidence __dtb_start clashed with
entry of this dummy dtb. The consequence is fdt passed from firmware
will never be taken.

Fix by trying to utilise __dtb_start only when CONFIG_BUILTIN_DTB is
enabled.

Cc: stable@vger.kernel.org
Fixes: 7b937cc243 ("of: Create of_root if no dtb provided by firmware")
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Tiezhu Yang
6c3ca6654a LoongArch: Remove CONFIG_ACPI_TABLE_UPGRADE in platform_init()
Both acpi_table_upgrade() and acpi_boot_table_init() are defined as
empty functions under !CONFIG_ACPI_TABLE_UPGRADE and !CONFIG_ACPI in
include/linux/acpi.h, there are no implicit declaration errors with
various configs.

  #ifdef CONFIG_ACPI_TABLE_UPGRADE
  void acpi_table_upgrade(void);
  #else
  static inline void acpi_table_upgrade(void) { }
  #endif

  #ifdef	CONFIG_ACPI
  ...
  void acpi_boot_table_init (void);
  ...
  #else	/* !CONFIG_ACPI */
  ...
  static inline void acpi_boot_table_init(void)
  {
  }
  ...
  #endif	/* !CONFIG_ACPI */

As Huacai suggested, CONFIG_ACPI_TABLE_UPGRADE is ugly and not necessary
here, just remove it. At the same time, just keep CONFIG_ACPI to prevent
potential build errors in future, and give a signal to indicate the code
is ACPI-specific. For the same reason, we also put acpi_table_upgrade()
under CONFIG_ACPI.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Wolfram Sang
c4aff1d1ec Merge tag 'i2c-host-6.10-pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Removed the SPD class of i2c devices from the device core.

Additionally, a cleanup in the Synquacer code removes the pclk
from the global structure, as it is used only in the probe.
Therefore, it is now declared locally.
2024-06-03 08:51:53 +02:00
Linus Torvalds
c3f38fa61a Linux 6.10-rc2 2024-06-02 15:44:56 -07:00
Linus Torvalds
58d89ee81a Merge tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:

 - Add a quirk for three different devices that have shown issues with
   LPM (link power management). These devices appear to not implement
   LPM properly, since we see command timeouts when enabling LPM. The
   quirk disables LPM for these problematic devices. (Me)

 - Do not apply the Intel PCS quirk on Alder Lake. The quirk is not
   needed and was originally added by mistake when LPM support was
   enabled for this AHCI controller. Enabling the quirk when not needed
   causes the the controller to not be able to detect the connected
   devices on some platforms.

* tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
  ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
  ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
  ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
2024-06-02 13:30:53 -07:00
Linus Torvalds
a693b9c95a Merge tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Miscellaneous topology parsing fixes:

   - Fix topology parsing regression on older CPUs in the new AMD/Hygon
     parser

   - Fix boot crash on odd Intel Quark and similar CPUs that do not fill
     out cpuinfo_x86::x86_clflush_size and zero out
     cpuinfo_x86::x86_cache_alignment as a result.

     Provide 32 bytes as a general fallback value.

   - Fix topology enumeration on certain rare CPUs where the BIOS locks
     certain CPUID leaves and the kernel unlocked them late, which broke
     with the new topology parsing code. Factor out this unlocking logic
     and move it earlier in the parsing sequence"

* tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology/intel: Unlock CPUID before evaluating anything
  x86/cpu: Provide default cache line size if not enumerated
  x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
2024-06-02 09:32:34 -07:00
Linus Torvalds
3fca58ffad Merge tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Export a symbol to make life easier for instrumentation/debugging"

* tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/x86: Export 'percpu arch_freq_scale'
2024-06-02 09:23:35 -07:00
Linus Torvalds
efa8f11a7e Merge tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fix from Ingo Molnar:
 "Add missing MODULE_DESCRIPTION() lines"

* tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
  perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
2024-06-02 09:20:37 -07:00
Linus Torvalds
00a8c352dd Merge tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:

 - scsi: mpt3sas: Avoid possible run-time warning with long manufacturer
   strings

 - mailmap: update entry for Kees Cook

 - kunit/fortify: Remove __kmalloc_node() test

* tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kunit/fortify: Remove __kmalloc_node() test
  mailmap: update entry for Kees Cook
  scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings
2024-06-02 09:15:28 -07:00
Jean-Baptiste Maneyrol
245f3b149e iio: imu: inv_icm42600: delete unneeded update watermark call
Update watermark will be done inside the hwfifo_set_watermark callback
just after the update_scan_mode. It is useless to do it here.

Fixes: 7f85e42a6c ("iio: imu: inv_icm42600: add buffer support in iio devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Link: https://lore.kernel.org/r/20240527210008.612932-1-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-02 13:16:02 +01:00
Jean-Baptiste Maneyrol
d7bd473632 iio: imu: inv_icm42600: stabilized timestamp in interrupt
Use IRQF_ONESHOT flag to ensure the timestamp is not updated in the
hard handler during the thread handler. And compute and use the
effective watermark value that correspond to this first timestamp.

This way we can ensure the timestamp is always corresponding to the
value used by the timestamping mechanism. Otherwise, it is possible
that between FIFO count read and FIFO processing the timestamp is
overwritten in the hard handler.

Fixes: ec74ae9fd3 ("iio: imu: inv_icm42600: add accurate timestamping")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Link: https://lore.kernel.org/r/20240529154717.651863-1-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-06-02 11:56:40 +01:00
Linus Torvalds
83814698cf Merge tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Enforce full ordering for ATOMIC operations with BPF_FETCH

 - Fix uaccess build errors seen with GCC 13/14

 - Fix build errors on ppc32 due to ARCH_HAS_KERNEL_FPU_SUPPORT

 - Drop error message from lparcfg guest name lookup

Thanks to Christophe Leroy, Guenter Roeck, Nathan Lynch, Naveen N Rao,
Puranjay Mohan, and Samuel Holland.

* tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64
  powerpc/uaccess: Use YZ asm constraint for ld
  powerpc/uaccess: Fix build errors seen with GCC 13/14
  powerpc/pseries/lparcfg: drop error message from guest name lookup
  powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
2024-06-01 17:34:35 -07:00
Linus Torvalds
54bec8ed57 Merge tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
 "After merging a commit 1fffe7a34c ("script: modpost: emit a warning
  when the description is missing"), MODULE_DESCRIPTOR seems to be
  mandatory for kernel modules. In FireWire subsystem, the most of
  practical kernel modules have the field, while KUnit test modules do
  not. A single patch is applied to fix them"

* tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: add missing MODULE_DESCRIPTION() to test modules
2024-06-01 17:05:00 -07:00
Dmitry Safonov
33700a0c9b net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED
TCP_CLOSE may or may not have current/rnext keys and should not be
considered "established". The fast-path for TCP_CLOSE is
SKB_DROP_REASON_TCP_CLOSE. This is what tcp_rcv_state_process() does
anyways. Add an early drop path to not spend any time verifying
segment signatures for sockets in TCP_CLOSE state.

Cc: stable@vger.kernel.org # v6.7
Fixes: 0a3a809089 ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://lore.kernel.org/r/20240529-tcp_ao-sk_state-v1-1-d69b5d323c52@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 16:27:26 -07:00
DelphineCCChiu
e85e271dec net/ncsi: Fix the multi thread manner of NCSI driver
Currently NCSI driver will send several NCSI commands back to back without
waiting the response of previous NCSI command or timeout in some state
when NIC have multi channel. This operation against the single thread
manner defined by NCSI SPEC(section 6.3.2.3 in DSP0222_1.1.1)

According to NCSI SPEC(section 6.2.13.1 in DSP0222_1.1.1), we should probe
one channel at a time by sending NCSI commands (Clear initial state, Get
version ID, Get capabilities...), than repeat this steps until the max
number of channels which we got from NCSI command (Get capabilities) has
been probed.

Fixes: e6f44ed6d0 ("net/ncsi: Package and channel management")
Signed-off-by: DelphineCCChiu <delphine_cc_chiu@wiwynn.com>
Link: https://lore.kernel.org/r/20240529065856.825241-1-delphine_cc_chiu@wiwynn.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 16:21:44 -07:00
Jason Xing
8105378c0c net: rps: fix error when CONFIG_RFS_ACCEL is off
John Sperbeck reported that if we turn off CONFIG_RFS_ACCEL, the 'head'
is not defined, which will trigger compile error. So I move the 'head'
out of the CONFIG_RFS_ACCEL scope.

Fixes: 84b6823cd9 ("net: rps: protect last_qtail with rps_input_queue_tail_save() helper")
Reported-by: John Sperbeck <jsperbeck@google.com>
Closes: https://lore.kernel.org/all/20240529203421.2432481-1-jsperbeck@google.com/
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240530032717.57787-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 16:02:08 -07:00
Duoming Zhou
166fcf86cd ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put()
The object "ax25_dev" is managed by reference counting. Thus it should
not be directly released by kfree(), replace with ax25_dev_put().

Fixes: d01ffb9eee ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/20240530051733.11416-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:49:42 -07:00
Lars Kellogg-Stedman
3c34fb0bd4 ax25: Fix refcount imbalance on inbound connections
When releasing a socket in ax25_release(), we call netdev_put() to
decrease the refcount on the associated ax.25 device. However, the
execution path for accepting an incoming connection never calls
netdev_hold(). This imbalance leads to refcount errors, and ultimately
to kernel crashes.

A typical call trace for the above situation will start with one of the
following errors:

    refcount_t: decrement hit 0; leaking memory.
    refcount_t: underflow; use-after-free.

And will then have a trace like:

    Call Trace:
    <TASK>
    ? show_regs+0x64/0x70
    ? __warn+0x83/0x120
    ? refcount_warn_saturate+0xb2/0x100
    ? report_bug+0x158/0x190
    ? prb_read_valid+0x20/0x30
    ? handle_bug+0x3e/0x70
    ? exc_invalid_op+0x1c/0x70
    ? asm_exc_invalid_op+0x1f/0x30
    ? refcount_warn_saturate+0xb2/0x100
    ? refcount_warn_saturate+0xb2/0x100
    ax25_release+0x2ad/0x360
    __sock_release+0x35/0xa0
    sock_close+0x19/0x20
    [...]

On reboot (or any attempt to remove the interface), the kernel gets
stuck in an infinite loop:

    unregister_netdevice: waiting for ax0 to become free. Usage count = 0

This patch corrects these issues by ensuring that we call netdev_hold()
and ax25_dev_hold() for new connections in ax25_accept(). This makes the
logic leading to ax25_accept() match the logic for ax25_bind(): in both
cases we increment the refcount, which is ultimately decremented in
ax25_release().

Fixes: 9fd75b66b8 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
Tested-by: Duoming Zhou <duoming@zju.edu.cn>
Tested-by: Dan Cross <crossd@gmail.com>
Tested-by: Chris Maness <christopher.maness@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/20240529210242.3346844-2-lars@oddbit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:49:26 -07:00
Jakub Kicinski
45c0a209dc Merge branch 'virtio_net-fix-lock-warning-and-unrecoverable-state'
Heng Qi says:

====================
virtio_net: fix lock warning and unrecoverable state

Patch 1 describes and fixes an issue where dim cannot return to
normal state in certain scenarios.

Patch 2 attempts to resolve lockdep's complaints that holding many
nested locks.
====================

Link: https://lore.kernel.org/r/20240528134116.117426-1-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:14:13 -07:00
Heng Qi
d1f0bd01bc virtio_net: fix a spurious deadlock issue
When the following snippet is run, lockdep will report a deadlock[1].

  /* Acquire all queues dim_locks */
  for (i = 0; i < vi->max_queue_pairs; i++)
          mutex_lock(&vi->rq[i].dim_lock);

There's no deadlock here because the vq locks are always taken
in the same order, but lockdep can not figure it out. So refactoring
the code to alleviate the problem.

[1]
========================================================
WARNING: possible recursive locking detected
6.9.0-rc7+ #319 Not tainted
--------------------------------------------
ethtool/962 is trying to acquire lock:

but task is already holding lock:

other info that might help us debug this:
Possible unsafe locking scenario:

      CPU0
      ----
 lock(&vi->rq[i].dim_lock);
 lock(&vi->rq[i].dim_lock);

*** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by ethtool/962:
 #0: ffffffff82dbaab0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
 #1: ffffffff82dad0a8 (rtnl_mutex){+.+.}-{3:3}, at:
				ethnl_default_set_doit+0xbe/0x1e0

stack backtrace:
CPU: 6 PID: 962 Comm: ethtool Not tainted 6.9.0-rc7+ #319
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
	   rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x79/0xb0
 check_deadlock+0x130/0x220
 __lock_acquire+0x861/0x990
 lock_acquire.part.0+0x72/0x1d0
 ? lock_acquire+0xf8/0x130
 __mutex_lock+0x71/0xd50
 virtnet_set_coalesce+0x151/0x190
 __ethnl_set_coalesce.isra.0+0x3f8/0x4d0
 ethnl_set_coalesce+0x34/0x90
 ethnl_default_set_doit+0xdd/0x1e0
 genl_family_rcv_msg_doit+0xdc/0x130
 genl_family_rcv_msg+0x154/0x230
 ? __pfx_ethnl_default_set_doit+0x10/0x10
 genl_rcv_msg+0x4b/0xa0
 ? __pfx_genl_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x5a/0x110
 genl_rcv+0x28/0x40
 netlink_unicast+0x1af/0x280
 netlink_sendmsg+0x20e/0x460
 __sys_sendto+0x1fe/0x210
 ? find_held_lock+0x2b/0x80
 ? do_user_addr_fault+0x3a2/0x8a0
 ? __lock_release+0x5e/0x160
 ? do_user_addr_fault+0x3a2/0x8a0
 ? lock_release+0x72/0x140
 ? do_user_addr_fault+0x3a7/0x8a0
 __x64_sys_sendto+0x29/0x30
 do_syscall_64+0x78/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 4d4ac2ecec ("virtio_net: Add a lock for per queue RX coalesce")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240528134116.117426-3-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:14:06 -07:00
Heng Qi
9e0945b190 virtio_net: fix possible dim status unrecoverable
When the dim worker is scheduled, if it no longer needs to issue
commands, dim may not be able to return to the working state later.

For example, the following single queue scenario:
  1. The dim worker of rxq0 is scheduled, and the dim status is
     changed to DIM_APPLY_NEW_PROFILE;
  2. dim is disabled or parameters have not been modified;
  3. virtnet_rx_dim_work exits directly;

Then, even if net_dim is invoked again, it cannot work because the
state is not restored to DIM_START_MEASURE.

Fixes: 6208799553 ("virtio-net: support rx netdim")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240528134116.117426-2-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:14:06 -07:00
Vadim Fedorenko
89e281ebff ethtool: init tsinfo stats if requested
Statistic values should be set to ETHTOOL_STAT_NOT_SET even if the
device doesn't support statistics. Otherwise zeros will be returned as
if they are proper values:

host# ethtool -I -T lo
Time stamping parameters for lo:
Capabilities:
	software-transmit
	software-receive
	software-system-clock
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes: none
Hardware Receive Filter Modes: none
Statistics:
  tx_pkts: 0
  tx_lost: 0
  tx_err: 0

Fixes: 0e9c127729 ("ethtool: add interface to read Tx hardware timestamping statistics")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20240530040814.1014446-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:11:09 -07:00
Peter Geis
7679935b8b MAINTAINERS: remove Peter Geis
The Motorcomm PHY driver is now maintained by the OEM. The driver has
expanded far beyond my original purpose, and I do not have the hardware
to test against the new portions of it. Therefore I am removing myself as
a maintainer of the driver.

Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Link: https://lore.kernel.org/r/20240529185635.538072-1-pgwipeout@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:10:35 -07:00
Heng Qi
30636258a7 virtio_net: fix missing lock protection on control_buf access
Refactored the handling of control_buf to be within the cvq_lock
critical section, mitigating race conditions between reading device
responses and new command submissions.

Fixes: 6f45ab3e04 ("virtio_net: Add a lock for the command VQ.")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240530034143.19579-1-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01 15:09:56 -07:00
Linus Torvalds
89be4025b0 Merge tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Two small smb3 fixes:

   - Fix socket creation with sfu mount option (spotted by test generic/423)

   - Minor cleanup: fix missing description in two files"

* tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix creating sockets when using sfu mount options
  fs: smb: common: add missing MODULE_DESCRIPTION() macros
2024-06-01 14:35:57 -07:00
Jens Axboe
5fc16fa5f1 io_uring: check for non-NULL file pointer in io_file_can_poll()
In earlier kernels, it was possible to trigger a NULL pointer
dereference off the forced async preparation path, if no file had
been assigned. The trace leading to that looks as follows:

BUG: kernel NULL pointer dereference, address: 00000000000000b0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 67 PID: 1633 Comm: buf-ring-invali Not tainted 6.8.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
RIP: 0010:io_buffer_select+0xc3/0x210
Code: 00 00 48 39 d1 0f 82 ae 00 00 00 48 81 4b 48 00 00 01 00 48 89 73 70 0f b7 50 0c 66 89 53 42 85 ed 0f 85 d2 00 00 00 48 8b 13 <48> 8b 92 b0 00 00 00 48 83 7a 40 00 0f 84 21 01 00 00 4c 8b 20 5b
RSP: 0018:ffffb7bec38c7d88 EFLAGS: 00010246
RAX: ffff97af2be61000 RBX: ffff97af234f1700 RCX: 0000000000000040
RDX: 0000000000000000 RSI: ffff97aecfb04820 RDI: ffff97af234f1700
RBP: 0000000000000000 R08: 0000000000200030 R09: 0000000000000020
R10: ffffb7bec38c7dc8 R11: 000000000000c000 R12: ffffb7bec38c7db8
R13: ffff97aecfb05800 R14: ffff97aecfb05800 R15: ffff97af2be5e000
FS:  00007f852f74b740(0000) GS:ffff97b1eeec0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b0 CR3: 000000016deab005 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 ? __die+0x1f/0x60
 ? page_fault_oops+0x14d/0x420
 ? do_user_addr_fault+0x61/0x6a0
 ? exc_page_fault+0x6c/0x150
 ? asm_exc_page_fault+0x22/0x30
 ? io_buffer_select+0xc3/0x210
 __io_import_iovec+0xb5/0x120
 io_readv_prep_async+0x36/0x70
 io_queue_sqe_fallback+0x20/0x260
 io_submit_sqes+0x314/0x630
 __do_sys_io_uring_enter+0x339/0xbc0
 ? __do_sys_io_uring_register+0x11b/0xc50
 ? vm_mmap_pgoff+0xce/0x160
 do_syscall_64+0x5f/0x180
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0x55e0a110a67e
Code: ba cc 00 00 00 45 31 c0 44 0f b6 92 d0 00 00 00 31 d2 41 b9 08 00 00 00 41 83 e2 01 41 c1 e2 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> 90 89 30 eb a9 0f 1f 40 00 48 8b 42 20 8b 00 a8 06 75 af 85 f6

because the request is marked forced ASYNC and has a bad file fd, and
hence takes the forced async prep path.

Current kernels with the request async prep cleaned up can no longer hit
this issue, but for ease of backporting, let's add this safety check in
here too as it really doesn't hurt. For both cases, this will inevitably
end with a CQE posted with -EBADF.

Cc: stable@vger.kernel.org
Fixes: a76c0b31ee ("io_uring: commit non-pollable provided mapped buffers upfront")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-01 12:25:35 -06:00
Linus Torvalds
ec9eeb89e6 Merge tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Fix a Kconfig bug regarding comparisons to 'm' or 'n'

 - Replace missed $(srctree)/$(src)

 - Fix unneeded kallsyms step 3

 - Remove incorrect "compatible" properties from image nodes in
   image.fit

 - Improve gen_kheaders.sh

 - Fix 'make dt_binding_check'

 - Clean up unnecessary code

* tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
  kheaders: use `command -v` to test for existence of `cpio`
  kheaders: explicitly define file modes for archived headers
  scripts/make_fit: Drop fdt image entry compatible string
  kbuild: remove a stale comment about cleaning in link-vmlinux.sh
  kbuild: fix short log for AS in link-vmlinux.sh
  kbuild: change scripts/mksysmap into sed script
  kbuild: avoid unneeded kallsyms step 3
  kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
  kconfig: remove redundant check in expr_join_or()
  kconfig: fix comparison to constant symbols, 'm', 'n'
  kconfig: remove unused expr_is_no()
2024-06-01 09:33:55 -07:00
Linus Torvalds
bbeb1219ee Merge tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:

 - Fix a livelock by dropping an xfarray sortinfo folio when an error
   is encountered

 - During extended attribute operations, Initialize transaction
   reservation computation based on attribute operation code

 - Relax symbolic link's ondisk verification code to allow symbolic
   links with short remote targets

 - Prevent soft lockups when unmapping file ranges and also during
   remapping blocks during a reflink operation

 - Fix compilation warnings when XFS is built with W=1 option

* tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Add cond_resched to block unmap range and reflink remap path
  xfs: don't open-code u64_to_user_ptr
  xfs: allow symlinks with short remote targets
  xfs: fix xfs_init_attr_trans not handling explicit operation codes
  xfs: drop xfarray sortinfo folio on error
  xfs: Stop using __maybe_unused in xfs_alloc.c
  xfs: Clear W=1 warning in xfs_iwalk_run_callbacks()
2024-06-01 08:59:04 -07:00
Linus Torvalds
f26ee67a0f Merge tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
 "Here is a single revert for a much-reported regression in 6.10-rc1
  when it comes to a few older architectures.

  Turns out that the VT ioctls don't work the same across all cpu types
  because of some old compatibility requrements for stuff like alpha and
  powerpc. So revert the change that attempted to have them use the
  _IO() macros and go back to the known-working values instead.

  This has NOT been in linux-next but has had many reports that it fixes
  the issue with 6.10-rc1"

* tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "VT: Use macros to define ioctls"
2024-06-01 08:53:39 -07:00
Linus Torvalds
d9aab0b1c9 Merge tag 'landlock-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
 "This fixes a wrong path walk triggered by syzkaller"

* tag 'landlock-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add layout1.refer_mount_root
  landlock: Fix d_parent walk
2024-06-01 08:28:24 -07:00
Bitterblue Smith
819bda58e7 wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
Since commit 0a44dfc070 ("wifi: mac80211: simplify non-chanctx
drivers") ieee80211_hw_config() is no longer called with changed = ~0.
rtlwifi relied on ~0 in order to ignore the default retry limits of
4/7, preferring 48/48 in station mode and 7/7 in AP/IBSS.

RTL8192DU has a lot of packet loss with the default limits from
mac80211. Fix it by ignoring IEEE80211_CONF_CHANGE_RETRY_LIMITS
completely, because it's the simplest solution.

Link: https://lore.kernel.org/linux-wireless/cedd13d7691f4692b2a2fa5a24d44a22@realtek.com/
Cc: stable@vger.kernel.org # 6.9.x
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/1fabb8e4-adf3-47ae-8462-8aea963bc2a5@gmail.com
2024-06-01 13:15:26 +03:00
Johannes Berg
40cecacabc wifi: mt76: mt7615: add missing chanctx ops
Here's another one I missed during the initial conversion,
fix that.

Cc: stable@vger.kernel.org
Reported-by: Rene Petersen <renepetersen@posteo.de>
Fixes: 0a44dfc070 ("wifi: mac80211: simplify non-chanctx drivers")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218895
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240528142308.3f7db1821e68.I531135d7ad76331a50244d6d5288e14aa9668390@changeid
2024-06-01 13:00:59 +03:00
Alexis Lothoré
596c195680 wifi: wilc1000: document SRCU usage instead of SRCU
Commit f236464f1d ("wifi: wilc1000: convert list management to RCU")
attempted to convert SRCU to RCU usage, assuming it was not really needed.
The runtime issues that arose after merging it showed that there are code
paths involving sleeping functions, and removing those would need some
heavier driver rework.

Add some documentation about SRCU need to make sure that any future
developer do not miss some use cases if tempted to convert back again to
RCU.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240528-wilc_revert_srcu_to_rcu-v1-3-bce096e0798c@bootlin.com
2024-06-01 12:59:30 +03:00
Alexis Lothoré
3596717a6f Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section"
This reverts commit 35aee01ff4

Commit 35aee01ff4 ("wifi: wilc1000: set atomic flag on kmemdup in srcu
critical section") was preparatory to the SRCU to RCU conversion done by
commit f236464f1d ("wifi: wilc1000: convert list management to RCU").
This conversion brought issues and so has been reverted, so the atomic flag
is not needed anymore.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240528-wilc_revert_srcu_to_rcu-v1-2-bce096e0798c@bootlin.com
2024-06-01 12:59:30 +03:00
Alexis Lothoré
ebfb5e8fc8 Revert "wifi: wilc1000: convert list management to RCU"
This reverts commit f236464f1d

Commit f236464f1d ("wifi: wilc1000: convert list management to RCU")
replaced SRCU with RCU, aiming to simplify RCU usage in the driver. No
documentation or commit history hinted about why SRCU has been preferred
in original design, so it has been assumed to be safe to do this
conversion.
Unfortunately, some static analyzers raised warnings, confirmed by runtime
checker, not long after the merge. At least three different issues arose
when switching to RCU:
- wilc_wlan_txq_filter_dup_tcp_ack is executed in a RCU read critical
  section yet calls wait_for_completion_timeout
- wilc_wfi_init_mon_interface calls kmalloc and register_netdevice while
  manipulating a vif retrieved from vif list
- set_channel sends command to chip (and so, also waits for a completion)
  while holding a vif retrieved from vif list (so, in RCU read critical
  section)

Some of those issues are not trivial to fix and would need bigger driver
rework. Fix those issues by reverting the SRCU to RCU conversion commit

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-wireless/3b46ec7c-baee-49fd-b760-3bc12fb12eaf@moroto.mountain/
Fixes: f236464f1d ("wifi: wilc1000: convert list management to RCU")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240528-wilc_revert_srcu_to_rcu-v1-1-bce096e0798c@bootlin.com
2024-06-01 12:59:29 +03:00
Kalle Valo
10bc8558b5 Merge tag 'ath-current-20240531' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath
Merge ath-current from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git

ath.git fixes for 6.10. Two fixes for user reported regressions in
ath11k. One dependency fix and one error path fix.
2024-06-01 12:57:28 +03:00
Greg Kroah-Hartman
7bc4244c88 Revert "VT: Use macros to define ioctls"
This reverts commit 8c467f3300.

Turns out this breaks many architectures as the vt ioctls do not all
match up everywhere due to historical reasons, so the original commit is
invalid for many values.

Reported-by: Nick Bowler <nbowler@draconx.ca>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/r/ad4e561c-1d49-4f25-882c-7a36c6b1b5c0@draconx.ca
Link: https://lore.kernel.org/r/0da9785e-ba44-4718-9d08-4e96c1ba7ab2@kernel.org
Link: https://lore.kernel.org/all/34d848f4-670b-4493-bf21-130ef862521b@xenosoft.de/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-01 07:28:21 +02:00
Linus Torvalds
cc8ed4d0a8 Merge tag 'drm-fixes-2024-06-01' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This is the weekly fixes. Lots of small fixes across the board, one
  BUG_ON fix in shmem seems most important, otherwise amdgpu, i915, xe
  mostly with small fixes to all the other drivers.

  shmem:
   - fix BUG_ON in COW handling
   - warn when trying to pin imported objects

  buddy:
   - fix page size handling

  dma-buf:
   - sw-sync: Don't interfere with IRQ handling
   - fix kthreads-handling error path

  i915:
   - fix a race in audio component by registering it later
   - make DPT object unshrinkable to avoid shrinking when framebuffer
     has not shrunk
   - fix CCS id calculation to fix a perf regression
   - fix selftest caching mode
   - fix FIELD_PREP compiler warnings
   - fix indefinite wait for GT wakeref release
   - revert overeager multi-gt pm reference removal

  xe:
   - pcode polling timeout change
   - fix for deadlocks for faulting VMs
   - error-path lock imbalance fix

  amdgpu:
   - RAS fix
   - fix colorspace property for MST connectors
   - fix for PCIe DPM
   - silence UBSAN warning
   - GPUVM robustness fix
   - partition fix
   - drop deprecated I2C_CLASS_SPD

  amdkfd:
   - revert unused changes for certain 11.0.3 devices
   - simplify APU VRAM handling

  lima:
   - fix dma_resv-related deadlock in object pin

  msm:
   - remove build-time dependency on Python 3.9

  nouveau:
   - nvif: Fix possible integer overflow

  panel:
   - lg-sw43408: Select DP helpers; Declare backlight ops as static
   - sitronix-st7789v: Various fixes for jt240mhqs_hwt_ek_e3 panel

  panfrost:
   - fix dma_resv-related deadlock in object pin"

* tag 'drm-fixes-2024-06-01' of https://gitlab.freedesktop.org/drm/kernel: (35 commits)
  drm/msm: remove python 3.9 dependency for compiling msm
  drm/panel: sitronix-st7789v: fix display size for jt240mhqs_hwt_ek_e3 panel
  drm/panel: sitronix-st7789v: tweak timing for jt240mhqs_hwt_ek_e3 panel
  drm/panel: sitronix-st7789v: fix timing for jt240mhqs_hwt_ek_e3 panel
  drm/amd/pm: remove deprecated I2C_CLASS_SPD support from newly added SMU_14_0_2
  drm/amdgpu: Make CPX mode auto default in NPS4
  drm/amdkfd: simplify APU VRAM handling
  Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
  drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
  drm/amdgpu: silence UBSAN warning
  drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
  drm/i915: Fix audio component initialization
  drm/i915/dpt: Make DPT object unshrinkable
  drm/i915/gt: Fix CCS id's calculation for CCS mode setting
  drm/panel/lg-sw43408: mark sw43408_backlight_ops as static
  drm/i915/selftests: Set always_coherent to false when reading from CPU
  drm/panel/lg-sw43408: select CONFIG_DRM_DISPLAY_DP_HELPER
  drm/i915/guc: avoid FIELD_PREP warning
  drm/i915/gt: Disarm breadcrumbs if engines are already idle
  Revert "drm/i915: Remove extra multi-gt pm-references"
  ...
2024-05-31 16:26:48 -07:00
Linus Torvalds
1b907b83ae Merge tag 'hwmon-for-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:

 - sttcs: Fix property spelling

 - intel-m10-bmc-hwmon: Fix multiplier for N6000 board power sensor

 - ltc2992: Fix memory leak

 - dell-smm: Add Dell G15 5511 to fan control whitelist

* tag 'hwmon-for-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (shtc1) Fix property misspelling
  hwmon: (intel-m10-bmc-hwmon) Fix multiplier for N6000 board power sensor
  hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt()
  hwmon: (dell-smm) Add Dell G15 5511 to fan control whitelist
2024-05-31 16:24:11 -07:00
Linus Torvalds
b7087cb35a Merge tag 'mailbox-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox fix from Jassi Brar:

 - zynqmp-ipi: fix linker error on some configurations

* tag 'mailbox-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
  mailbox: zynqmp-ipi: drop irq_to_desc() call
2024-05-31 16:21:00 -07:00
Linus Torvalds
d5931dd0de Merge tag 'spi-fix-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A series of fixes that came in since the merge window, the main thing
  being the fixes Andy did for DMA sync where we were calling into the
  DMA API in suprising ways and causing issues as a result, the main
  thing being confusing the IOMMU code.

  We've also got some fairly important fixes for the stm32 driver, it
  supports a wide range of hardware and some optimisations that were
  done recently have broken on some systems, and a fix to prevent
  glitched signals on the bus in the cadence driver"

* tag 'spi-fix-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: stm32: Don't warn about spurious interrupts
  spi: Assign dummy scatterlist to unidirectional transfers
  spi: cadence: Ensure data lines set to low during dummy-cycle period
  spi: stm32: Revert change that enabled controller before asserting CS
  spi: Check if transfer is mapped before calling DMA sync APIs
  spi: Don't mark message DMA mapped when no transfer in it is
2024-05-31 16:17:40 -07:00
Linus Torvalds
28add42dc2 Merge tag 'regulator-fix-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
 "One fix that came in since -rc1, fixing misuse of a local variable in
  the DT parsing code in the RTQ2208 driver"

* tag 'regulator-fix-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: rtq2208: Fix invalid memory access when devm_of_regulator_put_matches is called
2024-05-31 16:12:54 -07:00
Linus Torvalds
b7c05622da Merge tag 'regmap-fix-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
 "The I2C bus was not taking account of the register and any padding
  bytes when handling maximum write sizes supported by an I2C adaptor,
  this patch from Jim Wylder fixes that"

* tag 'regmap-fix-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap-i2c: Subtract reg size from max_write
2024-05-31 16:09:27 -07:00
Linus Torvalds
0f9a75179d Merge tag 'block-6.10-20240530' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe fixes via Keith:
      - Removing unused fields (Kanchan)
      - Large folio offsets support (Kundan)
      - Multipath NUMA node initialiazation fix (Nilay)
      - Multipath IO stats accounting fixes (Keith)
      - Circular lockdep fix (Keith)
      - Target race condition fix (Sagi)
      - Target memory leak fix (Sagi)

 - bcache fixes

 - null_blk fixes (Damien)

 - Fix regression in io.max due to throttle low removal (Waiman)

 - DM limit table fixes (Christoph)

 - SCSI and block limit fixes (Christoph)

 - zone fixes (Damien)

 - Misc fixes (Christoph, Hannes, hexue)

* tag 'block-6.10-20240530' of git://git.kernel.dk/linux: (25 commits)
  blk-throttle: Fix incorrect display of io.max
  block: Fix zone write plugging handling of devices with a runt zone
  block: Fix validation of zoned device with a runt zone
  null_blk: Do not allow runt zone with zone capacity smaller then zone size
  nvmet: fix a possible leak when destroy a ctrl during qp establishment
  nvme: use srcu for iterating namespace list
  bcache: code cleanup in __bch_bucket_alloc_set()
  bcache: call force_wake_up_gc() if necessary in check_should_bypass()
  bcache: allow allocator to invalidate bucket in gc
  block: check for max_hw_sectors underflow
  block: stack max_user_sectors
  sd: also set max_user_sectors when setting max_sectors
  null_blk: Print correct max open zones limit in null_init_zoned_dev()
  block: delete redundant function declaration
  null_blk: Fix return value of nullb_device_power_store()
  dm: make dm_set_zones_restrictions work on the queue limits
  dm: remove dm_check_zoned
  dm: move setting zoned_enabled to dm_table_set_restrictions
  block: remove blk_queue_max_integrity_segments
  nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset
  ...
2024-05-31 15:31:27 -07:00
Linus Torvalds
6d541d6672 Merge tag 'io_uring-6.10-20240530' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
 "A couple of minor fixes for issues introduced in the 6.10 merge window:

   - Ensure that all read/write ops have an appropriate cleanup handler
     set (Breno)

   - Regression for applications still doing multiple mmaps even if
     FEAT_SINGLE_MMAP is set (me)

   - Move kmsg inquiry setting above any potential failure point,
     avoiding a spurious NONEMPTY flag setting on early error (me)"

* tag 'io_uring-6.10-20240530' of git://git.kernel.dk/linux:
  io_uring/net: assign kmsg inq/flags before buffer selection
  io_uring/rw: Free iovec before cleaning async data
  io_uring: don't attempt to mmap larger than what the user asks for
2024-05-31 15:22:58 -07:00
Andrii Nakryiko
7d0b3953f6 libbpf: don't close(-1) in multi-uprobe feature detector
Guard close(link_fd) with extra link_fd >= 0 check to prevent close(-1).

Detected by Coverity static analysis.

Fixes: 04d939a2ab ("libbpf: detect broken PID filtering logic for multi-uprobe")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240529231212.768828-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-31 14:56:51 -07:00
Jiri Olsa
aeb8fe0283 bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list
The bpf_session_cookie is unavailable for !CONFIG_FPROBE as reported
by Sebastian [1].

To fix that we remove CONFIG_FPROBE ifdef for session kfuncs, which
is fine, because there's filter for session programs.

Then based on bpf_trace.o dependency:
  obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o

we add bpf_session_cookie BTF_ID in special_kfunc_set list dependency
on CONFIG_BPF_EVENTS.

[1] https://lore.kernel.org/bpf/20240531071557.MvfIqkn7@linutronix.de/T/#m71c6d5ec71db2967288cb79acedc15cc5dbfeec5
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 5c919acef8 ("bpf: Add support for kprobe session cookie")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240531194500.2967187-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-31 14:54:48 -07:00
Andrii Nakryiko
62da3acd28 selftests/bpf: fix inet_csk_accept prototype in test_sk_storage_tracing.c
Recent kernel change ([0]) changed inet_csk_accept() prototype. Adapt
progs/test_sk_storage_tracing.c to take that into account.

  [0] 92ef0fd55a ("net: change proto and proto_ops accept type")

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240528223218.3445297-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-31 14:54:25 -07:00
Alex Williamson
aac6db75a9 vfio/pci: Use unmap_mapping_range()
With the vfio device fd tied to the address space of the pseudo fs
inode, we can use the mm to track all vmas that might be mmap'ing
device BARs, which removes our vma_list and all the complicated lock
ordering necessary to manually zap each related vma.

Note that we can no longer store the pfn in vm_pgoff if we want to use
unmap_mapping_range() to zap a selective portion of the device fd
corresponding to BAR mappings.

This also converts our mmap fault handler to use vmf_insert_pfn()
because we no longer have a vma_list to avoid the concurrency problem
with io_remap_pfn_range().  The goal is to eventually use the vm_ops
huge_fault handler to avoid the additional faulting overhead, but
vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first.

Also, Jason notes that a race exists between unmap_mapping_range() and
the fops mmap callback if we were to call io_remap_pfn_range() to
populate the vma on mmap.  Specifically, mmap_region() does call_mmap()
before it does vma_link_file() which gives a window where the vma is
populated but invisible to unmap_mapping_range().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240530045236.1005864-3-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-05-31 15:15:51 -06:00
Alex Williamson
b7c5e64fec vfio: Create vfio_fs_type with inode per device
By linking all the device fds we provide to userspace to an
address space through a new pseudo fs, we can use tools like
unmap_mapping_range() to zap all vmas associated with a device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240530045236.1005864-2-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-05-31 15:15:51 -06:00
Weiwen Hu
b1a1fdd709 nvme: fix nvme_pr_* status code parsing
Fix the parsing if extra status bits (e.g. MORE) is present.

Fixes: 7fb42780d0 ("nvme: Convert NVMe errors to PR errors")
Signed-off-by: Weiwen Hu <huweiwen@linux.alibaba.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-31 13:50:59 -07:00
Kees Cook
99a6087dfd kunit/fortify: Remove __kmalloc_node() test
__kmalloc_node() is considered an "internal" function to the Slab, so
drop it from explicit testing.

Link: https://lore.kernel.org/r/20240531185703.work.588-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2024-05-31 13:47:41 -07:00
John Hubbard
4bf15b1c65 selftests/futex: don't pass a const char* to asprintf(3)
When building with clang, via:

    make LLVM=1 -C tools/testing/selftests

...clang issues this warning:

futex_requeue_pi.c:403:17: warning: passing 'const char **' to parameter
of type 'char **' discards qualifiers in nested pointer types
[-Wincompatible-pointer-types-discards-qualifiers]

This warning fires because test_name is passed into asprintf(3), which
then changes it.

Fix this by simply removing the const qualifier. This is a local
automatic variable in a very short function, so there is not much need
to use the compiler to enforce const-ness at this scope.

[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/

Fixes: f17d8a87ec ("selftests: fuxex: Report a unique test name per run of futex_requeue_pi")
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-31 14:37:10 -06:00
John Hubbard
32c75ad4a7 selftests/futex: don't redefine .PHONY targets (all, clean)
The .PHONY targets "all" and "clean"  are both already defined in the
file that is included in the very next line:

    ../lib.mk.

Remove this duplicate code.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-31 14:37:04 -06:00
Chunguang Xu
7dc3bfcb4c nvme-fabrics: use reserved tag for reg read/write command
In some scenarios, if too many commands are issued by nvme command in
the same time by user tasks, this may exhaust all tags of admin_q. If
a reset (nvme reset or IO timeout) occurs before these commands finish,
reconnect routine may fail to update nvme regs due to insufficient tags,
which will cause kernel hang forever. In order to workaround this issue,
maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
tags. This maybe safe for nvmf:

1. For the disable ctrl path,  we will not issue connect command
2. For the enable ctrl / fw activate path, since connect and reg_xx()
   are called serially.

So the reserved tags may still be enough while reg_xx() use reserved tags.

Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-31 13:26:15 -07:00
Linus Torvalds
b050496579 Merge tag 'dma-mapping-6.10-2024-05-31' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:

 - dma-mapping benchmark error handling fixes (Fedor Pchelkin)

 - correct a config symbol reference in the DMA API documentation (Lukas
   Bulwahn)

* tag 'dma-mapping-6.10-2024-05-31' of git://git.infradead.org/users/hch/dma-mapping:
  Documentation/core-api: correct reference to SWIOTLB_DYNAMIC
  dma-mapping: benchmark: handle NUMA_NO_NODE correctly
  dma-mapping: benchmark: fix node id validation
  dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails
  dma-mapping: benchmark: fix up kthread-related error handling
2024-05-31 12:14:55 -07:00
Linus Torvalds
7d88cc8ecc Merge tag 'sound-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Lots of small fixes:

   - A race fix for debugfs handling in ALSA core

   - A series of corrections for MIDI2 core format conversions

   - ASoC Intel fixes for 16 bit DMIC config

   - Updates for missing module parameters in ASoC code

   - HD-audio quirk, Cirrus codec fix, etc minor fixes"

* tag 'sound-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
  ALSA: seq: ump: Fix swapped song position pointer data
  ASoC: SOF: ipc4-topology: Adjust the params based on DAI formats
  ASoC: SOF: ipc4-topology: Improve readability of sof_ipc4_prepare_dai_copier()
  ASoC: SOF: ipc4-topology/pcm: Rename sof_ipc4_copier_is_single_format()
  ASoC: SOF: ipc4-topology: Print out the channel count in sof_ipc4_dbg_audio_format
  ASoC: SOF: ipc4-topology: Add support for NHLT with 16-bit only DMIC blob
  ALSA: seq: Fix yet another spot for system message conversion
  ALSA: ump: Set default protocol when not given explicitly
  ALSA: ump: Don't accept an invalid UMP protocol number
  ASoC: SOF: ipc4-topology: Fix input format query of process modules without base extension
  ASoC: Intel: sof-sdw: fix missing SPI_MASTER dependency
  ALSA: pcm: fix typo in comment
  ALSA: ump: Don't clear bank selection after sending a program change
  ALSA: seq: Fix incorrect UMP type for system messages
  ALSA/hda: intel-dsp-config: reduce log verbosity
  ALSA: seq: Don't clear bank selection at event -> UMP MIDI2 conversion
  ALSA: seq: Fix missing bank setup between MIDI1/MIDI2 UMP conversion
  ASoC: SOF: add missing MODULE_DESCRIPTION()
  ASoC: SOF: reorder MODULE_ definitions
  ASoC: SOF: AMD: group all module related information
  ...
2024-05-31 12:11:44 -07:00
Linus Torvalds
87895a6402 Merge tag 'platform-drivers-x86-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:

 - a use-after-free bugfix

 - Kconfig fixes for randconfig builds

 - allow setting touchscreen_dmi quirks from the cmdline for debugging

 - touchscreen_dmi quirks for two new laptop/tablet models

* tag 'platform-drivers-x86-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: touchscreen_dmi: Add info for the EZpad 6s Pro
  platform/x86: touchscreen_dmi: Add info for GlobalSpace SolT IVW 11.6" tablet
  platform/x86: touchscreen_dmi: Add support for setting touchscreen properties from cmdline
  platform/x86: thinkpad_acpi: Select INPUT_SPARSEKMAP in Kconfig
  platform/x86: x86-android-tablets: Add "select LEDS_CLASS"
  platform/x86: ISST: fix use-after-free in tpmi_sst_dev_remove()
2024-05-31 12:03:28 -07:00
Linus Torvalds
c6cc9799b4 Merge tag 'riscv-for-linus-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to avoid pt_regs aliasing with idle thread stacks on secondary
   harts.

 - HAVE_ARCH_HUGE_VMAP is enabled on XIP kernels, which fixes boot
   issues on XIP systems with huge pages.

 - An update to the uABI documentation clarifying that only scalar
   misaligned accesses were grandfathered in as supported, as the vector
   extension did not exist at the time the uABI was frozen.

 - A fix for the recently-added byte/half atomics to avoid losing the
   fully ordered decorations.

* tag 'riscv-for-linus-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix fully ordered LR/SC xchg[8|16]() implementations
  Documentation: RISC-V: uabi: Only scalar misaligned loads are supported
  riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel
  riscv: prevent pt_regs corruption for secondary idle threads
2024-05-31 11:52:06 -07:00
Linus Torvalds
ff9bce3d06 Merge tag 'bcachefs-2024-05-30' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
 "Assorted odds and ends...

   - two downgrade fixes

   - a couple snapshot deletion and repair fixes, thanks to noradtux for
     finding these and providing the image to debug them

   - a couple assert fixes

   - convert to folio helper, from Matthew

   - some improved error messages

   - bit of code reorganization (just moving things around); doing this
     while things are quiet so I'm not rebasing fixes past reorgs

   - don't return -EROFS on inconsistency error in recovery, this
     confuses util-linux and has it retry the mount

   - fix failure to return error on misaligned dio write; reported as an
     issue with coreutils shred"

* tag 'bcachefs-2024-05-30' of https://evilpiepirate.org/git/bcachefs: (21 commits)
  bcachefs: Fix failure to return error on misaligned dio write
  bcachefs: Don't return -EROFS from mount on inconsistency error
  bcachefs: Fix uninitialized var warning
  bcachefs: Split out sb-errors_format.h
  bcachefs: Split out journal_seq_blacklist_format.h
  bcachefs: Split out replicas_format.h
  bcachefs: Split out disk_groups_format.h
  bcachefs: split out sb-downgrade_format.h
  bcachefs: split out sb-members_format.h
  bcachefs: Better fsck error message for key version
  bcachefs: btree_gc can now handle unknown btrees
  bcachefs: add missing MODULE_DESCRIPTION()
  bcachefs: Fix setting of downgrade recovery passes/errors
  bcachefs: Run check_key_has_snapshot in snapshot_delete_keys()
  bcachefs: Refactor delete_dead_snapshots()
  bcachefs: Fix locking assert
  bcachefs: Fix lookup_first_inode() when inode_generations are present
  bcachefs: Plumb bkey into __btree_err()
  bcachefs: Use copy_folio_from_iter_atomic()
  bcachefs: Fix sb-downgrade validation
  ...
2024-05-31 11:45:41 -07:00
Thomas Gleixner
0c2f6d0461 x86/topology/intel: Unlock CPUID before evaluating anything
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
this bit is set by the BIOS then CPUID evaluation including topology
enumeration does not work correctly as the evaluation code does not try
to analyze any leaf greater than two.

This went unnoticed before because the original topology code just
repeated evaluation several times and managed to overwrite the initial
limited information with the correct one later. The new evaluation code
does it once and therefore ends up with the limited and wrong
information.

Cure this by unlocking CPUID right before evaluating anything which
depends on the maximum CPUID leaf being greater than two instead of
rereading stuff after unlock.

Fixes: 22d63660c3 ("x86/cpu: Use common topology code for Intel")
Reported-by: Peter Schneider <pschneider1968@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/fd3f73dc-a86f-4bcf-9c60-43556a21eb42@googlemail.com
2024-05-31 20:25:56 +02:00
Arnd Bergmann
d551ce15d0 mailbox: zynqmp-ipi: drop irq_to_desc() call
irq_to_desc() is not exported to loadable modules, so this driver now
fails to link in some configurations:

ERROR: modpost: "irq_to_desc" [drivers/mailbox/zynqmp-ipi-mailbox.ko] undefined!

I can't see a purpose for this call, since the return value is unused
and probably left over from some code refactoring.

Address the link failure by just removing the line.

Fixes: 6ffb163534 ("mailbox: zynqmp: handle SGI for shared IPI")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Tanmay Shah <tanmay.shah@amd.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-05-31 12:39:15 -05:00
Kees Cook
4e173c825b mailmap: update entry for Kees Cook
I'm tired of gmail breaking DKIM. Switch everything over to my
@kernel.org alias instead.

Signed-off-by: Kees Cook <kees@kernel.org>
2024-05-31 08:58:36 -07:00
Kees Cook
adb77bba9c scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings
The prior strscpy() replacement of strncpy() here expected the
manufacture_reply strings to be NUL-terminated, but it is possible
they are not, as the code pattern here shows, e.g., edev->vendor_id
being exactly 1 character larger than manufacture_reply->vendor_id,
and the replaced strncpy() was copying only up to the size of the
source character array. Replace this with memtostr(), which is the
unambiguous way to convert a maybe not-NUL-terminated character array
into a NUL-terminated string.

Fixes: b7e9712a02 ("scsi: mpt3sas: Replace deprecated strncpy() with strscpy()")
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240410023155.2100422-3-keescook@chromium.org
Signed-off-by: Kees Cook <kees@kernel.org>
2024-05-31 08:58:20 -07:00
Steve French
518549c120 cifs: fix creating sockets when using sfu mount options
When running fstest generic/423 with sfu mount option, it
was being skipped due to inability to create sockets:

  generic/423  [not run] cifs does not support mknod/mkfifo

which can also be easily reproduced with their af_unix tool:

  ./src/af_unix /mnt1/socket-two bind: Operation not permitted

Fix sfu mount option to allow creating and reporting sockets.

Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-31 10:55:15 -05:00
Mickaël Salaün
0055f53aac selftests/landlock: Add layout1.refer_mount_root
Add tests to check error codes when linking or renaming a mount root
directory.  This previously triggered a kernel warning, but it is fixed
with the previous commit.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20240516181935.1645983-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-31 16:41:54 +02:00
Mickaël Salaün
88da52ccd6 landlock: Fix d_parent walk
The WARN_ON_ONCE() in collect_domain_accesses() can be triggered when
trying to link a root mount point.  This cannot work in practice because
this directory is mounted, but the VFS check is done after the call to
security_path_link().

Do not use source directory's d_parent when the source directory is the
mount point.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: stable@vger.kernel.org
Reported-by: syzbot+bf4903dc7e12b18ebc87@syzkaller.appspotmail.com
Fixes: b91c3e4ea7 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Closes: https://lore.kernel.org/r/000000000000553d3f0618198200@google.com
Link: https://lore.kernel.org/r/20240516181935.1645983-2-mic@digikod.net
[mic: Fix commit message]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-31 16:41:52 +02:00
Masami Hiramatsu (Google)
0f42bdf59b selftests/tracing: Fix event filter test to retry up to 10 times
Commit eb50d0f250 ("selftests/ftrace: Choose target function for filter
test from samples") choose the target function from samples, but sometimes
this test failes randomly because the target function does not hit at the
next time. So retry getting samples up to 10 times.

Fixes: eb50d0f250 ("selftests/ftrace: Choose target function for filter test from samples")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-31 08:35:43 -06:00
Niklas Cassel
3cb648c4dd ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
Commit 7627a0edef ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
 register.

Partial and Slumber LPM states can either be initiated by HIPM or DIPM.

For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.

For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.

The problem is that Apacer AS340 drives do not handle low power modes
correctly. The problem was most likely not seen before because no one
had used this drive with a AHCI controller with LPM enabled.

Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).

Fixes: 7627a0edef ("ata: ahci: Drop low power policy board type")
Cc: stable@vger.kernel.org
Reported-by: Tim Teichmann <teichmanntim@outlook.de>
Closes: https://lore.kernel.org/linux-ide/87bk4pbve8.ffs@tglx/
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-05-31 15:14:06 +02:00
Niklas Cassel
4738803693 ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
Commit 7627a0edef ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
 register.

Partial and Slumber LPM states can either be initiated by HIPM or DIPM.

For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.

For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.

The problem is that AMD Radeon S3 SSD drives do not handle low power modes
correctly. The problem was most likely not seen before because no one
had used this drive with a AHCI controller with LPM enabled.

Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support both
HIPM and DIPM).

Fixes: 7627a0edef ("ata: ahci: Drop low power policy board type")
Cc: stable@vger.kernel.org
Reported-by: Doru Iorgulescu <doru.iorgulescu1@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-05-31 15:11:51 +02:00
Niklas Cassel
86aaa7e9d6 ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
Commit 7627a0edef ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
 register.

Partial and Slumber LPM states can either be initiated by HIPM or DIPM.

For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.

For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.

The problem is that Crucial CT240BX500SSD1 drives do not handle low power
modes correctly. The problem was most likely not seen before because no
one had used this drive with a AHCI controller with LPM enabled.

Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).

Fixes: 7627a0edef ("ata: ahci: Drop low power policy board type")
Cc: stable@vger.kernel.org
Reported-by: Aarrayy <lp610mh@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-05-31 15:10:41 +02:00
Dr. David Alan Gilbert
539d33b578 drm/komeda: remove unused struct 'gamma_curve_segment'
'gamma_curve_segment' looks like it has never been used.
Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516133724.251750-1-linux@treblig.org
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
2024-05-31 12:09:51 +01:00
Aapo Vienamo
985cfe501b thunderbolt: debugfs: Fix margin debugfs node creation condition
The margin debugfs node controls the "Enable Margin Test" field of the
lane margining operations. This field selects between either low or high
voltage margin values for voltage margin test or left or right timing
margin values for timing margin test.

According to the USB4 specification, whether or not the "Enable Margin
Test" control applies, depends on the values of the "Independent
High/Low Voltage Margin" or "Independent Left/Right Timing Margin"
capability fields for voltage and timing margin tests respectively. The
pre-existing condition enabled the debugfs node also in the case where
both low/high or left/right margins are returned, which is incorrect.
This change only enables the debugfs node in question, if the specific
required capability values are met.

Signed-off-by: Aapo Vienamo <aapo.vienamo@linux.intel.com>
Fixes: d0f1e0c2a6 ("thunderbolt: Add support for receiver lane margining")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2024-05-31 13:13:39 +03:00
Phil Auld
d40605a682 sched/x86: Export 'percpu arch_freq_scale'
Commit:

  7bc263840b ("sched/topology: Consolidate and clean up access to a CPU's max compute capacity")

removed rq->cpu_capacity_orig in favor of using arch_scale_freq_capacity()
calls. Export the underlying percpu symbol on x86 so that external trace
point helper modules can be made to work again.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240530181548.2039216-1-pauld@redhat.com
2024-05-31 11:48:42 +02:00
Jeff Johnson
dc8e5dfb52 perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
Fix the 'make W=1 C=1' warnings:

  WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-uncore.o
  WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-cstate.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-intel-v1-1-8252194ed20a@quicinc.com
2024-05-31 11:41:15 +02:00
Jeff Johnson
0a44078f2b perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
Fix the warning from 'make C=1 W=1':

  WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/rapl.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-v1-1-e45ffa8af99f@quicinc.com
2024-05-31 11:41:04 +02:00
Jan Beulich
e0eec24e2e memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
On an (old) x86 system with SRAT just covering space above 4Gb:

    ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xfffffffff] hotplug

the commit referenced below leads to this NUMA configuration no longer
being refused by a CONFIG_NUMA=y kernel (previously

    NUMA: nodes only cover 6144MB of your 8185MB e820 RAM. Not used.
    No NUMA configuration found
    Faking a node at [mem 0x0000000000000000-0x000000027fffffff]

was seen in the log directly after the message quoted above), because of
memblock_validate_numa_coverage() checking for NUMA_NO_NODE (only). This
in turn led to memblock_alloc_range_nid()'s warning about MAX_NUMNODES
triggering, followed by a NULL deref in memmap_init() when trying to
access node 64's (NODE_SHIFT=6) node data.

To compensate said change, make memblock_set_node() warn on and adjust
a passed in value of MAX_NUMNODES, just like various other functions
already do.

Fixes: ff6c3d81f2 ("NUMA: optimize detection of memory with no node id assigned by firmware")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1c8a058c-5365-4f27-a9f1-3aeb7fb3e7b2@suse.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-05-31 12:36:28 +03:00
Takashi Iwai
310fa3ec28 ALSA: seq: ump: Fix swapped song position pointer data
At converting between the legacy event and UMP, the parameters for
MIDI Song Position Pointer are incorrectly stored.  It should have
been LSB -> MSB order while it stored in MSB -> LSB order.
This patch corrects the ordering.

Fixes: e9e02819a9 ("ALSA: seq: Automatic conversion of UMP events")
Link: https://lore.kernel.org/r/20240531075110.3250-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-31 09:51:44 +02:00
Quan Zhou
c66f3b40b1 RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function
In the function kvm_riscv_vcpu_set_reg_isa_ext, the original code
used incorrect reg_subtype labels KVM_REG_RISCV_SBI_MULTI_EN/DIS.
These have been corrected to KVM_REG_RISCV_ISA_MULTI_EN/DIS respectively.
Although they are numerically equivalent, the actual processing
will not result in errors, but it may lead to ambiguous code semantics.

Fixes: 613029442a ("RISC-V: KVM: Extend ONE_REG to enable/disable multiple ISA extensions")
Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/ff1c6771a67d660db94372ac9aaa40f51e5e0090.1716429371.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
2024-05-31 10:40:39 +05:30
Yong-Xuan Wang
2d707b4e37 RISC-V: KVM: No need to use mask when hart-index-bit is 0
When the maximum hart number within groups is 1, hart-index-bit is set to
0. Consequently, there is no need to restore the hart ID from IMSIC
addresses and hart-index-bit settings. Currently, QEMU and kvmtool do not
pass correct hart-index-bit values when the maximum hart number is a
power of 2, thereby avoiding this issue. Corresponding patches for QEMU
and kvmtool will also be dispatched.

Fixes: 89d01306e3 ("RISC-V: KVM: Implement device interface for AIA irqchip")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240415064905.25184-1-yongxuan.wang@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2024-05-31 09:47:08 +05:30
Dave Airlie
a2ce3f7752 Merge tag 'drm-misc-fixes-2024-05-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:

dma-buf:
- sw-sync: Don't interfere with IRQ handling
- Fix kthreads-handling error path

gem-shmem:
- Warn when trying to pin imported objects

lima:
- Fix dma_resv-related deadlock in object pin

msm:
- Remove build-time dependency on Python 3.9

nouveau:
- nvif: Fix possible integer overflow

panel:
- lg-sw43408: Select DP helpers; Declare backlight ops as static
- sitronix-st7789v: Various fixes for jt240mhqs_hwt_ek_e3 panel

panfrost:
- Fix dma_resv-related deadlock in object pin

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530192307.GA14809@localhost.localdomain
2024-05-31 11:51:20 +10:00
Waiman Long
0a751df456 blk-throttle: Fix incorrect display of io.max
Commit bf20ab538c ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW")
attempts to revert the code change introduced by commit cd5ab1b0fc
("blk-throttle: add .low interface").  However, it leaves behind the
bps_conf[] and iops_conf[] fields in the throtl_grp structure which
aren't set anywhere in the new blk-throttle.c code but are still being
used by tg_prfill_limit() to display the limits in io.max. Now io.max
always displays the following values if a block queue is used:

	<m>:<n> rbps=0 wbps=0 riops=0 wiops=0

Fix this problem by removing bps_conf[] and iops_conf[] and use bps[]
and iops[] instead to complete the revert.

Fixes: bf20ab538c ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW")
Reported-by: Justin Forbes <jforbes@redhat.com>
Closes: https://github.com/containers/podman/issues/22701#issuecomment-2120627789
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240530134547.970075-1-longman@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 19:44:29 -06:00
Marc Zyngier
e7985f4360 of: property: Fix fw_devlink handling of interrupt-map
Commit d976c6f4b3 ("of: property: Add fw_devlink support for
interrupt-map property") tried to do what it says on the tin,
but failed on a couple of points:

- it confuses bytes and cells. Not a huge deal, except when it
  comes to pointer arithmetic

- it doesn't really handle anything but interrupt-maps that have
  their parent #address-cells set to 0

The combinations of the two leads to some serious fun on my M1
box, with plenty of WARN-ON() firing all over the shop, and
amusing values being generated for interrupt specifiers.

Having 2 versions of parsing code for "interrupt-map" was a bad
idea. Now that the common parsing parts have been refactored
into of_irq_parse_imap_parent(), rework the code here to use it
instead and fix the pointer arithmetic.

Note that the dependency will be a bit different than the original code
when the interrupt-map points to another interrupt-map. In this case,
the original code would resolve to the final interrupt controller. Now
the dependency is the parent interrupt-map (which itself should have a
dependency to the parent). It is possible that a node with an
interrupt-map has no driver.

Fixes: d976c6f4b3 ("of: property: Add fw_devlink support for interrupt-map property")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20240529-dt-interrupt-map-fix-v2-2-ef86dc5bcd2a@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-05-30 19:43:47 -05:00
Rob Herring (Arm)
935df1bd40 of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw()
Factor out the parsing of interrupt-map interrupt parent phandle and its
arg cells to a separate function, of_irq_parse_imap_parent(), so that it
can be used in other parsing scenarios (e.g. fw_devlink).

There was a refcount leak on non-matching entries when iterating thru
"interrupt-map" which is fixed.

Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20240529-dt-interrupt-map-fix-v2-1-ef86dc5bcd2a@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-05-30 19:43:19 -05:00
Chanwoo Lee
d53b681ce9 scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort()
An error unrelated to ufshcd_try_to_abort_task is being logged and can
cause confusion. Modify ufshcd_mcq_abort() to print the result of the abort
failure. For readability, return immediately instead of 'goto'.

Fixes: f1304d4420 ("scsi: ufs: mcq: Added ufshcd_mcq_abort()")
Signed-off-by: Chanwoo Lee <cw9316.lee@samsung.com>
Link: https://lore.kernel.org/r/20240524015904.1116005-1-cw9316.lee@samsung.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-30 20:40:48 -04:00
Dave Airlie
bb61cf46b6 Merge tag 'amd-drm-fixes-6.10-2024-05-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.10-2024-05-30:

amdgpu:
- RAS fix
- Fix colorspace property for MST connectors
- Fix for PCIe DPM
- Silence UBSAN warning
- GPUVM robustness fix
- Partition fix
- Drop deprecated I2C_CLASS_SPD

amdkfd:
- Revert unused changes for certain 11.0.3 devices
- Simplify APU VRAM handling

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530202316.2246826-1-alexander.deucher@amd.com
2024-05-31 08:38:15 +10:00
Dave Airlie
c301c3d2ac Merge tag 'drm-xe-fixes-2024-05-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- One pcode polling timeout change
- One fix for deadlocks for faulting VMs
- One error-path lock imbalance fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZlisNHzgoq9nVg6g@fedora
2024-05-31 08:33:13 +10:00
Dave Airlie
cfd36ae37c Merge tag 'drm-intel-fixes-2024-05-30' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
drm/i915 fixes for v6.10-rc2:
- Fix a race in audio component by registering it later
- Make DPT object unshrinkable to avoid shrinking when framebuffer has
  not shrunk
- Fix CCS id calculation to fix a perf regression
- Fix selftest caching mode
- Fix FIELD_PREP compiler warnings
- Fix indefinite wait for GT wakeref release
- Revert overeager multi-gt pm reference removal

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87a5k7iwod.fsf@intel.com
2024-05-31 08:18:18 +10:00
Damien Le Moal
29459c3eaa block: Fix zone write plugging handling of devices with a runt zone
A zoned device may have a last sequential write required zone that is
smaller than other zones. However, all tests to check if a zone write
plug write offset exceeds the zone capacity use the same capacity
value stored in the gendisk zone_capacity field. This is incorrect for a
zoned device with a last runt (smaller) zone.

Add the new field last_zone_capacity to struct gendisk to store the
capacity of the last zone of the device. blk_revalidate_seq_zone() and
blk_revalidate_conv_zone() are both modified to get this value when
disk_zone_is_last() returns true. Similarly to zone_capacity, the value
is first stored using the last_zone_capacity field of struct
blk_revalidate_zone_args. Once zone revalidation of all zones is done,
this is used to set the gendisk last_zone_capacity field.

The checks to determine if a zone is full or if a sector offset in a
zone exceeds the zone capacity in disk_should_remove_zone_wplug(),
disk_zone_wplug_abort_unaligned(), blk_zone_write_plug_init_request(),
and blk_zone_wplug_prepare_bio() are modified to use the new helper
functions disk_zone_is_full() and disk_zone_wplug_is_full().
disk_zone_is_full() uses the zone index to determine if the zone being
tested is the last one of the disk and uses the either the disk
zone_capacity or last_zone_capacity accordingly.

Fixes: dd291d77cc ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240530054035.491497-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 15:03:52 -06:00
Damien Le Moal
cd63999368 block: Fix validation of zoned device with a runt zone
Commit ecfe43b11b ("block: Remember zone capacity when revalidating
zones") introduced checks to ensure that the capacity of the zones of
a zoned device is constant for all zones. However, this check ignores
the possibility that a zoned device has a smaller last zone with a size
not equal to the capacity of other zones. Such device correspond in
practice to an SMR drive with a smaller last zone and all zones with a
capacity equal to the zone size, leading to the last zone capacity being
different than the capacity of other zones.

Correctly handle such device by fixing the check for the constant zone
capacity in blk_revalidate_seq_zone() using the new helper function
disk_zone_is_last(). This helper function is also used in
blk_revalidate_zone_cb() when checking the zone size.

Fixes: ecfe43b11b ("block: Remember zone capacity when revalidating zones")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240530054035.491497-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 15:03:52 -06:00
Damien Le Moal
b164316808 null_blk: Do not allow runt zone with zone capacity smaller then zone size
A zoned device with a smaller last zone together with a zone capacity
smaller than the zone size does make any sense as that does not
correspond to any possible setup for a real device:
1) For ZNS and zoned UFS devices, all zones are always the same size.
2) For SMR HDDs, all zones always have the same capacity.
In other words, if we have a smaller last runt zone, then this zone
capacity should always be equal to the zone size.

Add a check in null_init_zoned_dev() to prevent a configuration to have
both a smaller zone size and a zone capacity smaller than the zone size.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240530054035.491497-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 15:03:52 -06:00
NeilBrown
99bc9f2eb3 NFS: add barriers when testing for NFS_FSDATA_BLOCKED
dentry->d_fsdata is set to NFS_FSDATA_BLOCKED while unlinking or
renaming-over a file to ensure that no open succeeds while the NFS
operation progressed on the server.

Setting dentry->d_fsdata to NFS_FSDATA_BLOCKED is done under ->d_lock
after checking the refcount is not elevated.  Any attempt to open the
file (through that name) will go through lookp_open() which will take
->d_lock while incrementing the refcount, we can be sure that once the
new value is set, __nfs_lookup_revalidate() *will* see the new value and
will block.

We don't have any locking guarantee that when we set ->d_fsdata to NULL,
the wait_var_event() in __nfs_lookup_revalidate() will notice.
wait/wake primitives do NOT provide barriers to guarantee order.  We
must use smp_load_acquire() in wait_var_event() to ensure we look at an
up-to-date value, and must use smp_store_release() before wake_up_var().

This patch adds those barrier functions and factors out
block_revalidate() and unblock_revalidate() far clarity.

There is also a hypothetical bug in that if memory allocation fails
(which never happens in practice) we might leave ->d_fsdata locked.
This patch adds the missing call to unblock_revalidate().

Reported-and-tested-by: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501
Fixes: 3c59366c20 ("NFS: don't unhash dentry during unlink/rename")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-30 16:17:12 -04:00
Chen Hanxiao
33c94d7e3c SUNRPC: return proper error from gss_wrap_req_priv
don't return 0 if snd_buf->len really greater than snd_buf->buflen

Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Fixes: 0c77668ddb ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-30 16:12:43 -04:00
Olga Kornievskaia
28568c906c NFSv4.1 enforce rootpath check in fs_location query
In commit 4ca9f31a2b ("NFSv4.1 test and add 4.1 trunking transport"),
we introduce the ability to query the NFS server for possible trunking
locations of the existing filesystem. However, we never checked the
returned file system path for these alternative locations. According
to the RFC, the server can say that the filesystem currently known
under "fs_root" of fs_location also resides under these server
locations under the following "rootpath" pathname. The client cannot
handle trunking a filesystem that reside under different location
under different paths other than what the main path is. This patch
enforces the check that fs_root path and rootpath path in fs_location
reply is the same.

Fixes: 4ca9f31a2b ("NFSv4.1 test and add 4.1 trunking transport")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-30 16:12:43 -04:00
NeilBrown
296f4ce81d NFS: abort nfs_atomic_open_v23 if name is too long.
An attempt to open a file with a name longer than NFS3_MAXNAMLEN will
trigger a WARN_ON_ONCE in encode_filename3() because
nfs_atomic_open_v23() doesn't have the test on ->d_name.len that
nfs_atomic_open() has.

So add that test.

Reported-by: James Clark <james.clark@arm.com>
Closes: https://lore.kernel.org/all/20240528105249.69200-1-james.clark@arm.com/
Fixes: 7c6c5249f0 ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-30 16:12:11 -04:00
Jens Axboe
18414a4a2e io_uring/net: assign kmsg inq/flags before buffer selection
syzbot reports that recv is using an uninitialized value:

=====================================================
BUG: KMSAN: uninit-value in io_req_cqe_overflow io_uring/io_uring.c:810 [inline]
BUG: KMSAN: uninit-value in io_req_complete_post io_uring/io_uring.c:937 [inline]
BUG: KMSAN: uninit-value in io_issue_sqe+0x1f1b/0x22c0 io_uring/io_uring.c:1763
 io_req_cqe_overflow io_uring/io_uring.c:810 [inline]
 io_req_complete_post io_uring/io_uring.c:937 [inline]
 io_issue_sqe+0x1f1b/0x22c0 io_uring/io_uring.c:1763
 io_wq_submit_work+0xa17/0xeb0 io_uring/io_uring.c:1860
 io_worker_handle_work+0xc04/0x2000 io_uring/io-wq.c:597
 io_wq_worker+0x447/0x1410 io_uring/io-wq.c:651
 ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was stored to memory at:
 io_req_set_res io_uring/io_uring.h:215 [inline]
 io_recv_finish+0xf10/0x1560 io_uring/net.c:861
 io_recv+0x12ec/0x1ea0 io_uring/net.c:1175
 io_issue_sqe+0x429/0x22c0 io_uring/io_uring.c:1751
 io_wq_submit_work+0xa17/0xeb0 io_uring/io_uring.c:1860
 io_worker_handle_work+0xc04/0x2000 io_uring/io-wq.c:597
 io_wq_worker+0x447/0x1410 io_uring/io-wq.c:651
 ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:3877 [inline]
 slab_alloc_node mm/slub.c:3918 [inline]
 __do_kmalloc_node mm/slub.c:4038 [inline]
 __kmalloc+0x6e4/0x1060 mm/slub.c:4052
 kmalloc include/linux/slab.h:632 [inline]
 io_alloc_async_data+0xc0/0x220 io_uring/io_uring.c:1662
 io_msg_alloc_async io_uring/net.c:166 [inline]
 io_recvmsg_prep_setup io_uring/net.c:725 [inline]
 io_recvmsg_prep+0xbe8/0x1a20 io_uring/net.c:806
 io_init_req io_uring/io_uring.c:2135 [inline]
 io_submit_sqe io_uring/io_uring.c:2182 [inline]
 io_submit_sqes+0x1135/0x2f10 io_uring/io_uring.c:2335
 __do_sys_io_uring_enter io_uring/io_uring.c:3246 [inline]
 __se_sys_io_uring_enter+0x40f/0x3c80 io_uring/io_uring.c:3183
 __x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3183
 x64_sys_call+0x2c0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:427
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

which appears to be io_recv_finish() reading kmsg->msg.msg_inq to decide
if it needs to set IORING_CQE_F_SOCK_NONEMPTY or not. If the recv is
entered with buffer selection, but no buffer is available, then we jump
error path which calls io_recv_finish() without having assigned
kmsg->msg_inq. This might cause an errant setting of the NONEMPTY flag
for a request get gets errored with -ENOBUFS.

Reported-by: syzbot+b1647099e82b3b349fbf@syzkaller.appspotmail.com
Fixes: 4a3223f7bf ("io_uring/net: switch io_recv() to using io_async_msghdr")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 14:04:37 -06:00
Takashi Iwai
e1e287e6f9 Merge tag 'asoc-fix-v6.10-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.10

Several serieses of fixes that have come in since the merge window,
mostly for Intel systems.  The biggest thing is some updates from Peter
which fix support for a series of Intel laptops which have been found to
use 16 bit rather than 32 bit DMIC configuration blobs in their firmware
descriptions.  We also have a bunch of fixes for module annotations, and
some smaller single patch fixes.
2024-05-30 21:26:19 +02:00
John Hubbard
cb708ab9f5 selftests/futex: pass _GNU_SOURCE without a value to the compiler
It's slightly better to set _GNU_SOURCE in the source code, but if one
must do it via the compiler invocation, then the best way to do so is
this:

    $(CC) -D_GNU_SOURCE=

...because otherwise, if this form is used:

    $(CC) -D_GNU_SOURCE

...then that leads the compiler to set a value, as if you had passed in:

    $(CC) -D_GNU_SOURCE=1

That, in turn, leads to warnings under both gcc and clang, like this:

    futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined

Fix this by using the "-D_GNU_SOURCE=" form.

Reviewed-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-30 13:10:51 -06:00
Abhinav Kumar
bb19535880 drm/msm: remove python 3.9 dependency for compiling msm
Since commit 5acf491196 ("drm/msm: import gen_header.py script from Mesa"),
compilation is broken on machines having python versions older than 3.9
due to dependency on argparse.BooleanOptionalAction.

Switch to use simple bool for the validate flag to remove the dependency.

Fixes: 5acf491196 ("drm/msm: import gen_header.py script from Mesa")
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240507230440.3384949-1-quic_abhinavk@quicinc.com
2024-05-30 18:49:23 +02:00
Alexandre Ghiti
1d84afaf02 riscv: Fix fully ordered LR/SC xchg[8|16]() implementations
The fully ordered versions of xchg[8|16]() using LR/SC lack the
necessary memory barriers to guarantee the order.

Fix this by matching what is already implemented in the fully ordered
versions of cmpxchg() using LR/SC.

Suggested-by: Andrea Parri <parri.andrea@gmail.com>
Reported-by: Andrea Parri <parri.andrea@gmail.com>
Closes: https://lore.kernel.org/linux-riscv/ZlYbupL5XgzgA0MX@andrea/T/#u
Fixes: a8ed2b7a2c ("riscv/cmpxchg: Implement xchg for variables of size 1 and 2")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240530145546.394248-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30 09:43:14 -07:00
Palmer Dabbelt
982a7eb97b Documentation: RISC-V: uabi: Only scalar misaligned loads are supported
We're stuck supporting scalar misaligned loads in userspace because they
were part of the ISA at the time we froze the uABI.  That wasn't the
case for vector misaligned accesses, so depending on them
unconditionally is a userspace bug.  All extant vector hardware traps on
these misaligned accesses.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240524185600.5919-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30 09:42:53 -07:00
Nam Cao
7bed516174 riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel
HAVE_ARCH_HUGE_VMAP also works on XIP kernel, so remove its dependency on
!XIP_KERNEL.

This also fixes a boot problem for XIP kernel introduced by the commit in
"Fixes:". This commit used huge page mapping for vmemmap, but huge page
vmap was not enabled for XIP kernel.

Fixes: ff172d4818 ("riscv: Use hugepage mappings for vmemmap")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240526110104.470429-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30 09:42:52 -07:00
Sergey Matyukevich
a638b0461b riscv: prevent pt_regs corruption for secondary idle threads
Top of the kernel thread stack should be reserved for pt_regs. However
this is not the case for the idle threads of the secondary boot harts.
Their stacks overlap with their pt_regs, so both may get corrupted.

Similar issue has been fixed for the primary hart, see c7cdd96eca
("riscv: prevent stack corruption by reserving task_pt_regs(p) early").
However that fix was not propagated to the secondary harts. The problem
has been noticed in some CPU hotplug tests with V enabled. The function
smp_callin stored several registers on stack, corrupting top of pt_regs
structure including status field. As a result, kernel attempted to save
or restore inexistent V context.

Fixes: 9a2451f186 ("RISC-V: Avoid using per cpu array for ordered booting")
Fixes: 2875fe0561 ("RISC-V: Add cpu_ops and modify default booting method")
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240523084327.2013211-1-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30 09:42:51 -07:00
Marc Zyngier
47eb2d68d1 KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor
Now that we expose PAC to NV guests, we can also expose BTI (as
the two as joined at the hip, due to some of the PAC instructions
being landing pads).

While we're at it, also propagate CSV_frac, which requires no
particular emulation.

Fixes: f4f6a95bac ("KVM: arm64: nv: Advertise support for PAuth")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240528100632.1831995-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-30 17:36:22 +01:00
Marc Zyngier
41011e2de3 KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx
ERETAx can fail in multiple ways:

(1) ELR_EL2 points lalaland
(2) we get a PAC failure
(3) SPSR_EL2 has the wrong mode

(1) is easy, as we just let the CPU do its thing and deliver an
Instruction Abort. However, (2) and (3) are interesting, because
the PAC failure priority is way below that of the Illegal Execution
State exception.

Which means that if we have detected a PAC failure (and that we have
FPACCOMBINE), we must be careful to give priority to the Illegal
Execution State exception, should one be pending.

Solving this involves hoisting the SPSR calculation earlier and
testing for the IL bit before injecting the FPAC exception.

In the extreme case of a ERETAx returning to an invalid mode *and*
failing its PAC check, we end up with an Instruction Abort (due
to the new PC being mangled by the failed Auth) *and* PSTATE.IL
being set. Which matches the requirements of the architecture.

Whilst we're at it, remove a stale comment that states the obvious
and only confuses the reader.

Fixes: 213b3d1ea1 ("KVM: arm64: nv: Handle ERETA[AB] instructions")
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240528100632.1831995-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-30 17:36:22 +01:00
Guenter Roeck
52a2c70c3e hwmon: (shtc1) Fix property misspelling
The property name is "sensirion,low-precision", not
"sensicon,low-precision".

Cc: Chris Ruehl <chris.ruehl@gtsys.com.hk>
Fixes: be7373b60d ("hwmon: shtc1: add support for device tree bindings")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-05-30 09:05:19 -07:00
Peter Colberg
027a44fedd hwmon: (intel-m10-bmc-hwmon) Fix multiplier for N6000 board power sensor
The Intel N6000 BMC outputs the board power value in milliwatt, whereas
the hwmon sysfs interface must provide power values in microwatt.

Fixes: e1983220ae ("hwmon: intel-m10-bmc-hwmon: Add N6000 sensors")
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Link: https://lore.kernel.org/r/20240521181246.683833-1-peter.colberg@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-05-30 09:05:06 -07:00
Linus Torvalds
d8ec19857b Merge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - gro: initialize network_offset in network layer

   - tcp: reduce accepted window in NEW_SYN_RECV state

  Current release - new code bugs:

   - eth: mlx5e: do not use ptp structure for tx ts stats when not
     initialized

   - eth: ice: check for unregistering correct number of devlink params

  Previous releases - regressions:

   - bpf: Allow delete from sockmap/sockhash only if update is allowed

   - sched: taprio: extend minimum interval restriction to entire cycle
     too

   - netfilter: ipset: add list flush to cancel_gc

   - ipv4: fix address dump when IPv4 is disabled on an interface

   - sock_map: avoid race between sock_map_close and sk_psock_put

   - eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
     status rules

  Previous releases - always broken:

   - core: fix __dst_negative_advice() race

   - bpf:
       - fix multi-uprobe PID filtering logic
       - fix pkt_type override upon netkit pass verdict

   - netfilter: tproxy: bail out if IP has been disabled on the device

   - af_unix: annotate data-race around unix_sk(sk)->addr

   - eth: mlx5e: fix UDP GSO for encapsulated packets

   - eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
     buffers

   - eth: i40e: fully suspend and resume IO operations in EEH case

   - eth: octeontx2-pf: free send queue buffers incase of leaf to inner

   - eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"

* tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  netdev: add qstat for csum complete
  ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
  net: ena: Fix redundant device NUMA node override
  ice: check for unregistering correct number of devlink params
  ice: fix 200G PHY types to link speed mapping
  i40e: Fully suspend and resume IO operations in EEH case
  i40e: factoring out i40e_suspend/i40e_resume
  e1000e: move force SMBUS near the end of enable_ulp function
  net: dsa: microchip: fix RGMII error in KSZ DSA driver
  ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
  net: fix __dst_negative_advice() race
  nfc/nci: Add the inconsistency check between the input data length and count
  MAINTAINERS: dwmac: starfive: update Maintainer
  net/sched: taprio: extend minimum interval restriction to entire cycle too
  net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
  netfilter: nft_fib: allow from forward/input without iif selector
  netfilter: tproxy: bail out if IP has been disabled on the device
  netfilter: nft_payload: skbuff vlan metadata mangle support
  net: ti: icssg-prueth: Fix start counter for ft1 filter
  sock_map: avoid race between sock_map_close and sk_psock_put
  ...
2024-05-30 08:33:04 -07:00
Dave Hansen
2a38e4ca30 x86/cpu: Provide default cache line size if not enumerated
tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.

Long Story:

The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size() is 0.  Normally, this value is populated from
c->x86_clflush_size.

Right now the code is set up to get c->x86_clflush_size from two
places.  First, modern CPUs get it from CPUID.  Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().

The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID.  But there are oddballs.

Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:

	cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
	if (cap0 & (1<<19))
		c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;

So they: land in get_cpu_address_sizes() and see that CPUID has level
0x80000008 and jump into the side of the if() that does not fill in
c->x86_clflush_size.  That assigns a 0 to c->x86_cache_alignment, and
hilarity ensues in code like:

        buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
                         GFP_KERNEL);

To fix this, always provide a sane value for ->x86_clflush_size.

Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs.  It would not, for instance, have worked on
the QEMU config.

1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel/GenuineIntel0000590_Clanton_03_CPUID.txt
2. You can also get this behavior if you use "-cpu 486,+clzero"
   in QEMU.

[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog
	   because bpetkov brutally murdered it recently. ]

Fixes: fbf6449f84 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Jörn Heusipp <osmanx@heusipp.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linux.intel.com/
Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.de/
Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.intel.com
2024-05-30 08:29:45 -07:00
Bingbu Cao
ffb9072bce media: intel/ipu6: add csi2 port sanity check in notifier bound
Invalid csi2 port will break the isys notifier bound ops as it is
trying to access an invalid csi2 sub-device instance based on the
port. It will trigger a mc warning, and it will cause the sensor
driver to unbound an inexistent isys csi2 and crash. Adding a
csi2 port sanity check, return error to avoid such case.

Fixes: f50c4ca0a8 ("media: intel/ipu6: add the main input system driver")
Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
[Sakari Ailus: Fix spelling of "nports" field.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-30 16:34:59 +02:00
Bingbu Cao
54880795b4 media: intel/ipu6: update the maximum supported csi2 port number to 6
IPU6EP on Meteor Lake SoC supports maximum 6 csi2 ports instead of 4.

Fixes: 25fedc0219 ("media: intel/ipu6: add Intel IPU6 PCI device driver")
Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-30 16:34:59 +02:00
Sakari Ailus
cc864821c7 media: mei: csi: Warn less verbosely of a missing device fwnode
The check for having device fwnode was meant to be a sanity check but this
also happens if the ACPI DSDT has graph port nodes on sensor device(s) but
not on the IVSC device. Use a more meaningful warning message to tell
about this.

Fixes: 33116eb12c ("media: ivsc: csi: Use IPU bridge")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-30 16:34:58 +02:00
Sakari Ailus
328af04b1a media: mei: csi: Put the IPU device reference
The mei csi's probe function obtains a reference to the IPU device but
never puts that reference. Do that now.

Fixes: 33116eb12c ("media: ivsc: csi: Use IPU bridge")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-30 16:34:58 +02:00
Breno Leitao
e112311615 io_uring/rw: Free iovec before cleaning async data
kmemleak shows that there is a memory leak in io_uring read operation,
where a buffer is allocated at iovec import, but never de-allocated.

The memory is allocated at io_async_rw->free_iovec, but, then
io_async_rw is kfreed, taking the allocated memory with it. I saw this
happening when the read operation fails with -11 (EAGAIN).

This is the kmemleak splat.

    unreferenced object 0xffff8881da591c00 (size 256):
...
      backtrace (crc 7a15bdee):
	[<00000000256f2de4>] __kmalloc+0x2d6/0x410
	[<000000007a9f5fc7>] iovec_from_user.part.0+0xc6/0x160
	[<00000000cecdf83a>] __import_iovec+0x50/0x220
	[<00000000d1d586a2>] __io_import_iovec+0x13d/0x220
	[<0000000054ee9bd2>] io_prep_rw+0x186/0x340
	[<00000000a9c0372d>] io_prep_rwv+0x31/0x120
	[<000000001d1170b9>] io_prep_readv+0xe/0x30
	[<0000000070b8eb67>] io_submit_sqes+0x1bd/0x780
	[<00000000812496d4>] __do_sys_io_uring_enter+0x3ed/0x5b0
	[<0000000081499602>] do_syscall_64+0x5d/0x170
	[<00000000de1c5a4d>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

This occurs because the async data cleanup functions are not set for
read/write operations. As a result, the potentially allocated iovec in
the rw async data is not freed before the async data is released,
leading to a memory leak.

With this following patch, kmemleak does not show the leaked memory
anymore, and all liburing tests pass.

Fixes: a9165b83c1 ("io_uring/rw: always setup io_async_rw for read/write requests")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240530142340.1248216-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 08:33:01 -06:00
Thomas Gleixner
34bf6bae32 x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
The new AMD/HYGON topology parser evaluates the SMT information in CPUID leaf
0x8000001e unconditionally while the original code restricted it to CPUs with
family 0x17 and greater.

This breaks family 0x15 CPUs which advertise that leaf and have a non-zero
value in the SMT section. The machine boots, but the scheduler complains loudly
about the mismatch of the core IDs:

  WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6482 sched_cpu_starting+0x183/0x250
  WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2408 build_sched_domains+0x76b/0x12b0

Add the condition back to cure it.

  [ bp: Make it actually build because grandpa is not concerned with
    trivial stuff. :-P ]

Fixes: f7fb3b2dd9 ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Closes: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/56
Reported-by: Tim Teichmann <teichmanntim@outlook.de>
Reported-by: Christian Heusel <christian@heusel.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tim Teichmann <teichmanntim@outlook.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/7skhx6mwe4hxiul64v6azhlxnokheorksqsdbp7qw6g2jduf6c@7b5pvomauugk
2024-05-30 15:58:55 +02:00
Mark Brown
c85578e730 ASoC: SOF: ipc4-topology: Fix nhlt configuration blob
Merge series from Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:

The existing logic to pick a DMIC blob is based on several historical
assumptions that the NHLT in BIOS always contains 32-bits per sample
type (first patch, [1]).

The other issue with the existing logic is that it was designed to care only
about the bit depth of the format and fails to find the existing and correct
blob when rate/channels are different on the FE side compared to what we should
be using on the DAI side (we have components in path which can change
rate/channel count).

These issues have not been observed in past but with new MTL based (Windows)
laptops and new topologies to enhance the audio quality, we started to see weird
issues around how our assumptions of vendors failed.

Since some NHLT blob handling cleanup has been done for 6.10, this series will
complete that work to cover even cases that we don't anticipate to see.

[1] https://github.com/thesofproject/linux/issues/4973
2024-05-30 14:33:14 +01:00
Gerald Loacker
b62c150c3b drm/panel: sitronix-st7789v: fix display size for jt240mhqs_hwt_ek_e3 panel
This is a portrait mode display. Change the dimensions accordingly.

Fixes: 0fbbe96bfa ("drm/panel: sitronix-st7789v: add jasonic jt240mhqs-hwt-ek-e3 support")
Signed-off-by: Gerald Loacker <gerald.loacker@wolfvision.net>
Acked-by: Jessica Zhang <quic_jesszhan@quicinc.com>
Link: https://lore.kernel.org/r/20240409-bugfix-jt240mhqs_hwt_ek_e3-timing-v2-3-e4821802443d@wolfvision.net
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409-bugfix-jt240mhqs_hwt_ek_e3-timing-v2-3-e4821802443d@wolfvision.net
2024-05-30 14:57:33 +02:00
Gerald Loacker
2ba5058263 drm/panel: sitronix-st7789v: tweak timing for jt240mhqs_hwt_ek_e3 panel
Use the default timing parameters to get a refresh rate of about 60 Hz for
a clock of 6 MHz.

Fixes: 0fbbe96bfa ("drm/panel: sitronix-st7789v: add jasonic jt240mhqs-hwt-ek-e3 support")
Signed-off-by: Gerald Loacker <gerald.loacker@wolfvision.net>
Acked-by: Jessica Zhang <quic_jesszhan@quicinc.com>
Link: https://lore.kernel.org/r/20240409-bugfix-jt240mhqs_hwt_ek_e3-timing-v2-2-e4821802443d@wolfvision.net
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409-bugfix-jt240mhqs_hwt_ek_e3-timing-v2-2-e4821802443d@wolfvision.net
2024-05-30 14:57:33 +02:00
Gerald Loacker
0e5895ff7f drm/panel: sitronix-st7789v: fix timing for jt240mhqs_hwt_ek_e3 panel
Flickering was observed when using partial mode. Moving the vsync to the
same position as used by the default sitronix-st7789v timing resolves this
issue.

Fixes: 0fbbe96bfa ("drm/panel: sitronix-st7789v: add jasonic jt240mhqs-hwt-ek-e3 support")
Acked-by: Jessica Zhang <quic_jesszhan@quicinc.com>
Signed-off-by: Gerald Loacker <gerald.loacker@wolfvision.net>
Link: https://lore.kernel.org/r/20240409-bugfix-jt240mhqs_hwt_ek_e3-timing-v2-1-e4821802443d@wolfvision.net
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409-bugfix-jt240mhqs_hwt_ek_e3-timing-v2-1-e4821802443d@wolfvision.net
2024-05-30 14:57:32 +02:00
Samuel Holland
be2fc65d66 powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64
When building a 32-bit kernel, some toolchains do not allow mixing soft
float and hard float object files:

    LD      vmlinux.o
  powerpc64le-unknown-linux-musl-ld: lib/test_fpu_impl.o uses hard float, arch/powerpc/kernel/udbg.o uses soft float
  powerpc64le-unknown-linux-musl-ld: failed to merge target specific data of file lib/test_fpu_impl.o
  make[2]: *** [scripts/Makefile.vmlinux_o:62: vmlinux.o] Error 1
  make[1]: *** [Makefile:1152: vmlinux_o] Error 2
  make: *** [Makefile:240: __sub-make] Error 2

This is not an issue when building a 64-bit kernel. To unbreak the
build, limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64-bit kernels. This is okay
because the only real user of this option, amdgpu, was previously
limited to PPC64 anyway; see commit a28e4b672f ("drm/amd/display: use
ARCH_HAS_KERNEL_FPU_SUPPORT").

Fixes: 01db473e1a ("powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405250851.Z4daYSWG-lkp@intel.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/eeffaec3-df63-4e55-ab7a-064a65c00efa@roeck-us.net/
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240529162852.1209-1-samuel.holland@sifive.com
2024-05-30 22:57:27 +10:00
Michael Ellerman
50934945d5 powerpc/uaccess: Use YZ asm constraint for ld
The 'ld' instruction requires a 4-byte aligned displacement because it
is a DS-form instruction. But the "m" asm constraint doesn't enforce
that.

Add a special case of __get_user_asm2_goto() so that the "YZ" constraint
can be used for "ld".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "ld" or X-form "ldx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

No build errors have been reported due to this, but the possibility is
there depending on compiler code generation decisions.

Fixes: c20beffeec ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240529123029.146953-2-mpe@ellerman.id.au
2024-05-30 22:57:27 +10:00
Michael Ellerman
2d43cc701b powerpc/uaccess: Fix build errors seen with GCC 13/14
Building ppc64le_defconfig with GCC 14 fails with assembler errors:

    CC      fs/readdir.o
  /tmp/ccdQn0mD.s: Assembler messages:
  /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
  /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
  ... [6 lines]
  /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

  # ../fs/readdir.c:210:         unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
         ld 9,0(29)       # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
  # 210 "../fs/readdir.c" 1
         1:      std 9,18(8)     # put_user       # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.

A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 13/14, that appears
to just be luck. The constraint has been incorrect since it was first
added.

Fixes: c20beffeec ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Cc: stable@vger.kernel.org # v5.10+
Suggested-by: Kewen Lin <linkw@gcc.gnu.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240529123029.146953-1-mpe@ellerman.id.au
2024-05-30 22:57:27 +10:00
Nathan Lynch
12870ae381 powerpc/pseries/lparcfg: drop error message from guest name lookup
It's not an error or exceptional situation when the hosting
environment does not expose a name for the LP/guest via RTAS or the
device tree. This happens with qemu when run without the '-name'
option. The message also lacks a newline. Remove it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: eddaa9a402 ("powerpc/pseries: read the lpar name from the firmware")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240524-lparcfg-updates-v2-1-62e2e9d28724@linux.ibm.com
2024-05-30 22:57:26 +10:00
Peter Ujfalusi
b65456b7b3 ASoC: SOF: ipc4-topology: Adjust the params based on DAI formats
Currently we only check the bit depth value among to DAI formats, but other
parameters might be constant, like number of channels and/or rate.

In capture we use the fe params as a reference to find the format and blob
which should be used, but in the path we can have components which can
handle expanding/narrowing number of channels or do a resample.

In these cases the topology is expected to have 'fixed' parameter for
channels/rates/bit depth and the conversion to the fe format is going to
be done within the path.

In practice this patch fixes issues like:
All DMIC formats are fixed four channels
We have a component which converts the four channel to stereo
FE is opened with 2 channel

Even if we have the correct bit depth format and blob (for four channel) we
will still be looking for stereo configurations, which will fail.

Note: the adjustment of params have switched order with the checking of
single bit depth (needed for the NHLT blob fallback support). This change
is non function, just that if the sof_ipc4_narrow_params_to_format() would
fail, there is no point of checking the single bit depth.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://msgid.link/r/20240530111918.21974-6-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-30 12:33:32 +01:00
Peter Ujfalusi
2fcad03eab ASoC: SOF: ipc4-topology: Improve readability of sof_ipc4_prepare_dai_copier()
Remove the duplicated code paths to check for single bit depth and to
update the params with storing the parameters needed by the function and
have a single code section.

No functional change but the code is easier to follow.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://msgid.link/r/20240530111918.21974-5-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-30 12:33:31 +01:00
Peter Ujfalusi
3b64fd2f83 ASoC: SOF: ipc4-topology/pcm: Rename sof_ipc4_copier_is_single_format()
Rename the sof_ipc4_copier_is_single_format() to
sof_ipc4_copier_is_single_bitdepth() to clear the confusion of the use of
'format' when we are querying information on the bit depth.
Format is used to describe a combination of parameters (rate, channels,
sample format / bit depth).
Rename the flags used to store the result at the same time.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://msgid.link/r/20240530111918.21974-4-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-30 12:33:30 +01:00
Peter Ujfalusi
2a865c9c3f ASoC: SOF: ipc4-topology: Print out the channel count in sof_ipc4_dbg_audio_format
Print out the number of channels for the format explicitly instead of
having the reader to understand how to interpret the ch_map and ch_cfg
values.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://msgid.link/r/20240530111918.21974-3-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-30 12:33:29 +01:00
Peter Ujfalusi
49cb894d56 ASoC: SOF: ipc4-topology: Add support for NHLT with 16-bit only DMIC blob
The ACPI NHLT table always had 32-bit DMIC blob even if 16-bit was also
present and taken as a 'rule' which obviously got broken and there is at
least one device on the market which ships with only 16-bit DMIC
configuration blob.
This corner case has never been supported and it is going to need topology
updates for DMIC copier to support multiple formats.

As for the kernel side: if the copier supports multiple formats and the
preferred 32-bit DMIC blob is not found then we will try to get a 16-bit
DMIC configuration and look for a 16-bit copier config.

Fixes: f9209644ae ("ASoC: SOF: ipc4-topology: Correct DAI copier config and NHLT blob request")
Link: https://github.com/thesofproject/linux/issues/4973
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://msgid.link/r/20240530111918.21974-2-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-30 12:33:28 +01:00
Jakub Kicinski
13c7c941e7 netdev: add qstat for csum complete
Recent commit 0cfe71f45f ("netdev: add queue stats") added
a lot of useful stats, but only those immediately needed by virtio.
Presumably virtio does not support CHECKSUM_COMPLETE,
so statistic for that form of checksumming wasn't included.
Other drivers will definitely need it, in fact we expect it
to be needed in net-next soon (mlx5). So let's add the definition
of the counter for CHECKSUM_COMPLETE to uAPI in net already,
so that the counters are in a more natural order (all subsequent
counters have not been present in any released kernel, yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Fixes: 0cfe71f45f ("netdev: add queue stats")
Link: https://lore.kernel.org/r/20240529163547.3693194-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-30 12:15:56 +02:00
Takashi Iwai
700fe6fd09 ALSA: seq: Fix yet another spot for system message conversion
We fixed the incorrect UMP type for system messages in the recent
commit, but it missed one place in system_ev_to_ump_midi1().
Fix it now.

Fixes: e9e02819a9 ("ALSA: seq: Automatic conversion of UMP events")
Fixes: c2bb79613fed ("ALSA: seq: Fix incorrect UMP type for system messages")
Link: https://lore.kernel.org/r/20240530101044.17524-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-30 12:11:20 +02:00
Yue Haibing
b3dc6e8003 ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
Raw packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will
hit WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path.

WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70
Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper
CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:sk_mc_loop+0x2d/0x70
Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c
RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212
RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000
RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00
R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000
R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000
FS:  0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
 ? __warn (kernel/panic.c:693)
 ? sk_mc_loop (net/core/sock.c:760)
 ? report_bug (lib/bug.c:201 lib/bug.c:219)
 ? handle_bug (arch/x86/kernel/traps.c:239)
 ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
 ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
 ? sk_mc_loop (net/core/sock.c:760)
 ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1))
 ? nf_hook_slow (net/netfilter/core.c:626)
 ip6_finish_output (net/ipv6/ip6_output.c:222)
 ? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215)
 ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:602) ipvlan
 ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan
 dev_hard_start_xmit (net/core/dev.c:3594)
 sch_direct_xmit (net/sched/sch_generic.c:343)
 __qdisc_run (net/sched/sch_generic.c:416)
 net_tx_action (net/core/dev.c:5286)
 handle_softirqs (kernel/softirq.c:555)
 __irq_exit_rcu (kernel/softirq.c:589)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043)

The warning triggers as this:
packet_sendmsg
   packet_snd //skb->sk is packet sk
      __dev_queue_xmit
         __dev_xmit_skb //q->enqueue is not NULL
             __qdisc_run
               sch_direct_xmit
                 dev_hard_start_xmit
                   ipvlan_start_xmit
                      ipvlan_xmit_mode_l3 //l3 mode
                        ipvlan_process_outbound //vepa flag
                          ipvlan_process_v6_outbound
                            ip6_local_out
                                __ip6_finish_output
                                  ip6_finish_output2 //multicast packet
                                    sk_mc_loop //sk->sk_family is AF_PACKET

Call ip{6}_local_out() with NULL sk in ipvlan as other tunnels to fix this.

Fixes: 2ad7bf3638 ("ipvlan: Initial check-in of the IPVLAN driver.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240529095633.613103-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-30 12:05:52 +02:00
Paolo Abeni
e889eb17f4 Merge tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 syzbot reports that nf_reinject() could be called without
         rcu_read_lock() when flushing pending packets at nfnetlink
         queue removal, from Eric Dumazet.

Patch #2 flushes ipset list:set when canceling garbage collection to
         reference to other lists to fix a race, from Jozsef Kadlecsik.

Patch #3 restores q-in-q matching with nft_payload by reverting
         f6ae9f120d ("netfilter: nft_payload: add C-VLAN support").

Patch #4 fixes vlan mangling in skbuff when vlan offload is present
         in skbuff, without this patch nft_payload corrupts packets
         in this case.

Patch #5 fixes possible nul-deref in tproxy no IP address is found in
         netdevice, reported by syzbot and patch from Florian Westphal.

Patch #6 removes a superfluous restriction which prevents loose fib
         lookups from input and forward hooks, from Eric Garver.

My assessment is that patches #1, #2 and #5 address possible kernel
crash, anything else in this batch fixes broken features.

netfilter pull request 24-05-29

* tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_fib: allow from forward/input without iif selector
  netfilter: tproxy: bail out if IP has been disabled on the device
  netfilter: nft_payload: skbuff vlan metadata mangle support
  netfilter: nft_payload: restore vlan q-in-q match support
  netfilter: ipset: Add list flush to cancel_gc
  netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()
====================

Link: https://lore.kernel.org/r/20240528225519.1155786-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-30 10:14:56 +02:00
Yuntao Wang
ed8c7fbdfe fs/file: fix the check in find_next_fd()
The maximum possible return value of find_next_zero_bit(fdt->full_fds_bits,
maxbit, bitbit) is maxbit. This return value, multiplied by BITS_PER_LONG,
gives the value of bitbit, which can never be greater than maxfd, it can
only be equal to maxfd at most, so the following check 'if (bitbit > maxfd)'
will never be true.

Moreover, when bitbit equals maxfd, it indicates that there are no unused
fds, and the function can directly return.

Fix this check.

Signed-off-by: Yuntao Wang <yuntao.wang@linux.dev>
Link: https://lore.kernel.org/r/20240529160656.209352-1-yuntao.wang@linux.dev
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-30 09:11:47 +02:00
Shay Agroskin
2dc8b1e717 net: ena: Fix redundant device NUMA node override
The driver overrides the NUMA node id of the device regardless of
whether it knows its correct value (often setting it to -1 even though
the node id is advertised in 'struct device'). This can lead to
suboptimal configurations.

This patch fixes this behavior and makes the shared memory allocation
functions use the NUMA node id advertised by the underlying device.

Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20240528170912.1204417-1-shayagr@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 19:01:34 -07:00
Jakub Kicinski
602d9591a7 Merge branch 'intel-wired-lan-driver-updates-2024-05-28-e1000e-i40e-ice'
Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2024-05-28 (e1000e, i40e, ice) [part]

This series includes a variety of fixes that have been accumulating on the
Intel Wired LAN dev-queue.

Hui Wang provides a fix for suspend/resume on e1000e due to failure
to correctly setup the SMBUS in enable_ulp().

Thinh Tran provides a fix for EEH I/O suspend/resume on i40e to
ensure that I/O operations can continue after a resume. To avoid duplicate
code, the common logic is factored out of i40e_suspend and i40e_resume.

Paul Greenwalt provides a fix to correctly map the 200G PHY types to link
speeds in the ice driver.

Dave Ertman provides a fix correcting devlink parameter unregistration in
the event that the driver loads in safe mode and some of the parameters
were not registered.
====================

Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-0-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:57:03 -07:00
Dave Ertman
a51c9b1c9a ice: check for unregistering correct number of devlink params
On module load, the ice driver checks for the lack of a specific PF
capability to determine if it should reduce the number of devlink params
to register.  One situation when this test returns true is when the
driver loads in safe mode.  The same check is not present on the unload
path when devlink params are unregistered.  This results in the driver
triggering a WARN_ON in the kernel devlink code.

The current check and code path uses a reduction in the number of elements
reported in the list of params.  This is fragile and not good for future
maintaining.

Change the parameters to be held in two lists, one always registered and
one dependent on the check.

Add a symmetrical check in the unload path so that the correct parameters
are unregistered as well.

Fixes: 109eb29172 ("ice: Add tx_scheduling_layers devlink param")
CC: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-8-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:57:02 -07:00
Paul Greenwalt
2a6d8f2de2 ice: fix 200G PHY types to link speed mapping
Commit 24407a01e5 ("ice: Add 200G speed/phy type use") added support
for 200G PHY speeds, but did not include the mapping of 200G PHY types
to link speed. As a result the driver is returning UNKNOWN link speed
when setting 200G ethtool advertised link modes.

To fix this add 200G PHY types to link speed mapping to
ice_get_link_speed_based_on_phy_type().

Fixes: 24407a01e5 ("ice: Add 200G speed/phy type use")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-5-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:57:02 -07:00
Thinh Tran
c80b6538d3 i40e: Fully suspend and resume IO operations in EEH case
When EEH events occurs, the callback functions in the i40e, which are
managed by the EEH driver, will completely suspend and resume all IO
operations.

- In the PCI error detected callback, replaced i40e_prep_for_reset()
  with i40e_io_suspend(). The change is to fully suspend all I/O
  operations
- In the PCI error slot reset callback, replaced pci_enable_device_mem()
  with pci_enable_device(). This change enables both I/O and memory of
  the device.
- In the PCI error resume callback, replaced i40e_handle_reset_warning()
  with i40e_io_resume(). This change allows the system to resume I/O
  operations

Fixes: a5f3d2c17b ("powerpc/pseries/pci: Add MSI domains")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Robert Thomas <rob.thomas@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-3-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:56:30 -07:00
Thinh Tran
218ed820d3 i40e: factoring out i40e_suspend/i40e_resume
Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
introduced.  These functions were factored out from the existing
i40e_suspend() and i40e_resume() respectively.  This factoring was
done due to concerns about the logic of the I40E_SUSPENSED state, which
caused the device to be unable to recover.  The functions are now used
in the EEH handling for device suspend/resume callbacks.

The function i40e_enable_mc_magic_wake() has been moved ahead of
i40e_io_suspend() to ensure it is declared before being used.

Tested-by: Robert Thomas <rob.thomas@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-2-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:56:30 -07:00
Hui Wang
bfd546a552 e1000e: move force SMBUS near the end of enable_ulp function
The commit 861e808602 ("e1000e: move force SMBUS from enable ulp
function to avoid PHY loss issue") introduces a regression on
PCH_MTP_I219_LM18 (PCIID: 0x8086550A). Without the referred commit, the
ethernet works well after suspend and resume, but after applying the
commit, the ethernet couldn't work anymore after the resume and the
dmesg shows that the NIC link changes to 10Mbps (1000Mbps originally):

    [   43.305084] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx

Without the commit, the force SMBUS code will not be executed if
"return 0" or "goto out" is executed in the enable_ulp(), and in my
case, the "goto out" is executed since FWSM_FW_VALID is set. But after
applying the commit, the force SMBUS code will be ran unconditionally.

Here move the force SMBUS code back to enable_ulp() and put it
immediately ahead of hw->phy.ops.release(hw), this could allow the
longest settling time as possible for interface in this function and
doesn't change the original code logic.

The issue was found on a Lenovo laptop with the ethernet hw as below:
00:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:550a]
(rev 20).

And this patch is verified (cable plug and unplug, system suspend
and resume) on Lenovo laptops with ethernet hw: [8086:550a],
[8086:550b], [8086:15bb], [8086:15be], [8086:1a1f], [8086:1a1c] and
[8086:0dc7].

Fixes: 861e808602 ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-1-dc8593d2bbc6@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:56:30 -07:00
Jens Axboe
1521dc2410 Merge tag 'nvme-6.10-2024-05-29' of git://git.infradead.org/nvme into block-6.10
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.10

 - Removing unused fields (Kanchan)
 - Large folio offsets support (Kundan)
 - Multipath NUMA node initialiazation fix (Nilay)
 - Multipath IO stats accounting fixes (Keith)
 - Circular lockdep fix (Keith)
 - Target race condition fix (Sagi)
 - Target memory leak fix (Sagi)"

* tag 'nvme-6.10-2024-05-29' of git://git.infradead.org/nvme:
  nvmet: fix a possible leak when destroy a ctrl during qp establishment
  nvme: use srcu for iterating namespace list
  nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset
  nvme: remove sgs and sws
  nvmet: fix ns enable/disable possible hang
  nvme-multipath: fix io accounting on failover
  nvme: fix multipath batched completion accounting
  nvme-multipath: find NUMA path only for online numa-node
2024-05-29 19:54:33 -06:00
Tristram Ha
278d65ccda net: dsa: microchip: fix RGMII error in KSZ DSA driver
The driver should return RMII interface when XMII is running in RMII mode.

Fixes: 0ab7f6bf16 ("net: dsa: microchip: ksz9477: use common xmii function")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Acked-by: Jerry Ray <jerry.ray@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1716932066-3342-1-git-send-email-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:44:36 -07:00
Alexander Mikhalitsyn
b8c8abefc0 ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
A recent change to inet_dump_ifaddr had the function incorrectly iterate
over net rather than tgt_net, resulting in the data coming for the
incorrect network namespace.

Fixes: cdb2f80f1c ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://github.com/lxc/incus/issues/892
Bisected-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Stéphane Graber <stgraber@stgraber.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240528203030.10839-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 18:43:42 -07:00
Eric Dumazet
92f1655aa2 net: fix __dst_negative_advice() race
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.

RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).

Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.

Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.

Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.

Many thanks to Clement Lecigne for tracking this issue.

This old bug became visible after the blamed commit, using UDP sockets.

Fixes: a87cb3e48e ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 17:34:49 -07:00
Javier Carrasco
a94ff8e50c hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt()
A new error path was added to the fwnode_for_each_available_node() loop
in ltc2992_parse_dt(), which leads to an early return that requires a
call to fwnode_handle_put() to avoid a memory leak in that case.

Add the missing fwnode_handle_put() in the error path from a zero value
shunt resistor.

Cc: stable@vger.kernel.org
Fixes: 10b0290204 ("hwmon: (ltc2992) Avoid division by zero")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://lore.kernel.org/r/20240523-fwnode_for_each_available_child_node_scoped-v2-1-701f3a03f2fb@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-05-29 15:34:46 -07:00
Armin Wolf
fa0bc8f297 hwmon: (dell-smm) Add Dell G15 5511 to fan control whitelist
A user reported that he needs to disable BIOS fan control on his
Dell G15 5511 in order to be able to control the fans.

Closes: https://github.com/Wer-Wolf/i8kutils/issues/5
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Acked-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20240522210809.294488-1-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-05-29 15:31:12 -07:00
Heiner Kallweit
67c7d4fa26 drm/amd/pm: remove deprecated I2C_CLASS_SPD support from newly added SMU_14_0_2
Support for I2C_CLASS_SPD  is currently being removed from the kernel.
Only remaining step is to remove the definition of I2C_CLASS_SPD.
Setting I2C_CLASS_SPD  in a driver is a no-op meanwhile, so remove it
here.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:08:08 -04:00
Rajneesh Bhardwaj
a9bc5a19e4 drm/amdgpu: Make CPX mode auto default in NPS4
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is
setup in NPS4 memory partition mode. This change is only applicable for
dGPU, for APU, continue to use TPX mode.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:06:48 -04:00
Alex Deucher
1f327dfc84 drm/amdkfd: simplify APU VRAM handling
With commit 89773b8559
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified.  Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:06:15 -04:00
Alex Deucher
dd2b75fd9a Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
This reverts commit 28ebbb4981.

Revert this commit as apparently the LLVM code to take advantage of
this never landed.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Feifei Xu <feifei.xu@amd.com>
2024-05-29 17:06:04 -04:00
Jesse Zhang
a0cf36546c drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent.
To make the code more robust, check the pointer parent.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:03:20 -04:00
Alex Deucher
05d9e24ddb drm/amdgpu: silence UBSAN warning
Convert a variable sized array from [1] to [].

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:02:41 -04:00
Alex Deucher
ba46b3bda2 drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
Use current speed/width on devices which don't support
dynamic PCIe switching.

Fixes: 466a7d1153 ("drm/amd: Use the first non-dGPU PCI device for BW limits")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:01:49 -04:00
Kent Overstreet
7b038b564b bcachefs: Fix failure to return error on misaligned dio write
This was reported as an error when running coreutils shred.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-29 16:40:30 -04:00
Samuel Holland
2607133196 clk: sifive: Do not register clkdevs for PRCI clocks
These clkdevs were unnecessary, because systems using this driver always
look up clocks using the devicetree. And as Russell King points out[1],
since the provided device name was truncated, lookups via clkdev would
never match.

Recently, commit 8d532528ff ("clkdev: report over-sized strings when
creating clkdev entries") caused clkdev registration to fail due to the
truncation, and this now prevents the driver from probing. Fix the
driver by removing the clkdev registration.

Link: https://lore.kernel.org/linux-clk/ZkfYqj+OcAxd9O2t@shell.armlinux.org.uk/ [1]
Fixes: 30b8e27e3b ("clk: sifive: add a driver for the SiFive FU540 PRCI IP block")
Fixes: 8d532528ff ("clkdev: report over-sized strings when creating clkdev entries")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/linux-clk/7eda7621-0dde-4153-89e4-172e4c095d01@roeck-us.net/
Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240528001432.1200403-1-samuel.holland@sifive.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-05-29 12:31:02 -07:00
Michael Ellerman
e8b8c5264d selftests/overlayfs: Fix build error on ppc64
Fix build error on ppc64:
  dev_in_maps.c: In function ‘get_file_dev_and_inode’:
  dev_in_maps.c:60:59: error: format ‘%llu’ expects argument of type
  ‘long long unsigned int *’, but argument 7 has type ‘__u64 *’ {aka ‘long
  unsigned int *’} [-Werror=format=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:26:40 -06:00
Michael Ellerman
84b6df4c49 selftests/openat2: Fix build warnings on ppc64
Fix warnings like:

  openat2_test.c: In function ‘test_openat2_flags’:
  openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type
  ‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long
  unsigned int’} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:26:11 -06:00
Michael Ellerman
bc4d5f5d2d selftests: cachestat: Fix build warnings on ppc64
Fix warnings like:
  test_cachestat.c: In function ‘print_cachestat’:
  test_cachestat.c:30:38: warning: format ‘%llu’ expects argument of
  type ‘long long unsigned int’, but argument 2 has type ‘__u64’ {aka
  ‘long unsigned int’} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:24:44 -06:00
Steven Rostedt (Google)
23a4b108ac tracing/selftests: Fix kprobe event name test for .isra. functions
The kprobe_eventname.tc test checks if a function with .isra. can have a
kprobe attached to it. It loops through the kallsyms file for all the
functions that have the .isra. name, and checks if it exists in the
available_filter_functions file, and if it does, it uses it to attach a
kprobe to it.

The issue is that kprobes can not attach to functions that are listed more
than once in available_filter_functions. With the latest kernel, the
function that is found is: rapl_event_update.isra.0

  # grep rapl_event_update.isra.0 /sys/kernel/tracing/available_filter_functions
  rapl_event_update.isra.0
  rapl_event_update.isra.0

It is listed twice. This causes the attached kprobe to it to fail which in
turn fails the test. Instead of just picking the function function that is
found in available_filter_functions, pick the first one that is listed
only once in available_filter_functions.

Cc: stable@vger.kernel.org
Fixes: 604e354823 ("selftests/ftrace: Select an existing function in kprobe_eventname test")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:24:31 -06:00
Masami Hiramatsu (Google)
7ea794604b selftests/ftrace: Update required config
Update required config options for running all tests.
This also sorts the config entries alphabetically.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:24:14 -06:00
Masami Hiramatsu (Google)
f6c3c83db1 selftests/ftrace: Fix to check required event file
The dynevent/test_duplicates.tc test case uses `syscalls/sys_enter_openat`
event for defining eprobe on it. Since this `syscalls` events depend on
CONFIG_FTRACE_SYSCALLS=y, if it is not set, the test will fail.

Add the event file to `required` line so that the test will return
`unsupported` result.

Fixes: 297e1dcdca ("selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:24:07 -06:00
Mark Brown
2032e61e24 kselftest/alsa: Ensure _GNU_SOURCE is defined
The pcmtest driver tests use the kselftest harness which requires that
_GNU_SOURCE is defined but nothing causes it to be defined.  Since the
KHDR_INCLUDES Makefile variable has had the required define added let's
use that, this should provide some futureproofing.

Fixes: daef47b89e ("selftests: Compile kselftest headers with -D_GNU_SOURCE")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-29 12:23:57 -06:00
Rob Herring (Arm)
321e4fa68c dt-bindings: arm: stm32: st,mlahb: Drop spurious "reg" property from example
"reg" is not documented nor used for st,mlahb, so drop it from the
example to fix the warning:

Documentation/devicetree/bindings/arm/stm32/st,mlahb.example.dtb: ahb@38000000: Unevaluated properties are not allowed ('reg' was unexpected)
        from schema $id: http://devicetree.org/schemas/arm/stm32/st,mlahb.yaml#

Since "reg" is dropped, the unit-address must be as well.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240523154208.2457864-1-robh@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-05-29 13:22:14 -05:00
Rob Herring (Arm)
84081a8853 dt-bindings: arm: sunxi: Fix incorrect '-' usage
Commit 6bc6bf8a94 ("dt-bindings: arm: sunxi: document Anbernic RG35XX
handheld gaming device variants") mistakenly added '-' on each line
which created empty (i.e. description only) schemas matching anything.
This causes validation to fail on all the root node compatibles as
there are multiple oneOf clauses passing.

Fixes: 6bc6bf8a94 ("dt-bindings: arm: sunxi: document Anbernic RG35XX handheld gaming device variants")
Reviewed-by: Ryan Walklin <ryan@testtoast.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240503154402.967632-1-robh@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-05-29 13:22:14 -05:00
Uwe Kleine-König
95d7c452a2 spi: stm32: Don't warn about spurious interrupts
The dev_warn to notify about a spurious interrupt was introduced with
the reasoning that these are unexpected. However spurious interrupts
tend to trigger continously and the error message on the serial console
prevents that the core's detection of spurious interrupts kicks in
(which disables the irq) and just floods the console.

Fixes: c64e7efe46 ("spi: stm32: make spurious and overrun interrupts visible")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://msgid.link/r/20240521105241.62400-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-29 19:12:09 +01:00
Takashi Iwai
bc42ca002d ALSA: ump: Set default protocol when not given explicitly
When an inquiry of the current protocol via UMP Stream Configuration
message fails by some reason, we may leave the current protocol
undefined, which may lead to unexpected behavior.  Better to assume a
valid protocol found in the protocol capability bits instead.

For a device that doesn't support the UMP v1.2 feature, it won't reach
to this code path, and USB MIDI GTB descriptor would be used for
determining the protocol, instead.

Link: https://lore.kernel.org/r/20240529164723.18309-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-29 18:49:00 +02:00
Takashi Iwai
ac0d71ee53 ALSA: ump: Don't accept an invalid UMP protocol number
When a UMP Stream Configuration message is received, the driver tries
to switch the protocol, but there was no sanity check of the protocol,
hence it can pass an invalid value.  Add the check and bail out if a
wrong value is passed.

Fixes: a798076837 ("ALSA: ump: Add helper to change MIDI protocol")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240529164723.18309-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-29 18:48:51 +02:00
Linus Torvalds
4a4be1ad3a Revert "vfs: Delete the associated dentry when deleting a file"
This reverts commit 681ce86235.

We gave it a try, but it turns out the kernel test robot did in fact
find performance regressions for it, so we'll have to look at the more
involved alternative fixes for Yafang Shao's Elasticsearch load issue.

There were several alternatives discussed, they just weren't as simple
as this first attempt.

The report is of a -7.4% regression of filebench.sum_operations/s, which
appears significant enough to trigger my "this patch may get reverted if
somebody finds a performance regression on some other load" rule.

So it's still the case that we should end up deleting dentries more
aggressively - or just be better at pruning them later - but it needs a
bit more finesse than this simple thing.

Link: https://lore.kernel.org/all/202405291318.4dfbb352-oliver.sang@intel.com/
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-05-29 09:39:34 -07:00
Linus Torvalds
397a83ab97 Merge tag '9p-for-6.10-rc2' of https://github.com/martinetd/linux
Pull 9p fixes from Dominique Martinet:
 "Two fixes headed to stable trees:

   - a trace event was dumping uninitialized values

   - a missing lock that was thought to have exclusive access, and it
     turned out not to"

* tag '9p-for-6.10-rc2' of https://github.com/martinetd/linux:
  9p: add missing locking around taking dentry fid list
  net/9p: fix uninit-value in p9_client_rpc()
2024-05-29 09:25:15 -07:00
Rob Herring (Arm)
1b1c9f0fd3 dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
The 'dt_binding_check' target shouldn't depend on the kernel
configuration, but it has since commit 604a57ba97 ("dt-bindings:
kbuild: Add separate target/dependency for processed-schema.json").
That is because CHECK_DT_BINDING make variable was dropped, but
scripts/dtc/Makefile was missed. The CHECK_DTBS variable can be used
instead.

Reported-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Fixes: 604a57ba97 ("dt-bindings: kbuild: Add separate target/dependency for processed-schema.json")
Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-30 01:15:58 +09:00
Miguel Ojeda
6e58e01735 kheaders: use command -v to test for existence of cpio
Commit 13e1df0928 ("kheaders: explicitly validate existence of cpio
command") added an explicit check for `cpio` using `type`.

However, `type` in `dash` (which is used in some popular distributions
and base images as the shell script runner) prints the missing message
to standard output, and thus no error is printed:

    $ bash -c 'type missing >/dev/null'
    bash: line 1: type: missing: not found
    $ dash -c 'type missing >/dev/null'
    $

For instance, this issue may be seen by loongarch builders, given its
defconfig enables CONFIG_IKHEADERS since commit 9cc1df421f ("LoongArch:
Update Loongson-3 default config file").

Therefore, use `command -v` instead to have consistent behavior, and
take the chance to provide a more explicit error.

Fixes: 13e1df0928 ("kheaders: explicitly validate existence of cpio command")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-30 01:13:20 +09:00
Matthias Maennich
3bd27a847a kheaders: explicitly define file modes for archived headers
Build environments might be running with different umask settings
resulting in indeterministic file modes for the files contained in
kheaders.tar.xz. The file itself is served with 444, i.e. world
readable. Archive the files explicitly with 744,a+X to improve
reproducibility across build environments.

--mode=0444 is not suitable as directories need to be executable. Also,
444 makes it hard to delete all the readonly files after extraction.

Cc: stable@vger.kernel.org
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-30 01:13:20 +09:00
Linus Torvalds
db163660b0 Merge tag 'v6.10-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes a new run-time warning triggered by tpm"

* tag 'v6.10-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: core - Remove add_early_randomness
2024-05-29 09:12:58 -07:00
Jens Axboe
06fe9b1df1 io_uring: don't attempt to mmap larger than what the user asks for
If IORING_FEAT_SINGLE_MMAP is ignored, as can happen if an application
uses an ancient liburing or does setup manually, then 3 mmap's are
required to map the ring into userspace. The kernel will still have
collapsed the mappings, however userspace may ask for mapping them
individually. If so, then we should not use the full number of ring
pages, as it may exceed the partial mapping. Doing so will yield an
-EFAULT from vm_insert_pages(), as we pass in more pages than what the
application asked for.

Cap the number of pages to match what the application asked for, for
the particular mapping operation.

Reported-by: Lucas Mülling <lmulling@proton.me>
Link: https://github.com/axboe/liburing/issues/1157
Fixes: 3ab1db3c60 ("io_uring: get rid of remap_pfn_range() for mapping rings/sqes")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-29 09:53:14 -06:00
Andy Shevchenko
9dedabe95b spi: Assign dummy scatterlist to unidirectional transfers
Commit 8cc3bad9d9 ("spi: Remove unneded check for orig_nents")
introduced a regression: unmapped data could now be passed to the DMA
APIs, resulting in null pointer dereferences. Commit 9f788ba457 ("spi:
Don't mark message DMA mapped when no transfer in it is") and commit
da560097c0 ("spi: Check if transfer is mapped before calling DMA sync
APIs") addressed the problem, but only partially. Unidirectional
transactions will still result in null pointer dereference. To prevent
that from happening, assign a dummy scatterlist when no data is mapped,
so that the DMA API can be called and not result in a null pointer
dereference.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
Closes: https://lore.kernel.org/r/8ae675b5-fcf9-4c9b-b06a-4462f70e1322@linaro.org
Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Closes: https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano
Closes: https://lore.kernel.org/all/4748499f-789c-45a8-b50a-2dd09f4bac8c@notapiano
Fixes: 8cc3bad9d9 ("spi: Remove unneded check for orig_nents")
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
[nfraprado: wrote the commit message]
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://msgid.link/r/20240529-dma-oops-dummy-v1-1-bb43aacfb11b@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-29 16:47:53 +01:00
Mark Brown
ba2e8323d7 ASoC: SOF: add missing MODULE_DESCRIPTION
Merge series from Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>:

'make W=1' now reports missing MODULE_DESCRIPTION lines. This patchset
cleans-up all the module definitions and adds MODULE_DESCRIPTION lines
as needed.
2024-05-29 15:46:05 +01:00
Dmitry Antipov
92ecbb3ac6 wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
When testing the previous patch with CONFIG_UBSAN_BOUNDS, I've
noticed the following:

UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:372:4
index 0 is out of range for type 'struct ieee80211_channel *[]'
CPU: 0 PID: 1435 Comm: wpa_supplicant Not tainted 6.9.0+ #1
Hardware name: LENOVO 20UN005QRT/20UN005QRT <...BIOS details...>
Call Trace:
 <TASK>
 dump_stack_lvl+0x2d/0x90
 __ubsan_handle_out_of_bounds+0xe7/0x140
 ? timerqueue_add+0x98/0xb0
 ieee80211_prep_hw_scan+0x2db/0x480 [mac80211]
 ? __kmalloc+0xe1/0x470
 __ieee80211_start_scan+0x541/0x760 [mac80211]
 rdev_scan+0x1f/0xe0 [cfg80211]
 nl80211_trigger_scan+0x9b6/0xae0 [cfg80211]
 ...<the rest is not too useful...>

Since '__ieee80211_start_scan()' leaves 'hw_scan_req->req.n_channels'
uninitialized, actual boundaries of 'hw_scan_req->req.channels' can't
be checked in 'ieee80211_prep_hw_scan()'. Although an initialization
of 'hw_scan_req->req.n_channels' introduces some confusion around
allocated vs. used VLA members, this shouldn't be a problem since
everything is correctly adjusted soon in 'ieee80211_prep_hw_scan()'.

Cleanup 'kmalloc()' math in '__ieee80211_start_scan()' by using the
convenient 'struct_size()' as well.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://msgid.link/20240517153332.18271-2-dmantipov@yandex.ru
[improve (imho) indentation a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:58:54 +02:00
Lingbo Kong
a26d8dc522 wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
Currently, the way of parsing Spatial Reuse Parameter Set element is
incorrect and some members of struct ieee80211_he_obss_pd are not assigned.

To address this issue, it must be parsed in the order of the elements of
Spatial Reuse Parameter Set defined in the IEEE Std 802.11ax specification.

The diagram of the Spatial Reuse Parameter Set element (IEEE Std 802.11ax
-2021-9.4.2.252).

-------------------------------------------------------------------------
|       |      |         |       |Non-SRG|  SRG  | SRG   | SRG  | SRG   |
|Element|Length| Element |  SR   |OBSS PD|OBSS PD|OBSS PD| BSS  |Partial|
|   ID  |      |   ID    |Control|  Max  |  Min  | Max   |Color | BSSID |
|       |      |Extension|       | Offset| Offset|Offset |Bitmap|Bitmap |
-------------------------------------------------------------------------

Fixes: 1ced169cc1 ("mac80211: allow setting spatial reuse parameters from bss_conf")
Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com>
Link: https://msgid.link/20240516021854.5682-3-quic_lingbok@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:35:12 +02:00
Lingbo Kong
0c2fd18f7e wifi: mac80211: fix Spatial Reuse element size check
Currently, the way to check the size of Spatial Reuse IE data in the
ieee80211_parse_extension_element() is incorrect.

This is because the len variable in the ieee80211_parse_extension_element()
function is equal to the size of Spatial Reuse IE data minus one and the
value of returned by the ieee80211_he_spr_size() function is equal to
the length of Spatial Reuse IE data. So the result of the
len >= ieee80211_he_spr_size(data) statement always false.

To address this issue and make it consistent with the logic used elsewhere
with ieee80211_he_oper_size(), change the
"len >= ieee80211_he_spr_size(data)" to
“len >= ieee80211_he_spr_size(data) - 1”.

Fixes: 9d0480a7c0 ("wifi: mac80211: move element parsing to a new file")
Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com>
Link: https://msgid.link/20240516021854.5682-2-quic_lingbok@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:34:46 +02:00
Emmanuel Grumbach
4bb95f4535 wifi: iwlwifi: mvm: don't read past the mfuart notifcation
In case the firmware sends a notification that claims it has more data
than it has, we will read past that was allocated for the notification.
Remove the print of the buffer, we won't see it by default. If needed,
we can see the content with tracing.

This was reported by KFENCE.

Fixes: bdccdb854f ("iwlwifi: mvm: support MFUART dump in case of MFUART assert")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240513132416.ba82a01a559e.Ia91dd20f5e1ca1ad380b95e68aebf2794f553d9b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:14 +02:00
Ilan Peer
e6dd2936ce wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill
When HW rfkill is toggled to disable the RF, the flow to stop scan is
called. When trying to send the command to abort the scan, since
HW rfkill is toggled, the command is not sent due to rfkill being
asserted, and -ERFKILL is returned from iwl_trans_send_cmd(), but this
is silently ignored in iwl_mvm_send_cmd() and thus the scan abort flow
continues to wait for scan complete notification and fails. Since it
fails, the UID to type mapping is not cleared, and thus a warning is
later fired when trying to stop the interface.

To fix this, modify the UMAC scan abort flow to force sending the
scan abort command even when in rfkill, so stop the FW from accessing
the radio etc.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240513132416.8cbe2f8c1a97.Iffe235c12a919dafec88eef399eb1f7bae2c5bdb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:14 +02:00
Miri Korenblit
60d62757df wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
In some versions of cfg80211, the ssids poinet might be a valid one even
though n_ssids is 0. Accessing the pointer in this case will cuase an
out-of-bound access. Fix this by checking n_ssids first.

Fixes: c1a7515393 ("iwlwifi: mvm: add adaptive dwell support")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240513132416.6e4d1762bf0d.I5a0e6cc8f02050a766db704d15594c61fe583d45@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Ayala Beker
989830d1cf wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option
Ensure that the 6 GHz channel is configured with a valid direct BSSID,
avoiding any invalid or multicast BSSID addresses.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240513132416.91a631a0fe60.I2ea2616af9b8a2eaf959b156c69cf65a2f1204d4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Johannes Berg
4d08c0b335 wifi: iwlwifi: mvm: handle BA session teardown in RF-kill
When entering RF-kill, mac80211 tears down BA sessions, but
due to RF-kill the commands aren't sent to the device. As a
result, there can be frames pending on the reorder buffer or
perhaps even received while doing so, leading to warnings.

Avoid the warnings by doing the BA session teardown normally
even in RF-kill, which also requires queue sync.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240513132416.0762cd80fb3d.I43c5877f3b546159b2db4f36d6d956b333c41cf0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Yedidya Benshimol
08b16d1b59 wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd
The BIGTK cipher field was added to the kek_kck_material_cmd
but wasn't assigned. Fix that by differentiating between the
IGTK/BIGTK keys and assign the ciphers fields accordingly.

Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240513132416.7fd0b22b7267.Ie9b581652b74bd7806980364d59e1b2e78e682c0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Benjamin Berg
cc3ba78f20 wifi: iwlwifi: mvm: remove stale STA link data during restart
If pre-recovery mac80211 tried to disable a link but this disablement
failed, then there might be a mismatch between mac80211 assuming the
link has been disabled and the driver still having the data around.
During recover itself, that is not a problem, but should the link be
activated again at a later point, iwlwifi will refuse the activation as
it detects the inconsistent state.

Solve this corner-case by iterating the station in the restart cleanup
handler.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240513132416.d2fd60338055.I840d4fdce5fd49fe69896d928b071067e3730259@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Shahar S Matityahu
87821b67de wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
The driver should call iwl_dbg_tlv_free even if debugfs is not defined
since ini mode does not depend on debugfs ifdef.

Fixes: 68f6f492c4 ("iwlwifi: trans: support loading ini TLVs from external file")
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Reviewed-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240510170500.c8e3723f55b0.I5e805732b0be31ee6b83c642ec652a34e974ff10@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Mordechay Goodstein
0f2e9f6f21 wifi: iwlwifi: mvm: set properly mac header
In the driver we only use skb_put* for adding data to the skb, hence data
never moves and skb_reset_mac_haeder would set mac_header to the first
time data was added and not to mac80211 header, fix this my using the
actual len of bytes added for setting the mac header.

Fixes: 3f7a9d577d ("wifi: iwlwifi: mvm: simplify by using SKB MAC header pointer")
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240510170500.12f2de2909c3.I72a819b96f2fe55bde192a8fd31a4b96c301aa73@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:13 +02:00
Johannes Berg
4a7aace289 wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
We don't actually support >64 even for HE devices, so revert
back to 64. This fixes an issue where the session is refused
because the queue is configured differently from the actual
session later.

Fixes: 514c30696f ("iwlwifi: add support for IEEE802.11ax")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Liad Kaufman <liad.kaufman@intel.com>
Reviewed-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240510170500.52f7b4cf83aa.If47e43adddf7fe250ed7f5571fbb35d8221c7c47@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:11 +02:00
Yedidya Benshimol
b7ffca9931 wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup
After moving from commands to notificaitons in the d3 resume flow,
removing the WOWLAN_GET_STATUSES and REPLY_OFFLOADS_QUERY_CMD causes
the return of the default value when looking up their version.
Returning zero here results in the driver sending the not supported
NON_QOS_TX_COUNTER_CMD.

Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240510170500.8cabfd580614.If3a0db9851f56041f8f5360959354abd5379224a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:30:09 +02:00
Emmanuel Grumbach
788e4c75f8 wifi: iwlwifi: mvm: fix a crash on 7265
Since IWL_FW_CMD_VER_UNKNOWN = 99, then my change to consider
cmd_ver >= 7 instead of cmd_ver = 7 included also firmwares that don't
advertise the command version at all. This made us send a command with a
bad size and because of that, the firmware hit a BAD_COMMAND immediately
after handling the REDUCE_TX_POWER_CMD command.

Fixes: 8f892e225f ("wifi: iwlwifi: mvm: support iwl_dev_tx_power_cmd_v8")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240512072733.eb20ff5050d3.Ie4fc6f5496cd296fd6ff20d15e98676f28a3cccd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:26:54 +02:00
Shaul Triebitz
98b7017ddb wifi: iwlwifi: mvm: always set the TWT IE offset
In beacon template version 14, make sure to always set
the TWT IE offset before sending the beacon template command,
also in the debugfs inject_beacon_ie path.
If the TWT IE does not exist, the offset will be set to zero.

Fixes: bf0212fd8f ("wifi: iwlwifi: mvm: add beacon template version 14")
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240512152312.eb27175c345a.If30ef24aba10fe47fd42a7a9703eb8903035e294@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:26:05 +02:00
Miri Korenblit
92158790ce wifi: iwlwifi: mvm: don't initialize csa_work twice
The initialization of this worker moved to iwl_mvm_mac_init_mvmvif
but we removed only from the pre-MLD version of the add_interface
callback. Remove it also from the MLD version.

Fixes: 0bcc215598 ("wifi: iwlwifi: mvm: init vif works only once")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240512152312.4f15b41604f0.Iec912158e5a706175531d3736d77d25adf02fba4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:26:05 +02:00
Aditya Kumar Singh
8ecc4d7a7c wifi: mac80211: pass proper link id for channel switch started notification
Original changes[1] posted is having proper changes. However, at the same
time, there was chandef puncturing changes which had a conflict with this.
While applying, two errors crept in -
   a) Whitespace error.
   b) Link ID being passed to channel switch started notifier function is
      0. However proper link ID is present in the function.

Fix these now.

[1] https://lore.kernel.org/all/20240130140918.1172387-5-quic_adisi@quicinc.com/

Fixes: 1a96bb4e8a ("wifi: mac80211: start and finalize channel switch on link basis")
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240509032555.263933-1-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:25:36 +02:00
Johannes Berg
f7a8b10bfd wifi: cfg80211: fix 6 GHz scan request building
The 6 GHz scan request struct allocated by cfg80211_scan_6ghz() is
meant to be formed this way:

 [base struct][channels][ssids][6ghz_params]

It is allocated with [channels] as the maximum number of channels
supported by the driver in the 6 GHz band, since allocation is
before knowing how many there will be.

However, the inner pointers are set incorrectly: initially, the
6 GHz scan parameters pointer is set:

 [base struct][channels]
                        ^ scan_6ghz_params

and later the SSID pointer is set to the end of the actually
_used_ channels.

 [base struct][channels]
                  ^ ssids

If many APs were to be discovered, and many channels used, and
there were many SSIDs, then the SSIDs could overlap the 6 GHz
parameters.

Additionally, the request->ssids for most of the function points
to the original request still (given the struct copy) but is used
normally, which is confusing.

Clear this up, by actually using the allocated space for 6 GHz
parameters _after_ the SSIDs, and set up the SSIDs initially so
they are used more clearly. Just like in nl80211.c, set them
only if there actually are SSIDs though.

Finally, also copy the elements (ie/ie_len) so they're part of
the same request, not pointing to the old request.

Co-developed-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240510113738.4190692ef4ee.I0cb19188be17a8abd029805e3373c0a7777c214c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:25:25 +02:00
Johannes Berg
177c6ae972 wifi: mac80211: handle tasklet frames before stopping
The code itself doesn't want to handle frames from the driver
if it's already stopped, but if the tasklet was queued before
and runs after the stop, then all bets are off. Flush queues
before actually stopping, RX should be off at this point since
all the interfaces are removed already, etc.

Reported-by: syzbot+8830db5d3593b5546d2e@syzkaller.appspotmail.com
Link: https://msgid.link/20240515135318.b05f11385c9a.I41c1b33a2e1814c3a7ef352cd7f2951b91785617@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:25:10 +02:00
Johannes Berg
02c665f048 wifi: mac80211: apply mcast rate only if interface is up
If the interface isn't enabled, don't apply multicast
rate changes immediately.

Reported-by: syzbot+de87c09cc7b964ea2e23@syzkaller.appspotmail.com
Link: https://msgid.link/20240515133410.d6cffe5756cc.I47b624a317e62bdb4609ff7fa79403c0c444d32d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:24:52 +02:00
Lin Ma
ab904521f4 wifi: cfg80211: pmsr: use correct nla_get_uX functions
The commit 9bb7e0f24e ("cfg80211: add peer measurement with FTM
initiator API") defines four attributes NL80211_PMSR_FTM_REQ_ATTR_
{NUM_BURSTS_EXP}/{BURST_PERIOD}/{BURST_DURATION}/{FTMS_PER_BURST} in
following ways.

static const struct nla_policy
nl80211_pmsr_ftm_req_attr_policy[NL80211_PMSR_FTM_REQ_ATTR_MAX + 1] = {
    ...
    [NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP] =
        NLA_POLICY_MAX(NLA_U8, 15),
    [NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD] = { .type = NLA_U16 },
    [NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION] =
        NLA_POLICY_MAX(NLA_U8, 15),
    [NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST] =
        NLA_POLICY_MAX(NLA_U8, 31),
    ...
};

That is, those attributes are expected to be NLA_U8 and NLA_U16 types.
However, the consumers of these attributes in `pmsr_parse_ftm` blindly
all use `nla_get_u32`, which is incorrect and causes functionality issues
on little-endian platforms. Hence, fix them with the correct `nla_get_u8`
and `nla_get_u16` functions.

Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://msgid.link/20240521075059.47999-1-linma@zju.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:23:54 +02:00
Remi Pommarel
642f89daa3 wifi: cfg80211: Lock wiphy in cfg80211_get_station
Wiphy should be locked before calling rdev_get_station() (see lockdep
assert in ieee80211_get_station()).

This fixes the following kernel NULL dereference:

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
 Mem abort info:
   ESR = 0x0000000096000006
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x06: level 2 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000006
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000
 [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000
 Internal error: Oops: 0000000096000006 [#1] SMP
 Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath
 CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705
 Hardware name: RPT (r1) (DT)
 Workqueue: bat_events batadv_v_elp_throughput_metric_update
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core]
 lr : sta_set_sinfo+0xcc/0xbd4
 sp : ffff000007b43ad0
 x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98
 x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000
 x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc
 x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000
 x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d
 x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e
 x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000
 x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000
 x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90
 x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000
 Call trace:
  ath10k_sta_statistics+0x10/0x2dc [ath10k_core]
  sta_set_sinfo+0xcc/0xbd4
  ieee80211_get_station+0x2c/0x44
  cfg80211_get_station+0x80/0x154
  batadv_v_elp_get_throughput+0x138/0x1fc
  batadv_v_elp_throughput_metric_update+0x1c/0xa4
  process_one_work+0x1ec/0x414
  worker_thread+0x70/0x46c
  kthread+0xdc/0xe0
  ret_from_fork+0x10/0x20
 Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814)

This happens because STA has time to disconnect and reconnect before
batadv_v_elp_throughput_metric_update() delayed work gets scheduled. In
this situation, ath10k_sta_state() can be in the middle of resetting
arsta data when the work queue get chance to be scheduled and ends up
accessing it. Locking wiphy prevents that.

Fixes: 7406353d43 ("cfg80211: implement cfg80211_get_station cfg80211 API")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Nicolas Escande <nico.escande@gmail.com>
Acked-by: Antonio Quartulli <a@unstable.cc>
Link: https://msgid.link/983b24a6a176e0800c01aedcd74480d9b551cb13.1716046653.git.repk@triplefau.lt
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:23:41 +02:00
Johannes Berg
e296c95eac wifi: cfg80211: fully move wiphy work to unbound workqueue
Previously I had moved the wiphy work to the unbound
system workqueue, but missed that when it restarts and
during resume it was still using the normal system
workqueue. Fix that.

Fixes: 91d20ab9d9 ("wifi: cfg80211: use system_unbound_wq for wiphy work")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240522124126.7ca959f2cbd3.I3e2a71ef445d167b84000ccf934ea245aef8d395@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:23:33 +02:00
Johannes Berg
4dc3a3893d wifi: cfg80211: validate HE operation element parsing
Validate that the HE operation element has the correct
length before parsing it.

Cc: stable@vger.kernel.org
Fixes: 645f3d8512 ("wifi: cfg80211: handle UHB AP and STA power type")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240523120533.677025eb4a92.I44c091029ef113c294e8fe8b9bf871bf5dbeeb27@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:20:11 +02:00
Remi Pommarel
44c06bbde6 wifi: mac80211: Fix deadlock in ieee80211_sta_ps_deliver_wakeup()
The ieee80211_sta_ps_deliver_wakeup() function takes sta->ps_lock to
synchronizes with ieee80211_tx_h_unicast_ps_buf() which is called from
softirq context. However using only spin_lock() to get sta->ps_lock in
ieee80211_sta_ps_deliver_wakeup() does not prevent softirq to execute
on this same CPU, to run ieee80211_tx_h_unicast_ps_buf() and try to
take this same lock ending in deadlock. Below is an example of rcu stall
that arises in such situation.

 rcu: INFO: rcu_sched self-detected stall on CPU
 rcu:    2-....: (42413413 ticks this GP) idle=b154/1/0x4000000000000000 softirq=1763/1765 fqs=21206996
 rcu:    (t=42586894 jiffies g=2057 q=362405 ncpus=4)
 CPU: 2 PID: 719 Comm: wpa_supplicant Tainted: G        W          6.4.0-02158-g1b062f552873 #742
 Hardware name: RPT (r1) (DT)
 pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : queued_spin_lock_slowpath+0x58/0x2d0
 lr : invoke_tx_handlers_early+0x5b4/0x5c0
 sp : ffff00001ef64660
 x29: ffff00001ef64660 x28: ffff000009bc1070 x27: ffff000009bc0ad8
 x26: ffff000009bc0900 x25: ffff00001ef647a8 x24: 0000000000000000
 x23: ffff000009bc0900 x22: ffff000009bc0900 x21: ffff00000ac0e000
 x20: ffff00000a279e00 x19: ffff00001ef646e8 x18: 0000000000000000
 x17: ffff800016468000 x16: ffff00001ef608c0 x15: 0010533c93f64f80
 x14: 0010395c9faa3946 x13: 0000000000000000 x12: 00000000fa83b2da
 x11: 000000012edeceea x10: ffff0000010fbe00 x9 : 0000000000895440
 x8 : 000000000010533c x7 : ffff00000ad8b740 x6 : ffff00000c350880
 x5 : 0000000000000007 x4 : 0000000000000001 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffff00000ac0e0e8
 Call trace:
  queued_spin_lock_slowpath+0x58/0x2d0
  ieee80211_tx+0x80/0x12c
  ieee80211_tx_pending+0x110/0x278
  tasklet_action_common.constprop.0+0x10c/0x144
  tasklet_action+0x20/0x28
  _stext+0x11c/0x284
  ____do_softirq+0xc/0x14
  call_on_irq_stack+0x24/0x34
  do_softirq_own_stack+0x18/0x20
  do_softirq+0x74/0x7c
  __local_bh_enable_ip+0xa0/0xa4
  _ieee80211_wake_txqs+0x3b0/0x4b8
  __ieee80211_wake_queue+0x12c/0x168
  ieee80211_add_pending_skbs+0xec/0x138
  ieee80211_sta_ps_deliver_wakeup+0x2a4/0x480
  ieee80211_mps_sta_status_update.part.0+0xd8/0x11c
  ieee80211_mps_sta_status_update+0x18/0x24
  sta_apply_parameters+0x3bc/0x4c0
  ieee80211_change_station+0x1b8/0x2dc
  nl80211_set_station+0x444/0x49c
  genl_family_rcv_msg_doit.isra.0+0xa4/0xfc
  genl_rcv_msg+0x1b0/0x244
  netlink_rcv_skb+0x38/0x10c
  genl_rcv+0x34/0x48
  netlink_unicast+0x254/0x2bc
  netlink_sendmsg+0x190/0x3b4
  ____sys_sendmsg+0x1e8/0x218
  ___sys_sendmsg+0x68/0x8c
  __sys_sendmsg+0x44/0x84
  __arm64_sys_sendmsg+0x20/0x28
  do_el0_svc+0x6c/0xe8
  el0_svc+0x14/0x48
  el0t_64_sync_handler+0xb0/0xb4
  el0t_64_sync+0x14c/0x150

Using spin_lock_bh()/spin_unlock_bh() instead prevents softirq to raise
on the same CPU that is holding the lock.

Fixes: 1d147bfa64 ("mac80211: fix AP powersave TX vs. wakeup race")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Link: https://msgid.link/8e36fe07d0fbc146f89196cd47a53c8a0afe84aa.1716910344.git.repk@triplefau.lt
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:19:55 +02:00
Nicolas Escande
6f6291f09a wifi: mac80211: mesh: init nonpeer_pm to active by default in mesh sdata
With a ath9k device I can see that:
	iw phy phy0 interface add mesh0 type mp
	ip link set mesh0 up
	iw dev mesh0 scan

Will start a scan with the Power Management bit set in the Frame Control Field.
This is because we set this bit depending on the nonpeer_pm variable of the mesh
iface sdata and when there are no active links on the interface it remains to
NL80211_MESH_POWER_UNKNOWN.

As soon as links starts to be established, it wil switch to
NL80211_MESH_POWER_ACTIVE as it is the value set by befault on the per sta
nonpeer_pm field.
As we want no power save by default, (as expressed with the per sta ini values),
lets init it to the expected default value of NL80211_MESH_POWER_ACTIVE.

Also please note that we cannot change the default value from userspace prior to
establishing a link as using NL80211_CMD_SET_MESH_CONFIG will not work before
NL80211_CMD_JOIN_MESH has been issued. So too late for our initial scan.

Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Link: https://msgid.link/20240527141759.299411-1-nico.escande@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:19:45 +02:00
Nicolas Escande
b7d7f11a29 wifi: mac80211: mesh: Fix leak of mesh_preq_queue objects
The hwmp code use objects of type mesh_preq_queue, added to a list in
ieee80211_if_mesh, to keep track of mpath we need to resolve. If the mpath
gets deleted, ex mesh interface is removed, the entries in that list will
never get cleaned. Fix this by flushing all corresponding items of the
preq_queue in mesh_path_flush_pending().

This should take care of KASAN reports like this:

unreferenced object 0xffff00000668d800 (size 128):
  comm "kworker/u8:4", pid 67, jiffies 4295419552 (age 1836.444s)
  hex dump (first 32 bytes):
    00 1f 05 09 00 00 ff ff 00 d5 68 06 00 00 ff ff  ..........h.....
    8e 97 ea eb 3e b8 01 00 00 00 00 00 00 00 00 00  ....>...........
  backtrace:
    [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c
    [<00000000049bd418>] kmalloc_trace+0x34/0x80
    [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8
    [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c
    [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4
    [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764
    [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4
    [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440
    [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c
    [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4
    [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508
    [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c
    [<00000000b36425d1>] worker_thread+0x9c/0x634
    [<0000000005852dd5>] kthread+0x1bc/0x1c4
    [<000000005fccd770>] ret_from_fork+0x10/0x20
unreferenced object 0xffff000009051f00 (size 128):
  comm "kworker/u8:4", pid 67, jiffies 4295419553 (age 1836.440s)
  hex dump (first 32 bytes):
    90 d6 92 0d 00 00 ff ff 00 d8 68 06 00 00 ff ff  ..........h.....
    36 27 92 e4 02 e0 01 00 00 58 79 06 00 00 ff ff  6'.......Xy.....
  backtrace:
    [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c
    [<00000000049bd418>] kmalloc_trace+0x34/0x80
    [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8
    [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c
    [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4
    [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764
    [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4
    [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440
    [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c
    [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4
    [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508
    [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c
    [<00000000b36425d1>] worker_thread+0x9c/0x634
    [<0000000005852dd5>] kthread+0x1bc/0x1c4
    [<000000005fccd770>] ret_from_fork+0x10/0x20

Fixes: 050ac52cbe ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Link: https://msgid.link/20240528142605.1060566-1-nico.escande@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-29 15:17:03 +02:00
Witold Sadowski
4a69c1264f spi: cadence: Ensure data lines set to low during dummy-cycle period
During dummy-cycles xSPI will switch GPIO into Hi-Z mode. In that dummy
period voltage on data lines will slowly drop, what can cause
unintentional modebyte transmission. Value send to SPI memory chip will
depend on last address, and clock frequency.
To prevent unforeseen consequences of that behaviour, force send
single modebyte(0x00).
Modebyte will be send only if number of dummy-cycles is not equal
to 0. Code must also reduce dummycycle byte count by one - as one byte
is send as modebyte.

Signed-off-by: Witold Sadowski <wsadowski@marvell.com>
Link: https://msgid.link/r/20240529074037.1345882-2-wsadowski@marvell.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-29 13:43:02 +01:00
Peter Ujfalusi
ffa077b2f6 ASoC: SOF: ipc4-topology: Fix input format query of process modules without base extension
If a process module does not have base config extension then the same
format applies to all of it's inputs and the process->base_config_ext is
NULL, causing NULL dereference when specifically crafted topology and
sequences used.

Fixes: 648fea1284 ("ASoC: SOF: ipc4-topology: set copier output format for process module")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://msgid.link/r/20240529121201.14687-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-29 13:34:44 +01:00
Puranjay Mohan
b1e7cee961 powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
The Linux Kernel Memory Model [1][2] requires RMW operations that have a
return value to be fully ordered.

BPF atomic operations with BPF_FETCH (including BPF_XCHG and
BPF_CMPXCHG) return a value back so they need to be JITed to fully
ordered operations. POWERPC currently emits relaxed operations for
these.

We can show this by running the following litmus-test:

  PPC SB+atomic_add+fetch

  {
      0:r0=x;  (* dst reg assuming offset is 0 *)
      0:r1=2;  (* src reg *)
      0:r2=1;
      0:r4=y;  (* P0 writes to this, P1 reads this *)
      0:r5=z;  (* P1 writes to this, P0 reads this *)
      0:r6=0;

      1:r2=1;
      1:r4=y;
      1:r5=z;
  }

  P0                      | P1            ;
  stw         r2, 0(r4)   | stw  r2,0(r5) ;
                          |               ;
  loop:lwarx  r3, r6, r0  |               ;
  mr          r8, r3      |               ;
  add         r3, r3, r1  | sync          ;
  stwcx.      r3, r6, r0  |               ;
  bne         loop        |               ;
  mr          r1, r8      |               ;
                          |               ;
  lwa         r7, 0(r5)   | lwa  r7,0(r4) ;

  ~exists(0:r7=0 /\ 1:r7=0)

  Witnesses
  Positive: 9 Negative: 3
  Condition ~exists (0:r7=0 /\ 1:r7=0)
  Observation SB+atomic_add+fetch Sometimes 3 9

This test shows that the older store in P0 is reordered with a newer
load to a different address. Although there is a RMW operation with
fetch between them. Adding a sync before and after RMW fixes the issue:

  Witnesses
  Positive: 9 Negative: 0
  Condition ~exists (0:r7=0 /\ 1:r7=0)
  Observation SB+atomic_add+fetch Never 0 9

[1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
[2] https://www.kernel.org/doc/Documentation/atomic_t.txt

Fixes: aea7ef8a82 ("powerpc/bpf/32: add support for BPF_ATOMIC bitwise operations")
Fixes: 2d9206b227 ("powerpc/bpf/32: Add instructions for atomic_[cmp]xchg")
Fixes: dbe6e2456f ("powerpc/bpf/64: add support for atomic fetch operations")
Fixes: 1e82dfaa78 ("powerpc/bpf/64: Add instructions for atomic_[cmp]xchg")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N Rao <naveen@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240513100248.110535-1-puranjay@kernel.org
2024-05-29 22:12:42 +10:00
Edward Adam Davis
068648aab7 nfc/nci: Add the inconsistency check between the input data length and count
write$nci(r0, &(0x7f0000000740)=ANY=[@ANYBLOB="610501"], 0xf)

Syzbot constructed a write() call with a data length of 3 bytes but a count value
of 15, which passed too little data to meet the basic requirements of the function
nci_rf_intf_activated_ntf_packet().

Therefore, increasing the comparison between data length and count value to avoid
problems caused by inconsistent data length and count.

Reported-and-tested-by: syzbot+71bfed2b2bcea46c98f2@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-29 13:08:31 +01:00
Minda Chen
e9022b31db MAINTAINERS: dwmac: starfive: update Maintainer
Update the maintainer of starfive dwmac driver.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Acked-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-29 12:25:02 +01:00
Christian Brauner
a82c13d299 Merge patch series "cachefiles: some bugfixes and cleanups for ondemand requests"
libaokun@huaweicloud.com <libaokun@huaweicloud.com> says:

We've been testing ondemand mode for cachefiles since January, and we're
almost done. We hit a lot of issues during the testing period, and this
patch set fixes some of the issues related to ondemand requests.
The patches have passed internal testing without regression.

The following is a brief overview of the patches, see the patches for
more details.

Patch 1-5: Holding reference counts of reqs and objects on read requests
to avoid malicious restore leading to use-after-free.

Patch 6-10: Add some consistency checks to copen/cread/get_fd to avoid
malicious copen/cread/close fd injections causing use-after-free or hung.

Patch 11: When cache is marked as CACHEFILES_DEAD, flush all requests,
otherwise the kernel may be hung. since this state is irreversible, the
daemon can read open requests but cannot copen.

Patch 12: Allow interrupting a read request being processed by killing
the read process as a way of avoiding hung in some special cases.

 fs/cachefiles/daemon.c            |   3 +-
 fs/cachefiles/internal.h          |   5 +
 fs/cachefiles/ondemand.c          | 217 ++++++++++++++++++++++--------
 include/trace/events/cachefiles.h |   8 +-
 4 files changed, 176 insertions(+), 57 deletions(-)

* patches from https://lore.kernel.org/r/20240522114308.2402121-1-libaokun@huaweicloud.com:
  cachefiles: make on-demand read killable
  cachefiles: flush all requests after setting CACHEFILES_DEAD
  cachefiles: Set object to close if ondemand_id < 0 in copen
  cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  cachefiles: never get a new anonymous fd if ondemand_id is valid
  cachefiles: add spin_lock for cachefiles_ondemand_info
  cachefiles: add consistency check for copen/cread
  cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  cachefiles: remove requests from xarray during flushing requests
  cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd

Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:40 +02:00
Baokun Li
bc9dde6155 cachefiles: make on-demand read killable
Replacing wait_for_completion() with wait_for_completion_killable() in
cachefiles_ondemand_send_req() allows us to kill processes that might
trigger a hunk_task if the daemon is abnormal.

But now only CACHEFILES_OP_READ is killable, because OP_CLOSE and OP_OPEN
is initiated from kworker context and the signal is prohibited in these
kworker.

Note that when the req in xas changes, i.e. xas_load(&xas) != req, it
means that a process will complete the current request soon, so wait
again for the request to be completed.

In addition, add the cachefiles_ondemand_finish_req() helper function to
simplify the code.

Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-13-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:31 +02:00
Baokun Li
85e833cd72 cachefiles: flush all requests after setting CACHEFILES_DEAD
In ondemand mode, when the daemon is processing an open request, if the
kernel flags the cache as CACHEFILES_DEAD, the cachefiles_daemon_write()
will always return -EIO, so the daemon can't pass the copen to the kernel.
Then the kernel process that is waiting for the copen triggers a hung_task.

Since the DEAD state is irreversible, it can only be exited by closing
/dev/cachefiles. Therefore, after calling cachefiles_io_error() to mark
the cache as CACHEFILES_DEAD, if in ondemand mode, flush all requests to
avoid the above hungtask. We may still be able to read some of the cached
data before closing the fd of /dev/cachefiles.

Note that this relies on the patch that adds reference counting to the req,
otherwise it may UAF.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-12-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:31 +02:00
Zizhi Wo
4f8703fb34 cachefiles: Set object to close if ondemand_id < 0 in copen
If copen is maliciously called in the user mode, it may delete the request
corresponding to the random id. And the request may have not been read yet.

Note that when the object is set to reopen, the open request will be done
with the still reopen state in above case. As a result, the request
corresponding to this object is always skipped in select_req function, so
the read request is never completed and blocks other process.

Fix this issue by simply set object to close if its id < 0 in copen.

Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-11-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:30 +02:00
Baokun Li
4b4391e77a cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
After installing the anonymous fd, we can now see it in userland and close
it. However, at this point we may not have gotten the reference count of
the cache, but we will put it during colse fd, so this may cause a cache
UAF.

So grab the cache reference count before fd_install(). In addition, by
kernel convention, fd is taken over by the user land after fd_install(),
and the kernel should not call close_fd() after that, i.e., it should call
fd_install() after everything is ready, thus fd_install() is called after
copy_to_user() succeeds.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-10-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:30 +02:00
Baokun Li
4988e35e95 cachefiles: never get a new anonymous fd if ondemand_id is valid
Now every time the daemon reads an open request, it gets a new anonymous fd
and ondemand_id. With the introduction of "restore", it is possible to read
the same open request more than once, and therefore an object can have more
than one anonymous fd.

If the anonymous fd is not unique, the following concurrencies will result
in an fd leak:

     t1     |         t2         |          t3
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              cachefiles_ondemand_get_fd
                load->fd = fd0
                ondemand_id = object_id0
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                   // restore REQ_A
                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
                                      cachefiles_ondemand_get_fd
                                        load->fd = fd1
                                        ondemand_id = object_id1
             process_open_req(REQ_A)
             write(devfd, ("copen %u,%llu", msg->msg_id, size))
             cachefiles_ondemand_copen
              xa_erase(&cache->reqs, id)
              complete(&REQ_A->done)
   kfree(REQ_A)
                                  process_open_req(REQ_A)
                                  // copen fails due to no req
                                  // daemon close(fd1)
                                  cachefiles_ondemand_fd_release
                                   // set object closed
 -- umount --
 cachefiles_withdraw_cookie
  cachefiles_ondemand_clean_object
   cachefiles_ondemand_init_close_req
    if (!cachefiles_ondemand_object_is_open(object))
      return -ENOENT;
    // The fd0 is not closed until the daemon exits.

However, the anonymous fd holds the reference count of the object and the
object holds the reference count of the cookie. So even though the cookie
has been relinquished, it will not be unhashed and freed until the daemon
exits.

In fscache_hash_cookie(), when the same cookie is found in the hash list,
if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new
cookie waits for the old cookie to be unhashed, while the old cookie is
waiting for the leaked fd to be closed, if the daemon does not exit in time
it will trigger a hung task.

To avoid this, allocate a new anonymous fd only if no anonymous fd has
been allocated (ondemand_id == 0) or if the previously allocated anonymous
fd has been closed (ondemand_id == -1). Moreover, returns an error if
ondemand_id is valid, letting the daemon know that the current userland
restore logic is abnormal and needs to be checked.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-9-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:30 +02:00
Baokun Li
0a79004083 cachefiles: add spin_lock for cachefiles_ondemand_info
The following concurrency may cause a read request to fail to be completed
and result in a hung:

           t1             |             t2
---------------------------------------------------------
                            cachefiles_ondemand_copen
                              req = xa_erase(&cache->reqs, id)
// Anon fd is maliciously closed.
cachefiles_ondemand_fd_release
  xa_lock(&cache->reqs)
  cachefiles_ondemand_set_object_close(object)
  xa_unlock(&cache->reqs)
                              cachefiles_ondemand_set_object_open
                              // No one will ever close it again.
cachefiles_ondemand_daemon_read
  cachefiles_ondemand_select_req
  // Get a read req but its fd is already closed.
  // The daemon can't issue a cread ioctl with an closed fd, then hung.

So add spin_lock for cachefiles_ondemand_info to protect ondemand_id and
state, thus we can avoid the above problem in cachefiles_ondemand_copen()
by using ondemand_id to determine if fd has been closed.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-8-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:30 +02:00
Baokun Li
a26dc49df3 cachefiles: add consistency check for copen/cread
This prevents malicious processes from completing random copen/cread
requests and crashing the system. Added checks are listed below:

  * Generic, copen can only complete open requests, and cread can only
    complete read requests.
  * For copen, ondemand_id must not be 0, because this indicates that the
    request has not been read by the daemon.
  * For cread, the object corresponding to fd and req should be the same.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-7-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:30 +02:00
Baokun Li
3e6d704f02 cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
The err_put_fd label is only used once, so remove it to make the code
more readable. In addition, the logic for deleting error request and
CLOSE request is merged to simplify the code.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-6-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:29 +02:00
Baokun Li
da4a827416 cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
We got the following issue in a fuzz test of randomly issuing the restore
command:

==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963

CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
Call Trace:
 kasan_report+0x93/0xc0
 cachefiles_ondemand_daemon_read+0xb41/0xb60
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0

Allocated by task 116:
 kmem_cache_alloc+0x140/0x3a0
 cachefiles_lookup_cookie+0x140/0xcd0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]

Freed by task 792:
 kmem_cache_free+0xfe/0x390
 cachefiles_put_object+0x241/0x480
 fscache_cookie_state_machine+0x5c8/0x1230
 [...]
==================================================================

Following is the process that triggers the issue:

     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
cachefiles_withdraw_cookie
 cachefiles_ondemand_clean_object(object)
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)

            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              msg->object_id = req->object->ondemand->ondemand_id
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                  xas_for_each(&xas, req, ULONG_MAX)
                                   xas_set_mark(&xas, CACHEFILES_REQ_NEW)

                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
              copy_to_user(_buffer, msg, n)
               xa_erase(&cache->reqs, id)
               complete(&REQ_A->done)
              ------ close(fd) ------
              cachefiles_ondemand_fd_release
               cachefiles_put_object
 cachefiles_put_object
  kmem_cache_free(cachefiles_object_jar, object)
                                    REQ_A->object->ondemand->ondemand_id
                                     // object UAF !!!

When we see the request within xa_lock, req->object must not have been
freed yet, so grab the reference count of object before xa_unlock to
avoid the above issue.

Fixes: 0a7e54c195 ("cachefiles: resend an open request if the read request's object is closed")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-5-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:29 +02:00
Baokun Li
de3e26f9e5 cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
We got the following issue in a fuzz test of randomly issuing the restore
command:

==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962

CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
Call Trace:
 kasan_report+0x94/0xc0
 cachefiles_ondemand_daemon_read+0x609/0xab0
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0

Allocated by task 626:
 __kmalloc+0x1df/0x4b0
 cachefiles_ondemand_send_req+0x24d/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]

Freed by task 626:
 kfree+0xf1/0x2c0
 cachefiles_ondemand_send_req+0x568/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]
==================================================================

Following is the process that triggers the issue:

     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)

            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              cachefiles_ondemand_get_fd
              copy_to_user(_buffer, msg, n)
            process_open_req(REQ_A)
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                  xas_for_each(&xas, req, ULONG_MAX)
                                   xas_set_mark(&xas, CACHEFILES_REQ_NEW);

                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req

             write(devfd, ("copen %u,%llu", msg->msg_id, size));
             cachefiles_ondemand_copen
              xa_erase(&cache->reqs, id)
              complete(&REQ_A->done)
   kfree(REQ_A)
                                    cachefiles_ondemand_get_fd(REQ_A)
                                     fd = get_unused_fd_flags
                                     file = anon_inode_getfile
                                     fd_install(fd, file)
                                     load = (void *)REQ_A->msg.data;
                                     load->fd = fd;
                                     // load UAF !!!

This issue is caused by issuing a restore command when the daemon is still
alive, which results in a request being processed multiple times thus
triggering a UAF. So to avoid this problem, add an additional reference
count to cachefiles_req, which is held while waiting and reading, and then
released when the waiting and reading is over.

Note that since there is only one reference count for waiting, we need to
avoid the same request being completed multiple times, so we can only
complete the request if it is successfully removed from the xarray.

Fixes: e73fa11a35 ("cachefiles: add restore command to recover inflight ondemand read requests")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-4-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:29 +02:00
Baokun Li
0fc75c5940 cachefiles: remove requests from xarray during flushing requests
Even with CACHEFILES_DEAD set, we can still read the requests, so in the
following concurrency the request may be used after it has been freed:

     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
                                  // close dev fd
                                  cachefiles_flush_reqs
                                   complete(&REQ_A->done)
   kfree(REQ_A)
              xa_lock(&cache->reqs);
              cachefiles_ondemand_select_req
                req->msg.opcode != CACHEFILES_OP_READ
                // req use-after-free !!!
              xa_unlock(&cache->reqs);
                                   xa_destroy(&cache->reqs)

Hence remove requests from cache->reqs when flushing them to avoid
accessing freed requests.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-3-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:29 +02:00
Baokun Li
cc5ac966f2 cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
This lets us see the correct trace output.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-2-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 13:03:29 +02:00
Pierre-Louis Bossart
b062938fd9 ASoC: Intel: sof-sdw: fix missing SPI_MASTER dependency
The addition of the Cirrus Logic 'sidecar' amps adds a dependency on SPI_MASTER.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for SND_SOC_CS35L56_SPI
   Depends on [n]:
     SOUND [=y] && SND [=y] && SND_SOC [=y] && SPI_MASTER [=n] &&
     (SOUNDWIRE [=y] || !SOUNDWIRE [=y])
   Selected by [y]:
    - SND_SOC_INTEL_SOUNDWIRE_SOF_MACH [=y] && SOUND [=y] && SND [=y]
      && SND_SOC [=y] && SND_SOC_INTEL_MACH [=y] &&
      SND_SOC_SOF_INTEL_SOUNDWIRE [=y] && I2C [=y] && ACPI [=y] &&
      (MFD_INTEL_LPSS [=y] || COMPILE_TEST [=n]) &&
      (SND_SOC_INTEL_USER_FRIENDLY_LONG_NAMES [=y] || COMPILE_TEST
      [=n]) && SOUNDWIRE [=y]

Fixes: b831b4dca4 ("ASoC: intel: sof_sdw: Add support for cs42l43-cs35l56 sidecar amps")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405140758.o2HY4nYD-lkp@intel.com/
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://msgid.link/r/20240527191940.30107-1-pierre-louis.bossart@linux.intel.com
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-29 11:07:48 +01:00
Alexandre Belloni
6d40dbc758 ALSA: pcm: fix typo in comment
Fix the typo in the comment for SNDRV_PCM_RATE_KNOT

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20240528191850.63314-1-alexandre.belloni@bootlin.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-29 10:40:36 +02:00
John Garry
ed7ee6a69f statx: Update offset commentary for struct statx
In commit 2a82bb0294 ("statx: stx_subvol"), a new member was added to
struct statx, but the offset comment was not correct. Update it.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240529081725.3769290-1-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-29 10:40:19 +02:00
Takashi Iwai
fe85f6e607 ALSA: ump: Don't clear bank selection after sending a program change
The current code clears the bank selection MSB/LSB after sending a
program change, but this can be wrong, as many apps may not send the
full bank selection with both MSB and LSB but sending only one.
Better to keep the previous bank set.

Fixes: 0b5288f5fe ("ALSA: ump: Add legacy raw MIDI support")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240529083823.5778-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-29 10:39:50 +02:00
Takashi Iwai
edb3277619 ALSA: seq: Fix incorrect UMP type for system messages
When converting a legacy system message to a UMP packet, it forgot to
modify the UMP type field but keeping the default type (either type 2
or 4).  Correct to the right type for system messages.

Fixes: e9e02819a9 ("ALSA: seq: Automatic conversion of UMP events")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240529083800.5742-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-29 10:39:40 +02:00
Imre Deak
75800e2e42 drm/i915: Fix audio component initialization
After registering the audio component in i915_audio_component_init()
the audio driver may call i915_audio_component_get_power() via the
component ops. This could program AUD_FREQ_CNTRL with an uninitialized
value if the latter function is called before display.audio.freq_cntrl
gets initialized. The get_power() function also does a modeset which in
the above case happens too early before the initialization step and
triggers the

"Reject display access from task"

error message added by the Fixes: commit below.

Fix the above issue by registering the audio component only after the
initialization step.

Fixes: 87c1694533 ("drm/i915: save AUD_FREQ_CNTRL state at audio domain suspend")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10291
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521143022.3784539-1-imre.deak@intel.com
(cherry picked from commit fdd0b80172)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:48 +03:00
Vidya Srinivas
43e2b37e2a drm/i915/dpt: Make DPT object unshrinkable
In some scenarios, the DPT object gets shrunk but
the actual framebuffer did not and thus its still
there on the DPT's vm->bound_list. Then it tries to
rewrite the PTEs via a stale CPU mapping. This causes panic.

Cc: stable@vger.kernel.org
Reported-by: Shawn Lee <shawn.c.lee@intel.com>
Fixes: 0dc987b699 ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas <vidya.srinivas@intel.com>
[vsyrjala: Add TODO comment]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240520165634.1162470-1-vidya.srinivas@intel.com
(cherry picked from commit 51064d471c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:43 +03:00
Andi Shyti
ee01b6a386 drm/i915/gt: Fix CCS id's calculation for CCS mode setting
The whole point of the previous fixes has been to change the CCS
hardware configuration to generate only one stream available to
the compute users. We did this by changing the info.engine_mask
that is set during device probe, reset during the detection of
the fused engines, and finally reset again when choosing the CCS
mode.

We can't use the engine_mask variable anymore, as with the
current configuration, it imposes only one CCS no matter what the
hardware configuration is.

Before changing the engine_mask for the third time, save it and
use it for calculating the CCS mode.

After the previous changes, the user reported a performance drop
to around 1/4. We have tested that the compute operations, with
the current patch, have improved by the same factor.

Fixes: 6db31251bb ("drm/i915/gt: Enable only one CCS for compute workload")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Gnattu OC <gnattuoc@me.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Jian Ye <jian.ye@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Tested-by: Gnattu OC <gnattuoc@me.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517090616.242529-1-andi.shyti@linux.intel.com
(cherry picked from commit a09d2327a9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:38 +03:00
Dmitry Baryshkov
8c318cb70c drm/panel/lg-sw43408: mark sw43408_backlight_ops as static
Fix sparse warning regarding symbol 'sw43408_backlight_ops' not being
declared.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404200739.hbWZvOhR-lkp@intel.com/
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Fixes: 069a6c0e94 ("drm: panel: Add LG sw43408 panel driver")
Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240528-panel-sw43408-fix-v4-2-330b42445bcc@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2024-05-29 11:35:32 +03:00
Nirmoy Das
659a3062c7 drm/i915/selftests: Set always_coherent to false when reading from CPU
Commit 8d4ba9fc1c ("drm/i915/selftests: Pick correct caching mode.")
was not complete  as for non LLC  sharing platforms cpu read can happen
from LLC which probably doesn't have the latest changes made by GPU.

Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Fixes: 8d4ba9fc1c ("drm/i915/selftests: Pick correct caching mode.")
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516151403.2875-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit 007ed70831)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:32 +03:00
Dmitry Baryshkov
33defcacd2 drm/panel/lg-sw43408: select CONFIG_DRM_DISPLAY_DP_HELPER
This panel driver uses DSC PPS functions and as such depends on the
DRM_DISPLAY_DP_HELPER. Select this symbol to make required functions
available to the driver.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404200800.kYsRYyli-lkp@intel.com/
Fixes: 069a6c0e94 ("drm: panel: Add LG sw43408 panel driver")
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240528-panel-sw43408-fix-v4-1-330b42445bcc@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2024-05-29 11:35:32 +03:00
Arnd Bergmann
d4f36db623 drm/i915/guc: avoid FIELD_PREP warning
With gcc-7 and earlier, there are lots of warnings like

In file included from <command-line>:0:0:
In function '__guc_context_policy_add_priority.isra.66',
    inlined from '__guc_context_set_prio.isra.67' at drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3292:3,
    inlined from 'guc_context_set_prio' at drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3320:2:
include/linux/compiler_types.h:399:38: error: call to '__compiletime_assert_631' declared with attribute error: FIELD_PREP: mask is not constant
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
...
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2422:3: note: in expansion of macro 'FIELD_PREP'
   FIELD_PREP(GUC_KLV_0_KEY, GUC_CONTEXT_POLICIES_KLV_ID_##id) | \
   ^~~~~~~~~~

Make sure that GUC_KLV_0_KEY is an unsigned value to avoid the warning.

Fixes: 77b6f79df6 ("drm/i915/guc: Update to GuC version 69.0.3")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430164809.482131-1-julia.filipchuk@intel.com
(cherry picked from commit 364e039827)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:26 +03:00
Chris Wilson
70cb9188ff drm/i915/gt: Disarm breadcrumbs if engines are already idle
The breadcrumbs use a GT wakeref for guarding the interrupt, but are
disarmed during release of the engine wakeref. This leaves a hole where
we may attach a breadcrumb just as the engine is parking (after it has
parked its breadcrumbs), execute the irq worker with some signalers still
attached, but never be woken again.

That issue manifests itself in CI with IGT runner timeouts while tests
are waiting indefinitely for release of all GT wakerefs.

<6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats
<7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5
<7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4
<7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3
<7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2
<7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off
<7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02
<4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
...
<6> [299.356526] sysrq: Show State
...
<6> [299.373964] task:i915_selftest   state:D stack:11784 pid:5578  tgid:5578  ppid:873    flags:0x00004002
<6> [299.373967] Call Trace:
<6> [299.373968]  <TASK>
<6> [299.373970]  __schedule+0x3bb/0xda0
<6> [299.373974]  schedule+0x41/0x110
<6> [299.373976]  intel_wakeref_wait_for_idle+0x82/0x100 [i915]
<6> [299.374083]  ? __pfx_var_wake_function+0x10/0x10
<6> [299.374087]  live_engine_busy_stats+0x9b/0x500 [i915]
<6> [299.374173]  __i915_subtests+0xbe/0x240 [i915]
<6> [299.374277]  ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
<6> [299.374369]  ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
<6> [299.374456]  intel_engine_live_selftests+0x1c/0x30 [i915]
<6> [299.374547]  __run_selftests+0xbb/0x190 [i915]
<6> [299.374635]  i915_live_selftests+0x4b/0x90 [i915]
<6> [299.374717]  i915_pci_probe+0x10d/0x210 [i915]

At the end of the interrupt worker, if there are no more engines awake,
disarm the breadcrumb and go to sleep.

Fixes: 9d5612ca16 ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: <stable@vger.kernel.org> # v5.12+
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423165505.465734-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit fbad43ecca)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:15 +03:00
Janusz Krzysztofik
647535760a Revert "drm/i915: Remove extra multi-gt pm-references"
This reverts commit 1f33dc0c11.

There was a patch supposed to fix an issue of illegal attempts to free a
still active i915 VMA object when parking a GT believed to be idle,
reported by CI on 2-GT Meteor Lake.  As a solution, an extra wakeref for
a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit
f56fe3e917 ("drm/i915: Fix a VMA UAF for multi-gt platform").

However, that fix occurred insufficient -- the issue was still reported by
CI.  That wakeref was released on exit from i915_gem_do_execbuffer(), then
potentially before completion of the request and deactivation of its
associated VMAs.  Moreover, CI reports indicated that single-GT platforms
also suffered sporadically from the same race.

Since that issue was fixed by another commit f3c71b2ded ("drm/i915/vma:
Fix UAF on destroy against retire race"), the changes introduced by that
insufficient fix were dropped as no longer useful.  However, that series
resulted in another VMA UAF scenario now being triggered in CI.

<4> [260.290809] ------------[ cut here ]------------
<4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510)
<4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0
..
<4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
<4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
<4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0
...
<4> [260.291087] Call Trace:
<4> [260.291089]  <TASK>
<4> [260.291124]  i915_vma_reopen+0x43/0x80 [i915]
<4> [260.291298]  eb_lookup_vmas+0x9cb/0xcc0 [i915]
<4> [260.291579]  i915_gem_do_execbuffer+0xc9a/0x26d0 [i915]
<4> [260.291883]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
...
<4> [260.292301]  </TASK>
...
<4> [260.292506] ---[ end trace 0000000000000000 ]---
<4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI
<4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G        W          6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
<4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
<4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915]
...
<4> [260.428756] Call Trace:
<4> [260.431192]  <TASK>
<4> [639.283393]  i915_gem_do_execbuffer+0xd05/0x26d0 [i915]
<4> [639.305245]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
...
<4> [639.411134]  </TASK>
...
<4> [639.449979] ---[ end trace 0000000000000000 ]---

We defer actually closing, unbinding and destroying a VMA until next idle
point, or until the object is freed in the meantime.  By postponing the
unbind, we allow for the VMA to be reopened by the client, avoiding the
work required to rebind the VMA.

Starting from commit b0647a5e79 ("drm/i915: Avoid live-lock with
i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA
would be reopened while we destroy them.  That assumption is no longer
true in multi-GT configurations, where a VMA we reopen may be handled by a
GT different from the one that we already keep active via its engine while
we set up an execbuf request.

Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer()
processing path seems to fix this issue.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com>
Fixes: 1f33dc0c11 ("drm/i915: Remove extra multi-gt pm-references")
Link: https://patchwork.freedesktop.org/patch/msgid/20240506180253.96858-2-janusz.krzysztofik@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 749670a58d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-29 11:35:08 +03:00
Chen-Yu Tsai
e06a698ae6 scripts/make_fit: Drop fdt image entry compatible string
According to the FIT image source file format document found in U-boot [1]
and the split-out FIT image specification [2], under "'/images' node" ->
"Conditionally mandatory property", the "compatible" property is described
as "compatible method for loading image", i.e., not the compatible string
embedded in the FDT or used for matching.

Drop the compatible string from the fdt image entry node.

While at it also fix up a typo in the document section of output_dtb.

[1] U-boot source "doc/usage/fit/source_file_format.rst", or on the website:
    https://docs.u-boot.org/en/latest/usage/fit/source_file_format.html
[2] https://github.com/open-source-firmware/flat-image-tree/blob/main/source/chapter2-source-file-format.rst

Fixes: 7a23b027ec ("arm64: boot: Support Flat Image Tree")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:41:21 +09:00
Masahiro Yamada
3c562a70cf kbuild: remove a stale comment about cleaning in link-vmlinux.sh
Remove the left-over of commit 51eb95e2da ("kbuild: Don't remove
link-vmlinux temporary files on exit/signal").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Masahiro Yamada
3430f65d61 kbuild: fix short log for AS in link-vmlinux.sh
In convention, short logs print the output file, not the input file.

Let's change the suffix for 'AS' since it assembles *.S into *.o.

[Before]

  LD      .tmp_vmlinux.kallsyms1
  NM      .tmp_vmlinux.kallsyms1.syms
  KSYMS   .tmp_vmlinux.kallsyms1.S
  AS      .tmp_vmlinux.kallsyms1.S
  LD      .tmp_vmlinux.kallsyms2
  NM      .tmp_vmlinux.kallsyms2.syms
  KSYMS   .tmp_vmlinux.kallsyms2.S
  AS      .tmp_vmlinux.kallsyms2.S
  LD      vmlinux

[After]

  LD      .tmp_vmlinux.kallsyms1
  NM      .tmp_vmlinux.kallsyms1.syms
  KSYMS   .tmp_vmlinux.kallsyms1.S
  AS      .tmp_vmlinux.kallsyms1.o
  LD      .tmp_vmlinux.kallsyms2
  NM      .tmp_vmlinux.kallsyms2.syms
  KSYMS   .tmp_vmlinux.kallsyms2.S
  AS      .tmp_vmlinux.kallsyms2.o
  LD      vmlinux

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Masahiro Yamada
b18b047002 kbuild: change scripts/mksysmap into sed script
The previous commit removed the subshell execution from scripts/mksysmap,
which is now simple enough to become a sed script.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Masahiro Yamada
04b8cb0945 kbuild: avoid unneeded kallsyms step 3
Since commit 951bcae6c5 ("kallsyms: Avoid weak references for kallsyms
symbols"), the kallsyms step 3 always occurs.

You can compare the build logs.

[Before 951bcae6c5]

  $ git checkout 951bcae6c5a0^
  $ make defconfig all
       [ snip ]
    LD      .tmp_vmlinux.kallsyms1
    NM      .tmp_vmlinux.kallsyms1.syms
    KSYMS   .tmp_vmlinux.kallsyms1.S
    AS      .tmp_vmlinux.kallsyms1.S
    LD      .tmp_vmlinux.kallsyms2
    NM      .tmp_vmlinux.kallsyms2.syms
    KSYMS   .tmp_vmlinux.kallsyms2.S
    AS      .tmp_vmlinux.kallsyms2.S
    LD      vmlinux

[After 951bcae6c5]

  $ git checkout 951bcae6c5
  $ make defconfig all
       [ snip ]
    LD      .tmp_vmlinux.kallsyms1
    NM      .tmp_vmlinux.kallsyms1.syms
    KSYMS   .tmp_vmlinux.kallsyms1.S
    AS      .tmp_vmlinux.kallsyms1.S
    LD      .tmp_vmlinux.kallsyms2
    NM      .tmp_vmlinux.kallsyms2.syms
    KSYMS   .tmp_vmlinux.kallsyms2.S
    AS      .tmp_vmlinux.kallsyms2.S
    LD      .tmp_vmlinux.kallsyms3             # should not happen
    NM      .tmp_vmlinux.kallsyms3.syms        # should not happen
    KSYMS   .tmp_vmlinux.kallsyms3.S           # should not happen
    AS      .tmp_vmlinux.kallsyms3.S           # should not happen
    LD      vmlinux

The resulting vmlinux is correct, but it always requires an additional
linking step.

The symbols produced by kallsyms are excluded from kallsyms itself
because they were previously missing in step 1. With those symbols
excluded, the symbol lists matched between step 1 and step 2,
eliminating the need for step 3. Now, this has a negative effect.

Since 951bcae6c5, the PROVIDE() directives provide the fallback
definitions, which are not trimmed from the sysbol list in step 1
because ${kallsymso_prev} is empty at this point.

In step 2, ${kallsymso_prev} is set, and the kallsyms_* symbols are
trimmed from the symbol list.

Due to the table size difference between step 1 and step 2 (the former
is larger due to the presence of kallsyms_*), step 3 is triggered.

Now that the kallsyms_* symbols are always linked, let's stop omitting
them from kallsyms. This avoids unnecessary step 3.

Fixes: 951bcae6c5 ("kallsyms: Avoid weak references for kallsyms symbols")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Douglas Anderson
659bbf7e1b kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
Recently we went through the source tree and replaced
$(srctree)/$(src) w/ $(src). However, the gdb scripts Makefile had a
hidden $(srctree)/$(src) that looked like this:

  $(abspath $(srctree))/$(src)

Because we missed that then my installed kernel had symlinks that
looked like this:

  __init__.py ->
    ${INSTALL_DIR}/$(INSTALL_DIR}/scripts/gdb/linux/__init__.py

Let's also replace the midden $(abspath $(srctree))/$(src) with
$(src). Now:

  __init__.py ->
    $(INSTALL_DIR}/scripts/gdb/linux/__init__.py

Fixes: b1992c3772 ("kbuild: use $(src) instead of $(srctree)/$(src) for source directory")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Masahiro Yamada
31894d35b5 kconfig: remove redundant check in expr_join_or()
The check for 'sym1 == sym2' is redundant here because it has already
been done a few lines above:

    if (sym1 != sym2)
            return NULL;

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Masahiro Yamada
aabdc960a2 kconfig: fix comparison to constant symbols, 'm', 'n'
Currently, comparisons to 'm' or 'n' result in incorrect output.

[Test Code]

    config MODULES
            def_bool y
            modules

    config A
            def_tristate m

    config B
            def_bool A > n

CONFIG_B is unset, while CONFIG_B=y is expected.

The reason for the issue is because Kconfig compares the tristate values
as strings.

Currently, the .type fields in the constant symbol definitions,
symbol_{yes,mod,no} are unspecified, i.e., S_UNKNOWN.

When expr_calc_value() evaluates 'A > n', it checks the types of 'A' and
'n' to determine how to compare them.

The left-hand side, 'A', is a tristate symbol with a value of 'm', which
corresponds to a numeric value of 1. (Internally, 'y', 'm', and 'n' are
represented as 2, 1, and 0, respectively.)

The right-hand side, 'n', has an unknown type, so it is treated as the
string "n" during the comparison.

expr_calc_value() compares two values numerically only when both can
have numeric values. Otherwise, they are compared as strings.

    symbol    numeric value    ASCII code
    -------------------------------------
      y           2             0x79
      m           1             0x6d
      n           0             0x6e

'm' is greater than 'n' if compared numerically (since 1 is greater
than 0), but smaller than 'n' if compared as strings (since the ASCII
code 0x6d is smaller than 0x6e).

Specifying .type=S_TRISTATE for symbol_{yes,mod,no} fixes the above
test code.

Doing so, however, would cause a regression to the following test code.

[Test Code 2]

    config MODULES
            def_bool n
            modules

    config A
            def_tristate n

    config B
            def_bool A = m

You would get CONFIG_B=y, while CONFIG_B should not be set.

The reason is because sym_get_string_value() turns 'm' into 'n' when the
module feature is disabled. Consequently, expr_calc_value() evaluates
'A = n' instead of 'A = m'. This oddity has been hidden because the type
of 'm' was previously S_UNKNOWN instead of S_TRISTATE.

sym_get_string_value() should not tweak the string because the tristate
value has already been correctly calculated. There is no reason to
return the string "n" where its tristate value is mod.

Fixes: 31847b67be ("kconfig: allow use of relations other than (in)equality")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Masahiro Yamada
a607468b52 kconfig: remove unused expr_is_no()
This has not been used since commit e911503085 ("Kconfig: Remove
bad inference rules expr_eliminate_dups2()").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-29 16:40:03 +09:00
Adrián Larumbe
3b8407e81e drm/gem-shmem: Add import attachment warning to locked pin function
Commit ec144244a4 ("drm/gem-shmem: Acquire reservation lock in GEM
pin/unpin callbacks") moved locking DRM object's dma reservation to
drm_gem_shmem_object_pin, and made drm_gem_shmem_pin_locked public, so
we need to make sure the not-imported check warning is also added to
the latter.

Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: a780278472 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523113236.432585-4-adrian.larumbe@collabora.com
2024-05-29 09:30:44 +02:00
Adrián Larumbe
8c2f5dd0c3 drm/lima: Fix dma_resv deadlock at drm object pin time
Commit a780278472 ("drm/gem: Acquire reservation lock in
drm_gem_{pin/unpin}()") moved locking the DRM object's dma reservation to
drm_gem_pin(), but Lima's pin callback kept calling drm_gem_shmem_pin,
which also tries to lock the same dma_resv, leading to a double lock
situation.

As was already done for Panfrost in the previous commit, fix it by
replacing drm_gem_shmem_pin() with its locked variant.

Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Fixes: a780278472 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Val Packett <val@packett.cool>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523113236.432585-3-adrian.larumbe@collabora.com
2024-05-29 09:30:39 +02:00
Adrián Larumbe
e57f2187cc drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object that is being prepared a dma-buf
attachment for on behalf of another driver, the core drm gem object pinning
code already takes a lock on the object's dma reservation.

However, Panfrost GEM object's pinning callback would eventually try taking
the lock on the same dma reservation when delegating pinning of the object
onto the shmem subsystem, which led to a deadlock.

This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws
the following recursive locking situation:

weston/3440 is trying to acquire lock:
ffff000000e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper]
but task is already holding lock:
ffff000000e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm]

Fix it by replacing drm_gem_shmem_pin with its locked version, as the lock
had already been taken by drm_gem_pin().

Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Fixes: a780278472 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523113236.432585-2-adrian.larumbe@collabora.com
2024-05-29 09:30:38 +02:00
Vladimir Oltean
fb66df20a7 net/sched: taprio: extend minimum interval restriction to entire cycle too
It is possible for syzbot to side-step the restriction imposed by the
blamed commit in the Fixes: tag, because the taprio UAPI permits a
cycle-time different from (and potentially shorter than) the sum of
entry intervals.

We need one more restriction, which is that the cycle time itself must
be larger than N * ETH_ZLEN bit times, where N is the number of schedule
entries. This restriction needs to apply regardless of whether the cycle
time came from the user or was the implicit, auto-calculated value, so
we move the existing "cycle == 0" check outside the "if "(!new->cycle_time)"
branch. This way covers both conditions and scenarios.

Add a selftest which illustrates the issue triggered by syzbot.

Fixes: b5b73b26b3 ("taprio: Fix allowing too small intervals")
Reported-by: syzbot+a7d2b1d5d1af83035567@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/0000000000007d66bc06196e7c66@google.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240527153955.553333-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-28 19:46:41 -07:00
Vladimir Oltean
e634134180 net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
In commit b5b73b26b3 ("taprio: Fix allowing too small intervals"), a
comparison of user input against length_to_duration(q, ETH_ZLEN) was
introduced, to avoid RCU stalls due to frequent hrtimers.

The implementation of length_to_duration() depends on q->picos_per_byte
being set for the link speed. The blamed commit in the Fixes: tag has
moved this too late, so the checks introduced above are ineffective.
The q->picos_per_byte is zero at parse_taprio_schedule() ->
parse_sched_list() -> parse_sched_entry() -> fill_sched_entry() time.

Move the taprio_set_picos_per_byte() call as one of the first things in
taprio_change(), before the bulk of the netlink attribute parsing is
done. That's because it is needed there.

Add a selftest to make sure the issue doesn't get reintroduced.

Fixes: 09dbdf28f9 ("net/sched: taprio: fix calculation of maximum gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240527153955.553333-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-28 19:46:41 -07:00
Martin K. Petersen
4fedb1f095 Merge branch '6.10/scsi-queue' into 6.10/scsi-fixes
Pull in remaining commits from 6.10/scsi-queue.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-28 21:29:03 -04:00
Kent Overstreet
83208cbf2f bcachefs: Don't return -EROFS from mount on inconsistency error
We were accidentally returning -EROFS during recovery on filesystem
inconsistency - since this is what the journal returns on emergency
shutdown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 19:23:03 -04:00
Li Zhijian
49ba7b515c cxl/region: Fix memregion leaks in devm_cxl_add_region()
Move the mode verification to __create_region() before allocating the
memregion to avoid the memregion leaks.

Fixes: 6e09926418 ("cxl/region: Add volatile region creation support")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-28 16:09:17 -07:00
Dave Jiang
d555105271 cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c
tools/testing/cxl/test/mem.c uses vmalloc() and vfree() but does not
include linux/vmalloc.h. Kernel v6.10 made changes that causes the
currently included headers not depend on vmalloc.h and therefore
mem.c can no longer compile. Add linux/vmalloc.h to fix compile
issue.

  CC [M]  tools/testing/cxl/test/mem.o
tools/testing/cxl/test/mem.c: In function ‘label_area_release’:
tools/testing/cxl/test/mem.c:1428:9: error: implicit declaration of function ‘vfree’; did you mean ‘kvfree’? [-Werror=implicit-function-declaration]
 1428 |         vfree(lsa);
      |         ^~~~~
      |         kvfree
tools/testing/cxl/test/mem.c: In function ‘cxl_mock_mem_probe’:
tools/testing/cxl/test/mem.c:1466:22: error: implicit declaration of function ‘vmalloc’; did you mean ‘kmalloc’? [-Werror=implicit-function-declaration]
 1466 |         mdata->lsa = vmalloc(LSA_SIZE);
      |                      ^~~~~~~
      |                      kmalloc

Fixes: 7d3eb23c4c ("tools/testing/cxl: Introduce a mock memory device + driver")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/20240528225551.1025977-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-28 16:08:04 -07:00
Eric Garver
e8ded22ef0 netfilter: nft_fib: allow from forward/input without iif selector
This removes the restriction of needing iif selector in the
forward/input hooks for fib lookups when requested result is
oif/oifname.

Removing this restriction allows "loose" lookups from the forward hooks.

Fixes: be8be04e5d ("netfilter: nft_fib: reverse path filter for policy-based routing on iif")
Signed-off-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-29 00:37:51 +02:00
Florian Westphal
21a673bddc netfilter: tproxy: bail out if IP has been disabled on the device
syzbot reports:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
[..]
RIP: 0010:nf_tproxy_laddr4+0xb7/0x340 net/ipv4/netfilter/nf_tproxy_ipv4.c:62
Call Trace:
 nft_tproxy_eval_v4 net/netfilter/nft_tproxy.c:56 [inline]
 nft_tproxy_eval+0xa9a/0x1a00 net/netfilter/nft_tproxy.c:168

__in_dev_get_rcu() can return NULL, so check for this.

Reported-and-tested-by: syzbot+b94a6818504ea90d7661@syzkaller.appspotmail.com
Fixes: cc6eb43385 ("tproxy: use the interface primary IP address as a default value for --on-ip")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-29 00:37:51 +02:00
Pablo Neira Ayuso
33c563ebf8 netfilter: nft_payload: skbuff vlan metadata mangle support
Userspace assumes vlan header is present at a given offset, but vlan
offload allows to store this in metadata fields of the skbuff. Hence
mangling vlan results in a garbled packet. Handle this transparently by
adding a parser to the kernel.

If vlan metadata is present and payload offset is over 12 bytes (source
and destination mac address fields), then subtract vlan header present
in vlan metadata, otherwise mangle vlan metadata based on offset and
length, extracting data from the source register.

This is similar to:

  8cfd23e674 ("netfilter: nft_payload: work around vlan header stripping")

to deal with vlan payload mangling.

Fixes: 7ec3f7b47b ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-29 00:35:32 +02:00
Kent Overstreet
8528bde1b6 bcachefs: Fix uninitialized var warning
Can't actually be used uninitialized, but gcc was being silly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 18:21:51 -04:00
Kent Overstreet
759bb4eabc bcachefs: Split out sb-errors_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 17:33:45 -04:00
Kent Overstreet
5c16c57488 bcachefs: Split out journal_seq_blacklist_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 17:32:03 -04:00
Kent Overstreet
24998050b6 bcachefs: Split out replicas_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 17:32:03 -04:00
Kent Overstreet
1cdcc6e3c2 bcachefs: Split out disk_groups_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 17:32:03 -04:00
Kent Overstreet
4c5eef0c50 bcachefs: split out sb-downgrade_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 17:32:03 -04:00
Kent Overstreet
016c22e410 bcachefs: split out sb-members_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 17:32:03 -04:00
Maarten Lankhorst
f73a058be5 Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes
v6.10-rc1 is released, forward from v6.9

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-05-28 22:21:34 +02:00
Rafael J. Wysocki
1ae088232b Merge tag 'linux-cpupower-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux into pm-tools
Merge cpupower utility fix for 6.10-rc2 from Shuah Khan:

"This cpupower fixes update for Linux 6.10-rc2 consists of one single
 fix to cpupower's P-State frequency calculation and reporting with
 AMD Family 1Ah+ processors, when using the acpi-cpufreq driver."

* tag 'linux-cpupower-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
  tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
2024-05-28 22:07:00 +02:00
Dhananjay Ugwekar
e4731baaf2 cpufreq: amd-pstate: Fix the inconsistency in max frequency units
The nominal frequency in cpudata is maintained in MHz whereas all other
frequencies are in KHz. This means we have to convert nominal frequency
value to KHz before we do any interaction with other frequency values.

In amd_pstate_set_boost(), this conversion from MHz to KHz is missed,
fix that.

Tested on a AMD Zen4 EPYC server

Before:
$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq | uniq
2151
$ cat /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_min_freq | uniq
400000
$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq | uniq
2151
409422

After:
$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq | uniq
2151000
$ cat /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_min_freq | uniq
400000
$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq | uniq
2151000
1799527

Fixes: ec437d71db ("cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors")
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Peter Jung <ptr1337@cachyos.org>
Cc: 5.17+ <stable@vger.kernel.org> # 5.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-28 22:03:11 +02:00
Arnd Bergmann
779b8a14af cpufreq: amd-pstate: remove global header file
When extra warnings are enabled, gcc points out a global variable
definition in a header:

In file included from drivers/cpufreq/amd-pstate-ut.c:29:
include/linux/amd-pstate.h:123:27: error: 'amd_pstate_mode_string' defined but not used [-Werror=unused-const-variable=]
  123 | static const char * const amd_pstate_mode_string[] = {
      |                           ^~~~~~~~~~~~~~~~~~~~~~

This header is only included from two files in the same directory,
and one of them uses only a single definition from it, so clean it
up by moving most of the contents into the driver that uses them,
and making shared bits a local header file.

Fixes: 36c5014e54 ("cpufreq: amd-pstate: optimize driver working mode selection in amd_pstate_param()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-28 21:59:39 +02:00
Thomas Weißschuh
ac62f52138 ACPI: AC: Properly notify powermanagement core about changes
The powermanagement core does various actions when a powersupply changes.
It calls into notifiers, LED triggers, other power supplies and emits an uevent.

To make sure that all these actions happen properly call power_supply_changed().

Reported-by: Rajas Paranjpe <paranjperajas@gmail.com>
Closes: https://github.com/MrChromebox/firmware/issues/420#issuecomment-2132251318
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-28 21:56:40 +02:00
Arnaldo Carvalho de Melo
2f523f29d3 tools headers UAPI: Update i915_drm.h with the kernel sources
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 16:54:31 -03:00
Andy Shevchenko
edcde848c0 PNP: Hide pnp_bus_type from the non-PNP code
The pnp_bus_type is defined only when CONFIG_PNP=y, while being
not guarded by ifdeffery in the header. Moreover, it's not used
outside of the PNP code. Move it to the internal header to make
sure no-one will try to (ab)use it.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-28 21:53:51 +02:00
Andy Shevchenko
c7a5096781 PNP: Make dev_is_pnp() to be a function and export it for modules
Since we have a dev_is_pnp() macro that utilises the address of the
pnp_bus_type variable, the users, which can be compiled as modules,
will fail to build. Convert the macro to be a function and export it
to the modules to prevent build breakage.

Reported-by: Woody Suwalski <terraluna977@gmail.com>
Closes: https://lore.kernel.org/r/cc8a93b2-2504-9754-e26c-5d5c3bd1265c@gmail.com
Fixes: 2a49b45cd0 ("PNP: Add dev_is_pnp() macro")
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-28 21:53:51 +02:00
Arnaldo Carvalho de Melo
88e520512a tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:

  4af663c2f6 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version")
  4f5defae70 ("KVM: SEV: introduce KVM_SEV_INIT2 operation")
  26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
  ac5c48027b ("KVM: SEV: publish supported VMSA features")
  651d61bc8b ("KVM: PPC: Fix documentation for ppc mmu caps")

That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.

This addresses these perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 16:49:36 -03:00
Arnaldo Carvalho de Melo
ac4b069035 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets:

  53bc516ade ("x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place")

That patch just move definitions around, so this just silences this perf
build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/lkml/ZlYe8jOzd1_DyA7X@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 15:14:32 -03:00
Linus Torvalds
e0cce98fe2 Merge tag 'tpmdd-next-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
 "This fixes two unaddressed review comments for the HMAC encryption
  patch set. They are cosmetic but we are better off, if such
  unnecessary glitches do not exist in the release.

  The important part is enabling the HMAC encryption by default only on
  x86-64 because that is the only sufficiently tested arch.

  Finally, there is a bug fix for SPI transfer buffer allocation, which
  did not take into account the SPI header size"

* tag 'tpmdd-next-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Enable TCG_TPM2_HMAC by default only for X86_64
  tpm: Rename TPM2_OA_TMPL to TPM2_OA_NULL_KEY and make it local
  tpm: Open code tpm_buf_parameters()
  tpm_tis_spi: Account for SPI header when allocating TPM SPI xfer buffer
2024-05-28 10:40:52 -07:00
Linus Torvalds
8d6bc6a2b1 Merge tag 'probes-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:

 - uprobes: prevent mutex_lock() under rcu_read_lock().

   Recent changes moved uprobe_cpu_buffer preparation which involves
   mutex_lock(), under __uprobe_trace_func() which is called inside
   rcu_read_lock().

   Fix it by moving uprobe_cpu_buffer preparation outside of
   __uprobe_trace_func()

 - kprobe-events: handle the error case of btf_find_struct_member()

* tag 'probes-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: fix error check in parse_btf_field()
  uprobes: prevent mutex_lock() under rcu_read_lock()
2024-05-28 10:17:40 -07:00
Jeff Johnson
a0fc1a053b of: of_test: add MODULE_DESCRIPTION()
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/of/of_test.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240524-md-of-of_test-v1-1-6ebd078d620f@quicinc.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-05-28 12:10:00 -05:00
Sagi Grimberg
c758b77d4a nvmet: fix a possible leak when destroy a ctrl during qp establishment
In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we
know that a ctrl was allocated (in the admin connect request handler)
and we need to release pending AERs, clear ctrl->sqs and sq->ctrl
(for nvme-loop primarily), and drop the final reference on the ctrl.

However, a small window is possible where nvmet_sq_destroy starts (as
a result of the client giving up and disconnecting) concurrently with
the nvme admin connect cmd (which may be in an early stage). But *before*
kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq
live reference). In this case, sq->ctrl was allocated however after it was
captured in a local variable in nvmet_sq_destroy.
This prevented the final reference drop on the ctrl.

Solve this by re-capturing the sq->ctrl after all inflight request has
completed, where for sure sq->ctrl reference is final, and move forward
based on that.

This issue was observed in an environment with many hosts connecting
multiple ctrls simoutanuosly, creating a delay in allocating a ctrl
leading up to this race window.

Reported-by: Alex Turin <alex@vastdata.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-28 10:01:52 -07:00
Keith Busch
be647e2c76 nvme: use srcu for iterating namespace list
The nvme pci driver synchronizes with all the namespace queues during a
reset to ensure that there's no pending timeout work.

Meanwhile the timeout work potentially iterates those same namespaces to
freeze their queues.

Each of those namespace iterations use the same read lock. If a write
lock should somehow get between the synchronize and freeze steps, then
forward progress is deadlocked.

We had been relying on the nvme controller state machine to ensure the
reset work wouldn't conflict with timeout work. That guarantee may be a
bit fragile to rely on, so iterate the namespace lists without taking
potentially circular locks, as reported by lockdep.

Link: https://lore.kernel.org/all/20220930001943.zdbvolc3gkekfmcv@shindev/
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-28 09:43:32 -07:00
Kent Overstreet
f1d4fed13f bcachefs: Better fsck error message for key version
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
088d0de812 bcachefs: btree_gc can now handle unknown btrees
Compatibility fix - we no longer have a separate table for which order
gc walks btrees in, and special case the stripes btree directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Jeff Johnson
b4131076c1 bcachefs: add missing MODULE_DESCRIPTION()
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/mean_and_variance_test.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
247c056bde bcachefs: Fix setting of downgrade recovery passes/errors
bch2_check_version_downgrade() was setting c->sb.version, which
bch2_sb_set_downgrade() expects to be at the previous version; and it
shouldn't even have been set directly because c->sb.version is updated
by write_super().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
08f50005e0 bcachefs: Run check_key_has_snapshot in snapshot_delete_keys()
delete_dead_snapshots now runs before the main fsck.c passes which check
for keys for invalid snapshots; thus, it needs those checks as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
82af5ceb5d bcachefs: Refactor delete_dead_snapshots()
Consolidate per-key work into delete_dead_snapshots_process_key(), so we
now walk all keys once, not twice.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
218e5e0c2a bcachefs: Fix locking assert
We now track whether a transaction is locked, and verify that we don't
have nodes locked when the transaction isn't locked; reorder relocks to
not pop the new assert.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
9e1a66e668 bcachefs: Fix lookup_first_inode() when inode_generations are present
This function is used for finding the hash seed (which is the same in
all versions of an inode in different snapshots): ff an inode has been
deleted in a child snapshot we need to iterate until we find a live
version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
1292bc2ebf bcachefs: Plumb bkey into __btree_err()
It can be useful to know the exact byte offset within a btree node where
an error occured.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:23 -04:00
Dhananjay Ugwekar
43cad521c6 tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
Update cpupower's P-State frequency calculation and reporting with AMD
Family 1Ah+ processors, when using the acpi-cpufreq driver. This is due
to a change in the PStateDef MSR layout in AMD Family 1Ah+.

Tested on 4th and 5th Gen AMD EPYC system

Signed-off-by: Ananth Narayan <Ananth.Narayan@amd.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-28 09:22:57 -06:00
Filipe Manana
f13e01b89d btrfs: ensure fast fsync waits for ordered extents after a write failure
If a write path in COW mode fails, either before submitting a bio for the
new extents or an actual IO error happens, we can end up allowing a fast
fsync to log file extent items that point to unwritten extents.

This is because dropping the extent maps happens when completing ordered
extents, at btrfs_finish_one_ordered(), and the completion of an ordered
extent is executed in a work queue.

This can result in a fast fsync to start logging file extent items based
on existing extent maps before the ordered extents complete, therefore
resulting in a log that has file extent items that point to unwritten
extents, resulting in a corrupt file if a crash happens after and the log
tree is replayed the next time the fs is mounted.

This can happen for both direct IO writes and buffered writes.

For example consider a direct IO write, in COW mode, that fails at
btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
error:

1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
   set to false, meaning an error happened;

2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
   flag;

3) btrfs_finish_ordered_extent() queues the completion of the ordered
   extent - so that btrfs_finish_one_ordered() will be executed later in
   a work queue. That function will drop extent maps in the range when
   it's executed, since the extent maps point to unwritten locations
   (signaled by the BTRFS_ORDERED_IOERR flag);

4) After calling btrfs_finish_ordered_extent() we keep going down the
   write path and unlock the inode;

5) After that a fast fsync starts and locks the inode;

6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
   task sees the extent maps that point to the unwritten locations and
   logs file extent items based on them - it does not know they are
   unwritten, and the fast fsync path does not wait for ordered extents
   to complete, which is an intentional behaviour in order to reduce
   latency.

For the buffered write case, here's one example:

1) A fast fsync begins, and it starts by flushing delalloc and waiting for
   the writeback to complete by calling filemap_fdatawait_range();

2) Flushing the dellaloc created a new extent map X;

3) During the writeback some IO error happened, and at the end io callback
   (end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
   sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
   completion;

4) After queuing the ordered extent completion, the end io callback clears
   the writeback flag from all pages (or folios), and from that moment the
   fast fsync can proceed;

5) The fast fsync proceeds sees extent map X and logs a file extent item
   based on extent map X, resulting in a log that points to an unwritten
   data extent - because the ordered extent completion hasn't run yet, it
   happens only after the logging.

To fix this make btrfs_finish_ordered_extent() set the inode flag
BTRFS_INODE_NEEDS_FULL_SYNC in case an error happened for a COW write,
so that a fast fsync will wait for ordered extent completion.

Note that this issues of using extent maps that point to unwritten
locations can not happen for reads, because in read paths we start by
locking the extent range and wait for any ordered extents in the range
to complete before looking for extent maps.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-28 16:35:12 +02:00
Arnaldo Carvalho de Melo
da42b5229b tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall
But also to wire up shadow stacks on 32-bit x86, picking up those
changes from these csets:

  ff388fe5c4 ("mseal: wire up mseal syscall")
  2883f01ec3 ("x86/shstk: Enable shadow stacks for x32")

This makes 'perf trace' support it, now its possible, for instance to
do:

  # perf trace -e mseal --max-stack=16

Here is an example with the 'sendmmsg' syscall:

  root@x1:~# perf trace -e sendmmsg --max-stack 16 --max-events=1
       0.000 ( 0.062 ms): dbus-broker/1012 sendmmsg(fd: 150, mmsg: 0x7ffef57cca50, vlen: 1, flags: DONTWAIT|NOSIGNAL) = 1
                                         syscall_exit_to_user_mode_prepare ([kernel.kallsyms])
                                         syscall_exit_to_user_mode_prepare ([kernel.kallsyms])
                                         syscall_exit_to_user_mode ([kernel.kallsyms])
                                         do_syscall_64 ([kernel.kallsyms])
                                         entry_SYSCALL_64 ([kernel.kallsyms])
                                         [0x117ce7] (/usr/lib64/libc.so.6 (deleted))
  root@x1:~#

To do a system wide tracing of the new 'mseal' syscall with a backtrace
of at most 16 entries.

This addresses these perf tools build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
    diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
    diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
    diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
    diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H J Lu <hjl.tools@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlXlo4TNcba4wnVZ@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-28 11:10:00 -03:00
MD Danish Anwar
56a5cf538c net: ti: icssg-prueth: Fix start counter for ft1 filter
The start counter for FT1 filter is wrongly set to 0 in the driver.
FT1 is used for source address violation (SAV) check and source address
starts at Byte 6 not Byte 0. Fix this by changing start counter to
ETH_ALEN in icssg_ft1_set_mac_addr().

Fixes: e9b4ece7d7 ("net: ti: icssg-prueth: Add Firmware config and classification APIs.")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/20240527063015.263748-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-28 15:29:52 +02:00
Coly Li
74d4ce92e0 bcache: code cleanup in __bch_bucket_alloc_set()
In __bch_bucket_alloc_set() the lines after lable 'err:' indeed do
nothing useful after multiple cache devices are removed from bcache
code. This cleanup patch drops the useless code to save a bit CPU
cycles.

Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20240528120914.28705-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:55:59 -06:00
Coly Li
05356938a4 bcache: call force_wake_up_gc() if necessary in check_should_bypass()
If there are extreme heavy write I/O continuously hit on relative small
cache device (512GB in my testing), it is possible to make counter
c->gc_stats.in_use continue to increase and exceed CUTOFF_CACHE_ADD.

If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write
requests will bypass the cache device because check_should_bypass()
returns 'true'. Because all writes bypass the cache device, counter
c->sectors_to_gc has no chance to be negative value, and garbage
collection thread won't be waken up even the whole cache becomes clean
after writeback accomplished. The aftermath is that all write I/Os go
directly into backing device even the cache device is clean.

To avoid the above situation, this patch uses a quite conservative way
to fix: if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, only wakes
up garbage collection thread when the whole cache device is clean.

Before the fix, the writes-always-bypass situation happens after 10+
hours write I/O pressure on 512GB Intel optane memory which acts as
cache device. After this fix, such situation doesn't happen after 36+
hours testing.

Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20240528120914.28705-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:55:59 -06:00
Dongsheng Yang
a14a68b769 bcache: allow allocator to invalidate bucket in gc
Currently, if the gc is running, when the allocator found free_inc
is empty, allocator has to wait the gc finish. Before that, the
IO is blocked.

But actually, there would be some buckets is reclaimable before gc,
and gc will never mark this kind of bucket to be unreclaimable.

So we can put these buckets into free_inc in gc running to avoid
IO being blocked.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20240528120914.28705-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:55:59 -06:00
Hannes Reinecke
e993db2d6e block: check for max_hw_sectors underflow
The logical block size need to be smaller than the max_hw_sector
setting, otherwise we can't even transfer a single LBA.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:55:23 -06:00
Christoph Hellwig
e528bede6f block: stack max_user_sectors
The max_user_sectors is one of the three factors determining the actual
max_sectors limit for READ/WRITE requests.  Because of that it needs to
be stacked at least for the device mapper multi-path case where requests
are directly inserted on the lower device.  For SCSI disks this is
important because the sd driver actually sets it's own advisory limit
that is lower than max_hw_sectors based on the block limits VPD page.
While this is a bit odd an unusual, the same effect can happen if a
user or udev script tweaks the value manually.

Fixes: 4f563a6473 ("block: add a max_user_discard_sectors queue limit")
Reported-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240523182618.602003-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:54:36 -06:00
Christoph Hellwig
bafea1c58b sd: also set max_user_sectors when setting max_sectors
sd can set a max_sectors value that is lower than the max_hw_sectors
limit based on the block limits VPD page.   While this is rather unusual,
it used to work until the max_user_sectors field was split out to cleanly
deal with conflicting hardware and user limits when the hardware limit
changes.  Also set max_user_sectors to ensure the limit can properly be
stacked.

Fixes: 4f563a6473 ("block: add a max_user_discard_sectors queue limit")
Reported-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240523182618.602003-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:54:36 -06:00
Damien Le Moal
233e27b4d2 null_blk: Print correct max open zones limit in null_init_zoned_dev()
When changing the maximum number of open zones, print that number
instead of the total number of zones.

Fixes: dc4d137ee3 ("null_blk: add support for max open/active zone limit for zoned devices")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240528062852.437599-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-28 06:54:21 -06:00
Christian Brauner
db003a28e0 netfs: fix kernel doc for nets_wait_for_outstanding_io()
The @inode parameter wasn't documented leading to new doc build
warnings.

Fixes: f89ea63f1c ("netfs, 9p: Fix race between umount and async request completion")
Link: https://lore.kernel.org/r/20240528133050.7e09d78e@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-28 14:34:15 +02:00
Christian Brauner
0c07c273a5 debugfs: continue to ignore unknown mount options
Wolfram reported that debugfs remained empty on some of his boards
triggering the message "debugfs: Unknown parameter 'auto'".

The root of the issue is that we ignored unknown mount options in the
old mount api but we started rejecting unknown mount options in the new
mount api. Continue to ignore unknown mount options to not regress
userspace.

Fixes: a20971c187 ("vfs: Convert debugfs to use the new mount API")
Link: https://lore.kernel.org/r/20240527100618.np2wqiw5mz7as3vk@ninjato
Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-28 14:32:42 +02:00
Alina Yu
72b6a2d650 regulator: rtq2208: Fix invalid memory access when devm_of_regulator_put_matches is called
In this patch, a software bug has been fixed.
rtq2208_ldo_match is no longer a local variable.
It prevents invalid memory access when devm_of_regulator_put_matches
 is called.

Signed-off-by: Alina Yu <alina_yu@richtek.com>
Link: https://msgid.link/r/4ce8c4f16f1cf3aa4e5f36c0694dd3c5ccf3cd1c.1716870419.git.alina_yu@richtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-28 13:22:54 +01:00
Pierre-Louis Bossart
e662c90a6d ALSA/hda: intel-dsp-config: reduce log verbosity
The information on PCI class/subclass was interesting in the Skylake
timeframe, since the DSP was only enabled on a limited number of
platforms. Now most Intel platforms do enable the DSP, so the
information is less interesting to log.

When a DSP driver is used, the common helper may be called multiple
times due to deferred probes, but there's no reason to print the same
information multiple times. Using dev_info_once() covers all the
existing usages for internal cards with DSPs. External cards don't
rely on DSPs so far.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20240527193808.165652-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-28 13:07:51 +02:00
Takashi Iwai
a200df7deb ALSA: seq: Don't clear bank selection at event -> UMP MIDI2 conversion
The current code to convert from a legacy sequencer event to UMP MIDI2
clears the bank selection at each time the program change is
submitted.  This is confusing and may lead to incorrect bank values
tranmitted to the destination in the end.

Drop the line to clear the bank info and keep the provided values.

Fixes: e9e02819a9 ("ALSA: seq: Automatic conversion of UMP events")
Link: https://lore.kernel.org/r/20240527151852.29036-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-28 13:07:20 +02:00
Takashi Iwai
8a42886cae ALSA: seq: Fix missing bank setup between MIDI1/MIDI2 UMP conversion
When a UMP packet is converted between MIDI1 and MIDI2 protocols, the
bank selection may be lost.  The conversion from MIDI1 to MIDI2 needs
the encoding of the bank into UMP_MSG_STATUS_PROGRAM bits, while the
conversion from MIDI2 to MIDI1 needs the extraction from that
instead.

This patch implements the missing bank selection mechanism in those
conversions.

Fixes: e9e02819a9 ("ALSA: seq: Automatic conversion of UMP events")
Link: https://lore.kernel.org/r/20240527151852.29036-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-28 13:07:08 +02:00
Jarkko Sakkinen
d3e43a8fa4 tpm: Enable TCG_TPM2_HMAC by default only for X86_64
Given the not fully root caused performance issues on non-x86 platforms,
enable the feature by default only for x86-64. That is the platform it
brings the most value and has gone most of the QA. Can be reconsidered
later and can be obviously opt-in enabled too on any arch.

Link: https://lore.kernel.org/linux-integrity/bf67346ef623ff3c452c4f968b7d900911e250c3.camel@gmail.com/#t
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-28 13:14:29 +03:00
Jarkko Sakkinen
f09fc6cee0 tpm: Rename TPM2_OA_TMPL to TPM2_OA_NULL_KEY and make it local
Rename and document TPM2_OA_TMPL, as originally requested in the patch
set review, but left unaddressed without any appropriate reasoning. The
new name is TPM2_OA_NULL_KEY, has a documentation and is local only to
tpm2-sessions.c.

Link: https://lore.kernel.org/linux-integrity/ddbeb8111f48a8ddb0b8fca248dff6cc9d7079b2.camel@HansenPartnership.com/
Link: https://lore.kernel.org/linux-integrity/CZCKTWU6ZCC9.2UTEQPEVICYHL@suppilovahvero/
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-28 13:14:23 +03:00
Thadeu Lima de Souza Cascardo
4b4647add7 sock_map: avoid race between sock_map_close and sk_psock_put
sk_psock_get will return NULL if the refcount of psock has gone to 0, which
will happen when the last call of sk_psock_put is done. However,
sk_psock_drop may not have finished yet, so the close callback will still
point to sock_map_close despite psock being NULL.

This can be reproduced with a thread deleting an element from the sock map,
while the second one creates a socket, adds it to the map and closes it.

That will trigger the WARN_ON_ONCE:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Modules linked in:
CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02
RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293
RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000
RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0
RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3
R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840
R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870
FS:  000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 unix_release+0x87/0xc0 net/unix/af_unix.c:1048
 __sock_release net/socket.c:659 [inline]
 sock_close+0xbe/0x240 net/socket.c:1421
 __fput+0x42b/0x8a0 fs/file_table.c:422
 __do_sys_close fs/open.c:1556 [inline]
 __se_sys_close fs/open.c:1541 [inline]
 __x64_sys_close+0x7f/0x110 fs/open.c:1541
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb37d618070
Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070
RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Use sk_psock, which will only check that the pointer is not been set to
NULL yet, which should only happen after the callbacks are restored. If,
then, a reference can still be gotten, we may call sk_psock_stop and cancel
psock->work.

As suggested by Paolo Abeni, reorder the condition so the control flow is
less convoluted.

After that change, the reproducer does not trigger the WARN_ON_ONCE
anymore.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reported-by: syzbot+07a2e4a1a57118ef7355@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=07a2e4a1a57118ef7355
Fixes: aadb2bb83f ("sock_map: Fix a potential use-after-free in sock_map_close()")
Fixes: 5b4a79ba65 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
Cc: stable@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20240524144702.1178377-1-cascardo@igalia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-28 12:05:19 +02:00
Jarkko Sakkinen
f3d7ba9e1b tpm: Open code tpm_buf_parameters()
With only single call site, this makes no sense (slipped out of the
radar during the review). Open code and document the action directly
to the site, to make it more readable.

Fixes: 1b6d7f9eb1 ("tpm: add session encryption protection to tpm2_get_random()")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-28 13:03:57 +03:00
Matthew R. Ochs
195aba96b8 tpm_tis_spi: Account for SPI header when allocating TPM SPI xfer buffer
The TPM SPI transfer mechanism uses MAX_SPI_FRAMESIZE for computing the
maximum transfer length and the size of the transfer buffer. As such, it
does not account for the 4 bytes of header that prepends the SPI data
frame. This can result in out-of-bounds accesses and was confirmed with
KASAN.

Introduce SPI_HDRSIZE to account for the header and use to allocate the
transfer buffer.

Fixes: a86a42ac2b ("tpm_tis_spi: Add hardware wait polling")
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Tested-by: Carol Soto <csoto@nvidia.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-28 13:03:57 +03:00
Niranjana Vishwanathapura
6c5cd0807c drm/xe: Properly handle alloc_guc_id() failure
Release the submission_state lock if alloc_guc_id() fails.

v2: Add Fixes tag and CC stable kernel

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521201711.4934-1-niranjana.vishwanathapura@intel.com
(cherry picked from commit 40672b792a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-05-28 08:53:45 +02:00
Matthew Brost
c8ea2c31f5 drm/xe: Only use reserved BCS instances for usm migrate exec queue
The GuC context scheduling queue is 2 entires deep, thus it is possible
for a migration job to be stuck behind a fault if migration exec queue
shares engines with user jobs. This can deadlock as the migrate exec
queue is required to service page faults. Avoid deadlock by only using
reserved BCS instances for usm migrate exec queue.

Fixes: a043fbab7a ("drm/xe/pvc: Use fast copy engines as migrate engine on PVC")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415190453.696553-2-matthew.brost@intel.com
Reviewed-by: Brian Welty <brian.welty@intel.com>
(cherry picked from commit 04f4a70a18)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-05-28 08:53:35 +02:00
Himal Prasad Ghimiray
77b79df026 drm/xe: Change pcode timeout to 50msec while polling again
Polling is initially attempted with timeout_base_ms enabled for
preemption, and if it exceeds this timeframe, another attempt is made
without preemption, allowing an additional 50 ms before timing out.

v2
- Rebase

v3
- Move warnings to separate patch (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Fixes: 7dc9b92dcf ("drm/xe: Remove i915_utils dependency from xe_pcode.")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508152216.3263109-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit c81858eb52)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-05-28 08:53:24 +02:00
Bingbu Cao
c19fa08c14 media: intel/ipu6: fix the buffer flags caused by wrong parentheses
The buffer flags is set by wrong due to wrong parentheses, the
FL_INCOMING flag is never taken an account.
Fix it by wrapping the ternary conditional operation with parentheses.

Fixes: 3c1dfb5a69 ("media: intel/ipu6: input system video nodes and buffer queues")
Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-28 08:00:14 +02:00
Christophe JAILLET
ab0ed48101 media: intel/ipu6: Fix an error handling path in isys_probe()
If an error occurs after a successful alloc_fw_msg_bufs() call, some
resources should be released as already done in the remove function.

Add a new free_fw_msg_bufs() function that releases what has been allocated
by alloc_fw_msg_bufs().

Also use this new function in isys_remove() to avoid some code duplication.

Fixes: f50c4ca0a8 ("media: intel/ipu6: add the main input system driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-28 08:00:14 +02:00
Christophe JAILLET
266b44ec9a media: intel/ipu6: Move isys_remove() close to isys_probe()
In preparation to fixing a leak in isys_probe(), move isys_remove().
The fix will introduce a new function that will also be called from
isys_remove(). The code needs to be rearranged to avoid a forward
declaration.

Having the .remove function close to the .probe function is also more
standard.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-28 08:00:14 +02:00
Christophe JAILLET
fe61b2906b media: intel/ipu6: Fix some redundant resources freeing in ipu6_pci_remove()
pcim_iomap_regions() and pcim_enable_device() are used in the probe. So
the corresponding managed resources don't need to be freed explicitly in
the remove function.

Remove the incorrect pci_release_regions() and pci_disable_device() calls.

Fixes: 25fedc0219 ("media: intel/ipu6: add Intel IPU6 PCI device driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-28 08:00:14 +02:00
Sakari Ailus
fd7ccfb112 media: Documentation: v4l: Fix ACTIVE route flag
The documentation in one occasion mentions the
VIDIOC_SUBDEV_STREAM_FL_ACTIVE flag. This was meant to be
V4L2_SUBDEV_STREAM_FL_ACTIVE as it's a flag, not an IOCTL. Fix it.

Fixes: cd2c75454d ("media: Documentation: Document S_ROUTING behaviour")
Reported-by: Samuel Wein PhD <sam@samwein.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-05-28 08:00:14 +02:00
Thorsten Blum
c519cf9b74 docs: netdev: Fix typo in Signed-off-by tag
s/of/off/

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Fixes: e110ba6592 ("docs: netdev: add note about Changes Requested and revising commit messages")
Link: https://lore.kernel.org/r/20240527103618.265801-2-thorsten.blum@toblux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:15:22 -07:00
Jakub Kicinski
6e15774d92 Merge branch 'selftests-mptcp-mark-unstable-subtests-as-flaky'
Matthieu Baerts says:

====================
selftests: mptcp: mark unstable subtests as flaky

Some subtests can be unstable, failing once every X runs. Fixing them
can take time: there could be an issue in the kernel or in the subtest,
and it is then important to do a proper analysis, not to hide real bugs.

To avoid creating noises on the different CIs where tests are more
unstable than on our side, some subtests have been marked as flaky. As a
result, errors with these subtests (if any) are ignored.

Note that the MPTCP CI will continue to track these flaky subtests. All
these unstable subtests are also tracked by our bug tracker.

These are fixes for the -net tree, because the instabilities are visible
there. The first patch introducing the flake support has no 'Fixes'
tags, mainly because it requires recent and important refactoring done
in all MPTCP selftests. Backporting that to old versions where the flaky
tests have been introduced would be too difficult, and probably not
worth it. The other patches, adding MPTCP_LIB_SUBTEST_FLAKY=1, have a
Fixes tag, simply to ease the backport of the future fixes removing them
along with the proper fix.
====================

Link: https://lore.kernel.org/r/20240524-upstream-net-20240524-selftests-mptcp-flaky-v1-0-a352362f3f8e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:13:01 -07:00
Matthieu Baerts (NGI0)
38af56e666 selftests: mptcp: join: mark 'fail' tests as flaky
These tests are rarely unstable. It depends on the CI running the tests,
especially if it is also busy doing other tasks in parallel, and if a
debug kernel config is being used.

It looks like this issue is sometimes present with the NetDev CI. While
this is being investigated, the tests are marked as flaky not to create
noises on such CIs.

Fixes: b6e074e171 ("selftests: mptcp: add infinite map testcase")
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/491
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240524-upstream-net-20240524-selftests-mptcp-flaky-v1-4-a352362f3f8e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:12:51 -07:00
Matthieu Baerts (NGI0)
8c06ac2178 selftests: mptcp: join: mark 'fastclose' tests as flaky
These tests are flaky since their introduction. This might be less or
not visible depending on the CI running the tests, especially if it is
also busy doing other tasks in parallel, and if a debug kernel config is
being used.

It looks like this issue is often present with the NetDev CI. While this
is being investigated, the tests are marked as flaky not to create
noises on such CIs.

Fixes: 01542c9bf9 ("selftests: mptcp: add fastclose testcase")
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/324
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240524-upstream-net-20240524-selftests-mptcp-flaky-v1-3-a352362f3f8e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:12:50 -07:00
Matthieu Baerts (NGI0)
cc73a6577a selftests: mptcp: simult flows: mark 'unbalanced' tests as flaky
These tests are flaky since their introduction. This might be less or
not visible depending on the CI running the tests, especially if it is
also busy doing other tasks in parallel.

A first analysis shown that the transfer can be slowed down when there
are some re-injections at the MPTCP level. Such re-injections can of
course happen, and disturb the transfer, but it looks strange to have
them in this lab. That could be caused by the kernel having access to
less CPU cycles -- e.g. when other activities are executed in parallel
-- or by a misinterpretation on the MPTCP packet scheduler side.

While this is being investigated, the tests are marked as flaky not to
create noises in other CIs.

Fixes: 219d04992b ("mptcp: push pending frames when subflow has free space")
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/475
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240524-upstream-net-20240524-selftests-mptcp-flaky-v1-2-a352362f3f8e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:12:50 -07:00
Matthieu Baerts (NGI0)
5597613fb3 selftests: mptcp: lib: support flaky subtests
Some subtests can be unstable, failing once every X runs. Fixing them
can take time: there could be an issue in the kernel or in the subtest,
and it is then important to do a proper analysis, not to hide real bugs.

To avoid creating noises on the different CIs, it is important to have a
simple way to mark subtests as flaky, and ignore the errors. This is
what this patch introduces: subtests can be marked as flaky by setting
MPTCP_LIB_SUBTEST_FLAKY env var to 1, e.g.

  MPTCP_LIB_SUBTEST_FLAKY=1 <run flaky subtest>

The subtest will be executed, and errors (if any) will be ignored. It is
still good to run these subtests, as it exercises code, and the results
can still be useful for the on-going investigations.

Note that the MPTCP CI will continue to track these flaky subtests by
setting SELFTESTS_MPTCP_LIB_OVERRIDE_FLAKY env var to 1, and a ticket
has to be created before marking subtests as flaky.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240524-upstream-net-20240524-selftests-mptcp-flaky-v1-1-a352362f3f8e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:12:50 -07:00
Jakub Kicinski
7a8cc96ebe Merge branch 'intel-wired-lan-driver-updates-2024-05-23-ice-idpf'
Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2024-05-23 (ice, idpf)

This series contains two fixes which finished up testing.

First, Alexander fixes an issue in idpf caused by enabling NAPI and
interrupts prior to actually allocating the Rx buffers.

Second, Jacob fixes the ice driver VSI VLAN counting logic to ensure that
addition and deletion of VLANs properly manages the total VSI count.
====================

Link: https://lore.kernel.org/r/20240523-net-2024-05-23-intel-net-fixes-v1-0-17a923e0bb5f@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:11:46 -07:00
Jacob Keller
82617b9a04 ice: fix accounting if a VLAN already exists
The ice_vsi_add_vlan() function is used to add a VLAN filter for the target
VSI. This function prepares a filter in the switch table for the given VSI.
If it succeeds, the vsi->num_vlan counter is incremented.

It is not considered an error to add a VLAN which already exists in the
switch table, so the function explicitly checks and ignores -EEXIST. The
vsi->num_vlan counter is still incremented.

This seems incorrect, as it means we can double-count in the case where the
same VLAN is added twice by the caller. The actual table will have one less
filter than the count.

The ice_vsi_del_vlan() function similarly checks and handles the -ENOENT
condition for when deleting a filter that doesn't exist. This flow only
decrements the vsi->num_vlan if it actually deleted a filter.

The vsi->num_vlan counter is used only in a few places, primarily related
to tracking the number of non-zero VLANs. If the vsi->num_vlans gets out of
sync, then ice_vsi_num_non_zero_vlans() will incorrectly report more VLANs
than are present, and ice_vsi_has_non_zero_vlans() could return true
potentially in cases where there are only VLAN 0 filters left.

Fix this by only incrementing the vsi->num_vlan in the case where we
actually added an entry, and not in the case where the entry already
existed.

Fixes: a1ffafb0b4 ("ice: Support configuring the device to Double VLAN Mode")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240523-net-2024-05-23-intel-net-fixes-v1-2-17a923e0bb5f@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:11:43 -07:00
Alexander Lobakin
d514c8b542 idpf: don't enable NAPI and interrupts prior to allocating Rx buffers
Currently, idpf enables NAPI and interrupts prior to allocating Rx
buffers.
This may lead to frame loss (there are no buffers to place incoming
frames) and even crashes on quick ifup-ifdown. Interrupts must be
enabled only after all the resources are here and available.
Split interrupt init into two phases: initialization and enabling,
and perform the second only after the queues are fully initialized.
Note that we can't just move interrupt initialization down the init
process, as the queues must have correct a ::q_vector pointer set
and NAPI already added in order to allocate buffers correctly.
Also, during the deinit process, disable HW interrupts first and
only then disable NAPI. Otherwise, there can be a HW event leading
to napi_schedule(), but the NAPI will already be unavailable.

Fixes: d4d5587182 ("idpf: initialize interrupts and enable vport")
Reported-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240523-net-2024-05-23-intel-net-fixes-v1-1-17a923e0bb5f@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:11:43 -07:00
Alexander Lobakin
266aa3b481 page_pool: fix &page_pool_params kdoc issues
After the tagged commit, @netdev got documented twice and the kdoc
script didn't notice that. Remove the second description added later
and move the initial one according to the field position.

After merging commit 5f8e4007c1 ("kernel-doc: fix
struct_group_tagged() parsing"), kdoc requires to describe struct
groups as well. &page_pool_params has 2 struct groups which
generated new warnings, describe them to resolve this.

Fixes: 403f11ac9a ("page_pool: don't use driver-set flags field directly")
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240524112859.2757403-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:00:22 -07:00
Horatiu Vultur
4fb679040d net: micrel: Fix lan8841_config_intr after getting out of sleep mode
When the interrupt is enabled, the function lan8841_config_intr tries to
clear any pending interrupts by reading the interrupt status, then
checks the return value for errors and then continue to enable the
interrupt. It has been seen that once the system gets out of sleep mode,
the interrupt status has the value 0x400 meaning that the PHY detected
that the link was in low power. That is correct value but the problem is
that the check is wrong.  We try to check for errors but we return an
error also in this case which is not an error. Therefore fix this by
returning only when there is an error.

Fixes: a8f1a19d27 ("net: micrel: Add support for lan8841 PHY")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240524085350.359812-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:58:39 -07:00
Xiaolei Wang
bf0497f53c net:fec: Add fec_enet_deinit()
When fec_probe() fails or fec_drv_remove() needs to release the
fec queue and remove a NAPI context, therefore add a function
corresponding to fec_enet_init() and call fec_enet_deinit() which
does the opposite to release memory and remove a NAPI context.

Fixes: 59d0f74656 ("net: fec: init multi queue date structure")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240524050528.4115581-1-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:55:32 -07:00
Rob Herring (Arm)
12f86b9af9 dt-bindings: net: pse-pd: ti,tps23881: Fix missing "additionalProperties" constraints
The child nodes are missing "additionalProperties" constraints which
means any undocumented properties or child nodes are allowed. Add the
constraints and all the undocumented properties exposed by the fix.

Fixes: f562202fed ("dt-bindings: net: pse-pd: Add bindings for TPS23881 PSE controller")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240523171750.2837331-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:49:16 -07:00
Rob Herring (Arm)
0fe53c0ab0 dt-bindings: net: pse-pd: microchip,pd692x0: Fix missing "additionalProperties" constraints
The child nodes are missing "additionalProperties" constraints which
means any undocumented properties or child nodes are allowed. Add the
constraints, and fix the fallout of wrong manager node regex and
missing properties.

Fixes: 9c1de033af ("dt-bindings: net: pse-pd: Add bindings for PD692x0 PSE controller")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240523171732.2836880-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:48:55 -07:00
Eric Dumazet
f4dca95fc0 tcp: reduce accepted window in NEW_SYN_RECV state
Jason commit made checks against ACK sequence less strict
and can be exploited by attackers to establish spoofed flows
with less probes.

Innocent users might use tcp_rmem[1] == 1,000,000,000,
or something more reasonable.

An attacker can use a regular TCP connection to learn the server
initial tp->rcv_wnd, and use it to optimize the attack.

If we make sure that only the announced window (smaller than 65535)
is used for ACK validation, we force an attacker to use
65537 packets to complete the 3WHS (assuming server ISN is unknown)

Fixes: 378979e94e ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value")
Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240523130528.60376-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:47:23 -07:00
Willem de Bruijn
be008726d0 net: gro: initialize network_offset in network layer
Syzkaller was able to trigger

    kernel BUG at net/core/gro.c:424 !
    RIP: 0010:gro_pull_from_frag0 net/core/gro.c:424 [inline]
    RIP: 0010:gro_try_pull_from_frag0 net/core/gro.c:446 [inline]
    RIP: 0010:dev_gro_receive+0x242f/0x24b0 net/core/gro.c:571

Due to using an incorrect NAPI_GRO_CB(skb)->network_offset.

The referenced commit sets this offset to 0 in skb_gro_reset_offset.
That matches the expected case in dev_gro_receive:

        pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
                                ipv6_gro_receive, inet_gro_receive,
                                &gro_list->list, skb);

But syzkaller injected an skb with protocol ETH_P_TEB into an ip6gre
device (by writing the IP6GRE encapsulated version to a TAP device).
The result was a first call to eth_gro_receive, and thus an extra
ETH_HLEN in network_offset that should not be there. First issue hit
is when computing offset from network header in ipv6_gro_pull_exthdrs.

Initialize both offsets in the network layer gro_receive.

This pairs with all reads in gro_receive, which use
skb_gro_receive_network_offset().

Fixes: 186b1ea73a ("net: gro: use cb instead of skb->network_header")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
CC: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240523141434.1752483-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:46:59 -07:00
Ido Schimmel
7b05ab85e2 ipv4: Fix address dump when IPv4 is disabled on an interface
Cited commit started returning an error when user space requests to dump
the interface's IPv4 addresses and IPv4 is disabled on the interface.
Restore the previous behavior and do not return an error.

Before cited commit:

 # ip address show dev dummy1
 10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::e040:68ff:fe98:d018/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
 # ip link set dev dummy1 mtu 67
 # ip address show dev dummy1
 10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff

After cited commit:

 # ip address show dev dummy1
 10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 32:2d:69:f2:9c:99 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::302d:69ff:fef2:9c99/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
 # ip link set dev dummy1 mtu 67
 # ip address show dev dummy1
 RTNETLINK answers: No such device
 Dump terminated

With this patch:

 # ip address show dev dummy1
 10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::dc17:56ff:febb:57c0/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
 # ip link set dev dummy1 mtu 67
 # ip address show dev dummy1
 10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff

I fixed the exact same issue for IPv6 in commit c04f7dfe6e ("ipv6: Fix
address dump when IPv6 is disabled on an interface"), but noted [1] that
I am not doing the change for IPv4 because I am not aware of a way to
disable IPv4 on an interface other than unregistering it. I clearly
missed the above case.

[1] https://lore.kernel.org/netdev/20240321173042.2151756-1-idosch@nvidia.com/

Fixes: cdb2f80f1c ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Carolina Jubran <cjubran@nvidia.com>
Reported-by: Yamen Safadi <ysafadi@nvidia.com>
Tested-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240523110257.334315-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:46:05 -07:00
Jakub Kicinski
2786ae339e Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2024-05-27

We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 18 files changed, 583 insertions(+), 55 deletions(-).

The main changes are:

1) Fix broken BPF multi-uprobe PID filtering logic which filtered by thread
   while the promise was to filter by process, from Andrii Nakryiko.

2) Fix the recent influx of syzkaller reports to sockmap which triggered
   a locking rule violation by performing a map_delete, from Jakub Sitnicki.

3) Fixes to netkit driver in particular on skb->pkt_type override upon pass
   verdict, from Daniel Borkmann.

4) Fix an integer overflow in resolve_btfids which can wrongly trigger build
   failures, from Friedrich Vock.

5) Follow-up fixes for ARC JIT reported by static analyzers,
   from Shahab Vahedi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Cover verifier checks for mutating sockmap/sockhash
  Revert "bpf, sockmap: Prevent lock inversion deadlock in map delete elem"
  bpf: Allow delete from sockmap/sockhash only if update is allowed
  selftests/bpf: Add netkit test for pkt_type
  selftests/bpf: Add netkit tests for mac address
  netkit: Fix pkt_type override upon netkit pass verdict
  netkit: Fix setting mac address in l2 mode
  ARC, bpf: Fix issues reported by the static analyzers
  selftests/bpf: extend multi-uprobe tests with USDTs
  selftests/bpf: extend multi-uprobe tests with child thread case
  libbpf: detect broken PID filtering logic for multi-uprobe
  bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic
  bpf: fix multi-uprobe PID filtering logic
  bpf: Fix potential integer overflow in resolve_btfids
  MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT
====================

Link: https://lore.kernel.org/r/20240527203551.29712-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:26:30 -07:00
Pierre-Louis Bossart
3ff78451b8 ASoC: SOF: add missing MODULE_DESCRIPTION()
MODULE_DESCRIPTION() was optional until it became mandatory and
flagged as an error by 'make W=1'.

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://msgid.link/r/20240527194414.166156-5-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 21:19:27 +01:00
Pierre-Louis Bossart
06a2315da0 ASoC: SOF: reorder MODULE_ definitions
Follow the arbitrary Intel convention order to allow for easier grep.

MODULE_LICENSE
MODULE_DESCRIPTION
MODULE_IMPORT

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://msgid.link/r/20240527194414.166156-4-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 21:19:26 +01:00
Pierre-Louis Bossart
b88056df4f ASoC: SOF: AMD: group all module related information
The module information is spread across files, group in a single
location.  For maintenability and alignment, the arbitrary Intel
convention is used with the following order:

MODULE_LICENSE
MODULE_DESCRIPTION
MODULE_IMPORT

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://msgid.link/r/20240527194414.166156-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 21:19:25 +01:00
Pierre-Louis Bossart
e30a942861 ASoC: SOF: stream-ipc: remove unnecessary MODULE_LICENSE
This file is part of the snd-sof module, there's no reason to re-add the
MODULE_LICENSE here.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://msgid.link/r/20240527194414.166156-2-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 21:19:24 +01:00
hexue
30a0e3135f block: delete redundant function declaration
blk_stats_alloc_enable was used for block hybrid poll, the related
function definition was removed by patch:
commit 54bdd67d0f ("blk-mq: remove hybrid polling")
but the function declaration was not deleted.

Signed-off-by: hexue <xue01.he@samsung.com>
Link: https://lore.kernel.org/r/20240527084533.1485210-1-xue01.he@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 13:58:06 -06:00
Damien Le Moal
d9ff882b54 null_blk: Fix return value of nullb_device_power_store()
When powering on a null_blk device that is not already on, the return
value ret that is initialized to be count is reused to check the return
value of null_add_dev(), leading to nullb_device_power_store() to return
null_add_dev() return value (0 on success) instead of "count".
So make sure to set ret to be equal to count when there are no errors.

Fixes: a2db328b08 ("null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20240527043445.235267-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 13:56:58 -06:00
Jakub Sitnicki
a63bf55616 selftests/bpf: Cover verifier checks for mutating sockmap/sockhash
Verifier enforces that only certain program types can mutate sock{map,hash}
maps, that is update it or delete from it. Add test coverage for these
checks so we don't regress.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-3-944b372f2101@cloudflare.com
2024-05-27 19:34:26 +02:00
Jakub Sitnicki
3b9ce0491a Revert "bpf, sockmap: Prevent lock inversion deadlock in map delete elem"
This reverts commit ff91059932.

This check is no longer needed. BPF programs attached to tracepoints are
now rejected by the verifier when they attempt to delete from a
sockmap/sockhash maps.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-2-944b372f2101@cloudflare.com
2024-05-27 19:34:25 +02:00
Jakub Sitnicki
98e948fb60 bpf: Allow delete from sockmap/sockhash only if update is allowed
We have seen an influx of syzkaller reports where a BPF program attached to
a tracepoint triggers a locking rule violation by performing a map_delete
on a sockmap/sockhash.

We don't intend to support this artificial use scenario. Extend the
existing verifier allowed-program-type check for updating sockmap/sockhash
to also cover deleting from a map.

From now on only BPF programs which were previously allowed to update
sockmap/sockhash can delete from these map types.

Fixes: ff91059932 ("bpf, sockmap: Prevent lock inversion deadlock in map delete elem")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+ec941d6e24f633a59172@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+ec941d6e24f633a59172@syzkaller.appspotmail.com
Acked-by: John Fastabend <john.fastabend@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=ec941d6e24f633a59172
Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-1-944b372f2101@cloudflare.com
2024-05-27 19:33:40 +02:00
Marc Zyngier
c92e8b9eac KVM: arm64: AArch32: Fix spurious trapping of conditional instructions
We recently upgraded the view of ESR_EL2 to 64bit, in keeping with
the requirements of the architecture.

However, the AArch32 emulation code was left unaudited, and the
(already dodgy) code that triages whether a trap is spurious or not
(because the condition code failed) broke in a subtle way:

If ESR_EL2.ISS2 is ever non-zero (unlikely, but hey, this is the ARM
architecture we're talking about), the hack that tests the top bits
of ESR_EL2.EC will break in an interesting way.

Instead, use kvm_vcpu_trap_get_class() to obtain the EC, and list
all the possible ECs that can fail a condition code check.

While we're at it, add SMC32 to the list, as it is explicitly listed
as being allowed to trap despite failing a condition code check (as
described in the HCR_EL2.TSC documentation).

Fixes: 0b12620fdd ("KVM: arm64: Treat ESR_EL2 as a 64-bit register")
Cc: stable@vger.kernel.org
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-27 17:46:09 +01:00
Marc Zyngier
dfe6d190f3 KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
It appears that we don't allow a vcpu to be restored in AArch32
System mode, as we *never* included it in the list of valid modes.

Just add it to the list of allowed modes.

Fixes: 0d854a60b1 ("arm64: KVM: enable initialization of a 32bit vcpu")
Cc: stable@vger.kernel.org
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-27 17:45:35 +01:00
Marc Zyngier
947051e361 KVM: arm64: Fix AArch32 register narrowing on userspace write
When userspace writes to one of the core registers, we make
sure to narrow the corresponding GPRs if PSTATE indicates
an AArch32 context.

The code tries to check whether the context is EL0 or EL1 so
that it narrows the correct registers. But it does so by checking
the full PSTATE instead of PSTATE.M.

As a consequence, and if we are restoring an AArch32 EL0 context
in a 64bit guest, and that PSTATE has *any* bit set outside of
PSTATE.M, we narrow *all* registers instead of only the first 15,
destroying the 64bit state.

Obviously, this is not something the guest is likely to enjoy.

Correctly masking PSTATE to only evaluate PSTATE.M fixes it.

Fixes: 90c1f934ed ("KVM: arm64: Get rid of the AArch32 register mapping code")
Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-27 17:45:21 +01:00
Arnaldo Carvalho de Melo
001821b0e7 perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources to pick POSTED_MSI_NOTIFICATION
To pick up the change in:

  f5a3562ec9 ("x86/irq: Reserve a per CPU IDT vector for posted MSIs")

That picks up this new vector:

  $ cp arch/x86/include/asm/irq_vectors.h tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h
  $ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh > after
  $ diff -u before after
  --- before	2024-05-27 12:50:47.708863932 -0300
  +++ after	2024-05-27 12:51:15.335113123 -0300
  @@ -1,6 +1,7 @@
   static const char *x86_irq_vectors[] = {
   	[0x02] = "NMI",
   	[0x80] = "IA32_SYSCALL",
  +	[0xeb] = "POSTED_MSI_NOTIFICATION",
   	[0xec] = "LOCAL_TIMER",
   	[0xed] = "HYPERV_STIMER0",
   	[0xee] = "HYPERV_REENLIGHTENMENT",
  $

Now those will be known when pretty printing the irq_vectors:*
tracepoints.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZlS34M0x30EFVhbg@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27 13:42:18 -03:00
Arnaldo Carvalho de Melo
a3eed53bee perf beauty: Update copy of linux/socket.h with the kernel sources
To pick up the fixes in:

  0645fbe760 ("net: have do_accept() take a struct proto_accept_arg argument")

That just changes a function prototype, not touching things used by the
perf scrape scripts such as:

  $ tools/perf/trace/beauty/sockaddr.sh | head -5
  static const char *socket_families[] = {
  	[0] = "UNSPEC",
  	[1] = "LOCAL",
  	[2] = "INET",
  	[3] = "AX25",
  $

This addresses this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlSrceExgjrUiDb5@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27 12:49:18 -03:00
Arnaldo Carvalho de Melo
1437a9f06f tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY
There is no scrape script yet for those, but the warning pointed out we
need to update the array with the F_LINUX_SPECIFIC_BASE entries, do it.

Now 'perf trace' can decode that cmd and also use it in filter, as in:

  root@number:~# perf trace -e syscalls:*enter_fcntl --filter 'cmd != SETFL && cmd != GETFL'
     0.000 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLK, arg: 0x7fffdc6a8a50)
     0.013 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a8aa0)
     0.090 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a88e0)
  ^Croot@number:~#

This picks up the changes in:

  c62b758bae ("fcntl: add F_DUPFD_QUERY fcntl()")

Addressing this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZlSqNQH9mFw2bmjq@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27 12:44:09 -03:00
Ritesh Harjani (IBM)
b0c6bcd58d xfs: Add cond_resched to block unmap range and reflink remap path
An async dio write to a sparse file can generate a lot of extents
and when we unlink this file (using rm), the kernel can be busy in umapping
and freeing those extents as part of transaction processing.

Similarly xfs reflink remapping path can also iterate over a million
extent entries in xfs_reflink_remap_blocks().

Since we can busy loop in these two functions, so let's add cond_resched()
to avoid softlockup messages like these.

watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S  L   6.9.0-rc5-0-default #1
Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
Call Trace:
  xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
  xfs_alloc_compute_aligned+0x5c/0x144
  xfs_alloc_ag_vextent_size+0x238/0x8d4
  xfs_alloc_fix_freelist+0x540/0x694
  xfs_free_extent_fix_freelist+0x84/0xe0
  __xfs_free_extent+0x74/0x1ec
  xfs_extent_free_finish_item+0xcc/0x214
  xfs_defer_finish_one+0x194/0x388
  xfs_defer_finish_noroll+0x1b4/0x5c8
  xfs_defer_finish+0x2c/0xc4
  xfs_bunmapi_range+0xa4/0x100
  xfs_itruncate_extents_flags+0x1b8/0x2f4
  xfs_inactive_truncate+0xe0/0x124
  xfs_inactive+0x30c/0x3e0
  xfs_inodegc_worker+0x140/0x234
  process_scheduled_works+0x240/0x57c
  worker_thread+0x198/0x468
  kthread+0x138/0x140
  start_kernel_thread+0x14/0x18

run fstests generic/175 at 2024-02-02 04:40:21
[   C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
 watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
 CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0
 NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
 LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
 Call Trace:
  0xc000000014107c00 (unreliable)
  __xfs_btree_updkeys+0x8c/0x2c0 [xfs]
  xfs_btree_update_keys+0x150/0x170 [xfs]
  xfs_btree_lshift+0x534/0x660 [xfs]
  xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
  xfs_btree_insrec+0x4e4/0x630 [xfs]
  xfs_btree_insert+0x104/0x2d0 [xfs]
  xfs_rmap_insert+0xc4/0x260 [xfs]
  xfs_rmap_map_shared+0x228/0x630 [xfs]
  xfs_rmap_finish_one+0x2d4/0x350 [xfs]
  xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
  xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
  __xfs_trans_commit+0x1f4/0x400 [xfs]
  xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
  xfs_reflink_remap_blocks+0x154/0x320 [xfs]
  xfs_file_remap_range+0x138/0x3a0 [xfs]
  do_clone_file_range+0x11c/0x2f0
  vfs_clone_file_range+0x60/0x1c0
  ioctl_file_clone+0x78/0x140
  sys_ioctl+0x934/0x1270
  system_call_exception+0x158/0x320
  system_call_vectored_common+0x15c/0x2ec

Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 20:50:35 +05:30
Arnaldo Carvalho de Melo
0efc88e444 tools headers UAPI: Sync linux/prctl.h with the kernel sources
To pick the changes in:

  628d701f2d ("powerpc/dexcr: Add DEXCR prctl interface")
  6b9391b581 ("riscv: Include riscv_set_icache_flush_ctx prctl")

That adds some PowerPC and a RISC-V specific prctl options:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  --- before	2024-05-27 12:14:21.358032781 -0300
  +++ after	2024-05-27 12:14:32.364530185 -0300
  @@ -65,6 +65,9 @@
   	[68] = "GET_MEMORY_MERGE",
   	[69] = "RISCV_V_SET_CONTROL",
   	[70] = "RISCV_V_GET_CONTROL",
  +	[71] = "RISCV_SET_ICACHE_FLUSH_CTX",
  +	[72] = "PPC_GET_DEXCR",
  +	[73] = "PPC_SET_DEXCR",
   };
   static const char *prctl_set_mm_options[] = {
   	[1] = "START_CODE",
  $

That now will be used to decode the syscall option and also to compose
filters, for instance:

  [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME
       0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee)
       0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670)
       7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10)
       7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970)
       8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10)
       8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970)
  ^C[root@five ~]#

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/lkml/ZlSklGWp--v_Ije7@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27 12:20:02 -03:00
Linus Torvalds
2bfcfd584f Merge tag 'pmdomain-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fix from Ulf Hansson:

 - Fix regression in gpcv2 PM domain for i.MX8

* tag 'pmdomain-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: imx: gpcv2: Add delay after power up handshake
2024-05-27 08:18:31 -07:00
Christoph Hellwig
c8c1f7012b dm: make dm_set_zones_restrictions work on the queue limits
Don't stuff the values directly into the queue without any
synchronization, but instead delay applying the queue limits in
the caller and let dm_set_zones_restrictions work on the limit
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:39 -06:00
Christoph Hellwig
5e7a4bbcc3 dm: remove dm_check_zoned
Fold it into the only caller in preparation to changes in the
queue limits setup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:39 -06:00
Christoph Hellwig
d9780064b1 dm: move setting zoned_enabled to dm_table_set_restrictions
Keep it together with the rest of the zoned code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:39 -06:00
Christoph Hellwig
80e4e17ac9 block: remove blk_queue_max_integrity_segments
This is unused now that all the atomic queue limit conversions are
merged.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240521221606.393040-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:22 -06:00
Arnaldo Carvalho de Melo
e5c7bd4e5c tools include UAPI: Sync linux/stat.h with the kernel sources
To get the changes in:

  2a82bb0294 ("statx: stx_subvol")

To pick up this change and support it:

  $ tools/perf/trace/beauty/statx_mask.sh > before
  $ cp include/uapi/linux/stat.h tools/perf/trace/beauty/include/uapi/linux/stat.h
  $ tools/perf/trace/beauty/statx_mask.sh > after
  $ diff -u before after
  --- before	2024-05-22 13:39:49.742470571 -0300
  +++ after	2024-05-22 13:39:59.157883101 -0300
  @@ -14,4 +14,5 @@
   	[ilog2(0x00001000) + 1] = "MNT_ID",
   	[ilog2(0x00002000) + 1] = "DIOALIGN",
   	[ilog2(0x00004000) + 1] = "MNT_ID_UNIQUE",
  +	[ilog2(0x00008000) + 1] = "SUBVOL",
   };
  $

Now we'll see it like we see these:

  # perf trace -e statx
     0.000 ( 0.015 ms): systemd-userwo/3982299 statx(dfd: 6, filename: ".", mask: TYPE|INO|MNT_ID, buffer: 0x7ffd8945e850) = 0
     <SNIP>
   180.559 ( 0.007 ms): (ostnamed)/3982957 statx(dfd: 4, filename: "sys", flags: SYMLINK_NOFOLLOW|NO_AUTOMOUNT|STATX_DONT_SYNC, mask: TYPE, buffer: 0x7fff13161190) = 0
   180.918 ( 0.011 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/kernel/security", flags: SYMLINK_NOFOLLOW|NO_AUTOMOUNT|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0
   180.956 ( 0.010 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/fs/cgroup", flags: SYMLINK_NOFOLLOW|NO_AUTOMOUNT|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0
   <SNIP>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zk5nO9yT0oPezUoo@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27 12:13:51 -03:00
Linus Torvalds
e4c07ec89e Merge tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Fix io_uring based write-through after converting cifs to use the
   netfs library

 - Fix aio error handling when doing write-through via netfs library

 - Fix performance regression in iomap when used with non-large folio
   mappings

 - Fix signalfd error code

 - Remove obsolete comment in signalfd code

 - Fix async request indication in netfs_perform_write() by raising
   BDP_ASYNC when IOCB_NOWAIT is set

 - Yield swap device immediately to prevent spurious EBUSY errors

 - Don't cross a .backup mountpoint from backup volumes in afs to avoid
   infinite loops

 - Fix a race between umount and async request completion in 9p after 9p
   was converted to use the netfs library

* tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs, 9p: Fix race between umount and async request completion
  afs: Don't cross .backup mountpoint from backup volume
  swap: yield device immediately
  netfs: Fix setting of BDP_ASYNC from iocb flags
  signalfd: drop an obsolete comment
  signalfd: fix error return code
  iomap: fault in smaller chunks for non-large folio mappings
  filemap: add helper mapping_max_folio_size()
  netfs: Fix AIO error handling when doing write-through
  netfs: Fix io_uring based write-through
2024-05-27 08:09:12 -07:00
Lukas Bulwahn
82d71b53d7 Documentation/core-api: correct reference to SWIOTLB_DYNAMIC
Commit c93f261dfc ("Documentation/core-api: add swiotlb documentation")
accidentally refers to CONFIG_DYNAMIC_SWIOTLB in one place, while the
config is actually called CONFIG_SWIOTLB_DYNAMIC.

Correct the reference to the intended config option.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Reviewed-by: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-27 16:52:09 +02:00
Fedor Pchelkin
6cb05d89fd dma-buf: handle testing kthreads creation failure
kthread creation may possibly fail inside race_signal_callback(). In
such a case stop the already started threads, put the already taken
references to them and return with error code.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 2989f64510 ("dma-buf: Add selftests for dma-fence")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522181308.841686-1-pchelkin@ispras.ru
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-05-27 16:09:22 +02:00
Charles Keepax
d5d2a5dacb MAINTAINERS: Remove James Schulman from Cirrus audio maintainers
James no longer works for Cirrus Logic, remove him from the list of
maintainers for the Cirrus audio CODEC drivers.

Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://msgid.link/r/20240527101326.440345-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 13:59:42 +01:00
Charles Keepax
8d34c12e87 ASoC: wm_adsp: Add missing MODULE_DESCRIPTION()
wm_adsp is built as a separate module and as such should include a
MODULE_DESCRIPTION() macro.

Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://msgid.link/r/20240527100237.430240-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 13:59:41 +01:00
Charles Keepax
797c525e85 ASoC: cs42l43: Only restrict 44.1kHz for the ASP
The SoundWire interface can always support 44.1kHz using flow controlled
mode, and whether the ASP is in master mode should obviously only affect
the ASP. Update cs42l43_startup() to only restrict the rates for the ASP
DAI.

Fixes: fc918cbe87 ("ASoC: cs42l43: Add support for the cs42l43")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://msgid.link/r/20240527100840.439832-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 13:59:40 +01:00
Carlos López
e569eb3497 tracing/probes: fix error check in parse_btf_field()
btf_find_struct_member() might return NULL or an error via the
ERR_PTR() macro. However, its caller in parse_btf_field() only checks
for the NULL condition. Fix this by using IS_ERR() and returning the
error up the stack.

Link: https://lore.kernel.org/all/20240527094351.15687-1-clopez@suse.de/

Fixes: c440adfbe3 ("tracing/probes: Support BTF based data structure field access")
Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-27 21:32:35 +09:00
David Howells
f89ea63f1c netfs, 9p: Fix race between umount and async request completion
There's a problem in 9p's interaction with netfslib whereby a crash occurs
because the 9p_fid structs get forcibly destroyed during client teardown
(without paying attention to their refcounts) before netfslib has finished
with them.  However, it's not a simple case of deferring the clunking that
p9_fid_put() does as that requires the p9_client record to still be
present.

The problem is that netfslib has to unlock pages and clear the IN_PROGRESS
flag before destroying the objects involved - including the fid - and, in
any case, nothing checks to see if writeback completed barring looking at
the page flags.

Fix this by keeping a count of outstanding I/O requests (of any type) and
waiting for it to quiesce during inode eviction.

Reported-by: syzbot+df038d463cca332e8414@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000005be0aa061846f8d6@google.com/
Reported-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000b86c5e06130da9c6@google.com/
Reported-by: syzbot+1527696d41a634cc1819@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000041f960618206d7e@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/755891.1716560771@warthog.procyon.org.uk
Tested-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Hillf Danton <hdanton@sina.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Reported-and-tested-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-27 13:12:13 +02:00
Guenter Roeck
779aa4d747 drm/nouveau/nvif: Avoid build error due to potential integer overflows
Trying to build parisc:allmodconfig with gcc 12.x or later results
in the following build error.

drivers/gpu/drm/nouveau/nvif/object.c: In function 'nvif_object_mthd':
drivers/gpu/drm/nouveau/nvif/object.c:161:9: error:
	'memcpy' accessing 4294967264 or more bytes at offsets 0 and 32 overlaps 6442450881 bytes at offset -2147483617 [-Werror=restrict]
  161 |         memcpy(data, args->mthd.data, size);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/nouveau/nvif/object.c: In function 'nvif_object_ctor':
drivers/gpu/drm/nouveau/nvif/object.c:298:17: error:
	'memcpy' accessing 4294967240 or more bytes at offsets 0 and 56 overlaps 6442450833 bytes at offset -2147483593 [-Werror=restrict]
  298 |                 memcpy(data, args->new.data, size);

gcc assumes that 'sizeof(*args) + size' can overflow, which would result
in the problem.

The problem is not new, only it is now no longer a warning but an error
since W=1 has been enabled for the drm subsystem and since Werror is
enabled for test builds.

Rearrange arithmetic and use check_add_overflow() for validating the
allocation size to avoid the overflow. While at it, split assignments
out of if conditions.

Fixes: a61ddb4393 ("drm: enable (most) W=1 warnings by default across the subsystem")
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524134817.1369993-1-linux@roeck-us.net
2024-05-27 13:10:30 +02:00
Rafael J. Wysocki
ae2170d6ea thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid
When a trip point becomes invalid after being crossed on the way up,
it is involved in a mitigation episode that needs to be adjusted to
compensate for the trip going away.

For this reason, introduce thermal_zone_trip_down() as a wrapper
around thermal_trip_crossed() and make thermal_zone_set_trip_temp()
call it if the new temperature of the trip at hand is equal to
THERMAL_TEMP_INVALID and it has been crossed on the way up to trigger
all of the necessary adjustments in user space, the thermal debug
code and the zone governor.

Fixes: 8c69a777e4 ("thermal: core: Fix the handling of invalid trip points")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki
cb573eec60 thermal: core: Introduce thermal_trip_crossed()
Add a helper function called thermal_trip_crossed() to be invoked by
__thermal_zone_device_update() in order to notify user space, the
thermal debug code and the zone governor about trip crossing.

Subsequently, this will also be used in the case when a trip point
becomes invalid after being crossed on the way up.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki
5a599e10e5 thermal/debugfs: Allow tze_seq_show() to print statistics for invalid trips
Commit a6258fde8d ("thermal/debugfs: Make tze_seq_show() skip invalid
trips and trips with no stats") modified tze_seq_show() to skip invalid
trips, but it overlooked the fact that a trip may become invalid during
a mitigation eposide involving it, in which case its statistics should
still be reported.

For this reason, remove the invalid trip temperature check from the
main loop in tze_seq_show().

The trips that have never been valid will still be skipped after this
change because there are no statistics to report for them.

Fixes: a6258fde8d ("thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki
9e69acc1de thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()
The temperature and hysteresis of a trip point may change during a
mitigation episode it is involved in (it may even become invalid
altogether), so in order to avoid possible confusion related to that,
store the temperature and hysteresis of trip points at the time they
are crossed on the way up and print those values instead of their
current temperature and hysteresis.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Parthiban Veerasooran
52a2f06083 net: usb: smsc95xx: fix changing LED_SEL bit value updated from EEPROM
LED Select (LED_SEL) bit in the LED General Purpose IO Configuration
register is used to determine the functionality of external LED pins
(Speed Indicator, Link and Activity Indicator, Full Duplex Link
Indicator). The default value for this bit is 0 when no EEPROM is
present. If a EEPROM is present, the default value is the value of the
LED Select bit in the Configuration Flags of the EEPROM. A USB Reset or
Lite Reset (LRST) will cause this bit to be restored to the image value
last loaded from EEPROM, or to be set to 0 if no EEPROM is present.

While configuring the dual purpose GPIO/LED pins to LED outputs in the
LED General Purpose IO Configuration register, the LED_SEL bit is changed
as 0 and resulting the configured value from the EEPROM is cleared. The
issue is fixed by using read-modify-write approach.

Fixes: f293501c61 ("smsc95xx: configure LED outputs")
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
Link: https://lore.kernel.org/r/20240523085314.167650-1-Parthiban.Veerasooran@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-27 12:48:23 +02:00
Armin Wolf
c4bd7f1d78 ACPI: EC: Avoid returning AE_OK on errors in address space handler
If an error code other than EINVAL, ENODEV or ETIME is returned
by acpi_ec_read() / acpi_ec_write(), then AE_OK is incorrectly
returned by acpi_ec_space_handler().

Fix this by only returning AE_OK on success, and return AE_ERROR
otherwise.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 12:43:29 +02:00
Armin Wolf
f6f172dc6a ACPI: EC: Abort address space access upon error
When a multi-byte address space access is requested, acpi_ec_read()/
acpi_ec_write() is being called multiple times.

Abort such operations if a single call to acpi_ec_read() /
acpi_ec_write() fails, as the data read from / written to the EC
might be incomplete.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 12:43:29 +02:00
Dan Williams
7ff6c798ec ACPI: APEI: EINJ: Fix einj_dev release leak
The platform driver conversion of EINJ mistakenly used
platform_device_del() to unwind platform_device_register_full() at
module exit. This leads to a small leak of one 'struct platform_device'
instance per module load/unload cycle. Switch to
platform_device_unregister() which performs both device_del() and final
put_device().

Fixes: 5621fafaac ("EINJ: Migrate to a platform driver")
Cc: 6.9+ <stable@vger.kernel.org> # 6.9+
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 12:35:37 +02:00
Darrick J. Wong
95b19e2f4e xfs: don't open-code u64_to_user_ptr
Don't open-code what the kernel already provides.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:55:52 +05:30
Darrick J. Wong
38de567906 xfs: allow symlinks with short remote targets
An internal user complained about log recovery failing on a symlink
("Bad dinode after recovery") with the following (excerpted) format:

core.magic = 0x494e
core.mode = 0120777
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.nextents = 1
core.size = 297
core.nblocks = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,12,1,0]

This is a symbolic link with a 297-byte target stored in a disk block,
which is to say this is a symlink with a remote target.  The forkoff is
0, which is to say that there's 512 - 176 == 336 bytes in the inode core
to store the data fork.

Eventually, testing of generic/388 failed with the same inode corruption
message during inode recovery.  In writing a debugging patch to call
xfs_dinode_verify on dirty inode log items when we're committing
transactions, I observed that xfs/298 can reproduce the problem quite
quickly.

xfs/298 creates a symbolic link, adds some extended attributes, then
deletes them all.  The test failure occurs when the final removexattr
also deletes the attr fork because that does not convert the remote
symlink back into a shortform symlink.  That is how we trip this test.
The only reason why xfs/298 only triggers with the debug patch added is
that it deletes the symlink, so the final iflush shows the inode as
free.

I wrote a quick fstest to emulate the behavior of xfs/298, except that
it leaves the symlinks on the filesystem after inducing the "corrupt"
state.  Kernels going back at least as far as 4.18 have written out
symlink inodes in this manner and prior to 1eb70f54c4 they did not
object to reading them back in.

Because we've been writing out inodes this way for quite some time, the
only way to fix this is to relax the check for symbolic links.
Directories don't have this problem because di_size is bumped to
blocksize during the sf->data conversion.

Fixes: 1eb70f54c4 ("xfs: validate inode fork size against fork format")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:55:52 +05:30
Darrick J. Wong
97835e6866 xfs: fix xfs_init_attr_trans not handling explicit operation codes
When we were converting the attr code to use an explicit operation code
instead of keying off of attr->value being null, we forgot to change the
code that initializes the transaction reservation.  Split the function
into two helpers that handle the !remove and remove cases, then fix both
callsites to handle this correctly.

Fixes: c27411d4c6 ("xfs: make attr removal an explicit operation")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:55:52 +05:30
Darrick J. Wong
2b3f004d3d xfs: drop xfarray sortinfo folio on error
Chandan Babu reports the following livelock in xfs/708:

 run fstests xfs/708 at 2024-05-04 15:35:29
 XFS (loop16): EXPERIMENTAL online scrub feature in use. Use at your own risk!
 XFS (loop5): Mounting V5 Filesystem e96086f0-a2f9-4424-a1d5-c75d53d823be
 XFS (loop5): Ending clean mount
 XFS (loop5): Quotacheck needed: Please wait.
 XFS (loop5): Quotacheck: Done.
 XFS (loop5): EXPERIMENTAL online scrub feature in use. Use at your own risk!
 INFO: task xfs_io:143725 blocked for more than 122 seconds.
       Not tainted 6.9.0-rc4+ #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:xfs_io          state:D stack:0     pid:143725 tgid:143725 ppid:117661 flags:0x00004006
 Call Trace:
  <TASK>
  __schedule+0x69c/0x17a0
  schedule+0x74/0x1b0
  io_schedule+0xc4/0x140
  folio_wait_bit_common+0x254/0x650
  shmem_undo_range+0x9d5/0xb40
  shmem_evict_inode+0x322/0x8f0
  evict+0x24e/0x560
  __dentry_kill+0x17d/0x4d0
  dput+0x263/0x430
  __fput+0x2fc/0xaa0
  task_work_run+0x132/0x210
  get_signal+0x1a8/0x1910
  arch_do_signal_or_restart+0x7b/0x2f0
  syscall_exit_to_user_mode+0x1c2/0x200
  do_syscall_64+0x72/0x170
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The shmem code is trying to drop all the folios attached to a shmem
file and gets stuck on a locked folio after a bnobt repair.  It looks
like the process has a signal pending, so I started looking for places
where we lock an xfile folio and then deal with a fatal signal.

I found a bug in xfarray_sort_scan via code inspection.  This function
is called to set up the scanning phase of a quicksort operation, which
may involve grabbing a locked xfile folio.  If we exit the function with
an error code, the caller does not call xfarray_sort_scan_done to put
the xfile folio.  If _sort_scan returns an error code while si->folio is
set, we leak the reference and never unlock the folio.

Therefore, change xfarray_sort to call _scan_done on exit.  This is safe
to call multiple times because it sets si->folio to NULL and ignores a
NULL si->folio.  Also change _sort_scan to use an intermediate variable
so that we never pollute si->folio with an errptr.

Fixes: 232ea05277 ("xfs: enable sorting of xfile-backed arrays")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:55:52 +05:30
John Garry
b33874fb7f xfs: Stop using __maybe_unused in xfs_alloc.c
In both xfs_alloc_cur_finish() and xfs_alloc_ag_vextent_exact(), local
variable @afg is tagged as __maybe_unused. Otherwise an unused variable
warning would be generated for when building with W=1 and CONFIG_XFS_DEBUG
unset. In both cases, the variable is unused as it is only referenced in
an ASSERT() call, which is compiled out (in this config).

It is generally a poor programming style to use __maybe_unused for
variables.

The ASSERT() call is to verify that agbno of the end of the extent is
within bounds for both functions. @afg is used as an intermediate variable
to find the AG length.

However xfs_verify_agbext() already exists to verify a valid extent range.
The arguments for calling xfs_verify_agbext() are already available, so use
that instead.

An advantage of using xfs_verify_agbext() is that it verifies that both the
start and the end of the extent are within the bounds of the AG and
catches overflows.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:54:24 +05:30
John Garry
d7ba701da6 xfs: Clear W=1 warning in xfs_iwalk_run_callbacks()
For CONFIG_XFS_DEBUG unset, xfs_iwalk_run_callbacks() generates the
following warning for when building with W=1:

fs/xfs/xfs_iwalk.c: In function ‘xfs_iwalk_run_callbacks’:
fs/xfs/xfs_iwalk.c:354:42: error: variable ‘irec’ set but not used [-Werror=unused-but-set-variable]
  354 |         struct xfs_inobt_rec_incore     *irec;
      |                                          ^~~~
cc1: all warnings being treated as errors

Drop @irec, as it is only an intermediate variable.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:54:24 +05:30
Hariprasad Kelam
1684842147 Octeontx2-pf: Free send queue buffers incase of leaf to inner
There are two type of classes. "Leaf classes" that are  the
bottom of the class hierarchy. "Inner classes" that are neither
the root class nor leaf classes. QoS rules can only specify leaf
classes as targets for traffic.

			 Root
		        /  \
		       /    \
                      1      2
                             /\
                            /  \
                           4    5
               classes 1,4 and 5 are leaf classes.
               class 2 is a inner class.

When a leaf class made as inner, or vice versa, resources associated
with send queue (send queue buffers and transmit schedulers) are not
getting freed.

Fixes: 5e6808b4c6 ("octeontx2-pf: Add support for HTB offload")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240523073626.4114-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-27 11:55:47 +02:00
Kuniyuki Iwashima
51d1b25a72 af_unix: Read sk->sk_hash under bindlock during bind().
syzkaller reported data-race of sk->sk_hash in unix_autobind() [0],
and the same ones exist in unix_bind_bsd() and unix_bind_abstract().

The three bind() functions prefetch sk->sk_hash locklessly and
use it later after validating that unix_sk(sk)->addr is NULL under
unix_sk(sk)->bindlock.

The prefetched sk->sk_hash is the hash value of unbound socket set
in unix_create1() and does not change until bind() completes.

There could be a chance that sk->sk_hash changes after the lockless
read.  However, in such a case, non-NULL unix_sk(sk)->addr is visible
under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using
the prefetched value.

The KCSAN splat is false-positive, but let's silence it by reading
sk->sk_hash under unix_sk(sk)->bindlock.

[0]:
BUG: KCSAN: data-race in unix_autobind / unix_autobind

write to 0xffff888034a9fb88 of 4 bytes by task 4468 on cpu 0:
 __unix_set_addr_hash net/unix/af_unix.c:331 [inline]
 unix_autobind+0x47a/0x7d0 net/unix/af_unix.c:1185
 unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373
 __sys_connect_file+0xd7/0xe0 net/socket.c:2048
 __sys_connect+0x114/0x140 net/socket.c:2065
 __do_sys_connect net/socket.c:2075 [inline]
 __se_sys_connect net/socket.c:2072 [inline]
 __x64_sys_connect+0x40/0x50 net/socket.c:2072
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x46/0x4e

read to 0xffff888034a9fb88 of 4 bytes by task 4465 on cpu 1:
 unix_autobind+0x28/0x7d0 net/unix/af_unix.c:1134
 unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373
 __sys_connect_file+0xd7/0xe0 net/socket.c:2048
 __sys_connect+0x114/0x140 net/socket.c:2065
 __do_sys_connect net/socket.c:2075 [inline]
 __se_sys_connect net/socket.c:2072 [inline]
 __x64_sys_connect+0x40/0x50 net/socket.c:2072
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x46/0x4e

value changed: 0x000000e4 -> 0x000001e3

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 6.8.0-12822-gcd51db110a7e #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: afd20b9290 ("af_unix: Replace the big lock with small locks.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240522154218.78088-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-27 11:46:56 +02:00
Kuniyuki Iwashima
97e1db06c7 af_unix: Annotate data-race around unix_sk(sk)->addr.
Once unix_sk(sk)->addr is assigned under net->unx.table.locks and
unix_sk(sk)->bindlock, *(unix_sk(sk)->addr) and unix_sk(sk)->path are
fully set up, and unix_sk(sk)->addr is never changed.

unix_getname() and unix_copy_addr() access the two fields locklessly,
and commit ae3b564179 ("missing barriers in some of unix_sock ->addr
and ->path accesses") added smp_store_release() and smp_load_acquire()
pairs.

In other functions, we still read unix_sk(sk)->addr locklessly to check
if the socket is bound, and KCSAN complains about it.  [0]

Given these functions have no dependency for *(unix_sk(sk)->addr) and
unix_sk(sk)->path, READ_ONCE() is enough to annotate the data-race.

Note that it is safe to access unix_sk(sk)->addr locklessly if the socket
is found in the hash table.  For example, the lockless read of otheru->addr
in unix_stream_connect() is safe.

Note also that newu->addr there is of the child socket that is still not
accessible from userspace, and smp_store_release() publishes the address
in case the socket is accept()ed and unix_getname() / unix_copy_addr()
is called.

[0]:
BUG: KCSAN: data-race in unix_bind / unix_listen

write (marked) to 0xffff88805f8d1840 of 8 bytes by task 13723 on cpu 0:
 __unix_set_addr_hash net/unix/af_unix.c:329 [inline]
 unix_bind_bsd net/unix/af_unix.c:1241 [inline]
 unix_bind+0x881/0x1000 net/unix/af_unix.c:1319
 __sys_bind+0x194/0x1e0 net/socket.c:1847
 __do_sys_bind net/socket.c:1858 [inline]
 __se_sys_bind net/socket.c:1856 [inline]
 __x64_sys_bind+0x40/0x50 net/socket.c:1856
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x46/0x4e

read to 0xffff88805f8d1840 of 8 bytes by task 13724 on cpu 1:
 unix_listen+0x72/0x180 net/unix/af_unix.c:734
 __sys_listen+0xdc/0x160 net/socket.c:1881
 __do_sys_listen net/socket.c:1890 [inline]
 __se_sys_listen net/socket.c:1888 [inline]
 __x64_sys_listen+0x2e/0x40 net/socket.c:1888
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x46/0x4e

value changed: 0x0000000000000000 -> 0xffff88807b5b1b40

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 13724 Comm: syz-executor.4 Not tainted 6.8.0-12822-gcd51db110a7e #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240522154002.77857-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-27 11:46:31 +02:00
Geliang Tang
21a22ed618 selftests: hsr: Fix "File exists" errors for hsr_ping
The hsr_ping test reports the following errors:

 INFO: preparing interfaces for HSRv0.
 INFO: Initial validation ping.
 INFO: Longer ping test.
 INFO: Cutting one link.
 INFO: Delay the link and drop a few packages.
 INFO: All good.
 INFO: preparing interfaces for HSRv1.
 RTNETLINK answers: File exists
 RTNETLINK answers: File exists
 RTNETLINK answers: File exists
 RTNETLINK answers: File exists
 RTNETLINK answers: File exists
 RTNETLINK answers: File exists
 Error: ipv4: Address already assigned.
 Error: ipv6: address already assigned.
 Error: ipv4: Address already assigned.
 Error: ipv6: address already assigned.
 Error: ipv4: Address already assigned.
 Error: ipv6: address already assigned.
 INFO: Initial validation ping.

That is because the cleanup code for the 2nd round test before
"setup_hsr_interfaces 1" is removed incorrectly in commit 680fda4f67
("test: hsr: Remove script code already implemented in lib.sh").

This patch fixes it by re-setup the namespaces using

	setup_ns ns1 ns2 ns3

command before "setup_hsr_interfaces 1". It deletes previous namespaces
and create new ones.

Fixes: 680fda4f67 ("test: hsr: Remove script code already implemented in lib.sh")
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/6485d3005f467758d49f0f313c8c009759ba6b05.1716374462.git.tanggeliang@kylinos.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-27 11:44:42 +02:00
hmtheboy154
3050052613 platform/x86: touchscreen_dmi: Add info for the EZpad 6s Pro
The "EZpad 6s Pro" uses the same touchscreen as the "EZpad 6 Pro B",
unlike the "Ezpad 6 Pro" which has its own touchscreen.

Signed-off-by: hmtheboy154 <buingoc67@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240527091447.248849-3-hdegoede@redhat.com
2024-05-27 11:43:03 +02:00
hmtheboy154
7c8639aa41 platform/x86: touchscreen_dmi: Add info for GlobalSpace SolT IVW 11.6" tablet
This is a tablet created by GlobalSpace Technologies Limited
which uses an Intel Atom x5-Z8300, 4GB of RAM & 64GB of storage.

Link: https://web.archive.org/web/20171102141952/http://globalspace.in/11.6-device.html
Signed-off-by: hmtheboy154 <buingoc67@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240527091447.248849-2-hdegoede@redhat.com
2024-05-27 11:43:03 +02:00
Hans de Goede
0b178b0267 platform/x86: touchscreen_dmi: Add support for setting touchscreen properties from cmdline
On x86/ACPI platforms touchscreens mostly just work without needing any
device/model specific configuration. But in some cases (mostly with Silead
and Goodix touchscreens) it is still necessary to manually specify various
touchscreen-properties on a per model basis.

touchscreen_dmi is a special place for DMI quirks for this, but it can be
challenging for users to figure out the right property values, especially
for Silead touchscreens where non of these can be read back from
the touchscreen-controller.

ATM users can only test touchscreen properties by editing touchscreen_dmi.c
and then building a completely new kernel which makes it unnecessary
difficult for users to test and submit properties when necessary for their
laptop / tablet model.

Add support for specifying properties on the kernel commandline to allow
users to easily figure out the right settings. See the added documentation
in kernel-parameters.txt for the commandline syntax.

Cc: Gregor Riepl <onitake@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240523143601.47555-1-hdegoede@redhat.com
2024-05-27 11:42:57 +02:00
Martin Tůma
825fc49497 media: mgb4: Fix double debugfs remove
Fixes an error where debugfs_remove_recursive() is called first on a parent
directory and then again on a child which causes a kernel panic.

Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 0ab13674a9 ("media: pci: mgb4: Added Digiteq Automotive MGB4 driver")
Cc: <stable@vger.kernel.org>
[hverkuil: added Fixes/Cc tags]
2024-05-27 11:33:56 +02:00
Steven Rostedt (Google)
5d059bf2b1 platform/x86: thinkpad_acpi: Select INPUT_SPARSEKMAP in Kconfig
Now that drivers/platform/x86/thinkpad_acpi.c uses
sparse_keymap_report_event(), it must select INPUT_SPARSEKMAP in its
Kconfig option otherwise the build fails with:

  ld: vmlinux.o: in function `tpacpi_input_send_key':
  thinkpad_acpi.c:(.text+0xd4d27f): undefined reference to `sparse_keymap_report_event'
  ld: vmlinux.o: in function `hotkey_init':
  thinkpad_acpi.c:(.init.text+0x66cb6): undefined reference to `sparse_keymap_setup'

Fixes: 42f7b965de ("platform/x86: thinkpad_acpi: Switch to using sparse-keymap helpers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240522074813.379b9fc2@gandalf.local.home
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-05-27 11:30:34 +02:00
Hans de Goede
4d6ef1be24 platform/x86: x86-android-tablets: Add "select LEDS_CLASS"
Since the x86-android-tablets now calls devm_led_classdev_register_ext()
it needs to select LEDS_CLASS as well as LEDS_CLASS' NEW_LEDS dependency.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405182256.FsKBjIzG-lkp@intel.com/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240521094741.273397-1-hdegoede@redhat.com
2024-05-27 11:30:15 +02:00
Harshit Mogalapalli
a4edf675ba platform/x86: ISST: fix use-after-free in tpmi_sst_dev_remove()
In tpmi_sst_dev_remove(), tpmi_sst is dereferenced after being freed.
Fix this by reordering the kfree() post the dereference.

Fixes: 9d1d36268f ("platform/x86: ISST: Support partitioned systems")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240517144946.289615-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-05-27 11:25:33 +02:00
Roded Zats
e8021b94b0 enic: Validate length of nl attributes in enic_set_vf_port
enic_set_vf_port assumes that the nl attribute IFLA_PORT_PROFILE
is of length PORT_PROFILE_MAX and that the nl attributes
IFLA_PORT_INSTANCE_UUID, IFLA_PORT_HOST_UUID are of length PORT_UUID_MAX.
These attributes are validated (in the function do_setlink in rtnetlink.c)
using the nla_policy ifla_port_policy. The policy defines IFLA_PORT_PROFILE
as NLA_STRING, IFLA_PORT_INSTANCE_UUID as NLA_BINARY and
IFLA_PORT_HOST_UUID as NLA_STRING. That means that the length validation
using the policy is for the max size of the attributes and not on exact
size so the length of these attributes might be less than the sizes that
enic_set_vf_port expects. This might cause an out of bands
read access in the memcpys of the data of these
attributes in enic_set_vf_port.

Fixes: f8bd909183 ("net: Add ndo_{set|get}_vf_port support for enic dynamic vnics")
Signed-off-by: Roded Zats <rzats@paloaltonetworks.com>
Link: https://lore.kernel.org/r/20240522073044.33519-1-rzats@paloaltonetworks.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-27 11:18:01 +02:00
Jean-Baptiste Maneyrol
95444b9eeb iio: invensense: fix odr switching to same value
ODR switching happens in 2 steps, update to store the new value and then
apply when the ODR change flag is received in the data. When switching to
the same ODR value, the ODR change flag is never happening, and frequency
switching is blocked waiting for the never coming apply.

Fix the issue by preventing update to happen when switching to same ODR
value.

Fixes: 0ecc363cce ("iio: make invensense timestamp module generic")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Link: https://lore.kernel.org/r/20240524124851.567485-1-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Dumitru Ceclan
f00dd89530 iio: adc: ad7173: Remove index from temp channel
Temperature channel is unique per device, index is not needed.
This is breaking userspace: Include fixes tag to be released within the
same rc cycle.

Fixes: 76a1e6a428 ("iio: adc: ad7173: add AD7173 driver")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com>
Link: https://lore.kernel.org/r/20240521-ad7173-fixes-v1-3-8161cc7f3ad1@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Dumitru Ceclan
3450ee7e80 iio: adc: ad7173: Add ad7173_device_info names
Add missing names from the device info struct for 3 models to ensure
consistency with the rest of the models.

Fixes: 76a1e6a428 ("iio: adc: ad7173: add AD7173 driver")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com>
Link: https://lore.kernel.org/r/20240521-ad7173-fixes-v1-2-8161cc7f3ad1@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Dumitru Ceclan
ab6f0ab178 iio: adc: ad7173: fix buffers enablement for ad7176-2
AD7176-2 does not feature input buffers and marks corespondent register
bits as read only. Enable buffers only on supported models.

Fixes: 76a1e6a428 ("iio: adc: ad7173: add AD7173 driver")
Reviewed-by: David Lechner <dlechner@baylibre.com>
Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com>
Link: https://lore.kernel.org/r/20240521-ad7173-fixes-v1-1-8161cc7f3ad1@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Harshit Mogalapalli
a23c14b062 iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
When devm_regmap_init_i2c() fails, regmap_ee could be error pointer,
instead of checking for IS_ERR(regmap_ee), regmap is checked which looks
like a copy paste error.

Fixes: a1d1ba5e1c ("iio: temperature: mlx90635 MLX90635 IR Temperature sensor")
Reviewed-by: Crt Mori<cmo@melexis.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20240513203427.3208696-1-harshit.m.mogalapalli@oracle.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Vasileios Amoiridis
bedb2ccb56 iio: imu: bmi323: Fix trigger notification in case of error
In case of error in the bmi323_trigger_handler() function, the
function exits without calling the iio_trigger_notify_done()
which is responsible for informing the attached trigger that
the process is done and in case there is a .reenable(), to
call it.

Fixes: 8a636db3aa ("iio: imu: Add driver for BMI323 IMU")
Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com>
Link: https://lore.kernel.org/r/20240508155407.139805-1-vassilisamir@gmail.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Marc Ferland
279428df88 iio: dac: ad5592r: fix temperature channel scaling value
The scale value for the temperature channel is (assuming Vref=2.5 and
the datasheet):

    376.7897513

When calculating both val and val2 for the temperature scale we
use (3767897513/25) and multiply it by Vref (here I assume 2500mV) to
obtain:

  2500 * (3767897513/25) ==> 376789751300

Finally we divide with remainder by 10^9 to get:

    val = 376
    val2 = 789751300

However, we return IIO_VAL_INT_PLUS_MICRO (should have been NANO) as
the scale type. So when converting the raw temperature value to the
'processed' temperature value we will get (assuming raw=810,
offset=-753):

    processed = (raw + offset) * scale_val
              = (810 + -753) * 376
	      = 21432

    processed += div((raw + offset) * scale_val2, 10^6)
              += div((810 + -753) * 789751300, 10^6)
	      += 45015
    ==> 66447
    ==> 66.4 Celcius

instead of the expected 21.5 Celsius.

Fix this issue by changing IIO_VAL_INT_PLUS_MICRO to
IIO_VAL_INT_PLUS_NANO.

Fixes: 56ca9db862 ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs")
Signed-off-by: Marc Ferland <marc.ferland@sonatest.com>
Link: https://lore.kernel.org/r/20240501150554.1871390-1-marc.ferland@sonatest.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Adam Rizkalla
0f0f630661 iio: pressure: bmp280: Fix BMP580 temperature reading
Fix overflow issue when storing BMP580 temperature reading and
properly preserve sign of 24-bit data.

Signed-off-by: Adam Rizkalla <ajarizzo@gmail.com>
Tested-By: Vasileios Amoiridis <vassilisamir@gmail.com>
Acked-by: Angel Iglesias <ang.iglesiasg@gmail.com>
Link: https://lore.kernel.org/r/Zin2udkXRD0+GrML@adam-asahi.lan
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
Angelo Dureghello
72d0a20fab dt-bindings: iio: dac: fix ad354xr output range
Fix output range, as per datasheet must be -2.5 to 7.5.

Signed-off-by: Angelo Dureghello <adureghello@baylibre.com>
Fixes: b0a96c5f59 ("dt-bindings: iio: dac: Add adi,ad3552r.yaml")
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240503185528.2043127-1-adureghello@baylibre.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:20 +01:00
David Lechner
8a01ef749b iio: adc: ad9467: fix scan type sign
According to the IIO documentation, the sign in the scan type should be
lower case. The ad9467 driver was incorrectly using upper case.

Fix by changing to lower case.

Fixes: 4606d0f4b0 ("iio: adc: ad9467: add support for AD9434 high-speed ADC")
Fixes: ad67971202 ("iio: adc: ad9467: add support AD9467 ADC")
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240503-ad9467-fix-scan-type-sign-v1-1-c7a1a066ebb9@baylibre.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-05-27 09:49:19 +01:00
Jason Nader
9e2f46cd87 ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
Commit b8b8b4e0c0 ("ata: ahci: Add Intel Alder Lake-P AHCI controller
to low power chipsets list") added Intel Alder Lake to the ahci_pci_tbl.

Because of the way that the Intel PCS quirk was implemented, having
an explicit entry in the ahci_pci_tbl caused the Intel PCS quirk to
be applied. (The quirk was not being applied if there was no explict
entry.)

Thus, entries that were added to the ahci_pci_tbl also got the Intel
PCS quirk applied.

The quirk was cleaned up in commit 7edbb60592 ("ahci: clean up
intel_pcs_quirk"), such that it is clear which entries that actually
applies the Intel PCS quirk.

Newer Intel AHCI controllers do not need the Intel PCS quirk,
and applying it when not needed actually breaks some platforms.

Do not apply the Intel PCS quirk for Intel Alder Lake.
This is in line with how things worked before commit b8b8b4e0c0 ("ata:
ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list"),
such that certain platforms using Intel Alder Lake will work once again.

Cc: stable@vger.kernel.org # 6.7
Fixes: b8b8b4e0c0 ("ata: ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list")
Signed-off-by: Jason Nader <dev@kayoway.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-05-27 10:07:37 +02:00
Luke D. Jones
2be46155d7 ALSA: hda/realtek: Adjust G814JZR to use SPI init for amp
The 2024 ASUS ROG G814J model is much the same as the 2023 model
and the 2023 16" version. We can use the same Cirrus Amp quirk.

Fixes: 811dd426a9 ("ALSA: hda/realtek: Add quirks for Asus ROG 2024 laptops using CS35L41")
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20240526091032.114545-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-27 09:29:19 +02:00
Takashi Iwai
495000a386 ALSA: core: Remove debugfs at disconnection
The card-specific debugfs entries are removed at the last stage of
card free phase, and it's performed after synchronization of the
closes of all opened fds.  This works fine for most cases, but it can
be potentially problematic for a hotplug device like USB-audio.  Due
to the nature of snd_card_free_when_closed(), the card free isn't
called immediately after the driver removal for a hotplug device, but
it's left until the last fd is closed.  It implies that the card
debugfs entries also remain.  Meanwhile, when a new device is inserted
before the last close and the very same card slot is assigned, the
driver tries to create the card debugfs root again on the very same
path.  This conflicts with the remaining entry, and results in the
kernel warning such as:
  debugfs: Directory 'card0' with parent 'sound' already present!
with the missing debugfs entry afterwards.

For avoiding such conflicts, remove debugfs entries at the device
disconnection phase instead.  The jack kctl debugfs entries get
removed in snd_jack_dev_disconnect() instead of each kctl
private_free.

Fixes: 2d670ea2bd ("ALSA: jack: implement software jack injection via debugfs")
Link: https://lore.kernel.org/r/20240524151256.32521-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-27 09:28:21 +02:00
Jeff Johnson
9ee267a293 fs: smb: common: add missing MODULE_DESCRIPTION() macros
Fix the 'make W=1' warnings:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/smb/common/cifs_arc4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/smb/common/cifs_md4.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-27 00:44:23 -05:00
Dave Airlie
3e049b6b8f Merge tag 'drm-misc-fixes-2024-05-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:

buddy:
- stop using PAGE_SIZE

shmem-helper:
- avoid kernel panic in mmap()

tests:
- buddy: fix PAGE_SIZE dependency

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523184745.GA11363@localhost.localdomain
2024-05-27 13:47:14 +10:00
Matthew Wilcox (Oracle)
b82b6eeefd bcachefs: Use copy_folio_from_iter_atomic()
copy_page_from_iter_atomic() will be removed at some point.
Also fixup a comment for folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-26 22:30:09 -04:00
Jim Wylder
611b7eb19d regmap-i2c: Subtract reg size from max_write
Currently, when an adapter defines a max_write_len quirk,
the data will be chunked into data sizes equal to the
max_write_len quirk value.  But the payload will be increased by
the size of the register address before transmission.  The
resulting value always ends up larger than the limit set
by the quirk.

Avoid this error by setting regmap's max_write to the quirk's
max_write_len minus the number of bytes for the register and
padding.  This allows the chunking to work correctly for this
limited case without impacting other use-cases.

Signed-off-by: Jim Wylder <jwylder@google.com>
Link: https://msgid.link/r/20240523211437.2839942-1-jwylder@google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-27 01:30:33 +01:00
Jeff Johnson
f94b77709e firewire: add missing MODULE_DESCRIPTION() to test modules
Fix the 'make W=1' warnings:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/firewire/uapi-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/firewire/packet-serdes-test.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240523-md-firewire-uapi-test-v1-1-6be5adcc3aed@quicinc.com
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-27 07:34:58 +09:00
Kent Overstreet
9242a34b76 bcachefs: Fix sb-downgrade validation
Superblock downgrade entries are only two byte aligned, but section
sizes are 8 byte aligned, which means we have to be careful about
overrun checks; an entry that crosses the end of the section is allowed
(and ignored) as long as it has zero errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-26 12:44:12 -04:00
Kent Overstreet
d509cadc3a bcachefs: Fix debug assert
Reported-by: syzbot+a8074a75b8d73328751e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-26 12:40:30 -04:00
Herbert Xu
67ec8cdf29 hwrng: core - Remove add_early_randomness
A potential deadlock was reported with the config file at

https://web.archive.org/web/20240522052129/https://0x0.st/XPN_.txt

In this particular configuration, the deadlock doesn't exist because
the warning triggered at a point before modules were even available.
However, the deadlock can be real because any module loaded would
invoke async_synchronize_full.

The issue is spurious for software crypto algorithms which aren't
themselves involved in async probing.  However, it would be hard to
avoid for a PCI crypto driver using async probing.

In this particular call trace, the problem is easily avoided because
the only reason the module is being requested during probing is the
add_early_randomness call in the hwrng core.  This feature is
vestigial since there is now a kernel thread dedicated to doing
exactly this.

So remove add_early_randomness as it is no longer needed.

Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 1b6d7f9eb1 ("tpm: add session encryption protection to tpm2_get_random()")
Link: https://lore.kernel.org/r/119dc5ed-f159-41be-9dda-1a056f29888d@notapiano/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-26 18:32:16 +08:00
Daniel Borkmann
95348e463e selftests/bpf: Add netkit test for pkt_type
Add a test case to assert that the skb->pkt_type which was set from the BPF
program is retained from the netkit xmit side to the peer's device at tcx
ingress location.

  # ./vmtest.sh -- ./test_progs -t netkit
  [...]
  ./test_progs -t netkit
  [    1.140780] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.141127] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.284601] tsc: Refined TSC clocksource calibration: 3408.006 MHz
  [    1.286672] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd9b189d, max_idle_ns: 440795225691 ns
  [    1.290384] clocksource: Switched to clocksource tsc
  #345     tc_netkit_basic:OK
  #346     tc_netkit_device:OK
  #347     tc_netkit_multi_links:OK
  #348     tc_netkit_multi_opts:OK
  #349     tc_netkit_neigh_links:OK
  #350     tc_netkit_pkt_type:OK
  Summary: 6/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240524163619.26001-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:53:11 -07:00
Daniel Borkmann
998ffeb273 selftests/bpf: Add netkit tests for mac address
This adds simple tests around setting MAC addresses in the different
netkit modes.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240524163619.26001-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:48:57 -07:00
Daniel Borkmann
3998d18426 netkit: Fix pkt_type override upon netkit pass verdict
When running Cilium connectivity test suite with netkit in L2 mode, we
found that compared to tcx a few tests were failing which pushed traffic
into an L7 proxy sitting in host namespace. The problem in particular is
around the invocation of eth_type_trans() in netkit.

In case of tcx, this is run before the tcx ingress is triggered inside
host namespace and thus if the BPF program uses the bpf_skb_change_type()
helper the newly set type is retained. However, in case of netkit, the
late eth_type_trans() invocation overrides the earlier decision from the
BPF program which eventually leads to the test failure.

Instead of eth_type_trans(), split out the relevant parts, meaning, reset
of mac header and call to eth_skb_pkt_type() before the BPF program is run
in order to have the same behavior as with tcx, and refactor a small helper
called eth_skb_pull_mac() which is run in case it's passed up the stack
where the mac header must be pulled. With this all connectivity tests pass.

Fixes: 35dfaad718 ("netkit, bpf: Add bpf programmable net device")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240524163619.26001-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:48:57 -07:00
Daniel Borkmann
d6fe532b74 netkit: Fix setting mac address in l2 mode
When running Cilium connectivity test suite with netkit in L2 mode, we
found that it is expected to be able to specify a custom MAC address for
the devices, in particular, cilium-cni obtains the specified MAC address
by querying the endpoint and sets the MAC address of the interface inside
the Pod. Thus, fix the missing support in netkit for L2 mode.

Fixes: 35dfaad718 ("netkit, bpf: Add bpf programmable net device")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20240524163619.26001-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:48:57 -07:00
Shahab Vahedi
dd6a403795 ARC, bpf: Fix issues reported by the static analyzers
Also updated couple of comments along the way.

One of the issues reported was indeed a bug in the code:

  memset(ctx, 0, sizeof(ctx))      // original line
  memset(ctx, 0, sizeof(*ctx))     // fixed line

That was a nice catch.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405222314.UG5F2NHn-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202405232036.Xqoc3b0J-lkp@intel.com/
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240525035628.1026-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:47:21 -07:00
Alexei Starovoitov
590016ad83 Merge branch 'fix-bpf-multi-uprobe-pid-filtering-logic'
Andrii Nakryiko says:

====================
Fix BPF multi-uprobe PID filtering logic

It turns out that current implementation of multi-uprobe PID filtering logic
is broken. It filters by thread, while the promise is filtering by process.
Patch #1 fixes the logic trivially. The rest is testing and mitigations that
are necessary for libbpf to not break users of USDT programs.

v1->v2:
  - fix selftest in last patch (CI);
  - use semicolon in patch #3 (Jiri).
====================

Link: https://lore.kernel.org/r/20240521163401.3005045-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:46:03 -07:00
Andrii Nakryiko
198034a87d selftests/bpf: extend multi-uprobe tests with USDTs
Validate libbpf's USDT-over-multi-uprobe logic by adding USDTs to
existing multi-uprobe tests. This checks correct libbpf fallback to
singular uprobes (when run on older kernels with buggy PID filtering).
We reuse already established child process and child thread testing
infrastructure, so additions are minimal. These test fail on either
older kernels or older version of libbpf that doesn't detect PID
filtering problems.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:46:02 -07:00
Andrii Nakryiko
70342420a1 selftests/bpf: extend multi-uprobe tests with child thread case
Extend existing multi-uprobe tests to test that PID filtering works
correctly. We already have child *process* tests, but we need also child
*thread* tests. This patch adds spawn_thread() helper to start child
thread, wait for it to be ready, and then instruct it to trigger desired
uprobes.

Additionally, we extend BPF-side code to track thread ID, not just
process ID. Also we detect whether extraneous triggerings with
unexpected process IDs happened, and validate that none of that happened
in practice.

These changes prove that fixed PID filtering logic for multi-uprobe
works as expected. These tests fail on old kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:46:02 -07:00
Andrii Nakryiko
04d939a2ab libbpf: detect broken PID filtering logic for multi-uprobe
Libbpf is automatically (and transparently to user) detecting
multi-uprobe support in the kernel, and, if supported, uses
multi-uprobes to improve USDT attachment speed.

USDTs can be attached system-wide or for the specific process by PID. In
the latter case, we rely on correct kernel logic of not triggering USDT
for unrelated processes.

As such, on older kernels that do support multi-uprobes, but still have
broken PID filtering logic, we need to fall back to singular uprobes.

Unfortunately, whether user is using PID filtering or not is known at
the attachment time, which happens after relevant BPF programs were
loaded into the kernel. Also unfortunately, we need to make a call
whether to use multi-uprobes or singular uprobe for SEC("usdt") programs
during BPF object load time, at which point we have no information about
possible PID filtering.

The distinction between single and multi-uprobes is small, but important
for the kernel. Multi-uprobes get BPF_TRACE_UPROBE_MULTI attach type,
and kernel internally substitiute different implementation of some of
BPF helpers (e.g., bpf_get_attach_cookie()) depending on whether uprobe
is multi or singular. So, multi-uprobes and singular uprobes cannot be
intermixed.

All the above implies that we have to make an early and conservative
call about the use of multi-uprobes. And so this patch modifies libbpf's
existing feature detector for multi-uprobe support to also check correct
PID filtering. If PID filtering is not yet fixed, we fall back to
singular uprobes for USDTs.

This extension to feature detection is simple thanks to kernel's -EINVAL
addition for pid < 0.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:46:02 -07:00
Andrii Nakryiko
4a8f635a60 bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic
get_pid_task() internally already calls rcu_read_lock() and
rcu_read_unlock(), so there is no point to do this one extra time.

This is a drive-by improvement and has no correctness implications.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:46:02 -07:00
Andrii Nakryiko
46ba0e49b6 bpf: fix multi-uprobe PID filtering logic
Current implementation of PID filtering logic for multi-uprobes in
uprobe_prog_run() is filtering down to exact *thread*, while the intent
for PID filtering it to filter by *process* instead. The check in
uprobe_prog_run() also differs from the analogous one in
uprobe_multi_link_filter() for some reason. The latter is correct,
checking task->mm, not the task itself.

Fix the check in uprobe_prog_run() to perform the same task->mm check.

While doing this, we also update get_pid_task() use to use PIDTYPE_TGID
type of lookup, given the intent is to get a representative task of an
entire process. This doesn't change behavior, but seems more logical. It
would hold task group leader task now, not any random thread task.

Last but not least, given multi-uprobe support is half-broken due to
this PID filtering logic (depending on whether PID filtering is
important or not), we need to make it easy for user space consumers
(including libbpf) to easily detect whether PID filtering logic was
already fixed.

We do it here by adding an early check on passed pid parameter. If it's
negative (and so has no chance of being a valid PID), we return -EINVAL.
Previous behavior would eventually return -ESRCH ("No process found"),
given there can't be any process with negative PID. This subtle change
won't make any practical change in behavior, but will allow applications
to detect PID filtering fixes easily. Libbpf fixes take advantage of
this in the next patch.

Cc: stable@vger.kernel.org
Acked-by: Jiri Olsa <jolsa@kernel.org>
Fixes: b733eeade4 ("bpf: Add pid filter support for uprobe_multi link")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:46:02 -07:00
Marc Dionne
29be9100ac afs: Don't cross .backup mountpoint from backup volume
Don't cross a mountpoint that explicitly specifies a backup volume
(target is <vol>.backup) when starting from a backup volume.

It it not uncommon to mount a volume's backup directly in the volume
itself.  This can cause tools that are not paying attention to get
into a loop mounting the volume onto itself as they attempt to
traverse the tree, leading to a variety of problems.

This doesn't prevent the general case of loops in a sequence of
mountpoints, but addresses a common special case in the same way
as other afs clients.

Reported-by: Jan Henrik Sylvester <jan.henrik.sylvester@uni-hamburg.de>
Link: http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.html
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/768760.1716567475@warthog.procyon.org.uk
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-25 14:02:40 +02:00
Scott Mayhew
0c8c7c5597 nfs: don't invalidate dentries on transient errors
This is a slight variation on a patch previously proposed by Neil Brown
that never got merged.

Prior to commit 5ceb9d7fda ("NFS: Refactor nfs_lookup_revalidate()"),
any error from nfs_lookup_verify_inode() other than -ESTALE would result
in nfs_lookup_revalidate() returning that error (-ESTALE is mapped to
zero).

Since that commit, all errors result in nfs_lookup_revalidate()
returning zero, resulting in dentries being invalidated where they
previously were not (particularly in the case of -ERESTARTSYS).

Fix it by passing the actual error code to nfs_lookup_revalidate_done(),
and leaving the decision on whether to  map the error code to zero or
one to nfs_lookup_revalidate_done().

A simple reproducer is to run the following python code in a
subdirectory of an NFS mount (not in the root of the NFS mount):

---8<---
import os
import multiprocessing
import time

if __name__=="__main__":
    multiprocessing.set_start_method("spawn")

    count = 0
    while True:
        try:
            os.getcwd()
            pool = multiprocessing.Pool(10)
            pool.close()
            pool.terminate()
            count += 1
        except Exception as e:
            print(f"Failed after {count} iterations")
            print(e)
            break
---8<---

Prior to commit 5ceb9d7fda, the above code would run indefinitely.
After commit 5ceb9d7fda, it fails almost immediately with -ENOENT.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-24 12:31:11 -04:00
Jan Kara
a527c3ba41 nfs: Avoid flushing many pages with NFS_FILE_SYNC
When we are doing WB_SYNC_ALL writeback, nfs submits write requests with
NFS_FILE_SYNC flag to the server (which then generally treats it as an
O_SYNC write). This helps to reduce latency for single requests but when
submitting more requests, additional fsyncs on the server side hurt
latency. NFS generally avoids this additional overhead by not setting
NFS_FILE_SYNC if desc->pg_moreio is set.

However this logic doesn't always work. When we do random 4k writes to a huge
file and then call fsync(2), each page writeback is going to be sent with
NFS_FILE_SYNC because after preparing one page for writeback, we start writing
back next, nfs_do_writepage() will call nfs_pageio_cond_complete() which finds
the page is not contiguous with previously prepared IO and submits is *without*
setting desc->pg_moreio.  Hence NFS_FILE_SYNC is used resulting in poor
performance.

Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete() before
submitting outstanding IO. This improves throughput of
fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-24 12:26:14 -04:00
Kundan Kumar
1bd293fcf3 nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset
bio_vec start offset may be relatively large particularly when large
folio gets added to the bio. A bigger offset will result in avoiding the
single-segment mapping optimization and end up using expensive
mempool_alloc further.

Rather than using absolute value, adjust bv_offset by
NVME_CTRL_PAGE_SIZE while checking if segment can be fitted into one/two
PRP entries.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-24 08:59:16 -07:00
Kanchan Joshi
64e3d02b43 nvme: remove sgs and sws
sgs/sws are unused, so remove these from nvme_ns_head structure.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-24 08:57:40 -07:00
Friedrich Vock
44382b3ed6 bpf: Fix potential integer overflow in resolve_btfids
err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit
at least on 64-bit platforms. If symbols_patch is called on a binary between
2-4GB in size, the result will be negative when cast to a 32-bit integer,
which the code assumes means an error occurred. This can wrongly trigger
build failures when building very large kernel images.

Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240514070931.199694-1-friedrich.vock@gmx.de
2024-05-24 17:12:12 +02:00
Tetsuo Handa
b794918961 dma-buf/sw-sync: don't enable IRQ from sync_print_obj()
Since commit a6aa8fca4d ("dma-buf/sw-sync: Reduce irqsave/irqrestore from
known context") by error replaced spin_unlock_irqrestore() with
spin_unlock_irq() for both sync_debugfs_show() and sync_print_obj() despite
sync_print_obj() is called from sync_debugfs_show(), lockdep complains
inconsistent lock state warning.

Use plain spin_{lock,unlock}() for sync_print_obj(), for
sync_debugfs_show() is already using spin_{lock,unlock}_irq().

Reported-by: syzbot <syzbot+a225ee3df7e7f9372dbe@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=a225ee3df7e7f9372dbe
Fixes: a6aa8fca4d ("dma-buf/sw-sync: Reduce irqsave/irqrestore from known context")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c2e46020-aaa6-4e06-bf73-f05823f913f0@I-love.SAKURA.ne.jp
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-05-24 16:30:05 +02:00
David S. Miller
0b4f5add9f Merge branch 'mlx5-fixes'
Tariq Toukan says:

====================
mlx5 fixes 24-05-22

This patchset provides bug fixes to mlx5 core and Eth drivers.

Series generated against:
commit 9c91c7fadb ("net: mana: Fix the extra HZ in mana_hwc_send_request")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Gal Pressman
83fea49f27 net/mlx5e: Fix UDP GSO for encapsulated packets
When the skb is encapsulated, adjust the inner UDP header instead of the
outer one, and account for UDP header (instead of TCP) in the inline
header size calculation.

Fixes: 689adf0d48 ("net/mlx5e: Add UDP GSO support")
Reported-by: Jason Baron <jbaron@akamai.com>
Closes: https://lore.kernel.org/netdev/c42961cb-50b9-4a9a-bd43-87fe48d88d29@akamai.com/
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Carolina Jubran
5c74195d5d net/mlx5e: Use rx_missed_errors instead of rx_dropped for reporting buffer exhaustion
Previously, the driver incorrectly used rx_dropped to report device
buffer exhaustion.

According to the documentation, rx_dropped should not be used to count
packets dropped due to buffer exhaustion, which is the purpose of
rx_missed_errors.

Use rx_missed_errors as intended for counting packets dropped due to
buffer exhaustion.

Fixes: 269e6b3af3 ("net/mlx5e: Report additional error statistics in get stats ndo")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu
f55cd31287 net/mlx5e: Do not use ptp structure for tx ts stats when not initialized
The ptp channel instance is only initialized when ptp traffic is first
processed by the driver. This means that there is a window in between when
port timestamping is enabled and ptp traffic is sent where the ptp channel
instance is not initialized. Accessing statistics during this window will
lead to an access violation (NULL + member offset). Check the validity of
the instance before attempting to query statistics.

  BUG: unable to handle page fault for address: 0000000000003524
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 109dfc067 P4D 109dfc067 PUD 1064ef067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 420 Comm: ethtool Not tainted 6.9.0-rc2-rrameshbabu+ #245
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/204
  RIP: 0010:mlx5e_stats_ts_get+0x4c/0x130
  <snip>
  Call Trace:
   <TASK>
   ? show_regs+0x60/0x70
   ? __die+0x24/0x70
   ? page_fault_oops+0x15f/0x430
   ? do_user_addr_fault+0x2c9/0x5c0
   ? exc_page_fault+0x63/0x110
   ? asm_exc_page_fault+0x27/0x30
   ? mlx5e_stats_ts_get+0x4c/0x130
   ? mlx5e_stats_ts_get+0x20/0x130
   mlx5e_get_ts_stats+0x15/0x20
  <snip>

Fixes: 3579032c08 ("net/mlx5e: Implement ethtool hardware timestamping statistics")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu
9a52f6d44f net/mlx5e: Fix IPsec tunnel mode offload feature check
Remove faulty check disabling checksum offload and GSO for offload of
simple IPsec tunnel L4 traffic. Comment previously describing the deleted
code incorrectly claimed the check prevented double tunnel (or three layers
of ip headers).

Fixes: f1267798c9 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu
16d66a4fa8 net/mlx5: Use mlx5_ipsec_rx_status_destroy to correctly delete status rules
rx_create no longer allocates a modify_hdr instance that needs to be
cleaned up. The mlx5_modify_header_dealloc call will lead to a NULL pointer
dereference. A leak in the rules also previously occurred since there are
now two rules populated related to status.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 109907067 P4D 109907067 PUD 116890067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 484 Comm: ip Not tainted 6.9.0-rc2-rrameshbabu+ #254
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014
  RIP: 0010:mlx5_modify_header_dealloc+0xd/0x70
  <snip>
  Call Trace:
   <TASK>
   ? show_regs+0x60/0x70
   ? __die+0x24/0x70
   ? page_fault_oops+0x15f/0x430
   ? free_to_partial_list.constprop.0+0x79/0x150
   ? do_user_addr_fault+0x2c9/0x5c0
   ? exc_page_fault+0x63/0x110
   ? asm_exc_page_fault+0x27/0x30
   ? mlx5_modify_header_dealloc+0xd/0x70
   rx_create+0x374/0x590
   rx_add_rule+0x3ad/0x500
   ? rx_add_rule+0x3ad/0x500
   ? mlx5_cmd_exec+0x2c/0x40
   ? mlx5_create_ipsec_obj+0xd6/0x200
   mlx5e_accel_ipsec_fs_add_rule+0x31/0xf0
   mlx5e_xfrm_add_state+0x426/0xc00
  <snip>

Fixes: 94af50c0a9 ("net/mlx5e: Unify esw and normal IPsec status table creation/destruction")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Gal Pressman
1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c7 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Tariq Toukan
fca3b47918 net/mlx5: Do not query MPIR on embedded CPU function
A proper query to MPIR needs to set the correct value in the depth field.
On embedded CPU this value is not necessarily zero. As there is no real
use case for multi-PF netdev on the embedded CPU of the smart NIC, block
this option.

This fixes the following failure:
ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5)

Fixes: 678eb44805 ("net/mlx5: SD, Implement basic query and instantiation")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Maher Sanalla
51ef9305b8 net/mlx5: Lag, do bond only if slaves agree on roce state
Currently, the driver does not enforce that lag bond slaves must have
matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts
to enable roce on all vports of the bond slaves, causing the following
syndrome when one slave has no roce fw support:

mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0×755) op_mod(0×0)
failed, status bad parameter(0×3), syndrome (0xc1f678), err(-22)

Thus, create HW lag only if bond's slaves agree on roce state,
either all slaves have roce support resulting in a roce lag bond,
or none do, resulting in a raw eth bond.

Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Christian Brauner
712182b67e swap: yield device immediately
Otherwise we can cause spurious EBUSY issues when trying to mount the
rootfs later on.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845
Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:08 +02:00
David Howells
c596bea145 netfs: Fix setting of BDP_ASYNC from iocb flags
Fix netfs_perform_write() to set BDP_ASYNC if IOCB_NOWAIT is set rather
than if IOCB_SYNC is not set.  It reflects asynchronicity in the sense of
not waiting rather than synchronicity in the sense of not returning until
the op is complete.

Without this, generic/590 fails on cifs in strict caching mode with a
complaint that one of the writes fails with EAGAIN.  The test can be
distilled down to:

        mount -t cifs /my/share /mnt -ostuff
        xfs_io -i -c 'falloc 0 8191M -c fsync -f /mnt/file
        xfs_io -i -c 'pwrite -b 1M -W 0 8191M' /mnt/file

Fixes: c38f4e96e6 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/316306.1716306586@warthog.procyon.org.uk
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Jeff Layton <jlayton@kernel.org>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Fedor Pchelkin
65bea99537 signalfd: drop an obsolete comment
Commit fbe38120eb ("signalfd: convert to ->read_iter()") removed the
call to anon_inode_getfd() by splitting fd setup into two parts. Drop the
comment referencing the internal details of that function.

Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://lore.kernel.org/r/20240520090819.76342-2-pchelkin@ispras.ru
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Fedor Pchelkin
f826bc9d6f signalfd: fix error return code
If anon_inode_getfile() fails, return appropriate error code. This looks
like a single typo: the similar code changes in timerfd and userfaultfd
are okay.

Found by Linux Verification Center (linuxtesting.org).

Fixes: fbe38120eb ("signalfd: convert to ->read_iter()")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://lore.kernel.org/r/20240520090819.76342-1-pchelkin@ispras.ru
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Xu Yang
4e527d5841 iomap: fault in smaller chunks for non-large folio mappings
Since commit (5d8edfb900 "iomap: Copy larger chunks from userspace"),
iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
mapping doesn't support large folio, only one page of maximum 4KB will
be created and 4KB data will be writen to pagecache each time. Then,
next 4KB will be handled in next iteration. This will cause potential
write performance problem.

If chunk is 2MB, total 512 pages need to be handled finally. During this
period, fault_in_iov_iter_readable() is called to check iov_iter readable
validity. Since only 4KB will be handled each time, below address space
will be checked over and over again:

start         	end
-
buf,    	buf+2MB
buf+4KB, 	buf+2MB
buf+8KB, 	buf+2MB
...
buf+2044KB 	buf+2MB

Obviously the checking size is wrong since only 4KB will be handled each
time. So this will get a correct chunk to let iomap work well in non-large
folio case.

With this change, the write speed will be stable. Tested on ARM64 device.

Before:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

After:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-2-xu.yang_2@nxp.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Xu Yang
79c1374548 filemap: add helper mapping_max_folio_size()
Add mapping_max_folio_size() to get the maximum folio size for this
pagecache mapping.

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-1-xu.yang_2@nxp.com
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
David Howells
2c6b531020 netfs: Fix AIO error handling when doing write-through
If an error occurs whilst we're doing an AIO write in write-through mode,
we may end up calling ->ki_complete() *and* returning an error from
->write_iter().  This can result in either a UAF (the ->ki_complete() func
pointer may get overwritten, for example) or a refcount underflow in
io_submit() as ->ki_complete is called twice.

Fix this by making netfs_end_writethrough() - and thus
netfs_perform_write() - unconditionally return -EIOCBQUEUED if we're doing
an AIO write and wait for completion if we're not.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/295052.1716298587@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
David Howells
9b038d004c netfs: Fix io_uring based write-through
This can be triggered by mounting a cifs filesystem with a cache=strict
mount option and then, using the fsx program from xfstests, doing:

        ltp/fsx -A -d -N 1000 -S 11463 -P /tmp /cifs-mount/foo \
          --replay-ops=gen112-fsxops

Where gen112-fsxops holds:

        fallocate 0x6be7 0x8fc5 0x377d3
        copy_range 0x9c71 0x77e8 0x2edaf 0x377d3
        write 0x2776d 0x8f65 0x377d3

The problem is that netfs_io_request::len is being used for two purposes
and ends up getting set to the amount of data we transferred, not the
amount of data the caller asked to be transferred (for various reasons,
such as mmap'd writes, we might end up rounding out the data written to the
server to include the entire folio at each end).

Fix this by keeping the amount we were asked to write in ->len and using
->submitted to track what we issued ops for.  Then, when we come to calling
->ki_complete(), ->len is the right size.

This also required netfs_cleanup_dio_write() to change since we're no
longer advancing wreq->len.  Use wreq->transferred instead as we might have
done a short read.

With this, the generic/112 xfstest passes if cifs is forced to put all
non-DIO opens into write-through mode.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/295086.1716298663@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <stfrench@microsoft.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
Mathieu Othacehe
128d54fbcb net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8061
Following a similar reinstate for the KSZ8081 and KSZ9031.

Older kernels would use the genphy_soft_reset if the PHY did not implement
a .soft_reset.

The KSZ8061 errata described here:
https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8061-Errata-DS80000688B.pdf
and worked around with 232ba3a51c ("net: phy: Micrel KSZ8061: link failure after cable connect")
is back again without this soft reset.

Fixes: 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
Tested-by: Karim Ben Houcine <karim.benhoucine@landisgyr.com>
Signed-off-by: Mathieu Othacehe <othacehe@gnu.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 12:30:38 +01:00
Matt Jan
06e785aeb9 connector: Fix invalid conversion in cn_proc.h
The implicit conversion from unsigned int to enum
proc_cn_event is invalid, so explicitly cast it
for compilation in a C++ compiler.
/usr/include/linux/cn_proc.h: In function 'proc_cn_event valid_event(proc_cn_event)':
/usr/include/linux/cn_proc.h:72:17: error: invalid conversion from 'unsigned int' to 'proc_cn_event' [-fpermissive]
   72 |         ev_type &= PROC_EVENT_ALL;
      |                 ^
      |                 |
      |                 unsigned int

Signed-off-by: Matt Jan <zoo868e@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 10:36:55 +01:00
Martin K. Petersen
d09c05aa35 scsi: core: Handle devices which return an unusually large VPD page count
Peter Schneider reported that a system would no longer boot after
updating to 6.8.4.  Peter bisected the issue and identified commit
b5fc07a5fb ("scsi: core: Consult supported VPD page list prior to
fetching page") as being the culprit.

Turns out the enclosure device in Peter's system reports a byteswapped
page length for VPD page 0. It reports "02 00" as page length instead
of "00 02". This causes us to attempt to access 516 bytes (page length
+ header) of information despite only 2 pages being present.

Limit the page search scope to the size of our VPD buffer to guard
against devices returning a larger page count than requested.

Link: https://lore.kernel.org/r/20240521023040.2703884-1-martin.petersen@oracle.com
Fixes: b5fc07a5fb ("scsi: core: Consult supported VPD page list prior to fetching page")
Cc: stable@vger.kernel.org
Reported-by: Peter Schneider <pschneider1968@googlemail.com>
Closes: https://lore.kernel.org/all/eec6ebbf-061b-4a7b-96dc-ea748aa4d035@googlemail.com/
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-23 20:35:32 -04:00
Andrii Nakryiko
699646734a uprobes: prevent mutex_lock() under rcu_read_lock()
Recent changes made uprobe_cpu_buffer preparation lazy, and moved it
deeper into __uprobe_trace_func(). This is problematic because
__uprobe_trace_func() is called inside rcu_read_lock()/rcu_read_unlock()
block, which then calls prepare_uprobe_buffer() -> uprobe_buffer_get() ->
mutex_lock(&ucb->mutex), leading to a splat about using mutex under
non-sleepable RCU:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 98231, name: stress-ng-sigq
   preempt_count: 0, expected: 0
   RCU nest depth: 1, expected: 0
   ...
   Call Trace:
    <TASK>
    dump_stack_lvl+0x3d/0xe0
    __might_resched+0x24c/0x270
    ? prepare_uprobe_buffer+0xd5/0x1d0
    __mutex_lock+0x41/0x820
    ? ___perf_sw_event+0x206/0x290
    ? __perf_event_task_sched_in+0x54/0x660
    ? __perf_event_task_sched_in+0x54/0x660
    prepare_uprobe_buffer+0xd5/0x1d0
    __uprobe_trace_func+0x4a/0x140
    uprobe_dispatcher+0x135/0x280
    ? uprobe_dispatcher+0x94/0x280
    uprobe_notify_resume+0x650/0xec0
    ? atomic_notifier_call_chain+0x21/0x110
    ? atomic_notifier_call_chain+0xf8/0x110
    irqentry_exit_to_user_mode+0xe2/0x1e0
    asm_exc_int3+0x35/0x40
   RIP: 0033:0x7f7e1d4da390
   Code: 33 04 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b9 01 00 00 00 e9 b2 fc ff ff 66 90 f3 0f 1e fa 31 c9 e9 a5 fc ff ff 0f 1f 44 00 00 <cc> 0f 1e fa b8 27 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 6e
   RSP: 002b:00007ffd2abc3608 EFLAGS: 00000246
   RAX: 0000000000000000 RBX: 0000000076d325f1 RCX: 0000000000000000
   RDX: 0000000076d325f1 RSI: 000000000000000a RDI: 00007ffd2abc3690
   RBP: 000000000000000a R08: 00017fb700000000 R09: 00017fb700000000
   R10: 00017fb700000000 R11: 0000000000000246 R12: 0000000000017ff2
   R13: 00007ffd2abc3610 R14: 0000000000000000 R15: 00007ffd2abc3780
    </TASK>

Luckily, it's easy to fix by moving prepare_uprobe_buffer() to be called
slightly earlier: into uprobe_trace_func() and uretprobe_trace_func(), outside
of RCU locked section. This still keeps this buffer preparation lazy and helps
avoid the overhead when it's not needed. E.g., if there is only BPF uprobe
handler installed on a given uprobe, buffer won't be initialized.

Note, the other user of prepare_uprobe_buffer(), __uprobe_perf_func(), is not
affected, as it doesn't prepare buffer under RCU read lock.

Link: https://lore.kernel.org/all/20240521053017.3708530-1-andrii@kernel.org/

Fixes: 1b8f85defb ("uprobes: prepare uprobe args buffer lazily")
Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-24 07:44:44 +09:00
Mario Limonciello
8195979d2d drm/amd/display: Enable colorspace property for MST connectors
MST colorspace property support was disabled due to a series of warnings
that came up when the device was plugged in since the properties weren't
made at device creation. Create the properties in advance instead.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: 69a9596102 ("drm/amd/display: Temporary Disable MST DP Colorspace Property").
Reported-and-tested-by: Tyler Schneider <tyler.schneider@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3353
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 18:14:14 -04:00
Hawking Zhang
ec58991054 drm/amdgpu: correct hbm field in boot status
hbm filed takes bit 13 and bit 14 in boot status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 18:13:23 -04:00
Sagi Grimberg
f97914e35f nvmet: fix ns enable/disable possible hang
When disabling an nvmet namespace, there is a period where the
subsys->lock is released, as the ns disable waits for backend IO to
complete, and the ns percpu ref to be properly killed. The original
intent was to avoid taking the subsystem lock for a prolong period as
other processes may need to acquire it (for example new incoming
connections).

However, it opens up a window where another process may come in and
enable the ns, (re)intiailizing the ns percpu_ref, causing the disable
sequence to hang.

Solve this by taking the global nvmet_config_sem over the entire configfs
enable/disable sequence.

Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-23 13:44:42 -07:00
Keith Busch
a2e4c5f5f6 nvme-multipath: fix io accounting on failover
There are io stats accounting that needs to be handled, so don't call
blk_mq_end_request() directly. Use the existing nvme_end_req() helper
that already handles everything.

Fixes: d4d957b53d ("nvme-multipath: support io stats on the mpath device")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-23 13:44:42 -07:00
Keith Busch
2fe7b42246 nvme: fix multipath batched completion accounting
Batched completions were missing the io stats accounting and bio trace
events. Move the common code to a helper and call it from the batched
and non-batched functions.

Fixes: d4d957b53d ("nvme-multipath: support io stats on the mpath device")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-23 13:44:35 -07:00
Sean Christopherson
b4bd556467 KVM: SVM: WARN on vNMI + NMI window iff NMIs are outright masked
When requesting an NMI window, WARN on vNMI support being enabled if and
only if NMIs are actually masked, i.e. if the vCPU is already handling an
NMI.  KVM's ABI for NMIs that arrive simultanesouly (from KVM's point of
view) is to inject one NMI and pend the other.  When using vNMI, KVM pends
the second NMI simply by setting V_NMI_PENDING, and lets the CPU do the
rest (hardware automatically sets V_NMI_BLOCKING when an NMI is injected).

However, if KVM can't immediately inject an NMI, e.g. because the vCPU is
in an STI shadow or is running with GIF=0, then KVM will request an NMI
window and trigger the WARN (but still function correctly).

Whether or not the GIF=0 case makes sense is debatable, as the intent of
KVM's behavior is to provide functionality that is as close to real
hardware as possible.  E.g. if two NMIs are sent in quick succession, the
probability of both NMIs arriving in an STI shadow is infinitesimally low
on real hardware, but significantly larger in a virtual environment, e.g.
if the vCPU is preempted in the STI shadow.  For GIF=0, the argument isn't
as clear cut, because the window where two NMIs can collide is much larger
in bare metal (though still small).

That said, KVM should not have divergent behavior for the GIF=0 case based
on whether or not vNMI support is enabled.  And KVM has allowed
simultaneous NMIs with GIF=0 for over a decade, since commit 7460fb4a34
("KVM: Fix simultaneous NMIs").  I.e. KVM's GIF=0 handling shouldn't be
modified without a *really* good reason to do so, and if KVM's behavior
were to be modified, it should be done irrespective of vNMI support.

Fixes: fa4c027a79 ("KVM: x86: Add support for SVM's Virtual NMI")
Cc: stable@vger.kernel.org
Cc: Santosh Shukla <Santosh.Shukla@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240522021435.1684366-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:34:44 -04:00
Sean Christopherson
76d5363c20 KVM: x86: Force KVM_WERROR if the global WERROR is enabled
Force KVM_WERROR if the global WERROR is enabled to avoid pestering the
user about a Kconfig that will ultimately be ignored.  Force KVM_WERROR
instead of making it mutually exclusive with WERROR to avoid generating a
.config builds KVM with -Werror, but has KVM_WERROR=n.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240517180341.974251-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:33:31 -04:00
Sean Christopherson
6af6142e3a KVM: x86: Disable KVM_INTEL_PROVE_VE by default
Disable KVM's "prove #VE" support by default, as it provides no functional
value, and even its sanity checking benefits are relatively limited.  I.e.
it should be fully opt-in even on debug kernels, especially since EPT
Violation #VE suppression appears to be buggy on some CPUs.

Opportunistically add a line in the help text to make it abundantly clear
that KVM_INTEL_PROVE_VE should never be enabled in a production
environment.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:33:15 -04:00
Sean Christopherson
a5dc0c9b55 KVM: VMX: Enumerate EPT Violation #VE support in /proc/cpuinfo
Don't suppress printing EPT_VIOLATION_VE in /proc/cpuinfo, knowing whether
or not KVM_INTEL_PROVE_VE actually does anything is extremely valuable.
A privileged user can get at the information by reading the raw MSR, but
the whole point of the VMX flags is to avoid needing to glean information
from raw MSR reads.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:28:49 -04:00
Sean Christopherson
bca99c0356 KVM: x86/mmu: Print SPTEs on unexpected #VE
Print the SPTEs that correspond to the faulting GPA on an unexpected EPT
Violation #VE to help the user debug failures, e.g. to pinpoint which SPTE
didn't have SUPPRESS_VE set.

Opportunistically assert that the underlying exit reason was indeed an EPT
Violation, as the CPU has *really* gone off the rails if a #VE occurs due
to a completely unexpected exit reason.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:28:45 -04:00
Sean Christopherson
743f177336 KVM: VMX: Dump VMCS on unexpected #VE
Dump the VMCS on an unexpected #VE, otherwise it's practically impossible
to figure out why the #VE occurred.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:28:04 -04:00
Sean Christopherson
837d557aba KVM: x86/mmu: Add sanity checks that KVM doesn't create EPT #VE SPTEs
Assert that KVM doesn't set a SPTE to a value that could trigger an EPT
Violation #VE on a non-MMIO SPTE, e.g. to help detect bugs even without
KVM_INTEL_PROVE_VE enabled, and to help debug actual #VE failures.

Note, this will run afoul of TDX support, which needs to reflect emulated
MMIO accesses into the guest as #VEs (which was the whole point of adding
EPT Violation #VE support in KVM).  The obvious fix for that is to exempt
MMIO SPTEs, but that's annoyingly difficult now that is_mmio_spte() relies
on a per-VM value.  However, resolving that conundrum is a future problem,
whereas getting KVM_INTEL_PROVE_VE healthy is a current problem.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:27:26 -04:00
Sean Christopherson
9031b42139 KVM: nVMX: Always handle #VEs in L0 (never forward #VEs from L2 to L1)
Always handle #VEs, e.g. due to prove EPT Violation #VE failures, in L0,
as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
are KVM's responsibility.

Fixes: 8131cf5b4f ("KVM: VMX: Introduce test mode related to EPT violation VE")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:27:26 -04:00
Sean Christopherson
d1b32ecdc8 KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support
Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
enabled and a VE info address pointing at pfn 0.

Fixes: 8131cf5b4f ("KVM: VMX: Introduce test mode related to EPT violation VE")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:27:25 -04:00
Sean Christopherson
40e8a6901a KVM: VMX: Don't kill the VM on an unexpected #VE
Don't terminate the VM on an unexpected #VE, as it's extremely unlikely
the #VE is fatal to the guest, and even less likely that it presents a
danger to the host.  Simply resume the guest on "failure", as the #VE info
page's BUSY field will prevent converting any more EPT Violations to #VEs
for the vCPU (at least, that's what the BUSY field is supposed to do).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240518000430.1118488-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:27:17 -04:00
Isaku Yamahata
803482f472 KVM: x86/mmu: Use SHADOW_NONPRESENT_VALUE for atomic zap in TDP MMU
Use SHADOW_NONPRESENT_VALUE when zapping TDP MMU SPTEs with mmu_lock held
for read, tdp_mmu_zap_spte_atomic() was simply missed during the initial
development.

Fixes: 7f01cab849 ("KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[sean: write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240518000430.1118488-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23 12:24:39 -04:00
Daniel Vetter
36f53d622a Merge tag 'drm-misc-fixes-2024-05-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Short summary of fixes pull:

nouveau:
- use tile_mode and pte_kind for VM_BIND bo allocations

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516072658.GA8395@linux.fritz.box
2024-05-23 17:07:13 +02:00
Mark Brown
3aac9f4885 soi: Don't call DMA sync API when not needed
Merge series from Andy Shevchenko <andriy.shevchenko@linux.intel.com>:

A couple of fixes to avoid calling DMA sync API when it's not needed.
This doesn't stop from discussing if IOMMU code is doing the right thing,
i.e. dereferences SG list when orig_nents == 0, but this is a separate
story.
2024-05-23 15:16:57 +01:00
Fedor Pchelkin
e64746e74f dma-mapping: benchmark: handle NUMA_NO_NODE correctly
cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark()
resulting in the following sanitizer report:

UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask [64][1]'
CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
ubsan_epilogue (lib/ubsan.c:232)
__ubsan_handle_out_of_bounds (lib/ubsan.c:429)
cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline]
do_map_benchmark (kernel/dma/map_benchmark.c:104)
map_benchmark_ioctl (kernel/dma/map_benchmark.c:246)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Use cpumask_of_node() in place when binding a kernel thread to a cpuset
of a particular node.

Note that the provided node id is checked inside map_benchmark_ioctl().
It's just a NUMA_NO_NODE case which is not handled properly later.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789daa80 ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23 15:06:48 +02:00
Fedor Pchelkin
1ff05e723f dma-mapping: benchmark: fix node id validation
While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789daa80 ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23 15:06:47 +02:00
Fedor Pchelkin
f7c9ccaadf dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails
If do_map_benchmark() has failed, there is nothing useful to copy back
to userspace.

Suggested-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23 15:06:47 +02:00
Fedor Pchelkin
bb9025f443 dma-mapping: benchmark: fix up kthread-related error handling
kthread creation failure is invalidly handled inside do_map_benchmark().
The put_task_struct() calls on the error path are supposed to balance the
get_task_struct() calls which only happen after all the kthreads are
successfully created. Rollback using kthread_stop() for already created
kthreads in case of such failure.

In normal situation call kthread_stop_put() to gracefully stop kthreads
and put their task refcounts. This should be done for all started
kthreads.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789daa80 ("dma-mapping: add benchmark support for streaming DMA APIs")
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23 15:06:47 +02:00
Baochen Qiang
6e16782d6b wifi: ath11k: move power type check to ASSOC stage when connecting to 6 GHz AP
With commit bc8a0fac86 ("wifi: mac80211: don't set bss_conf in parsing")
ath11k fails to connect to 6 GHz AP.

This is because currently ath11k checks AP's power type in
ath11k_mac_op_assign_vif_chanctx() which would be called in AUTH stage.
However with above commit power type is not available until ASSOC stage.
As a result power type check fails and therefore connection fails.

Fix this by moving power type check to ASSOC stage, also move regulatory
rules update there because it depends on power type.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30

Fixes: bc8a0fac86 ("wifi: mac80211: don't set bss_conf in parsing")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://msgid.link/20240424064019.4847-1-quic_bqiang@quicinc.com
2024-05-23 15:45:52 +03:00
Carl Huang
ed281c6ab6 wifi: ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
WCN6750 firmware crashes because of num_vdevs changed from 4 to 17
in ath11k_init_wmi_config_qca6390() as the ab->hw_params.num_vdevs
is 17. This is caused by commit f019f4dff2 ("wifi: ath11k: support
2 station interfaces") which assigns ab->hw_params.num_vdevs directly
to config->num_vdevs in ath11k_init_wmi_config_qca6390(), therefore
WCN6750 firmware crashes as it can't support such a big num_vdevs.

Fix it by assign 3 to num_vdevs in hw_params for WCN6750 as 3 is
sufficient too.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01371-QCAMSLSWPLZ-1

Fixes: f019f4dff2 ("wifi: ath11k: support 2 station interfaces")
Reported-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Closes: https://lore.kernel.org/r/D15TIIDIIESY.D1EKKJLZINMA@fairphone.com/
Signed-off-by: Carl Huang <quic_cjhuang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://msgid.link/20240520030757.2209395-1-quic_cjhuang@quicinc.com
2024-05-23 15:43:40 +03:00
Chen Ni
0a3f9f7fc5 HID: nvidia-shield: Add missing check for input_ff_create_memless
Add check for the return value of input_ff_create_memless() and return
the error if it fails in order to catch the error.

Fixes: 09308562d4 ("HID: nvidia-shield: Initial driver implementation with Thunderstrike support")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-05-23 14:17:22 +02:00
Zhang Lixu
2360497238 HID: intel-ish-hid: Fix build error for COMPILE_TEST
kernel test robot reported build error due to a pointer type mismatch:

  .../ishtp/loader.c:172:8: error: incompatible pointer types passing
  '__le64 *' (aka 'unsigned long long *') to parameter of type
  'dma_addr_t *' (aka 'unsigned int *')

The issue arises because the driver, which is primarily intended for
x86-64, is also built for i386 when COMPILE_TEST is enabled.

Resolve type mismatch by using a temporary dma_addr_t variable to hold
the DMA address. Populate this temporary variable in dma_alloc_coherent()
function, and then convert and store the address in the
fragment->fragment_tbl[i].ddr_adrs field in the correct format.
Similarly, convert the ddr_adrs field back to dma_addr_t when freeing
the DMA buffer with dma_free_coherent().

Fixes: 579a267e46 ("HID: intel-ish-hid: Implement loading firmware from host feature")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405201313.SAStVPrT-lkp@intel.com/
Signed-off-by: Zhang Lixu <lixu.zhang@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-05-23 14:12:48 +02:00
Heiner Kallweit
e61bcf42d2 i2c: Remove I2C_CLASS_SPD
Remove this class after all users have been gone.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-05-23 13:38:15 +02:00
Christophe JAILLET
e6722ea6b9 i2c: synquacer: Remove a clk reference from struct synquacer_i2c
'pclk' is only used locally in the probe. Remove it from the
'synquacer_i2c' structure.

Also remove a useless debug message.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-05-23 13:38:15 +02:00
Uwe Kleine-König
a827ad9b3c spi: stm32: Revert change that enabled controller before asserting CS
On stm32mp157 enabling the controller before asserting CS makes the
hardware trigger spurious interrupts in a tight loop and the transfers
fail. Revert the commit that swapped the order of enable and CS. This
reintroduces the problem that swapping was supposed to fix, which
however is less grave.

Reported-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Link: https://lore.kernel.org/all/39033ed7-3e57-4339-80b4-fc8919e26aa7@pengutronix.de/
Fixes: 52b62e7a5d ("spi: stm32: enable controller before asserting CS")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://msgid.link/r/20240523103326.792907-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-23 12:35:09 +01:00
Andy Shevchenko
da560097c0 spi: Check if transfer is mapped before calling DMA sync APIs
The resent update to remove the orig_nents checks revealed
that not all DMA sync backends can cope with the unallocated
SG list, while supplying orig_nents == 0 (the commit 861370f49c
("iommu/dma: force bouncing if the size is not cacheline-aligned"),
for example, makes that happen for the IOMMU case). It means
we have to check if the buffers are DMA mapped before trying
to sync them. Re-introduce that check in a form of calling
->can_dma() in the same way as it's done in the DMA mapping loop
for the SPI transfers.

Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
Closes: https://lore.kernel.org/r/8ae675b5-fcf9-4c9b-b06a-4462f70e1322@linaro.org
Closes: https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano
Fixes: 8cc3bad9d9 ("spi: Remove unneded check for orig_nents")
Suggested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://msgid.link/r/20240522171018.3362521-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-23 12:32:54 +01:00
Andy Shevchenko
9f788ba457 spi: Don't mark message DMA mapped when no transfer in it is
There is no need to set the DMA mapped flag of the message if it has
no mapped transfers. Moreover, it may give the code a chance to take
the wrong paths, i.e. to exercise DMA related APIs on unmapped data.
Make __spi_map_msg() to bail earlier on the above mentioned cases.

Fixes: 99adef310f ("spi: Provide core support for DMA mapping transfers")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://msgid.link/r/20240522171018.3362521-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-23 12:32:52 +01:00
Dominique Martinet
c898afdc15 9p: add missing locking around taking dentry fid list
Fix a use-after-free on dentry's d_fsdata fid list when a thread
looks up a fid through dentry while another thread unlinks it:

UAF thread:
refcount_t: addition on 0; use-after-free.
 p9_fid_get linux/./include/net/9p/client.h:262
 v9fs_fid_find+0x236/0x280 linux/fs/9p/fid.c:129
 v9fs_fid_lookup_with_uid linux/fs/9p/fid.c:181
 v9fs_fid_lookup+0xbf/0xc20 linux/fs/9p/fid.c:314
 v9fs_vfs_getattr_dotl+0xf9/0x360 linux/fs/9p/vfs_inode_dotl.c:400
 vfs_statx+0xdd/0x4d0 linux/fs/stat.c:248

Freed by:
 p9_fid_destroy (inlined)
 p9_client_clunk+0xb0/0xe0 linux/net/9p/client.c:1456
 p9_fid_put linux/./include/net/9p/client.h:278
 v9fs_dentry_release+0xb5/0x140 linux/fs/9p/vfs_dentry.c:55
 v9fs_remove+0x38f/0x620 linux/fs/9p/vfs_inode.c:518
 vfs_unlink+0x29a/0x810 linux/fs/namei.c:4335

The problem is that d_fsdata was not accessed under d_lock, because
d_release() normally is only called once the dentry is otherwise no
longer accessible but since we also call it explicitly in v9fs_remove
that lock is required:
move the hlist out of the dentry under lock then unref its fids once
they are no longer accessible.

Fixes: 154372e67d ("fs/9p: fix create-unlink-getattr idiom")
Cc: stable@vger.kernel.org
Reported-by: Meysam Firouzi
Reported-by: Amirmohammad Eftekhar
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20240521122947.1080227-1-asmadeus@codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2024-05-23 20:29:09 +09:00
Guilherme G. Piccoli
7c23b186ab efi: pstore: Return proper errors on UEFI failures
Right now efi-pstore either returns 0 (success) or -EIO; but we
do have a function to convert UEFI errors in different standard
error codes, helping to narrow down potential issues more accurately.

So, let's use this helper here.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-05-23 09:09:39 +02:00
Nathan Chancellor
5134acb15d efi/libstub: zboot.lds: Discard .discard sections
When building ARCH=loongarch defconfig + CONFIG_UNWINDER_ORC=y using
LLVM, there is a warning from ld.lld when linking the EFI zboot image
due to the use of unreachable() in number() in vsprintf.c:

  ld.lld: warning: drivers/firmware/efi/libstub/lib.a(vsprintf.stub.o):(.discard.unreachable+0x0): has non-ABS relocation R_LARCH_32_PCREL against symbol ''

If the compiler cannot eliminate the default case for any reason, the
.discard.unreachable section will remain in the final binary but the
entire point of any section prefixed with .discard is that it is only
used at compile time, so it can be discarded via /DISCARD/ in a linker
script. The asm-generic vmlinux.lds.h includes .discard and .discard.*
in the COMMON_DISCARDS macro but that is not used for zboot.lds, as it
is not a kernel image linker script.

Add .discard and .discard.* to /DISCARD/ in zboot.lds, so that any
sections meant to be discarded at link time are not included in the
final zboot image. This issue is not specific to LoongArch, it is just
the first architecture to select CONFIG_OBJTOOL, which defines
annotate_unreachable() as an asm statement to add the
.discard.unreachable section, and use the EFI stub.

Closes: https://github.com/ClangBuiltLinux/linux/issues/2023
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-05-23 09:02:39 +02:00
Sagi Grimberg
134d0b3f24 nfs: propagate readlink errors in nfs_symlink_filler
There is an inherent race where a symlink file may have been overriden
(by a different client) between lookup and readlink, resulting in a
spurious EIO error returned to userspace. Fix this by propagating back
ESTALE errors such that the vfs will retry the lookup/get_link (similar
to nfs4_file_open) at least once.

Cc: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-22 19:25:00 -04:00
Trond Myklebust
6cbe14f42b MAINTAINERS: Change email address for Trond Myklebust
Ensure that patches are sent to an email server that won't corrupt them.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-22 11:27:32 -04:00
Dmitry Mastykin
aad11473f8 NFSv4: Fix memory leak in nfs4_set_security_label
We leak nfs_fattr and nfs4_label every time we set a security xattr.

Signed-off-by: Dmitry Mastykin <mastichi@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-22 11:27:04 -04:00
Xu Kuohai
8d00547ea8 MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT
I am working on ARM64 BPF JIT for a while, hence add myself
as reviewer.

Signed-off-by: Xu Kuohai <xukuohai@huaweicloud.com>
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/r/20240516020928.156125-1-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-21 11:00:30 -07:00
Nilay Shroff
d3a043733f nvme-multipath: find NUMA path only for online numa-node
In current native multipath design when a shared namespace is created,
we loop through each possible numa-node, calculate the NUMA distance of
that node from each nvme controller and then cache the optimal IO path
for future reference while sending IO. The issue with this design is that
we may refer to the NUMA distance table for an offline node which may not
be populated at the time and so we may inadvertently end up finding and
caching a non-optimal path for IO. Then latter when the corresponding
numa-node becomes online and hence the NUMA distance table entry for that
node is created, ideally we should re-calculate the multipath node distance
for the newly added node however that doesn't happen unless we rescan/reset
the controller. So essentially, we may keep using non-optimal IO path for a
node which is made online after namespace is created.
This patch helps fix this issue ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online numa-node
instead of each possible numa-node. Then latter when a node becomes online
and we receive any IO on that newly added node, we would calculate the
multipath node distance for newly added node but this time NUMA distance
table would have been already populated for newly added node. Hence we
would be able to correctly calculate the multipath node distance and choose
the optimal path for the IO.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-21 06:43:08 -07:00
Wachowski, Karol
39bc27bd68 drm/shmem-helper: Fix BUG_ON() on mmap(PROT_WRITE, MAP_PRIVATE)
Lack of check for copy-on-write (COW) mapping in drm_gem_shmem_mmap
allows users to call mmap with PROT_WRITE and MAP_PRIVATE flag
causing a kernel panic due to BUG_ON in vmf_insert_pfn_prot:
BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));

Return -EINVAL early if COW mapping is detected.

This bug affects all drm drivers using default shmem helpers.
It can be reproduced by this simple example:
void *ptr = mmap(0, size, PROT_WRITE, MAP_PRIVATE, fd, mmap_offset);
ptr[0] = 0;

Fixes: 2194a63a81 ("drm: Add library for shmem backed GEM objects")
Cc: Noralf Trønnes <noralf@tronnes.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.2+
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20240520100514.925681-1-jacek.lawrynowicz@linux.intel.com
2024-05-21 14:38:51 +02:00
Nikita Zhandarovich
25460d6f39 net/9p: fix uninit-value in p9_client_rpc()
Syzbot with the help of KMSAN reported the following error:

BUG: KMSAN: uninit-value in trace_9p_client_res include/trace/events/9p.h:146 [inline]
BUG: KMSAN: uninit-value in p9_client_rpc+0x1314/0x1340 net/9p/client.c:754
 trace_9p_client_res include/trace/events/9p.h:146 [inline]
 p9_client_rpc+0x1314/0x1340 net/9p/client.c:754
 p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031
 v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410
 v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122
 legacy_get_tree+0x114/0x290 fs/fs_context.c:662
 vfs_get_tree+0xa7/0x570 fs/super.c:1797
 do_new_mount+0x71f/0x15e0 fs/namespace.c:3352
 path_mount+0x742/0x1f20 fs/namespace.c:3679
 do_mount fs/namespace.c:3692 [inline]
 __do_sys_mount fs/namespace.c:3898 [inline]
 __se_sys_mount+0x725/0x810 fs/namespace.c:3875
 __x64_sys_mount+0xe4/0x150 fs/namespace.c:3875
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

Uninit was created at:
 __alloc_pages+0x9d6/0xe70 mm/page_alloc.c:4598
 __alloc_pages_node include/linux/gfp.h:238 [inline]
 alloc_pages_node include/linux/gfp.h:261 [inline]
 alloc_slab_page mm/slub.c:2175 [inline]
 allocate_slab mm/slub.c:2338 [inline]
 new_slab+0x2de/0x1400 mm/slub.c:2391
 ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
 __slab_alloc mm/slub.c:3610 [inline]
 __slab_alloc_node mm/slub.c:3663 [inline]
 slab_alloc_node mm/slub.c:3835 [inline]
 kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
 p9_tag_alloc net/9p/client.c:278 [inline]
 p9_client_prepare_req+0x20a/0x1770 net/9p/client.c:641
 p9_client_rpc+0x27e/0x1340 net/9p/client.c:688
 p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031
 v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410
 v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122
 legacy_get_tree+0x114/0x290 fs/fs_context.c:662
 vfs_get_tree+0xa7/0x570 fs/super.c:1797
 do_new_mount+0x71f/0x15e0 fs/namespace.c:3352
 path_mount+0x742/0x1f20 fs/namespace.c:3679
 do_mount fs/namespace.c:3692 [inline]
 __do_sys_mount fs/namespace.c:3898 [inline]
 __se_sys_mount+0x725/0x810 fs/namespace.c:3875
 __x64_sys_mount+0xe4/0x150 fs/namespace.c:3875
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

If p9_check_errors() fails early in p9_client_rpc(), req->rc.tag
will not be properly initialized. However, trace_9p_client_res()
ends up trying to print it out anyway before p9_client_rpc()
finishes.

Fix this issue by assigning default values to p9_fcall fields
such as 'tag' and (just in case KMSAN unearths something new) 'id'
during the tag allocation stage.

Reported-and-tested-by: syzbot+ff14db38f56329ef68df@syzkaller.appspotmail.com
Fixes: 348b59012e ("net/9p: Convert net/9p protocol dumps to tracepoints")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Cc: stable@vger.kernel.org
Message-ID: <20240408141039.30428-1-n.zhandarovich@fintech.ru>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2024-05-21 21:27:28 +09:00
Shengjiu Wang
e8dc41afca pmdomain: imx: gpcv2: Add delay after power up handshake
AudioMix BLK-CTRL on i.MX8MP encountered an accessing register issue
after power up.

[    2.181035] Kernel panic - not syncing: Asynchronous SError Interrupt
[    2.181038] CPU: 1 PID: 48 Comm: kworker/u16:2 Not tainted 6.9.0-rc5-next-20240424-00003-g21cec88845c6 #171
[    2.181047] Hardware name: NXP i.MX8MPlus EVK board (DT)
[    2.181050] Workqueue: events_unbound deferred_probe_work_func
[    2.181064] Call trace:
[...]
[    2.181142]  arm64_serror_panic+0x6c/0x78
[    2.181149]  do_serror+0x3c/0x70
[    2.181157]  el1h_64_error_handler+0x30/0x48
[    2.181164]  el1h_64_error+0x64/0x68
[    2.181171]  clk_imx8mp_audiomix_runtime_resume+0x34/0x44
[    2.181183]  __genpd_runtime_resume+0x30/0x80
[    2.181195]  genpd_runtime_resume+0x110/0x244
[    2.181205]  __rpm_callback+0x48/0x1d8
[    2.181213]  rpm_callback+0x68/0x74
[    2.181224]  rpm_resume+0x468/0x6c0
[    2.181234]  __pm_runtime_resume+0x50/0x94
[    2.181243]  pm_runtime_get_suppliers+0x60/0x8c
[    2.181258]  __driver_probe_device+0x48/0x12c
[    2.181268]  driver_probe_device+0xd8/0x15c
[    2.181278]  __device_attach_driver+0xb8/0x134
[    2.181290]  bus_for_each_drv+0x84/0xe0
[    2.181302]  __device_attach+0x9c/0x188
[    2.181312]  device_initial_probe+0x14/0x20
[    2.181323]  bus_probe_device+0xac/0xb0
[    2.181334]  deferred_probe_work_func+0x88/0xc0
[    2.181344]  process_one_work+0x150/0x290
[    2.181357]  worker_thread+0x2f8/0x408
[    2.181370]  kthread+0x110/0x114
[    2.181381]  ret_from_fork+0x10/0x20
[    2.181391] SMP: stopping secondary CPUs

According to comments in power up handshake:

	/* request the ADB400 to power up */
	if (domain->bits.hskreq) {
		regmap_update_bits(domain->regmap, domain->regs->hsk,
				   domain->bits.hskreq, domain->bits.hskreq);

		/*
		 * ret = regmap_read_poll_timeout(domain->regmap, domain->regs->hsk, reg_val,
		 *				  (reg_val & domain->bits.hskack), 0,
		 *				  USEC_PER_MSEC);
		 * Technically we need the commented code to wait handshake. But that needs
		 * the BLK-CTL module BUS clk-en bit being set.
		 *
		 * There is a separate BLK-CTL module and we will have such a driver for it,
		 * that driver will set the BUS clk-en bit and handshake will be triggered
		 * automatically there. Just add a delay and suppose the handshake finish
		 * after that.
		 */
	}

The BLK-CTL module needs to add delay to wait for a handshake request finished.
For some BLK-CTL module (eg. AudioMix on i.MX8MP) doesn't have BUS clk-en
bit, it is better to add delay in this driver, as the BLK-CTL module doesn't
need to care about how it is powered up.

regmap_read_bypassed() is to make sure the above write IO transaction already
reaches target before udelay().

Fixes: 1496dd413b ("clk: imx: imx8mp: Add pm_runtime support for power saving")
Reported-by: Francesco Dolcini <francesco@dolcini.it>
Closes: https://lore.kernel.org/all/66293535.170a0220.21fe.a2e7@mx.google.com/
Suggested-by: Frank Li <frank.li@nxp.com>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Tested-by: Adam Ford <aford173@gmail.com>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/1715396125-3724-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-05-21 12:21:01 +02:00
Pablo Neira Ayuso
aff5c01fa1 netfilter: nft_payload: restore vlan q-in-q match support
Revert f6ae9f120d ("netfilter: nft_payload: add C-VLAN support").

f41f72d09e ("netfilter: nft_payload: simplify vlan header handling")
already allows to match on inner vlan tags by subtract the vlan header
size to the payload offset which has been popped and stored in skbuff
metadata fields.

Fixes: f6ae9f120d ("netfilter: nft_payload: add C-VLAN support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-20 20:38:42 +02:00
Alexander Maltsev
c1193d9bbb netfilter: ipset: Add list flush to cancel_gc
Flushing list in cancel_gc drops references to other lists right away,
without waiting for RCU to destroy list. Fixes race when referenced
ipsets can't be destroyed while referring list is scheduled for destroy.

Fixes: 97f7cf1cd8 ("netfilter: ipset: fix performance regression in swap operation")
Signed-off-by: Alexander Maltsev <keltar.gw@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-20 20:38:42 +02:00
Dmitry Baryshkov
21ae74e1bf wifi: ath10k: fix QCOM_RPROC_COMMON dependency
If ath10k_snoc is built-in, while Qualcomm remoteprocs are built as
modules, compilation fails with:

/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_init':
drivers/net/wireless/ath/ath10k/snoc.c:1534: undefined reference to `qcom_register_ssr_notifier'
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_deinit':
drivers/net/wireless/ath/ath10k/snoc.c:1551: undefined reference to `qcom_unregister_ssr_notifier'

Add corresponding dependency to ATH10K_SNOC Kconfig entry so that it's
built as module if QCOM_RPROC_COMMON is built as module too.

Fixes: 747ff7d3d7 ("ath10k: Don't always treat modem stop events as crashes")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://msgid.link/20240511-ath10k-snoc-dep-v1-1-9666e3af5c27@linaro.org
2024-05-20 14:48:34 +03:00
Eric Dumazet
dc21c6cc3d netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()
syzbot reported that nf_reinject() could be called without rcu_read_lock() :

WARNING: suspicious RCU usage
6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0 Not tainted

net/netfilter/nfnetlink_queue.c:263 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz-executor.4/13427:
  #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
  #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2190 [inline]
  #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_core+0xa86/0x1830 kernel/rcu/tree.c:2471
  #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
  #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: nfqnl_flush net/netfilter/nfnetlink_queue.c:405 [inline]
  #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: instance_destroy_rcu+0x30/0x220 net/netfilter/nfnetlink_queue.c:172

stack backtrace:
CPU: 0 PID: 13427 Comm: syz-executor.4 Not tainted 6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Call Trace:
 <IRQ>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
  lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
  nf_reinject net/netfilter/nfnetlink_queue.c:323 [inline]
  nfqnl_reinject+0x6ec/0x1120 net/netfilter/nfnetlink_queue.c:397
  nfqnl_flush net/netfilter/nfnetlink_queue.c:410 [inline]
  instance_destroy_rcu+0x1ae/0x220 net/netfilter/nfnetlink_queue.c:172
  rcu_do_batch kernel/rcu/tree.c:2196 [inline]
  rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2471
  handle_softirqs+0x2d6/0x990 kernel/softirq.c:554
  __do_softirq kernel/softirq.c:588 [inline]
  invoke_softirq kernel/softirq.c:428 [inline]
  __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
 </IRQ>
 <TASK>

Fixes: 9872bec773 ("[NETFILTER]: nfnetlink: use RCU for queue instances hash")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-20 11:38:42 +02:00
Matthew Auld
520fb7f183 drm/tests/buddy: stop using PAGE_SIZE
Gives the wrong impression that min page-size has to be tied to the CPU
PAGE_SIZE.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229105112.250077-4-matthew.auld@intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-05-17 12:57:39 +02:00
Matthew Auld
117bbc0e43 drm/buddy: stop using PAGE_SIZE
The drm_buddy minimum page-size requirements should be distinct from the
CPU PAGE_SIZE. Only restriction is that the minimum page-size is at
least 4K.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229105112.250077-3-matthew.auld@intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-05-17 12:56:42 +02:00
Breno Leitao
637c435f08 wifi: ath11k: Fix error path in ath11k_pcic_ext_irq_config
If one of the dummy allocation fails in ath11k_pcic_ext_irq_config(),
the previous allocated devices might leak due to returning without
deallocating the devices.

Instead of returning on the error path, deallocate all the previously
allocated net_devices and then return.

Fixes: bca592ead8 ("wifi: ath11k: allocate dummy net_device dynamically")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://msgid.link/20240508185902.70975-1-leitao@debian.org
2024-05-17 09:54:32 +03:00
Deming Wang
e4f5f8298c scsi: mpt3sas: Add missing kerneldoc parameter descriptions
Add missing kerneldoc parameter descriptions to _scsih_set_debug_level().

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Link: https://lore.kernel.org/r/20240513115956.1576-1-wangdeming@inspur.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 10:38:09 -04:00
Saurav Kashyap
6c3bb589de scsi: qedf: Set qed_slowpath_params to zero before use
Zero qed_slowpath_params before use.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20240515091101.18754-4-skashyap@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 10:26:53 -04:00
Saurav Kashyap
78e88472b6 scsi: qedf: Wait for stag work during unload
If stag work is already scheduled and unload is called, it can lead to
issues as unload cleans up the work element. Wait for stag work to get
completed before cleanup during unload.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20240515091101.18754-3-skashyap@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 10:26:53 -04:00
Saurav Kashyap
51071f0831 scsi: qedf: Don't process stag work during unload and recovery
Stag work can cause issues during unload and recovery, hence don't process
it.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20240515091101.18754-2-skashyap@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 10:26:53 -04:00
Justin Stitt
9fad9d560a scsi: sr: Fix unintentional arithmetic wraparound
Running syzkaller with the newly reintroduced signed integer overflow
sanitizer produces this report:

[   65.194362] ------------[ cut here ]------------
[   65.197752] UBSAN: signed-integer-overflow in ../drivers/scsi/sr_ioctl.c:436:9
[   65.203607] -2147483648 * 177 cannot be represented in type 'int'
[   65.207911] CPU: 2 PID: 10416 Comm: syz-executor.1 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[   65.213585] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   65.219923] Call Trace:
[   65.221556]  <TASK>
[   65.223029]  dump_stack_lvl+0x93/0xd0
[   65.225573]  handle_overflow+0x171/0x1b0
[   65.228219]  sr_select_speed+0xeb/0xf0
[   65.230786]  ? __pm_runtime_resume+0xe6/0x130
[   65.233606]  sr_block_ioctl+0x15d/0x1d0
...

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the kernel
with Commit 557f8c582a ("ubsan: Reintroduce signed overflow sanitizer").

Firstly, let's change the type of "speed" to unsigned long as
sr_select_speed()'s only caller passes in an unsigned long anyways.

$ git grep '\.select_speed'
|	drivers/scsi/sr.c:      .select_speed           = sr_select_speed,
...
|	static int cdrom_ioctl_select_speed(struct cdrom_device_info *cdi,
|	                unsigned long arg)
|	{
|	        ...
|	        return cdi->ops->select_speed(cdi, arg);
|	}

Next, let's add an extra check to make sure we don't exceed 0xffff/177
(350) since 0xffff is the max speed. This has two benefits: 1) we deal
with integer overflow before it happens and 2) we properly respect the
max speed of 0xffff. There are some "magic" numbers here but I did not
want to change more than what was necessary.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/357
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240508-b4-b4-sio-sr_select_speed-v2-1-00b68f724290@google.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 10:05:24 -04:00
Martin Wilck
10157b1fc1 scsi: core: alua: I/O errors for ALUA state transitions
When a host is configured with a few LUNs and I/O is running, injecting FC
faults repeatedly leads to path recovery problems.  The LUNs have 4 paths
each and 3 of them come back active after say an FC fault which makes 2 of
the paths go down, instead of all 4. This happens after several iterations
of continuous FC faults.

Reason here is that we're returning an I/O error whenever we're
encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC
ACCESS STATE TRANSITION) instead of retrying.

[mwilck: The original patch was developed by Rajashekhar M A and Hannes
Reinecke. I moved the code to alua_check_sense() as suggested by Mike
Christie [1]. Evan Milne had raised the question whether pg->state should
be set to transitioning in the UA case [2]. I believe that doing this is
correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O
errors. Our handler schedules an RTPG, which will only result in an I/O
error condition if the transitioning timeout expires.]

[1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
[2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/

Co-developed-by: Rajashekhar M A <rajs@netapp.com>
Co-developed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Link: https://lore.kernel.org/r/20240514140344.19538-1-mwilck@suse.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 09:46:13 -04:00
Nathan Chancellor
9f365cb8bb scsi: mpi3mr: Use proper format specifier in mpi3mr_sas_port_add()
When building for a 32-bit platform such as ARM or i386, for which size_t
is unsigned int, there is a warning due to using an unsigned long format
specifier:

  drivers/scsi/mpi3mr/mpi3mr_transport.c:1370:11: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
   1369 |                         ioc_warn(mrioc, "skipping port %u, max allowed value is %lu\n",
        |                                                                                 ~~~
        |                                                                                 %u
   1370 |                             i, sizeof(mr_sas_port->phy_mask) * 8);
        |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the proper format specifier for size_t, %zu, to resolve the warning for
all platforms.

Fixes: 3668651def ("scsi: mpi3mr: Sanitise num_phys")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240514-mpi3mr-fix-wformat-v1-1-f1ad49217e5e@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-15 09:41:39 -04:00
Mohamed Ahmed
aed9a1a4f7 drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
Allow PTE kind and tile mode on BO create with VM_BIND, and add a
GETPARAM to indicate this change. This is needed to support modifiers in
NVK and ensure correctness when dealing with the nouveau GL driver.

The userspace modifiers implementation this is for can be found here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795

Fixes: b88baab828 ("drm/nouveau: implement new VM_BIND uAPI")
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240509204352.7597-1-mohamedahmedegypt2001@gmail.com
2024-05-13 22:17:58 +02:00
863 changed files with 9341 additions and 5218 deletions

View File

@@ -5,7 +5,6 @@ root = true
[{*.{awk,c,dts,dtsi,dtso,h,mk,s,S},Kconfig,Makefile,Makefile.*}]
charset = utf-8
end_of_line = lf
trim_trailing_whitespace = true
insert_final_newline = true
indent_style = tab
indent_size = 8
@@ -13,7 +12,6 @@ indent_size = 8
[*.{json,py,rs}]
charset = utf-8
end_of_line = lf
trim_trailing_whitespace = true
insert_final_newline = true
indent_style = space
indent_size = 4
@@ -26,7 +24,6 @@ indent_size = 8
[*.yaml]
charset = utf-8
end_of_line = lf
trim_trailing_whitespace = unset
insert_final_newline = true
indent_style = space
indent_size = 2

View File

@@ -72,6 +72,8 @@ Andrey Ryabinin <ryabinin.a.a@gmail.com> <aryabinin@virtuozzo.com>
Andrzej Hajda <andrzej.hajda@intel.com> <a.hajda@samsung.com>
André Almeida <andrealmeid@igalia.com> <andrealmeid@collabora.com>
Andy Adamson <andros@citi.umich.edu>
Andy Shevchenko <andy@kernel.org> <andy@smile.org.ua>
Andy Shevchenko <andy@kernel.org> <ext-andriy.shevchenko@nokia.com>
Anilkumar Kolli <quic_akolli@quicinc.com> <akolli@codeaurora.org>
Anirudh Ghayal <quic_aghayal@quicinc.com> <aghayal@codeaurora.org>
Antoine Tenart <atenart@kernel.org> <antoine.tenart@bootlin.com>
@@ -217,6 +219,7 @@ Geliang Tang <geliang@kernel.org> <geliang.tang@suse.com>
Geliang Tang <geliang@kernel.org> <geliangtang@xiaomi.com>
Geliang Tang <geliang@kernel.org> <geliangtang@gmail.com>
Geliang Tang <geliang@kernel.org> <geliangtang@163.com>
Geliang Tang <geliang@kernel.org> <tanggeliang@kylinos.cn>
Georgi Djakov <djakov@kernel.org> <georgi.djakov@linaro.org>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <geraldsc@de.ibm.com>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <gerald.schaefer@de.ibm.com>
@@ -337,10 +340,11 @@ Kalyan Thota <quic_kalyant@quicinc.com> <kalyan_t@codeaurora.org>
Karthikeyan Periyasamy <quic_periyasa@quicinc.com> <periyasa@codeaurora.org>
Kathiravan T <quic_kathirav@quicinc.com> <kathirav@codeaurora.org>
Kay Sievers <kay.sievers@vrfy.org>
Kees Cook <keescook@chromium.org> <kees.cook@canonical.com>
Kees Cook <keescook@chromium.org> <keescook@google.com>
Kees Cook <keescook@chromium.org> <kees@outflux.net>
Kees Cook <keescook@chromium.org> <kees@ubuntu.com>
Kees Cook <kees@kernel.org> <kees.cook@canonical.com>
Kees Cook <kees@kernel.org> <keescook@chromium.org>
Kees Cook <kees@kernel.org> <keescook@google.com>
Kees Cook <kees@kernel.org> <kees@outflux.net>
Kees Cook <kees@kernel.org> <kees@ubuntu.com>
Keith Busch <kbusch@kernel.org> <keith.busch@intel.com>
Keith Busch <kbusch@kernel.org> <keith.busch@linux.intel.com>
Kenneth W Chen <kenneth.w.chen@intel.com>

View File

@@ -9,8 +9,8 @@ TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel.
LiveCD-based tutorials are available at
http://tomoyo.sourceforge.jp/1.8/ubuntu12.04-live.html
http://tomoyo.sourceforge.jp/1.8/centos6-live.html
https://tomoyo.sourceforge.net/1.8/ubuntu12.04-live.html
https://tomoyo.sourceforge.net/1.8/centos6-live.html
Though these tutorials use non-LSM version of TOMOYO, they are useful for you
to know what TOMOYO is.
@@ -21,45 +21,32 @@ How to enable TOMOYO?
Build the kernel with ``CONFIG_SECURITY_TOMOYO=y`` and pass ``security=tomoyo`` on
kernel's command line.
Please see http://tomoyo.osdn.jp/2.5/ for details.
Please see https://tomoyo.sourceforge.net/2.6/ for details.
Where is documentation?
=======================
User <-> Kernel interface documentation is available at
https://tomoyo.osdn.jp/2.5/policy-specification/index.html .
https://tomoyo.sourceforge.net/2.6/policy-specification/index.html .
Materials we prepared for seminars and symposiums are available at
https://osdn.jp/projects/tomoyo/docs/?category_id=532&language_id=1 .
https://sourceforge.net/projects/tomoyo/files/docs/ .
Below lists are chosen from three aspects.
What is TOMOYO?
TOMOYO Linux Overview
https://osdn.jp/projects/tomoyo/docs/lca2009-takeda.pdf
https://sourceforge.net/projects/tomoyo/files/docs/lca2009-takeda.pdf
TOMOYO Linux: pragmatic and manageable security for Linux
https://osdn.jp/projects/tomoyo/docs/freedomhectaipei-tomoyo.pdf
https://sourceforge.net/projects/tomoyo/files/docs/freedomhectaipei-tomoyo.pdf
TOMOYO Linux: A Practical Method to Understand and Protect Your Own Linux Box
https://osdn.jp/projects/tomoyo/docs/PacSec2007-en-no-demo.pdf
https://sourceforge.net/projects/tomoyo/files/docs/PacSec2007-en-no-demo.pdf
What can TOMOYO do?
Deep inside TOMOYO Linux
https://osdn.jp/projects/tomoyo/docs/lca2009-kumaneko.pdf
https://sourceforge.net/projects/tomoyo/files/docs/lca2009-kumaneko.pdf
The role of "pathname based access control" in security.
https://osdn.jp/projects/tomoyo/docs/lfj2008-bof.pdf
https://sourceforge.net/projects/tomoyo/files/docs/lfj2008-bof.pdf
History of TOMOYO?
Realities of Mainlining
https://osdn.jp/projects/tomoyo/docs/lfj2008.pdf
What is future plan?
====================
We believe that inode based security and name based security are complementary
and both should be used together. But unfortunately, so far, we cannot enable
multiple LSM modules at the same time. We feel sorry that you have to give up
SELinux/SMACK/AppArmor etc. when you want to use TOMOYO.
We hope that LSM becomes stackable in future. Meanwhile, you can use non-LSM
version of TOMOYO, available at http://tomoyo.osdn.jp/1.8/ .
LSM version of TOMOYO is a subset of non-LSM version of TOMOYO. We are planning
to port non-LSM version's functionalities to LSM versions.
https://sourceforge.net/projects/tomoyo/files/docs/lfj2008.pdf

View File

@@ -1921,6 +1921,28 @@
Format:
<bus_id>,<clkrate>
i2c_touchscreen_props= [HW,ACPI,X86]
Set device-properties for ACPI-enumerated I2C-attached
touchscreen, to e.g. fix coordinates of upside-down
mounted touchscreens. If you need this option please
submit a drivers/platform/x86/touchscreen_dmi.c patch
adding a DMI quirk for this.
Format:
<ACPI_HW_ID>:<prop_name>=<val>[:prop_name=val][:...]
Where <val> is one of:
Omit "=<val>" entirely Set a boolean device-property
Unsigned number Set a u32 device-property
Anything else Set a string device-property
Examples (split over multiple lines):
i2c_touchscreen_props=GDIX1001:touchscreen-inverted-x:
touchscreen-inverted-y
i2c_touchscreen_props=MSSL1680:touchscreen-size-x=1920:
touchscreen-size-y=1080:touchscreen-inverted-y:
firmware-name=gsl1680-vendor-model.fw:silead,home-button
i8042.debug [HW] Toggle i8042 debug mode
i8042.unmask_kbd_data
[HW] Enable printing of interrupt data from the KBD port

View File

@@ -467,11 +467,11 @@ anon_fault_fallback_charge
instead falls back to using huge pages with lower orders or
small pages even though the allocation was successful.
anon_swpout
swpout
is incremented every time a huge page is swapped out in one
piece without splitting.
anon_swpout_fallback
swpout_fallback
is incremented if a huge page has to be split before swapout.
Usually because failed to allocate some continuous swap space
for the huge page.

View File

@@ -65,4 +65,6 @@ the extension, or may have deliberately removed it from the listing.
Misaligned accesses
-------------------
Misaligned accesses are supported in userspace, but they may perform poorly.
Misaligned scalar accesses are supported in userspace, but they may perform
poorly. Misaligned vector accesses are only supported if the Zicclsm extension
is supported.

View File

@@ -217,7 +217,7 @@ current *struct* is::
int (*media_changed)(struct cdrom_device_info *, int);
int (*tray_move)(struct cdrom_device_info *, int);
int (*lock_door)(struct cdrom_device_info *, int);
int (*select_speed)(struct cdrom_device_info *, int);
int (*select_speed)(struct cdrom_device_info *, unsigned long);
int (*get_last_session) (struct cdrom_device_info *,
struct cdrom_multisession *);
int (*get_mcn)(struct cdrom_device_info *, struct cdrom_mcn *);
@@ -396,7 +396,7 @@ action need be taken, and the return value should be 0.
::
int select_speed(struct cdrom_device_info *cdi, int speed)
int select_speed(struct cdrom_device_info *cdi, unsigned long speed)
Some CD-ROM drives are capable of changing their head-speed. There
are several reasons for changing the speed of a CD-ROM drive. Badly

View File

@@ -192,7 +192,7 @@ alignment larger than PAGE_SIZE.
Dynamic swiotlb
---------------
When CONFIG_DYNAMIC_SWIOTLB is enabled, swiotlb can do on-demand expansion of
When CONFIG_SWIOTLB_DYNAMIC is enabled, swiotlb can do on-demand expansion of
the amount of memory available for allocation as bounce buffers. If a bounce
buffer request fails due to lack of available space, an asynchronous background
task is kicked off to allocate memory from general system memory and turn it

View File

@@ -54,11 +54,10 @@ unevaluatedProperties: false
examples:
- |
mlahb: ahb@38000000 {
ahb {
compatible = "st,mlahb", "simple-bus";
#address-cells = <1>;
#size-cells = <1>;
reg = <0x10000000 0x40000>;
ranges;
dma-ranges = <0x00000000 0x38000000 0x10000>,
<0x10000000 0x10000000 0x60000>,

View File

@@ -57,17 +57,17 @@ properties:
- const: allwinner,sun8i-v3s
- description: Anbernic RG35XX (2024)
- items:
items:
- const: anbernic,rg35xx-2024
- const: allwinner,sun50i-h700
- description: Anbernic RG35XX Plus
- items:
items:
- const: anbernic,rg35xx-plus
- const: allwinner,sun50i-h700
- description: Anbernic RG35XX H
- items:
items:
- const: anbernic,rg35xx-h
- const: allwinner,sun50i-h700

View File

@@ -139,7 +139,7 @@ allOf:
Voltage output range of the channel as <minimum, maximum>
Required connections:
Rfb1x for: 0 to 2.5 V; 0 to 3V; 0 to 5 V;
Rfb2x for: 0 to 10 V; 2.5 to 7.5V; -5 to 5 V;
Rfb2x for: 0 to 10 V; -2.5 to 7.5V; -5 to 5 V;
oneOf:
- items:
- const: 0

View File

@@ -18,9 +18,12 @@ allOf:
properties:
compatible:
enum:
- elan,ekth6915
- ilitek,ili2901
oneOf:
- items:
- enum:
- elan,ekth5015m
- const: elan,ekth6915
- const: elan,ekth6915
reg:
const: 0x10
@@ -33,6 +36,12 @@ properties:
reset-gpios:
description: Reset GPIO; not all touchscreens using eKTH6915 hook this up.
no-reset-on-power-off:
type: boolean
description:
Reset line is wired so that it can (and should) be left deasserted when
the power supply is off.
vcc33-supply:
description: The 3.3V supply to the touchscreen.
@@ -58,8 +67,8 @@ examples:
#address-cells = <1>;
#size-cells = <0>;
ap_ts: touchscreen@10 {
compatible = "elan,ekth6915";
touchscreen@10 {
compatible = "elan,ekth5015m", "elan,ekth6915";
reg = <0x10>;
interrupt-parent = <&tlmm>;

View File

@@ -0,0 +1,66 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/input/ilitek,ili2901.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Ilitek ILI2901 touchscreen controller
maintainers:
- Jiri Kosina <jkosina@suse.com>
description:
Supports the Ilitek ILI2901 touchscreen controller.
This touchscreen controller uses the i2c-hid protocol with a reset GPIO.
allOf:
- $ref: /schemas/input/touchscreen/touchscreen.yaml#
properties:
compatible:
enum:
- ilitek,ili2901
reg:
maxItems: 1
interrupts:
maxItems: 1
panel: true
reset-gpios:
maxItems: 1
vcc33-supply: true
vccio-supply: true
required:
- compatible
- reg
- interrupts
- vcc33-supply
additionalProperties: false
examples:
- |
#include <dt-bindings/gpio/gpio.h>
#include <dt-bindings/interrupt-controller/irq.h>
i2c {
#address-cells = <1>;
#size-cells = <0>;
touchscreen@41 {
compatible = "ilitek,ili2901";
reg = <0x41>;
interrupt-parent = <&tlmm>;
interrupts = <9 IRQ_TYPE_LEVEL_LOW>;
reset-gpios = <&tlmm 8 GPIO_ACTIVE_LOW>;
vcc33-supply = <&pp3300_ts>;
};
};

View File

@@ -24,6 +24,7 @@ properties:
managers:
type: object
additionalProperties: false
description:
List of the PD69208T4/PD69204T4/PD69208M PSE managers. Each manager
have 4 or 8 physical ports according to the chip version. No need to
@@ -47,8 +48,9 @@ properties:
- "#size-cells"
patternProperties:
"^manager@0[0-9a-b]$":
"^manager@[0-9a-b]$":
type: object
additionalProperties: false
description:
PD69208T4/PD69204T4/PD69208M PSE manager exposing 4 or 8 physical
ports.
@@ -69,9 +71,14 @@ properties:
patternProperties:
'^port@[0-7]$':
type: object
additionalProperties: false
properties:
reg:
maxItems: 1
required:
- reg
additionalProperties: false
required:
- reg

View File

@@ -29,13 +29,31 @@ properties:
of the ports conversion matrix that establishes relationship between
the logical ports and the physical channels.
type: object
additionalProperties: false
properties:
"#address-cells":
const: 1
"#size-cells":
const: 0
patternProperties:
'^channel@[0-7]$':
type: object
additionalProperties: false
properties:
reg:
maxItems: 1
required:
- reg
required:
- "#address-cells"
- "#size-cells"
unevaluatedProperties: false
required:

View File

@@ -65,6 +65,7 @@ patternProperties:
description: The hard wired USB devices
type: object
$ref: /schemas/usb/usb-device.yaml
additionalProperties: true
required:
- peer-hub

View File

@@ -150,6 +150,12 @@ applicable everywhere (see syntax).
That will limit the usefulness but on the other hand avoid
the illegal configurations all over.
If "select" <symbol> is followed by "if" <expr>, <symbol> will be
selected by the logical AND of the value of the current menu symbol
and <expr>. This means, the lower limit can be downgraded due to the
presence of "if" <expr>. This behavior may seem weird, but we rely on
it. (The future of this behavior is undecided.)
- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
This is similar to "select" as it enforces a lower limit on another
@@ -184,7 +190,7 @@ applicable everywhere (see syntax).
ability to hook into a secondary subsystem while allowing the user to
configure that subsystem out without also having to unset these drivers.
Note: If the combination of FOO=y and BAR=m causes a link error,
Note: If the combination of FOO=y and BAZ=m causes a link error,
you can guard the function call with IS_REACHABLE()::
foo_init()
@@ -202,6 +208,10 @@ applicable everywhere (see syntax).
imply BAR
imply BAZ
Note: If "imply" <symbol> is followed by "if" <expr>, the default of <symbol>
will be the logical AND of the value of the current menu symbol and <expr>.
(The future of this behavior is undecided.)
- limiting menu display: "visible if" <expr>
This attribute is only applicable to menu blocks, if the condition is

View File

@@ -349,6 +349,10 @@ attribute-sets:
Number of packets dropped due to transient lack of resources, such as
buffer space, host descriptors etc.
type: uint
-
name: rx-csum-complete
doc: Number of packets that were marked as CHECKSUM_COMPLETE.
type: uint
-
name: rx-csum-unnecessary
doc: Number of packets that were marked as CHECKSUM_UNNECESSARY.

View File

@@ -329,24 +329,23 @@ XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field as you registered the UMEM on that
socket. These two sockets will now share one and the same UMEM.
In this case, it is possible to use the NIC's packet steering
capabilities to steer the packets to the right queue. This is not
possible in the previous example as there is only one queue shared
among sockets, so the NIC cannot do this steering as it can only steer
between queues.
There is no need to supply an XDP program like the one in the previous
case where sockets were bound to the same queue id and
device. Instead, use the NIC's packet steering capabilities to steer
the packets to the right queue. In the previous example, there is only
one queue shared among sockets, so the NIC cannot do this steering. It
can only steer between queues.
In libxdp (or libbpf prior to version 1.0), you need to use the
xsk_socket__create_shared() API as it takes a reference to a FILL ring
and a COMPLETION ring that will be created for you and bound to the
shared UMEM. You can use this function for all the sockets you create,
or you can use it for the second and following ones and use
xsk_socket__create() for the first one. Both methods yield the same
result.
In libbpf, you need to use the xsk_socket__create_shared() API as it
takes a reference to a FILL ring and a COMPLETION ring that will be
created for you and bound to the shared UMEM. You can use this
function for all the sockets you create, or you can use it for the
second and following ones and use xsk_socket__create() for the first
one. Both methods yield the same result.
Note that a UMEM can be shared between sockets on the same queue id
and device, as well as between queues on the same device and between
devices at the same time. It is also possible to redirect to any
socket as long as it is bound to the same umem with XDP_SHARED_UMEM.
devices at the same time.
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
@@ -823,10 +822,6 @@ A: The short answer is no, that is not supported at the moment. The
switch, or other distribution mechanism, in your NIC to direct
traffic to the correct queue id and socket.
Note that if you are using the XDP_SHARED_UMEM option, it is
possible to switch traffic between any socket bound to the same
umem.
Q: My packets are sometimes corrupted. What is wrong?
A: Care has to be taken not to feed the same buffer in the UMEM into

View File

@@ -227,7 +227,7 @@ preferably including links to previous postings, for example::
The amount of mooing will depend on packet rate so should match
the diurnal cycle quite well.
Signed-of-by: Joe Defarmer <joe@barn.org>
Signed-off-by: Joe Defarmer <joe@barn.org>
---
v3:
- add a note about time-of-day mooing fluctuation to the commit message

View File

@@ -582,7 +582,7 @@ depending on the hardware. In all cases, however, only routes that have the
Devices generating the streams may allow enabling and disabling some of the
routes or have a fixed routing configuration. If the routes can be disabled, not
declaring the routes (or declaring them without
``VIDIOC_SUBDEV_STREAM_FL_ACTIVE`` flag set) in ``VIDIOC_SUBDEV_S_ROUTING`` will
``V4L2_SUBDEV_STREAM_FL_ACTIVE`` flag set) in ``VIDIOC_SUBDEV_S_ROUTING`` will
disable the routes. ``VIDIOC_SUBDEV_S_ROUTING`` will still return such routes
back to the user in the routes array, with the ``V4L2_SUBDEV_STREAM_FL_ACTIVE``
flag unset.

View File

@@ -1107,7 +1107,6 @@ L: linux-pm@vger.kernel.org
S: Supported
F: Documentation/admin-guide/pm/amd-pstate.rst
F: drivers/cpufreq/amd-pstate*
F: include/linux/amd-pstate.h
F: tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
AMD PTDMA DRIVER
@@ -3854,6 +3853,7 @@ BPF JIT for ARM64
M: Daniel Borkmann <daniel@iogearbox.net>
M: Alexei Starovoitov <ast@kernel.org>
M: Puranjay Mohan <puranjay@kernel.org>
R: Xu Kuohai <xukuohai@huaweicloud.com>
L: bpf@vger.kernel.org
S: Supported
F: arch/arm64/net/
@@ -5187,7 +5187,6 @@ F: Documentation/devicetree/bindings/media/i2c/chrontel,ch7322.yaml
F: drivers/media/cec/i2c/ch7322.c
CIRRUS LOGIC AUDIO CODEC DRIVERS
M: James Schulman <james.schulman@cirrus.com>
M: David Rhodes <david.rhodes@cirrus.com>
M: Richard Fitzgerald <rf@opensource.cirrus.com>
L: alsa-devel@alsa-project.org (moderated for non-subscribers)
@@ -11035,8 +11034,8 @@ F: include/uapi/drm/i915_drm.h
INTEL DRM XE DRIVER (Lunar Lake and newer)
M: Lucas De Marchi <lucas.demarchi@intel.com>
M: Oded Gabbay <ogabbay@kernel.org>
M: Thomas Hellström <thomas.hellstrom@linux.intel.com>
M: Rodrigo Vivi <rodrigo.vivi@intel.com>
L: intel-xe@lists.freedesktop.org
S: Supported
W: https://drm.pages.freedesktop.org/intel-docs/
@@ -15238,7 +15237,6 @@ F: drivers/staging/most/
F: include/linux/most.h
MOTORCOMM PHY DRIVER
M: Peter Geis <pgwipeout@gmail.com>
M: Frank <Frank.Sae@motor-comm.com>
L: netdev@vger.kernel.org
S: Maintained
@@ -15827,7 +15825,7 @@ F: drivers/nfc/virtual_ncidev.c
F: tools/testing/selftests/nci/
NFS, SUNRPC, AND LOCKD CLIENTS
M: Trond Myklebust <trond.myklebust@hammerspace.com>
M: Trond Myklebust <trondmy@kernel.org>
M: Anna Schumaker <anna@kernel.org>
L: linux-nfs@vger.kernel.org
S: Maintained
@@ -21316,7 +21314,7 @@ F: arch/riscv/boot/dts/starfive/
STARFIVE DWMAC GLUE LAYER
M: Emil Renner Berthing <kernel@esmil.dk>
M: Samin Guo <samin.guo@starfivetech.com>
M: Minda Chen <minda.chen@starfivetech.com>
S: Maintained
F: Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
F: drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
@@ -22679,7 +22677,7 @@ L: tomoyo-users-en@lists.osdn.me (subscribers-only, for users in English)
L: tomoyo-dev@lists.osdn.me (subscribers-only, for developers in Japanese)
L: tomoyo-users@lists.osdn.me (subscribers-only, for users in Japanese)
S: Maintained
W: https://tomoyo.osdn.jp/
W: https://tomoyo.sourceforge.net/
F: security/tomoyo/
TOPSTAR LAPTOP EXTRAS DRIVER

View File

@@ -2,7 +2,7 @@
VERSION = 6
PATCHLEVEL = 10
SUBLEVEL = 0
EXTRAVERSION = -rc1
EXTRAVERSION = -rc4
NAME = Baby Opossum Posse
# *DOCUMENTATION*

View File

@@ -39,7 +39,7 @@
/************** Functions that the back-end must provide **************/
/* Extension for 32-bit operations. */
inline u8 zext(u8 *buf, u8 rd);
u8 zext(u8 *buf, u8 rd);
/***** Moves *****/
u8 mov_r32(u8 *buf, u8 rd, u8 rs, u8 sign_ext);
u8 mov_r32_i32(u8 *buf, u8 reg, s32 imm);

View File

@@ -62,7 +62,7 @@ enum {
* If/when we decide to add ARCv2 instructions that do use register pairs,
* the mapping, hopefully, doesn't need to be revisited.
*/
const u8 bpf2arc[][2] = {
static const u8 bpf2arc[][2] = {
/* Return value from in-kernel function, and exit value from eBPF */
[BPF_REG_0] = {ARC_R_8, ARC_R_9},
/* Arguments from eBPF program to in-kernel function */
@@ -1302,7 +1302,7 @@ static u8 arc_b(u8 *buf, s32 offset)
/************* Packers (Deal with BPF_REGs) **************/
inline u8 zext(u8 *buf, u8 rd)
u8 zext(u8 *buf, u8 rd)
{
if (rd != BPF_REG_FP)
return arc_movi_r(buf, REG_HI(rd), 0);
@@ -2235,6 +2235,7 @@ u8 gen_swap(u8 *buf, u8 rd, u8 size, u8 endian, bool force, bool do_zext)
break;
default:
/* The caller must have handled this. */
break;
}
} else {
/*
@@ -2253,6 +2254,7 @@ u8 gen_swap(u8 *buf, u8 rd, u8 size, u8 endian, bool force, bool do_zext)
break;
default:
/* The caller must have handled this. */
break;
}
}
@@ -2517,7 +2519,7 @@ u8 arc_epilogue(u8 *buf, u32 usage, u16 frame_size)
#define JCC64_NR_OF_JMPS 3 /* Number of jumps in jcc64 template. */
#define JCC64_INSNS_TO_END 3 /* Number of insn. inclusive the 2nd jmp to end. */
#define JCC64_SKIP_JMP 1 /* Index of the "skip" jump to "end". */
const struct {
static const struct {
/*
* "jit_off" is common between all "jmp[]" and is coupled with
* "cond" of each "jmp[]" instance. e.g.:
@@ -2883,7 +2885,7 @@ u8 gen_jmp_64(u8 *buf, u8 rd, u8 rs, u8 cond, u32 curr_off, u32 targ_off)
* The "ARC_CC_SET" becomes "CC_unequal" because of the "tst"
* instruction that precedes the conditional branch.
*/
const u8 arcv2_32_jmps[ARC_CC_LAST] = {
static const u8 arcv2_32_jmps[ARC_CC_LAST] = {
[ARC_CC_UGT] = CC_great_u,
[ARC_CC_UGE] = CC_great_eq_u,
[ARC_CC_ULT] = CC_less_u,

View File

@@ -159,7 +159,7 @@ static void jit_dump(const struct jit_context *ctx)
/* Initialise the context so there's no garbage. */
static int jit_ctx_init(struct jit_context *ctx, struct bpf_prog *prog)
{
memset(ctx, 0, sizeof(ctx));
memset(ctx, 0, sizeof(*ctx));
ctx->orig_prog = prog;
@@ -167,7 +167,7 @@ static int jit_ctx_init(struct jit_context *ctx, struct bpf_prog *prog)
ctx->prog = bpf_jit_blind_constants(prog);
if (IS_ERR(ctx->prog))
return PTR_ERR(ctx->prog);
ctx->blinded = (ctx->prog == ctx->orig_prog ? false : true);
ctx->blinded = (ctx->prog != ctx->orig_prog);
/* If the verifier doesn't zero-extend, then we have to do it. */
ctx->do_zext = !ctx->prog->aux->verifier_zext;
@@ -1182,12 +1182,12 @@ static int jit_prepare(struct jit_context *ctx)
}
/*
* All the "handle_*()" functions have been called before by the
* "jit_prepare()". If there was an error, we would know by now.
* Therefore, no extra error checking at this point, other than
* a sanity check at the end that expects the calculated length
* (jit.len) to be equal to the length of generated instructions
* (jit.index).
* jit_compile() is the real compilation phase. jit_prepare() is
* invoked before jit_compile() as a dry-run to make sure everything
* will go OK and allocate the necessary memory.
*
* In the end, jit_compile() checks if it has produced the same number
* of instructions as jit_prepare() would.
*/
static int jit_compile(struct jit_context *ctx)
{
@@ -1407,9 +1407,9 @@ static struct bpf_prog *do_extra_pass(struct bpf_prog *prog)
/*
* This function may be invoked twice for the same stream of BPF
* instructions. The "extra pass" happens, when there are "call"s
* involved that their addresses are not known during the first
* invocation.
* instructions. The "extra pass" happens, when there are
* (re)locations involved that their addresses are not known
* during the first run.
*/
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
{

View File

@@ -232,11 +232,24 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
unsigned long old;
if (unlikely(atomic_read(&current->tracing_graph_pause)))
err_out:
return;
if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
/* FP points one word below parent's top of stack */
frame_pointer += 4;
/*
* Usually, the stack frames are contiguous in memory but cases
* have been observed where the next stack frame does not live
* at 'frame_pointer + 4' as this code used to assume.
*
* Instead, dereference the field in the stack frame that
* stores the SP of the calling frame: to avoid unbounded
* recursion, this cannot involve any ftrace instrumented
* functions, so use the __get_kernel_nofault() primitive
* directly.
*/
__get_kernel_nofault(&frame_pointer,
(unsigned long *)(frame_pointer - 8),
unsigned long, err_out);
} else {
struct stackframe frame = {
.fp = frame_pointer,

View File

@@ -1649,6 +1649,7 @@ config RODATA_FULL_DEFAULT_ENABLED
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
depends on !KCSAN
help
Enabling this option prevents the kernel from accessing
user-space memory directly by pointing TTBR0_EL1 to a reserved

View File

@@ -112,6 +112,9 @@
#define _ASM_EXTABLE_KACCESS_ERR(insn, fixup, err) \
_ASM_EXTABLE_KACCESS_ERR_ZERO(insn, fixup, err, wzr)
#define _ASM_EXTABLE_KACCESS(insn, fixup) \
_ASM_EXTABLE_KACCESS_ERR_ZERO(insn, fixup, wzr, wzr)
#define _ASM_EXTABLE_LOAD_UNALIGNED_ZEROPAD(insn, fixup, data, addr) \
__DEFINE_ASM_GPR_NUMS \
__ASM_EXTABLE_RAW(#insn, #fixup, \

View File

@@ -146,7 +146,7 @@
/* Coprocessor traps */
.macro __init_el2_cptr
__check_hvhe .LnVHE_\@, x1
mov x0, #(CPACR_EL1_FPEN_EL1EN | CPACR_EL1_FPEN_EL0EN)
mov x0, #CPACR_ELx_FPEN
msr cpacr_el1, x0
b .Lskip_set_cptr_\@
.LnVHE_\@:
@@ -277,7 +277,7 @@
// (h)VHE case
mrs x0, cpacr_el1 // Disable SVE traps
orr x0, x0, #(CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
orr x0, x0, #CPACR_ELx_ZEN
msr cpacr_el1, x0
b .Lskip_set_cptr_\@
@@ -298,7 +298,7 @@
// (h)VHE case
mrs x0, cpacr_el1 // Disable SME traps
orr x0, x0, #(CPACR_EL1_SMEN_EL0EN | CPACR_EL1_SMEN_EL1EN)
orr x0, x0, #CPACR_ELx_SMEN
msr cpacr_el1, x0
b .Lskip_set_cptr_sme_\@

View File

@@ -153,8 +153,9 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
* emit the large TLP from the CPU.
*/
static inline void __const_memcpy_toio_aligned32(volatile u32 __iomem *to,
const u32 *from, size_t count)
static __always_inline void
__const_memcpy_toio_aligned32(volatile u32 __iomem *to, const u32 *from,
size_t count)
{
switch (count) {
case 8:
@@ -196,24 +197,22 @@ static inline void __const_memcpy_toio_aligned32(volatile u32 __iomem *to,
void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);
static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
size_t count)
static __always_inline void
__iowrite32_copy(void __iomem *to, const void *from, size_t count)
{
if (count == 8 || count == 4 || count == 2 || count == 1) {
if (__builtin_constant_p(count) &&
(count == 8 || count == 4 || count == 2 || count == 1)) {
__const_memcpy_toio_aligned32(to, from, count);
dgh();
} else {
__iowrite32_copy_full(to, from, count);
}
}
#define __iowrite32_copy __iowrite32_copy
#define __iowrite32_copy(to, from, count) \
(__builtin_constant_p(count) ? \
__const_iowrite32_copy(to, from, count) : \
__iowrite32_copy_full(to, from, count))
static inline void __const_memcpy_toio_aligned64(volatile u64 __iomem *to,
const u64 *from, size_t count)
static __always_inline void
__const_memcpy_toio_aligned64(volatile u64 __iomem *to, const u64 *from,
size_t count)
{
switch (count) {
case 8:
@@ -255,21 +254,18 @@ static inline void __const_memcpy_toio_aligned64(volatile u64 __iomem *to,
void __iowrite64_copy_full(void __iomem *to, const void *from, size_t count);
static inline void __const_iowrite64_copy(void __iomem *to, const void *from,
size_t count)
static __always_inline void
__iowrite64_copy(void __iomem *to, const void *from, size_t count)
{
if (count == 8 || count == 4 || count == 2 || count == 1) {
if (__builtin_constant_p(count) &&
(count == 8 || count == 4 || count == 2 || count == 1)) {
__const_memcpy_toio_aligned64(to, from, count);
dgh();
} else {
__iowrite64_copy_full(to, from, count);
}
}
#define __iowrite64_copy(to, from, count) \
(__builtin_constant_p(count) ? \
__const_iowrite64_copy(to, from, count) : \
__iowrite64_copy_full(to, from, count))
#define __iowrite64_copy __iowrite64_copy
/*
* I/O memory mapping functions.

View File

@@ -305,6 +305,12 @@
GENMASK(19, 14) | \
BIT(11))
#define CPTR_VHE_EL2_RES0 (GENMASK(63, 32) | \
GENMASK(27, 26) | \
GENMASK(23, 22) | \
GENMASK(19, 18) | \
GENMASK(15, 0))
/* Hyp Debug Configuration Register bits */
#define MDCR_EL2_E2TB_MASK (UL(0x3))
#define MDCR_EL2_E2TB_SHIFT (UL(24))

View File

@@ -557,6 +557,68 @@ static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
vcpu_set_flag((v), e); \
} while (0)
#define __build_check_all_or_none(r, bits) \
BUILD_BUG_ON(((r) & (bits)) && ((r) & (bits)) != (bits))
#define __cpacr_to_cptr_clr(clr, set) \
({ \
u64 cptr = 0; \
\
if ((set) & CPACR_ELx_FPEN) \
cptr |= CPTR_EL2_TFP; \
if ((set) & CPACR_ELx_ZEN) \
cptr |= CPTR_EL2_TZ; \
if ((set) & CPACR_ELx_SMEN) \
cptr |= CPTR_EL2_TSM; \
if ((clr) & CPACR_ELx_TTA) \
cptr |= CPTR_EL2_TTA; \
if ((clr) & CPTR_EL2_TAM) \
cptr |= CPTR_EL2_TAM; \
if ((clr) & CPTR_EL2_TCPAC) \
cptr |= CPTR_EL2_TCPAC; \
\
cptr; \
})
#define __cpacr_to_cptr_set(clr, set) \
({ \
u64 cptr = 0; \
\
if ((clr) & CPACR_ELx_FPEN) \
cptr |= CPTR_EL2_TFP; \
if ((clr) & CPACR_ELx_ZEN) \
cptr |= CPTR_EL2_TZ; \
if ((clr) & CPACR_ELx_SMEN) \
cptr |= CPTR_EL2_TSM; \
if ((set) & CPACR_ELx_TTA) \
cptr |= CPTR_EL2_TTA; \
if ((set) & CPTR_EL2_TAM) \
cptr |= CPTR_EL2_TAM; \
if ((set) & CPTR_EL2_TCPAC) \
cptr |= CPTR_EL2_TCPAC; \
\
cptr; \
})
#define cpacr_clear_set(clr, set) \
do { \
BUILD_BUG_ON((set) & CPTR_VHE_EL2_RES0); \
BUILD_BUG_ON((clr) & CPACR_ELx_E0POE); \
__build_check_all_or_none((clr), CPACR_ELx_FPEN); \
__build_check_all_or_none((set), CPACR_ELx_FPEN); \
__build_check_all_or_none((clr), CPACR_ELx_ZEN); \
__build_check_all_or_none((set), CPACR_ELx_ZEN); \
__build_check_all_or_none((clr), CPACR_ELx_SMEN); \
__build_check_all_or_none((set), CPACR_ELx_SMEN); \
\
if (has_vhe() || has_hvhe()) \
sysreg_clear_set(cpacr_el1, clr, set); \
else \
sysreg_clear_set(cptr_el2, \
__cpacr_to_cptr_clr(clr, set), \
__cpacr_to_cptr_set(clr, set));\
} while (0)
static __always_inline void kvm_write_cptr_el2(u64 val)
{
if (has_vhe() || has_hvhe())
@@ -570,17 +632,16 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
u64 val;
if (has_vhe()) {
val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |
CPACR_EL1_ZEN_EL1EN);
val = (CPACR_ELx_FPEN | CPACR_EL1_ZEN_EL1EN);
if (cpus_have_final_cap(ARM64_SME))
val |= CPACR_EL1_SMEN_EL1EN;
} else if (has_hvhe()) {
val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
val = CPACR_ELx_FPEN;
if (!vcpu_has_sve(vcpu) || !guest_owns_fp_regs())
val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN;
val |= CPACR_ELx_ZEN;
if (cpus_have_final_cap(ARM64_SME))
val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN;
val |= CPACR_ELx_SMEN;
} else {
val = CPTR_NVHE_EL2_RES1;

View File

@@ -76,6 +76,7 @@ static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
extern unsigned int __ro_after_init kvm_sve_max_vl;
extern unsigned int __ro_after_init kvm_host_sve_max_vl;
int __init kvm_arm_init_sve(void);
u32 __attribute_const__ kvm_target_cpu(void);
@@ -521,6 +522,20 @@ struct kvm_cpu_context {
u64 *vncr_array;
};
struct cpu_sve_state {
__u64 zcr_el1;
/*
* Ordering is important since __sve_save_state/__sve_restore_state
* relies on it.
*/
__u32 fpsr;
__u32 fpcr;
/* Must be SVE_VQ_BYTES (128 bit) aligned. */
__u8 sve_regs[];
};
/*
* This structure is instantiated on a per-CPU basis, and contains
* data that is:
@@ -534,7 +549,15 @@ struct kvm_cpu_context {
*/
struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
struct user_fpsimd_state *fpsimd_state; /* hyp VA */
/*
* All pointers in this union are hyp VA.
* sve_state is only used in pKVM and if system_supports_sve().
*/
union {
struct user_fpsimd_state *fpsimd_state;
struct cpu_sve_state *sve_state;
};
/* Ownership of the FP regs */
enum {

View File

@@ -111,7 +111,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
void __sve_restore_state(void *sve_pffr, u32 *fpsr);
void __sve_save_state(void *sve_pffr, u32 *fpsr, int save_ffr);
void __sve_restore_state(void *sve_pffr, u32 *fpsr, int restore_ffr);
u64 __guest_enter(struct kvm_vcpu *vcpu);
@@ -142,5 +143,6 @@ extern u64 kvm_nvhe_sym(id_aa64smfr0_el1_sys_val);
extern unsigned long kvm_nvhe_sym(__icache_flags);
extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
extern unsigned int kvm_nvhe_sym(kvm_host_sve_max_vl);
#endif /* __ARM64_KVM_HYP_H__ */

View File

@@ -128,4 +128,13 @@ static inline unsigned long hyp_ffa_proxy_pages(void)
return (2 * KVM_FFA_MBOX_NR_PAGES) + DIV_ROUND_UP(desc_max, PAGE_SIZE);
}
static inline size_t pkvm_host_sve_state_size(void)
{
if (!system_supports_sve())
return 0;
return size_add(sizeof(struct cpu_sve_state),
SVE_SIG_REGS_SIZE(sve_vq_from_vl(kvm_host_sve_max_vl)));
}
#endif /* __ARM64_KVM_PKVM_H__ */

View File

@@ -30,23 +30,20 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
/*
* Test whether a block of memory is a valid user space address.
* Returns 1 if the range is valid, 0 otherwise.
*
* This is equivalent to the following test:
* (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
* We only care that the address cannot reach the kernel mapping, and
* that an invalid address will fault.
*/
static inline int access_ok(const void __user *addr, unsigned long size)
static inline int access_ok(const void __user *p, unsigned long size)
{
/*
* Asynchronous I/O running in a kernel thread does not have the
* TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
* the user address before checking.
*/
if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
(current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
addr = untagged_addr(addr);
unsigned long addr = (unsigned long)p;
return likely(__access_ok(addr, size));
/* Only bit 55 of the address matters */
addr |= addr+size;
addr = (addr >> 55) & 1;
size >>= 55;
return !(addr | size);
}
#define access_ok access_ok
@@ -184,29 +181,40 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
* The "__xxx_error" versions set the third argument to -EFAULT if an error
* occurs, and leave it unchanged on success.
*/
#define __get_mem_asm(load, reg, x, addr, err, type) \
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
#define __get_mem_asm(load, reg, x, addr, label, type) \
asm_goto_output( \
"1: " load " " reg "0, [%1]\n" \
_ASM_EXTABLE_##type##ACCESS_ERR(1b, %l2, %w0) \
: "=r" (x) \
: "r" (addr) : : label)
#else
#define __get_mem_asm(load, reg, x, addr, label, type) do { \
int __gma_err = 0; \
asm volatile( \
"1: " load " " reg "1, [%2]\n" \
"2:\n" \
_ASM_EXTABLE_##type##ACCESS_ERR_ZERO(1b, 2b, %w0, %w1) \
: "+r" (err), "=r" (x) \
: "r" (addr))
: "+r" (__gma_err), "=r" (x) \
: "r" (addr)); \
if (__gma_err) goto label; } while (0)
#endif
#define __raw_get_mem(ldr, x, ptr, err, type) \
#define __raw_get_mem(ldr, x, ptr, label, type) \
do { \
unsigned long __gu_val; \
switch (sizeof(*(ptr))) { \
case 1: \
__get_mem_asm(ldr "b", "%w", __gu_val, (ptr), (err), type); \
__get_mem_asm(ldr "b", "%w", __gu_val, (ptr), label, type); \
break; \
case 2: \
__get_mem_asm(ldr "h", "%w", __gu_val, (ptr), (err), type); \
__get_mem_asm(ldr "h", "%w", __gu_val, (ptr), label, type); \
break; \
case 4: \
__get_mem_asm(ldr, "%w", __gu_val, (ptr), (err), type); \
__get_mem_asm(ldr, "%w", __gu_val, (ptr), label, type); \
break; \
case 8: \
__get_mem_asm(ldr, "%x", __gu_val, (ptr), (err), type); \
__get_mem_asm(ldr, "%x", __gu_val, (ptr), label, type); \
break; \
default: \
BUILD_BUG(); \
@@ -219,27 +227,34 @@ do { \
* uaccess_ttbr0_disable(). As `x` and `ptr` could contain blocking functions,
* we must evaluate these outside of the critical section.
*/
#define __raw_get_user(x, ptr, err) \
#define __raw_get_user(x, ptr, label) \
do { \
__typeof__(*(ptr)) __user *__rgu_ptr = (ptr); \
__typeof__(x) __rgu_val; \
__chk_user_ptr(ptr); \
\
uaccess_ttbr0_enable(); \
__raw_get_mem("ldtr", __rgu_val, __rgu_ptr, err, U); \
uaccess_ttbr0_disable(); \
\
(x) = __rgu_val; \
do { \
__label__ __rgu_failed; \
uaccess_ttbr0_enable(); \
__raw_get_mem("ldtr", __rgu_val, __rgu_ptr, __rgu_failed, U); \
uaccess_ttbr0_disable(); \
(x) = __rgu_val; \
break; \
__rgu_failed: \
uaccess_ttbr0_disable(); \
goto label; \
} while (0); \
} while (0)
#define __get_user_error(x, ptr, err) \
do { \
__label__ __gu_failed; \
__typeof__(*(ptr)) __user *__p = (ptr); \
might_fault(); \
if (access_ok(__p, sizeof(*__p))) { \
__p = uaccess_mask_ptr(__p); \
__raw_get_user((x), __p, (err)); \
__raw_get_user((x), __p, __gu_failed); \
} else { \
__gu_failed: \
(x) = (__force __typeof__(x))0; (err) = -EFAULT; \
} \
} while (0)
@@ -262,40 +277,42 @@ do { \
do { \
__typeof__(dst) __gkn_dst = (dst); \
__typeof__(src) __gkn_src = (src); \
int __gkn_err = 0; \
do { \
__label__ __gkn_label; \
\
__mte_enable_tco_async(); \
__raw_get_mem("ldr", *((type *)(__gkn_dst)), \
(__force type *)(__gkn_src), __gkn_err, K); \
__mte_disable_tco_async(); \
\
if (unlikely(__gkn_err)) \
__mte_enable_tco_async(); \
__raw_get_mem("ldr", *((type *)(__gkn_dst)), \
(__force type *)(__gkn_src), __gkn_label, K); \
__mte_disable_tco_async(); \
break; \
__gkn_label: \
__mte_disable_tco_async(); \
goto err_label; \
} while (0); \
} while (0)
#define __put_mem_asm(store, reg, x, addr, err, type) \
asm volatile( \
"1: " store " " reg "1, [%2]\n" \
#define __put_mem_asm(store, reg, x, addr, label, type) \
asm goto( \
"1: " store " " reg "0, [%1]\n" \
"2:\n" \
_ASM_EXTABLE_##type##ACCESS_ERR(1b, 2b, %w0) \
: "+r" (err) \
: "rZ" (x), "r" (addr))
_ASM_EXTABLE_##type##ACCESS(1b, %l2) \
: : "rZ" (x), "r" (addr) : : label)
#define __raw_put_mem(str, x, ptr, err, type) \
#define __raw_put_mem(str, x, ptr, label, type) \
do { \
__typeof__(*(ptr)) __pu_val = (x); \
switch (sizeof(*(ptr))) { \
case 1: \
__put_mem_asm(str "b", "%w", __pu_val, (ptr), (err), type); \
__put_mem_asm(str "b", "%w", __pu_val, (ptr), label, type); \
break; \
case 2: \
__put_mem_asm(str "h", "%w", __pu_val, (ptr), (err), type); \
__put_mem_asm(str "h", "%w", __pu_val, (ptr), label, type); \
break; \
case 4: \
__put_mem_asm(str, "%w", __pu_val, (ptr), (err), type); \
__put_mem_asm(str, "%w", __pu_val, (ptr), label, type); \
break; \
case 8: \
__put_mem_asm(str, "%x", __pu_val, (ptr), (err), type); \
__put_mem_asm(str, "%x", __pu_val, (ptr), label, type); \
break; \
default: \
BUILD_BUG(); \
@@ -307,25 +324,34 @@ do { \
* uaccess_ttbr0_disable(). As `x` and `ptr` could contain blocking functions,
* we must evaluate these outside of the critical section.
*/
#define __raw_put_user(x, ptr, err) \
#define __raw_put_user(x, ptr, label) \
do { \
__label__ __rpu_failed; \
__typeof__(*(ptr)) __user *__rpu_ptr = (ptr); \
__typeof__(*(ptr)) __rpu_val = (x); \
__chk_user_ptr(__rpu_ptr); \
\
uaccess_ttbr0_enable(); \
__raw_put_mem("sttr", __rpu_val, __rpu_ptr, err, U); \
uaccess_ttbr0_disable(); \
do { \
uaccess_ttbr0_enable(); \
__raw_put_mem("sttr", __rpu_val, __rpu_ptr, __rpu_failed, U); \
uaccess_ttbr0_disable(); \
break; \
__rpu_failed: \
uaccess_ttbr0_disable(); \
goto label; \
} while (0); \
} while (0)
#define __put_user_error(x, ptr, err) \
do { \
__label__ __pu_failed; \
__typeof__(*(ptr)) __user *__p = (ptr); \
might_fault(); \
if (access_ok(__p, sizeof(*__p))) { \
__p = uaccess_mask_ptr(__p); \
__raw_put_user((x), __p, (err)); \
__raw_put_user((x), __p, __pu_failed); \
} else { \
__pu_failed: \
(err) = -EFAULT; \
} \
} while (0)
@@ -348,15 +374,18 @@ do { \
do { \
__typeof__(dst) __pkn_dst = (dst); \
__typeof__(src) __pkn_src = (src); \
int __pkn_err = 0; \
\
__mte_enable_tco_async(); \
__raw_put_mem("str", *((type *)(__pkn_src)), \
(__force type *)(__pkn_dst), __pkn_err, K); \
__mte_disable_tco_async(); \
\
if (unlikely(__pkn_err)) \
do { \
__label__ __pkn_err; \
__mte_enable_tco_async(); \
__raw_put_mem("str", *((type *)(__pkn_src)), \
(__force type *)(__pkn_dst), __pkn_err, K); \
__mte_disable_tco_async(); \
break; \
__pkn_err: \
__mte_disable_tco_async(); \
goto err_label; \
} while (0); \
} while(0)
extern unsigned long __must_check __arch_copy_from_user(void *to, const void __user *from, unsigned long n);
@@ -381,6 +410,51 @@ extern unsigned long __must_check __arch_copy_to_user(void __user *to, const voi
__actu_ret; \
})
static __must_check __always_inline bool user_access_begin(const void __user *ptr, size_t len)
{
if (unlikely(!access_ok(ptr,len)))
return 0;
uaccess_ttbr0_enable();
return 1;
}
#define user_access_begin(a,b) user_access_begin(a,b)
#define user_access_end() uaccess_ttbr0_disable()
#define unsafe_put_user(x, ptr, label) \
__raw_put_mem("sttr", x, uaccess_mask_ptr(ptr), label, U)
#define unsafe_get_user(x, ptr, label) \
__raw_get_mem("ldtr", x, uaccess_mask_ptr(ptr), label, U)
/*
* KCSAN uses these to save and restore ttbr state.
* We do not support KCSAN with ARM64_SW_TTBR0_PAN, so
* they are no-ops.
*/
static inline unsigned long user_access_save(void) { return 0; }
static inline void user_access_restore(unsigned long enabled) { }
/*
* We want the unsafe accessors to always be inlined and use
* the error labels - thus the macro games.
*/
#define unsafe_copy_loop(dst, src, len, type, label) \
while (len >= sizeof(type)) { \
unsafe_put_user(*(type *)(src),(type __user *)(dst),label); \
dst += sizeof(type); \
src += sizeof(type); \
len -= sizeof(type); \
}
#define unsafe_copy_to_user(_dst,_src,_len,label) \
do { \
char __user *__ucu_dst = (_dst); \
const char *__ucu_src = (_src); \
size_t __ucu_len = (_len); \
unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
} while (0)
#define INLINE_COPY_TO_USER
#define INLINE_COPY_FROM_USER

View File

@@ -462,6 +462,9 @@ static int run_all_insn_set_hw_mode(unsigned int cpu)
for (int i = 0; i < ARRAY_SIZE(insn_emulations); i++) {
struct insn_emulation *insn = insn_emulations[i];
bool enable = READ_ONCE(insn->current_mode) == INSN_HW;
if (insn->status == INSN_UNAVAILABLE)
continue;
if (insn->set_hw_mode && insn->set_hw_mode(enable)) {
pr_warn("CPU[%u] cannot support the emulation of %s",
cpu, insn->name);

View File

@@ -582,12 +582,9 @@ subsys_initcall(register_mte_tcf_preferred_sysctl);
size_t mte_probe_user_range(const char __user *uaddr, size_t size)
{
const char __user *end = uaddr + size;
int err = 0;
char val;
__raw_get_user(val, uaddr, err);
if (err)
return size;
__raw_get_user(val, uaddr, efault);
uaddr = PTR_ALIGN(uaddr, MTE_GRANULE_SIZE);
while (uaddr < end) {
@@ -595,12 +592,13 @@ size_t mte_probe_user_range(const char __user *uaddr, size_t size)
* A read is sufficient for mte, the caller should have probed
* for the pte write permission if required.
*/
__raw_get_user(val, uaddr, err);
if (err)
return end - uaddr;
__raw_get_user(val, uaddr, efault);
uaddr += MTE_GRANULE_SIZE;
}
(void)val;
return 0;
efault:
return end - uaddr;
}

View File

@@ -1931,6 +1931,11 @@ static unsigned long nvhe_percpu_order(void)
return size ? get_order(size) : 0;
}
static size_t pkvm_host_sve_state_order(void)
{
return get_order(pkvm_host_sve_state_size());
}
/* A lookup table holding the hypervisor VA for each vector slot */
static void *hyp_spectre_vector_selector[BP_HARDEN_EL2_SLOTS];
@@ -2310,12 +2315,20 @@ static void __init teardown_subsystems(void)
static void __init teardown_hyp_mode(void)
{
bool free_sve = system_supports_sve() && is_protected_kvm_enabled();
int cpu;
free_hyp_pgds();
for_each_possible_cpu(cpu) {
free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
if (free_sve) {
struct cpu_sve_state *sve_state;
sve_state = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state;
free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());
}
}
}
@@ -2398,6 +2411,58 @@ static int __init kvm_hyp_init_protection(u32 hyp_va_bits)
return 0;
}
static int init_pkvm_host_sve_state(void)
{
int cpu;
if (!system_supports_sve())
return 0;
/* Allocate pages for host sve state in protected mode. */
for_each_possible_cpu(cpu) {
struct page *page = alloc_pages(GFP_KERNEL, pkvm_host_sve_state_order());
if (!page)
return -ENOMEM;
per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state = page_address(page);
}
/*
* Don't map the pages in hyp since these are only used in protected
* mode, which will (re)create its own mapping when initialized.
*/
return 0;
}
/*
* Finalizes the initialization of hyp mode, once everything else is initialized
* and the initialziation process cannot fail.
*/
static void finalize_init_hyp_mode(void)
{
int cpu;
if (system_supports_sve() && is_protected_kvm_enabled()) {
for_each_possible_cpu(cpu) {
struct cpu_sve_state *sve_state;
sve_state = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state;
per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state =
kern_hyp_va(sve_state);
}
} else {
for_each_possible_cpu(cpu) {
struct user_fpsimd_state *fpsimd_state;
fpsimd_state = &per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->host_ctxt.fp_regs;
per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->fpsimd_state =
kern_hyp_va(fpsimd_state);
}
}
}
static void pkvm_hyp_init_ptrauth(void)
{
struct kvm_cpu_context *hyp_ctxt;
@@ -2566,6 +2631,10 @@ static int __init init_hyp_mode(void)
goto out_err;
}
err = init_pkvm_host_sve_state();
if (err)
goto out_err;
err = kvm_hyp_init_protection(hyp_va_bits);
if (err) {
kvm_err("Failed to init hyp memory protection\n");
@@ -2730,6 +2799,13 @@ static __init int kvm_arm_init(void)
if (err)
goto out_subs;
/*
* This should be called after initialization is done and failure isn't
* possible anymore.
*/
if (!in_hyp_mode)
finalize_init_hyp_mode();
kvm_arm_initialised = true;
return 0;

View File

@@ -2181,16 +2181,23 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
if (forward_traps(vcpu, HCR_NV))
return;
spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
/* Check for an ERETAx */
esr = kvm_vcpu_get_esr(vcpu);
if (esr_iss_is_eretax(esr) && !kvm_auth_eretax(vcpu, &elr)) {
/*
* Oh no, ERETAx failed to authenticate. If we have
* FPACCOMBINE, deliver an exception right away. If we
* don't, then let the mangled ELR value trickle down the
* Oh no, ERETAx failed to authenticate.
*
* If we have FPACCOMBINE and we don't have a pending
* Illegal Execution State exception (which has priority
* over FPAC), deliver an exception right away.
*
* Otherwise, let the mangled ELR value trickle down the
* ERET handling, and the guest will have a little surprise.
*/
if (kvm_has_pauth(vcpu->kvm, FPACCOMBINE)) {
if (kvm_has_pauth(vcpu->kvm, FPACCOMBINE) && !(spsr & PSR_IL_BIT)) {
esr &= ESR_ELx_ERET_ISS_ERETA;
esr |= FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_FPAC);
kvm_inject_nested_sync(vcpu, esr);
@@ -2201,17 +2208,11 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
preempt_disable();
kvm_arch_vcpu_put(vcpu);
spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
if (!esr_iss_is_eretax(esr))
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
trace_kvm_nested_eret(vcpu, elr, spsr);
/*
* Note that the current exception level is always the virtual EL2,
* since we set HCR_EL2.NV bit only when entering the virtual EL2.
*/
*vcpu_pc(vcpu) = elr;
*vcpu_cpsr(vcpu) = spsr;

View File

@@ -90,6 +90,13 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
fpsimd_save_and_flush_cpu_state();
}
}
/*
* If normal guests gain SME support, maintain this behavior for pKVM
* guests, which don't support SME.
*/
WARN_ON(is_protected_kvm_enabled() && system_supports_sme() &&
read_sysreg_s(SYS_SVCR));
}
/*
@@ -161,9 +168,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
if (has_vhe() && system_supports_sme()) {
/* Also restore EL0 state seen on entry */
if (vcpu_get_flag(vcpu, HOST_SME_ENABLED))
sysreg_clear_set(CPACR_EL1, 0,
CPACR_EL1_SMEN_EL0EN |
CPACR_EL1_SMEN_EL1EN);
sysreg_clear_set(CPACR_EL1, 0, CPACR_ELx_SMEN);
else
sysreg_clear_set(CPACR_EL1,
CPACR_EL1_SMEN_EL0EN,

View File

@@ -251,6 +251,7 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case PSR_AA32_MODE_SVC:
case PSR_AA32_MODE_ABT:
case PSR_AA32_MODE_UND:
case PSR_AA32_MODE_SYS:
if (!vcpu_el1_is_32bit(vcpu))
return -EINVAL;
break;
@@ -276,7 +277,7 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if (*vcpu_cpsr(vcpu) & PSR_MODE32_BIT) {
int i, nr_reg;
switch (*vcpu_cpsr(vcpu)) {
switch (*vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK) {
/*
* Either we are dealing with user mode, and only the
* first 15 registers (+ PC) must be narrowed to 32bit.

View File

@@ -50,9 +50,23 @@ bool kvm_condition_valid32(const struct kvm_vcpu *vcpu)
u32 cpsr_cond;
int cond;
/* Top two bits non-zero? Unconditional. */
if (kvm_vcpu_get_esr(vcpu) >> 30)
/*
* These are the exception classes that could fire with a
* conditional instruction.
*/
switch (kvm_vcpu_trap_get_class(vcpu)) {
case ESR_ELx_EC_CP15_32:
case ESR_ELx_EC_CP15_64:
case ESR_ELx_EC_CP14_MR:
case ESR_ELx_EC_CP14_LS:
case ESR_ELx_EC_FP_ASIMD:
case ESR_ELx_EC_CP10_ID:
case ESR_ELx_EC_CP14_64:
case ESR_ELx_EC_SVC32:
break;
default:
return true;
}
/* Is condition field valid? */
cond = kvm_vcpu_get_condition(vcpu);

View File

@@ -25,3 +25,9 @@ SYM_FUNC_START(__sve_restore_state)
sve_load 0, x1, x2, 3
ret
SYM_FUNC_END(__sve_restore_state)
SYM_FUNC_START(__sve_save_state)
mov x2, #1
sve_save 0, x1, x2, 3
ret
SYM_FUNC_END(__sve_save_state)

View File

@@ -316,10 +316,24 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
{
sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
__sve_restore_state(vcpu_sve_pffr(vcpu),
&vcpu->arch.ctxt.fp_regs.fpsr);
&vcpu->arch.ctxt.fp_regs.fpsr,
true);
write_sysreg_el1(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR);
}
static inline void __hyp_sve_save_host(void)
{
struct cpu_sve_state *sve_state = *host_data_ptr(sve_state);
sve_state->zcr_el1 = read_sysreg_el1(SYS_ZCR);
write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
__sve_save_state(sve_state->sve_regs + sve_ffr_offset(kvm_host_sve_max_vl),
&sve_state->fpsr,
true);
}
static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu);
/*
* We trap the first access to the FP/SIMD to save the host context and
* restore the guest context lazily.
@@ -330,7 +344,6 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
{
bool sve_guest;
u8 esr_ec;
u64 reg;
if (!system_supports_fpsimd())
return false;
@@ -353,24 +366,15 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
/* Valid trap. Switch the context: */
/* First disable enough traps to allow us to update the registers */
if (has_vhe() || has_hvhe()) {
reg = CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN;
if (sve_guest)
reg |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
sysreg_clear_set(cpacr_el1, 0, reg);
} else {
reg = CPTR_EL2_TFP;
if (sve_guest)
reg |= CPTR_EL2_TZ;
sysreg_clear_set(cptr_el2, reg, 0);
}
if (sve_guest || (is_protected_kvm_enabled() && system_supports_sve()))
cpacr_clear_set(0, CPACR_ELx_FPEN | CPACR_ELx_ZEN);
else
cpacr_clear_set(0, CPACR_ELx_FPEN);
isb();
/* Write out the host state if it's in the registers */
if (host_owns_fp_regs())
__fpsimd_save_state(*host_data_ptr(fpsimd_state));
kvm_hyp_save_fpsimd_host(vcpu);
/* Restore the guest state */
if (sve_guest)

View File

@@ -59,7 +59,6 @@ static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
}
void pkvm_hyp_vm_table_init(void *tbl);
void pkvm_host_fpsimd_state_init(void);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);

View File

@@ -23,20 +23,80 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu)
{
__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
/*
* On saving/restoring guest sve state, always use the maximum VL for
* the guest. The layout of the data when saving the sve state depends
* on the VL, so use a consistent (i.e., the maximum) guest VL.
*/
sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
__sve_save_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr, true);
write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
}
static void __hyp_sve_restore_host(void)
{
struct cpu_sve_state *sve_state = *host_data_ptr(sve_state);
/*
* On saving/restoring host sve state, always use the maximum VL for
* the host. The layout of the data when saving the sve state depends
* on the VL, so use a consistent (i.e., the maximum) host VL.
*
* Setting ZCR_EL2 to ZCR_ELx_LEN_MASK sets the effective length
* supported by the system (or limited at EL3).
*/
write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
__sve_restore_state(sve_state->sve_regs + sve_ffr_offset(kvm_host_sve_max_vl),
&sve_state->fpsr,
true);
write_sysreg_el1(sve_state->zcr_el1, SYS_ZCR);
}
static void fpsimd_sve_flush(void)
{
*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
}
static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
{
if (!guest_owns_fp_regs())
return;
cpacr_clear_set(0, CPACR_ELx_FPEN | CPACR_ELx_ZEN);
isb();
if (vcpu_has_sve(vcpu))
__hyp_sve_save_guest(vcpu);
else
__fpsimd_save_state(&vcpu->arch.ctxt.fp_regs);
if (system_supports_sve())
__hyp_sve_restore_host();
else
__fpsimd_restore_state(*host_data_ptr(fpsimd_state));
*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
}
static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
{
struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
fpsimd_sve_flush();
hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
hyp_vcpu->vcpu.arch.sve_state = kern_hyp_va(host_vcpu->arch.sve_state);
hyp_vcpu->vcpu.arch.sve_max_vl = host_vcpu->arch.sve_max_vl;
/* Limit guest vector length to the maximum supported by the host. */
hyp_vcpu->vcpu.arch.sve_max_vl = min(host_vcpu->arch.sve_max_vl, kvm_host_sve_max_vl);
hyp_vcpu->vcpu.arch.hw_mmu = host_vcpu->arch.hw_mmu;
hyp_vcpu->vcpu.arch.hcr_el2 = host_vcpu->arch.hcr_el2;
hyp_vcpu->vcpu.arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
hyp_vcpu->vcpu.arch.cptr_el2 = host_vcpu->arch.cptr_el2;
hyp_vcpu->vcpu.arch.iflags = host_vcpu->arch.iflags;
@@ -54,10 +114,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
unsigned int i;
fpsimd_sve_sync(&hyp_vcpu->vcpu);
host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
host_vcpu->arch.hcr_el2 = hyp_vcpu->vcpu.arch.hcr_el2;
host_vcpu->arch.cptr_el2 = hyp_vcpu->vcpu.arch.cptr_el2;
host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
@@ -79,6 +140,17 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
struct kvm *host_kvm;
/*
* KVM (and pKVM) doesn't support SME guests for now, and
* ensures that SME features aren't enabled in pstate when
* loading a vcpu. Therefore, if SME features enabled the host
* is misbehaving.
*/
if (unlikely(system_supports_sme() && read_sysreg_s(SYS_SVCR))) {
ret = -EINVAL;
goto out;
}
host_kvm = kern_hyp_va(host_vcpu->kvm);
hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
host_vcpu->vcpu_idx);
@@ -405,11 +477,7 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
handle_host_smc(host_ctxt);
break;
case ESR_ELx_EC_SVE:
if (has_hvhe())
sysreg_clear_set(cpacr_el1, 0, (CPACR_EL1_ZEN_EL1EN |
CPACR_EL1_ZEN_EL0EN));
else
sysreg_clear_set(cptr_el2, CPTR_EL2_TZ, 0);
cpacr_clear_set(0, CPACR_ELx_ZEN);
isb();
sve_cond_update_zcr_vq(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
break;

View File

@@ -18,6 +18,8 @@ unsigned long __icache_flags;
/* Used by kvm_get_vttbr(). */
unsigned int kvm_arm_vmid_bits;
unsigned int kvm_host_sve_max_vl;
/*
* Set trap register values based on features in ID_AA64PFR0.
*/
@@ -63,7 +65,7 @@ static void pvm_init_traps_aa64pfr0(struct kvm_vcpu *vcpu)
/* Trap SVE */
if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_SVE), feature_ids)) {
if (has_hvhe())
cptr_clear |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
cptr_clear |= CPACR_ELx_ZEN;
else
cptr_set |= CPTR_EL2_TZ;
}
@@ -247,17 +249,6 @@ void pkvm_hyp_vm_table_init(void *tbl)
vm_table = tbl;
}
void pkvm_host_fpsimd_state_init(void)
{
unsigned long i;
for (i = 0; i < hyp_nr_cpus; i++) {
struct kvm_host_data *host_data = per_cpu_ptr(&kvm_host_data, i);
host_data->fpsimd_state = &host_data->host_ctxt.fp_regs;
}
}
/*
* Return the hyp vm structure corresponding to the handle.
*/
@@ -586,6 +577,8 @@ unlock:
if (ret)
unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
hyp_vcpu->vcpu.arch.cptr_el2 = kvm_get_reset_cptr_el2(&hyp_vcpu->vcpu);
return ret;
}

View File

@@ -67,6 +67,28 @@ static int divide_memory_pool(void *virt, unsigned long size)
return 0;
}
static int pkvm_create_host_sve_mappings(void)
{
void *start, *end;
int ret, i;
if (!system_supports_sve())
return 0;
for (i = 0; i < hyp_nr_cpus; i++) {
struct kvm_host_data *host_data = per_cpu_ptr(&kvm_host_data, i);
struct cpu_sve_state *sve_state = host_data->sve_state;
start = kern_hyp_va(sve_state);
end = start + PAGE_ALIGN(pkvm_host_sve_state_size());
ret = pkvm_create_mappings(start, end, PAGE_HYP);
if (ret)
return ret;
}
return 0;
}
static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
unsigned long *per_cpu_base,
u32 hyp_va_bits)
@@ -125,6 +147,8 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
return ret;
}
pkvm_create_host_sve_mappings();
/*
* Map the host sections RO in the hypervisor, but transfer the
* ownership from the host to the hypervisor itself to make sure they
@@ -300,7 +324,6 @@ void __noreturn __pkvm_init_finalise(void)
goto out;
pkvm_hyp_vm_table_init(vm_table_base);
pkvm_host_fpsimd_state_init();
out:
/*
* We tail-called to here from handle___pkvm_init() and will not return,

View File

@@ -48,15 +48,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= has_hvhe() ? CPACR_EL1_TTA : CPTR_EL2_TTA;
if (cpus_have_final_cap(ARM64_SME)) {
if (has_hvhe())
val &= ~(CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN);
val &= ~CPACR_ELx_SMEN;
else
val |= CPTR_EL2_TSM;
}
if (!guest_owns_fp_regs()) {
if (has_hvhe())
val &= ~(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |
CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN);
val &= ~(CPACR_ELx_FPEN | CPACR_ELx_ZEN);
else
val |= CPTR_EL2_TFP | CPTR_EL2_TZ;
@@ -182,6 +181,25 @@ static bool kvm_handle_pvm_sys64(struct kvm_vcpu *vcpu, u64 *exit_code)
kvm_handle_pvm_sysreg(vcpu, exit_code));
}
static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu)
{
/*
* Non-protected kvm relies on the host restoring its sve state.
* Protected kvm restores the host's sve state as not to reveal that
* fpsimd was used by a guest nor leak upper sve bits.
*/
if (unlikely(is_protected_kvm_enabled() && system_supports_sve())) {
__hyp_sve_save_host();
/* Re-enable SVE traps if not supported for the guest vcpu. */
if (!vcpu_has_sve(vcpu))
cpacr_clear_set(CPACR_ELx_ZEN, 0);
} else {
__fpsimd_save_state(*host_data_ptr(fpsimd_state));
}
}
static const exit_handler_fn hyp_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = NULL,
[ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32,

View File

@@ -93,8 +93,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1);
val |= CPACR_ELx_TTA;
val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN |
CPACR_EL1_SMEN_EL0EN | CPACR_EL1_SMEN_EL1EN);
val &= ~(CPACR_ELx_ZEN | CPACR_ELx_SMEN);
/*
* With VHE (HCR.E2H == 1), accesses to CPACR_EL1 are routed to
@@ -109,9 +108,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu))
val |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
val |= CPACR_ELx_ZEN;
} else {
val &= ~(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
val &= ~CPACR_ELx_FPEN;
__activate_traps_fpsimd32(vcpu);
}
@@ -262,6 +261,11 @@ static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
return true;
}
static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu)
{
__fpsimd_save_state(*host_data_ptr(fpsimd_state));
}
static const exit_handler_fn hyp_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = NULL,
[ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32,

View File

@@ -58,8 +58,10 @@ static u64 limit_nv_id_reg(u32 id, u64 val)
break;
case SYS_ID_AA64PFR1_EL1:
/* Only support SSBS */
val &= NV_FTR(PFR1, SSBS);
/* Only support BTI, SSBS, CSV2_frac */
val &= (NV_FTR(PFR1, BT) |
NV_FTR(PFR1, SSBS) |
NV_FTR(PFR1, CSV2_frac));
break;
case SYS_ID_AA64MMFR0_EL1:

View File

@@ -32,6 +32,7 @@
/* Maximum phys_shift supported for any VM on this host */
static u32 __ro_after_init kvm_ipa_limit;
unsigned int __ro_after_init kvm_host_sve_max_vl;
/*
* ARMv8 Reset Values
@@ -51,6 +52,8 @@ int __init kvm_arm_init_sve(void)
{
if (system_supports_sve()) {
kvm_sve_max_vl = sve_max_virtualisable_vl();
kvm_host_sve_max_vl = sve_max_vl();
kvm_nvhe_sym(kvm_host_sve_max_vl) = kvm_host_sve_max_vl;
/*
* The get_sve_reg()/set_sve_reg() ioctl interface will need

View File

@@ -376,7 +376,7 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
* clearing access/dirty for the whole block.
*/
unsigned long start = addr;
unsigned long end = start + nr;
unsigned long end = start + nr * PAGE_SIZE;
if (pte_cont(__ptep_get(ptep + nr - 1)))
end = ALIGN(end, CONT_PTE_SIZE);
@@ -386,7 +386,7 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
ptep = contpte_align_down(ptep);
}
__clear_young_dirty_ptes(vma, start, ptep, end - start, flags);
__clear_young_dirty_ptes(vma, start, ptep, (end - start) / PAGE_SIZE, flags);
}
EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);

View File

@@ -44,14 +44,14 @@
&gmac0 {
status = "okay";
phy-mode = "rgmii";
phy-mode = "rgmii-id";
bus_id = <0x0>;
};
&gmac1 {
status = "okay";
phy-mode = "rgmii";
phy-mode = "rgmii-id";
bus_id = <0x1>;
};

View File

@@ -43,7 +43,7 @@
&gmac0 {
status = "okay";
phy-mode = "rgmii";
phy-mode = "rgmii-id";
phy-handle = <&phy0>;
mdio {
compatible = "snps,dwmac-mdio";
@@ -58,7 +58,7 @@
&gmac1 {
status = "okay";
phy-mode = "rgmii";
phy-mode = "rgmii-id";
phy-handle = <&phy1>;
mdio {
compatible = "snps,dwmac-mdio";

View File

@@ -92,7 +92,7 @@
&gmac2 {
status = "okay";
phy-mode = "rgmii";
phy-mode = "rgmii-id";
phy-handle = <&phy2>;
mdio {
compatible = "snps,dwmac-mdio";

View File

@@ -56,6 +56,7 @@ extern int early_cpu_to_node(int cpu);
static inline void early_numa_add_cpu(int cpuid, s16 node) { }
static inline void numa_add_cpu(unsigned int cpu) { }
static inline void numa_remove_cpu(unsigned int cpu) { }
static inline void set_cpuid_to_node(int cpuid, s16 node) { }
static inline int early_cpu_to_node(int cpu)
{

View File

@@ -42,7 +42,7 @@
.macro JUMP_VIRT_ADDR temp1 temp2
li.d \temp1, CACHE_BASE
pcaddi \temp2, 0
or \temp1, \temp1, \temp2
bstrins.d \temp1, \temp2, (DMW_PABITS - 1), 0
jirl zero, \temp1, 0xc
.endm

View File

@@ -22,7 +22,7 @@
_head:
.word MZ_MAGIC /* "MZ", MS-DOS header */
.org 0x8
.dword kernel_entry /* Kernel entry point */
.dword _kernel_entry /* Kernel entry point (physical address) */
.dword _kernel_asize /* Kernel image effective size */
.quad PHYS_LINK_KADDR /* Kernel image load offset from start of RAM */
.org 0x38 /* 0x20 ~ 0x37 reserved */

View File

@@ -282,7 +282,7 @@ static void __init fdt_setup(void)
return;
/* Prefer to use built-in dtb, checking its legality first. */
if (!fdt_check_header(__dtb_start))
if (IS_ENABLED(CONFIG_BUILTIN_DTB) && !fdt_check_header(__dtb_start))
fdt_pointer = __dtb_start;
else
fdt_pointer = efi_fdt_pointer(); /* Fallback to firmware dtb */
@@ -351,10 +351,8 @@ void __init platform_init(void)
arch_reserve_vmcore();
arch_reserve_crashkernel();
#ifdef CONFIG_ACPI_TABLE_UPGRADE
acpi_table_upgrade();
#endif
#ifdef CONFIG_ACPI
acpi_table_upgrade();
acpi_gbl_use_default_register_widths = false;
acpi_boot_table_init();
#endif

View File

@@ -273,7 +273,6 @@ static void __init fdt_smp_setup(void)
if (cpuid == loongson_sysconf.boot_cpu_id) {
cpu = 0;
numa_add_cpu(cpu);
} else {
cpu = cpumask_next_zero(-1, cpu_present_mask);
}
@@ -283,6 +282,9 @@ static void __init fdt_smp_setup(void)
set_cpu_present(cpu, true);
__cpu_number_map[cpuid] = cpu;
__cpu_logical_map[cpu] = cpuid;
early_numa_add_cpu(cpu, 0);
set_cpuid_to_node(cpuid, 0);
}
loongson_sysconf.nr_cpus = num_processors;
@@ -468,6 +470,7 @@ void smp_prepare_boot_cpu(void)
set_cpu_possible(0, true);
set_cpu_online(0, true);
set_my_cpu_offset(per_cpu_offset(0));
numa_add_cpu(0);
rr_node = first_node(node_online_map);
for_each_possible_cpu(cpu) {

View File

@@ -6,6 +6,7 @@
#define PAGE_SIZE _PAGE_SIZE
#define RO_EXCEPTION_TABLE_ALIGN 4
#define PHYSADDR_MASK 0xffffffffffff /* 48-bit */
/*
* Put .bss..swapper_pg_dir as the first thing in .bss. This will
@@ -142,10 +143,11 @@ SECTIONS
#ifdef CONFIG_EFI_STUB
/* header symbols */
_kernel_asize = _end - _text;
_kernel_fsize = _edata - _text;
_kernel_vsize = _end - __initdata_begin;
_kernel_rsize = _edata - __initdata_begin;
_kernel_entry = ABSOLUTE(kernel_entry & PHYSADDR_MASK);
_kernel_asize = ABSOLUTE(_end - _text);
_kernel_fsize = ABSOLUTE(_edata - _text);
_kernel_vsize = ABSOLUTE(_end - __initdata_begin);
_kernel_rsize = ABSOLUTE(_edata - __initdata_begin);
#endif
.gptab.sdata : {

View File

@@ -31,18 +31,17 @@ void flush_cache_all_local(void);
void flush_cache_all(void);
void flush_cache_mm(struct mm_struct *mm);
void flush_kernel_dcache_page_addr(const void *addr);
#define flush_kernel_dcache_range(start,size) \
flush_kernel_dcache_range_asm((start), (start)+(size));
/* The only way to flush a vmap range is to flush whole cache */
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
void flush_kernel_vmap_range(void *vaddr, int size);
void invalidate_kernel_vmap_range(void *vaddr, int size);
#define flush_cache_vmap(start, end) flush_cache_all()
void flush_cache_vmap(unsigned long start, unsigned long end);
#define flush_cache_vmap_early(start, end) do { } while (0)
#define flush_cache_vunmap(start, end) flush_cache_all()
void flush_cache_vunmap(unsigned long start, unsigned long end);
void flush_dcache_folio(struct folio *folio);
#define flush_dcache_folio flush_dcache_folio
@@ -77,17 +76,11 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
void flush_cache_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
/* defined in pacache.S exported in cache.c used by flush_anon_page */
void flush_dcache_page_asm(unsigned long phys_addr, unsigned long vaddr);
#define ARCH_HAS_FLUSH_ANON_PAGE
void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr);
#define ARCH_HAS_FLUSH_ON_KUNMAP
static inline void kunmap_flush_on_unmap(const void *addr)
{
flush_kernel_dcache_page_addr(addr);
}
void kunmap_flush_on_unmap(const void *addr);
#endif /* _PARISC_CACHEFLUSH_H */

View File

@@ -448,14 +448,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return pte;
}
static inline pte_t ptep_get(pte_t *ptep)
{
return READ_ONCE(*ptep);
}
#define ptep_get ptep_get
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
{
pte_t pte;
if (!pte_young(*ptep))
return 0;
pte = *ptep;
pte = ptep_get(ptep);
if (!pte_young(pte)) {
return 0;
}
@@ -463,17 +466,10 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned
return 1;
}
int ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep);
pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep);
struct mm_struct;
static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
pte_t old_pte;
old_pte = *ptep;
set_pte(ptep, __pte(0));
return old_pte;
}
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
set_pte(ptep, pte_wrprotect(*ptep));
@@ -511,7 +507,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
#define __HAVE_ARCH_PTEP_CLEAR_FLUSH
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define __HAVE_ARCH_PTE_SAME

View File

@@ -20,6 +20,7 @@
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/syscalls.h>
#include <linux/vmalloc.h>
#include <asm/pdc.h>
#include <asm/cache.h>
#include <asm/cacheflush.h>
@@ -31,20 +32,31 @@
#include <asm/mmu_context.h>
#include <asm/cachectl.h>
#define PTR_PAGE_ALIGN_DOWN(addr) PTR_ALIGN_DOWN(addr, PAGE_SIZE)
/*
* When nonzero, use _PAGE_ACCESSED bit to try to reduce the number
* of page flushes done flush_cache_page_if_present. There are some
* pros and cons in using this option. It may increase the risk of
* random segmentation faults.
*/
#define CONFIG_FLUSH_PAGE_ACCESSED 0
int split_tlb __ro_after_init;
int dcache_stride __ro_after_init;
int icache_stride __ro_after_init;
EXPORT_SYMBOL(dcache_stride);
/* Internal implementation in arch/parisc/kernel/pacache.S */
void flush_dcache_page_asm(unsigned long phys_addr, unsigned long vaddr);
EXPORT_SYMBOL(flush_dcache_page_asm);
void purge_dcache_page_asm(unsigned long phys_addr, unsigned long vaddr);
void flush_icache_page_asm(unsigned long phys_addr, unsigned long vaddr);
/* Internal implementation in arch/parisc/kernel/pacache.S */
void flush_data_cache_local(void *); /* flushes local data-cache only */
void flush_instruction_cache_local(void); /* flushes local code-cache only */
static void flush_kernel_dcache_page_addr(const void *addr);
/* On some machines (i.e., ones with the Merced bus), there can be
* only a single PxTLB broadcast at a time; this must be guaranteed
* by software. We need a spinlock around all TLB flushes to ensure
@@ -321,6 +333,18 @@ __flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
{
if (!static_branch_likely(&parisc_has_cache))
return;
/*
* The TLB is the engine of coherence on parisc. The CPU is
* entitled to speculate any page with a TLB mapping, so here
* we kill the mapping then flush the page along a special flush
* only alias mapping. This guarantees that the page is no-longer
* in the cache for any process and nor may it be speculatively
* read in (until the user or kernel specifically accesses it,
* of course).
*/
flush_tlb_page(vma, vmaddr);
preempt_disable();
flush_dcache_page_asm(physaddr, vmaddr);
if (vma->vm_flags & VM_EXEC)
@@ -328,46 +352,44 @@ __flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
preempt_enable();
}
static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmaddr)
static void flush_kernel_dcache_page_addr(const void *addr)
{
unsigned long flags, space, pgd, prot;
#ifdef CONFIG_TLB_PTLOCK
unsigned long pgd_lock;
#endif
unsigned long vaddr = (unsigned long)addr;
unsigned long flags;
vmaddr &= PAGE_MASK;
/* Purge TLB entry to remove translation on all CPUs */
purge_tlb_start(flags);
pdtlb(SR_KERNEL, addr);
purge_tlb_end(flags);
/* Use tmpalias flush to prevent data cache move-in */
preempt_disable();
/* Set context for flush */
local_irq_save(flags);
prot = mfctl(8);
space = mfsp(SR_USER);
pgd = mfctl(25);
#ifdef CONFIG_TLB_PTLOCK
pgd_lock = mfctl(28);
#endif
switch_mm_irqs_off(NULL, vma->vm_mm, NULL);
local_irq_restore(flags);
flush_user_dcache_range_asm(vmaddr, vmaddr + PAGE_SIZE);
if (vma->vm_flags & VM_EXEC)
flush_user_icache_range_asm(vmaddr, vmaddr + PAGE_SIZE);
flush_tlb_page(vma, vmaddr);
/* Restore previous context */
local_irq_save(flags);
#ifdef CONFIG_TLB_PTLOCK
mtctl(pgd_lock, 28);
#endif
mtctl(pgd, 25);
mtsp(space, SR_USER);
mtctl(prot, 8);
local_irq_restore(flags);
flush_dcache_page_asm(__pa(vaddr), vaddr);
preempt_enable();
}
static void flush_kernel_icache_page_addr(const void *addr)
{
unsigned long vaddr = (unsigned long)addr;
unsigned long flags;
/* Purge TLB entry to remove translation on all CPUs */
purge_tlb_start(flags);
pdtlb(SR_KERNEL, addr);
purge_tlb_end(flags);
/* Use tmpalias flush to prevent instruction cache move-in */
preempt_disable();
flush_icache_page_asm(__pa(vaddr), vaddr);
preempt_enable();
}
void kunmap_flush_on_unmap(const void *addr)
{
flush_kernel_dcache_page_addr(addr);
}
EXPORT_SYMBOL(kunmap_flush_on_unmap);
void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
unsigned int nr)
{
@@ -375,13 +397,16 @@ void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
for (;;) {
flush_kernel_dcache_page_addr(kaddr);
flush_kernel_icache_page(kaddr);
flush_kernel_icache_page_addr(kaddr);
if (--nr == 0)
break;
kaddr += PAGE_SIZE;
}
}
/*
* Walk page directory for MM to find PTEP pointer for address ADDR.
*/
static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
{
pte_t *ptep = NULL;
@@ -410,6 +435,41 @@ static inline bool pte_needs_flush(pte_t pte)
== (_PAGE_PRESENT | _PAGE_ACCESSED);
}
/*
* Return user physical address. Returns 0 if page is not present.
*/
static inline unsigned long get_upa(struct mm_struct *mm, unsigned long addr)
{
unsigned long flags, space, pgd, prot, pa;
#ifdef CONFIG_TLB_PTLOCK
unsigned long pgd_lock;
#endif
/* Save context */
local_irq_save(flags);
prot = mfctl(8);
space = mfsp(SR_USER);
pgd = mfctl(25);
#ifdef CONFIG_TLB_PTLOCK
pgd_lock = mfctl(28);
#endif
/* Set context for lpa_user */
switch_mm_irqs_off(NULL, mm, NULL);
pa = lpa_user(addr);
/* Restore previous context */
#ifdef CONFIG_TLB_PTLOCK
mtctl(pgd_lock, 28);
#endif
mtctl(pgd, 25);
mtsp(space, SR_USER);
mtctl(prot, 8);
local_irq_restore(flags);
return pa;
}
void flush_dcache_folio(struct folio *folio)
{
struct address_space *mapping = folio_flush_mapping(folio);
@@ -458,50 +518,23 @@ void flush_dcache_folio(struct folio *folio)
if (addr + nr * PAGE_SIZE > vma->vm_end)
nr = (vma->vm_end - addr) / PAGE_SIZE;
if (parisc_requires_coherency()) {
for (i = 0; i < nr; i++) {
pte_t *ptep = get_ptep(vma->vm_mm,
addr + i * PAGE_SIZE);
if (!ptep)
continue;
if (pte_needs_flush(*ptep))
flush_user_cache_page(vma,
addr + i * PAGE_SIZE);
/* Optimise accesses to the same table? */
pte_unmap(ptep);
}
} else {
/*
* The TLB is the engine of coherence on parisc:
* The CPU is entitled to speculate any page
* with a TLB mapping, so here we kill the
* mapping then flush the page along a special
* flush only alias mapping. This guarantees that
* the page is no-longer in the cache for any
* process and nor may it be speculatively read
* in (until the user or kernel specifically
* accesses it, of course)
*/
for (i = 0; i < nr; i++)
flush_tlb_page(vma, addr + i * PAGE_SIZE);
if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
!= (addr & (SHM_COLOUR - 1))) {
for (i = 0; i < nr; i++)
__flush_cache_page(vma,
addr + i * PAGE_SIZE,
(pfn + i) * PAGE_SIZE);
/*
* Software is allowed to have any number
* of private mappings to a page.
*/
if (!(vma->vm_flags & VM_SHARED))
continue;
if (old_addr)
pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
old_addr, addr, vma->vm_file);
if (nr == folio_nr_pages(folio))
old_addr = addr;
}
for (i = 0; i < nr; i++)
__flush_cache_page(vma,
addr + i * PAGE_SIZE,
(pfn + i) * PAGE_SIZE);
/*
* Software is allowed to have any number
* of private mappings to a page.
*/
if (!(vma->vm_flags & VM_SHARED))
continue;
if (old_addr)
pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
old_addr, addr, vma->vm_file);
if (nr == folio_nr_pages(folio))
old_addr = addr;
}
WARN_ON(++count == 4096);
}
@@ -591,35 +624,28 @@ extern void purge_kernel_dcache_page_asm(unsigned long);
extern void clear_user_page_asm(void *, unsigned long);
extern void copy_user_page_asm(void *, void *, unsigned long);
void flush_kernel_dcache_page_addr(const void *addr)
{
unsigned long flags;
flush_kernel_dcache_page_asm(addr);
purge_tlb_start(flags);
pdtlb(SR_KERNEL, addr);
purge_tlb_end(flags);
}
EXPORT_SYMBOL(flush_kernel_dcache_page_addr);
static void flush_cache_page_if_present(struct vm_area_struct *vma,
unsigned long vmaddr, unsigned long pfn)
unsigned long vmaddr)
{
#if CONFIG_FLUSH_PAGE_ACCESSED
bool needs_flush = false;
pte_t *ptep;
pte_t *ptep, pte;
/*
* The pte check is racy and sometimes the flush will trigger
* a non-access TLB miss. Hopefully, the page has already been
* flushed.
*/
ptep = get_ptep(vma->vm_mm, vmaddr);
if (ptep) {
needs_flush = pte_needs_flush(*ptep);
pte = ptep_get(ptep);
needs_flush = pte_needs_flush(pte);
pte_unmap(ptep);
}
if (needs_flush)
flush_cache_page(vma, vmaddr, pfn);
__flush_cache_page(vma, vmaddr, PFN_PHYS(pte_pfn(pte)));
#else
struct mm_struct *mm = vma->vm_mm;
unsigned long physaddr = get_upa(mm, vmaddr);
if (physaddr)
__flush_cache_page(vma, vmaddr, PAGE_ALIGN_DOWN(physaddr));
#endif
}
void copy_user_highpage(struct page *to, struct page *from,
@@ -629,7 +655,7 @@ void copy_user_highpage(struct page *to, struct page *from,
kfrom = kmap_local_page(from);
kto = kmap_local_page(to);
flush_cache_page_if_present(vma, vaddr, page_to_pfn(from));
__flush_cache_page(vma, vaddr, PFN_PHYS(page_to_pfn(from)));
copy_page_asm(kto, kfrom);
kunmap_local(kto);
kunmap_local(kfrom);
@@ -638,16 +664,17 @@ void copy_user_highpage(struct page *to, struct page *from,
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long user_vaddr, void *dst, void *src, int len)
{
flush_cache_page_if_present(vma, user_vaddr, page_to_pfn(page));
__flush_cache_page(vma, user_vaddr, PFN_PHYS(page_to_pfn(page)));
memcpy(dst, src, len);
flush_kernel_dcache_range_asm((unsigned long)dst, (unsigned long)dst + len);
flush_kernel_dcache_page_addr(PTR_PAGE_ALIGN_DOWN(dst));
}
void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long user_vaddr, void *dst, void *src, int len)
{
flush_cache_page_if_present(vma, user_vaddr, page_to_pfn(page));
__flush_cache_page(vma, user_vaddr, PFN_PHYS(page_to_pfn(page)));
memcpy(dst, src, len);
flush_kernel_dcache_page_addr(PTR_PAGE_ALIGN_DOWN(src));
}
/* __flush_tlb_range()
@@ -681,32 +708,10 @@ int __flush_tlb_range(unsigned long sid, unsigned long start,
static void flush_cache_pages(struct vm_area_struct *vma, unsigned long start, unsigned long end)
{
unsigned long addr, pfn;
pte_t *ptep;
unsigned long addr;
for (addr = start; addr < end; addr += PAGE_SIZE) {
bool needs_flush = false;
/*
* The vma can contain pages that aren't present. Although
* the pte search is expensive, we need the pte to find the
* page pfn and to check whether the page should be flushed.
*/
ptep = get_ptep(vma->vm_mm, addr);
if (ptep) {
needs_flush = pte_needs_flush(*ptep);
pfn = pte_pfn(*ptep);
pte_unmap(ptep);
}
if (needs_flush) {
if (parisc_requires_coherency()) {
flush_user_cache_page(vma, addr);
} else {
if (WARN_ON(!pfn_valid(pfn)))
return;
__flush_cache_page(vma, addr, PFN_PHYS(pfn));
}
}
}
for (addr = start; addr < end; addr += PAGE_SIZE)
flush_cache_page_if_present(vma, addr);
}
static inline unsigned long mm_total_size(struct mm_struct *mm)
@@ -757,21 +762,19 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
if (WARN_ON(IS_ENABLED(CONFIG_SMP) && arch_irqs_disabled()))
return;
flush_tlb_range(vma, start, end);
flush_cache_all();
if (vma->vm_flags & VM_EXEC)
flush_cache_all();
else
flush_data_cache();
return;
}
flush_cache_pages(vma, start, end);
flush_cache_pages(vma, start & PAGE_MASK, end);
}
void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn)
{
if (WARN_ON(!pfn_valid(pfn)))
return;
if (parisc_requires_coherency())
flush_user_cache_page(vma, vmaddr);
else
__flush_cache_page(vma, vmaddr, PFN_PHYS(pfn));
__flush_cache_page(vma, vmaddr, PFN_PHYS(pfn));
}
void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
@@ -779,34 +782,133 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned lon
if (!PageAnon(page))
return;
if (parisc_requires_coherency()) {
if (vma->vm_flags & VM_SHARED)
flush_data_cache();
else
flush_user_cache_page(vma, vmaddr);
__flush_cache_page(vma, vmaddr, PFN_PHYS(page_to_pfn(page)));
}
int ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr,
pte_t *ptep)
{
pte_t pte = ptep_get(ptep);
if (!pte_young(pte))
return 0;
set_pte(ptep, pte_mkold(pte));
#if CONFIG_FLUSH_PAGE_ACCESSED
__flush_cache_page(vma, addr, PFN_PHYS(pte_pfn(pte)));
#endif
return 1;
}
/*
* After a PTE is cleared, we have no way to flush the cache for
* the physical page. On PA8800 and PA8900 processors, these lines
* can cause random cache corruption. Thus, we must flush the cache
* as well as the TLB when clearing a PTE that's valid.
*/
pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr,
pte_t *ptep)
{
struct mm_struct *mm = (vma)->vm_mm;
pte_t pte = ptep_get_and_clear(mm, addr, ptep);
unsigned long pfn = pte_pfn(pte);
if (pfn_valid(pfn))
__flush_cache_page(vma, addr, PFN_PHYS(pfn));
else if (pte_accessible(mm, pte))
flush_tlb_page(vma, addr);
return pte;
}
/*
* The physical address for pages in the ioremap case can be obtained
* from the vm_struct struct. I wasn't able to successfully handle the
* vmalloc and vmap cases. We have an array of struct page pointers in
* the uninitialized vmalloc case but the flush failed using page_to_pfn.
*/
void flush_cache_vmap(unsigned long start, unsigned long end)
{
unsigned long addr, physaddr;
struct vm_struct *vm;
/* Prevent cache move-in */
flush_tlb_kernel_range(start, end);
if (end - start >= parisc_cache_flush_threshold) {
flush_cache_all();
return;
}
flush_tlb_page(vma, vmaddr);
preempt_disable();
flush_dcache_page_asm(page_to_phys(page), vmaddr);
preempt_enable();
}
if (WARN_ON_ONCE(!is_vmalloc_addr((void *)start))) {
flush_cache_all();
return;
}
vm = find_vm_area((void *)start);
if (WARN_ON_ONCE(!vm)) {
flush_cache_all();
return;
}
/* The physical addresses of IOREMAP regions are contiguous */
if (vm->flags & VM_IOREMAP) {
physaddr = vm->phys_addr;
for (addr = start; addr < end; addr += PAGE_SIZE) {
preempt_disable();
flush_dcache_page_asm(physaddr, start);
flush_icache_page_asm(physaddr, start);
preempt_enable();
physaddr += PAGE_SIZE;
}
return;
}
flush_cache_all();
}
EXPORT_SYMBOL(flush_cache_vmap);
/*
* The vm_struct has been retired and the page table is set up. The
* last page in the range is a guard page. Its physical address can't
* be determined using lpa, so there is no way to flush the range
* using flush_dcache_page_asm.
*/
void flush_cache_vunmap(unsigned long start, unsigned long end)
{
/* Prevent cache move-in */
flush_tlb_kernel_range(start, end);
flush_data_cache();
}
EXPORT_SYMBOL(flush_cache_vunmap);
/*
* On systems with PA8800/PA8900 processors, there is no way to flush
* a vmap range other than using the architected loop to flush the
* entire cache. The page directory is not set up, so we can't use
* fdc, etc. FDCE/FICE don't work to flush a portion of the cache.
* L2 is physically indexed but FDCE/FICE instructions in virtual
* mode output their virtual address on the core bus, not their
* real address. As a result, the L2 cache index formed from the
* virtual address will most likely not be the same as the L2 index
* formed from the real address.
*/
void flush_kernel_vmap_range(void *vaddr, int size)
{
unsigned long start = (unsigned long)vaddr;
unsigned long end = start + size;
if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
(unsigned long)size >= parisc_cache_flush_threshold) {
flush_tlb_kernel_range(start, end);
flush_data_cache();
flush_tlb_kernel_range(start, end);
if (!static_branch_likely(&parisc_has_dcache))
return;
/* If interrupts are disabled, we can only do local flush */
if (WARN_ON(IS_ENABLED(CONFIG_SMP) && arch_irqs_disabled())) {
flush_data_cache_local(NULL);
return;
}
flush_kernel_dcache_range_asm(start, end);
flush_tlb_kernel_range(start, end);
flush_data_cache();
}
EXPORT_SYMBOL(flush_kernel_vmap_range);
@@ -818,15 +920,18 @@ void invalidate_kernel_vmap_range(void *vaddr, int size)
/* Ensure DMA is complete */
asm_syncdma();
if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
(unsigned long)size >= parisc_cache_flush_threshold) {
flush_tlb_kernel_range(start, end);
flush_data_cache();
flush_tlb_kernel_range(start, end);
if (!static_branch_likely(&parisc_has_dcache))
return;
/* If interrupts are disabled, we can only do local flush */
if (WARN_ON(IS_ENABLED(CONFIG_SMP) && arch_irqs_disabled())) {
flush_data_cache_local(NULL);
return;
}
purge_kernel_dcache_range_asm(start, end);
flush_tlb_kernel_range(start, end);
flush_data_cache();
}
EXPORT_SYMBOL(invalidate_kernel_vmap_range);

View File

@@ -137,7 +137,7 @@ config PPC
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_HUGEPD if HUGETLB_PAGE
select ARCH_HAS_KCOV
select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU
select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC64 && PPC_FPU
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU

View File

@@ -92,9 +92,25 @@ __pu_failed: \
: label)
#endif
#ifdef CONFIG_CC_IS_CLANG
#define DS_FORM_CONSTRAINT "Z<>"
#else
#define DS_FORM_CONSTRAINT "YZ<>"
#endif
#ifdef __powerpc64__
#ifdef CONFIG_PPC_KERNEL_PREFIXED
#define __put_user_asm2_goto(x, ptr, label) \
__put_user_asm_goto(x, ptr, label, "std")
#else
#define __put_user_asm2_goto(x, addr, label) \
asm goto ("1: std%U1%X1 %0,%1 # put_user\n" \
EX_TABLE(1b, %l2) \
: \
: "r" (x), DS_FORM_CONSTRAINT (*addr) \
: \
: label)
#endif // CONFIG_PPC_KERNEL_PREFIXED
#else /* __powerpc64__ */
#define __put_user_asm2_goto(x, addr, label) \
asm goto( \
@@ -165,8 +181,19 @@ do { \
#endif
#ifdef __powerpc64__
#ifdef CONFIG_PPC_KERNEL_PREFIXED
#define __get_user_asm2_goto(x, addr, label) \
__get_user_asm_goto(x, addr, label, "ld")
#else
#define __get_user_asm2_goto(x, addr, label) \
asm_goto_output( \
"1: ld%U1%X1 %0, %1 # get_user\n" \
EX_TABLE(1b, %l2) \
: "=r" (x) \
: DS_FORM_CONSTRAINT (*addr) \
: \
: label)
#endif // CONFIG_PPC_KERNEL_PREFIXED
#else /* __powerpc64__ */
#define __get_user_asm2_goto(x, addr, label) \
asm_goto_output( \

View File

@@ -900,6 +900,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
/* Get offset into TMP_REG */
EMIT(PPC_RAW_LI(tmp_reg, off));
/*
* Enforce full ordering for operations with BPF_FETCH by emitting a 'sync'
* before and after the operation.
*
* This is a requirement in the Linux Kernel Memory Model.
* See __cmpxchg_u32() in asm/cmpxchg.h as an example.
*/
if ((imm & BPF_FETCH) && IS_ENABLED(CONFIG_SMP))
EMIT(PPC_RAW_SYNC());
tmp_idx = ctx->idx * 4;
/* load value from memory into r0 */
EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
@@ -953,6 +962,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
/* For the BPF_FETCH variant, get old data into src_reg */
if (imm & BPF_FETCH) {
/* Emit 'sync' to enforce full ordering */
if (IS_ENABLED(CONFIG_SMP))
EMIT(PPC_RAW_SYNC());
EMIT(PPC_RAW_MR(ret_reg, ax_reg));
if (!fp->aux->verifier_zext)
EMIT(PPC_RAW_LI(ret_reg - 1, 0)); /* higher 32-bit */

View File

@@ -846,6 +846,15 @@ emit_clear:
/* Get offset into TMP_REG_1 */
EMIT(PPC_RAW_LI(tmp1_reg, off));
/*
* Enforce full ordering for operations with BPF_FETCH by emitting a 'sync'
* before and after the operation.
*
* This is a requirement in the Linux Kernel Memory Model.
* See __cmpxchg_u64() in asm/cmpxchg.h as an example.
*/
if ((imm & BPF_FETCH) && IS_ENABLED(CONFIG_SMP))
EMIT(PPC_RAW_SYNC());
tmp_idx = ctx->idx * 4;
/* load value from memory into TMP_REG_2 */
if (size == BPF_DW)
@@ -908,6 +917,9 @@ emit_clear:
PPC_BCC_SHORT(COND_NE, tmp_idx);
if (imm & BPF_FETCH) {
/* Emit 'sync' to enforce full ordering */
if (IS_ENABLED(CONFIG_SMP))
EMIT(PPC_RAW_SYNC());
EMIT(PPC_RAW_MR(ret_reg, _R0));
/*
* Skip unnecessary zero-extension for 32-bit cmpxchg.

View File

@@ -371,8 +371,8 @@ static int read_dt_lpar_name(struct seq_file *m)
static void read_lpar_name(struct seq_file *m)
{
if (read_rtas_lpar_name(m) && read_dt_lpar_name(m))
pr_err_once("Error can't get the LPAR name");
if (read_rtas_lpar_name(m))
read_dt_lpar_name(m);
}
#define SPLPAR_MAXLENGTH 1026*(sizeof(char))

View File

@@ -106,7 +106,7 @@ config RISCV
select HAS_IOPORT if MMU
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT && !XIP_KERNEL
select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE if !XIP_KERNEL
select HAVE_ARCH_KASAN if MMU && 64BIT

View File

@@ -10,7 +10,7 @@
#include <asm/fence.h>
#define __arch_xchg_masked(prepend, append, r, p, n) \
#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n) \
({ \
u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3); \
ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE; \
@@ -25,7 +25,7 @@
"0: lr.w %0, %2\n" \
" and %1, %0, %z4\n" \
" or %1, %1, %z3\n" \
" sc.w %1, %1, %2\n" \
" sc.w" sc_sfx " %1, %1, %2\n" \
" bnez %1, 0b\n" \
append \
: "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \
@@ -46,7 +46,8 @@
: "memory"); \
})
#define _arch_xchg(ptr, new, sfx, prepend, append) \
#define _arch_xchg(ptr, new, sc_sfx, swap_sfx, prepend, \
sc_append, swap_append) \
({ \
__typeof__(ptr) __ptr = (ptr); \
__typeof__(*(__ptr)) __new = (new); \
@@ -55,15 +56,15 @@
switch (sizeof(*__ptr)) { \
case 1: \
case 2: \
__arch_xchg_masked(prepend, append, \
__arch_xchg_masked(sc_sfx, prepend, sc_append, \
__ret, __ptr, __new); \
break; \
case 4: \
__arch_xchg(".w" sfx, prepend, append, \
__arch_xchg(".w" swap_sfx, prepend, swap_append, \
__ret, __ptr, __new); \
break; \
case 8: \
__arch_xchg(".d" sfx, prepend, append, \
__arch_xchg(".d" swap_sfx, prepend, swap_append, \
__ret, __ptr, __new); \
break; \
default: \
@@ -73,16 +74,17 @@
})
#define arch_xchg_relaxed(ptr, x) \
_arch_xchg(ptr, x, "", "", "")
_arch_xchg(ptr, x, "", "", "", "", "")
#define arch_xchg_acquire(ptr, x) \
_arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
_arch_xchg(ptr, x, "", "", "", \
RISCV_ACQUIRE_BARRIER, RISCV_ACQUIRE_BARRIER)
#define arch_xchg_release(ptr, x) \
_arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
_arch_xchg(ptr, x, "", "", RISCV_RELEASE_BARRIER, "", "")
#define arch_xchg(ptr, x) \
_arch_xchg(ptr, x, ".aqrl", "", "")
_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER, "")
#define xchg32(ptr, x) \
({ \

View File

@@ -72,7 +72,7 @@ static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)
/* Make sure tidle is updated */
smp_mb();
bdata->task_ptr = tidle;
bdata->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
bdata->stack_ptr = task_pt_regs(tidle);
/* Make sure boot data is updated */
smp_mb();
hsm_data = __pa(bdata);

View File

@@ -34,8 +34,7 @@ static void cpu_update_secondary_bootdata(unsigned int cpuid,
/* Make sure tidle is updated */
smp_mb();
WRITE_ONCE(__cpu_spinwait_stack_pointer[hartid],
task_stack_page(tidle) + THREAD_SIZE);
WRITE_ONCE(__cpu_spinwait_stack_pointer[hartid], task_pt_regs(tidle));
WRITE_ONCE(__cpu_spinwait_task_pointer[hartid], tidle);
}

View File

@@ -237,10 +237,11 @@ static gpa_t aia_imsic_ppn(struct kvm_aia *aia, gpa_t addr)
static u32 aia_imsic_hart_index(struct kvm_aia *aia, gpa_t addr)
{
u32 hart, group = 0;
u32 hart = 0, group = 0;
hart = (addr >> (aia->nr_guest_bits + IMSIC_MMIO_PAGE_SHIFT)) &
GENMASK_ULL(aia->nr_hart_bits - 1, 0);
if (aia->nr_hart_bits)
hart = (addr >> (aia->nr_guest_bits + IMSIC_MMIO_PAGE_SHIFT)) &
GENMASK_ULL(aia->nr_hart_bits - 1, 0);
if (aia->nr_group_bits)
group = (addr >> aia->nr_group_shift) &
GENMASK_ULL(aia->nr_group_bits - 1, 0);

View File

@@ -724,9 +724,9 @@ static int kvm_riscv_vcpu_set_reg_isa_ext(struct kvm_vcpu *vcpu,
switch (reg_subtype) {
case KVM_REG_RISCV_ISA_SINGLE:
return riscv_vcpu_set_isa_ext_single(vcpu, reg_num, reg_val);
case KVM_REG_RISCV_SBI_MULTI_EN:
case KVM_REG_RISCV_ISA_MULTI_EN:
return riscv_vcpu_set_isa_ext_multi(vcpu, reg_num, reg_val, true);
case KVM_REG_RISCV_SBI_MULTI_DIS:
case KVM_REG_RISCV_ISA_MULTI_DIS:
return riscv_vcpu_set_isa_ext_multi(vcpu, reg_num, reg_val, false);
default:
return -ENOENT;

View File

@@ -293,8 +293,8 @@ void handle_page_fault(struct pt_regs *regs)
if (unlikely(access_error(cause, vma))) {
vma_end_read(vma);
count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
tsk->thread.bad_cause = SEGV_ACCERR;
bad_area_nosemaphore(regs, code, addr);
tsk->thread.bad_cause = cause;
bad_area_nosemaphore(regs, SEGV_ACCERR, addr);
return;
}

View File

@@ -250,18 +250,19 @@ static void __init setup_bootmem(void)
kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
/*
* memblock allocator is not aware of the fact that last 4K bytes of
* the addressable memory can not be mapped because of IS_ERR_VALUE
* macro. Make sure that last 4k bytes are not usable by memblock
* if end of dram is equal to maximum addressable memory. For 64-bit
* kernel, this problem can't happen here as the end of the virtual
* address space is occupied by the kernel mapping then this check must
* be done as soon as the kernel mapping base address is determined.
* Reserve physical address space that would be mapped to virtual
* addresses greater than (void *)(-PAGE_SIZE) because:
* - This memory would overlap with ERR_PTR
* - This memory belongs to high memory, which is not supported
*
* This is not applicable to 64-bit kernel, because virtual addresses
* after (void *)(-PAGE_SIZE) are not linearly mapped: they are
* occupied by kernel mapping. Also it is unrealistic for high memory
* to exist on 64-bit platforms.
*/
if (!IS_ENABLED(CONFIG_64BIT)) {
max_mapped_addr = __pa(~(ulong)0);
if (max_mapped_addr == (phys_ram_end - 1))
memblock_set_current_limit(max_mapped_addr - 4096);
max_mapped_addr = __va_to_pa_nodebug(-PAGE_SIZE);
memblock_reserve(max_mapped_addr, (phys_addr_t)-max_mapped_addr);
}
min_low_pfn = PFN_UP(phys_ram_base);

View File

@@ -384,7 +384,7 @@ static void fixup_vmlinux_info(void)
void startup_kernel(void)
{
unsigned long kernel_size = vmlinux.image_size + vmlinux.bss_size;
unsigned long nokaslr_offset_phys = mem_safe_offset();
unsigned long nokaslr_offset_phys, kaslr_large_page_offset;
unsigned long amode31_lma = 0;
unsigned long max_physmem_end;
unsigned long asce_limit;
@@ -393,6 +393,12 @@ void startup_kernel(void)
fixup_vmlinux_info();
setup_lpp();
/*
* Non-randomized kernel physical start address must be _SEGMENT_SIZE
* aligned (see blow).
*/
nokaslr_offset_phys = ALIGN(mem_safe_offset(), _SEGMENT_SIZE);
safe_addr = PAGE_ALIGN(nokaslr_offset_phys + kernel_size);
/*
@@ -425,10 +431,25 @@ void startup_kernel(void)
save_ipl_cert_comp_list();
rescue_initrd(safe_addr, ident_map_size);
if (kaslr_enabled())
__kaslr_offset_phys = randomize_within_range(kernel_size, THREAD_SIZE, 0, ident_map_size);
/*
* __kaslr_offset_phys must be _SEGMENT_SIZE aligned, so the lower
* 20 bits (the offset within a large page) are zero. Copy the last
* 20 bits of __kaslr_offset, which is THREAD_SIZE aligned, to
* __kaslr_offset_phys.
*
* With this the last 20 bits of __kaslr_offset_phys and __kaslr_offset
* are identical, which is required to allow for large mappings of the
* kernel image.
*/
kaslr_large_page_offset = __kaslr_offset & ~_SEGMENT_MASK;
if (kaslr_enabled()) {
unsigned long end = ident_map_size - kaslr_large_page_offset;
__kaslr_offset_phys = randomize_within_range(kernel_size, _SEGMENT_SIZE, 0, end);
}
if (!__kaslr_offset_phys)
__kaslr_offset_phys = nokaslr_offset_phys;
__kaslr_offset_phys |= kaslr_large_page_offset;
kaslr_adjust_vmlinux_info(__kaslr_offset_phys);
physmem_reserve(RR_VMLINUX, __kaslr_offset_phys, kernel_size);
deploy_kernel((void *)__kaslr_offset_phys);

View File

@@ -261,21 +261,27 @@ static unsigned long _pa(unsigned long addr, unsigned long size, enum populate_m
static bool large_allowed(enum populate_mode mode)
{
return (mode == POPULATE_DIRECT) || (mode == POPULATE_IDENTITY);
return (mode == POPULATE_DIRECT) || (mode == POPULATE_IDENTITY) || (mode == POPULATE_KERNEL);
}
static bool can_large_pud(pud_t *pu_dir, unsigned long addr, unsigned long end,
enum populate_mode mode)
{
unsigned long size = end - addr;
return machine.has_edat2 && large_allowed(mode) &&
IS_ALIGNED(addr, PUD_SIZE) && (end - addr) >= PUD_SIZE;
IS_ALIGNED(addr, PUD_SIZE) && (size >= PUD_SIZE) &&
IS_ALIGNED(_pa(addr, size, mode), PUD_SIZE);
}
static bool can_large_pmd(pmd_t *pm_dir, unsigned long addr, unsigned long end,
enum populate_mode mode)
{
unsigned long size = end - addr;
return machine.has_edat1 && large_allowed(mode) &&
IS_ALIGNED(addr, PMD_SIZE) && (end - addr) >= PMD_SIZE;
IS_ALIGNED(addr, PMD_SIZE) && (size >= PMD_SIZE) &&
IS_ALIGNED(_pa(addr, size, mode), PMD_SIZE);
}
static void pgtable_pte_populate(pmd_t *pmd, unsigned long addr, unsigned long end,

View File

@@ -109,6 +109,7 @@ SECTIONS
#ifdef CONFIG_KERNEL_UNCOMPRESSED
. = ALIGN(PAGE_SIZE);
. += AMODE31_SIZE; /* .amode31 section */
. = ALIGN(1 << 20); /* _SEGMENT_SIZE */
#else
. = ALIGN(8);
#endif

View File

@@ -43,7 +43,6 @@ CONFIG_PROFILING=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_KEXEC_SIG=y
CONFIG_CRASH_DUMP=y
CONFIG_LIVEPATCH=y
CONFIG_MARCH_Z13=y
CONFIG_NR_CPUS=512
@@ -51,6 +50,7 @@ CONFIG_NUMA=y
CONFIG_HZ_100=y
CONFIG_CERT_STORE=y
CONFIG_EXPOLINE=y
# CONFIG_EXPOLINE_EXTERN is not set
CONFIG_EXPOLINE_AUTO=y
CONFIG_CHSC_SCH=y
CONFIG_VFIO_CCW=m
@@ -76,6 +76,7 @@ CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODULE_UNLOAD_TAINT_TRACKING=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG_SHA256=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y
CONFIG_BLK_CGROUP_IOLATENCY=y
@@ -100,7 +101,6 @@ CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_KSM=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_CMA_DEBUG=y
CONFIG_CMA_DEBUGFS=y
CONFIG_CMA_SYSFS=y
CONFIG_CMA_AREAS=7
@@ -119,6 +119,7 @@ CONFIG_UNIX_DIAG=m
CONFIG_XFRM_USER=m
CONFIG_NET_KEY=m
CONFIG_SMC_DIAG=m
CONFIG_SMC_LO=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
@@ -133,7 +134,6 @@ CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
@@ -167,6 +167,7 @@ CONFIG_BRIDGE_NETFILTER=m
CONFIG_NETFILTER_NETLINK_HOOK=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
@@ -183,17 +184,39 @@ CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
CONFIG_NFT_TUNNEL=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
CONFIG_NETFILTER_XTABLES_COMPAT=y
CONFIG_NFT_XFRM=m
CONFIG_NFT_SOCKET=m
CONFIG_NFT_OSF=m
CONFIG_NFT_TPROXY=m
CONFIG_NFT_SYNPROXY=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
CONFIG_NFT_REJECT_NETDEV=m
CONFIG_NF_FLOW_TABLE_INET=m
CONFIG_NF_FLOW_TABLE=m
CONFIG_NF_FLOW_TABLE_PROCFS=y
CONFIG_NETFILTER_XT_SET=m
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
@@ -206,8 +229,10 @@ CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
@@ -216,6 +241,7 @@ CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
@@ -230,6 +256,7 @@ CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_IPCOMP=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
@@ -247,6 +274,7 @@ CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
@@ -302,7 +330,6 @@ CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NFT_FIB_IPV6=m
@@ -373,7 +400,6 @@ CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
@@ -462,6 +488,7 @@ CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
CONFIG_DM_SWITCH=m
CONFIG_DM_INTEGRITY=m
CONFIG_DM_VDO=m
CONFIG_NETDEVICES=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
@@ -574,7 +601,6 @@ CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_SOFT_WATCHDOG=m
CONFIG_DIAG288_WATCHDOG=m
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set
CONFIG_FB=y
# CONFIG_FB_DEVICE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
@@ -645,7 +671,6 @@ CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_EXFAT_FS=m
CONFIG_NTFS_FS=m
CONFIG_NTFS_RW=y
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
@@ -663,6 +688,7 @@ CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_ZSTD=y
CONFIG_ROMFS_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y
@@ -879,6 +905,5 @@ CONFIG_RBTREE_TEST=y
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_STRING_SELFTEST=y
CONFIG_TEST_BITOPS=m
CONFIG_TEST_BPF=m

View File

@@ -41,7 +41,6 @@ CONFIG_PROFILING=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_KEXEC_SIG=y
CONFIG_CRASH_DUMP=y
CONFIG_LIVEPATCH=y
CONFIG_MARCH_Z13=y
CONFIG_NR_CPUS=512
@@ -49,6 +48,7 @@ CONFIG_NUMA=y
CONFIG_HZ_100=y
CONFIG_CERT_STORE=y
CONFIG_EXPOLINE=y
# CONFIG_EXPOLINE_EXTERN is not set
CONFIG_EXPOLINE_AUTO=y
CONFIG_CHSC_SCH=y
CONFIG_VFIO_CCW=m
@@ -71,6 +71,7 @@ CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODULE_UNLOAD_TAINT_TRACKING=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG_SHA256=y
CONFIG_BLK_DEV_THROTTLING=y
CONFIG_BLK_WBT=y
CONFIG_BLK_CGROUP_IOLATENCY=y
@@ -110,6 +111,7 @@ CONFIG_UNIX_DIAG=m
CONFIG_XFRM_USER=m
CONFIG_NET_KEY=m
CONFIG_SMC_DIAG=m
CONFIG_SMC_LO=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
@@ -124,7 +126,6 @@ CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
@@ -158,6 +159,7 @@ CONFIG_BRIDGE_NETFILTER=m
CONFIG_NETFILTER_NETLINK_HOOK=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
@@ -174,17 +176,39 @@ CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
CONFIG_NFT_TUNNEL=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
CONFIG_NETFILTER_XTABLES_COMPAT=y
CONFIG_NFT_XFRM=m
CONFIG_NFT_SOCKET=m
CONFIG_NFT_OSF=m
CONFIG_NFT_TPROXY=m
CONFIG_NFT_SYNPROXY=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
CONFIG_NFT_REJECT_NETDEV=m
CONFIG_NF_FLOW_TABLE_INET=m
CONFIG_NF_FLOW_TABLE=m
CONFIG_NF_FLOW_TABLE_PROCFS=y
CONFIG_NETFILTER_XT_SET=m
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
@@ -197,8 +221,10 @@ CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
@@ -207,6 +233,7 @@ CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
@@ -221,6 +248,7 @@ CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_IPCOMP=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
@@ -238,6 +266,7 @@ CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
@@ -293,7 +322,6 @@ CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NFT_FIB_IPV6=m
@@ -363,7 +391,6 @@ CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
@@ -452,6 +479,7 @@ CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
CONFIG_DM_SWITCH=m
CONFIG_DM_INTEGRITY=m
CONFIG_DM_VDO=m
CONFIG_NETDEVICES=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
@@ -630,7 +658,6 @@ CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_EXFAT_FS=m
CONFIG_NTFS_FS=m
CONFIG_NTFS_RW=y
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
@@ -649,6 +676,7 @@ CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_ZSTD=y
CONFIG_ROMFS_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y

View File

@@ -9,25 +9,22 @@ CONFIG_BPF_SYSCALL=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_MARCH_Z13=y
CONFIG_NR_CPUS=2
CONFIG_HZ_100=y
# CONFIG_CHSC_SCH is not set
# CONFIG_SCM_BUS is not set
# CONFIG_AP is not set
# CONFIG_PFAULT is not set
# CONFIG_S390_HYPFS is not set
# CONFIG_VIRTUALIZATION is not set
# CONFIG_S390_GUEST is not set
# CONFIG_SECCOMP is not set
# CONFIG_GCC_PLUGINS is not set
# CONFIG_BLOCK_LEGACY_AUTOLOAD is not set
CONFIG_PARTITION_ADVANCED=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_SWAP is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_COMPACTION is not set
# CONFIG_MIGRATION is not set
CONFIG_NET=y
# CONFIG_IUCV is not set
# CONFIG_PCPU_DEV_REFCNT is not set

View File

@@ -451,7 +451,7 @@ static void *nt_final(void *ptr)
/*
* Initialize ELF header (new kernel)
*/
static void *ehdr_init(Elf64_Ehdr *ehdr, int mem_chunk_cnt)
static void *ehdr_init(Elf64_Ehdr *ehdr, int phdr_count)
{
memset(ehdr, 0, sizeof(*ehdr));
memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
@@ -465,11 +465,8 @@ static void *ehdr_init(Elf64_Ehdr *ehdr, int mem_chunk_cnt)
ehdr->e_phoff = sizeof(Elf64_Ehdr);
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
/*
* Number of memory chunk PT_LOAD program headers plus one kernel
* image PT_LOAD program header plus one PT_NOTE program header.
*/
ehdr->e_phnum = mem_chunk_cnt + 1 + 1;
/* Number of PT_LOAD program headers plus PT_NOTE program header */
ehdr->e_phnum = phdr_count + 1;
return ehdr + 1;
}
@@ -503,12 +500,14 @@ static int get_mem_chunk_cnt(void)
/*
* Initialize ELF loads (new kernel)
*/
static void loads_init(Elf64_Phdr *phdr)
static void loads_init(Elf64_Phdr *phdr, bool os_info_has_vm)
{
unsigned long old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
unsigned long old_identity_base = 0;
phys_addr_t start, end;
u64 idx;
if (os_info_has_vm)
old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
for_each_physmem_range(idx, &oldmem_type, &start, &end) {
phdr->p_type = PT_LOAD;
phdr->p_vaddr = old_identity_base + start;
@@ -522,6 +521,11 @@ static void loads_init(Elf64_Phdr *phdr)
}
}
static bool os_info_has_vm(void)
{
return os_info_old_value(OS_INFO_KASLR_OFFSET);
}
/*
* Prepare PT_LOAD type program header for kernel image region
*/
@@ -566,7 +570,7 @@ static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
return ptr;
}
static size_t get_elfcorehdr_size(int mem_chunk_cnt)
static size_t get_elfcorehdr_size(int phdr_count)
{
size_t size;
@@ -581,10 +585,8 @@ static size_t get_elfcorehdr_size(int mem_chunk_cnt)
size += nt_vmcoreinfo_size();
/* nt_final */
size += sizeof(Elf64_Nhdr);
/* PT_LOAD type program header for kernel text region */
size += sizeof(Elf64_Phdr);
/* PT_LOADS */
size += mem_chunk_cnt * sizeof(Elf64_Phdr);
size += phdr_count * sizeof(Elf64_Phdr);
return size;
}
@@ -595,8 +597,8 @@ static size_t get_elfcorehdr_size(int mem_chunk_cnt)
int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
{
Elf64_Phdr *phdr_notes, *phdr_loads, *phdr_text;
int mem_chunk_cnt, phdr_text_cnt;
size_t alloc_size;
int mem_chunk_cnt;
void *ptr, *hdr;
u64 hdr_off;
@@ -615,12 +617,14 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
}
mem_chunk_cnt = get_mem_chunk_cnt();
phdr_text_cnt = os_info_has_vm() ? 1 : 0;
alloc_size = get_elfcorehdr_size(mem_chunk_cnt);
alloc_size = get_elfcorehdr_size(mem_chunk_cnt + phdr_text_cnt);
hdr = kzalloc(alloc_size, GFP_KERNEL);
/* Without elfcorehdr /proc/vmcore cannot be created. Thus creating
/*
* Without elfcorehdr /proc/vmcore cannot be created. Thus creating
* a dump with this crash kernel will fail. Panic now to allow other
* dump mechanisms to take over.
*/
@@ -628,21 +632,23 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
panic("s390 kdump allocating elfcorehdr failed");
/* Init elf header */
ptr = ehdr_init(hdr, mem_chunk_cnt);
phdr_notes = ehdr_init(hdr, mem_chunk_cnt + phdr_text_cnt);
/* Init program headers */
phdr_notes = ptr;
ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr));
phdr_text = ptr;
ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr));
phdr_loads = ptr;
ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr) * mem_chunk_cnt);
if (phdr_text_cnt) {
phdr_text = phdr_notes + 1;
phdr_loads = phdr_text + 1;
} else {
phdr_loads = phdr_notes + 1;
}
ptr = PTR_ADD(phdr_loads, sizeof(Elf64_Phdr) * mem_chunk_cnt);
/* Init notes */
hdr_off = PTR_DIFF(ptr, hdr);
ptr = notes_init(phdr_notes, ptr, ((unsigned long) hdr) + hdr_off);
/* Init kernel text program header */
text_init(phdr_text);
if (phdr_text_cnt)
text_init(phdr_text);
/* Init loads */
loads_init(phdr_loads);
loads_init(phdr_loads, phdr_text_cnt);
/* Finalize program headers */
hdr_off = PTR_DIFF(ptr, hdr);
*addr = (unsigned long long) hdr;

View File

@@ -105,9 +105,9 @@ vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o
vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
vmlinux-libs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
$(obj)/vmlinux: $(vmlinux-objs-y) FORCE
$(obj)/vmlinux: $(vmlinux-objs-y) $(vmlinux-libs-y) FORCE
$(call if_changed,ld)
OBJCOPYFLAGS_vmlinux.bin := -R .comment -S

View File

@@ -114,6 +114,7 @@
#include "../perf_event.h"
#include "../probe.h"
MODULE_DESCRIPTION("Support for Intel cstate performance events");
MODULE_LICENSE("GPL");
#define DEFINE_CSTATE_FORMAT_ATTR(_var, _name, _format) \

View File

@@ -34,6 +34,7 @@ static struct event_constraint uncore_constraint_fixed =
struct event_constraint uncore_constraint_empty =
EVENT_CONSTRAINT(0, 0, 0);
MODULE_DESCRIPTION("Support for Intel uncore performance events");
MODULE_LICENSE("GPL");
int uncore_pcibus_to_dieid(struct pci_bus *bus)

View File

@@ -64,6 +64,7 @@
#include "perf_event.h"
#include "probe.h"
MODULE_DESCRIPTION("Support Intel/AMD RAPL energy consumption counters");
MODULE_LICENSE("GPL");
/*

View File

@@ -2154,6 +2154,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 addr, unsigned long roots);

View File

@@ -78,10 +78,10 @@ extern int __get_user_bad(void);
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
__chk_user_ptr(ptr); \
asm volatile("call __" #fn "_%c4" \
asm volatile("call __" #fn "_%c[size]" \
: "=a" (__ret_gu), "=r" (__val_gu), \
ASM_CALL_CONSTRAINT \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
: "0" (ptr), [size] "i" (sizeof(*(ptr)))); \
instrument_get_user(__val_gu); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \

View File

@@ -77,7 +77,7 @@
#define VMX_FEATURE_ENCLS_EXITING ( 2*32+ 15) /* "" VM-Exit on ENCLS (leaf dependent) */
#define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */
#define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */
#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */
#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* Conditionally reflect EPT violations as #VE exceptions */
#define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */
#define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */
#define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* "ept_mode_based_exec" Enable separate EPT EXEC bits for supervisor vs. user */

View File

@@ -215,7 +215,14 @@ out:
int amd_smn_read(u16 node, u32 address, u32 *value)
{
return __amd_smn_rw(node, address, value, false);
int err = __amd_smn_rw(node, address, value, false);
if (PCI_POSSIBLE_ERROR(*value)) {
err = -ENODEV;
*value = 0;
}
return err;
}
EXPORT_SYMBOL_GPL(amd_smn_read);

View File

@@ -345,6 +345,7 @@ static DECLARE_WORK(disable_freq_invariance_work,
disable_freq_invariance_workfn);
DEFINE_PER_CPU(unsigned long, arch_freq_scale) = SCHED_CAPACITY_SCALE;
EXPORT_PER_CPU_SYMBOL_GPL(arch_freq_scale);
static void scale_freq_tick(u64 acnt, u64 mcnt)
{

View File

@@ -1075,6 +1075,10 @@ void get_cpu_address_sizes(struct cpuinfo_x86 *c)
c->x86_virt_bits = (eax >> 8) & 0xff;
c->x86_phys_bits = eax & 0xff;
/* Provide a sane default if not enumerated: */
if (!c->x86_clflush_size)
c->x86_clflush_size = 32;
}
c->x86_cache_bits = c->x86_phys_bits;
@@ -1585,6 +1589,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
if (have_cpuid_p()) {
cpu_detect(c);
get_cpu_vendor(c);
intel_unlock_cpuid_leafs(c);
get_cpu_cap(c);
setup_force_cpu_cap(X86_FEATURE_CPUID);
get_cpu_address_sizes(c);
@@ -1744,7 +1749,7 @@ static void generic_identify(struct cpuinfo_x86 *c)
cpu_detect(c);
get_cpu_vendor(c);
intel_unlock_cpuid_leafs(c);
get_cpu_cap(c);
get_cpu_address_sizes(c);

View File

@@ -61,9 +61,11 @@ extern __ro_after_init enum tsx_ctrl_states tsx_ctrl_state;
extern void __init tsx_init(void);
void tsx_ap_init(void);
void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c);
#else
static inline void tsx_init(void) { }
static inline void tsx_ap_init(void) { }
static inline void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c) { }
#endif /* CONFIG_CPU_SUP_INTEL */
extern void init_spectral_chicken(struct cpuinfo_x86 *c);

View File

@@ -269,19 +269,26 @@ detect_keyid_bits:
c->x86_phys_bits -= keyid_bits;
}
void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return;
if (c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd))
return;
/*
* The BIOS can have limited CPUID to leaf 2, which breaks feature
* enumeration. Unlock it and update the maximum leaf info.
*/
if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0)
c->cpuid_level = cpuid_eax(0);
}
static void early_init_intel(struct cpuinfo_x86 *c)
{
u64 misc_enable;
/* Unmask CPUID levels if masked: */
if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
c->cpuid_level = cpuid_eax(0);
get_cpu_cap(c);
}
}
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e))
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);

View File

@@ -84,9 +84,9 @@ static bool parse_8000_001e(struct topo_scan *tscan, bool has_topoext)
/*
* If leaf 0xb is available, then the domain shifts are set
* already and nothing to do here.
* already and nothing to do here. Only valid for family >= 0x17.
*/
if (!has_topoext) {
if (!has_topoext && tscan->c->x86 >= 0x17) {
/*
* Leaf 0x80000008 set the CORE domain shift already.
* Update the SMT domain, but do not propagate it.

Some files were not shown because too many files have changed in this diff Show More