Commit Graph

1296460 Commits

Author SHA1 Message Date
Arnd Bergmann
0b07feb910 Merge tag 'memory-controller-drv-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers
Memory controller drivers for v6.11

Make the Freescale IFC driver selectable because it is used now by two
drivers: Freescale NAND and generic NOR flash.  The patches adjusting
defconfig are waiting on the mailing lists to be picked up.

* tag 'memory-controller-drv-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
  dt-bindings: memory: fsl: replace maintainer
  memory: fsl_ifc: Make FSL_IFC config visible and selectable

Link: https://lore.kernel.org/r/20240702070212.8291-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:15:36 +02:00
Arnd Bergmann
097d4193b1 Merge tag 'sunxi-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/drivers
Allwinner SoC driver changes for 6.11

- DT binding addition of regulator node under SRAM node
- Cleanup of unused list in SRAM driver

* tag 'sunxi-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  dt-bindings: sram: sunxi-sram: Add regulators child
  soc: sunxi: sram: Remove unused list 'claimed_sram'

Link: https://lore.kernel.org/r/ZoQZguQ6taJziJ11@wens.tw
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:13:51 +02:00
Arnd Bergmann
2bb1acc608 Merge tag 'reset-for-v6.11-2' of git://git.pengutronix.de/pza/linux into soc/drivers
Reset controller updates for v6.11, part 2

This tag adds USB VBUS regulator control for Renesas RZ/G2L SoCs,
which also touches PHY driver and device tree, and pulls in a new
regulator_hardware_enable() helper.

The Tegra BPMP reset driver can be compiled under COMPILE_TEST now.

* tag 'reset-for-v6.11-2' of git://git.pengutronix.de/pza/linux:
  arm64: dts: renesas: rz-smarc: Replace fixed regulator for USB VBUS
  phy: renesas: phy-rcar-gen3-usb2: Control VBUS for RZ/G2L SoCs
  reset: renesas: Add USB VBUS regulator device as child
  dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document USB VBUS regulator
  reset: tegra-bpmp: allow building under COMPILE_TEST
  regulator: core: Add helper for allow HW access to enable/disable regulator

Link: https://lore.kernel.org/r/20240703100809.2773890-1-p.zabel@pengutronix.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:12:55 +02:00
Arnd Bergmann
ee22fbd705 Merge tag 'qcom-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers
Qualcomm driver updates for v6.11

Support for Shared Memory (shm) Bridge is added, which provides a
stricter interface for handling of buffers passed to TrustZone.

The X1Elite platform is added to uefisecapp allow list, to instantiate
the efivars implementation.

A new in-kernel implementation of the pd-mapper (or servreg) service is
introduced, to replace the userspace dependency for USB Type-C and
battery management.

Support for sharing interrupts across multiple bwmon instances is added,
and a refcount imbalance issue is corrected.

The LLCC support for recent platforms is corrected, and SA8775P support
is added.

A new interface is added to SMEM, to expose "feature codes". One example
of the usecase for this is to indicate to the GPU driver which
frequencies are available on the given device.

The interrupt consumer and provider side of SMP2P is updated to provide
more useful names in interrupt stats.

Support for using the mailbox binding and driver for outgoing IPC
interrupt in the SMSM driver is introduced.

socinfo driver learns about SDM670 and IPQ5321, as well as get some
updates to the X1E PMICs.

pmic_glink is bumped to now support managing 3 USB Type-C ports.

* tag 'qcom-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (48 commits)
  soc: qcom: smp2p: Use devname for interrupt descriptions
  soc: qcom: smsm: Add missing mailbox dependency to Kconfig
  soc: qcom: add missing pd-mapper dependencies
  soc: qcom: icc-bwmon: Allow for interrupts to be shared across instances
  dt-bindings: interconnect: qcom,msm8998-bwmon: Add X1E80100 BWMON instances
  dt-bindings: interconnect: qcom,msm8998-bwmon: Remove opp-table from the required list
  firmware: qcom: tzmem: export devm_qcom_tzmem_pool_new()
  soc: qcom: add pd-mapper implementation
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: protect locator_addr with the main mutex
  firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image()
  firmware: qcom: scm: add support for SHM bridge memory carveout
  firmware: qcom: tzmem: enable SHM Bridge support
  firmware: qcom: scm: add support for SHM bridge operations
  firmware: qcom: qseecom: convert to using the TZ allocator
  firmware: qcom: scm: make qcom_scm_qseecom_app_get_id() use the TZ allocator
  firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator
  firmware: qcom: scm: make qcom_scm_ice_set_key() use the TZ allocator
  firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator
  ...

Link: https://lore.kernel.org/r/20240705034410.13968-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:09:10 +02:00
Pascal Paillet
e9a316affb arm64: stm32: enable scmi regulator for stm32
Add SCMI ARM REGULATOR configuration for stm32.

Signed-off-by: Pascal Paillet <p.paillet@foss.st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240705134554.2833835-1-alexandre.torgue@foss.st.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:08:44 +02:00
Arnd Bergmann
9c148cb47a Merge tag 'ti-driver-soc-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/drivers
TI SoC driver updates for v6.11

- Update TISCI protocol URL link  which was dead
- socinfo: Add j721E SR 2.0 detection support
- MAINTAINER list additions: ti,pruss.yaml and ti,j721e-system-controller.yaml
- pm33xx: log statement improvement
- knav_qmss: minor data structure optimization

* tag 'ti-driver-soc-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux:
  dt-bindings: soc: ti: Move ti,j721e-system-controller.yaml to soc/ti
  MAINTAINERS: Add entry for ti,pruss.yaml to TI KEYSTONE MULTICORE NAVIGATOR DRIVERS
  soc: ti: k3-socinfo: Add J721E SR2.0
  soc: ti: knav_qmss: Constify struct knav_range_ops
  firmware: ti_sci: fix TISCI protocol URL link
  dt-bindings: ti: fix TISCI protocol URL link
  soc: ti: pm33xx: Fix missing newlines in log statements

Link: https://lore.kernel.org/r/20240705151449.s4rngkehjn73favn@stream
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:07:33 +02:00
James Chapman
f8ad00f3fb l2tp: fix possible UAF when cleaning up tunnels
syzbot reported a UAF caused by a race when the L2TP work queue closes a
tunnel at the same time as a userspace thread closes a session in that
tunnel.

Tunnel cleanup is handled by a work queue which iterates through the
sessions contained within a tunnel, and closes them in turn.

Meanwhile, a userspace thread may arbitrarily close a session via
either netlink command or by closing the pppox socket in the case of
l2tp_ppp.

The race condition may occur when l2tp_tunnel_closeall walks the list
of sessions in the tunnel and deletes each one.  Currently this is
implemented using list_for_each_safe, but because the list spinlock is
dropped in the loop body it's possible for other threads to manipulate
the list during list_for_each_safe's list walk.  This can lead to the
list iterator being corrupted, leading to list_for_each_safe spinning.
One sequence of events which may lead to this is as follows:

 * A tunnel is created, containing two sessions A and B.
 * A thread closes the tunnel, triggering tunnel cleanup via the work
   queue.
 * l2tp_tunnel_closeall runs in the context of the work queue.  It
   removes session A from the tunnel session list, then drops the list
   lock.  At this point the list_for_each_safe temporary variable is
   pointing to the other session on the list, which is session B, and
   the list can be manipulated by other threads since the list lock has
   been released.
 * Userspace closes session B, which removes the session from its parent
   tunnel via l2tp_session_delete.  Since l2tp_tunnel_closeall has
   released the tunnel list lock, l2tp_session_delete is able to call
   list_del_init on the session B list node.
 * Back on the work queue, l2tp_tunnel_closeall resumes execution and
   will now spin forever on the same list entry until the underlying
   session structure is freed, at which point UAF occurs.

The solution is to iterate over the tunnel's session list using
list_first_entry_not_null to avoid the possibility of the list
iterator pointing at a list item which may be removed during the walk.

Also, have l2tp_tunnel_closeall ref each session while it processes it
to prevent another thread from freeing it.

	cpu1				cpu2
	---				---
					pppol2tp_release()

	spin_lock_bh(&tunnel->list_lock);
	for (;;) {
		session = list_first_entry_or_null(&tunnel->session_list,
						   struct l2tp_session, list);
		if (!session)
			break;
		list_del_init(&session->list);
		spin_unlock_bh(&tunnel->list_lock);

 					l2tp_session_delete(session);

		l2tp_session_delete(session);
		spin_lock_bh(&tunnel->list_lock);
	}
	spin_unlock_bh(&tunnel->list_lock);

Calling l2tp_session_delete on the same session twice isn't a problem
per-se, but if cpu2 manages to destruct the socket and unref the
session to zero before cpu1 progresses then it would lead to UAF.

Reported-by: syzbot+b471b7c936301a59745b@syzkaller.appspotmail.com
Reported-by: syzbot+c041b4ce3a6dfd1e63e2@syzkaller.appspotmail.com
Fixes: d18d3f0a24 ("l2tp: replace hlist with simple list for per-tunnel session list")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://patch.msgid.link/20240704152508.1923908-1-jchapman@katalix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-09 11:06:21 +02:00
Arnd Bergmann
9f51147489 Merge tag 'riscv-firmware-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers
RISC-V firmware drivers for v6.11

Microchip:
Support for writing "bitstream info" to the flash using the auto-update
driver. At this point the "bitstream info" is a glorified dtbo wrapper,
but there's plans to add more info there in the future. Additionally,
rework some allocations in the driver and use scope-based cleanup on
them.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'riscv-firmware-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  firmware: microchip: use scope-based cleanup where possible
  firmware: microchip: move buffer allocation into mpfs_auto_update_set_image_address()
  firmware: microchip: support writing bitstream info to flash

Link: https://lore.kernel.org/r/20240707-lukewarm-film-8a9da40a1c27@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 11:05:26 +02:00
Arnd Bergmann
9e6b815593 Merge tag 'riscv-cache-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers
RISC-V cache drivers for v6.11

StarFive:
A new driver for the cache controller on the jh8100, which didn't
implement Zicbom and thus needs an implementation of non-standard cache
management operations.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'riscv-cache-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  MAINTAINERS: add microchip soc binding directory to microchip soc driver entry
  MAINTAINERS: add cache binding directory to cache driver entry
  cache: Add StarFive StarLink cache management
  dt-bindings: cache: Add docs for StarFive Starlink cache controller

Link: https://lore.kernel.org/r/20240707-whoever-undesired-c5f6e96ae403@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 10:53:48 +02:00
Arnd Bergmann
95ab7b209b Merge tag 'riscv-sophgo-dt-for-v6.11' of https://github.com/sophgo/linux into soc/dt
RISC-V Devicetrees for v6.11

Sopgho:
Add clock support for SG2042.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>

* tag 'riscv-sophgo-dt-for-v6.11' of https://github.com/sophgo/linux:
  riscv: dts: add clock generator for Sophgo SG2042 SoC

Link: https://lore.kernel.org/r/PN1P287MB281861EA2B1706B430D2FA3EFEDB2@PN1P287MB2818.INDP287.PROD.OUTLOOK.COM
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 10:50:42 +02:00
Arnd Bergmann
ad828298af Merge tag 'v6.11-rockchip-dts64-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
These are two additional boards, the Xunlong Orange Pi 3B and
the Add Radxa ROCK 3B.

* tag 'v6.11-rockchip-dts64-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Add Xunlong Orange Pi 3B
  dt-bindings: arm: rockchip: Add Xunlong Orange Pi 3B
  arm64: dts: rockchip: Add Radxa ROCK 3B
  dt-bindings: arm: rockchip: Add Radxa ROCK 3B

Link: https://lore.kernel.org/r/2191200.GUh0CODmnK@diego
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 10:48:31 +02:00
Paul Burton
680e7863de MIPS: GIC: Generate redirect block accessors
With CM 3.5 the "core-other" register block evolves into the "redirect"
register block, which is capable of accessing not only the core local
registers of other cores but also the shared/global registers of other
clusters.

This patch generates accessor functions for shared/global registers
accessed via the redirect block, with "redir_" inserted after "gic_" in
their names. For example the accessor function:

  read_gic_config()

...accesses the GIC_CONFIG register of the GIC in the local cluster.
With this patch a new function:

  read_gic_redir_config()

...is added which accesses the GIC_CONFIG register of the GIC in
whichever cluster the GCR_CL_REDIRECT register is configured to access.

This mirrors the similar redirect block accessors already provided for
the CM & CPC.

Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Signed-off-by: Chao-ying Fu <cfu@wavecomp.com>
Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
Signed-off-by: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-07-09 10:48:28 +02:00
Paul Burton
36675ac2a7 MIPS: CPS: Add a couple of multi-cluster utility functions
This patch introduces a couple of utility functions which help later
patches with introducing support for multi-cluster systems.

 - mips_cps_multicluster_cpus() allows its caller to determine whether
   the system includes CPUs spread across multiple clusters. This is
   useful because in some cases behaviour can be more optimal taking
   this knowledge into account. The means by which we check this is
   dependent upon the way we probe CPUs & assign their numbers, so
   keeping this knowledge confined here in arch/mips/ seems appropriate.

 - mips_cps_first_online_in_cluster() allows its caller to determine
   whether it is running upon the first CPU online within its cluster.
   This information is useful in cases where some cluster-wide
   configuration may need to occur, but should not be repeated if
   another CPU in the cluster is already online. Similarly to the above
   this is determined based upon knowledge of CPU numbering so it makes
   sense to keep that knowledge in arch/mips/. The function is defined
   in mips-cm.c rather than in asm/mips-cps.h in order to allow us to
   use asm/cpu-info.h & linux/smp.h without encountering an include
   nightmare.

Signed-off-by: Paul Burton <paulburton@kernel.org>
Signed-off-by: Chao-ying Fu <cfu@wavecomp.com>
Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
Signed-off-by: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-07-09 10:48:17 +02:00
Andreas Gruenbacher
f75efefb6d gfs2: Clean up glock demote logic
The logic for determining when to demote a glock in glock_work_func(),
introduced in commit 7cf8dcd3b6 ("GFS2: Automatically adjust glock min
hold time"), doesn't make sense: inode glocks have a minimum hold time
that delays demotion, while all other glocks are expected to be demoted
immediately.  Instead of demoting non-inode glocks immediately,
glock_work_func() schedules glock work for them to be demoted, however.
Get rid of that unnecessary indirection.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-07-09 10:40:03 +02:00
Dominique Martinet
89c7f50789 MIPS: Octeron: remove source file executable bit
This does not matter the least, but there is no other .[ch] file in the
repo that is executable, so clean this up.

Fixes: 29b83a64df ("MIPS: Octeon: Add PCIe link status check")
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-07-09 10:38:08 +02:00
Bibo Mao
03779999ac LoongArch: KVM: Add PV steal time support in guest side
Per-cpu struct kvm_steal_time is added here, its size is 64 bytes and
also defined as 64 bytes, so that the whole structure is in one physical
page.

When a VCPU is online, function pv_enable_steal_time() is called. This
function will pass guest physical address of struct kvm_steal_time and
tells hypervisor to enable steal time. When a vcpu is offline, physical
address is set as 0 and tells hypervisor to disable steal time.

Here is an output of vmstat on guest when there is workload on both host
and guest. It shows steal time stat information.

procs -----------memory---------- -----io---- -system-- ------cpu-----
 r  b   swpd   free  inact active   bi    bo   in   cs us sy id wa st
15  1      0 7583616 184112  72208    20    0  162   52 31  6 43  0 20
17  0      0 7583616 184704  72192    0     0 6318 6885  5 60  8  5 22
16  0      0 7583616 185392  72144    0     0 1766 1081  0 49  0  1 50
16  0      0 7583616 184816  72304    0     0 6300 6166  4 62 12  2 20
18  0      0 7583632 184480  72240    0     0 2814 1754  2 58  4  1 35

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
b4ba157044 LoongArch: KVM: Add PV steal time support in host side
Add ParaVirt steal time feature in host side, VM can search supported
features provided by KVM hypervisor, a feature KVM_FEATURE_STEAL_TIME
is added here. Like x86, steal time structure is saved in guest memory,
one hypercall function KVM_HCALL_FUNC_NOTIFY is added to notify KVM to
enable this feature.

One CPU attr ioctl command KVM_LOONGARCH_VCPU_PVTIME_CTRL is added to
save and restore the base address of steal time structure when a VM is
migrated.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Jia Qingtong
d7ad41a31d LoongArch: KVM: always make pte young in page map's fast path
It seems redundant to check if pte is young before the call to
kvm_pte_mkyoung() in kvm_map_page_fast(). Just remove the check.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Jia Qingtong <jiaqingtong97@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
ebf00272da LoongArch: KVM: Mark page accessed and dirty with page ref added
Function kvm_map_page_fast() is fast path of secondary mmu page fault
flow, pfn is parsed from secondary mmu page table walker. However the
corresponding page reference is not added, it is dangerious to access
page out of mmu_lock.

Here page ref is added inside mmu_lock, function kvm_set_pfn_accessed()
and kvm_set_pfn_dirty() is called with page ref added, so that the page
will not be freed by others.

Also kvm_set_pfn_accessed() is removed here since it is called in the
following function kvm_release_pfn_clean().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
8c34704252 LoongArch: KVM: Add dirty bitmap initially all set support
Add KVM_DIRTY_LOG_INITIALLY_SET support on LoongArch system, this
feature comes from other architectures like x86 and arm64.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
32d4b999da LoongArch: KVM: Add memory barrier before update pmd entry
When updating pmd entry such as allocating new pmd page or splitting
huge page into normal page, it is necessary to firstly update all pte
entries, and then update pmd entry.

It is weak order with LoongArch system, there will be problem if other
VCPUs see pmd update firstly while ptes are not updated. Here smp_wmb()
is added to assure this.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
b072cbf023 LoongArch: KVM: Discard dirty page tracking on readonly memslot
For readonly memslot such as UEFI BIOS or UEFI var space, guest cannot
write this memory space directly. So it is not necessary to track dirty
pages for readonly memslot. Here we make such optimization in function
kvm_arch_commit_memory_region().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
2f56f9ea4d LoongArch: KVM: Select huge page only if secondary mmu supports it
Currently page level selection about secondary mmu depends on memory
slot and page level about host mmu. There will be problems if page level
of secondary mmu is zero already. Huge page cannot be selected if there is
normal page mapped in secondary mmu already, since it is not supported to
merge normal pages into huge pages now.

So page level selection should depend on the following three conditions.
 1. Memslot is aligned for huge page and vm is not migrating.
 2. Page level of host mmu is also huge page.
 3. Page level of secondary mmu is suituable for huge page.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
b5d4e2325d LoongArch: KVM: Delay secondary mmu tlb flush until guest entry
With hardware assisted virtualization, there are two level HW mmu, one
is GVA to GPA mapping, the other is GPA to HPA mapping which is called
secondary mmu in generic. If there is page fault for secondary mmu,
there needs tlb flush operation indexed with fault GPA address and VMID.
VMID is stored at register CSR.GSTAT and will be reload or recalculated
before guest entry.

Currently CSR.GSTAT is not saved and restored during VCPU context
switch, instead it is recalculated during guest entry. So CSR.GSTAT is
effective only when a VCPU runs in guest mode, however it may not be
effective if the VCPU exits to host mode. Since register CSR.GSTAT may
be stale, it may records the VMID of the last schedule-out VCPU, rather
than the current VCPU.

Function kvm_flush_tlb_gpa() should be called with its real VMID, so
here move it to the guest entrance. Also an arch-specific request id
KVM_REQ_TLB_FLUSH_GPA is added to flush tlb for secondary mmu, and it
can be optimized if VMID is updated, since all guest tlb entries will
be invalid if VMID is updated.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:50 +08:00
Bibo Mao
e306e51490 LoongArch: KVM: Sync pending interrupt when getting ESTAT from user mode
Currently interrupts are posted and cleared with the asynchronous mode,
meanwhile they are saved in SW state vcpu::arch::irq_pending and vcpu::
arch::irq_clear. When vcpu is ready to run, pending interrupt is written
back to CSR.ESTAT register from SW state vcpu::arch::irq_pending at the
guest entrance.

During VM migration stage, vcpu is put into stopped state, however
pending interrupts are not synced to CSR.ESTAT register. So there will
be interrupt lost when VCPU is migrated to another host machines.

Here in this patch when ESTAT CSR register is read from VMM user mode,
pending interrupts are synchronized to ESTAT also. So that VMM can get
correct pending interrupts.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:50 +08:00
Geliang Tang
f0c1802569 skmsg: Skip zero length skb in sk_msg_recvmsg
When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, the following kernel panic occurs:

  [...]
  Oops[#1]:
  CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
  Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
     ... ...
     ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
    ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 0000000c (PPLV0 +PIE +PWE)
   EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
   BADV: 0000000000000040
   PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
  Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
  Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
  Stack : ...
  Call Trace:
  [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
  [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
  [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
  [<90000000049aae34>] inet_recvmsg+0x54/0x100
  [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
  [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
  [<900000000481e27c>] sys_recvfrom+0x1c/0x40
  [<9000000004c076ec>] do_syscall+0x8c/0xc0
  [<9000000003731da4>] handle_syscall+0xc4/0x160
  Code: ...
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Fatal exception
  Kernel relocated by 0x3510000
   .text @ 0x9000000003710000
   .data @ 0x9000000004d70000
   .bss  @ 0x9000000006469400
  ---[ end Kernel panic - not syncing: Fatal exception ]---
  [...]

This crash happens every time when running sockmap_skb_verdict_shutdown
subtest in sockmap_basic.

This crash is because a NULL pointer is passed to page_address() in the
sk_msg_recvmsg(). Due to the different implementations depending on the
architecture, page_address(NULL) will trigger a panic on Loongarch
platform but not on x86 platform. So this bug was hidden on x86 platform
for a while, but now it is exposed on Loongarch platform. The root cause
is that a zero length skb (skb->len == 0) was put on the queue.

This zero length skb is a TCP FIN packet, which was sent by shutdown(),
invoked in test_sockmap_skb_verdict_shutdown():

	shutdown(p1, SHUT_WR);

In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into ingress_msg list.

And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by
sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it
to kmap_local_page() and to page_address(), then kernel panics.

To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
if copy is zero, that means it's a zero length skb, skip invoking
copy_page_to_iter(). We are using the EFAULT return triggered by
copy_page_to_iter to check for is_fin in tcp_bpf.c.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
2024-07-09 10:24:50 +02:00
Bartosz Golaszewski
91581c4b3f gpio: virtuser: new virtual testing driver for the GPIO API
The GPIO subsystem used to have a serious problem with undefined behavior
and use-after-free bugs on hot-unplug of GPIO chips. This can be
considered a corner-case by some as most GPIO controllers are enabled
early in the boot process and live until the system goes down but most
GPIO drivers do allow unbind over sysfs, many are loadable modules that
can be (force) unloaded and there are also GPIO devices that can be
dynamically detached, for instance CP2112 which is a USB GPIO expender.

Bugs can be triggered both from user-space as well as by in-kernel users.
We have the means of testing it from user-space via the character device
but the issues manifest themselves differently in the kernel.

This is a proposition of adding a new virtual driver - a configurable
GPIO consumer that can be configured over configfs (similarly to
gpio-sim) or described on the device-tree.

This driver is aimed as a helper in spotting any regressions in
hot-unplug handling in GPIOLIB.

Link: https://lore.kernel.org/r/20240708142912.120570-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-09 09:39:54 +02:00
Thorsten Blum
21b9e722ad m68k: cmpxchg: Fix return value for default case in __arch_xchg()
The return value of __invalid_xchg_size() is assigned to tmp instead of
the return variable x. Assign it to x instead.

Fixes: 2501cf768e ("m68k: Fix xchg/cmpxchg to fail to link if given an inappropriate pointer")
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/20240702034116.140234-2-thorsten.blum@toblux.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2024-07-09 09:19:31 +02:00
Uwe Kleine-König
d2eb433c85 ALSA: ppc: keywest: Drop explicit initialization of struct i2c_device_id::driver_data to 0
The driver doesn't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.

This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20240708130254.9631-2-u.kleine-koenig@baylibre.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-09 09:16:30 +02:00
Christoph Hellwig
61353a63a2 block: take offset into account in blk_bvec_map_sg again
The rebase of commit 09595e0c9d ("block: pass a phys_addr_t to
get_max_segment_size") lost adding the total to to the offset in
blk_bvec_map_sg.  Add it back.

Fixes: 09595e0c9d ("block: pass a phys_addr_t to get_max_segment_size")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240709070126.3019940-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 01:02:44 -06:00
Johannes Berg
6e909f4891 wifi: virt_wifi: don't use strlen() in const context
Looks like not all compilers allow strlen(constant) as
a constant, so don't do that. Instead, revert back to
defining the length as the first submission had it.

Fixes: b5d14b0c67 ("wifi: virt_wifi: avoid reporting connection success with wrong SSID")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407090934.NnR1TUbW-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202407090944.mpwLHGt9-lkp@intel.com/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-09 08:49:56 +02:00
Arnd Bergmann
05a01ce773 Merge tag 'imx-defconfig-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/defconfig
i.MX defconfig change for 6.11:

- Enable a few drivers needed by TQMa7x/MBa7x and i.MX53 QSB/QSRB in
  imx_v6_v7_defconfig
- Enable IWLWIFI driver support in arm64 defconfig

* tag 'imx-defconfig-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: defconfig: Enable the IWLWIFI driver
  ARM: imx_v6_v7_defconfig: enable DRM_SII902X and DRM_DISPLAY_CONNECTOR
  ARM: imx_v6_v7_defconfig: Enable drivers for TQMa7x/MBa7x

Link: https://lore.kernel.org/r/20240709012534.3106441-1-shawnguo2@yeah.net
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09 08:21:26 +02:00
Chaitanya Kulkarni
0ffc46eb1b block: fix get_max_segment_size() warning
Correct the parameter name in the comment of get_max_segment_size()
to fix following warning:-

block/blk-merge.c:220: warning: Function parameter or struct member 'len' not described in 'get_max_segment_size'
block/blk-merge.c:220: warning: Excess function parameter 'max_len' description in 'get_max_segment_size'

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240709045432.8688-1-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 00:01:11 -06:00
John Garry
9423c653fe loop: Don't bother validating blocksize
The block queue limits validation does this for us now.

The loop_configure() -> WARN_ON_ONCE() call is dropped, as an invalid
block size would trigger this now. We don't want userspace to be able to
directly trigger WARNs.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240708091651.177447-6-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 00:00:17 -06:00
John Garry
af28172291 virtio_blk: Don't bother validating blocksize
The block queue limits validation does this for us now.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240708091651.177447-5-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 00:00:17 -06:00
John Garry
addc3a68de null_blk: Don't bother validating blocksize
The block queue limits validation does this for us now.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240708091651.177447-4-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 00:00:17 -06:00
John Garry
fe3d508ba9 block: Validate logical block size in blk_validate_limits()
Some drivers validate that their own logical block size. It is no harm to
always do this, so validate in blk_validate_limits().

This allows us to remove the validation in most of those drivers.

Add a comment to blk_validate_block_size() to inform users that self-
validation of LBS is usually unnecessary.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240708091651.177447-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 00:00:17 -06:00
John Garry
4ff3d01275 virtio_blk: Fix default logical block size fallback
If we fail to read a logical block size in virtblk_read_limits() ->
virtio_cread_feature(), then we default to what is in
lim->logical_block_size, but that would be 0.

We can deal with lim->logical_block_size = 0 later in the
blk_mq_alloc_disk(), but the code in virtblk_read_limits() needs a proper
default, so give a default of SECTOR_SIZE.

Fixes: 27e32cd23f ("block: pass a queue_limits argument to blk_mq_alloc_disk")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240708091651.177447-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09 00:00:17 -06:00
Jens Axboe
6b43537fae Merge tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme into for-6.11/block
Pull NVMe updates from Keith:

"nvme updates for Linux 6.11

 - Device initialization memory leak fixes (Keith)
 - More constants defined (Weiwen)
 - Target debugfs support (Hannes)
 - PCIe subsystem reset enhancements (Keith)
 - Queue-depth multipath policy (Redhat and PureStorage)
 - Implement get_unique_id (Christoph)
 - Authentication error fixes (Gaosheng)"

* tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme: (21 commits)
  nvmet-auth: fix nvmet_auth hash error handling
  nvme: implement ->get_unique_id
  nvme-multipath: implement "queue-depth" iopolicy
  nvme-multipath: prepare for "queue-depth" iopolicy
  nvme-pci: do not directly handle subsys reset fallout
  lpfc_nvmet: implement 'host_traddr'
  nvme-fcloop: implement 'host_traddr'
  nvmet-fc: implement host_traddr()
  nvmet-rdma: implement host_traddr()
  nvmet-tcp: implement host_traddr()
  nvmet: add 'host_traddr' callback for debugfs
  nvmet: add debugfs support
  mailmap: add entry for Weiwen Hu
  nvme: rename CDR/MORE/DNR to NVME_STATUS_*
  nvme: fix status magic numbers
  nvme: rename nvme_sc_to_pr_err to nvme_status_to_pr_err
  nvme: split device add from initialization
  nvme: fc: split controller bringup handling
  nvme: rdma: split controller bringup handling
  nvme: tcp: split controller bringup handling
  ...
2024-07-08 23:57:02 -06:00
Yicong Yang
54624acf88 dma-mapping: benchmark: Don't starve others when doing the test
The test thread will start N benchmark kthreads and then schedule out
until the test time finished and notify the benchmark kthreads to stop.
The benchmark kthreads will keep running until notified to stop.
There's a problem with current implementation when the benchmark
kthreads number is equal to the CPUs on a non-preemptible kernel:
since the scheduler will balance the kthreads across the CPUs and
when the test time's out the test thread won't get a chance to be
scheduled on any CPU then cannot notify the benchmark kthreads to stop.

This can be easily reproduced on a VM (simulated with 16 CPUs) with
PREEMPT_VOLUNTARY:
estuary:/mnt$ ./dma_map_benchmark -t 16 -s 1
 rcu: INFO: rcu_sched self-detected stall on CPU
 rcu:     10-...!: (5221 ticks this GP) idle=ed24/1/0x4000000000000000 softirq=142/142 fqs=0
 rcu:     (t=5254 jiffies g=-559 q=45 ncpus=16)
 rcu: rcu_sched kthread starved for 5255 jiffies! g-559 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=12
 rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
 rcu: RCU grace-period kthread stack dump:
 task:rcu_sched       state:R  running task     stack:0     pid:16    tgid:16    ppid:2      flags:0x00000008
 Call trace
  __switch_to+0xec/0x138
  __schedule+0x2f8/0x1080
  schedule+0x30/0x130
  schedule_timeout+0xa0/0x188
  rcu_gp_fqs_loop+0x128/0x528
  rcu_gp_kthread+0x1c8/0x208
  kthread+0xec/0xf8
  ret_from_fork+0x10/0x20
 Sending NMI from CPU 10 to CPUs 0:
 NMI backtrace for cpu 0
 CPU: 0 PID: 332 Comm: dma-map-benchma Not tainted 6.10.0-rc1-vanilla-LSE #8
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : arm_smmu_cmdq_issue_cmdlist+0x218/0x730
 lr : arm_smmu_cmdq_issue_cmdlist+0x488/0x730
 sp : ffff80008748b630
 x29: ffff80008748b630 x28: 0000000000000000 x27: ffff80008748b780
 x26: 0000000000000000 x25: 000000000000bc70 x24: 000000000001bc70
 x23: ffff0000c12af080 x22: 0000000000010000 x21: 000000000000ffff
 x20: ffff80008748b700 x19: ffff0000c12af0c0 x18: 0000000000010000
 x17: 0000000000000001 x16: 0000000000000040 x15: ffffffffffffffff
 x14: 0001ffffffffffff x13: 000000000000ffff x12: 00000000000002f1
 x11: 000000000001ffff x10: 0000000000000031 x9 : ffff800080b6b0b8
 x8 : ffff0000c2a48000 x7 : 000000000001bc71 x6 : 0001800000000000
 x5 : 00000000000002f1 x4 : 01ffffffffffffff x3 : 000000000009aaf1
 x2 : 0000000000000018 x1 : 000000000000000f x0 : ffff0000c12af18c
 Call trace:
  arm_smmu_cmdq_issue_cmdlist+0x218/0x730
  __arm_smmu_tlb_inv_range+0xe0/0x1a8
  arm_smmu_iotlb_sync+0xc0/0x128
  __iommu_dma_unmap+0x248/0x320
  iommu_dma_unmap_page+0x5c/0xe8
  dma_unmap_page_attrs+0x38/0x1d0
  map_benchmark_thread+0x118/0x2c0
  kthread+0xec/0xf8
  ret_from_fork+0x10/0x20

Solve this by adding scheduling point in the kthread loop,
so if there're other threads in the system they may have
a chance to run, especially the thread to notify the test
end. However this may degrade the test concurrency so it's
recommended to run this on an idle system.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-07-09 07:48:32 +02:00
Chen Ni
b80cc4df11 ipc: mqueue: remove assignment from IS_ERR argument
Remove assignment from IS_ERR() argument.
This is detected by coccinelle.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Link: https://lore.kernel.org/r/20240708080404.3859094-1-nichen@iscas.ac.cn
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-09 06:47:40 +02:00
Wojciech Gładysz
83f4414b8f ext4: sanity check for NULL pointer after ext4_force_shutdown
Test case: 2 threads write short inline data to a file.
In ext4_page_mkwrite the resulting inline data is converted.
Handling ext4_grp_locked_error with description "block bitmap
and bg descriptor inconsistent: X vs Y free clusters" calls
ext4_force_shutdown. The conversion clears
EXT4_STATE_MAY_INLINE_DATA but fails for
ext4_destroy_inline_data_nolock and ext4_mark_iloc_dirty due
to ext4_forced_shutdown. The restoration of inline data fails
for the same reason not setting EXT4_STATE_MAY_INLINE_DATA.
Without the flag set a regular process path in ext4_da_write_end
follows trying to dereference page folio private pointer that has
not been set. The fix calls early return with -EIO error shall the
pointer to private be NULL.

Sample crash report:

Unable to handle kernel paging request at virtual address dfff800000000004
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000004] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 20274 Comm: syz-executor185 Not tainted 6.9.0-rc7-syzkaller-gfda5695d692c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __block_commit_write+0x64/0x2b0 fs/buffer.c:2167
lr : __block_commit_write+0x3c/0x2b0 fs/buffer.c:2160
sp : ffff8000a1957600
x29: ffff8000a1957610 x28: dfff800000000000 x27: ffff0000e30e34b0
x26: 0000000000000000 x25: dfff800000000000 x24: dfff800000000000
x23: fffffdffc397c9e0 x22: 0000000000000020 x21: 0000000000000020
x20: 0000000000000040 x19: fffffdffc397c9c0 x18: 1fffe000367bd196
x17: ffff80008eead000 x16: ffff80008ae89e3c x15: 00000000200000c0
x14: 1fffe0001cbe4e04 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : 0000000000000004 x7 : 0000000000000000 x6 : 0000000000000000
x5 : fffffdffc397c9c0 x4 : 0000000000000020 x3 : 0000000000000020
x2 : 0000000000000040 x1 : 0000000000000020 x0 : fffffdffc397c9c0
Call trace:
 __block_commit_write+0x64/0x2b0 fs/buffer.c:2167
 block_write_end+0xb4/0x104 fs/buffer.c:2253
 ext4_da_do_write_end fs/ext4/inode.c:2955 [inline]
 ext4_da_write_end+0x2c4/0xa40 fs/ext4/inode.c:3028
 generic_perform_write+0x394/0x588 mm/filemap.c:3985
 ext4_buffered_write_iter+0x2c0/0x4ec fs/ext4/file.c:299
 ext4_file_write_iter+0x188/0x1780
 call_write_iter include/linux/fs.h:2110 [inline]
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0x968/0xc3c fs/read_write.c:590
 ksys_write+0x15c/0x26c fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __arm64_sys_write+0x7c/0x90 fs/read_write.c:652
 __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: 97f85911 f94002da 91008356 d343fec8 (38796908)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
   0:	97f85911 	bl	0xffffffffffe16444
   4:	f94002da 	ldr	x26, [x22]
   8:	91008356 	add	x22, x26, #0x20
   c:	d343fec8 	lsr	x8, x22, #3
* 10:	38796908 	ldrb	w8, [x8, x25] <-- trapping instruction

Reported-by: syzbot+18df508cf00a0598d9a6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=18df508cf00a0598d9a6
Link: https://lore.kernel.org/all/000000000000f19a1406109eb5c5@google.com/T/
Signed-off-by: Wojciech Gładysz <wojciech.gladysz@infogain.com>
Link: https://patch.msgid.link/20240703070112.10235-1-wojciech.gladysz@infogain.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Jan Kara
a794c9ad02 jbd2: increase maximum transaction size
Originally, we were quite conservative in limiting maximum transaction
size to a quarter of the journal because we were not accounting
transaction descriptor and revoke blocks. These days we do properly
account them and reserve space for them from the total transaction
credits. Thus there's no need to be so conservative and we can increase
the maximum transaction size to one third of the journal (even half
should work fine in principle but the performance will likely suffer in
that case). This also fixes failures to grow filesystems with tiny
journals.

Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Jan Kara
1cf5b024a3 jbd2: drop pointless shrinker batch initialization
In jbd2_journal_init_common() we set batch size of a shrinker shrinking
checkpointed buffers to journal->j_max_transaction_buffers. But that is
guaranteed to be 0 at that point so we effectively stay with the default
shrinker batch size of 128. It has been like this since introduction of
jbd2 shrinkers so just drop the pointless initialization.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Jan Kara
27ba5b6731 jbd2: avoid infinite transaction commit loop
Commit 9f356e5a4f ("jbd2: Account descriptor blocks into
t_outstanding_credits") started to account descriptor blocks into
transactions outstanding credits. However it didn't appropriately
decrease the maximum amount of credits available to userspace. Thus if
the filesystem requests a transaction smaller than
j_max_transaction_buffers but large enough that when descriptor blocks
are added the size exceeds j_max_transaction_buffers, we confuse
add_transaction_credits() into thinking previous handles have grown the
transaction too much and enter infinite journal commit loop in
start_this_handle() -> add_transaction_credits() trying to create
transaction with enough credits available.

Fix the problem by properly accounting for transaction space reserved
for descriptor blocks when verifying requested transaction handle size.

CC: stable@vger.kernel.org
Fixes: 9f356e5a4f ("jbd2: Account descriptor blocks into t_outstanding_credits")
Reported-by: Alexander Coffin <alex.coffin@maticrobots.com>
Link: https://lore.kernel.org/all/CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Jan Kara
e3a00a2378 jbd2: precompute number of transaction descriptor blocks
Instead of computing the number of descriptor blocks a transaction can
have each time we need it (which is currently when starting each
transaction but will become more frequent later) precompute the number
once during journal initialization together with maximum transaction
size. We perform the precomputation whenever journal feature set is
updated similarly as for computation of
journal->j_revoke_records_per_block.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Jan Kara
4aa99c71e4 jbd2: make jbd2_journal_get_max_txn_bufs() internal
There's no reason to have jbd2_journal_get_max_txn_bufs() public
function. Currently all users are internal and can use
journal->j_max_transaction_buffers instead. This saves some unnecessary
recomputations of the limit as a bonus which becomes important as this
function gets more complex in the following patch.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Ye Bin
0bab8db415 jbd2: avoid mount failed when commit block is partial submitted
We encountered a problem that the file system could not be mounted in
the power-off scenario. The analysis of the file system mirror shows that
only part of the data is written to the last commit block.
The valid data of the commit block is concentrated in the first sector.
However, the data of the entire block is involved in the checksum calculation.
For different hardware, the minimum atomic unit may be different.
If the checksum of a committed block is incorrect, clear the data except the
'commit_header' and then calculate the checksum. If the checkusm is correct,
it is considered that the block is partially committed, Then continue to replay
journal.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240620072405.3533701-1-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Jan Kara
65121eff3e ext4: avoid writing unitialized memory to disk in EA inodes
If the extended attribute size is not a multiple of block size, the last
block in the EA inode will have uninitialized tail which will get
written to disk. We will never expose the data to userspace but still
this is not a good practice so just zero out the tail of the block as it
isn't going to cause a noticeable performance overhead.

Fixes: e50e5129f3 ("ext4: xattr-in-inode support")
Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00
Luis Henriques (SUSE)
7882b0187b ext4: don't track ranges in fast_commit if inode has inlined data
When fast-commit needs to track ranges, it has to handle inodes that have
inlined data in a different way because ext4_fc_write_inode_data(), in the
actual commit path, will attempt to map the required blocks for the range.
However, inodes that have inlined data will have it's data stored in
inode->i_block and, eventually, in the extended attribute space.

Unfortunately, because fast commit doesn't currently support extended
attributes, the solution is to mark this commit as ineligible.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Tested-by: Ben Hutchings <benh@debian.org>
Fixes: 9725958bb7 ("ext4: fast commit may miss tracking unwritten range during ftruncate")
Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 23:59:37 -04:00