linux

Author	SHA1	Message	Date
Arnd Bergmann	0b07feb910	Merge tag 'memory-controller-drv-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers Memory controller drivers for v6.11 Make the Freescale IFC driver selectable because it is used now by two drivers: Freescale NAND and generic NOR flash. The patches adjusting defconfig are waiting on the mailing lists to be picked up. * tag 'memory-controller-drv-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl: dt-bindings: memory: fsl: replace maintainer memory: fsl_ifc: Make FSL_IFC config visible and selectable Link: https://lore.kernel.org/r/20240702070212.8291-1-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:15:36 +02:00
Arnd Bergmann	097d4193b1	Merge tag 'sunxi-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/drivers Allwinner SoC driver changes for 6.11 - DT binding addition of regulator node under SRAM node - Cleanup of unused list in SRAM driver * tag 'sunxi-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: dt-bindings: sram: sunxi-sram: Add regulators child soc: sunxi: sram: Remove unused list 'claimed_sram' Link: https://lore.kernel.org/r/ZoQZguQ6taJziJ11@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:13:51 +02:00
Arnd Bergmann	2bb1acc608	Merge tag 'reset-for-v6.11-2' of git://git.pengutronix.de/pza/linux into soc/drivers Reset controller updates for v6.11, part 2 This tag adds USB VBUS regulator control for Renesas RZ/G2L SoCs, which also touches PHY driver and device tree, and pulls in a new regulator_hardware_enable() helper. The Tegra BPMP reset driver can be compiled under COMPILE_TEST now. * tag 'reset-for-v6.11-2' of git://git.pengutronix.de/pza/linux: arm64: dts: renesas: rz-smarc: Replace fixed regulator for USB VBUS phy: renesas: phy-rcar-gen3-usb2: Control VBUS for RZ/G2L SoCs reset: renesas: Add USB VBUS regulator device as child dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document USB VBUS regulator reset: tegra-bpmp: allow building under COMPILE_TEST regulator: core: Add helper for allow HW access to enable/disable regulator Link: https://lore.kernel.org/r/20240703100809.2773890-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:12:55 +02:00
Arnd Bergmann	ee22fbd705	Merge tag 'qcom-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers Qualcomm driver updates for v6.11 Support for Shared Memory (shm) Bridge is added, which provides a stricter interface for handling of buffers passed to TrustZone. The X1Elite platform is added to uefisecapp allow list, to instantiate the efivars implementation. A new in-kernel implementation of the pd-mapper (or servreg) service is introduced, to replace the userspace dependency for USB Type-C and battery management. Support for sharing interrupts across multiple bwmon instances is added, and a refcount imbalance issue is corrected. The LLCC support for recent platforms is corrected, and SA8775P support is added. A new interface is added to SMEM, to expose "feature codes". One example of the usecase for this is to indicate to the GPU driver which frequencies are available on the given device. The interrupt consumer and provider side of SMP2P is updated to provide more useful names in interrupt stats. Support for using the mailbox binding and driver for outgoing IPC interrupt in the SMSM driver is introduced. socinfo driver learns about SDM670 and IPQ5321, as well as get some updates to the X1E PMICs. pmic_glink is bumped to now support managing 3 USB Type-C ports. * tag 'qcom-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (48 commits) soc: qcom: smp2p: Use devname for interrupt descriptions soc: qcom: smsm: Add missing mailbox dependency to Kconfig soc: qcom: add missing pd-mapper dependencies soc: qcom: icc-bwmon: Allow for interrupts to be shared across instances dt-bindings: interconnect: qcom,msm8998-bwmon: Add X1E80100 BWMON instances dt-bindings: interconnect: qcom,msm8998-bwmon: Remove opp-table from the required list firmware: qcom: tzmem: export devm_qcom_tzmem_pool_new() soc: qcom: add pd-mapper implementation soc: qcom: pdr: extract PDR message marshalling data soc: qcom: pdr: fix parsing of domains lists soc: qcom: pdr: protect locator_addr with the main mutex firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image() firmware: qcom: scm: add support for SHM bridge memory carveout firmware: qcom: tzmem: enable SHM Bridge support firmware: qcom: scm: add support for SHM bridge operations firmware: qcom: qseecom: convert to using the TZ allocator firmware: qcom: scm: make qcom_scm_qseecom_app_get_id() use the TZ allocator firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator firmware: qcom: scm: make qcom_scm_ice_set_key() use the TZ allocator firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator ... Link: https://lore.kernel.org/r/20240705034410.13968-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:09:10 +02:00
Pascal Paillet	e9a316affb	arm64: stm32: enable scmi regulator for stm32 Add SCMI ARM REGULATOR configuration for stm32. Signed-off-by: Pascal Paillet <p.paillet@foss.st.com> Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240705134554.2833835-1-alexandre.torgue@foss.st.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:08:44 +02:00
Arnd Bergmann	9c148cb47a	Merge tag 'ti-driver-soc-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/drivers TI SoC driver updates for v6.11 - Update TISCI protocol URL link which was dead - socinfo: Add j721E SR 2.0 detection support - MAINTAINER list additions: ti,pruss.yaml and ti,j721e-system-controller.yaml - pm33xx: log statement improvement - knav_qmss: minor data structure optimization * tag 'ti-driver-soc-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux: dt-bindings: soc: ti: Move ti,j721e-system-controller.yaml to soc/ti MAINTAINERS: Add entry for ti,pruss.yaml to TI KEYSTONE MULTICORE NAVIGATOR DRIVERS soc: ti: k3-socinfo: Add J721E SR2.0 soc: ti: knav_qmss: Constify struct knav_range_ops firmware: ti_sci: fix TISCI protocol URL link dt-bindings: ti: fix TISCI protocol URL link soc: ti: pm33xx: Fix missing newlines in log statements Link: https://lore.kernel.org/r/20240705151449.s4rngkehjn73favn@stream Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:07:33 +02:00
James Chapman	f8ad00f3fb	l2tp: fix possible UAF when cleaning up tunnels syzbot reported a UAF caused by a race when the L2TP work queue closes a tunnel at the same time as a userspace thread closes a session in that tunnel. Tunnel cleanup is handled by a work queue which iterates through the sessions contained within a tunnel, and closes them in turn. Meanwhile, a userspace thread may arbitrarily close a session via either netlink command or by closing the pppox socket in the case of l2tp_ppp. The race condition may occur when l2tp_tunnel_closeall walks the list of sessions in the tunnel and deletes each one. Currently this is implemented using list_for_each_safe, but because the list spinlock is dropped in the loop body it's possible for other threads to manipulate the list during list_for_each_safe's list walk. This can lead to the list iterator being corrupted, leading to list_for_each_safe spinning. One sequence of events which may lead to this is as follows: * A tunnel is created, containing two sessions A and B. * A thread closes the tunnel, triggering tunnel cleanup via the work queue. * l2tp_tunnel_closeall runs in the context of the work queue. It removes session A from the tunnel session list, then drops the list lock. At this point the list_for_each_safe temporary variable is pointing to the other session on the list, which is session B, and the list can be manipulated by other threads since the list lock has been released. * Userspace closes session B, which removes the session from its parent tunnel via l2tp_session_delete. Since l2tp_tunnel_closeall has released the tunnel list lock, l2tp_session_delete is able to call list_del_init on the session B list node. * Back on the work queue, l2tp_tunnel_closeall resumes execution and will now spin forever on the same list entry until the underlying session structure is freed, at which point UAF occurs. The solution is to iterate over the tunnel's session list using list_first_entry_not_null to avoid the possibility of the list iterator pointing at a list item which may be removed during the walk. Also, have l2tp_tunnel_closeall ref each session while it processes it to prevent another thread from freeing it. cpu1 cpu2 --- --- pppol2tp_release() spin_lock_bh(&tunnel->list_lock); for (;;) { session = list_first_entry_or_null(&tunnel->session_list, struct l2tp_session, list); if (!session) break; list_del_init(&session->list); spin_unlock_bh(&tunnel->list_lock); l2tp_session_delete(session); l2tp_session_delete(session); spin_lock_bh(&tunnel->list_lock); } spin_unlock_bh(&tunnel->list_lock); Calling l2tp_session_delete on the same session twice isn't a problem per-se, but if cpu2 manages to destruct the socket and unref the session to zero before cpu1 progresses then it would lead to UAF. Reported-by: syzbot+b471b7c936301a59745b@syzkaller.appspotmail.com Reported-by: syzbot+c041b4ce3a6dfd1e63e2@syzkaller.appspotmail.com Fixes: `d18d3f0a24` ("l2tp: replace hlist with simple list for per-tunnel session list") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Link: https://patch.msgid.link/20240704152508.1923908-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-09 11:06:21 +02:00
Arnd Bergmann	9f51147489	Merge tag 'riscv-firmware-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers RISC-V firmware drivers for v6.11 Microchip: Support for writing "bitstream info" to the flash using the auto-update driver. At this point the "bitstream info" is a glorified dtbo wrapper, but there's plans to add more info there in the future. Additionally, rework some allocations in the driver and use scope-based cleanup on them. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-firmware-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: firmware: microchip: use scope-based cleanup where possible firmware: microchip: move buffer allocation into mpfs_auto_update_set_image_address() firmware: microchip: support writing bitstream info to flash Link: https://lore.kernel.org/r/20240707-lukewarm-film-8a9da40a1c27@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 11:05:26 +02:00
Arnd Bergmann	9e6b815593	Merge tag 'riscv-cache-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers RISC-V cache drivers for v6.11 StarFive: A new driver for the cache controller on the jh8100, which didn't implement Zicbom and thus needs an implementation of non-standard cache management operations. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-cache-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: MAINTAINERS: add microchip soc binding directory to microchip soc driver entry MAINTAINERS: add cache binding directory to cache driver entry cache: Add StarFive StarLink cache management dt-bindings: cache: Add docs for StarFive Starlink cache controller Link: https://lore.kernel.org/r/20240707-whoever-undesired-c5f6e96ae403@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 10:53:48 +02:00
Arnd Bergmann	95ab7b209b	Merge tag 'riscv-sophgo-dt-for-v6.11' of https://github.com/sophgo/linux into soc/dt RISC-V Devicetrees for v6.11 Sopgho: Add clock support for SG2042. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> * tag 'riscv-sophgo-dt-for-v6.11' of https://github.com/sophgo/linux: riscv: dts: add clock generator for Sophgo SG2042 SoC Link: https://lore.kernel.org/r/PN1P287MB281861EA2B1706B430D2FA3EFEDB2@PN1P287MB2818.INDP287.PROD.OUTLOOK.COM Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 10:50:42 +02:00
Arnd Bergmann	ad828298af	Merge tag 'v6.11-rockchip-dts64-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt These are two additional boards, the Xunlong Orange Pi 3B and the Add Radxa ROCK 3B. * tag 'v6.11-rockchip-dts64-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Add Xunlong Orange Pi 3B dt-bindings: arm: rockchip: Add Xunlong Orange Pi 3B arm64: dts: rockchip: Add Radxa ROCK 3B dt-bindings: arm: rockchip: Add Radxa ROCK 3B Link: https://lore.kernel.org/r/2191200.GUh0CODmnK@diego Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 10:48:31 +02:00
Paul Burton	680e7863de	MIPS: GIC: Generate redirect block accessors With CM 3.5 the "core-other" register block evolves into the "redirect" register block, which is capable of accessing not only the core local registers of other cores but also the shared/global registers of other clusters. This patch generates accessor functions for shared/global registers accessed via the redirect block, with "redir_" inserted after "gic_" in their names. For example the accessor function: read_gic_config() ...accesses the GIC_CONFIG register of the GIC in the local cluster. With this patch a new function: read_gic_redir_config() ...is added which accesses the GIC_CONFIG register of the GIC in whichever cluster the GCR_CL_REDIRECT register is configured to access. This mirrors the similar redirect block accessors already provided for the CM & CPC. Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Paul Burton <paulburton@kernel.org> Signed-off-by: Chao-ying Fu <cfu@wavecomp.com> Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com> Signed-off-by: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2024-07-09 10:48:28 +02:00
Paul Burton	36675ac2a7	MIPS: CPS: Add a couple of multi-cluster utility functions This patch introduces a couple of utility functions which help later patches with introducing support for multi-cluster systems. - mips_cps_multicluster_cpus() allows its caller to determine whether the system includes CPUs spread across multiple clusters. This is useful because in some cases behaviour can be more optimal taking this knowledge into account. The means by which we check this is dependent upon the way we probe CPUs & assign their numbers, so keeping this knowledge confined here in arch/mips/ seems appropriate. - mips_cps_first_online_in_cluster() allows its caller to determine whether it is running upon the first CPU online within its cluster. This information is useful in cases where some cluster-wide configuration may need to occur, but should not be repeated if another CPU in the cluster is already online. Similarly to the above this is determined based upon knowledge of CPU numbering so it makes sense to keep that knowledge in arch/mips/. The function is defined in mips-cm.c rather than in asm/mips-cps.h in order to allow us to use asm/cpu-info.h & linux/smp.h without encountering an include nightmare. Signed-off-by: Paul Burton <paulburton@kernel.org> Signed-off-by: Chao-ying Fu <cfu@wavecomp.com> Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com> Signed-off-by: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2024-07-09 10:48:17 +02:00
Andreas Gruenbacher	f75efefb6d	gfs2: Clean up glock demote logic The logic for determining when to demote a glock in glock_work_func(), introduced in commit `7cf8dcd3b6` ("GFS2: Automatically adjust glock min hold time"), doesn't make sense: inode glocks have a minimum hold time that delays demotion, while all other glocks are expected to be demoted immediately. Instead of demoting non-inode glocks immediately, glock_work_func() schedules glock work for them to be demoted, however. Get rid of that unnecessary indirection. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2024-07-09 10:40:03 +02:00
Dominique Martinet	89c7f50789	MIPS: Octeron: remove source file executable bit This does not matter the least, but there is no other .[ch] file in the repo that is executable, so clean this up. Fixes: `29b83a64df` ("MIPS: Octeon: Add PCIe link status check") Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2024-07-09 10:38:08 +02:00
Bibo Mao	03779999ac	LoongArch: KVM: Add PV steal time support in guest side Per-cpu struct kvm_steal_time is added here, its size is 64 bytes and also defined as 64 bytes, so that the whole structure is in one physical page. When a VCPU is online, function pv_enable_steal_time() is called. This function will pass guest physical address of struct kvm_steal_time and tells hypervisor to enable steal time. When a vcpu is offline, physical address is set as 0 and tells hypervisor to disable steal time. Here is an output of vmstat on guest when there is workload on both host and guest. It shows steal time stat information. procs -----------memory---------- -----io---- -system-- ------cpu----- r b swpd free inact active bi bo in cs us sy id wa st 15 1 0 7583616 184112 72208 20 0 162 52 31 6 43 0 20 17 0 0 7583616 184704 72192 0 0 6318 6885 5 60 8 5 22 16 0 0 7583616 185392 72144 0 0 1766 1081 0 49 0 1 50 16 0 0 7583616 184816 72304 0 0 6300 6166 4 62 12 2 20 18 0 0 7583632 184480 72240 0 0 2814 1754 2 58 4 1 35 Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	b4ba157044	LoongArch: KVM: Add PV steal time support in host side Add ParaVirt steal time feature in host side, VM can search supported features provided by KVM hypervisor, a feature KVM_FEATURE_STEAL_TIME is added here. Like x86, steal time structure is saved in guest memory, one hypercall function KVM_HCALL_FUNC_NOTIFY is added to notify KVM to enable this feature. One CPU attr ioctl command KVM_LOONGARCH_VCPU_PVTIME_CTRL is added to save and restore the base address of steal time structure when a VM is migrated. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Jia Qingtong	d7ad41a31d	LoongArch: KVM: always make pte young in page map's fast path It seems redundant to check if pte is young before the call to kvm_pte_mkyoung() in kvm_map_page_fast(). Just remove the check. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Jia Qingtong <jiaqingtong97@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	ebf00272da	LoongArch: KVM: Mark page accessed and dirty with page ref added Function kvm_map_page_fast() is fast path of secondary mmu page fault flow, pfn is parsed from secondary mmu page table walker. However the corresponding page reference is not added, it is dangerious to access page out of mmu_lock. Here page ref is added inside mmu_lock, function kvm_set_pfn_accessed() and kvm_set_pfn_dirty() is called with page ref added, so that the page will not be freed by others. Also kvm_set_pfn_accessed() is removed here since it is called in the following function kvm_release_pfn_clean(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	8c34704252	LoongArch: KVM: Add dirty bitmap initially all set support Add KVM_DIRTY_LOG_INITIALLY_SET support on LoongArch system, this feature comes from other architectures like x86 and arm64. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	32d4b999da	LoongArch: KVM: Add memory barrier before update pmd entry When updating pmd entry such as allocating new pmd page or splitting huge page into normal page, it is necessary to firstly update all pte entries, and then update pmd entry. It is weak order with LoongArch system, there will be problem if other VCPUs see pmd update firstly while ptes are not updated. Here smp_wmb() is added to assure this. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	b072cbf023	LoongArch: KVM: Discard dirty page tracking on readonly memslot For readonly memslot such as UEFI BIOS or UEFI var space, guest cannot write this memory space directly. So it is not necessary to track dirty pages for readonly memslot. Here we make such optimization in function kvm_arch_commit_memory_region(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	2f56f9ea4d	LoongArch: KVM: Select huge page only if secondary mmu supports it Currently page level selection about secondary mmu depends on memory slot and page level about host mmu. There will be problems if page level of secondary mmu is zero already. Huge page cannot be selected if there is normal page mapped in secondary mmu already, since it is not supported to merge normal pages into huge pages now. So page level selection should depend on the following three conditions. 1. Memslot is aligned for huge page and vm is not migrating. 2. Page level of host mmu is also huge page. 3. Page level of secondary mmu is suituable for huge page. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:51 +08:00
Bibo Mao	b5d4e2325d	LoongArch: KVM: Delay secondary mmu tlb flush until guest entry With hardware assisted virtualization, there are two level HW mmu, one is GVA to GPA mapping, the other is GPA to HPA mapping which is called secondary mmu in generic. If there is page fault for secondary mmu, there needs tlb flush operation indexed with fault GPA address and VMID. VMID is stored at register CSR.GSTAT and will be reload or recalculated before guest entry. Currently CSR.GSTAT is not saved and restored during VCPU context switch, instead it is recalculated during guest entry. So CSR.GSTAT is effective only when a VCPU runs in guest mode, however it may not be effective if the VCPU exits to host mode. Since register CSR.GSTAT may be stale, it may records the VMID of the last schedule-out VCPU, rather than the current VCPU. Function kvm_flush_tlb_gpa() should be called with its real VMID, so here move it to the guest entrance. Also an arch-specific request id KVM_REQ_TLB_FLUSH_GPA is added to flush tlb for secondary mmu, and it can be optimized if VMID is updated, since all guest tlb entries will be invalid if VMID is updated. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:50 +08:00
Bibo Mao	e306e51490	LoongArch: KVM: Sync pending interrupt when getting ESTAT from user mode Currently interrupts are posted and cleared with the asynchronous mode, meanwhile they are saved in SW state vcpu::arch::irq_pending and vcpu:: arch::irq_clear. When vcpu is ready to run, pending interrupt is written back to CSR.ESTAT register from SW state vcpu::arch::irq_pending at the guest entrance. During VM migration stage, vcpu is put into stopped state, however pending interrupts are not synced to CSR.ESTAT register. So there will be interrupt lost when VCPU is migrated to another host machines. Here in this patch when ESTAT CSR register is read from VMM user mode, pending interrupts are synchronized to ESTAT also. So that VMM can get correct pending interrupts. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-09 16:25:50 +08:00
Geliang Tang	f0c1802569	skmsg: Skip zero length skb in sk_msg_recvmsg When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch platform, the following kernel panic occurs: [...] Oops[#1]: CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018 ... ... ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560 ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 0000000c (PPLV0 +PIE +PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) BADV: 0000000000000040 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...) Stack : ... Call Trace: [<9000000004162774>] copy_page_to_iter+0x74/0x1c0 [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560 [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0 [<90000000049aae34>] inet_recvmsg+0x54/0x100 [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0 [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0 [<900000000481e27c>] sys_recvfrom+0x1c/0x40 [<9000000004c076ec>] do_syscall+0x8c/0xc0 [<9000000003731da4>] handle_syscall+0xc4/0x160 Code: ... ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception Kernel relocated by 0x3510000 .text @ 0x9000000003710000 .data @ 0x9000000004d70000 .bss @ 0x9000000006469400 ---[ end Kernel panic - not syncing: Fatal exception ]--- [...] This crash happens every time when running sockmap_skb_verdict_shutdown subtest in sockmap_basic. This crash is because a NULL pointer is passed to page_address() in the sk_msg_recvmsg(). Due to the different implementations depending on the architecture, page_address(NULL) will trigger a panic on Loongarch platform but not on x86 platform. So this bug was hidden on x86 platform for a while, but now it is exposed on Loongarch platform. The root cause is that a zero length skb (skb->len == 0) was put on the queue. This zero length skb is a TCP FIN packet, which was sent by shutdown(), invoked in test_sockmap_skb_verdict_shutdown(): shutdown(p1, SHUT_WR); In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no page is put to this sge (see sg_set_page in sg_set_page), but this empty sge is queued into ingress_msg list. And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it to kmap_local_page() and to page_address(), then kernel panics. To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(), if copy is zero, that means it's a zero length skb, skip invoking copy_page_to_iter(). We are using the EFAULT return triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Suggested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn	2024-07-09 10:24:50 +02:00
Bartosz Golaszewski	91581c4b3f	gpio: virtuser: new virtual testing driver for the GPIO API The GPIO subsystem used to have a serious problem with undefined behavior and use-after-free bugs on hot-unplug of GPIO chips. This can be considered a corner-case by some as most GPIO controllers are enabled early in the boot process and live until the system goes down but most GPIO drivers do allow unbind over sysfs, many are loadable modules that can be (force) unloaded and there are also GPIO devices that can be dynamically detached, for instance CP2112 which is a USB GPIO expender. Bugs can be triggered both from user-space as well as by in-kernel users. We have the means of testing it from user-space via the character device but the issues manifest themselves differently in the kernel. This is a proposition of adding a new virtual driver - a configurable GPIO consumer that can be configured over configfs (similarly to gpio-sim) or described on the device-tree. This driver is aimed as a helper in spotting any regressions in hot-unplug handling in GPIOLIB. Link: https://lore.kernel.org/r/20240708142912.120570-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>	2024-07-09 09:39:54 +02:00
Thorsten Blum	21b9e722ad	m68k: cmpxchg: Fix return value for default case in __arch_xchg() The return value of __invalid_xchg_size() is assigned to tmp instead of the return variable x. Assign it to x instead. Fixes: `2501cf768e` ("m68k: Fix xchg/cmpxchg to fail to link if given an inappropriate pointer") Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/20240702034116.140234-2-thorsten.blum@toblux.com Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2024-07-09 09:19:31 +02:00
Uwe Kleine-König	d2eb433c85	ALSA: ppc: keywest: Drop explicit initialization of struct i2c_device_id::driver_data to 0 The driver doesn't use the driver_data member of struct i2c_device_id, so don't explicitly initialize this member. This prepares putting driver_data in an anonymous union which requires either no initialization or named designators. But it's also a nice cleanup on its own. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/20240708130254.9631-2-u.kleine-koenig@baylibre.com Signed-off-by: Takashi Iwai <tiwai@suse.de>	2024-07-09 09:16:30 +02:00
Christoph Hellwig	61353a63a2	block: take offset into account in blk_bvec_map_sg again The rebase of commit `09595e0c9d` ("block: pass a phys_addr_t to get_max_segment_size") lost adding the total to to the offset in blk_bvec_map_sg. Add it back. Fixes: `09595e0c9d` ("block: pass a phys_addr_t to get_max_segment_size") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240709070126.3019940-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 01:02:44 -06:00
Johannes Berg	6e909f4891	wifi: virt_wifi: don't use strlen() in const context Looks like not all compilers allow strlen(constant) as a constant, so don't do that. Instead, revert back to defining the length as the first submission had it. Fixes: `b5d14b0c67` ("wifi: virt_wifi: avoid reporting connection success with wrong SSID") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407090934.NnR1TUbW-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202407090944.mpwLHGt9-lkp@intel.com/ Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 08:49:56 +02:00
Arnd Bergmann	05a01ce773	Merge tag 'imx-defconfig-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/defconfig i.MX defconfig change for 6.11: - Enable a few drivers needed by TQMa7x/MBa7x and i.MX53 QSB/QSRB in imx_v6_v7_defconfig - Enable IWLWIFI driver support in arm64 defconfig * tag 'imx-defconfig-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: defconfig: Enable the IWLWIFI driver ARM: imx_v6_v7_defconfig: enable DRM_SII902X and DRM_DISPLAY_CONNECTOR ARM: imx_v6_v7_defconfig: Enable drivers for TQMa7x/MBa7x Link: https://lore.kernel.org/r/20240709012534.3106441-1-shawnguo2@yeah.net Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-07-09 08:21:26 +02:00
Chaitanya Kulkarni	0ffc46eb1b	block: fix get_max_segment_size() warning Correct the parameter name in the comment of get_max_segment_size() to fix following warning:- block/blk-merge.c:220: warning: Function parameter or struct member 'len' not described in 'get_max_segment_size' block/blk-merge.c:220: warning: Excess function parameter 'max_len' description in 'get_max_segment_size' Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240709045432.8688-1-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 00:01:11 -06:00
John Garry	9423c653fe	loop: Don't bother validating blocksize The block queue limits validation does this for us now. The loop_configure() -> WARN_ON_ONCE() call is dropped, as an invalid block size would trigger this now. We don't want userspace to be able to directly trigger WARNs. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20240708091651.177447-6-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 00:00:17 -06:00
John Garry	af28172291	virtio_blk: Don't bother validating blocksize The block queue limits validation does this for us now. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240708091651.177447-5-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 00:00:17 -06:00
John Garry	addc3a68de	null_blk: Don't bother validating blocksize The block queue limits validation does this for us now. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20240708091651.177447-4-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 00:00:17 -06:00
John Garry	fe3d508ba9	block: Validate logical block size in blk_validate_limits() Some drivers validate that their own logical block size. It is no harm to always do this, so validate in blk_validate_limits(). This allows us to remove the validation in most of those drivers. Add a comment to blk_validate_block_size() to inform users that self- validation of LBS is usually unnecessary. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240708091651.177447-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 00:00:17 -06:00
John Garry	4ff3d01275	virtio_blk: Fix default logical block size fallback If we fail to read a logical block size in virtblk_read_limits() -> virtio_cread_feature(), then we default to what is in lim->logical_block_size, but that would be 0. We can deal with lim->logical_block_size = 0 later in the blk_mq_alloc_disk(), but the code in virtblk_read_limits() needs a proper default, so give a default of SECTOR_SIZE. Fixes: `27e32cd23f` ("block: pass a queue_limits argument to blk_mq_alloc_disk") Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240708091651.177447-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-09 00:00:17 -06:00
Jens Axboe	6b43537fae	Merge tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme into for-6.11/block Pull NVMe updates from Keith: "nvme updates for Linux 6.11 - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng)" * tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme: (21 commits) nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id nvme-multipath: implement "queue-depth" iopolicy nvme-multipath: prepare for "queue-depth" iopolicy nvme-pci: do not directly handle subsys reset fallout lpfc_nvmet: implement 'host_traddr' nvme-fcloop: implement 'host_traddr' nvmet-fc: implement host_traddr() nvmet-rdma: implement host_traddr() nvmet-tcp: implement host_traddr() nvmet: add 'host_traddr' callback for debugfs nvmet: add debugfs support mailmap: add entry for Weiwen Hu nvme: rename CDR/MORE/DNR to NVME_STATUS_* nvme: fix status magic numbers nvme: rename nvme_sc_to_pr_err to nvme_status_to_pr_err nvme: split device add from initialization nvme: fc: split controller bringup handling nvme: rdma: split controller bringup handling nvme: tcp: split controller bringup handling ...	2024-07-08 23:57:02 -06:00
Yicong Yang	54624acf88	dma-mapping: benchmark: Don't starve others when doing the test The test thread will start N benchmark kthreads and then schedule out until the test time finished and notify the benchmark kthreads to stop. The benchmark kthreads will keep running until notified to stop. There's a problem with current implementation when the benchmark kthreads number is equal to the CPUs on a non-preemptible kernel: since the scheduler will balance the kthreads across the CPUs and when the test time's out the test thread won't get a chance to be scheduled on any CPU then cannot notify the benchmark kthreads to stop. This can be easily reproduced on a VM (simulated with 16 CPUs) with PREEMPT_VOLUNTARY: estuary:/mnt$ ./dma_map_benchmark -t 16 -s 1 rcu: INFO: rcu_sched self-detected stall on CPU rcu: 10-...!: (5221 ticks this GP) idle=ed24/1/0x4000000000000000 softirq=142/142 fqs=0 rcu: (t=5254 jiffies g=-559 q=45 ncpus=16) rcu: rcu_sched kthread starved for 5255 jiffies! g-559 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=12 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_sched state:R running task stack:0 pid:16 tgid:16 ppid:2 flags:0x00000008 Call trace __switch_to+0xec/0x138 __schedule+0x2f8/0x1080 schedule+0x30/0x130 schedule_timeout+0xa0/0x188 rcu_gp_fqs_loop+0x128/0x528 rcu_gp_kthread+0x1c8/0x208 kthread+0xec/0xf8 ret_from_fork+0x10/0x20 Sending NMI from CPU 10 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 332 Comm: dma-map-benchma Not tainted 6.10.0-rc1-vanilla-LSE #8 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : arm_smmu_cmdq_issue_cmdlist+0x218/0x730 lr : arm_smmu_cmdq_issue_cmdlist+0x488/0x730 sp : ffff80008748b630 x29: ffff80008748b630 x28: 0000000000000000 x27: ffff80008748b780 x26: 0000000000000000 x25: 000000000000bc70 x24: 000000000001bc70 x23: ffff0000c12af080 x22: 0000000000010000 x21: 000000000000ffff x20: ffff80008748b700 x19: ffff0000c12af0c0 x18: 0000000000010000 x17: 0000000000000001 x16: 0000000000000040 x15: ffffffffffffffff x14: 0001ffffffffffff x13: 000000000000ffff x12: 00000000000002f1 x11: 000000000001ffff x10: 0000000000000031 x9 : ffff800080b6b0b8 x8 : ffff0000c2a48000 x7 : 000000000001bc71 x6 : 0001800000000000 x5 : 00000000000002f1 x4 : 01ffffffffffffff x3 : 000000000009aaf1 x2 : 0000000000000018 x1 : 000000000000000f x0 : ffff0000c12af18c Call trace: arm_smmu_cmdq_issue_cmdlist+0x218/0x730 __arm_smmu_tlb_inv_range+0xe0/0x1a8 arm_smmu_iotlb_sync+0xc0/0x128 __iommu_dma_unmap+0x248/0x320 iommu_dma_unmap_page+0x5c/0xe8 dma_unmap_page_attrs+0x38/0x1d0 map_benchmark_thread+0x118/0x2c0 kthread+0xec/0xf8 ret_from_fork+0x10/0x20 Solve this by adding scheduling point in the kthread loop, so if there're other threads in the system they may have a chance to run, especially the thread to notify the test end. However this may degrade the test concurrency so it's recommended to run this on an idle system. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2024-07-09 07:48:32 +02:00
Chen Ni	b80cc4df11	ipc: mqueue: remove assignment from IS_ERR argument Remove assignment from IS_ERR() argument. This is detected by coccinelle. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240708080404.3859094-1-nichen@iscas.ac.cn Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-09 06:47:40 +02:00
Wojciech Gładysz	83f4414b8f	ext4: sanity check for NULL pointer after ext4_force_shutdown Test case: 2 threads write short inline data to a file. In ext4_page_mkwrite the resulting inline data is converted. Handling ext4_grp_locked_error with description "block bitmap and bg descriptor inconsistent: X vs Y free clusters" calls ext4_force_shutdown. The conversion clears EXT4_STATE_MAY_INLINE_DATA but fails for ext4_destroy_inline_data_nolock and ext4_mark_iloc_dirty due to ext4_forced_shutdown. The restoration of inline data fails for the same reason not setting EXT4_STATE_MAY_INLINE_DATA. Without the flag set a regular process path in ext4_da_write_end follows trying to dereference page folio private pointer that has not been set. The fix calls early return with -EIO error shall the pointer to private be NULL. Sample crash report: Unable to handle kernel paging request at virtual address dfff800000000004 KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [dfff800000000004] address between user and kernel address ranges Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 20274 Comm: syz-executor185 Not tainted 6.9.0-rc7-syzkaller-gfda5695d692c #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __block_commit_write+0x64/0x2b0 fs/buffer.c:2167 lr : __block_commit_write+0x3c/0x2b0 fs/buffer.c:2160 sp : ffff8000a1957600 x29: ffff8000a1957610 x28: dfff800000000000 x27: ffff0000e30e34b0 x26: 0000000000000000 x25: dfff800000000000 x24: dfff800000000000 x23: fffffdffc397c9e0 x22: 0000000000000020 x21: 0000000000000020 x20: 0000000000000040 x19: fffffdffc397c9c0 x18: 1fffe000367bd196 x17: ffff80008eead000 x16: ffff80008ae89e3c x15: 00000000200000c0 x14: 1fffe0001cbe4e04 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000ff0100 x9 : 0000000000000000 x8 : 0000000000000004 x7 : 0000000000000000 x6 : 0000000000000000 x5 : fffffdffc397c9c0 x4 : 0000000000000020 x3 : 0000000000000020 x2 : 0000000000000040 x1 : 0000000000000020 x0 : fffffdffc397c9c0 Call trace: __block_commit_write+0x64/0x2b0 fs/buffer.c:2167 block_write_end+0xb4/0x104 fs/buffer.c:2253 ext4_da_do_write_end fs/ext4/inode.c:2955 [inline] ext4_da_write_end+0x2c4/0xa40 fs/ext4/inode.c:3028 generic_perform_write+0x394/0x588 mm/filemap.c:3985 ext4_buffered_write_iter+0x2c0/0x4ec fs/ext4/file.c:299 ext4_file_write_iter+0x188/0x1780 call_write_iter include/linux/fs.h:2110 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0x968/0xc3c fs/read_write.c:590 ksys_write+0x15c/0x26c fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __arm64_sys_write+0x7c/0x90 fs/read_write.c:652 __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Code: 97f85911 f94002da 91008356 d343fec8 (38796908) ---[ end trace 0000000000000000 ]--- ---------------- Code disassembly (best guess): 0: 97f85911 bl 0xffffffffffe16444 4: f94002da ldr x26, [x22] 8: 91008356 add x22, x26, #0x20 c: d343fec8 lsr x8, x22, #3 * 10: 38796908 ldrb w8, [x8, x25] <-- trapping instruction Reported-by: syzbot+18df508cf00a0598d9a6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=18df508cf00a0598d9a6 Link: https://lore.kernel.org/all/000000000000f19a1406109eb5c5@google.com/T/ Signed-off-by: Wojciech Gładysz <wojciech.gladysz@infogain.com> Link: https://patch.msgid.link/20240703070112.10235-1-wojciech.gladysz@infogain.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	a794c9ad02	jbd2: increase maximum transaction size Originally, we were quite conservative in limiting maximum transaction size to a quarter of the journal because we were not accounting transaction descriptor and revoke blocks. These days we do properly account them and reserve space for them from the total transaction credits. Thus there's no need to be so conservative and we can increase the maximum transaction size to one third of the journal (even half should work fine in principle but the performance will likely suffer in that case). This also fixes failures to grow filesystems with tiny journals. Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	1cf5b024a3	jbd2: drop pointless shrinker batch initialization In jbd2_journal_init_common() we set batch size of a shrinker shrinking checkpointed buffers to journal->j_max_transaction_buffers. But that is guaranteed to be 0 at that point so we effectively stay with the default shrinker batch size of 128. It has been like this since introduction of jbd2 shrinkers so just drop the pointless initialization. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	27ba5b6731	jbd2: avoid infinite transaction commit loop Commit `9f356e5a4f` ("jbd2: Account descriptor blocks into t_outstanding_credits") started to account descriptor blocks into transactions outstanding credits. However it didn't appropriately decrease the maximum amount of credits available to userspace. Thus if the filesystem requests a transaction smaller than j_max_transaction_buffers but large enough that when descriptor blocks are added the size exceeds j_max_transaction_buffers, we confuse add_transaction_credits() into thinking previous handles have grown the transaction too much and enter infinite journal commit loop in start_this_handle() -> add_transaction_credits() trying to create transaction with enough credits available. Fix the problem by properly accounting for transaction space reserved for descriptor blocks when verifying requested transaction handle size. CC: stable@vger.kernel.org Fixes: `9f356e5a4f` ("jbd2: Account descriptor blocks into t_outstanding_credits") Reported-by: Alexander Coffin <alex.coffin@maticrobots.com> Link: https://lore.kernel.org/all/CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	e3a00a2378	jbd2: precompute number of transaction descriptor blocks Instead of computing the number of descriptor blocks a transaction can have each time we need it (which is currently when starting each transaction but will become more frequent later) precompute the number once during journal initialization together with maximum transaction size. We perform the precomputation whenever journal feature set is updated similarly as for computation of journal->j_revoke_records_per_block. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	4aa99c71e4	jbd2: make jbd2_journal_get_max_txn_bufs() internal There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Ye Bin	0bab8db415	jbd2: avoid mount failed when commit block is partial submitted We encountered a problem that the file system could not be mounted in the power-off scenario. The analysis of the file system mirror shows that only part of the data is written to the last commit block. The valid data of the commit block is concentrated in the first sector. However, the data of the entire block is involved in the checksum calculation. For different hardware, the minimum atomic unit may be different. If the checksum of a committed block is incorrect, clear the data except the 'commit_header' and then calculate the checksum. If the checkusm is correct, it is considered that the block is partially committed, Then continue to replay journal. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240620072405.3533701-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	65121eff3e	ext4: avoid writing unitialized memory to disk in EA inodes If the extended attribute size is not a multiple of block size, the last block in the EA inode will have uninitialized tail which will get written to disk. We will never expose the data to userspace but still this is not a good practice so just zero out the tail of the block as it isn't going to cause a noticeable performance overhead. Fixes: `e50e5129f3` ("ext4: xattr-in-inode support") Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Luis Henriques (SUSE)	7882b0187b	ext4: don't track ranges in fast_commit if inode has inlined data When fast-commit needs to track ranges, it has to handle inodes that have inlined data in a different way because ext4_fc_write_inode_data(), in the actual commit path, will attempt to map the required blocks for the range. However, inodes that have inlined data will have it's data stored in inode->i_block and, eventually, in the extended attribute space. Unfortunately, because fast commit doesn't currently support extended attributes, the solution is to mark this commit as ineligible. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883 Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Tested-by: Ben Hutchings <benh@debian.org> Fixes: `9725958bb7` ("ext4: fast commit may miss tracking unwritten range during ftruncate") Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00

... 94 95 96 97 98 ...

1296460 Commits