linux

Commit Graph

Author	SHA1	Message	Date
Jens Axboe	b000145e99	io_uring/rw: defer fsnotify calls to task context We can't call these off the kiocb completion as that might be off soft/hard irq context. Defer the calls to when we process the task_work for this request. That avoids valid complaints like: stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.0.0-rc6-syzkaller-00321-g105a36f3694e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_usage_bug kernel/locking/lockdep.c:3961 [inline] valid_state kernel/locking/lockdep.c:3973 [inline] mark_lock_irq kernel/locking/lockdep.c:4176 [inline] mark_lock.part.0.cold+0x18/0xd8 kernel/locking/lockdep.c:4632 mark_lock kernel/locking/lockdep.c:4596 [inline] mark_usage kernel/locking/lockdep.c:4527 [inline] __lock_acquire+0x11d9/0x56d0 kernel/locking/lockdep.c:5007 lock_acquire kernel/locking/lockdep.c:5666 [inline] lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631 __fs_reclaim_acquire mm/page_alloc.c:4674 [inline] fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688 might_alloc include/linux/sched/mm.h:271 [inline] slab_pre_alloc_hook mm/slab.h:700 [inline] slab_alloc mm/slab.c:3278 [inline] __kmem_cache_alloc_lru mm/slab.c:3471 [inline] kmem_cache_alloc+0x39/0x520 mm/slab.c:3491 fanotify_alloc_fid_event fs/notify/fanotify/fanotify.c:580 [inline] fanotify_alloc_event fs/notify/fanotify/fanotify.c:813 [inline] fanotify_handle_event+0x1130/0x3f40 fs/notify/fanotify/fanotify.c:948 send_to_group fs/notify/fsnotify.c:360 [inline] fsnotify+0xafb/0x1680 fs/notify/fsnotify.c:570 __fsnotify_parent+0x62f/0xa60 fs/notify/fsnotify.c:230 fsnotify_parent include/linux/fsnotify.h:77 [inline] fsnotify_file include/linux/fsnotify.h:99 [inline] fsnotify_access include/linux/fsnotify.h:309 [inline] __io_complete_rw_common+0x485/0x720 io_uring/rw.c:195 io_complete_rw+0x1a/0x1f0 io_uring/rw.c:228 iomap_dio_complete_work fs/iomap/direct-io.c:144 [inline] iomap_dio_bio_end_io+0x438/0x5e0 fs/iomap/direct-io.c:178 bio_endio+0x5f9/0x780 block/bio.c:1564 req_bio_endio block/blk-mq.c:695 [inline] blk_update_request+0x3fc/0x1300 block/blk-mq.c:825 scsi_end_request+0x7a/0x9a0 drivers/scsi/scsi_lib.c:541 scsi_io_completion+0x173/0x1f70 drivers/scsi/scsi_lib.c:971 scsi_complete+0x122/0x3b0 drivers/scsi/scsi_lib.c:1438 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1022 __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 common_interrupt+0xa9/0xc0 arch/x86/kernel/irq.c:240 Fixes: `f63cf5192f` ("io_uring: ensure that fsnotify is always called") Link: https://lore.kernel.org/all/20220929135627.ykivmdks2w5vzrwg@quack3/ Reported-by: syzbot+dfcc5f4da15868df7d4d@syzkaller.appspotmail.com Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-29 11:00:41 -06:00
Mike Rapoport	b9dd04a20f	arm64/mm: fold check for KFENCE into can_set_direct_map() KFENCE requires linear map to be mapped at page granularity, so that it is possible to protect/unprotect single pages, just like with rodata_full and DEBUG_PAGEALLOC. Instead of repating can_set_direct_map() \|\| IS_ENABLED(CONFIG_KFENCE) make can_set_direct_map() handle the KFENCE case. This also prevents potential false positives in kernel_page_present() that may return true for non-present page if CONFIG_KFENCE is enabled. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220921074841.382615-1-rppt@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-09-29 17:59:28 +01:00
Mark Rutland	8cfb08575c	arm64: ftrace: fix module PLTs with mcount Li Huafei reports that mcount-based ftrace with module PLTs was broken by commit: `a625357997` ("arm64: ftrace: consistently handle PLTs.") When a module PLTs are used and a module is loaded sufficiently far away from the kernel, we'll create PLTs for any branches which are out-of-range. These are separate from the special ftrace trampoline PLTs, which the module PLT code doesn't directly manipulate. When mcount is in use this is a problem, as each mcount callsite in a module will be initialized to point to a module PLT, but since commit `a625357997` ftrace_make_nop() will assume that the callsite has been initialized to point to the special ftrace trampoline PLT, and ftrace_find_callable_addr() rejects other cases. This means that when ftrace tries to initialize a callsite via ftrace_make_nop(), the call to ftrace_find_callable_addr() will find that the `_mcount` stub is out-of-range and is not handled by the ftrace PLT, resulting in a splat: \| ftrace_test: loading out-of-tree module taints kernel. \| ftrace: no module PLT for _mcount \| ------------[ ftrace bug ]------------ \| ftrace failed to modify \| [<ffff800029180014>] 0xffff800029180014 \| actual: 44:00:00:94 \| Initializing ftrace call sites \| ftrace record flags: 2000000 \| (0) \| expected tramp: ffff80000802eb3c \| ------------[ cut here ]------------ \| WARNING: CPU: 3 PID: 157 at kernel/trace/ftrace.c:2120 ftrace_bug+0x94/0x270 \| Modules linked in: \| CPU: 3 PID: 157 Comm: insmod Tainted: G O 6.0.0-rc6-00151-gcd722513a189-dirty #22 \| Hardware name: linux,dummy-virt (DT) \| pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) \| pc : ftrace_bug+0x94/0x270 \| lr : ftrace_bug+0x21c/0x270 \| sp : ffff80000b2bbaf0 \| x29: ffff80000b2bbaf0 x28: 0000000000000000 x27: ffff0000c4d38000 \| x26: 0000000000000001 x25: ffff800009d7e000 x24: ffff0000c4d86e00 \| x23: 0000000002000000 x22: ffff80000a62b000 x21: ffff8000098ebea8 \| x20: ffff0000c4d38000 x19: ffff80000aa24158 x18: ffffffffffffffff \| x17: 0000000000000000 x16: 0a0d2d2d2d2d2d2d x15: ffff800009aa9118 \| x14: 0000000000000000 x13: 6333626532303830 x12: 3030303866666666 \| x11: 203a706d61727420 x10: 6465746365707865 x9 : 3362653230383030 \| x8 : c0000000ffffefff x7 : 0000000000017fe8 x6 : 000000000000bff4 \| x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000001 \| x2 : ad2cb14bb5438900 x1 : 0000000000000000 x0 : 0000000000000022 \| Call trace: \| ftrace_bug+0x94/0x270 \| ftrace_process_locs+0x308/0x430 \| ftrace_module_init+0x44/0x60 \| load_module+0x15b4/0x1ce8 \| __do_sys_init_module+0x1ec/0x238 \| __arm64_sys_init_module+0x24/0x30 \| invoke_syscall+0x54/0x118 \| el0_svc_common.constprop.4+0x84/0x100 \| do_el0_svc+0x3c/0xd0 \| el0_svc+0x1c/0x50 \| el0t_64_sync_handler+0x90/0xb8 \| el0t_64_sync+0x15c/0x160 \| ---[ end trace 0000000000000000 ]--- \| ---------test_init----------- Fix this by reverting to the old behaviour of ignoring the old instruction when initialising an mcount callsite in a module, which was the behaviour prior to commit `a625357997`. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: `a625357997` ("arm64: ftrace: consistently handle PLTs.") Reported-by: Li Huafei <lihuafei1@huawei.com> Link: https://lore.kernel.org/linux-arm-kernel/20220929094134.99512-1-lihuafei1@huawei.com Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220929134525.798593-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-09-29 17:47:54 +01:00
Li Huafei	5de2291605	arm64: module: Remove unused plt_entry_is_initialized() Since commit `f1a54ae9af` ("arm64: module/ftrace: intialize PLT at load time"), plt_entry_is_initialized() is unused anymore , so remove it. Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220929094134.99512-3-lihuafei1@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-09-29 17:47:18 +01:00
Li Huafei	3fb420f56c	arm64: module: Make plt_equals_entry() static Since commit `4e69ecf4da` ("arm64/module: ftrace: deal with place relative nature of PLTs"), plt_equals_entry() is not used outside of module-plts.c, so make it static. Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220929094134.99512-2-lihuafei1@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-09-29 17:47:18 +01:00
Mark Rutland	ba00c2a04f	arm64: fix the build with binutils 2.27 Jon Hunter reports that for some toolchains the build has been broken since commit: `4c0bd995d7` ("arm64: alternatives: have callbacks take a cap") ... with a stream of build-time splats of the form: \| CC arch/arm64/kvm/hyp/vhe/debug-sr.o \| /tmp/ccY3kbki.s: Assembler messages: \| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1600: Error: junk at end of line, first unrecognized character \| is `L' \| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')' \| /tmp/ccY3kbki.s:1723: Error: junk at end of line, first unrecognized character \| is `L' \| scripts/Makefile.build:249: recipe for target \| 'arch/arm64/kvm/hyp/vhe/debug-sr.o' failed The issue here is that older versions of binutils (up to and including 2.27.0) don't like an 'L' suffix on constants. For plain assembly files, UL() avoids this suffix, but in C files this gets added, and so for inline assembly we can't directly use a constant defined with `UL()`. We could avoid this by passing the constant as an input parameter, but this isn't practical given the way we use the alternative macros. Instead, just open code the constant without the `UL` suffix, and for consistency do this for both the inline assembly macro and the regular assembly macro. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: `4c0bd995d7` ("arm64: alternatives: have callbacks take a cap") Reported-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/linux-arm-kernel/3cecc3a5-30b0-f0bd-c3de-9e09bd21909b@nvidia.com/ Tested-by: Jon Hunter <jonathanh@nvidia.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220929150227.1028556-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-09-29 17:43:32 +01:00
Mickaël Salaün	2fff00c81d	landlock: Fix documentation style It seems that all code should use double backquotes, which is also used to convert "%" defines. Let's use an homogeneous style and remove all use of simple backquotes (which should only be used for emphasis). Cc: Günther Noack <gnoack3000@gmail.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20220923154207.3311629-4-mic@digikod.net	2022-09-29 18:43:04 +02:00
Mickaël Salaün	16023b05f0	landlock: Slightly improve documentation and fix spelling Now that we have more than one ABI version, make limitation explanation more consistent by replacing "ABI 1" with "ABI < 2". This also indicates which ABIs support such past limitation. Improve documentation consistency by not using contractions. Fix spelling in fs.c . Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20220923154207.3311629-3-mic@digikod.net	2022-09-29 18:43:03 +02:00
Mickaël Salaün	903cfe8a7a	samples/landlock: Print hints about ABI versions Extend the help with the latest Landlock ABI version supported by the sandboxer. Inform users about the sandboxer or the kernel not being up-to-date. Make the version check code easier to update and harder to misuse. Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20220923154207.3311629-2-mic@digikod.net	2022-09-29 18:43:01 +02:00
Dmitry Baryshkov	994c77ed37	clk: qcom: gcc-msm8939: use ARRAY_SIZE instead of specifying num_parents Use ARRAY_SIZE() instead of manually specifying num_parents. This makes adding/removing entries to/from parent_data easy and errorproof. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220928145609.375860-4-dmitry.baryshkov@linaro.org	2022-09-29 11:42:12 -05:00
Dmitry Baryshkov	f565f9235a	clk: qcom: gcc-msm8939: use parent_hws where possible Use parent_hws instead of hanving parent_data with just a single .hw entry to speed up and simplify parent lookups. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220928145609.375860-3-dmitry.baryshkov@linaro.org	2022-09-29 11:42:12 -05:00
Dmitry Baryshkov	2ab5b56638	dt-bindings: clock: move qcom,gcc-msm8939 to qcom,gcc-msm8916.yaml The MSM8939 GCC bindings are fully comptible with MSM8916, the clock controller requires the same parent clocks, move MSM8939 GCC compatible to qcom,msm8916.yaml Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220928145609.375860-2-dmitry.baryshkov@linaro.org	2022-09-29 11:42:11 -05:00
Luca Weiss	a01ef02093	clk: qcom: gcc-sm6350: Update the .pwrsts for usb gdscs The USB controllers on sm6350 do not retain the state when the system goes into low power state and the GDSCs are turned off. This can be observed by the USB connection not coming back alive after putting the device into suspend, essentially breaking USB. Fix this by updating the .pwrsts for the USB GDSCs so they only transition to retention state in low power. Cc: Rajendra Nayak <quic_rjendra@quicinc.com> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220928132853.179425-1-luca.weiss@fairphone.com	2022-09-29 11:42:11 -05:00
Johan Hovold	27da533af9	clk: qcom: gcc-sc8280xp: use retention for USB power domains Since commit `d399723950` ("clk: qcom: gdsc: Fix the handling of PWRSTS_RET support) retention mode can be used on sc8280xp to maintain state during suspend instead of leaving the domain always on. This is needed to eventually allow the parent CX domain to be powered down during suspend. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220929161124.18138-1-johan+linaro@kernel.org	2022-09-29 11:42:08 -05:00
Johan Hovold	eab4c1ebdd	clk: qcom: gdsc: add missing error handling Since commit `7eb231c337` ("PM / Domains: Convert pm_genpd_init() to return an error code") pm_genpd_init() can return an error which the caller must handle. The current error handling was also incomplete as the runtime PM and regulator use counts were not balanced in all error paths. Add the missing error handling to the GDSC initialisation to avoid continuing as if nothing happened on errors. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220929155816.17425-1-johan+linaro@kernel.org	2022-09-29 11:34:46 -05:00
Mark Brown	55c8a987dd	kselftest/arm64: Don't enable v8.5 for MTE selftest builds Currently we set -march=armv8.5+memtag when building the MTE selftests, allowing the compiler to emit v8.5 and MTE instructions for anything it generates. This means that we may get code that will generate SIGILLs when run on older systems rather than skipping on non-MTE systems as should be the case. Most toolchains don't select any incompatible instructions but I have seen some reports which suggest that some may be appearing which do so. This is also potentially problematic in that if the compiler chooses to emit any MTE instructions for the C code it may interfere with the MTE usage we are trying to test. Since the only reason we are specifying this option is to allow us to assemble MTE instructions in mte_helper.S we can avoid these issues by moving to using a .arch directive there and adding the -march explicitly to the toolchain support check instead of the generic CFLAGS. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220928154517.173108-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2022-09-29 17:34:40 +01:00
Saurabh Sengar	4c3386f64a	drm/hyperv: Add ratelimit on error message Due to a full ring buffer, the driver may be unable to send updates to the Hyper-V host. But outputing the error message can make the problem worse because console output is also typically written to the frame buffer. Rate limiting the error message, also output the error code for additional diagnosability. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1662736193-31379-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-09-29 16:28:28 +00:00
Alexei Starovoitov	5ee35abb46	Merge branch 'bpf: Remove recursion check for struct_ops prog' Martin KaFai Lau says: ==================== From: Martin KaFai Lau <martin.lau@kernel.org> The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. The kernel does not call the tcp-cc's ops in a recursive way, so this set is to remove the recursion check for struct_ops prog. v3: - Clear the bpf_chg_cc_inprogress from the newly cloned tcp_sock in tcp_create_openreq_child() because the listen sk can be cloned without lock being held. (Eric Dumazet) v2: - v1 [0] turned into a long discussion on a few cases and also whether it needs to follow the bpf_run_ctx chain if there is tracing bpf_run_ctx (kprobe/trace/trampoline) running in between. It is a good signal that it is not obvious enough to reason about it and needs a tradeoff for a more straight forward approach. This revision uses one bit out of an existing 1 byte hole in the tcp_sock. It is in Patch 4. [0]: https://lore.kernel.org/bpf/20220922225616.3054840-1-kafai@fb.com/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911 ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-29 09:25:48 -07:00
Martin KaFai Lau	3411c5b6f8	selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) This patch changes the bpf_dctcp test to ensure the recurred bpf_setsockopt(TCP_CONGESTION) returns -EBUSY. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-6-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-29 09:25:47 -07:00
Martin KaFai Lau	061ff04071	bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-29 09:25:47 -07:00
Martin KaFai Lau	1e7d217faa	bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into another function. The next patch will add extra logic to avoid recursion and this will make the latter patch easier to follow. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-29 09:25:47 -07:00
Martin KaFai Lau	37cfbe0bf2	bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks. However, it is not done for cg sock_ddr, cg sockops, and some of the bpf_lsm_cgroup hooks. The tcp-cc "cdg" should have very limited usage. This patch is to move the "cdg" check to the common sol_tcp_sockopt() so that all hooks have a consistent behavior. The motivation to make this check consistent now is because the latter patch will refactor the bpf_setsockopt(TCP_CONGESTION) into another function, so it is better to take this chance to refactor this piece also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-29 09:25:47 -07:00
Martin KaFai Lau	64696c40d0	bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: `ca06f55b90` ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-29 09:25:47 -07:00
Jaroslav Kysela	c8d18e4402	ASoC: core: clarify the driver name initialization The driver field in the struct snd_ctl_card_info is a valid user space identifier. Actually, many ASoC drivers do not care and let to initialize this field using a standard wrapping method. Unfortunately, in this way, this field becomes unusable and unreadable for the drivers with longer card names. Also, there is a possibility to have clashes (driver field has only limit of 15 characters). This change will print an error when the wrapping is used. The developers of the affected drivers should fix the problem. Signed-off-by: Jaroslav Kysela <perex@perex.cz> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2022-09-29 17:22:41 +01:00
Marc Zyngier	732d69c80c	Merge branch irq/misc-6.1 into irq/irqchip-next * irq/misc-6.1: : . : Misc irqchip updates for 6.1: : : - Allow generic irqchip support without selecting CONFIG_OF_IRQ : : - Fix a couple of bindings for TI interrupts controllers : : - Yet another binding update for a Renesas SoC : : - The obligatory fixes from the spelling police : . dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support irqchip/gic-v3: Fix typo in comment dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells irqchip: Make irqchip_init() usable on pure ACPI systems Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-09-29 17:21:16 +01:00
Marc Zyngier	aa28080873	Merge branch irq/rtl-imap-deprecation into irq/irqchip-next * irq/rtl-imap-deprecation: : . : Deprecate interrupt-map property for realtek-rtl irqchip : : Patches from Sander Vanheule. : . irqchip/realtek-rtl: use parent interrupts dt-bindings: interrupt-controller: realtek,rtl-intc: require parents irqchip/realtek-rtl: use irq_domain_add_linear() Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-09-29 17:21:09 +01:00
Marc Zyngier	4b0b6c7cd7	Merge branch irq/fsl-mu-msi into irq/irqchip-next * irq/fsl-mu-msi: : . : Platform MSI controller driver for the IMX MU block : : Patches from Frank Li. : . dt-bindings: irqchip: Describe the IMX MU block as a MSI controller irqchip: Add IMX MU MSI controller driver irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END platform-msi: Export symbol platform_msi_create_irq_domain() Signed-off-by: Marc Zyngier <maz@kernel.org>	2022-09-29 17:20:58 +01:00
Frank Li	7c025238b4	dt-bindings: irqchip: Describe the IMX MU block as a MSI controller I.MX MU supports generating IRQs by writing to a register. Describe its use as a MSI controller so that other blocks (such as a PCI EP) can use it directly. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Frank Li <Frank.Li@nxp.com> [maz: commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220922161246.20586-5-Frank.Li@nxp.com	2022-09-29 17:19:36 +01:00
Matt Ranostay	693e9c269e	dmaengine: ti: k3-psil: add additional TX threads for j721e Add matching PSI-L threads mapping for transmission DMA channels on the J721E platform. Signed-off-by: Matt Ranostay <mranostay@ti.com> Link: https://lore.kernel.org/r/20220919205931.8397-2-mranostay@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:48:09 +05:30
Matt Ranostay	5cfeaf7cc5	dmaengine: ti: k3-psil: add additional TX threads for j7200 Add matching PSI-L threads mapping for transmission DMA channels on the J7200 platform. Signed-off-by: Matt Ranostay <mranostay@ti.com> Link: https://lore.kernel.org/r/20220919205931.8397-3-mranostay@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:48:09 +05:30
Martin Povišer	6aed75d7cc	dmaengine: apple-admac: Trigger shared reset If a reset domain is attached to the device, obtain a shared reference to it and trigger it. Typically on a chip the ADMAC controller will share a reset domain with the MCA peripheral. Signed-off-by: Martin Povišer <povik+lin@cutebit.org> Link: https://lore.kernel.org/r/20220918095845.68860-5-povik+lin@cutebit.org Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:43:25 +05:30
Martin Povišer	072431595a	dmaengine: apple-admac: Do not use devres for IRQs This is in advance of adding support for triggering the reset signal to the peripheral, since registering the IRQ handler will have to be sequenced with it. Signed-off-by: Martin Povišer <povik+lin@cutebit.org> Link: https://lore.kernel.org/r/20220918095845.68860-4-povik+lin@cutebit.org Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:43:25 +05:30
Frank Li	70afdab904	irqchip: Add IMX MU MSI controller driver The MU block found in a number of Freescale/NXP SoCs supports generating IRQs by writing data to a register. This enables the MU block to be used as a MSI controller, by leveraging the platform-MSI API. Signed-off-by: Frank Li <Frank.Li@nxp.com> [maz: dropped pointless dma-iommu.h and of_pci.h includes] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220922161246.20586-4-Frank.Li@nxp.com	2022-09-29 17:11:37 +01:00
Linus Torvalds	334b2cea81	x86/mm: Add prot_sethuge() helper to abstract out _PAGE_PSE handling We still have some historic cases of direct fiddling of page attributes with (dangerous & fragile) type casting and address shifting. Add the prot_sethuge() helper instead that gets the types right and doesn't have to transform addresses. ( Also add a debug check to make sure this doesn't get applied to _PAGE_BIT_PAT/_PAGE_BIT_PAT_LARGE pages. ) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dave Hansen <dave.hansen@intel.com>	2022-09-29 18:01:40 +02:00
Jiapeng Chong	0f4c5b29e3	dmaengine: ti: edma: Remove some unused functions These functions are defined in the edma.c file, but not called elsewhere, so delete these unused functions. drivers/dma/ti/edma.c:746:31: warning: unused function 'to_edma_cc'. drivers/dma/ti/edma.c:420:20: warning: unused function 'edma_param_or'. drivers/dma/ti/edma.c:414:20: warning: unused function 'edma_param_and'. drivers/dma/ti/edma.c:402:20: warning: unused function 'edma_param_write'. drivers/dma/ti/edma.c:373:28: warning: unused function 'edma_shadow0_read'. drivers/dma/ti/edma.c:396:28: warning: unused function 'edma_param_read'. drivers/dma/ti/edma.c:355:20: warning: unused function 'edma_or_array'. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2152 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20220914101943.83929-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:25:37 +05:30
Bhupesh Sharma	65add05cfd	dt-bindings: dma: Make minor fixes to qcom,bam-dma binding doc As a user recently noted, the qcom,bam-dma binding document describes the msm8974 BAM DMA node in the 'example section' incorrectly. Fix the same by making it consistent with the node present inside 'qcom-msm8974' dts file, namely the 'reg' and 'interrupt' values which are incorrect in the 'example section'. While at it also make two additioanal minor cleanups: - mention Bjorn's new email ID in the document, and - add SDM845 in the comment line for the SoCs on which qcom,bam-v1.7.0 version is supported. Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220926112200.1948080-1-bhupesh.sharma@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:24:09 +05:30
Deming Wang	19ea810e88	Documentation: devicetree: dma: update the comments remove the double word to. Signed-off-by: Deming Wang <wangdeming@inspur.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20220920020721.2190-1-wangdeming@inspur.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:17:37 +05:30
Gustavo A. R. Silva	45ecf27f30	dmaengine: sh: rcar-dmac: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper Zero-length arrays are deprecated and we are moving towards adopting C99 flexible-array members, instead. So, replace zero-length arrays declarations in anonymous union with the new DECLARE_FLEX_ARRAY() helper macro. This helper allows for flexible-array members in unions. Link: https://github.com/KSPP/linux/issues/193 Link: https://github.com/KSPP/linux/issues/217 Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/YzIdsJqsR3LH2qEK@work Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:14:32 +05:30
Linus Torvalds	511cce163b	Networking fixes for 6.0-rc8, including fixes from wifi and can. Current release - regressions: - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume() - wifi: fix locking in mac80211 mlme - eth: - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe Previous releases - regressions: - wifi: fix regression with non-QoS drivers Previous releases - always broken: - mptcp: fix unreleased socket in accept queue - wifi: - don't start TX with fq->lock to fix deadlock - fix memory corruption in minstrel_ht_update_rates() - eth: - macb: fix ZynqMP SGMII non-wakeup source resume failure - mt7531: only do PLL once after the reset - usbnet: fix memory leak in usbnet_disconnect() Misc: - usb: qmi_wwan: add new usb-id for Dell branded EM7455 Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmM1cawSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkyn4P/3sP2MK9yjBNWZ+NcjGAapXqm5MPttDX pTihRgVVR0ldAQCvLaKS6NtB9W/o2KnQ6znsNvba5fEE8MskKCv+kh3kIe5kzq6B JrpEQUHOS7XlIgQnNIUITym1n9A79CHvPWvuQzbSr+5TbaEncM2KN/0UFi+sqkrY Gz+2BUvJqeJShqQYtZCRQDrNxOmpKtRLHuXmskS0XlSHp0bp8nz/8zQOLEIHMnqB xLHRzOgpRBIXMPO3IWTP8AHkYmuyh7Pdf1IZ5uPgVhBmcfVR7UvXQUSCXt21WlhT SoLbOuT/zAFTbehkGY5B2S40h9qUvw2WBcHO3go59PwT9NOP2a2V1qcj6C75/rt2 5Gnw75vT0Z5+VyuCHlyK4K2OVdiSpe/OMY8ZTYIRy8cGKXycAlK3AS5/m7Y+UG37 SG+DrfkrBjC1GYKcFugC3zjLW1eQ+KKWY6z9j8PgWbZ3hgmWo5g9DXsRep7cDFUF 6bzspxCQDn53WSLKnDRxIdFGPKR6bzn7Nys/qhyxaBdW59xLohXPqaF+n92K7bV8 lkrDzq0knAsNWDKUT8sDs0ATMHx7MgzOlKwEMkkvhV9F3psiob9ISdU1Mwn4dbi1 guWrSm5YtFxSu5iuAeOOZs0gZCtXuWtq0cJVsjnyZHnselcqO6Tfuvs2vzamI1KI MFnI5EZ6nY48 =WVqg -----END PGP SIGNATURE----- Merge tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wifi and can. Current release - regressions: - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume() - wifi: fix locking in mac80211 mlme - eth: - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe Previous releases - regressions: - wifi: fix regression with non-QoS drivers Previous releases - always broken: - mptcp: fix unreleased socket in accept queue - wifi: - don't start TX with fq->lock to fix deadlock - fix memory corruption in minstrel_ht_update_rates() - eth: - macb: fix ZynqMP SGMII non-wakeup source resume failure - mt7531: only do PLL once after the reset - usbnet: fix memory leak in usbnet_disconnect() Misc: - usb: qmi_wwan: add new usb-id for Dell branded EM7455" * tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits) mptcp: fix unreleased socket in accept queue mptcp: factor out __mptcp_close() without socket lock net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge can: c_can: don't cache TX messages for C_CAN cores ice: xsk: drop power of 2 ring size restriction for AF_XDP ice: xsk: change batched Tx descriptor cleaning net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 selftests: Fix the if conditions of in test_extra_filter() net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume() net: stmmac: power up/down serdes in stmmac_open/release wifi: mac80211: mlme: Fix double unlock on assoc success handling wifi: mac80211: mlme: Fix missing unlock on beacon RX wifi: mac80211: fix memory corruption in minstrel_ht_update_rates() wifi: mac80211: fix regression with non-QoS drivers wifi: mac80211: ensure vif queues are operational after start wifi: mac80211: don't start TX with fq->lock to fix deadlock wifi: cfg80211: fix MCS divisor value net: hippi: Add missing pci_disable_device() in rr_init_one() net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe ...	2022-09-29 08:32:53 -07:00
Colin Ian King	9aa0dade8f	phy: phy-mtk-dp: make array driving_params static const Don't populate the read-only array driving_params on the stack but instead make it static const. Also makes the object code a little smaller. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220929130147.97375-1-colin.i.king@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-09-29 21:01:27 +05:30
Linus Torvalds	da9eede6b2	Input updates for v6.0-rc7 - small fixes for iqs62x-keys and melfas_mip4 drivers - corrected register address in snvs_pwrkey driver - Synaptic driver will stop trying to use intertouch (native) mode on some Lenovo AMD devices -----BEGIN PGP SIGNATURE----- iJAEABYKADgWIQST2eWILY88ieB2DOtAj56VGEWXnAUCYzUgfBocZG1pdHJ5LnRv cm9raG92QGdtYWlsLmNvbQAKCRBAj56VGEWXnEkhAP4/cOTiILkNKTzbu3nPAFn5 qVBp+wDpFrjN5zQKEIyOVgEAr5k7STjixjnneZnR7+ppald6Ti7LXaTESoXnqrY9 XwY= =zsDx -----END PGP SIGNATURE----- Merge tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - small fixes for iqs62x-keys and melfas_mip4 drivers - corrected register address in snvs_pwrkey driver - Synaptic driver will stop trying to use intertouch (native) mode on some Lenovo AMD devices * tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address Input: synaptics - disable Intertouch for Lenovo T14 and P14s AMD G1 Input: iqs62x-keys - drop unused device node references Input: melfas_mip4 - fix return value check in mip4_probe()	2022-09-29 08:22:53 -07:00
Yang Yingliang	28366dd2ec	spi: spi-gxp: Use devm_platform_ioremap_resource() Use the devm_platform_ioremap_resource() helper instead of calling platform_get_resource() and devm_ioremap_resource() separately. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220928145256.1879256-1-yangyingliang@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-09-29 16:16:26 +01:00
Zhang Qilong	b73f11e895	ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe The pm_runtime_enable will increase power disable depth. Thus a pairing decrement is needed on the error handling path to keep it balanced according to context. We fix it by moving pm_runtime_enable to the endding of mt6660_i2c_probe. Fixes:f289e55c6eeb4 ("ASoC: Add MediaTek MT6660 Speaker Amp Driver") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20220928160116.125020-5-zhangqilong3@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-09-29 16:16:24 +01:00
Zhang Qilong	fcbb60820c	ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe The pm_runtime_enable will increase power disable depth. Thus a pairing decrement is needed on the error handling path to keep it balanced according to context. We fix it by moving pm_runtime_enable to the endding of wm5102_probe. Fixes:93e8791dd34ca ("ASoC: wm5102: Initial driver") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20220928160116.125020-4-zhangqilong3@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-09-29 16:16:23 +01:00
Zhang Qilong	86b46bf1fe	ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe The pm_runtime_enable will increase power disable depth. Thus a pairing decrement is needed on the error handling path to keep it balanced according to context. We fix it by moving pm_runtime_enable to the endding of wm5110_probe. Fixes:5c6af635fd772 ("ASoC: wm5110: Add audio CODEC driver") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20220928160116.125020-3-zhangqilong3@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-09-29 16:16:22 +01:00
Zhang Qilong	41a736ac20	ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe The pm_runtime_enable will increase power disable depth. Thus a pairing decrement is needed on the error handling path to keep it balanced according to context. We fix it by moving pm_runtime_enable to the endding of wm8997_probe Fixes:40843aea5a9bd ("ASoC: wm8997: Initial CODEC driver") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20220928160116.125020-2-zhangqilong3@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-09-29 16:16:21 +01:00
Tetsuo Handa	cbddcc4fa3	btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer syzbot is reporting uninit-value in btrfs_clean_tree_block() [1], for commit `bc877d285c` ("btrfs: Deduplicate extent_buffer init code") missed that btrfs_set_header_generation() in btrfs_init_new_buffer() must not be moved to after clean_tree_block() because clean_tree_block() is calling btrfs_header_generation() since commit `55c69072d6` ("Btrfs: Fix extent_buffer usage when nodesize != leafsize"). Since memzero_extent_buffer() will reset "struct btrfs_header" part, we can't move btrfs_set_header_generation() to before memzero_extent_buffer(). Just re-add btrfs_set_header_generation() before btrfs_clean_tree_block(). Link: https://syzkaller.appspot.com/bug?extid=fba8e2116a12609b6c59 [1] Reported-by: syzbot <syzbot+fba8e2116a12609b6c59@syzkaller.appspotmail.com> Fixes: `bc877d285c` ("btrfs: Deduplicate extent_buffer init code") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-29 17:08:31 +02:00
Filipe Manana	db21370bff	btrfs: drop extent map range more efficiently Currently when dropping extent maps for a file range, through btrfs_drop_extent_map_range(), we do the following non-optimal things: 1) We lookup for extent maps one by one, always starting the search from the root of the extent map tree. This is not efficient if we have multiple extent maps in the range; 2) We check on every iteration if we have the 'split' and 'split2' spare extent maps in case we need to split an extent map that intersects our range but also crosses its boundaries (to the left, to the right or both cases). If our target range is for example: [2M, 8M) And we have 3 extents maps in the range: [1M, 3M) [3M, 6M) [6M, 10M[ The on the first iteration we allocate two extent maps for 'split' and 'split2', and use the 'split' to split the first extent map, so after the split we set 'split' to 'split2' and then set 'split2' to NULL. On the second iteration, we don't need to split the second extent map, but because 'split2' is now NULL, we allocate a new extent map for 'split2'. On the third iteration we need to split the third extent map, so we use the extent map pointed by 'split'. So we ended up allocating 3 extent maps for splitting, but all we needed was 2 extent maps. We never need to allocate more than 2, because extent maps that need to be split are always the first one and the last one in the target range. Improve on this by: 1) Using rb_next() to move on to the next extent map. This results in iterating over less nodes of the tree and it does not require comparing the ranges of nodes to our start/end offset; 2) Allocate the 2 extent maps for splitting before entering the loop and never allocate more than 2. In practice it's very rare to have the combination of both extent map allocations fail, since we have a dedicated slab for extent maps, and also have the need to split two extent maps. This patch is part of a patchset comprised of the following patches: btrfs: fix missed extent on fsync after dropping extent maps btrfs: move btrfs_drop_extent_cache() to extent_map.c btrfs: use extent_map_end() at btrfs_drop_extent_map_range() btrfs: use cond_resched_rwlock_write() during inode eviction btrfs: move open coded extent map tree deletion out of inode eviction btrfs: add helper to replace extent map range with a new extent map btrfs: remove the refcount warning/check at free_extent_map() btrfs: remove unnecessary extent map initializations btrfs: assert tree is locked when clearing extent map from logging btrfs: remove unnecessary NULL pointer checks when searching extent maps btrfs: remove unnecessary next extent map search btrfs: avoid pointless extent map tree search when flushing delalloc btrfs: drop extent map range more efficiently And the following fio test was done before and after applying the whole patchset, on a non-debug kernel (Debian's default kernel config) on a 12 cores Intel box with 64G of ram: $ cat test.sh #!/bin/bash DEV=/dev/nvme0n1 MNT=/mnt/nvme0n1 MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-R free-space-tree -O no-holes" cat <<EOF > /tmp/fio-job.ini [writers] rw=randwrite fsync=8 fallocate=none group_reporting=1 direct=0 bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5 ioengine=psync filesize=2G runtime=300 time_based directory=$MNT numjobs=8 thread EOF echo performance \| \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor echo echo "Using config:" echo cat /tmp/fio-job.ini echo umount $MNT &> /dev/null mkfs.btrfs -f $MKFS_OPTIONS $DEV mount $MOUNT_OPTIONS $DEV $MNT fio /tmp/fio-job.ini umount $MNT Result before applying the patchset: WRITE: bw=197MiB/s (206MB/s), 197MiB/s-197MiB/s (206MB/s-206MB/s), io=57.7GiB (61.9GB), run=300188-300188msec Result after applying the patchset: WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=59.5GiB (63.9GB), run=300019-300019msec Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-29 17:08:31 +02:00
Filipe Manana	b54bb86556	btrfs: avoid pointless extent map tree search when flushing delalloc When flushing delalloc, in COW mode at cow_file_range(), before entering the loop that allocates extents and creates ordered extents, we do a call to btrfs_drop_extent_map_range() for the whole range. This is pointless because in the loop we call create_io_em(), which will also call btrfs_drop_extent_map_range() before inserting the new extent map. So remove that call at cow_file_range() not only because it is not needed, but also because it will make the btrfs_drop_extent_map_range() calls made from create_io_em() waste time searching the extent map tree, and that tree can be large for files with many extents. It also makes us waste time at btrfs_drop_extent_map_range() allocating and freeing the split extent maps for nothing. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-29 17:08:31 +02:00
Filipe Manana	6c05813ebb	btrfs: remove unnecessary next extent map search At __tree_search(), and its single caller __lookup_extent_mapping(), there is no point in finding the next extent map that starts after the search offset if we were able to find the previous extent map that ends before our search offset, because __lookup_extent_mapping() ignores the next acceptable extent map if we were able to find the previous one. So just return immediately if we were able to find the previous extent map, therefore avoiding wasting time iterating the tree looking for the next extent map which will not be used by __lookup_extent_mapping(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-29 17:08:31 +02:00

... 47 48 49 50 51 ...

1136096 Commits All Branches Search

1136096 Commits

All Branches