Commit Graph

1136096 Commits

Author SHA1 Message Date
Jens Axboe b000145e99 io_uring/rw: defer fsnotify calls to task context
We can't call these off the kiocb completion as that might be off
soft/hard irq context. Defer the calls to when we process the
task_work for this request. That avoids valid complaints like:

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.0.0-rc6-syzkaller-00321-g105a36f3694e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_usage_bug kernel/locking/lockdep.c:3961 [inline]
 valid_state kernel/locking/lockdep.c:3973 [inline]
 mark_lock_irq kernel/locking/lockdep.c:4176 [inline]
 mark_lock.part.0.cold+0x18/0xd8 kernel/locking/lockdep.c:4632
 mark_lock kernel/locking/lockdep.c:4596 [inline]
 mark_usage kernel/locking/lockdep.c:4527 [inline]
 __lock_acquire+0x11d9/0x56d0 kernel/locking/lockdep.c:5007
 lock_acquire kernel/locking/lockdep.c:5666 [inline]
 lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
 __fs_reclaim_acquire mm/page_alloc.c:4674 [inline]
 fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688
 might_alloc include/linux/sched/mm.h:271 [inline]
 slab_pre_alloc_hook mm/slab.h:700 [inline]
 slab_alloc mm/slab.c:3278 [inline]
 __kmem_cache_alloc_lru mm/slab.c:3471 [inline]
 kmem_cache_alloc+0x39/0x520 mm/slab.c:3491
 fanotify_alloc_fid_event fs/notify/fanotify/fanotify.c:580 [inline]
 fanotify_alloc_event fs/notify/fanotify/fanotify.c:813 [inline]
 fanotify_handle_event+0x1130/0x3f40 fs/notify/fanotify/fanotify.c:948
 send_to_group fs/notify/fsnotify.c:360 [inline]
 fsnotify+0xafb/0x1680 fs/notify/fsnotify.c:570
 __fsnotify_parent+0x62f/0xa60 fs/notify/fsnotify.c:230
 fsnotify_parent include/linux/fsnotify.h:77 [inline]
 fsnotify_file include/linux/fsnotify.h:99 [inline]
 fsnotify_access include/linux/fsnotify.h:309 [inline]
 __io_complete_rw_common+0x485/0x720 io_uring/rw.c:195
 io_complete_rw+0x1a/0x1f0 io_uring/rw.c:228
 iomap_dio_complete_work fs/iomap/direct-io.c:144 [inline]
 iomap_dio_bio_end_io+0x438/0x5e0 fs/iomap/direct-io.c:178
 bio_endio+0x5f9/0x780 block/bio.c:1564
 req_bio_endio block/blk-mq.c:695 [inline]
 blk_update_request+0x3fc/0x1300 block/blk-mq.c:825
 scsi_end_request+0x7a/0x9a0 drivers/scsi/scsi_lib.c:541
 scsi_io_completion+0x173/0x1f70 drivers/scsi/scsi_lib.c:971
 scsi_complete+0x122/0x3b0 drivers/scsi/scsi_lib.c:1438
 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1022
 __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
 invoke_softirq kernel/softirq.c:445 [inline]
 __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
 common_interrupt+0xa9/0xc0 arch/x86/kernel/irq.c:240

Fixes: f63cf5192f ("io_uring: ensure that fsnotify is always called")
Link: https://lore.kernel.org/all/20220929135627.ykivmdks2w5vzrwg@quack3/
Reported-by: syzbot+dfcc5f4da15868df7d4d@syzkaller.appspotmail.com
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-29 11:00:41 -06:00
Mike Rapoport b9dd04a20f arm64/mm: fold check for KFENCE into can_set_direct_map()
KFENCE requires linear map to be mapped at page granularity, so that it
is possible to protect/unprotect single pages, just like with
rodata_full and DEBUG_PAGEALLOC.

Instead of repating

	can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE)

make can_set_direct_map() handle the KFENCE case.

This also prevents potential false positives in kernel_page_present()
that may return true for non-present page if CONFIG_KFENCE is enabled.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220921074841.382615-1-rppt@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:59:28 +01:00
Mark Rutland 8cfb08575c arm64: ftrace: fix module PLTs with mcount
Li Huafei reports that mcount-based ftrace with module PLTs was broken
by commit:

  a625357997 ("arm64: ftrace: consistently handle PLTs.")

When a module PLTs are used and a module is loaded sufficiently far away
from the kernel, we'll create PLTs for any branches which are
out-of-range. These are separate from the special ftrace trampoline
PLTs, which the module PLT code doesn't directly manipulate.

When mcount is in use this is a problem, as each mcount callsite in a
module will be initialized to point to a module PLT, but since commit
a625357997 ftrace_make_nop() will assume that the callsite has
been initialized to point to the special ftrace trampoline PLT, and
ftrace_find_callable_addr() rejects other cases.

This means that when ftrace tries to initialize a callsite via
ftrace_make_nop(), the call to ftrace_find_callable_addr() will find
that the `_mcount` stub is out-of-range and is not handled by the ftrace
PLT, resulting in a splat:

| ftrace_test: loading out-of-tree module taints kernel.
| ftrace: no module PLT for _mcount
| ------------[ ftrace bug ]------------
| ftrace failed to modify
| [<ffff800029180014>] 0xffff800029180014
|  actual:   44:00:00:94
| Initializing ftrace call sites
| ftrace record flags: 2000000
|  (0)
|  expected tramp: ffff80000802eb3c
| ------------[ cut here ]------------
| WARNING: CPU: 3 PID: 157 at kernel/trace/ftrace.c:2120 ftrace_bug+0x94/0x270
| Modules linked in:
| CPU: 3 PID: 157 Comm: insmod Tainted: G           O       6.0.0-rc6-00151-gcd722513a189-dirty #22
| Hardware name: linux,dummy-virt (DT)
| pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : ftrace_bug+0x94/0x270
| lr : ftrace_bug+0x21c/0x270
| sp : ffff80000b2bbaf0
| x29: ffff80000b2bbaf0 x28: 0000000000000000 x27: ffff0000c4d38000
| x26: 0000000000000001 x25: ffff800009d7e000 x24: ffff0000c4d86e00
| x23: 0000000002000000 x22: ffff80000a62b000 x21: ffff8000098ebea8
| x20: ffff0000c4d38000 x19: ffff80000aa24158 x18: ffffffffffffffff
| x17: 0000000000000000 x16: 0a0d2d2d2d2d2d2d x15: ffff800009aa9118
| x14: 0000000000000000 x13: 6333626532303830 x12: 3030303866666666
| x11: 203a706d61727420 x10: 6465746365707865 x9 : 3362653230383030
| x8 : c0000000ffffefff x7 : 0000000000017fe8 x6 : 000000000000bff4
| x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000001
| x2 : ad2cb14bb5438900 x1 : 0000000000000000 x0 : 0000000000000022
| Call trace:
|  ftrace_bug+0x94/0x270
|  ftrace_process_locs+0x308/0x430
|  ftrace_module_init+0x44/0x60
|  load_module+0x15b4/0x1ce8
|  __do_sys_init_module+0x1ec/0x238
|  __arm64_sys_init_module+0x24/0x30
|  invoke_syscall+0x54/0x118
|  el0_svc_common.constprop.4+0x84/0x100
|  do_el0_svc+0x3c/0xd0
|  el0_svc+0x1c/0x50
|  el0t_64_sync_handler+0x90/0xb8
|  el0t_64_sync+0x15c/0x160
| ---[ end trace 0000000000000000 ]---
| ---------test_init-----------

Fix this by reverting to the old behaviour of ignoring the old
instruction when initialising an mcount callsite in a module, which was
the behaviour prior to commit a625357997.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a625357997 ("arm64: ftrace: consistently handle PLTs.")
Reported-by: Li Huafei <lihuafei1@huawei.com>
Link: https://lore.kernel.org/linux-arm-kernel/20220929094134.99512-1-lihuafei1@huawei.com
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220929134525.798593-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:47:54 +01:00
Li Huafei 5de2291605 arm64: module: Remove unused plt_entry_is_initialized()
Since commit f1a54ae9af ("arm64: module/ftrace: intialize PLT at load
time"), plt_entry_is_initialized() is unused anymore , so remove it.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220929094134.99512-3-lihuafei1@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:47:18 +01:00
Li Huafei 3fb420f56c arm64: module: Make plt_equals_entry() static
Since commit 4e69ecf4da ("arm64/module: ftrace: deal with place
relative nature of PLTs"), plt_equals_entry() is not used outside of
module-plts.c, so make it static.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220929094134.99512-2-lihuafei1@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:47:18 +01:00
Mark Rutland ba00c2a04f arm64: fix the build with binutils 2.27
Jon Hunter reports that for some toolchains the build has been broken
since commit:

  4c0bd995d7 ("arm64: alternatives: have callbacks take a cap")

... with a stream of build-time splats of the form:

|   CC      arch/arm64/kvm/hyp/vhe/debug-sr.o
| /tmp/ccY3kbki.s: Assembler messages:
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: junk at end of line, first unrecognized character
| is `L'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: junk at end of line, first unrecognized character
| is `L'
| scripts/Makefile.build:249: recipe for target
| 'arch/arm64/kvm/hyp/vhe/debug-sr.o' failed

The issue here is that older versions of binutils (up to and including
2.27.0) don't like an 'L' suffix on constants. For plain assembly files,
UL() avoids this suffix, but in C files this gets added, and so for
inline assembly we can't directly use a constant defined with `UL()`.

We could avoid this by passing the constant as an input parameter, but
this isn't practical given the way we use the alternative macros.
Instead, just open code the constant without the `UL` suffix, and for
consistency do this for both the inline assembly macro and the regular
assembly macro.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 4c0bd995d7 ("arm64: alternatives: have callbacks take a cap")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/linux-arm-kernel/3cecc3a5-30b0-f0bd-c3de-9e09bd21909b@nvidia.com/
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220929150227.1028556-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:43:32 +01:00
Mickaël Salaün 2fff00c81d
landlock: Fix documentation style
It seems that all code should use double backquotes, which is also used
to convert "%" defines.  Let's use an homogeneous style and remove all
use of simple backquotes (which should only be used for emphasis).

Cc: Günther Noack <gnoack3000@gmail.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220923154207.3311629-4-mic@digikod.net
2022-09-29 18:43:04 +02:00
Mickaël Salaün 16023b05f0
landlock: Slightly improve documentation and fix spelling
Now that we have more than one ABI version, make limitation explanation
more consistent by replacing "ABI 1" with "ABI < 2".  This also
indicates which ABIs support such past limitation.

Improve documentation consistency by not using contractions.

Fix spelling in fs.c .

Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20220923154207.3311629-3-mic@digikod.net
2022-09-29 18:43:03 +02:00
Mickaël Salaün 903cfe8a7a
samples/landlock: Print hints about ABI versions
Extend the help with the latest Landlock ABI version supported by the
sandboxer.

Inform users about the sandboxer or the kernel not being up-to-date.

Make the version check code easier to update and harder to misuse.

Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20220923154207.3311629-2-mic@digikod.net
2022-09-29 18:43:01 +02:00
Dmitry Baryshkov 994c77ed37 clk: qcom: gcc-msm8939: use ARRAY_SIZE instead of specifying num_parents
Use ARRAY_SIZE() instead of manually specifying num_parents. This makes
adding/removing entries to/from parent_data easy and errorproof.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220928145609.375860-4-dmitry.baryshkov@linaro.org
2022-09-29 11:42:12 -05:00
Dmitry Baryshkov f565f9235a clk: qcom: gcc-msm8939: use parent_hws where possible
Use parent_hws instead of hanving parent_data with just a single .hw
entry to speed up and simplify parent lookups.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220928145609.375860-3-dmitry.baryshkov@linaro.org
2022-09-29 11:42:12 -05:00
Dmitry Baryshkov 2ab5b56638 dt-bindings: clock: move qcom,gcc-msm8939 to qcom,gcc-msm8916.yaml
The MSM8939 GCC bindings are fully comptible with MSM8916, the clock
controller requires the same parent clocks, move MSM8939 GCC compatible
to qcom,msm8916.yaml

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220928145609.375860-2-dmitry.baryshkov@linaro.org
2022-09-29 11:42:11 -05:00
Luca Weiss a01ef02093 clk: qcom: gcc-sm6350: Update the .pwrsts for usb gdscs
The USB controllers on sm6350 do not retain the state when
the system goes into low power state and the GDSCs are
turned off.

This can be observed by the USB connection not coming back alive after
putting the device into suspend, essentially breaking USB.

Fix this by updating the .pwrsts for the USB GDSCs so they only
transition to retention state in low power.

Cc: Rajendra Nayak <quic_rjendra@quicinc.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220928132853.179425-1-luca.weiss@fairphone.com
2022-09-29 11:42:11 -05:00
Johan Hovold 27da533af9 clk: qcom: gcc-sc8280xp: use retention for USB power domains
Since commit d399723950 ("clk: qcom: gdsc: Fix the handling of
PWRSTS_RET support) retention mode can be used on sc8280xp to maintain
state during suspend instead of leaving the domain always on.

This is needed to eventually allow the parent CX domain to be powered
down during suspend.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220929161124.18138-1-johan+linaro@kernel.org
2022-09-29 11:42:08 -05:00
Johan Hovold eab4c1ebdd clk: qcom: gdsc: add missing error handling
Since commit 7eb231c337 ("PM / Domains: Convert pm_genpd_init() to
return an error code") pm_genpd_init() can return an error which the
caller must handle.

The current error handling was also incomplete as the runtime PM and
regulator use counts were not balanced in all error paths.

Add the missing error handling to the GDSC initialisation to avoid
continuing as if nothing happened on errors.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220929155816.17425-1-johan+linaro@kernel.org
2022-09-29 11:34:46 -05:00
Mark Brown 55c8a987dd kselftest/arm64: Don't enable v8.5 for MTE selftest builds
Currently we set -march=armv8.5+memtag when building the MTE selftests,
allowing the compiler to emit v8.5 and MTE instructions for anything it
generates. This means that we may get code that will generate SIGILLs when
run on older systems rather than skipping on non-MTE systems as should be
the case. Most toolchains don't select any incompatible instructions but
I have seen some reports which suggest that some may be appearing which do
so. This is also potentially problematic in that if the compiler chooses to
emit any MTE instructions for the C code it may interfere with the MTE
usage we are trying to test.

Since the only reason we are specifying this option is to allow us to
assemble MTE instructions in mte_helper.S we can avoid these issues by
moving to using a .arch directive there and adding the -march explicitly to
the toolchain support check instead of the generic CFLAGS.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220928154517.173108-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:34:40 +01:00
Saurabh Sengar 4c3386f64a drm/hyperv: Add ratelimit on error message
Due to a full ring buffer, the driver may be unable to send updates to
the Hyper-V host.  But outputing the error message can make the problem
worse because console output is also typically written to the frame
buffer.
Rate limiting the error message, also output the error code for additional
diagnosability.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1662736193-31379-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-29 16:28:28 +00:00
Alexei Starovoitov 5ee35abb46 Merge branch 'bpf: Remove recursion check for struct_ops prog'
Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

The struct_ops is sharing the tracing-trampoline's enter/exit
function which tracks prog->active to avoid recursion.  It turns
out the struct_ops bpf prog will hit this prog->active and
unnecessarily skipped running the struct_ops prog.  eg.  The
'.ssthresh' may run in_task() and then interrupted by softirq
that runs the same '.ssthresh'.

The kernel does not call the tcp-cc's ops in a recursive way,
so this set is to remove the recursion check for struct_ops prog.

v3:
- Clear the bpf_chg_cc_inprogress from the newly cloned tcp_sock
  in tcp_create_openreq_child() because the listen sk can
  be cloned without lock being held. (Eric Dumazet)

v2:
- v1 [0] turned into a long discussion on a few cases and also
  whether it needs to follow the bpf_run_ctx chain if there is
  tracing bpf_run_ctx (kprobe/trace/trampoline) running in between.

  It is a good signal that it is not obvious enough to reason
  about it and needs a tradeoff for a more straight forward approach.

  This revision uses one bit out of an existing 1 byte hole
  in the tcp_sock.  It is in Patch 4.

  [0]: https://lore.kernel.org/bpf/20220922225616.3054840-1-kafai@fb.com/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:48 -07:00
Martin KaFai Lau 3411c5b6f8 selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
This patch changes the bpf_dctcp test to ensure the recurred
bpf_setsockopt(TCP_CONGESTION) returns -EBUSY.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau 061ff04071 bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
When a bad bpf prog '.init' calls
bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:

.init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
... => .init => bpf_setsockopt(tcp_cc).

It was prevented by the prog->active counter before but the prog->active
detection cannot be used in struct_ops as explained in the earlier
patch of the set.

In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
in order to break the loop.  This is done by using a bit of
an existing 1 byte hole in tcp_sock to check if there is
on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.

Note that this essentially limits only the first '.init' can
call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
does not support ECN) and the second '.init' cannot fallback to
another cc.  This applies even the second
bpf_setsockopt(TCP_CONGESTION) will not cause a loop.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau 1e7d217faa bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into
another function.  The next patch will add extra logic to avoid
recursion and this will make the latter patch easier to follow.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau 37cfbe0bf2 bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is
used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks.
However, it is not done for cg sock_ddr, cg sockops, and some of
the bpf_lsm_cgroup hooks.

The tcp-cc "cdg" should have very limited usage.  This patch is to
move the "cdg" check to the common sol_tcp_sockopt() so that all
hooks have a consistent behavior.   The motivation to make
this check consistent now is because the latter patch will
refactor the bpf_setsockopt(TCP_CONGESTION) into another function,
so it is better to take this chance to refactor this piece
also.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau 64696c40d0 bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
The struct_ops prog is to allow using bpf to implement the functions in
a struct (eg. kernel module).  The current usage is to implement the
tcp_congestion.  The kernel does not call the tcp-cc's ops (ie.
the bpf prog) in a recursive way.

The struct_ops is sharing the tracing-trampoline's enter/exit
function which tracks prog->active to avoid recursion.  It is
needed for tracing prog.  However, it turns out the struct_ops
bpf prog will hit this prog->active and unnecessarily skipped
running the struct_ops prog.  eg.  The '.ssthresh' may run in_task()
and then interrupted by softirq that runs the same '.ssthresh'.
Skip running the '.ssthresh' will end up returning random value
to the caller.

The patch adds __bpf_prog_{enter,exit}_struct_ops for the
struct_ops trampoline.  They do not track the prog->active
to detect recursion.

One exception is when the tcp_congestion's '.init' ops is doing
bpf_setsockopt(TCP_CONGESTION) and then recurs to the same
'.init' ops.  This will be addressed in the following patches.

Fixes: ca06f55b90 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Jaroslav Kysela c8d18e4402
ASoC: core: clarify the driver name initialization
The driver field in the struct snd_ctl_card_info is a valid
user space identifier. Actually, many ASoC drivers do not care
and let to initialize this field using a standard wrapping method.
Unfortunately, in this way, this field becomes unusable and
unreadable for the drivers with longer card names. Also,
there is a possibility to have clashes (driver field has
only limit of 15 characters).

This change will print an error when the wrapping is used.
The developers of the affected drivers should fix the problem.

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-29 17:22:41 +01:00
Marc Zyngier 732d69c80c Merge branch irq/misc-6.1 into irq/irqchip-next
* irq/misc-6.1:
  : .
  : Misc irqchip updates for 6.1:
  :
  : - Allow generic irqchip support without selecting CONFIG_OF_IRQ
  :
  : - Fix a couple of bindings for TI interrupts controllers
  :
  : - Yet another binding update for a Renesas SoC
  :
  : - The obligatory fixes from the spelling police
  : .
  dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support
  irqchip/gic-v3: Fix typo in comment
  dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding
  dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells
  irqchip: Make irqchip_init() usable on pure ACPI systems

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-09-29 17:21:16 +01:00
Marc Zyngier aa28080873 Merge branch irq/rtl-imap-deprecation into irq/irqchip-next
* irq/rtl-imap-deprecation:
  : .
  : Deprecate interrupt-map property for realtek-rtl irqchip
  :
  : Patches from Sander Vanheule.
  : .
  irqchip/realtek-rtl: use parent interrupts
  dt-bindings: interrupt-controller: realtek,rtl-intc: require parents
  irqchip/realtek-rtl: use irq_domain_add_linear()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-09-29 17:21:09 +01:00
Marc Zyngier 4b0b6c7cd7 Merge branch irq/fsl-mu-msi into irq/irqchip-next
* irq/fsl-mu-msi:
  : .
  : Platform MSI controller driver for the IMX MU block
  :
  : Patches from Frank Li.
  : .
  dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
  irqchip: Add IMX MU MSI controller driver
  irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END
  platform-msi: Export symbol platform_msi_create_irq_domain()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-09-29 17:20:58 +01:00
Frank Li 7c025238b4 dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
I.MX MU supports generating IRQs by writing to a register.

Describe its use as a MSI controller so that other blocks (such as a PCI EP)
can use it directly.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220922161246.20586-5-Frank.Li@nxp.com
2022-09-29 17:19:36 +01:00
Matt Ranostay 693e9c269e dmaengine: ti: k3-psil: add additional TX threads for j721e
Add matching PSI-L threads mapping for transmission DMA channels
on the J721E platform.

Signed-off-by: Matt Ranostay <mranostay@ti.com>
Link: https://lore.kernel.org/r/20220919205931.8397-2-mranostay@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:48:09 +05:30
Matt Ranostay 5cfeaf7cc5 dmaengine: ti: k3-psil: add additional TX threads for j7200
Add matching PSI-L threads mapping for transmission DMA channels
on the J7200 platform.

Signed-off-by: Matt Ranostay <mranostay@ti.com>
Link: https://lore.kernel.org/r/20220919205931.8397-3-mranostay@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:48:09 +05:30
Martin Povišer 6aed75d7cc dmaengine: apple-admac: Trigger shared reset
If a reset domain is attached to the device, obtain a shared reference
to it and trigger it. Typically on a chip the ADMAC controller will
share a reset domain with the MCA peripheral.

Signed-off-by: Martin Povišer <povik+lin@cutebit.org>
Link: https://lore.kernel.org/r/20220918095845.68860-5-povik+lin@cutebit.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:43:25 +05:30
Martin Povišer 072431595a dmaengine: apple-admac: Do not use devres for IRQs
This is in advance of adding support for triggering the reset signal to
the peripheral, since registering the IRQ handler will have to be
sequenced with it.

Signed-off-by: Martin Povišer <povik+lin@cutebit.org>
Link: https://lore.kernel.org/r/20220918095845.68860-4-povik+lin@cutebit.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:43:25 +05:30
Frank Li 70afdab904 irqchip: Add IMX MU MSI controller driver
The MU block found in a number of Freescale/NXP SoCs supports generating
IRQs by writing data to a register.

This enables the MU block to be used as a MSI controller, by leveraging
the platform-MSI API.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
[maz: dropped pointless dma-iommu.h and of_pci.h includes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220922161246.20586-4-Frank.Li@nxp.com
2022-09-29 17:11:37 +01:00
Linus Torvalds 334b2cea81 x86/mm: Add prot_sethuge() helper to abstract out _PAGE_PSE handling
We still have some historic cases of direct fiddling of page
attributes with (dangerous & fragile) type casting and address shifting.

Add the prot_sethuge() helper instead that gets the types right and
doesn't have to transform addresses.

( Also add a debug check to make sure this doesn't get applied
  to _PAGE_BIT_PAT/_PAGE_BIT_PAT_LARGE pages. )

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2022-09-29 18:01:40 +02:00
Jiapeng Chong 0f4c5b29e3 dmaengine: ti: edma: Remove some unused functions
These functions are defined in the edma.c file, but not called elsewhere,
so delete these unused functions.

drivers/dma/ti/edma.c:746:31: warning: unused function 'to_edma_cc'.
drivers/dma/ti/edma.c:420:20: warning: unused function 'edma_param_or'.
drivers/dma/ti/edma.c:414:20: warning: unused function 'edma_param_and'.
drivers/dma/ti/edma.c:402:20: warning: unused function 'edma_param_write'.
drivers/dma/ti/edma.c:373:28: warning: unused function 'edma_shadow0_read'.
drivers/dma/ti/edma.c:396:28: warning: unused function 'edma_param_read'.
drivers/dma/ti/edma.c:355:20: warning: unused function 'edma_or_array'.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2152
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20220914101943.83929-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:25:37 +05:30
Bhupesh Sharma 65add05cfd dt-bindings: dma: Make minor fixes to qcom,bam-dma binding doc
As a user recently noted, the qcom,bam-dma binding document
describes the msm8974 BAM DMA node in the 'example section'
incorrectly. Fix the same by making it consistent with the node
present inside 'qcom-msm8974' dts file, namely the 'reg' and
'interrupt' values which are incorrect in the 'example section'.

While at it also make two additioanal minor cleanups:
 - mention Bjorn's new email ID in the document, and
 - add SDM845 in the comment line for the SoCs on which
   qcom,bam-v1.7.0 version is supported.

Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220926112200.1948080-1-bhupesh.sharma@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:24:09 +05:30
Deming Wang 19ea810e88 Documentation: devicetree: dma: update the comments
remove the double word to.

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20220920020721.2190-1-wangdeming@inspur.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:17:37 +05:30
Gustavo A. R. Silva 45ecf27f30 dmaengine: sh: rcar-dmac: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members, instead. So, replace zero-length arrays
declarations in anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.

This helper allows for flexible-array members in unions.

Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/217
Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/YzIdsJqsR3LH2qEK@work
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:14:32 +05:30
Linus Torvalds 511cce163b Networking fixes for 6.0-rc8, including fixes from wifi and can.
Current release - regressions:
 
  - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()
 
  - wifi: fix locking in mac80211 mlme
 
  - eth:
    - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
    - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
 
 Previous releases - regressions:
 
  - wifi: fix regression with non-QoS drivers
 
 Previous releases - always broken:
 
  - mptcp: fix unreleased socket in accept queue
 
  - wifi:
    - don't start TX with fq->lock to fix deadlock
    - fix memory corruption in minstrel_ht_update_rates()
 
  - eth:
    - macb: fix ZynqMP SGMII non-wakeup source resume failure
    - mt7531: only do PLL once after the reset
    - usbnet: fix memory leak in usbnet_disconnect()
 
 Misc:
 
  - usb: qmi_wwan: add new usb-id for Dell branded EM7455
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmM1cawSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkyn4P/3sP2MK9yjBNWZ+NcjGAapXqm5MPttDX
 pTihRgVVR0ldAQCvLaKS6NtB9W/o2KnQ6znsNvba5fEE8MskKCv+kh3kIe5kzq6B
 JrpEQUHOS7XlIgQnNIUITym1n9A79CHvPWvuQzbSr+5TbaEncM2KN/0UFi+sqkrY
 Gz+2BUvJqeJShqQYtZCRQDrNxOmpKtRLHuXmskS0XlSHp0bp8nz/8zQOLEIHMnqB
 xLHRzOgpRBIXMPO3IWTP8AHkYmuyh7Pdf1IZ5uPgVhBmcfVR7UvXQUSCXt21WlhT
 SoLbOuT/zAFTbehkGY5B2S40h9qUvw2WBcHO3go59PwT9NOP2a2V1qcj6C75/rt2
 5Gnw75vT0Z5+VyuCHlyK4K2OVdiSpe/OMY8ZTYIRy8cGKXycAlK3AS5/m7Y+UG37
 SG+DrfkrBjC1GYKcFugC3zjLW1eQ+KKWY6z9j8PgWbZ3hgmWo5g9DXsRep7cDFUF
 6bzspxCQDn53WSLKnDRxIdFGPKR6bzn7Nys/qhyxaBdW59xLohXPqaF+n92K7bV8
 lkrDzq0knAsNWDKUT8sDs0ATMHx7MgzOlKwEMkkvhV9F3psiob9ISdU1Mwn4dbi1
 guWrSm5YtFxSu5iuAeOOZs0gZCtXuWtq0cJVsjnyZHnselcqO6Tfuvs2vzamI1KI
 MFnI5EZ6nY48
 =WVqg
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wifi and can.

  Current release - regressions:

   - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()

   - wifi: fix locking in mac80211 mlme

   - eth:
      - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
      - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe

  Previous releases - regressions:

   - wifi: fix regression with non-QoS drivers

  Previous releases - always broken:

   - mptcp: fix unreleased socket in accept queue

   - wifi:
      - don't start TX with fq->lock to fix deadlock
      - fix memory corruption in minstrel_ht_update_rates()

   - eth:
      - macb: fix ZynqMP SGMII non-wakeup source resume failure
      - mt7531: only do PLL once after the reset
      - usbnet: fix memory leak in usbnet_disconnect()

  Misc:

   - usb: qmi_wwan: add new usb-id for Dell branded EM7455"

* tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
  mptcp: fix unreleased socket in accept queue
  mptcp: factor out __mptcp_close() without socket lock
  net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2}
  net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge
  can: c_can: don't cache TX messages for C_CAN cores
  ice: xsk: drop power of 2 ring size restriction for AF_XDP
  ice: xsk: change batched Tx descriptor cleaning
  net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
  selftests: Fix the if conditions of in test_extra_filter()
  net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
  net: stmmac: power up/down serdes in stmmac_open/release
  wifi: mac80211: mlme: Fix double unlock on assoc success handling
  wifi: mac80211: mlme: Fix missing unlock on beacon RX
  wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
  wifi: mac80211: fix regression with non-QoS drivers
  wifi: mac80211: ensure vif queues are operational after start
  wifi: mac80211: don't start TX with fq->lock to fix deadlock
  wifi: cfg80211: fix MCS divisor value
  net: hippi: Add missing pci_disable_device() in rr_init_one()
  net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
  ...
2022-09-29 08:32:53 -07:00
Colin Ian King 9aa0dade8f phy: phy-mtk-dp: make array driving_params static const
Don't populate the read-only array driving_params on the stack but instead
make it static const. Also makes the object code a little smaller.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220929130147.97375-1-colin.i.king@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-09-29 21:01:27 +05:30
Linus Torvalds da9eede6b2 Input updates for v6.0-rc7
- small fixes for iqs62x-keys and melfas_mip4 drivers
 
 - corrected register address in snvs_pwrkey driver
 
 - Synaptic driver will stop trying to use intertouch (native) mode on
   some Lenovo AMD devices
 -----BEGIN PGP SIGNATURE-----
 
 iJAEABYKADgWIQST2eWILY88ieB2DOtAj56VGEWXnAUCYzUgfBocZG1pdHJ5LnRv
 cm9raG92QGdtYWlsLmNvbQAKCRBAj56VGEWXnEkhAP4/cOTiILkNKTzbu3nPAFn5
 qVBp+wDpFrjN5zQKEIyOVgEAr5k7STjixjnneZnR7+ppald6Ti7LXaTESoXnqrY9
 XwY=
 =zsDx
 -----END PGP SIGNATURE-----

Merge tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - small fixes for iqs62x-keys and melfas_mip4 drivers

 - corrected register address in snvs_pwrkey driver

 - Synaptic driver will stop trying to use intertouch (native) mode on
   some Lenovo AMD devices

* tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
  Input: synaptics - disable Intertouch for Lenovo T14 and P14s AMD G1
  Input: iqs62x-keys - drop unused device node references
  Input: melfas_mip4 - fix return value check in mip4_probe()
2022-09-29 08:22:53 -07:00
Yang Yingliang 28366dd2ec
spi: spi-gxp: Use devm_platform_ioremap_resource()
Use the devm_platform_ioremap_resource() helper instead of calling
platform_get_resource() and devm_ioremap_resource() separately.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220928145256.1879256-1-yangyingliang@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-29 16:16:26 +01:00
Zhang Qilong b73f11e895
ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe
The pm_runtime_enable will increase power disable depth. Thus
a pairing decrement is needed on the error handling path to
keep it balanced according to context. We fix it by moving
pm_runtime_enable to the endding of mt6660_i2c_probe.

Fixes:f289e55c6eeb4 ("ASoC: Add MediaTek MT6660 Speaker Amp Driver")

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20220928160116.125020-5-zhangqilong3@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-29 16:16:24 +01:00
Zhang Qilong fcbb60820c
ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe
The pm_runtime_enable will increase power disable depth. Thus
a pairing decrement is needed on the error handling path to
keep it balanced according to context. We fix it by moving
pm_runtime_enable to the endding of wm5102_probe.

Fixes:93e8791dd34ca ("ASoC: wm5102: Initial driver")

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20220928160116.125020-4-zhangqilong3@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-29 16:16:23 +01:00
Zhang Qilong 86b46bf1fe
ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe
The pm_runtime_enable will increase power disable depth. Thus
a pairing decrement is needed on the error handling path to
keep it balanced according to context. We fix it by moving
pm_runtime_enable to the endding of wm5110_probe.

Fixes:5c6af635fd772 ("ASoC: wm5110: Add audio CODEC driver")

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20220928160116.125020-3-zhangqilong3@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-29 16:16:22 +01:00
Zhang Qilong 41a736ac20
ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe
The pm_runtime_enable will increase power disable depth. Thus
a pairing decrement is needed on the error handling path to
keep it balanced according to context. We fix it by moving
pm_runtime_enable to the endding of wm8997_probe

Fixes:40843aea5a9bd ("ASoC: wm8997: Initial CODEC driver")

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20220928160116.125020-2-zhangqilong3@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-29 16:16:21 +01:00
Tetsuo Handa cbddcc4fa3 btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
syzbot is reporting uninit-value in btrfs_clean_tree_block() [1], for
commit bc877d285c ("btrfs: Deduplicate extent_buffer init code")
missed that btrfs_set_header_generation() in btrfs_init_new_buffer() must
not be moved to after clean_tree_block() because clean_tree_block() is
calling btrfs_header_generation() since commit 55c69072d6 ("Btrfs:
Fix extent_buffer usage when nodesize != leafsize").

Since memzero_extent_buffer() will reset "struct btrfs_header" part, we
can't move btrfs_set_header_generation() to before memzero_extent_buffer().
Just re-add btrfs_set_header_generation() before btrfs_clean_tree_block().

Link: https://syzkaller.appspot.com/bug?extid=fba8e2116a12609b6c59 [1]
Reported-by: syzbot <syzbot+fba8e2116a12609b6c59@syzkaller.appspotmail.com>
Fixes: bc877d285c ("btrfs: Deduplicate extent_buffer init code")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-29 17:08:31 +02:00
Filipe Manana db21370bff btrfs: drop extent map range more efficiently
Currently when dropping extent maps for a file range, through
btrfs_drop_extent_map_range(), we do the following non-optimal things:

1) We lookup for extent maps one by one, always starting the search from
   the root of the extent map tree. This is not efficient if we have
   multiple extent maps in the range;

2) We check on every iteration if we have the 'split' and 'split2' spare
   extent maps in case we need to split an extent map that intersects our
   range but also crosses its boundaries (to the left, to the right or
   both cases). If our target range is for example:

       [2M, 8M)

   And we have 3 extents maps in the range:

       [1M, 3M) [3M, 6M) [6M, 10M[

   The on the first iteration we allocate two extent maps for 'split' and
   'split2', and use the 'split' to split the first extent map, so after
   the split we set 'split' to 'split2' and then set 'split2' to NULL.

   On the second iteration, we don't need to split the second extent map,
   but because 'split2' is now NULL, we allocate a new extent map for
   'split2'.

   On the third iteration we need to split the third extent map, so we
   use the extent map pointed by 'split'.

   So we ended up allocating 3 extent maps for splitting, but all we
   needed was 2 extent maps. We never need to allocate more than 2,
   because extent maps that need to be split are always the first one
   and the last one in the target range.

Improve on this by:

1) Using rb_next() to move on to the next extent map. This results in
   iterating over less nodes of the tree and it does not require comparing
   the ranges of nodes to our start/end offset;

2) Allocate the 2 extent maps for splitting before entering the loop and
   never allocate more than 2. In practice it's very rare to have the
   combination of both extent map allocations fail, since we have a
   dedicated slab for extent maps, and also have the need to split two
   extent maps.

This patch is part of a patchset comprised of the following patches:

   btrfs: fix missed extent on fsync after dropping extent maps
   btrfs: move btrfs_drop_extent_cache() to extent_map.c
   btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
   btrfs: use cond_resched_rwlock_write() during inode eviction
   btrfs: move open coded extent map tree deletion out of inode eviction
   btrfs: add helper to replace extent map range with a new extent map
   btrfs: remove the refcount warning/check at free_extent_map()
   btrfs: remove unnecessary extent map initializations
   btrfs: assert tree is locked when clearing extent map from logging
   btrfs: remove unnecessary NULL pointer checks when searching extent maps
   btrfs: remove unnecessary next extent map search
   btrfs: avoid pointless extent map tree search when flushing delalloc
   btrfs: drop extent map range more efficiently

And the following fio test was done before and after applying the whole
patchset, on a non-debug kernel (Debian's default kernel config) on a 12
cores Intel box with 64G of ram:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/nvme0n1
   MNT=/mnt/nvme0n1
   MOUNT_OPTIONS="-o ssd"
   MKFS_OPTIONS="-R free-space-tree -O no-holes"

   cat <<EOF > /tmp/fio-job.ini
   [writers]
   rw=randwrite
   fsync=8
   fallocate=none
   group_reporting=1
   direct=0
   bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5
   ioengine=psync
   filesize=2G
   runtime=300
   time_based
   directory=$MNT
   numjobs=8
   thread
   EOF

   echo performance | \
       tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

   echo
   echo "Using config:"
   echo
   cat /tmp/fio-job.ini
   echo

   umount $MNT &> /dev/null
   mkfs.btrfs -f $MKFS_OPTIONS $DEV
   mount $MOUNT_OPTIONS $DEV $MNT

   fio /tmp/fio-job.ini

   umount $MNT

Result before applying the patchset:

   WRITE: bw=197MiB/s (206MB/s), 197MiB/s-197MiB/s (206MB/s-206MB/s), io=57.7GiB (61.9GB), run=300188-300188msec

Result after applying the patchset:

   WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=59.5GiB (63.9GB), run=300019-300019msec

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-29 17:08:31 +02:00
Filipe Manana b54bb86556 btrfs: avoid pointless extent map tree search when flushing delalloc
When flushing delalloc, in COW mode at cow_file_range(), before entering
the loop that allocates extents and creates ordered extents, we do a call
to btrfs_drop_extent_map_range() for the whole range. This is pointless
because in the loop we call create_io_em(), which will also call
btrfs_drop_extent_map_range() before inserting the new extent map.

So remove that call at cow_file_range() not only because it is not needed,
but also because it will make the btrfs_drop_extent_map_range() calls made
from create_io_em() waste time searching the extent map tree, and that
tree can be large for files with many extents. It also makes us waste time
at btrfs_drop_extent_map_range() allocating and freeing the split extent
maps for nothing.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-29 17:08:31 +02:00
Filipe Manana 6c05813ebb btrfs: remove unnecessary next extent map search
At __tree_search(), and its single caller __lookup_extent_mapping(), there
is no point in finding the next extent map that starts after the search
offset if we were able to find the previous extent map that ends before
our search offset, because __lookup_extent_mapping() ignores the next
acceptable extent map if we were able to find the previous one.

So just return immediately if we were able to find the previous extent
map, therefore avoiding wasting time iterating the tree looking for the
next extent map which will not be used by __lookup_extent_mapping().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-29 17:08:31 +02:00