Commit Graph

1296460 Commits

Author SHA1 Message Date
Kees Cook
d73778e4b8 mm/util: Use dedicated slab buckets for memdup_user()
Both memdup_user() and vmemdup_user() handle allocations that are
regularly used for exploiting use-after-free type confusion flaws in
the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
respectively).

Since both are designed for contents coming from userspace, they allow
userspace-controlled allocation sizes. Use a dedicated set of kmalloc
buckets so these allocations do not share caches with the global kmalloc
buckets.

After a fresh boot under Ubuntu 23.10, we can see the caches are already
in active use:

 # grep ^memdup /proc/slabinfo
 memdup_user-8k         4      4   8192    4    8 : ...
 memdup_user-4k         8      8   4096    8    8 : ...
 memdup_user-2k        16     16   2048   16    8 : ...
 memdup_user-1k         0      0   1024   16    4 : ...
 memdup_user-512        0      0    512   16    2 : ...
 memdup_user-256        0      0    256   16    1 : ...
 memdup_user-128        0      0    128   32    1 : ...
 memdup_user-64       256    256     64   64    1 : ...
 memdup_user-32       512    512     32  128    1 : ...
 memdup_user-16      1024   1024     16  256    1 : ...
 memdup_user-8       2048   2048      8  512    1 : ...
 memdup_user-192        0      0    192   21    1 : ...
 memdup_user-96       168    168     96   42    1 : ...

Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
Link: https://etenal.me/archives/1336 [3]
Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:20 +02:00
Kees Cook
734bbc1c97 ipc, msg: Use dedicated slab buckets for alloc_msg()
The msg subsystem is a common target for exploiting[1][2][3][4][5][6][7]
use-after-free type confusion flaws in the kernel for both read and write
primitives. Avoid having a user-controlled dynamically-sized allocation
share the global kmalloc cache by using a separate set of kmalloc buckets
via the kmem_buckets API.

Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
Link: https://zplin.me/papers/ELOISE.pdf [6]
Link: https://syst3mfailure.io/wall-of-perdition/ [7]
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:20 +02:00
Kees Cook
b32801d125 mm/slab: Introduce kmem_buckets_create() and family
Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.

This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.

While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations that are directly user
controlled. Addressing these cases is limited in scope, so isolating these
kinds of interfaces will not become an unbounded game of whack-a-mole. For
example, many pass through memdup_user(), making isolation there very
effective.

In order to isolate user-controllable dynamically-sized
allocations from the common system kmalloc allocations, introduce
kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
kmem_buckets_alloc_track_caller() for where caller tracking is
needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
is needed. Note that these caches are specifically flagged with
SLAB_NO_MERGE, since merging would defeat the entire purpose of the
mitigation.
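
The isolation idea can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: the class `KmemBuckets` and the helper `bucket_size_for` are hypothetical names, and the size classes are the ones visible in the /proc/slabinfo listing of the memdup_user() commit. Cache names here use raw byte counts rather than the "1k"/"2k" spelling the kernel prints.

```python
# Userspace model only; names mirror the commit text, not the kernel code.
BUCKET_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192]


def bucket_size_for(size):
    """Return the smallest size class that can hold `size` bytes."""
    for bucket in BUCKET_SIZES:
        if size <= bucket:
            return bucket
    raise ValueError("size exceeds the largest bucket in this model")


class KmemBuckets:
    """A named, private set of size-class caches (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        # One independent cache (a list of live objects) per size class.
        self.caches = {size: [] for size in BUCKET_SIZES}

    def alloc(self, size):
        """Allocate from this set only; return (cache name, object)."""
        bucket = bucket_size_for(size)
        obj = bytearray(bucket)
        self.caches[bucket].append(obj)
        return f"{self.name}-{bucket}", obj


global_kmalloc = KmemBuckets("kmalloc")
memdup_buckets = KmemBuckets("memdup_user")

cache_a, _ = global_kmalloc.alloc(100)
cache_b, _ = memdup_buckets.alloc(100)
print(cache_a, cache_b)  # kmalloc-128 memdup_user-128
```

Two requests of the same size land in the same size class but in different caches, which is the property the mitigation relies on.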

This can also be used in the future to extend allocation profiling's use
of code tagging to implement per-caller allocation cache isolation[1]
even for dynamic allocations.

Memory allocation pinning[2] is still needed to plug the Use-After-Free
cross-allocator weakness (where attackers can arrange to free an
entire slab page and have it reallocated to a different cache),
but that is an existing and separate issue which is complementary
to this improvement. Development continues for that feature via the
SLAB_VIRTUAL[3] series (which could also provide guard pages -- another
complementary improvement).

Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:20 +02:00
Kees Cook
2e8000b826 mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
Plumb kmem_buckets arguments through kvmalloc_node_noprof() so it is
possible to provide an API to perform kvmalloc-style allocations with
a particular set of buckets. Introduce kvmalloc_buckets_node() that takes a
kmem_buckets argument.

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:19 +02:00
Kees Cook
67f2df3b82 mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
support separated kmalloc buckets (in the following kmem_buckets_create()
patches and future codetag-based separation). Since this will provide
a mitigation for a very common case of exploits, it is recommended to
enable this feature for general purpose distros. By default, the new
Kconfig will be enabled if CONFIG_SLAB_FREELIST_HARDENED is enabled (and
it is added to the hardening.config Kconfig fragment).

To be able to choose which buckets to allocate from, make the buckets
available to the internal kmalloc interfaces by adding them as the
second argument, rather than depending on the buckets being chosen from
the fixed set of global buckets. Where the bucket is not available,
pass NULL, which means "use the default system kmalloc bucket set"
(the prior existing behavior), as implemented in kmalloc_slab().
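
The "NULL means default" convention can be sketched as follows. This is a Python model under stated assumptions, not the kernel implementation: `DEFAULT_BUCKETS`, the dictionary representation, and the simplified signatures are hypothetical, and `None` stands in for NULL.

```python
# Hypothetical model of "NULL selects the default system bucket set".
DEFAULT_BUCKETS = {"name": "kmalloc"}


def kmalloc_slab(size, buckets):
    """Pick the bucket set to allocate from; None stands in for NULL."""
    if buckets is None:
        buckets = DEFAULT_BUCKETS  # prior behavior: global kmalloc buckets
    return buckets["name"]


def do_kmalloc_node(size, buckets=None):
    """Internal entry point taking the buckets as its second argument."""
    return kmalloc_slab(size, buckets)
```

Callers that have no dedicated set pass nothing and keep the old behavior, while a caller such as the memdup_user() path would pass its own set.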

To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
top-level macros and static inlines use the buckets argument (where
they are stripped out and compiled out respectively). The actual extern
functions can then be built without the argument, and the internals
fall back to the global kmalloc buckets unconditionally.

Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:19 +02:00
Kees Cook
72e0fe2241 mm/slab: Introduce kmem_buckets typedef
Encapsulate the concept of a single set of kmem_caches that are used
for the kmalloc size buckets. Redefine kmalloc_caches as an array
of these buckets (for the different global cache buckets).

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:19 +02:00
Johannes Berg
267ed02c21 hostfs: fix dev_t handling
dev_t is a kernel type and may have different definitions
in kernel and userspace. On 32-bit x86 this currently makes
the stat structure 4 bytes longer in the user code,
causing stack corruption.

However, this is (potentially) not the only problem, since
dev_t is a different type on user/kernel side, so we don't
know that the major/minor encoding isn't also different.
Decode/encode it instead to address both problems.
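
A decode-then-encode round trip might look like the sketch below. It is a Python illustration, not the hostfs patch: `host_dev_to_kernel` is a hypothetical helper, the host side is decoded with Python's `os.major()`/`os.minor()`, and the kernel side is assumed to use the kernel-internal layout (major above a 20-bit minor, as in the kernel's `MKDEV()` macro).

```python
import os

MINORBITS = 20  # kernel-internal dev_t layout: major above a 20-bit minor


def kernel_mkdev(major, minor):
    """Encode in the kernel-internal layout, like the kernel's MKDEV()."""
    return (major << MINORBITS) | minor


def host_dev_to_kernel(host_dev):
    """Decode with the host's rules, then re-encode for the kernel side."""
    return kernel_mkdev(os.major(host_dev), os.minor(host_dev))
```

Passing major and minor as separate integers means neither side depends on the other's dev_t width or bit layout.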

Cc: stable@vger.kernel.org
Fixes: 74ce793bcb ("hostfs: Fix ephemeral inodes")
Link: https://patch.msgid.link/20240702092440.acc960585dd5.Id0767e12f562a69c6cd3c3262dc3d765db350cf6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:23:50 +02:00
Johannes Berg
53585f9ea4 um: enable UBSAN
We can select ARCH_HAS_UBSAN; it works just fine. It had been
enabled and we even used it, but then commit 890a64810d
("ubsan: Restore dependency on ARCH_HAS_UBSAN") (correctly)
disabled it again. Select ARCH_HAS_UBSAN to get it back.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20240701220034.995eb04d656d.Ia29fe091b207fe66b5e26298c1e427ebcf131642@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:23:40 +02:00
Vlastimil Babka
ad59baa316 slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
Slab allocators have been guaranteeing natural alignment for
power-of-two sizes since commit 59bb47985c ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)"), while any other sizes are
guaranteed to be aligned only to ARCH_KMALLOC_MINALIGN bytes (although
in practice are aligned more than that in non-debug scenarios).

Rust's allocator API specifies size and alignment per allocation, which
have to satisfy the following rules, per Alice Ryhl [1]:

  1. The alignment is a power of two.
  2. The size is non-zero.
  3. When you round up the size to the next multiple of the alignment,
     then it must not overflow the signed type isize / ssize_t.

In order to map this to kmalloc()'s guarantees, some requested
allocation sizes have to be padded to the next power-of-two size [2].
For example, an allocation of size 96 and alignment of 32 will be padded
to an allocation of size 128, because the existing kmalloc-96 bucket
doesn't guarantee alignment above ARCH_KMALLOC_MINALIGN. Without slab
debugging active, however, the layout of the kmalloc-96 slabs naturally
aligns the objects to 32 bytes, so extending the size to 128 bytes is
wasteful.

To improve the situation we can extend the kmalloc() alignment
guarantees in a way that

1) doesn't change the current slab layout (and thus does not increase
   internal fragmentation) when slab debugging is not active
2) reduces waste in the Rust allocator use case
3) is a superset of the current guarantee for power-of-two sizes.

The extended guarantee is that alignment is at least the largest
power-of-two divisor of the requested size. For power-of-two sizes the
largest divisor is the size itself, but let's keep this case documented
separately for clarity.

For current kmalloc size buckets, it means kmalloc-96 will guarantee
alignment of 32 bytes and kmalloc-192 will guarantee 64 bytes.

This covers the rules 1 and 2 above of Rust's API as long as the size is
a multiple of the alignment. The Rust layer should now only need to
round up the size to the next multiple if it isn't, while enforcing the
rule 3.
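
The arithmetic behind the extended guarantee can be checked with a short Python sketch. The function names are hypothetical; the first computes the largest power-of-two divisor of a size (its lowest set bit), and the second is the rounding step that would remain on the Rust side when the size is not already a multiple of the alignment.

```python
def guaranteed_alignment(size):
    """Largest power-of-two divisor of size, i.e. its lowest set bit."""
    if size <= 0:
        raise ValueError("size must be positive")
    return size & -size


def pad_to_alignment(size, align):
    """Round size up to the next multiple of a power-of-two alignment."""
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a power of two")
    return (size + align - 1) & ~(align - 1)
```

For a request of size 96 with alignment 32, `guaranteed_alignment(96)` is 32, so no padding is needed. A request of size 100 with alignment 32 is first rounded up to 128 by `pad_to_alignment(100, 32)`, and the guarantee for 128 is then 128.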

Implementation-wise, this changes the alignment calculation in
create_boot_cache(). While at it also do the calculation only for caches
with the SLAB_KMALLOC flag, because the function is also used to create
the initial kmem_cache and kmem_cache_node caches, where no alignment
guarantee is necessary.

In the Rust allocator's krealloc_aligned(), remove the code that padded
sizes to the next power of two (suggested by Alice Ryhl) as it's no
longer necessary with the new guarantees.

Reported-by: Alice Ryhl <aliceryhl@google.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/all/CAH5fLggjrbdUuT-H-5vbQfMazjRDpp2%2Bk3%3DYhPyS17ezEqxwcw@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAH5fLghsZRemYUwVvhk77o6y1foqnCeDzW4WZv6ScEWna2+_jw@mail.gmail.com/ [2]
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:23:27 +02:00
Wei Yang
5cd93c7532 um/mm: remove redundant assignment of max_low_pfn
The current calculation of max_low_pfn was introduced in commit af84eab208
("[PATCH] uml: fix LVM crash"). It is intended to set max_low_pfn to the
same value as max_pfn.

But I am not sure why the max_pfn is set to totalram_pages, which
represents the number of usable pages in system instead of an absolute
page frame number. (The change history stops there.)

Since we have already calculated it in setup_physmem(), it is not
necessary to do it again.

This would also help with changing totalram_pages accounting, since we plan
to move the accounting into __free_pages_core(). With this change,
totalram_pages may not represent the total usable pages at this point,
since some pages would be deferred initialized.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Alasdair G Kergon <agk@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
CC: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://patch.msgid.link/20240615034150.2958-1-richard.weiyang@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:22:39 +02:00
David Gow
ab0f4cedc3 arch: um: rust: Add i386 support for Rust
At present, Rust in the kernel only supports 64-bit x86, so UML has
followed suit. However, it's significantly easier to support 32-bit i386
on UML than on bare metal, as UML does not use the -mregparm option
(which alters the ABI), which is not yet supported by rustc[1].

Add support for CONFIG_RUST on um/i386, by adding a new target config to
generate_rust_target, and replacing various checks on CONFIG_X86_64 to
also support CONFIG_X86_32.

We still use generate_rust_target, rather than a built-in rustc target,
in order to match x86_64, provide a future place for -mregparm, and more
easily disable floating point instructions.

With these changes, the KUnit tests pass with:
kunit.py run --make_options LLVM=1 --kconfig_add CONFIG_RUST=y
--kconfig_add CONFIG_64BIT=n --kconfig_add CONFIG_FORTIFY_SOURCE=n

An earlier version of these changes was proposed on the Rust-for-Linux
github[2].

[1]: https://github.com/rust-lang/rust/issues/116972
[2]: https://github.com/Rust-for-Linux/linux/pull/966

Signed-off-by: David Gow <davidgow@google.com>
Link: https://patch.msgid.link/20240604224052.3138504-1-davidgow@google.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:22:22 +02:00
David Gow
9a2123b397 arch: um: rust: Use the generated target.json again
The Rust compiler can take a target config from 'target.json', which is
generated by scripts/generate_rust_target.rs. It used to be that all
Linux architectures used this to generate a target.json, but now
architectures must opt-in to this, or they will default to the Rust
compiler's built-in target definition.

This is mostly okay for (64-bit) x86 and UML, except that it can
generate SSE instructions, which we can't use in the kernel. So
re-instate the custom target.json, which disables SSE (and generally
enables the 'soft-float' feature). This fixes the following compile
error:

error: <unknown>:0:0: in function _RNvMNtCs5QSdWC790r4_4core3f32f7next_up float (float): SSE register return with SSE disabled

Fixes: f82811e22b ("rust: Refactor the build target to allow the use of builtin targets")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://patch.msgid.link/20240529093336.4075206-1-davidgow@google.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:22:11 +02:00
Tiwei Bie
cb2759431a um: Remove /proc/sysemu support code
Currently /proc/sysemu will never be registered, as sysemu_supported
is initialized to zero implicitly and no code updates it. And there is
also nothing to configure via sysemu in UML anymore.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240527134024.1539848-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:21:57 +02:00
Tiwei Bie
6fdae1da76 um: Remove unused ncpus variable
It's no longer used. And uml_ncpus_setup doesn't exist anymore.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240527134024.1539848-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:21:57 +02:00
Dr. David Alan Gilbert
1cf855ded6 ubd: Remove unused mutex 'ubd_mutex'
Commit fb5d1d389c ("ubd: open the backing files in ubd_add")
removed the last use of ubd_mutex.
Remove it.

Build and kernel startup test only.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20240505001508.255096-1-linux@treblig.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:21:29 +02:00
Johannes Berg
7d0a8a490a um: time-travel: fix time-travel-start option
We need to have the = as part of the option so that the
value can be parsed properly. Also document that it must
be given in nanoseconds, not seconds.
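
Why the "=" matters can be shown with a small Python sketch. It assumes, as a simplification, that the option handler receives whatever text follows the registered prefix; `match_option` and `parse_start_ns` are hypothetical names, not UML functions.

```python
def match_option(arg, prefix):
    """Return the text after prefix if arg starts with it, else None."""
    if arg.startswith(prefix):
        return arg[len(prefix):]
    return None


def parse_start_ns(arg):
    """Parse 'time-travel-start=<nanoseconds>' into an integer."""
    value = match_option(arg, "time-travel-start=")
    if value is None:
        raise ValueError("not a time-travel-start option")
    return int(value)
```

With the prefix registered without "=", `match_option("time-travel-start=5", "time-travel-start")` yields "=5", which is not a number. Note also that one second must be written as 1000000000.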

Fixes: 065038706f ("um: Support time travel mode")
Link: https://patch.msgid.link/20240417102744.14b9a9d4eba0.Ib22e9136513126b2099d932650f55f193120cd97@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:21:16 +02:00
Niklas Schnelle
ddd268c428 um: Select HAS_IOREMAP for UML_IOMEM_EMULATION
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at
compile time. UML supports these via its UML_IOMEM_EMULATION so let that
select HAS_IOPORT and also reflect this in NO_IOPORT_MAP.

Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://patch.msgid.link/20240403124300.65379-2-schnelle@linux.ibm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:21:01 +02:00
Anton Ivanov
12b8e7e69a um: Remove obsolete pcap driver
Remove the pcap driver in UML. It is obsolete. It does not build on
recent systems due to changes in libpcap and its dependencies.

The vector driver's raw transport in UML provides identical
functionality.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20240328132424.376456-1-anton.ivanov@cambridgegreys.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:19:25 +02:00
Benjamin Berg
b2f9b77c7f um: chan: use blocking IO for console output for time-travel
When in time-travel mode (infinite-cpu or external) time should not pass
for writing to the console. As such, it makes sense to put the FD for
the output side into blocking mode and simply let any write to it hang.

If we did not do this, then time could pass waiting for the console to
become writable again. This is not desirable as it has random effects on
the clock between runs.

Implement this by duplicating the FD if output is active in a relevant
mode and setting the duplicate to be blocking. This avoids changing the
input channel to be blocking should it exist. After this, use the
blocking FD for all write operations and do not allocate an IRQ if it is
set.

Without time-travel mode fd_out will always match fd_in and IRQs are
registered.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20231018123643.1255813-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:18:02 +02:00
Benjamin Berg
4cfb44df8d um: chan_user: retry partial writes
In the next commit, we are going to set the output FD to be blocking.
Once that is done, the write() may be short if an interrupt happens
while it is writing out data. As such, to properly catch an EINTR error,
we need to retry the write.
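
A retry loop of this kind can be sketched in Python as follows. This is an illustration of the general technique rather than the chan_user code: `write_all` is a hypothetical helper, and Python 3.5+ already retries most EINTR cases internally, so the `except` branch is kept mainly to mirror the C pattern.

```python
import os


def write_all(fd, data):
    """Write all of data to a blocking fd, retrying short writes."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            n = os.write(fd, view[written:])
        except InterruptedError:
            # EINTR before any byte was written: simply try again.
            continue
        # A short write returns fewer bytes than requested; loop for the rest.
        written += n
    return written
```

The caller gets either the full length back or an exception, so a signal arriving mid-write no longer truncates console output.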

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20231018123643.1255813-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:18:02 +02:00
Benjamin Berg
c6c4cbaa01 um: chan_user: catch EINTR when reading and writing
If the read/write function returns an error then we expect to see an
event/IRQ later on. However, this will only happen after an EAGAIN as we
are using edge based event triggering.

As such, EINTR needs to be caught should it happen.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20231018123643.1255813-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:18:01 +02:00
Benjamin Berg
c140a5bd5d um: irqs: process outstanding IRQs when unblocking signals
When in time-travel mode, the eventfd events are read even when signals
are blocked as SIGIO still needs to be processed. In this case, the
event is cleared on the eventfd but the IRQ still needs to be fired
later.

We did already ensure that the SIGIO handler is run again. However, the
FDs are configured to be level triggered, so that eventfd will not
notify again. As such, add some logic to mark the IRQ as pending and
process it at the next opportunity.

To avoid duplication, reuse the logic used for the suspend/resume case.
This does not really change anything except for delaying running the
IRQs with timetravel_handler at a slightly later point in time (and
possibly running non-timetravel IRQs that shouldn't happen earlier).
While at it, move marking as pending into irq_event_handler as that is
the more logical place for it to happen.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20231018123643.1255813-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:18:01 +02:00
Eric Farman
2ae157ec49 s390/vfio_ccw: Fix target addresses of TIC CCWs
The processing of a Transfer-In-Channel (TIC) CCW requires locating
the target of the CCW in the channel program, and updating the
address to reflect what will actually be sent to hardware.

An error exists where the 64-bit virtual address is truncated to
32 bits (variable "cda") when performing this math. Since s390
addresses of that size are 31 bits, this leaves that additional
bit enabled such that the resulting I/O triggers a channel
program check. This shows up occasionally when booting a KVM
guest from a passthrough DASD device:

  ...snip...
  Interrupt Response Block Data:
  : 0x0000000000003990
      Function Ctrl : [Start]
      Activity Ctrl :
      Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending]
      Device Status :
      Channel Status : [Program-Check]
      cpa=: 0x00000000008d0018
      prev_ccw=: 0x0000000000000000
      this_ccw=: 0x0000000000000000
  ...snip...
  dasd-ipl: Failed to run IPL1 channel program

The channel program address of "0x008d0018" in the IRB doesn't
look wrong, but tracing the CCWs shows the offending bit enabled:

  ccw=0x0000012e808d0000 cda=00a0b030
  ccw=0x0000012e808d0008 cda=00a0b038
  ccw=0x0000012e808d0010 cda=808d0008
  ccw=0x0000012e808d0018 cda=00a0b040

Fix the calculation of the TIC CCW's data address such that it points
to a valid 31-bit address regardless of the input address.
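
The symptom can be reproduced with plain integer arithmetic. The Python sketch below only demonstrates the truncation problem using an address from the trace above; it is not the patch, and the helper names are hypothetical.

```python
MASK_32 = 0xFFFF_FFFF
TOP_BIT_32 = 0x8000_0000  # most significant bit of a 32-bit word


def truncate_to_32(addr):
    """Keep only the low 32 bits, as storing into a 32-bit field would."""
    return addr & MASK_32


def fits_31_bits(value):
    """True if value is a valid 31-bit address (top bit of the word clear)."""
    return 0 <= value < TOP_BIT_32
```

Truncating the traced CCW address 0x0000012e808d0008 gives 0x808d0008, which matches the cda value of the third CCW in the trace and fails `fits_31_bits()`. The channel program address 0x008d0018 from the IRB passes the same check.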

Fixes: bd36cfbbb9 ("s390/vfio_ccw_cp: use new address translation helpers")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240628163738.3643513-1-farman@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-07-03 12:05:32 +02:00
Frank Li
f2e3956297 dt-bindings: gpio: fsl,qoriq-gpio: Add compatible string fsl,ls1046a-gpio
Add a compatible string for chip ls1046 to fix the warning below:
arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb: /soc/gpio@2300000: failed to match any schema with compatible: ['fsl,ls1046a-gpio', 'fsl,qoriq-gpio']

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240702201724.96681-1-Frank.Li@nxp.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03 12:02:06 +02:00
Frieder Schrempf
3a9ba4e322 dt-bindings: eeprom: at24: Add compatible for ONSemi N24S64B
The ONSemi N24S64B is a 64 KBit serial EEPROM that is compatible
with atmel,24c64.

Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240702103155.321855-3-frieder@fris.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03 11:57:52 +02:00
Frieder Schrempf
d83c217778 dt-bindings: eeprom: at24: Move compatible for Belling BL24C16A to proper place
Merge the compatibles for the 24c16 types into a single list.

Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240702103155.321855-2-frieder@fris.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03 11:57:52 +02:00
Andrei Simion
c1ec80e54a dt-bindings: eeprom: at24: Add Microchip 24AA025E48/24AA025E64
Add support for compatible Microchip 24AA025E48/24AA025E64 EEPROMs.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrei Simion <andrei.simion@microchip.com>
Link: https://lore.kernel.org/r/20240703084704.197697-4-andrei.simion@microchip.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03 11:56:05 +02:00
Claudiu Beznea
b61ea87050 eeprom: at24: Add support for Microchip 24AA025E48/24AA025E64 EEPROMs
Add "microchip,24aa025e48" and "microchip,24aa025e64" compatibles for
use with 24AA025E{48, 64} type EEPROMs, where "24aa025e48" stands
for EUI-48 address and "24aa025e64" stands for EUI-64 address.

[andrei.simion@microchip.com: Use AT24_DATA_CHIP with AT24_FLAG_READONLY for
24AA025E{48, 64} type of EEPROMs. Reword commit message.]

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Andrei Simion <andrei.simion@microchip.com>
Link: https://lore.kernel.org/r/20240703084704.197697-2-andrei.simion@microchip.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03 11:55:59 +02:00
Krzysztof Kozlowski
eba6d0f88b power: sequencing: simplify returning pointer without cleanup
Use 'return_ptr' helper for returning a pointer without cleanup for
shorter code.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240703083038.95777-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03 11:49:04 +02:00
Biju Das
c1267e1afa arm64: dts: renesas: rz-smarc: Replace fixed regulator for USB VBUS
Replace the fixed regulator for USB VBUS and use the proper one that
controls the regulator based on VBUS detection.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240702180032.207275-5-biju.das.jz@bp.renesas.com
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2024-07-03 10:51:53 +02:00
Biju Das
24843404ef phy: renesas: phy-rcar-gen3-usb2: Control VBUS for RZ/G2L SoCs
Use regulator_hardware_enable() for controlling VBUS enable for
RZ/G2L-alike SoCs in interrupt context.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20240702180032.207275-4-biju.das.jz@bp.renesas.com
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2024-07-03 10:51:53 +02:00
Biju Das
4068f22e4b reset: renesas: Add USB VBUS regulator device as child
As per the RZ/G2L HW manual, VBUS enable can be controlled by the VBOUT bit
of the VBUS Control Register (VBENCTL) in the USBPHY Control.

Expose this register as a regmap and instantiate the USB VBUS regulator
device, so that consumers can control VBUS using the regulator API.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/r/20240702180032.207275-3-biju.das.jz@bp.renesas.com
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2024-07-03 10:51:53 +02:00
Biju Das
f64f2d6fdd dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document USB VBUS regulator
The VBUS enable can be controlled by the VBOUT bit of the VBUS control
register. This register is part of usbphy-ctrl IP.

Document the USB VBUS regulator object.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240702180032.207275-2-biju.das.jz@bp.renesas.com
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2024-07-03 10:51:53 +02:00
Igor Pylypiv
816be86c79 ata: libata-scsi: Check ATA_QCFLAG_RTF_FILLED before using result_tf
qc->result_tf contents are only valid when the ATA_QCFLAG_RTF_FILLED flag
is set. The ATA_QCFLAG_RTF_FILLED flag should always be set for commands
that failed or for commands that have the ATA_QCFLAG_RESULT_TF flag set.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20240702024735.1152293-8-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Igor Pylypiv
18676c6aab ata: libata-core: Set ATA_QCFLAG_RTF_FILLED in fill_result_tf()
ATA_QCFLAG_RTF_FILLED is not specific to ahci and can be used generally
to check if qc->result_tf contains valid data.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20240702024735.1152293-7-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Igor Pylypiv
ea3b26a9bb ata: libata-scsi: Do not pass ATA device id to ata_to_sense_error()
ATA device id is not used in ata_to_sense_error().

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20240702024735.1152293-6-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Igor Pylypiv
3f6d903b54 ata: libata-scsi: Remove redundant sense_buffer memsets
SCSI layer clears sense_buffer in scsi_queue_rq() so there is no need for
libata to clear it again.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20240702024735.1152293-5-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Igor Pylypiv
28ab976911 ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error
SAT-5 revision 8 specification removed the text about the ANSI INCITS
431-2007 compliance which was requiring SCSI/ATA Translation (SAT) to
return descriptor format sense data for the ATA PASS-THROUGH commands
regardless of the setting of the D_SENSE bit.

Let's honor the D_SENSE bit for ATA PASS-THROUGH commands while
generating the "ATA PASS-THROUGH INFORMATION AVAILABLE" sense data.

SAT-5 revision 7
================

12.2.2.8 Fixed format sense data

Table 212 shows the fields returned in the fixed format sense data
(see SPC-5) for ATA PASS-THROUGH commands. SATLs compliant with ANSI
INCITS 431-2007, SCSI/ATA Translation (SAT) return descriptor format
sense data for the ATA PASS-THROUGH commands regardless of the setting
of the D_SENSE bit.

SAT-5 revision 8
================

12.2.2.8 Fixed format sense data

Table 211 shows the fields returned in the fixed format sense data
(see SPC-5) for ATA PASS-THROUGH commands.
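
As a rough illustration of honoring D_SENSE, the sketch below picks the
sense data format from a d_sense flag and places the sense key, ASC and
ASCQ for "ATA PASS-THROUGH INFORMATION AVAILABLE" (RECOVERED ERROR,
00h/1Dh) at the header positions of each format. The function name and
macros are hypothetical and are not taken from libata; only the header
bytes are filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SENSE_FMT_FIXED            0x70 /* current error, fixed format */
#define SENSE_FMT_DESCRIPTOR       0x72 /* current error, descriptor format */
#define SK_RECOVERED_ERROR         0x01
#define ASC_NO_ADDITIONAL_INFO     0x00
#define ASCQ_ATA_PT_INFO_AVAILABLE 0x1D

/* Hypothetical helper: build only the sense header, format chosen by D_SENSE. */
static void build_passthru_info_sense(uint8_t *sb, size_t len, bool d_sense)
{
	assert(len >= 14);
	memset(sb, 0, len);

	if (d_sense) {
		/* Descriptor format: key, ASC and ASCQ live in bytes 1..3. */
		sb[0] = SENSE_FMT_DESCRIPTOR;
		sb[1] = SK_RECOVERED_ERROR;
		sb[2] = ASC_NO_ADDITIONAL_INFO;
		sb[3] = ASCQ_ATA_PT_INFO_AVAILABLE;
	} else {
		/* Fixed format: key in byte 2, ASC and ASCQ in bytes 12 and 13. */
		sb[0] = SENSE_FMT_FIXED;
		sb[2] = SK_RECOVERED_ERROR;
		sb[12] = ASC_NO_ADDITIONAL_INFO;
		sb[13] = ASCQ_ATA_PT_INFO_AVAILABLE;
	}
}
```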

Cc: stable@vger.kernel.org # 4.19+
Reported-by: Niklas Cassel <cassel@kernel.org>
Closes: https://lore.kernel.org/linux-ide/Zn1WUhmLglM4iais@ryzen.lan
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240702024735.1152293-4-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Igor Pylypiv
9798192622 ata: libata-scsi: Do not overwrite valid sense data when CK_COND=1
Current ata_gen_passthru_sense() code performs two actions:
1. Generates sense data based on the ATA 'status' and ATA 'error' fields.
2. Populates "ATA Status Return sense data descriptor" / "Fixed format
   sense data" with ATA taskfile fields.

The problem is that #1 generates sense data even when valid sense data
is already present (ATA_QCFLAG_SENSE_VALID is set). Factoring out #2 into
a separate function allows us to generate sense data only when there is
no valid sense data (ATA_QCFLAG_SENSE_VALID is not set).
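
The split described above can be pictured with the following minimal
sketch. All names here are hypothetical stand-ins (QCFLAG_SENSE_VALID
mimics ATA_QCFLAG_SENSE_VALID, and the two helpers only count calls);
it shows the control flow, not the actual libata code.

```c
#include <assert.h>
#include <stddef.h>

#define QCFLAG_SENSE_VALID 0x1u /* stand-in for ATA_QCFLAG_SENSE_VALID */

struct passthru_cmd {
	unsigned int flags;
	int sense_generated; /* times action #1 ran */
	int taskfile_copied; /* times action #2 ran */
};

/* Action #1: derive sense data from the ATA status and error fields. */
static void gen_sense_from_status(struct passthru_cmd *c)
{
	c->sense_generated++;
}

/* Action #2: copy the taskfile fields into the sense data. */
static void copy_taskfile_to_sense(struct passthru_cmd *c)
{
	c->taskfile_copied++;
}

static void complete_passthru(struct passthru_cmd *c)
{
	assert(c != NULL);

	/* Do not overwrite sense data that is already valid. */
	if (!(c->flags & QCFLAG_SENSE_VALID))
		gen_sense_from_status(c);

	/* The taskfile fields are reported in either case. */
	copy_taskfile_to_sense(c);
}
```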

As a bonus, we can now delete a FIXME comment in atapi_qc_complete()
which states that we don't want to translate taskfile registers into
sense descriptors for ATAPI.

Additionally, always set SAM_STAT_CHECK_CONDITION when CK_COND=1 because
SAT specification mandates that SATL shall return CHECK CONDITION if
the CK_COND bit is set.

The ATA PASS-THROUGH handling logic in ata_scsi_qc_complete() is hard
to read/understand. Improve the readability of the code by moving checks
into self-explanatory boolean variables.

Cc: stable@vger.kernel.org # 4.19+
Co-developed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20240702024735.1152293-3-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Igor Pylypiv
38dab832c3 ata: libata-scsi: Fix offsets for the fixed format sense data
Correct the ATA PASS-THROUGH fixed format sense data offsets to conform
to SPC-6 and SAT-5 specifications. Additionally, set the VALID bit to
indicate that the INFORMATION field contains valid information.

INFORMATION
===========

SAT-5 Table 212 — "Fixed format sense data INFORMATION field for the ATA
PASS-THROUGH commands" defines the following format:

+------+------------+
| Byte |   Field    |
+------+------------+
|    0 | ERROR      |
|    1 | STATUS     |
|    2 | DEVICE     |
|    3 | COUNT(7:0) |
+------+------------+

SPC-6 Table 48 - "Fixed format sense data" specifies that the INFORMATION
field starts at byte 3 in sense buffer resulting in the following offsets
for the ATA PASS-THROUGH commands:

+------------+-------------------------+
|   Field    |  Offset in sense buffer |
+------------+-------------------------+
| ERROR      |  3                      |
| STATUS     |  4                      |
| DEVICE     |  5                      |
| COUNT(7:0) |  6                      |
+------------+-------------------------+

COMMAND-SPECIFIC INFORMATION
============================

SAT-5 Table 213 - "Fixed format sense data COMMAND-SPECIFIC INFORMATION
field for ATA PASS-THROUGH" defines the following format:

+------+-------------------+
| Byte |        Field      |
+------+-------------------+
|    0 | FLAGS | LOG INDEX |
|    1 | LBA (7:0)         |
|    2 | LBA (15:8)        |
|    3 | LBA (23:16)       |
+------+-------------------+

SPC-6 Table 48 - "Fixed format sense data" specifies that
the COMMAND-SPECIFIC-INFORMATION field starts at byte 8
in sense buffer resulting in the following offsets for
the ATA PASS-THROUGH commands:

+-------------------+-------------------------+
|       Field       |  Offset in sense buffer |
+-------------------+-------------------------+
| FLAGS | LOG INDEX |  8                      |
| LBA (7:0)         |  9                      |
| LBA (15:8)        |  10                     |
| LBA (23:16)       |  11                     |
+-------------------+-------------------------+
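
The offsets in the two tables above can be summarized in a short
sketch. The struct, macro and function names are hypothetical and do
not come from libata; the byte positions and the VALID bit (bit 7 of
byte 0) follow the tables and text in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the ATA fields named in the tables. */
struct ata_result {
	uint8_t error;
	uint8_t status;
	uint8_t device;
	uint8_t count;     /* COUNT(7:0) */
	uint8_t flags_log; /* FLAGS | LOG INDEX, already packed */
	uint8_t lba_low;   /* LBA (7:0) */
	uint8_t lba_mid;   /* LBA (15:8) */
	uint8_t lba_high;  /* LBA (23:16) */
};

#define SENSE_VALID_BIT      0x80 /* byte 0, bit 7 */
#define SENSE_INFO_OFFSET    3    /* INFORMATION: bytes 3..6 */
#define SENSE_CMDSPEC_OFFSET 8    /* COMMAND-SPECIFIC INFORMATION: bytes 8..11 */

static void fill_fixed_passthru_sense(uint8_t *sb, size_t len,
				      const struct ata_result *r)
{
	assert(len >= 12);

	/* Mark the INFORMATION field as containing valid data. */
	sb[0] |= SENSE_VALID_BIT;

	sb[SENSE_INFO_OFFSET + 0] = r->error;
	sb[SENSE_INFO_OFFSET + 1] = r->status;
	sb[SENSE_INFO_OFFSET + 2] = r->device;
	sb[SENSE_INFO_OFFSET + 3] = r->count;

	sb[SENSE_CMDSPEC_OFFSET + 0] = r->flags_log;
	sb[SENSE_CMDSPEC_OFFSET + 1] = r->lba_low;
	sb[SENSE_CMDSPEC_OFFSET + 2] = r->lba_mid;
	sb[SENSE_CMDSPEC_OFFSET + 3] = r->lba_high;
}
```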

Reported-by: Akshat Jain <akshatzen@google.com>
Fixes: 11093cb1ef ("libata-scsi: generate correct ATA pass-through sense")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20240702024735.1152293-2-ipylypiv@google.com
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03 10:51:41 +02:00
Philipp Zabel
197c22b65e Merge tag 'regulator-hw-enable-helper' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into reset/next
regulator: Add helper to allow enable/disable in interrupt context

Add a helper function that enables exclusive consumers to bypass locking
and do an enable/disable from within interrupt context.

Link: https://lore.kernel.org/r/988df019-00d4-4209-8716-39e82c565bf1@sirena.org.uk
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2024-07-03 10:50:10 +02:00
Eric Sandeen
d02f0bb332 fat: Convert to new uid/gid option parsing helpers
Convert to new uid/gid option parsing helpers

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/1a67d2a8-0aae-42a2-9c0f-21cd4cd87d13@redhat.com
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:48:59 +02:00
Eric Sandeen
634440b69c fat: Convert to new mount api
vfat and msdos share a common set of options, with additional, unique
options for each filesystem.

Each filesystem calls common fc initialization and parsing routines,
with an "is_vfat" parameter. For parsing, if the option is not found
in the common parameter_spec, parsing is retried with the fs-specific
parameter_spec.

This patch leaves nls loading to fill_super, so the codepage and charset
options are not validated as they are requested. This matches current
behavior. It would be possible to test-load as each option is parsed,
but that would make e.g.

mount -o "iocharset=nope,iocharset=iso8859-1"

fail, where it does not fail today because only the last iocharset
option is considered.
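
The "last option wins" behavior can be sketched in plain C as follows.
This is a user-space illustration with a hypothetical helper name, not
the fat option parser: it only records the value of the last
"iocharset=" option and leaves validation to a later step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of the LAST "iocharset=" option found in the
 * comma-separated string opts into out. Returns 1 if one was found,
 * 0 otherwise. The value is not validated here.
 */
static int last_iocharset(const char *opts, char *out, size_t out_len)
{
	const char *key = "iocharset=";
	size_t key_len = strlen(key);
	const char *p = opts;
	int found = 0;

	assert(out_len > 0);

	while (*p) {
		size_t tok_len = strcspn(p, ",");

		if (tok_len >= key_len && strncmp(p, key, key_len) == 0) {
			size_t val_len = tok_len - key_len;

			/* Truncate rather than overflow the output buffer. */
			if (val_len >= out_len)
				val_len = out_len - 1;
			memcpy(out, p + key_len, val_len);
			out[val_len] = '\0';
			found = 1; /* a later occurrence overwrites this one */
		}

		p += tok_len;
		if (*p == ',')
			p++;
	}
	return found;
}
```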

The obsolete "conv=" option is set up with an enum of acceptable values;
currently invalid "conv=" options are rejected as such, even though the
option is obsolete, so this patch preserves that behavior.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/a9411b02-5f8e-4e1e-90aa-0c032d66c312@redhat.com
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:48:59 +02:00
Eric Sandeen
206d3d8e00 fat: move debug into fat_mount_options
Move the debug variable into fat_mount_options for consistency and
to facilitate conversion to new mount API.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/f6155247-32ee-4cfe-b808-9102b17f7cd1@redhat.com
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:48:59 +02:00
Xin Long
cda91d5b91 sctp: cancel a blocking accept when shutdown a listen socket
As David Laight noticed,

"In a multithreaded program it is reasonable to have a thread blocked in
 accept(). With TCP a subsequent shutdown(listen_fd, SHUT_RDWR) causes
 the accept to fail. But nothing happens for SCTP."

sctp_disconnect() is eventually called when shutting down a listening
socket, but this function does nothing. This patch sets the RCV_SHUTDOWN
flag in sk->sk_shutdown there, and adds a check for (sk->sk_shutdown &
RCV_SHUTDOWN) so that sctp_accept() breaks out of its wait and returns.

Note that shutdown() is only supported on TCP-style SCTP sockets.

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03 09:45:39 +01:00
Jingbo Xu
cf5bb09e74 cachefiles: add missing lock protection when polling
Add the missing lock protection in the poll routine when iterating the
xarray. Without it:

Even with RCU read lock held, only the slot of the radix tree is
ensured to be pinned there, while the data structure (e.g. struct
cachefiles_req) stored in the slot has no such guarantee.  The poll
routine will iterate the radix tree and dereference cachefiles_req
accordingly.  Thus RCU read lock is not adequate in this case and
spinlock is needed here.

Fixes: b817e22b2e ("cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-10-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:36:16 +02:00
Baokun Li
19f4f39909 cachefiles: cyclic allocation of msg_id to avoid reuse
Reusing the msg_id after a maliciously completed reopen request may cause
a read request to remain unprocessed and result in a hang, as shown below:

       t1       |      t2       |      t3
-------------------------------------------------
cachefiles_ondemand_select_req
 cachefiles_ondemand_object_is_close(A)
 cachefiles_ondemand_set_object_reopening(A)
 queue_work(fscache_object_wq, &info->work)
                ondemand_object_worker
                 cachefiles_ondemand_init_object(A)
                  cachefiles_ondemand_send_req(OPEN)
                    // get msg_id 6
                    wait_for_completion(&req_A->done)
cachefiles_ondemand_daemon_read
 // read msg_id 6 req_A
 cachefiles_ondemand_get_fd
 copy_to_user
                                // Malicious completion msg_id 6
                                copen 6,-1
                                cachefiles_ondemand_copen
                                 complete(&req_A->done)
                                 // will not set the object to close
                                 // because ondemand_id && fd is valid.

                // ondemand_object_worker() is done
                // but the object is still reopening.

                                // new open req_B
                                cachefiles_ondemand_init_object(B)
                                 cachefiles_ondemand_send_req(OPEN)
                                 // reuse msg_id 6
process_open_req
 copen 6,A.size
 // The expected failed copen was executed successfully

copen is expected to fail; when it does, it closes the fd, which sets the
object to close, and the close then triggers reopen again. However, because
msg_id reuse results in a successful copen, the anonymous fd is not
closed until the daemon exits. Read requests waiting for the reopen
to complete may therefore trigger a hung task.

To avoid this issue, allocate the msg_id cyclically so that a msg_id is
not reused within a very short period of time.
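
The idea of cyclic allocation can be illustrated with the small
user-space sketch below. The names and the fixed-size table are
hypothetical; the kernel change may use a different mechanism. The
point is that the search for a free ID starts just past the most
recently allocated one, so a freed ID is not handed out again until
the allocator has cycled through the rest of the ID space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_SPACE 8 /* deliberately tiny so wrap-around is easy to see */

struct cyclic_ids {
	bool in_use[ID_SPACE];
	uint32_t next; /* where the next search starts */
};

/* Return a free ID, searching cyclically from ids->next, or -1 if full. */
static int alloc_cyclic(struct cyclic_ids *ids)
{
	for (uint32_t i = 0; i < ID_SPACE; i++) {
		uint32_t id = (ids->next + i) % ID_SPACE;

		if (!ids->in_use[id]) {
			ids->in_use[id] = true;
			ids->next = (id + 1) % ID_SPACE;
			return (int)id;
		}
	}
	return -1;
}

static void free_id(struct cyclic_ids *ids, uint32_t id)
{
	assert(id < ID_SPACE);
	ids->in_use[id] = false;
}
```

With a lowest-free-ID allocator, freeing ID 0 and allocating again
would return 0 immediately; here the next allocation returns 1.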

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-9-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:36:16 +02:00
Hou Tao
12e009d608 cachefiles: wait for ondemand_object_worker to finish when dropping object
When queuing ondemand_object_worker() to re-open the object,
cachefiles_object is not pinned. The cachefiles_object may be freed when
the pending read request is completed intentionally and the related
erofs is umounted. If ondemand_object_worker() runs after the object is
freed, it causes a use-after-free, as shown below.

process A  process B  process C  process D

cachefiles_ondemand_send_req()
// send a read req X
// wait for its completion

           // close ondemand fd
           cachefiles_ondemand_fd_release()
           // set object as CLOSE

                       cachefiles_ondemand_daemon_read()
                       // set object as REOPENING
                       queue_work(fscache_wq, &info->ondemand_work)

                                // close /dev/cachefiles
                                cachefiles_daemon_release
                                cachefiles_flush_reqs
                                complete(&req->done)

// read req X is completed
// umount the erofs fs
cachefiles_put_object()
// object will be freed
cachefiles_ondemand_deinit_obj_info()
kmem_cache_free(object)
                       // both info and object are freed
                       ondemand_object_worker()

When dropping an object, it is no longer necessary to reopen the object,
so use cancel_work_sync() to cancel or wait for ondemand_object_worker()
to finish.

Fixes: 0a7e54c195 ("cachefiles: resend an open request if the read request's object is closed")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-8-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:36:16 +02:00
Baokun Li
751f524635 cachefiles: cancel all requests for the object that is being dropped
Requests for an object are useless once the object has been dropped, so
cancel them to avoid causing other problems.

This prepares for the later addition of cancel_work_sync(). After a
reopen request is generated, cancel it so that cancel_work_sync() does
not block waiting for the daemon to complete the reopen request.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-7-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:36:15 +02:00
Baokun Li
b2415d1f45 cachefiles: stop sending new request when dropping object
Add CACHEFILES_ONDEMAND_OBJSTATE_DROPPING, which indicates that the
cachefiles object is being dropped. It is set after the close request for
the dropped object completes, and no new requests may be sent once the
object is in this state.

This prepares for the later addition of cancel_work_sync(). It prevents
leftover reopen requests from being sent, which avoids processing
unnecessary requests and keeps cancel_work_sync() from blocking while it
waits for the daemon to complete the reopen requests.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-6-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03 10:36:15 +02:00